What is normalization?
The rules of normalization guide us to decide:
- the number of tables in our database, and
- which table each attribute belongs in.
We evaluate database design by how well it minimizes data redundancy, and the occurrence of anomalies.
Step 0: Create initial table
- The initial table must have a
PRIMARY KEY
identified, after considering all candidates.- When picking a PK, think of whether this will always be unique for every instance of your entity (in other words, for every row in your table).
- If a single attribute doesn’t work, you may need a combination of 2+ attributes.
- Repeating groups are listed in parentheses. Repeating groups are attributes that can have multiple values within the view.
- One way to think about repeating groups of attributes: are there attributes that, for a single instance of our entity, can have more than one value at the same time? e.g. a
Customer
entity can have more than one payment record on their account: ifCustomerID
is the PK for this table, thenPaymentID
would be a repeating group for that PK.
- One way to think about repeating groups of attributes: are there attributes that, for a single instance of our entity, can have more than one value at the same time? e.g. a
Step 1: Apply the rules of 1NF
- A table must contain only atomic attributes.
- That means, if we have any composite attributes, break ‘em down!
- e.g.
Name
becomesFirstName
andLastName
,Address
gets split up intoStreetAddress
,City
Province
,PostalCode
, etc.
-
A table cannot contain any repeating groups of attributes.
If a repeating group of attributes exists: 1. Move the repeating group(s) of attributes to a new table. 2. Duplicate the PK of the original table and place in the new table as a FK. 3. Designate the cardinality of the relationship between the tables. 4. Designate a PK for the new table.
Step 2: Apply the rules of 2NF
-
All non-key attributes in a table must fully depend on the entire primary key of the table.
A violation can only occur in a table with a composite primary key.
- That means: if we don’t have a composite PK, we’re done with 2NF!
A non-key attribute is fully dependent on a PK attribute if: for each value of the key attribute there can be one value for the non-key attribute.
An attribute that violates 2NF is referred to as a partial dependency.
- In other words: are there any attributes that rely on just part of the PK, rather than the combination of all the attributes in it?
If a partial dependency exists: 1. Move the partially dependent attribute(s) to a new table. 2. Duplicate the part of the PK it’s dependent on, and place in new table as a PK (it’s now an FK in the original table). 3. Designate the cardinality of the relationship between the tables.
Step 3: Apply the rules of 3NF
-
A non-key attribute cannot be fully dependent on another non-key attribute.
A non-key attribute that is fully dependent on another non-key attribute is termed a transitive dependency.
- In other words: is there a non-key attribute that depends on another non-key attribute, instead of depending on the PK?
If a transitive dependency exists: 1. Move the transitively dependent attribute(s) to a new table. 2. Duplicate the non-key attribute it’s dependent on, and place in new table as PK (it’s now a FK in the original table). 3. Designate the cardinality of the relationship between the tables.