What is Normalization in Database Design?
Normalization is the process of organizing a relational database in a way that reduces redundancy and dependency by splitting large tables into smaller ones and ensuring relationships between them are maintained. This process helps to minimize the chances of anomalies like insert, update, and delete anomalies.
Normalization involves several steps known as normal forms (NF). The higher the normal form, the more normalized the database is. Below is a detailed breakdown of the first four normal forms:
1. First Normal Form (1NF)
A table is in 1NF if:
-
All columns contain atomic (indivisible) values.
-
Each column contains only one value per row.
-
Each column has a unique name.
-
The order in which data is stored does not matter.
Example of 1NF
Consider a table storing customer orders:
OrderID | CustomerName | ItemsOrdered |
---|---|---|
1 | Alice | Apple, Banana |
2 | Bob | Orange, Grapes, Mango |
In the above table, the ItemsOrdered column contains multiple values (i.e., a list of items). This violates the rule of atomicity.
To convert this into 1NF, we would split the items into separate rows:
OrderID | CustomerName | Item |
---|---|---|
1 | Alice | Apple |
1 | Alice | Banana |
2 | Bob | Orange |
2 | Bob | Grapes |
2 | Bob | Mango |
Now, each row has atomic values, and the table is in 1NF.
2. Second Normal Form (2NF)
A table is in 2NF if:
-
It is in 1NF.
-
All non-key columns are fully dependent on the primary key. There should be no partial dependency, meaning that no non-key column should depend on just part of a composite primary key.
Example of 2NF
Consider a table with a composite primary key (OrderID, Item):
OrderID | Item | CustomerName | Quantity |
---|---|---|---|
1 | Apple | Alice | 10 |
1 | Banana | Alice | 5 |
2 | Orange | Bob | 8 |
In this table, CustomerName depends only on OrderID (and not on Item), while Quantity depends on both OrderID and Item. This is a case of partial dependency.
To convert the table into 2NF, we remove the partial dependency by splitting the table into two:
Orders Table:
OrderID | CustomerName |
---|---|
1 | Alice |
2 | Bob |
OrderItems Table:
OrderID | Item | Quantity |
---|---|---|
1 | Apple | 10 |
1 | Banana | 5 |
2 | Orange | 8 |
Now, the CustomerName depends only on the OrderID, and the Quantity depends on both OrderID and Item. This table is in 2NF.
3. Third Normal Form (3NF)
A table is in 3NF if:
-
It is in 2NF.
-
There is no transitive dependency, meaning no non-key column depends on another non-key column.
Example of 3NF
Consider the following table:
OrderID | CustomerName | CustomerAddress | Item | Quantity |
---|---|---|---|---|
1 | Alice | 123 Main St | Apple | 10 |
2 | Bob | 456 Elm St | Orange | 8 |
In this table, CustomerAddress depends on CustomerName, which is a non-key column. This is a transitive dependency because CustomerAddress depends on CustomerName, which is not the primary key.
To convert this table into 3NF, we remove the transitive dependency by creating a new table for customer information:
Orders Table:
OrderID | CustomerName | Item | Quantity |
---|---|---|---|
1 | Alice | Apple | 10 |
2 | Bob | Orange | 8 |
Customers Table:
CustomerName | CustomerAddress |
---|---|
Alice | 123 Main St |
Bob | 456 Elm St |
Now, CustomerAddress depends only on CustomerName, and the table is in 3NF.
4. Boyce-Codd Normal Form (BCNF)
A table is in BCNF if:
-
It is in 3NF.
-
Every determinant (a column that determines another column) is a candidate key.
Example of BCNF
Consider the following table:
CourseID | Instructor | Room |
---|---|---|
CS101 | Dr. Smith | 101 |
CS102 | Dr. Jones | 102 |
CS101 | Dr. Brown | 101 |
In this case, Room is dependent on CourseID, but Instructor is not a candidate key (it doesn’t uniquely identify a row). This violates BCNF, since a non-candidate key (Instructor) is determining another attribute (Room).
To convert this table into BCNF, we break the table into two:
Courses Table:
CourseID | Instructor |
---|---|
CS101 | Dr. Smith |
CS102 | Dr. Jones |
CourseRooms Table:
CourseID | Room |
---|---|
CS101 | 101 |
CS102 | 102 |
Now, Instructor is not determining Room anymore, and both tables are in BCNF.
Summary of Normal Forms
Normal Form | Key Condition |
---|---|
1NF | Atomic values, no repeating groups |
2NF | 1NF + no partial dependency (non-key attributes depend on the full primary key) |
3NF | 2NF + no transitive dependency (non-key attributes don't depend on other non-key attributes) |
BCNF | 3NF + every determinant is a candidate key |
Normalization is a fundamental part of relational database design. It helps in minimizing redundancy, ensuring consistency, and improving the efficiency of database operations. However, higher normal forms may require additional tables and relationships, which can sometimes lead to performance trade-offs. Therefore, it's essential to strike a balance between normalization and performance needs when designing your database.