The practice of normalization improves data integrity in the table by getting rid of redundant data. Additionally, normalization aids in database organization. It is a multi-step procedure that eliminates duplicate data from relational tables and puts the data into tabular form.
In order to guarantee that database integrity constraints correctly execute their dependencies, normalization arranges a database's columns and tables. It is a methodical process of breaking down tables to get rid of unwanted features like Insertion, Update, and Deletion anomalies as well as data redundancy (repetition).
What is normalization?
In database architecture, normalization is a technique that makes sure data is stored in a way that minimizes redundancy and boosts efficiency. In order to ensure that the data is properly arranged and related in a way that avoids duplication, the aim is to divide huge tables into smaller ones.
The procedure entails creating associations between the tables and splitting a database into two or more tables. It increases database efficiency and guarantees consistent and manageable data.
Advantages of Normalization
Minimization of Data Redundancy: Normalization helps prevent the storage of duplicate data by dividing large tables into smaller ones.
Better Data Integrity: There is less possibility of inconsistent data in various database sections when redundancy is decreased.
Effective Querying: It is simpler to query and index smaller tables with fewer superfluous columns.
Faster Updates: Normalized databases require fewer changes in fewer locations when data needs to be updated.
Normalization Types (Normal Forms)
Usually, normalization is carried out in a series of steps known as normal forms (NF). Every normal form tackles a distinct issue, and each one builds upon the one before it. The typical forms are:
1. First Normal Form (1NF)
According to 1NF, each column's values must be atomic or indivisible. Every record must have a unique identity, and each column must only have one value per row.
A table is in 1NF if:
Atomic (indivisible) values are contained in each column.
There is only one sort of value in each column.
There are no duplicate rows because every row is distinct.
There are no arrays or recurring groups in any of the columns.
Example of 1NF:
Student_ID | Name | Subjects |
1 | John | Math, English |
2 | Alice | Science, History |
Problem: The atomicity requirement is broken by the "Subjects" column's various values (Math, English, etc.) in the same cell.
Solution (1NF):
Student_ID | Name | Subject |
1 | John | Math |
1 | John | English |
2 | Alice | Science |
2 | Alice | History |
At this point, the table satisfies 1NF as each column includes atomic data.
2. Second Normal Form (2NF)
1NF is built upon by 2NF. A table must first be in 1NF and then remove partial dependencies in order to reach 2NF. When non-key attributes rely on only a portion of a composite primary key, this is known as partial reliance.
Remove partial dependency: All non-key columns must rely on the complete primary key if a table contains a composite primary key, meaning the primary key is composed of many columns.
A table is in 2NF if:
It's in 1NF.
There is no partial dependency; all non-key attributes rely on the primary key in order to function.
Example of 2NF:
StudentID | CourseID | CourseName | StudentName |
1 | 101 | Math | Alice |
2 | 102 | Science | Bob |
In this case, the complete composite key (StudentID, CourseID) is not necessary for CourseName; only CourseID is.
Solution (2NF):
Separate the data into three tables:
Table 1: Students
StudentID | StudentName |
1 | Alice |
2 | Bob |
Table 2: Courses
CourseID | CourseName |
101 | Math |
102 | Science |
Table 3: Enrollment
StudentID | CourseID |
1 | 101 |
2 | 102 |
Now, all non-key attributes are fully dependent on the primary key.
3. Third Normal Form (3NF)
2NF is built upon by 3NF. If a table has no transitive dependencies and is in 2NF, it is in 3NF. When a non-key column depends on another non-key column instead of the primary key, this is known as a transitive dependency.
Remove Transitive Dependency: No non-key attribute should rely on any other non-key attributes; instead, it should solely rely on the primary key.
A table is in 3NF if:
It's in 2NF.
There isn't transitive dependency, which states that one non-key characteristic shouldn't rely on another.
Example of 3NF:
Student_ID | Student_Name | Teacher_Name | Teacher_Age |
1 | John | Mr. A | 45 |
2 | Alice | Mrs. B | 40 |
In this case, Teacher_Age is not directly dependent on the primary key (Student_ID), but rather on Teacher_Name. This is against 3NF.
Solution (3NF):
Separate into two tables:
1. Student:
Student_ID | Student_Name |
1 | John |
2 | Alice |
2. Teacher:
Teacher_Name | Teacher_Age | Student_ID |
Mr. A | 45 | 1 |
Mrs. B | 40 | 2 |
The table now satisfies 3NF since there is no transitive dependency.
Boyce-Codd Normal Form (BCNF)
A more stringent variant of 3NF is BCNF. If a table is in 3NF and has a super key on the left side of each non-trivial functional dependency (a super key uniquely identifies a row in a table), then the table is in BCNF.
A table is in BCNF if:
It is in 3NF.
X is a superkey for each functional dependence X→ Y.
Example of BCNF:
StudentID | CourseID | TeacherID |
1 | 101 | T01 |
2 | 102 | T02 |
1 | 102 | T02 |
Although CourseID is not a superkey in this case, it does determine TeacherID.
Normalized Tables (BCNF): Table 1: Course-Teacher Relationship
CourseID | TeacherID |
101 | T01 |
102 | T02 |
Table 2: Student-Course Relationship
StudentID | CourseID |
1 | 101 |
1 | 102 |
Now, all dependencies are satisfied, and every determinant is a superkey.
4. Fourth Normal Form (4NF)
In 4NF, we remove multi-valued dependencies (MVDs), by dividing the table into smaller tables. This guarantees that there is only one independent relationship between each table.
Understanding Multi-Valued Dependency (MVD):
An MVD is when two or more characteristics in a table are separately related to one another and do not depend on one another. Redundancy and irregularities during data changes may result from needless duplication.
A table is in 4NF if:
It is in Boyce-Codd Normal Form, or BCNF.
There are no multi-valued dependencies (MVDs) in it.
Example of 4NF:
StudentID | Course | Hobby |
1 | Math | Reading |
1 | Math | Painting |
1 | Science | Reading |
1 | Science | Painting |
Here:
Course and Hobby are independently linked to StudentID.
Duplicate rows result from the lack of a link between Course and Hobby.
Problems with the Table:
Every row for every course needs to be updated if a student chooses a new hobby.
Anomalies may be introduced by adding or removing data.
Solution: Normalized Tables (4NF): Table 1: Student-Course Relationship
StudentID | Course |
1 | Math |
1 | Science |
Table 2: Student-Hobby Relationship
StudentID | Hobby |
1 | Reading |
1 | Painting |
5. Fifth Normal Form (5NF)
The main goal of 5NF is to prevent data from being further subdivided into smaller tables without adding redundancy. It manages intricate relationships that 4NF is unable to depict.
Understanding Join Dependency (JD):
A join dependency arises when a table may be divided into smaller tables that can be reunited to the original database without causing any data loss. In 5NF, the table's candidate keys determine each join dependence.
A table is in 5NF (also known as Projection-Join Normal Form or PJNF) if:
It is in 4NF.
It gets rid of join dependencies (JD), which can cause redundant data.
Example of 5NF:
ProjectID | EmployeeID | TaskID |
P1 | E1 | T1 |
P1 | E1 | T2 |
P1 | E2 | T1 |
P1 | E2 | T2 |
Here:
Projects (ProjectID) are paired with employees (EmployeeID).
Projects (ProjectID) are linked to tasks (TaskID).
EmployeeID and TaskID do not directly relate to one another.
Problems with the Table:
Redundancy results from the table's mixing of linkages between tasks, personnel, and projects.
Rows for every employee must be repeated whenever a new job is introduced for a project.
Normalized Tables (5NF): Table 1: Project-Employee Relationship
ProjectID | EmployeeID |
P1 | E1 |
P1 | E2 |
Table 2: Project-Task Relationship
ProjectID | TaskID |
P1 | T1 |
P1 | T2 |
Table 3: Employee-Task Relationship
EmployeeID | TaskID |
E1 | T1 |
E1 | T2 |
E2 | T1 |
E2 | T2 |
Example of Normalization Process
A thorough example covering all normal forms may be found here. We begin with a table that contains a bookstore’s data. The bookstore tracks customers, their orders, books purchased, and customer addresses.
Initial Table (Unnormalized Form)
OrderID | CustomerName | Address | Books | Authors |
001 | Alice | 123 Main St | "Book A, Book B" | "Author X, Author Y" |
002 | Bob | 456 Elm St | "Book A" | "Author X" |
First Normal Form (1NF)
Ensure atomic values by separating multiple values into rows.
OrderID | CustomerName | Address | Book | Author |
001 | Alice | 123 Main St | Book A | Author X |
001 | Alice | 123 Main St | Book B | Author Y |
002 | Bob | 456 Elm St | Book A | Author X |
Second Normal Form (2NF)
Remove partial dependencies. Create separate tables for customers and orders.
Tables:
Customers:
CustomerID | CustomerName | Address |
C001 | Alice | 123 Main St |
C002 | Bob | 456 Elm St |
Orders:
OrderID | CustomerID |
001 | C001 |
002 | C002 |
Books in Orders:
OrderID | Book | Author |
001 | Book A | Author X |
001 | Book B | Author Y |
002 | Book A | Author X |
Third Normal Form (3NF)
Remove transitive dependencies. Separate book and author information.
Tables:
Books:
BookID | Book | Author |
B001 | Book A | Author X |
B002 | Book B | Author Y |
Books in Orders (Updated):
OrderID | BookID |
001 | B001 |
001 | B002 |
002 | B001 |
Final Schema
1. Customers: (CustomerID, CustomerName, Address)
2. Orders: (OrderID, CustomerID)
3. Books: (BookID, Book, Author)
4. Books in Orders: (OrderID, BookID)
This schema satisfies 1NF, 2NF, and 3NF while maintaining simplicity and practical application. There's no need for 4NF or 5NF, as there are no multi-valued or complex join dependencies in this example.
Conclusion
In summary, normalization is an essential database design procedure that aids in data organization to prevent redundancy, guarantee data integrity, and boost productivity. In order to maintain the data's logical structure and optimize it for storage and querying, each level of normalization concentrates on removing particular types of dependency.
Comments