What is normalization in SQL?

Dec 10, 20246 min read

The practice of normalization improves data integrity in the table by getting rid of redundant data. Additionally, normalization aids in database organization. It is a multi-step procedure that eliminates duplicate data from relational tables and puts the data into tabular form.

In order to guarantee that database integrity constraints correctly execute their dependencies, normalization arranges a database's columns and tables. It is a methodical process of breaking down tables to get rid of unwanted features like Insertion, Update, and Deletion anomalies as well as data redundancy (repetition).

What is normalization?

In database architecture, normalization is a technique that makes sure data is stored in a way that minimizes redundancy and boosts efficiency. In order to ensure that the data is properly arranged and related in a way that avoids duplication, the aim is to divide huge tables into smaller ones.

The procedure entails creating associations between the tables and splitting a database into two or more tables. It increases database efficiency and guarantees consistent and manageable data.

Advantages of Normalization

Minimization of Data Redundancy: Normalization helps prevent the storage of duplicate data by dividing large tables into smaller ones.
Better Data Integrity: There is less possibility of inconsistent data in various database sections when redundancy is decreased.
Effective Querying: It is simpler to query and index smaller tables with fewer superfluous columns.
Faster Updates: Normalized databases require fewer changes in fewer locations when data needs to be updated.

Normalization Types (Normal Forms)

Usually, normalization is carried out in a series of steps known as normal forms (NF). Every normal form tackles a distinct issue, and each one builds upon the one before it. The typical forms are:

1. First Normal Form (1NF)

According to 1NF, each column's values must be atomic or indivisible. Every record must have a unique identity, and each column must only have one value per row.

A table is in 1NF if:

Atomic (indivisible) values are contained in each column.
There is only one sort of value in each column.
There are no duplicate rows because every row is distinct.
There are no arrays or recurring groups in any of the columns.

Example of 1NF:

Student_ID	Name	Subjects
1	John	Math, English
2	Alice	Science, History

Problem: The atomicity requirement is broken by the "Subjects" column's various values (Math, English, etc.) in the same cell.

Solution (1NF):

Student_ID	Name	Subject
1	John	Math
1	John	English
2	Alice	Science
2	Alice	History

At this point, the table satisfies 1NF as each column includes atomic data.

2. Second Normal Form (2NF)

1NF is built upon by 2NF. A table must first be in 1NF and then remove partial dependencies in order to reach 2NF. When non-key attributes rely on only a portion of a composite primary key, this is known as partial reliance.

Remove partial dependency: All non-key columns must rely on the complete primary key if a table contains a composite primary key, meaning the primary key is composed of many columns.

A table is in 2NF if:

It's in 1NF.
There is no partial dependency; all non-key attributes rely on the primary key in order to function.

Example of 2NF:

StudentID	CourseID	CourseName	StudentName
1	101	Math	Alice
2	102	Science	Bob

In this case, the complete composite key (StudentID, CourseID) is not necessary for CourseName; only CourseID is.

Solution (2NF):

Separate the data into three tables:

Table 1: Students

StudentID	StudentName
1	Alice
2	Bob

Table 2: Courses

CourseID	CourseName
101	Math
102	Science

Table 3: Enrollment

StudentID	CourseID
1	101
2	102

Now, all non-key attributes are fully dependent on the primary key.

3. Third Normal Form (3NF)

2NF is built upon by 3NF. If a table has no transitive dependencies and is in 2NF, it is in 3NF. When a non-key column depends on another non-key column instead of the primary key, this is known as a transitive dependency.

Remove Transitive Dependency: No non-key attribute should rely on any other non-key attributes; instead, it should solely rely on the primary key.

A table is in 3NF if:

It's in 2NF.
There isn't transitive dependency, which states that one non-key characteristic shouldn't rely on another.

Example of 3NF:

Student_ID	Student_Name	Teacher_Name	Teacher_Age
1	John	Mr. A	45
2	Alice	Mrs. B	40

In this case, Teacher_Age is not directly dependent on the primary key (Student_ID), but rather on Teacher_Name. This is against 3NF.

Solution (3NF):

Separate into two tables:

1. Student:

Student_ID	Student_Name
1	John
2	Alice

2. Teacher:

Teacher_Name	Teacher_Age	Student_ID
Mr. A	45	1
Mrs. B	40	2

The table now satisfies 3NF since there is no transitive dependency.

Boyce-Codd Normal Form (BCNF)

A more stringent variant of 3NF is BCNF. If a table is in 3NF and has a super key on the left side of each non-trivial functional dependency (a super key uniquely identifies a row in a table), then the table is in BCNF.

A table is in BCNF if:

It is in 3NF.
X is a superkey for each functional dependence X→ Y.

Example of BCNF:

StudentID	CourseID	TeacherID
1	101	T01
2	102	T02
1	102	T02

Although CourseID is not a superkey in this case, it does determine TeacherID.

Normalized Tables (BCNF): Table 1: Course-Teacher Relationship

CourseID	TeacherID
101	T01
102	T02

Table 2: Student-Course Relationship

StudentID	CourseID
1	101
1	102

Now, all dependencies are satisfied, and every determinant is a superkey.

4. Fourth Normal Form (4NF)

In 4NF, we remove multi-valued dependencies (MVDs), by dividing the table into smaller tables. This guarantees that there is only one independent relationship between each table.

Understanding Multi-Valued Dependency (MVD):

An MVD is when two or more characteristics in a table are separately related to one another and do not depend on one another. Redundancy and irregularities during data changes may result from needless duplication.

A table is in 4NF if:

It is in Boyce-Codd Normal Form, or BCNF.
There are no multi-valued dependencies (MVDs) in it.

Example of 4NF:

StudentID	Course	Hobby
1	Math	Reading
1	Math	Painting
1	Science	Reading
1	Science	Painting

Here:

Course and Hobby are independently linked to StudentID.
Duplicate rows result from the lack of a link between Course and Hobby.

Problems with the Table:

Every row for every course needs to be updated if a student chooses a new hobby.
Anomalies may be introduced by adding or removing data.

Solution: Normalized Tables (4NF): Table 1: Student-Course Relationship

StudentID	Course
1	Math
1	Science

Table 2: Student-Hobby Relationship

StudentID	Hobby
1	Reading
1	Painting

5. Fifth Normal Form (5NF)

The main goal of 5NF is to prevent data from being further subdivided into smaller tables without adding redundancy. It manages intricate relationships that 4NF is unable to depict.

Understanding Join Dependency (JD):

A join dependency arises when a table may be divided into smaller tables that can be reunited to the original database without causing any data loss. In 5NF, the table's candidate keys determine each join dependence.

A table is in 5NF (also known as Projection-Join Normal Form or PJNF) if:

It is in 4NF.
It gets rid of join dependencies (JD), which can cause redundant data.

Example of 5NF:

ProjectID	EmployeeID	TaskID
P1	E1	T1
P1	E1	T2
P1	E2	T1
P1	E2	T2

Here:

Projects (ProjectID) are paired with employees (EmployeeID).
Projects (ProjectID) are linked to tasks (TaskID).
EmployeeID and TaskID do not directly relate to one another.

Problems with the Table:

Redundancy results from the table's mixing of linkages between tasks, personnel, and projects.
Rows for every employee must be repeated whenever a new job is introduced for a project.

Normalized Tables (5NF): Table 1: Project-Employee Relationship

ProjectID	EmployeeID
P1	E1
P1	E2

Table 2: Project-Task Relationship

ProjectID	TaskID
P1	T1
P1	T2

Table 3: Employee-Task Relationship

EmployeeID	TaskID
E1	T1
E1	T2
E2	T1
E2	T2

Example of Normalization Process

A thorough example covering all normal forms may be found here. We begin with a table that contains a bookstore’s data. The bookstore tracks customers, their orders, books purchased, and customer addresses.

Initial Table (Unnormalized Form)

OrderID	CustomerName	Address	Books	Authors
001	Alice	123 Main St	"Book A, Book B"	"Author X, Author Y"
002	Bob	456 Elm St	"Book A"	"Author X"

First Normal Form (1NF)

Ensure atomic values by separating multiple values into rows.

OrderID	CustomerName	Address	Book	Author
001	Alice	123 Main St	Book A	Author X
001	Alice	123 Main St	Book B	Author Y
002	Bob	456 Elm St	Book A	Author X

Second Normal Form (2NF)

Remove partial dependencies. Create separate tables for customers and orders.

Tables:

Customers:

CustomerID	CustomerName	Address
C001	Alice	123 Main St
C002	Bob	456 Elm St

Orders:

OrderID	CustomerID
001	C001
002	C002

Books in Orders:

OrderID	Book	Author
001	Book A	Author X
001	Book B	Author Y
002	Book A	Author X

Third Normal Form (3NF)

Remove transitive dependencies. Separate book and author information.

Tables:

Books:

BookID	Book	Author
B001	Book A	Author X
B002	Book B	Author Y

Books in Orders (Updated):

OrderID	BookID
001	B001
001	B002
002	B001

Final Schema

1. Customers: (CustomerID, CustomerName, Address)

2. Orders: (OrderID, CustomerID)

3. Books: (BookID, Book, Author)

4. Books in Orders: (OrderID, BookID)

This schema satisfies 1NF, 2NF, and 3NF while maintaining simplicity and practical application. There's no need for 4NF or 5NF, as there are no multi-valued or complex join dependencies in this example.

Conclusion

In summary, normalization is an essential database design procedure that aids in data organization to prevent redundancy, guarantee data integrity, and boost productivity. In order to maintain the data's logical structure and optimize it for storage and querying, each level of normalization concentrates on removing particular types of dependency.

Upto 70% on All Courses 🚀 Limited Time Offer, Register Now🥳

IOTA Academy