top of page

What is normalization in SQL?

Writer's picture: IOTA ACADEMYIOTA ACADEMY

The practice of normalization improves data integrity in the table by getting rid of redundant data. Additionally, normalization aids in database organization. It is a multi-step procedure that eliminates duplicate data from relational tables and puts the data into tabular form.

In order to guarantee that database integrity constraints correctly execute their dependencies, normalization arranges a database's columns and tables. It is a methodical process of breaking down tables to get rid of unwanted features like Insertion, Update, and Deletion anomalies as well as data redundancy (repetition).


Normalization in SQL

What is normalization?


In database architecture, normalization is a technique that makes sure data is stored in a way that minimizes redundancy and boosts efficiency. In order to ensure that the data is properly arranged and related in a way that avoids duplication, the aim is to divide huge tables into smaller ones.

The procedure entails creating associations between the tables and splitting a database into two or more tables. It increases database efficiency and guarantees consistent and manageable data.


Advantages of Normalization


  • Minimization of Data Redundancy: Normalization helps prevent the storage of duplicate data by dividing large tables into smaller ones.


  • Better Data Integrity: There is less possibility of inconsistent data in various database sections when redundancy is decreased.


  • Effective Querying: It is simpler to query and index smaller tables with fewer superfluous columns.


  • Faster Updates: Normalized databases require fewer changes in fewer locations when data needs to be updated.


Normalization Types (Normal Forms)


Usually, normalization is carried out in a series of steps known as normal forms (NF). Every normal form tackles a distinct issue, and each one builds upon the one before it. The typical forms are:


1.  First Normal Form (1NF)


According to 1NF, each column's values must be atomic or indivisible. Every record must have a unique identity, and each column must only have one value per row.


A table is in 1NF if:

  • Atomic (indivisible) values are contained in each column.

  • There is only one sort of value in each column.

  • There are no duplicate rows because every row is distinct.

  • There are no arrays or recurring groups in any of the columns.


Example of 1NF:

Student_ID

Name

Subjects

1

John

Math, English

2

Alice

Science, History

Problem: The atomicity requirement is broken by the "Subjects" column's various values (Math, English, etc.) in the same cell.


Solution (1NF):

Student_ID

Name

Subject

1

John

Math

1

John

English

2

Alice

Science

2

Alice

History

At this point, the table satisfies 1NF as each column includes atomic data.


2.  Second Normal Form (2NF)


1NF is built upon by 2NF. A table must first be in 1NF and then remove partial dependencies in order to reach 2NF. When non-key attributes rely on only a portion of a composite primary key, this is known as partial reliance.


Remove partial dependency: All non-key columns must rely on the complete primary key if a table contains a composite primary key, meaning the primary key is composed of many columns.


A table is in 2NF if:

  • It's in 1NF.

  • There is no partial dependency; all non-key attributes rely on the primary key in order to function.


Example of 2NF:

StudentID

CourseID

CourseName

StudentName

1

101

Math

Alice

2

102

Science

Bob

In this case, the complete composite key (StudentID, CourseID) is not necessary for CourseName; only CourseID is.


Solution (2NF):

Separate the data into three tables:


Table 1: Students

StudentID

StudentName

1

Alice

2

Bob

Table 2: Courses

CourseID

CourseName

101

Math

102

Science

Table 3: Enrollment

StudentID

CourseID

1

101

2

102

Now, all non-key attributes are fully dependent on the primary key.


3.  Third Normal Form (3NF)


2NF is built upon by 3NF. If a table has no transitive dependencies and is in 2NF, it is in 3NF. When a non-key column depends on another non-key column instead of the primary key, this is known as a transitive dependency.


Remove Transitive Dependency: No non-key attribute should rely on any other non-key attributes; instead, it should solely rely on the primary key.


A table is in 3NF if:

  • It's in 2NF.

  • There isn't transitive dependency, which states that one non-key characteristic shouldn't rely on another.


Example of 3NF:

Student_ID

Student_Name

Teacher_Name

Teacher_Age

1

John

Mr. A

45

2

Alice

Mrs. B

40

In this case, Teacher_Age is not directly dependent on the primary key (Student_ID), but rather on Teacher_Name. This is against 3NF.


Solution (3NF):

Separate into two tables:


1.      Student:

Student_ID

Student_Name

1

John

2

Alice

2.      Teacher:

Teacher_Name

Teacher_Age

Student_ID

Mr. A

45

1

Mrs. B

40

2

The table now satisfies 3NF since there is no transitive dependency.


Boyce-Codd Normal Form (BCNF)


A more stringent variant of 3NF is BCNF. If a table is in 3NF and has a super key on the left side of each non-trivial functional dependency (a super key uniquely identifies a row in a table), then the table is in BCNF.


A table is in BCNF if:

  • It is in 3NF.

  • X is a superkey for each functional dependence X→ Y.


Example of BCNF:

StudentID

CourseID

TeacherID

1

101

T01

2

102

T02

1

102

T02

Although CourseID is not a superkey in this case, it does determine TeacherID.


Normalized Tables (BCNF): Table 1: Course-Teacher Relationship

CourseID

TeacherID

101

T01

102

T02

Table 2: Student-Course Relationship

StudentID

CourseID

1

101

1

102

Now, all dependencies are satisfied, and every determinant is a superkey.


4.  Fourth Normal Form (4NF)


In 4NF, we remove multi-valued dependencies (MVDs), by dividing the table into smaller tables. This guarantees that there is only one independent relationship between each table.


Understanding Multi-Valued Dependency (MVD):


An MVD is when two or more characteristics in a table are separately related to one another and do not depend on one another. Redundancy and irregularities during data changes may result from needless duplication.


A table is in 4NF if:

  • It is in Boyce-Codd Normal Form, or BCNF.

  • There are no multi-valued dependencies (MVDs) in it.


Example of 4NF:

StudentID

Course

Hobby

1

Math

Reading

1

Math

Painting

1

Science

Reading

1

Science

Painting

Here:

  • Course and Hobby are independently linked to StudentID.

  • Duplicate rows result from the lack of a link between Course and Hobby.


Problems with the Table:

  • Every row for every course needs to be updated if a student chooses a new hobby.

  • Anomalies may be introduced by adding or removing data.


Solution: Normalized Tables (4NF): Table 1: Student-Course Relationship

StudentID

Course

1

Math

1

Science

Table 2: Student-Hobby Relationship

StudentID

Hobby

1

Reading

1

Painting

5.  Fifth Normal Form (5NF)


The main goal of 5NF is to prevent data from being further subdivided into smaller tables without adding redundancy. It manages intricate relationships that 4NF is unable to depict.


Understanding Join Dependency (JD):

A join dependency arises when a table may be divided into smaller tables that can be reunited to the original database without causing any data loss. In 5NF, the table's candidate keys determine each join dependence.


A table is in 5NF (also known as Projection-Join Normal Form or PJNF) if:

  • It is in 4NF.

  • It gets rid of join dependencies (JD), which can cause redundant data.


Example of 5NF:

ProjectID

EmployeeID

TaskID

P1

E1

T1

P1

E1

T2

P1

E2

T1

P1

E2

T2

Here:

  • Projects (ProjectID) are paired with employees (EmployeeID).

  • Projects (ProjectID) are linked to tasks (TaskID).

  • EmployeeID and TaskID do not directly relate to one another.


Problems with the Table:

  • Redundancy results from the table's mixing of linkages between tasks, personnel, and projects.

  • Rows for every employee must be repeated whenever a new job is introduced for a project.


Normalized Tables (5NF): Table 1: Project-Employee Relationship

ProjectID

EmployeeID

P1

E1

P1

E2

Table 2: Project-Task Relationship

ProjectID

TaskID

P1

T1

P1

T2

Table 3: Employee-Task Relationship

EmployeeID

TaskID

E1

T1

E1

T2

E2

T1

E2

T2

Example of Normalization Process


A thorough example covering all normal forms may be found here. We begin with a table that contains a bookstore’s data. The bookstore tracks customers, their orders, books purchased, and customer addresses.


Initial Table (Unnormalized Form)

OrderID

CustomerName

Address

Books

Authors

001

Alice

123 Main St

"Book A, Book B"

"Author X, Author Y"

002

Bob

456 Elm St

"Book A"

"Author X"

First Normal Form (1NF)


Ensure atomic values by separating multiple values into rows.

OrderID

CustomerName

Address

Book

Author

001

Alice

123 Main St

Book A

Author X

001

Alice

123 Main St

Book B

Author Y

002

Bob

456 Elm St

Book A

Author X

Second Normal Form (2NF)


Remove partial dependencies. Create separate tables for customers and orders.

Tables:


Customers:

CustomerID

CustomerName

Address

C001

Alice

123 Main St

C002

Bob

456 Elm St

Orders:

OrderID

CustomerID

001

C001

002

C002

Books in Orders:

OrderID

Book

Author

001

Book A

Author X

001

Book B

Author Y

002

Book A

Author X

Third Normal Form (3NF)


Remove transitive dependencies. Separate book and author information.

Tables:


Books:

BookID

Book

Author

B001

Book A

Author X

B002

Book B

Author Y

Books in Orders (Updated):

OrderID

BookID

001

B001

001

B002

002

B001

Final Schema


1.      Customers: (CustomerID, CustomerName, Address)

2.      Orders: (OrderID, CustomerID)

3.      Books: (BookID, Book, Author)

4.      Books in Orders: (OrderID, BookID)


This schema satisfies 1NF, 2NF, and 3NF while maintaining simplicity and practical application. There's no need for 4NF or 5NF, as there are no multi-valued or complex join dependencies in this example.


Conclusion

In summary, normalization is an essential database design procedure that aids in data organization to prevent redundancy, guarantee data integrity, and boost productivity. In order to maintain the data's logical structure and optimize it for storage and querying, each level of normalization concentrates on removing particular types of dependency.

16 views0 comments

Recent Posts

See All

Comments


bottom of page