Welcome to this SQL developer interview preparation guide!
You've come to the perfect spot if you're preparing for a SQL developer interview. This guide is designed to help you refresh your SQL skills, boost your confidence, and prepare you for your job interview.
We've prepared a collection of real-world interview questions from prominent organizations such as Google, Oracle, Amazon, and Microsoft. Each question is accompanied by a well-crafted response, saving you time in your interview preparation.
During a SQL interview, the following 20 types of questions are most frequently asked:
1. What is SQL and why should we use it?
SQL (Structured Query Language) is a standard programming language designed for managing and manipulating relational databases. It is used to create, modify, and query databases, and is widely used in businesses and organizations to store and manage large amounts of data.
Here are some specific reasons why SQL is important:
Data organization: SQL allows you to structure and organize data in tables, making it easier to understand and manage large datasets.
Data retrieval: SQL provides powerful tools for retrieving data from a database, including the ability to filter, sort, and group data, and to perform complex calculations and aggregations.
Data analysis: SQL provides a powerful language for analyzing data, including the ability to join multiple tables together, and to perform complex queries that can uncover valuable insights.
Data modification: SQL allows you to modify data in a database, including the ability to insert new records, update existing records, and delete records.
Security: SQL provides tools for securing data in a database, including the ability to restrict access to certain tables or columns, and to set up user accounts with specific permissions.
Scalability: SQL is designed to handle large datasets and can be scaled to accommodate growing data needs.
Overall, SQL is essential for managing and analyzing data in a relational database. Whether you are working with a small database or a large enterprise-level system, SQL is a powerful tool that can help you make better use of your data.
2. What do you know about Databases?
A database is an organized collection of structured data that is stored and accessed electronically. Databases are used to store and manage data in a way that allows for efficient querying, searching, and analysis.
Here are some key concepts related to databases:
Relational databases: A relational database organizes data into tables, where each table represents a different entity, such as customers, orders, or products. Tables are connected through relationships, which allows data to be efficiently queried and analyzed.
NoSQL databases: NoSQL databases are designed to handle unstructured or semi-structured data, such as documents, images, or videos. Unlike relational databases, NoSQL databases do not use tables, and instead use different data models, such as key-value, document-oriented, or graph-based models.
3. What is RDBMS? How is it different from DBMS?
In summary, RDBMS is a type of DBMS that is based on the relational data model and enforces strict rules for data integrity. It uses SQL as its query language, and is widely used in enterprise-level applications.
4. What is a Primary Key?
A primary key is a column or a set of columns in a database table that uniquely identifies each record (row) in the table. The primary key is used to ensure the integrity of the data in the table, and to enable efficient querying and indexing of the table. Here are some key characteristics of primary keys:
Uniqueness: Each value in the primary key column (or set of columns) must be unique. This means that no two records in the table can have the same value in the primary key column(s).
Non-nullability: The primary key column(s) cannot contain null values, as a null value cannot be used to uniquely identify a record.
Stability: The values in the primary key column(s) should be stable, meaning that they should not change over time. If a record's primary key value changes, it can cause problems with referential integrity, which is the relationship between tables.
Indexability: The primary key column(s) are typically indexed, which allows for efficient searching and sorting of the table.
Examples of primary keys include a customer ID in a customer table, an order number in an order table, or a product code in a product table. In some cases, a primary key may consist of multiple columns, such as a combination of a customer ID and an order number in an order history table.
5. What do you know about a Foreign Key and how does it work?
A foreign key is a column in a database table that refers to the primary key of another table. A foreign key creates a relationship between two tables, where the foreign key column in one table (the "child" table) references the primary key column in another table (the "parent" table). Here are some key characteristics of foreign keys:
Referential integrity: The use of a foreign key ensures referential integrity between the two tables. This means that the data in the child table is consistent with the data in the parent table, and that a record cannot be deleted from the parent table if there are related records in the child table.
Relationship types: Different types of relationships can be established between tables using foreign keys, including one-to-one, one-to-many, and many-to-many relationships.
Indexing: Foreign keys are typically indexed, which allows for efficient searching and sorting of the table.
Nullability: A foreign key column can contain null values, which indicates that there is no corresponding record in the parent table.
Cascade actions: Some database systems allow for cascade actions to be defined for foreign keys, such as cascading updates or deletes. This means that if a record in the parent table is updated or deleted, the corresponding records in the child table are automatically updated or deleted as well.
Example: Consider a customer table with a primary key column called "customer_id", and an order table with a foreign key column called "customer_id" that references the customer table. This establishes a one-to-many relationship between the customer table and the order table, where a single customer can have multiple orders. When a new order is added to the order table, the customer_id column must reference a valid customer_id in the customer table, ensuring referential integrity.
6. Why is Normalization important?
Normalization is important in databases because it helps to eliminate data redundancy and inconsistencies, which can lead to various problems such as data anomalies, data update anomalies, and data integrity issues.
Normalization is a process of organizing data in a database to reduce redundancy and dependency among data. It involves dividing larger tables into smaller tables and defining relationships between them. The goal of normalization is to create a database schema that is structurally sound and easy to maintain.
Normalization ensures that each piece of data is stored only once, which reduces the storage space required and also ensures that updates to the data are consistent across the database. It also makes it easier to add new data to the database and to retrieve data from the database.
There are several levels of normalization, each with its own set of rules and guidelines. The most commonly used levels of normalization are first normal form (1NF), second normal form (2NF), and third normal form (3NF). Each level of normalization builds upon the previous level, with the ultimate goal of achieving a fully normalized database that is free from data anomalies and inconsistencies.
Types of normalization:
First Normal Form or 1NF
Second Normal Form or 2NF
Third Normal Form or 3NF
Boyce Codd Normal Form or BCNF
Fourth normal form or 4NF
Fifth normal form or 5NF
Sixth normal form or 6NF (still is in theory)
7. Describe database relationships and list the different sorts of relationships.
In a relational database, relationships refer to how tables are related to each other based on their data values. Three main types of relationships can exist between tables in a database: one-to-one, one-to-many, and many-to-many.
One-to-One Relationship:
A one-to-one relationship exists between two tables when one record in the first table can only be associated with one record in the second table and vice versa. This relationship is not very common in databases because it is often more efficient to store the data in a single table. However, it can be useful in certain situations, such as when one table contains sensitive data that needs to be separated from the rest of the data.
One-to-Many Relationship:
A one-to-many relationship exists between two tables when one record in the first table can be associated with many records in the second table, but each record in the second table can only be associated with one record in the first table. This relationship is the most common type of relationship in databases and is used to store data that has a parent-child relationship, such as a customer and their orders.
Many-to-Many Relationship:
A many-to-many relationship exists between two tables when one record in the first table can be associated with many records in the second table and vice versa. This relationship requires the use of a junction table or associative entity, which contains the primary keys of both tables. This type of relationship is often used to store data that has a many-to-many relationship, such as students and courses.
In addition to these three main types of relationships, other types of relationships can exist between tables, such as self-referencing relationships, recursive relationships, and subtypes and supertypes relationships. These relationships are less common in databases but can be useful in certain situations.
8. Describe the difference between the commands TRUNCATE, DELETE and DROP.
TRUNCATE:
TRUNCATE is a command used to remove all the data from a table in a single operation. It is a faster and more efficient way of removing data than DELETE because it does not log every row deletion and does not generate undo information. TRUNCATE also resets the identity seed of the table, which means that if you insert new records after truncating, the identity column values will start from the beginning. TRUNCATE cannot be rolled back and requires the table to have no foreign key references.
DELETE:
DELETE is a command used to remove one or more rows from a table. It is a slower operation than TRUNCATE because it logs each row deletion and generates undo information. DELETE also does not reset the identity seed of the table. DELETE can be rolled back and can be used on tables with foreign key references.
DROP:
DROP is a command used to remove an entire object from the database, such as a table, view, or stored procedure. It is a permanent action and cannot be undone. When you drop a table, all data, indexes, triggers, and constraints associated with the table are also removed. If you drop a table with foreign key references in other tables, those tables will become invalid.
In summary, TRUNCATE removes all data from a table and resets the identity seed, DELETE removes one or more rows from a table, and DROP removes an entire object from the database. TRUNCATE is faster and more efficient but cannot be rolled back and requires the table to have no foreign key references, while DELETE is slower but can be rolled back and can be used on tables with foreign key references. DROP is a permanent action that removes an entire object and all associated data.
9. What exactly is a data warehouse?
A data warehouse is a large, centralized repository of integrated data from multiple sources, that has been optimized for querying, reporting and analysis. It is designed to support business intelligence (BI) activities, such as data mining, online analytical processing (OLAP), and reporting. A data warehouse is often used to consolidate and store historical data that is used to make strategic business decisions, rather than operational decisions.
The data in a data warehouse is typically organized into subject areas, such as sales, marketing, or finance, and is optimized for querying and reporting. It is usually extracted, transformed, and loaded (ETL) from various sources, such as operational databases, legacy systems, and external data sources, into a format that is suitable for analysis.
A data warehouse is characterized by several key features, including:
Integrated data: Data from different sources are integrated into a single repository, with consistent definitions and structures.
Time-variant data: The data in a data warehouse is organized by time, with historical data stored alongside current data. This enables trends and patterns to be analyzed over time.
Non-volatile data: Once data is loaded into a data warehouse, it is not updated or deleted. Instead, new data is added to the warehouse, and the old data is retained for historical analysis.
Designed for analysis: A data warehouse is optimized for querying, reporting, and analysis, with tools and technologies that allow users to easily access and analyze the data.
Data warehouses are used by organizations of all sizes and industries to improve decision-making, optimize business processes, and gain competitive advantage. They are particularly useful for organizations with large volumes of data, complex data relationships, and a need for historical analysis.
10. What does CLAUSE mean?
In SQL, a clause is a keyword or reserved word used to specify the purpose of a SQL statement or query. A clause is a component of a SQL statement that provides additional information about the operation that the statement is performing.
Some examples of commonly used SQL clauses include:
SELECT: The SELECT clause is used to retrieve data from one or more tables in a database.
FROM: The FROM clause specifies the tables from which data is being retrieved in a SELECT statement.
WHERE: The WHERE clause is used to specify conditions that must be met for a row to be included in the result set.
JOIN: The JOIN clause is used to combine rows from two or more tables based on a related column.
GROUP BY: The GROUP BY clause is used to group rows in a result set based on one or more columns.
HAVING: The HAVING clause is used to specify conditions that must be met by a group of rows, as specified by the GROUP BY clause.
ORDER BY: The ORDER BY clause is used to sort the result set in ascending or descending order based on one or more columns.
LIMIT: The LIMIT clause is used to limit the number of rows returned by a query.
Each of these clauses serves a specific purpose and can be used in combination to create more complex SQL statements and queries.
11. What are Aggregate and Scalar functions?
Aggregate functions are used to perform a calculation on a set of values and return a single value. These functions operate on a group of rows and return a single value for each group. Examples of aggregate functions include SUM, AVG, MIN, MAX, and COUNT.
For example, the following SQL statement uses the SUM aggregate function to calculate the total sales for a particular product:
Scalar functions, on the other hand, operate on a single value and return a single value. They are used to manipulate data values and perform calculations on them. Examples of scalar functions include UPPER, LOWER, LENGTH, and ROUND.
For example, the following SQL statement uses the UPPER scalar function to convert a string value to uppercase:
12. How do I view tables in SQL?
SHOW TABLES is an SQL command that can be used in MySQL to list all the tables in a particular database. Here's the syntax for using the SHOW TABLES command:
If the FROM clause is omitted, the command will list all the tables in the current database. If you want to view the tables from a specific database, replace database_name with the name of the database you want to view tables from.
Here's an example of how to use the SHOW TABLES command to list all the tables in a database:
This will display a list of all the tables in the my_database database.
Note: The SHOW TABLES command is specific to MySQL and may not work in other SQL database management systems.
13. What is the order of execution of a Query?
Finding the data we require in a database and then filtering it so that it can be processed and comprehended as quickly as feasible are the first steps in every query. It's critical to comprehend the sequence of execution because each component of the query is carried out sequentially in order to realise which outcomes are accessible.
FROM and JOIN: To determine the entire working set of data being requested, the FROM clause and subsequent JOINs are initially run.
WHERE: After obtaining the entire working set of data, the individual rows are subjected to the WHERE restrictions, and any rows that do not adhere to the constraint are removed.
GROUP BY: Following the application of the WHERE restrictions, the remaining rows are grouped based on common values in the column supplied in the GROUP BY clause.
HAVING: If the query has a GROUP BY clause, the grouped rows are subject to the constraints in the HAVING clause, and those that do not satisfy the constraint are rejected.
SELECT: The query's SELECT portion's expressions are eventually computed.
DISTINCT: Rows with duplicate values in the column designated as DISTINCT will be removed from the remaining rows.
ORDER BY: The rows are sorted by the provided data in either ascending or descending order if an order is specified by the ORDER BY clause.
LIMIT: Lastly, the query's final set of rows is returned after the rows that are outside the LIMIT-specified range are eliminated.
14. What are the different JOINS in SQL?
In SQL, there are several types of JOINs that can be used to combine data from two or more tables. The most common types of JOINs are:
INNER JOIN: Returns only the rows that have matching values in both tables based on the join condition.
LEFT JOIN (or LEFT OUTER JOIN): Returns all the rows from the left table and matching rows from the right table based on the join condition. If there is no matching row in the right table, NULL values are returned.
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all the rows from the right table and matching rows from the left table based on the join condition. If there is no matching row in the left table, NULL values are returned.
FULL JOIN (or FULL OUTER JOIN): Returns all the rows from both tables, and NULL values are returned for the columns where there is no match.
CROSS JOIN: Returns the Cartesian product of both tables, which means it returns all possible combinations of rows from both tables.
SELF JOIN: This is a special case where a table is joined with itself.
Here's an example of using INNER JOIN to combine data from two tables based on a join condition:
This will return only the rows from table1 and table2 where the id column in both tables have matching values.
Note: The type of JOIN used depends on the data being queried and the specific needs of the query.
15. What is Sub-query?
In SQL, a subquery (also known as an inner query or nested query) is a query that is nested inside another query. The subquery is executed first, and the results are then used by the outer query to perform further analysis or filtering.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
A subquery can be used in various parts of an SQL statement, such as the WHERE clause, the HAVING clause, or the FROM clause. The subquery can be used to retrieve a single value or a set of values that can be used to filter, join or aggregate data from the outer query.
Here's an example of using a subquery to retrieve the highest salary for each department:
This query returns the highest salary for each department using the MAX() aggregate function. However, if you want to see the name of the employee who has the highest salary in each department, you can use a subquery in the WHERE clause like this:
This query uses a subquery in the WHERE clause to retrieve the maximum salary for each department, and then uses the outer query to retrieve the employee name and salary that matches the maximum salary.
Subqueries can be used to create complex queries and perform advanced analysis on data in SQL. However, it's important to note that subqueries can sometimes result in slow query performance, so it's important to optimize them carefully.
16. What are DDL, DQL, DML, DCL and TCL?
In SQL, there are several types of statements that are used to perform different operations on the database. The most commonly used types of SQL statements are:
Data Definition Language (DDL): DDL statements are used to create, modify, or delete database objects such as tables, indexes, and views. Examples of DDL statements include CREATE TABLE, ALTER TABLE, and DROP TABLE.
Data Query Language (DQL): DQL statements are used to retrieve data from one or more tables in a database. The SELECT statement is the most commonly used DQL statement.
Data Manipulation Language (DML): DML statements are used to manipulate data in tables. Examples of DML statements include INSERT, UPDATE, and DELETE.
Data Control Language (DCL): DCL statements are used to control access to database objects. Examples of DCL statements include GRANT and REVOKE.
Transaction Control Language (TCL): TCL statements are used to manage transactions within a database. Examples of TCL statements include COMMIT and ROLLBACK.
Here's an example of each type of SQL statement:
DDL:
DQL:
DML:
DCL:
TCL:
17. What will be the SQL query to get the number of employees working in project “A1” from the EmployeeSalary table?
We can use the aggregate function count() with a SQL WHERE clause to receive the number of employees:-
SELECT COUNT(EmployeeID) as employee_count
FROM EmployeeSalary
WHERE Project = “A1”;
18. What will be the SQL query to get all the Employees data from the EmployeeDetails table who enlisted in 2020?
SELECT * FROM EmployeeDetails
WHERE DateOfJoining BETWEEN ‘2020/01/01’ AND ‘2020/12/31’;
19. Given a table called "sales" with columns "sale_id", "item_name", "quantity_sold", and "sale_date", write a SQL query to retrieve the total quantity sold for each item, ordered by the item with the highest total quantity sold.
SELECT item_name, SUM(quantity_sold) as total_quantity_sold FROM sales GROUP BY item_name
ORDER BY total_quantity_sold DESC;
20. Given two tables "orders" and "customers", with columns "order_id", "customer_id", "order_date", "total_amount", "customer_name", "customer_city", and "customer_email", write a SQL query to retrieve the total amount of orders made by customers who live in a specific city and have made more than 5 orders, ordered by the total amount in descending order, and limit the result to the top 10 customers.
SELECT customers.customer_name, customers.customer_city, SUM(orders.total_amount) as total_order_amount
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE customers.customer_city = 'specific_city'
GROUP BY customers.customer_name, customers.customer_city
HAVING COUNT(orders.order_id) > 5
ORDER BY total_order_amount DESC
LIMIT 10;
Comments