SQL DISTINCT
Summary: in this tutorial, you will learn how to use the SQL DISTINCT operator to remove duplicates from a result set.
Introduction to SQL DISTINCT operator
The primary key ensures that the table has no duplicate rows. However, when you use the SELECT statement to query a portion of the columns in a table, you may get duplicates.
To remove duplicates from a result set, you use the DISTINCT operator in the SELECT clause as follows:
SELECT DISTINCT
    column1, column2, ...
FROM
    table1;
If you use one column after the DISTINCT operator, the database system uses that column to evaluate duplicates. If you use two or more columns, the database system uses the combination of values in these columns for the duplication check.
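As a minimal sketch of the difference, the script below builds a small throwaway table and runs DISTINCT over one column and then over two. The table name distinct_demo and its rows are made up for illustration; column1 and column2 follow the placeholder names in the syntax above. It assumes a system that accepts multi-row INSERT statements, which most current ones do.

```sql
-- Hypothetical demo table; the name and rows are illustrative only.
CREATE TABLE distinct_demo (
    column1 VARCHAR(10),
    column2 INTEGER
);

INSERT INTO distinct_demo (column1, column2)
VALUES
    ('a', 1),
    ('a', 1),
    ('a', 2),
    ('b', 1);

-- One column: duplicates are judged on column1 alone.
-- Returns 2 rows: a, b
SELECT DISTINCT
    column1
FROM
    distinct_demo
ORDER BY
    column1;

-- Two columns: duplicates are judged on the (column1, column2) pair.
-- Returns 3 rows: (a, 1), (a, 2), (b, 1)
SELECT DISTINCT
    column1, column2
FROM
    distinct_demo
ORDER BY
    column1, column2;
```

Note that ('a', 2) survives the second query even though 'a' already appears: only rows that match on every listed column count as duplicates.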
To remove the duplicates, a database system commonly does one of two things. It may sort the result set by every column specified in the SELECT clause and then scan the sorted rows from top to bottom, discarding duplicates that are next to each other. Alternatively, it may build a hash table keyed on those columns. The exact strategy depends on the database system and its query optimizer. Either way, if the result set is large, this extra work may reduce the performance of the query.
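One way to picture the duplicate check: for a plain column list, SELECT DISTINCT returns the same rows as grouping by every selected column. The sketch below uses a made-up table named dedup_demo. It illustrates that the two queries give the same result only; it does not show which execution strategy a particular system picks.

```sql
-- Hypothetical demo table; the name and rows are illustrative only.
CREATE TABLE dedup_demo (
    column1 VARCHAR(10),
    column2 INTEGER
);

INSERT INTO dedup_demo (column1, column2)
VALUES
    ('x', 10),
    ('y', 20),
    ('x', 10),
    ('y', 30);

-- Both queries return the same 3 rows: (x, 10), (y, 20), (y, 30)
SELECT DISTINCT
    column1, column2
FROM
    dedup_demo
ORDER BY
    column1, column2;

SELECT
    column1, column2
FROM
    dedup_demo
GROUP BY
    column1, column2
ORDER BY
    column1, column2;
```

To find out which strategy your own system chooses for a given query, inspect the query plan with the tooling that system provides.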
SQL DISTINCT examples
We will use the employees table in the sample database to demonstrate how the DISTINCT operator works.
1) Using SQL DISTINCT on one column example
The following statement retrieves the salary data from the employees table and sorts it in descending order.
SELECT
    salary
FROM
    employees
ORDER BY salary DESC;
As you can see, the result set contains duplicate salary values, e.g., 17,000, because two or more employees have the same salary. To remove the duplicates, you add the DISTINCT operator to the SELECT clause as follows:
SELECT
    DISTINCT salary
FROM
    employees
ORDER BY salary DESC;
Now all duplicates are removed from the result set.
2) Using SQL DISTINCT on multiple columns example
The following statement retrieves the job id and salary from the employees table.
SELECT
    job_id,
    salary
FROM
    employees
ORDER BY
    job_id,
    salary DESC;
Now if you add the DISTINCT operator to the SELECT clause, the database system uses the values in both the job_id and salary columns to evaluate duplicates. Where several rows share the same job_id and salary, it keeps only one of them.
SELECT DISTINCT
    job_id,
    salary
FROM
    employees
ORDER BY
    job_id,
    salary DESC;
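If you do not have the sample database at hand, the following sketch reproduces the effect on a small stand-in table. The table name employees_demo and all of its rows are hypothetical. Only the column names job_id and salary come from the example above, and the INTEGER types are an assumption.

```sql
-- Hypothetical stand-in for the employees table; rows are made up.
CREATE TABLE employees_demo (
    employee_id INTEGER PRIMARY KEY,
    job_id      INTEGER,
    salary      INTEGER
);

INSERT INTO employees_demo (employee_id, job_id, salary)
VALUES
    (1, 5, 17000),
    (2, 5, 17000),
    (3, 9, 9000),
    (4, 9, 8200),
    (5, 9, 9000);

-- Without DISTINCT: 5 rows, with (5, 17000) and (9, 9000) each appearing twice.
SELECT
    job_id,
    salary
FROM
    employees_demo
ORDER BY
    job_id,
    salary DESC;

-- With DISTINCT: 3 rows: (5, 17000), (9, 9000), (9, 8200)
SELECT DISTINCT
    job_id,
    salary
FROM
    employees_demo
ORDER BY
    job_id,
    salary DESC;
```

Every row in employees_demo is unique because of the primary key employee_id. The duplicates appear only because the queries leave that column out.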
SQL DISTINCT and NULL values
In the database world, NULL is special. NULL values are used as markers to indicate that the information is missing or not applicable.
For this reason, NULL cannot be compared to any value with the usual comparison operators: the result of such a comparison is unknown rather than true or false. Even NULL is not equal to itself. If you have two or more NULL values in a column, does the database system consider them the same or distinct values?
Typically, the DISTINCT operator treats all NULL values as the same. As a result, the DISTINCT operator keeps only one NULL value and removes the others from the result set.
For example, the following statement returns the distinct phone numbers of employees.
SELECT DISTINCT
    phone_number
FROM
    employees;
Notice that it returned only one NULL value.
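The contrast between DISTINCT and ordinary comparison can be checked with a short script. This is a sketch under stated assumptions: the table phone_demo and its rows, including the phone numbers, are invented for illustration.

```sql
-- Hypothetical demo table; the name and rows are illustrative only.
CREATE TABLE phone_demo (
    employee_id  INTEGER PRIMARY KEY,
    phone_number VARCHAR(20)
);

INSERT INTO phone_demo (employee_id, phone_number)
VALUES
    (1, '515.123.4567'),
    (2, NULL),
    (3, '590.423.4567'),
    (4, NULL),
    (5, '515.123.4567');

-- DISTINCT treats the two NULLs as duplicates of each other.
-- Returns 3 rows: the two phone numbers and a single NULL.
SELECT DISTINCT
    phone_number
FROM
    phone_demo;

-- By contrast, NULL = NULL does not evaluate to true,
-- so this filter drops rows 2 and 4. Returns 3.
SELECT
    COUNT(*)
FROM
    phone_demo
WHERE
    phone_number = phone_number;
```

The first query collapses the NULLs into one row, while the second shows that the equality operator never matches a NULL, not even against itself.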
In this tutorial, you have learned how to use the SQL DISTINCT operator to remove duplicate rows from a result set.