SQL GROUP BY
Summary: in this tutorial, you will learn how to use the SQL GROUP BY clause to group rows based on one or more columns.
Introduction to SQL GROUP BY clause
Grouping is one of the most important tasks that you have to deal with while working with databases. To group rows into groups, you use the GROUP BY clause.
The GROUP BY clause is an optional clause of the SELECT statement that combines rows into groups based on matching values in specified columns. One row is returned for each group.
You often use the GROUP BY in conjunction with an aggregate function such as MIN, MAX, AVG, SUM, or COUNT to calculate a measure that provides the information for each group.
The following illustrates the syntax of the GROUP BY clause.
SELECT
column1,
column2,
AGGREGATE_FUNCTION (column3)
FROM
table1
GROUP BY
column1,
column2;
It is not mandatory to include an aggregate function in the SELECT clause. However, if you use an aggregate function, it will calculate the summary value for each group.
If you want to filter the rows before grouping, you add a WHERE clause. However, to filter groups, you use the HAVING clause.
It is important to emphasize that the WHERE clause is applied before rows are grouped whereas the HAVING clause is applied after rows are grouped. In other words, the WHERE clause is applied to rows whereas the HAVING clause is applied to groups.
To sort the groups, you add the ORDER BY clause after the GROUP BY clause.
The columns that appear in the GROUP BY clause are called grouping columns. If a grouping column contains NULL values, all NULL values are summarized into a single group because the GROUP BY clause considers NULL values are equal.
SQL GROUP BY examples
We will use the employees
and departments
tables in the sample database to demonstrate how the GROUP BY clause works.
To find the headcount of each department, you group the employees by the department_id
column, and apply the COUNT function to each group as the following query:
SELECT
department_id,
COUNT(employee_id) headcount
FROM
employees
GROUP BY
department_id;
SQL GROUP BY with INNER JOIN example
To get the department name, you join the employees
table with the departments
table as follows:
SELECT
e.department_id,
department_name,
COUNT(employee_id) headcount
FROM
employees e
INNER JOIN departments d ON d.department_id = e.department_id
GROUP BY
e.department_id;
SQL GROUP BY with ORDER BY example
To sort the departments by headcount, you add an ORDER BY clause as the following statement:
SELECT
e.department_id,
department_name,
COUNT(employee_id) headcount
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
GROUP BY e.department_id
ORDER BY headcount DESC;
Note that you can use either the headcount
alias or the COUNT(employee_id)
in the ORDER BY clause.
SQL GROUP BY with HAVING example
To find the department whose headcount is greater than 5, you use the HAVING clause like the following query:
SELECT
e.department_id,
department_name,
COUNT(employee_id) headcount
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
GROUP BY e.department_id
HAVING headcount > 5
ORDER BY headcount DESC;
SQL GROUP BY with MIN, MAX, and AVG example
The following query returns the minimum, maximum, and average salary of employees in each department.
SELECT
e.department_id,
department_name,
MIN(salary) min_salary,
MAX(salary) max_salary,
ROUND(AVG(salary), 2) average_salary
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
GROUP BY e.department_id;
SQL GROUP BY with SUM function example
To get the total salary per department, you apply the SUM function to the salary
column and group employees by the department_id
column as follows:
SELECT
e.department_id,
department_name,
SUM(salary) total_salary
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
GROUP BY e.department_id;
SQL GROUP BY multiple columns
So far, you have seen that we have grouped all employees by one column. For example, the following clause
GROUP BY department_id
place all rows with the same values in the department_id
a column in one group.
How about grouping employees by values in both department_id
and job_id
columns?
GROUP BY department_id, job_id
This clause will group all employees with the same values in both department_id
and job_id
columns in one group.
The following statement groups row with the same values in both department_id
and job_id
columns in the same group then return the rows for each of these groups.
SELECT
e.department_id,
department_name,
e.job_id,
job_title,
COUNT(employee_id)
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
INNER JOIN
jobs j ON j.job_id = e.job_id
GROUP BY e.department_id , e.job_id;
Department 2, 3, and 5 appear more than one.
This is because these departments have employees who hold different jobs. For example, in the shipping department, there are 2 employees holding the shipping clerk job, 1 employee holding the stock clerk job, and 4 employees holding the stock manager job.
SQL GROUP BY and DISTINCT
If you use the GROUP BY
clause without an aggregate function, the GROUP BY
clause behaves like the DISTINCT operator.
The following gets the phone numbers of employees and also groups rows by the phone numbers.
SELECT
phone_number
FROM
employees
GROUP BY
phone_number;
Notice that the phone numbers are sorted.
The following statement also retrieves the phone numbers but instead of using the GROUP BY
clause, it uses the DISTINCT
operator.
SELECT DISTINCT
phone_number
FROM
employees;
The result set is the same except that the one returned by the DISTINCT
the operator is not sorted.
In this tutorial, we have shown you how to use the GROUP BY
clause to summarize rows into groups and apply the aggregate function to each group.
0 Comments
CAN FEEDBACK
Emoji