SQL GROUP BY
The GROUP BY
statement in SQL is used to arrange identical data into groups. It is typically used with aggregate functions (like COUNT
, SUM
, AVG
, MAX
, MIN
) to perform calculations on each group of data.
Key Points
Grouping Data:
- Combines rows with the same values in the specified columns into a single group.
Aggregate Functions:
- Performs calculations for each group, such as finding totals, averages, or counts.
Order of Execution:
GROUP BY
is executed afterWHERE
and beforeORDER BY
.
Columns in SELECT:
- Any column in the
SELECT
clause that is not part of an aggregate function must be in theGROUP BY
clause.
- Any column in the
Syntax
column_name
: Column(s) to group by.aggregate_function
: Function applied to each group (e.g.,SUM
,COUNT
).
Examples
1. Basic Example: GROUP BY Single Column
- Groups rows by department and counts the number of employees in each department.
2. GROUP BY Multiple Columns
- Groups rows by both department and job title, and counts the number of employees in each group.
3. GROUP BY with WHERE Clause
- Filters rows where
Salary > 30000
before grouping them by department and calculating the average salary.
4. GROUP BY with ORDER BY
- Groups rows by department calculates the total salary for each department, and sorts the results in descending order of total salary.
Use Cases
Summarizing Data:
- Calculate totals, averages, and other aggregates for specific groups.
Data Segmentation:
- Analyze subsets of data grouped by categories like departments, regions, or periods.
Counting Occurrences:
- Determine the frequency of specific values, e.g., the number of employees in each role.
GROUP BY with HAVING Clause
The HAVING
clause is used to filter grouped data after the GROUP BY
operation. It is similar to WHERE
, but works on grouped data.
Example: GROUP BY with HAVING
- Group rows by department count employees in each group, and filter to include only departments with more than 5 employees.
Common Errors
Non-Aggregated Columns in SELECT:
- The error occurs when a column in the
SELECT
clause is not part ofGROUP BY
or an aggregate function.
- The error occurs when a column in the
Incorrect Placement of HAVING:
HAVING
must come afterGROUP BY
.
Advanced Examples
1. GROUP BY with Aliased Columns
- Groups rows by the year extracted from
OrderDate
and counts orders per year.
2. GROUP BY with JOIN
- Joins
Employees
andDepartments
, groups data by department name, and counts employees in each department.
Best Practices
Include All Non-Aggregate Columns:
- Ensure all columns in the
SELECT
clause are either aggregated or included inGROUP BY
.
- Ensure all columns in the
**Avoid Using SELECT ***:
- Specify the columns explicitly to ensure clarity and proper grouping.
Use Descriptive Aliases:
- Name aggregated columns meaningfully using
AS
for better readability.
- Name aggregated columns meaningfully using
Conclusion
The GROUP BY
statement is a powerful tool for summarizing and analyzing grouped data in SQL. It works seamlessly with aggregate functions and provides valuable insights into datasets. Understanding how to combine it with HAVING
, JOIN
, and ORDER BY
unlocks its full potential.