SQL Correlated Subquery
A correlated subquery in SQL is a subquery that refers to columns from the outer query. It is evaluated once for every row processed by the outer query. This makes it more dynamic than a regular subquery but also potentially more computationally expensive.
Key Features of Correlated Subqueries
- Dependency: The subquery depends on values from the outer query for its execution.
- Row-by-Row Execution: For each row processed by the outer query, the subquery is executed.
- Dynamic: The results of the subquery can change based on the current row of the outer query.
Syntax
Example Dataset
Employees Table
EmployeeID | Name | DepartmentID | Salary |
---|---|---|---|
1 | Alice | 10 | 5000 |
2 | Bob | 20 | 6000 |
3 | Charlie | 10 | 7000 |
4 | Diana | 30 | 8000 |
5 | Eve | 20 | 9000 |
Departments Table
DepartmentID | DepartmentName |
---|---|
10 | HR |
20 | IT |
30 | Finance |
Examples
1. Find Employees Earning More Than Their Department's Average Salary
Explanation:
- The outer query retrieves employee names and salaries.
- The subquery calculates the average salary for the current employee's department.
Result:
Name | Salary |
---|---|
Charlie | 7000 |
Eve | 9000 |
2. Find Employees in Departments with More Than Two Employees
Explanation:
- The subquery counts the number of employees in the current employee's department.
- The outer query selects employees if their department has more than two employees.
Result:
Name | DepartmentID |
---|---|
Alice | 10 |
Charlie | 10 |
3. List Departments Where the Highest Salary is Greater Than $8000
Explanation:
- The subquery calculates the maximum salary for the current department.
- The outer query lists departments where the maximum salary exceeds $8000.
Result:
DepartmentID |
---|
20 |
How Correlated Subqueries Work
For each row in the outer query:
- The subquery is executed.
- The subquery uses column values from the current row of the outer query to filter or calculate results.
- The subquery result is then used in the outer query's processing.
Difference Between Regular and Correlated Subqueries
Feature | Regular Subquery | Correlated Subquery |
---|---|---|
Dependency | Independent of the outer query | Dependent on the outer query |
Execution | Executed once | Executed once for each outer row |
Use Case | Static filtering or aggregation | Dynamic filtering or row-by-row logic |
Performance | Faster (less overhead) | Slower (row-by-row execution) |
Performance Considerations
Indexes:
- Ensure proper indexing on columns used in the subquery to improve performance.
Optimization:
- Try to rewrite correlated subqueries as
JOINs
orWITH
clauses for better efficiency, if possible.
- Try to rewrite correlated subqueries as
Data Size:
- Correlated subqueries can be slow for large datasets because of the row-by-row execution.
Example: Rewrite Correlated Subquery as JOIN
Original Query (Correlated Subquery):
Optimized Query (Using JOIN and GROUP BY):
Common Use Cases
- Finding Specific Records:
- Example: Employees earning above the department average.
- Dynamic Filtering:
- Example: Products with sales above the average in their category.
- Conditional Aggregations:
- Example: Customers who placed more orders than the average for their region.
Conclusion
Correlated subqueries are powerful but can be computationally expensive due to their row-by-row evaluation. When performance is critical, consider optimizing them with JOINs
, WITH
clauses, or other techniques.