SQL Correlated Subquery

A correlated subquery in SQL is a subquery that refers to columns from the outer query. It is evaluated once for every row processed by the outer query. This makes it more dynamic than a regular subquery but also potentially more computationally expensive.

Key Features of Correlated Subqueries

Dependency: The subquery depends on values from the outer query for its execution.
Row-by-Row Execution: For each row processed by the outer query, the subquery is executed.
Dynamic: The results of the subquery can change based on the current row of the outer query.

Syntax

SELECT column1, column2, ...
FROM table1 outer_alias
WHERE column = (SELECT aggregate_function(column)
                FROM table2 inner_alias
                WHERE outer_alias.column = inner_alias.column);

Example Dataset

Employees Table

EmployeeID	Name	DepartmentID	Salary
1	Alice	10	5000
2	Bob	20	6000
3	Charlie	10	7000
4	Diana	30	8000
5	Eve	20	9000

Departments Table

DepartmentID	DepartmentName
10	HR
20	IT
30	Finance

Examples

1. Find Employees Earning More Than Their Department's Average Salary

SELECT Name, Salary
FROM Employees e
WHERE Salary > (SELECT AVG(Salary)
                FROM Employees
                WHERE DepartmentID = e.DepartmentID);

Explanation:

The outer query retrieves employee names and salaries.
The subquery calculates the average salary for the current employee's department.

Result:

Name	Salary
Charlie	7000
Eve	9000

2. Find Employees in Departments with More Than Two Employees

SELECT Name, DepartmentID
FROM Employees e
WHERE 2 < (SELECT COUNT(*)
           FROM Employees
           WHERE DepartmentID = e.DepartmentID);

Explanation:

The subquery counts the number of employees in the current employee's department.
The outer query selects employees if their department has more than two employees.

Result:

Name	DepartmentID
Alice	10
Charlie	10

3. List Departments Where the Highest Salary is Greater Than $8000

SELECT DepartmentID
FROM Departments d
WHERE 8000 < (SELECT MAX(Salary)
              FROM Employees
              WHERE DepartmentID = d.DepartmentID);

Explanation:

The subquery calculates the maximum salary for the current department.
The outer query lists departments where the maximum salary exceeds $8000.

Result:

DepartmentID
20

How Correlated Subqueries Work

For each row in the outer query:

The subquery is executed.
The subquery uses column values from the current row of the outer query to filter or calculate results.
The subquery result is then used in the outer query's processing.

Difference Between Regular and Correlated Subqueries

Feature	Regular Subquery	Correlated Subquery
Dependency	Independent of the outer query	Dependent on the outer query
Execution	Executed once	Executed once for each outer row
Use Case	Static filtering or aggregation	Dynamic filtering or row-by-row logic
Performance	Faster (less overhead)	Slower (row-by-row execution)

Performance Considerations

Indexes:
- Ensure proper indexing on columns used in the subquery to improve performance.
Optimization:
- Try to rewrite correlated subqueries as JOINs or WITH clauses for better efficiency, if possible.
Data Size:
- Correlated subqueries can be slow for large datasets because of the row-by-row execution.

Example: Rewrite Correlated Subquery as JOIN

Original Query (Correlated Subquery):

SELECT Name, Salary
FROM Employees e
WHERE Salary > (SELECT AVG(Salary)
                FROM Employees
                WHERE DepartmentID = e.DepartmentID);

Optimized Query (Using JOIN and GROUP BY):

WITH DepartmentAvg AS (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
)
SELECT e.Name, e.Salary
FROM Employees e
JOIN DepartmentAvg da
ON e.DepartmentID = da.DepartmentID
WHERE e.Salary > da.AvgSalary;

Common Use Cases

Finding Specific Records:
- Example: Employees earning above the department average.
Dynamic Filtering:
- Example: Products with sales above the average in their category.
Conditional Aggregations:
- Example: Customers who placed more orders than the average for their region.

Conclusion

Correlated subqueries are powerful but can be computationally expensive due to their row-by-row evaluation. When performance is critical, consider optimizing them with JOINs, WITH clauses, or other techniques.

Souy Soeng

SQL Correlated Subquery