SQL Correlated Subquery

SQL Correlated Subquery

SQL Correlated Subquery

A correlated subquery in SQL is a subquery that refers to columns from the outer query. It is evaluated once for every row processed by the outer query. This makes it more dynamic than a regular subquery but also potentially more computationally expensive.


Key Features of Correlated Subqueries

  1. Dependency: The subquery depends on values from the outer query for its execution.
  2. Row-by-Row Execution: For each row processed by the outer query, the subquery is executed.
  3. Dynamic: The results of the subquery can change based on the current row of the outer query.

Syntax

SELECT column1, column2, ... FROM table1 outer_alias WHERE column = (SELECT aggregate_function(column) FROM table2 inner_alias WHERE outer_alias.column = inner_alias.column);

Example Dataset

Employees Table

EmployeeIDNameDepartmentIDSalary
1Alice105000
2Bob206000
3Charlie107000
4Diana308000
5Eve209000

Departments Table

DepartmentIDDepartmentName
10HR
20IT
30Finance

Examples

1. Find Employees Earning More Than Their Department's Average Salary

SELECT Name, Salary FROM Employees e WHERE Salary > (SELECT AVG(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID);

Explanation:

  • The outer query retrieves employee names and salaries.
  • The subquery calculates the average salary for the current employee's department.

Result:

NameSalary
Charlie7000
Eve9000

2. Find Employees in Departments with More Than Two Employees

SELECT Name, DepartmentID FROM Employees e WHERE 2 < (SELECT COUNT(*) FROM Employees WHERE DepartmentID = e.DepartmentID);

Explanation:

  • The subquery counts the number of employees in the current employee's department.
  • The outer query selects employees if their department has more than two employees.

Result:

NameDepartmentID
Alice10
Charlie10

3. List Departments Where the Highest Salary is Greater Than $8000

SELECT DepartmentID FROM Departments d WHERE 8000 < (SELECT MAX(Salary) FROM Employees WHERE DepartmentID = d.DepartmentID);

Explanation:

  • The subquery calculates the maximum salary for the current department.
  • The outer query lists departments where the maximum salary exceeds $8000.

Result:

DepartmentID
20

How Correlated Subqueries Work

For each row in the outer query:

  1. The subquery is executed.
  2. The subquery uses column values from the current row of the outer query to filter or calculate results.
  3. The subquery result is then used in the outer query's processing.

Difference Between Regular and Correlated Subqueries

FeatureRegular SubqueryCorrelated Subquery
DependencyIndependent of the outer queryDependent on the outer query
ExecutionExecuted onceExecuted once for each outer row
Use CaseStatic filtering or aggregationDynamic filtering or row-by-row logic
PerformanceFaster (less overhead)Slower (row-by-row execution)

Performance Considerations

  1. Indexes:

    • Ensure proper indexing on columns used in the subquery to improve performance.
  2. Optimization:

    • Try to rewrite correlated subqueries as JOINs or WITH clauses for better efficiency, if possible.
  3. Data Size:

    • Correlated subqueries can be slow for large datasets because of the row-by-row execution.

Example: Rewrite Correlated Subquery as JOIN

Original Query (Correlated Subquery):

SELECT Name, Salary FROM Employees e WHERE Salary > (SELECT AVG(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID);

Optimized Query (Using JOIN and GROUP BY):

WITH DepartmentAvg AS ( SELECT DepartmentID, AVG(Salary) AS AvgSalary FROM Employees GROUP BY DepartmentID ) SELECT e.Name, e.Salary FROM Employees e JOIN DepartmentAvg da ON e.DepartmentID = da.DepartmentID WHERE e.Salary > da.AvgSalary;

Common Use Cases

  1. Finding Specific Records:
    • Example: Employees earning above the department average.
  2. Dynamic Filtering:
    • Example: Products with sales above the average in their category.
  3. Conditional Aggregations:
    • Example: Customers who placed more orders than the average for their region.

Conclusion

Correlated subqueries are powerful but can be computationally expensive due to their row-by-row evaluation. When performance is critical, consider optimizing them with JOINs, WITH clauses, or other techniques.

Soeng Souy

Soeng Souy

Website that learns and reads, PHP, Framework Laravel, How to and download Admin template sample source code free.

Post a Comment

CAN FEEDBACK
close