MySQL PERCENT_RANK Function

MySQL PERCENT_RANK Function

MySQL PERCENT_RANK Function

The PERCENT_RANK() function in MySQL is a window function introduced in MySQL 8.0. It calculates the relative rank of a row within a result set as a percentage of the total rows.

This function is particularly useful for statistical analysis, where you need to compare how a specific value ranks relative to the entire dataset.

Syntax

PERCENT_RANK() OVER ( [PARTITION BY column_name] [ORDER BY column_name] )
  • PARTITION BY: Divides the result set into partitions. The rank is calculated separately for each partition.
  • ORDER BY: Specifies the order of rows within each partition for calculating the rank.

Formula

The the PERCENT_RANK() function calculates the rank using the following formula:

PERCENT_RANK = (rank - 1) / (total_rows - 1)
  • rank: The rank of the current row (starting at 1).
  • total_rows: The total number of rows in the partition or dataset.

If there is only one row in the partition, PERCENT_RANK() always returns 0.

Example 1: Calculating PERCENT_RANK for All Rows

Assume a table named employees with the following data:

employee_iddepartment_idsalary
115000
216000
317000
428000
529000

To calculate the PERCENT_RANK() for all rows based on their salary:

SELECT employee_id, salary, PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank FROM employees;

Output:

employee_idsalarypercent_rank
150000.0000
260000.2500
370000.5000
480000.7500
590001.0000

Example 2: Using PARTITION BY

To calculate the PERCENT_RANK() within each department:

SELECT employee_id, department_id, salary, PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS percent_rank FROM employees;

Output:

employee_iddepartment_idsalarypercent_rank
1150000.0000
2160000.5000
3170001.0000
4280000.0000
5290001.0000

Example 3: Filtering Rows Based on PERCENT_RANK

To find employees whose salaries fall in the top 25% of the dataset:

SELECT employee_id, salary, percent_rank FROM ( SELECT employee_id, salary, PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank FROM employees ) ranked WHERE percent_rank >= 0.75;

Output:

employee_idsalarypercent_rank
480000.7500
590001.0000

Key Considerations

  1. Zero-Based Index:

    • The first row in each partition always has a PERCENT_RANK() of 0.
  2. Uniform Distribution:

    • PERCENT_RANK() assumes a uniform distribution of rows based on the specified ORDER BY clause.
  3. Ties:

    • If multiple rows have the same value in the ORDER BY column, they are assigned the same rank, and the next rank is calculated accordingly.
  4. Single-Row Partition:

    • If a partition contains only one row, PERCENT_RANK() returns 0 for that row.

Use Cases

  1. Performance Analysis:

    • Compare employee performance relative to their peers within a department or organization.
  2. Grade Scaling:

    • Assign percentile ranks to students based on their scores.
  3. Statistical Analysis:

    • Rank products, services, or entities based on performance metrics.
  4. Outlier Detection:

    • Identify rows in the top or bottom percentage of the dataset.

Conclusion

The PERCENT_RANK() function is a powerful tool in MySQL 8.0 for calculating relative ranks as percentages. It is widely used in analytical queries to evaluate data distribution and rank entities within partitions or across an entire dataset. For pre-8.0 MySQL versions, it is not natively available and would require manual calculations or approximations using user-defined variables.

Soeng Souy

Soeng Souy

Website that learns and reads, PHP, Framework Laravel, How to and download Admin template sample source code free.

Post a Comment

CAN FEEDBACK
close