MySQL PERCENT_RANK Function
The PERCENT_RANK()
function in MySQL is a window function introduced in MySQL 8.0. It calculates the relative rank of a row within a result set as a percentage of the total rows.
This function is particularly useful for statistical analysis, where you need to compare how a specific value ranks relative to the entire dataset.
Syntax
PERCENT_RANK() OVER (
[PARTITION BY column_name]
[ORDER BY column_name]
)
PARTITION BY
: Divides the result set into partitions. The rank is calculated separately for each partition.ORDER BY
: Specifies the order of rows within each partition for calculating the rank.
Formula
The the PERCENT_RANK()
function calculates the rank using the following formula:
PERCENT_RANK = (rank - 1) / (total_rows - 1)
rank
: The rank of the current row (starting at 1).total_rows
: The total number of rows in the partition or dataset.
If there is only one row in the partition, PERCENT_RANK()
always returns 0.
Example 1: Calculating PERCENT_RANK for All Rows
Assume a table named employees
with the following data:
employee_id | department_id | salary |
---|---|---|
1 | 1 | 5000 |
2 | 1 | 6000 |
3 | 1 | 7000 |
4 | 2 | 8000 |
5 | 2 | 9000 |
To calculate the PERCENT_RANK()
for all rows based on their salary:
SELECT employee_id, salary,
PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;
Output:
employee_id | salary | percent_rank |
---|---|---|
1 | 5000 | 0.0000 |
2 | 6000 | 0.2500 |
3 | 7000 | 0.5000 |
4 | 8000 | 0.7500 |
5 | 9000 | 1.0000 |
Example 2: Using PARTITION BY
To calculate the PERCENT_RANK()
within each department:
SELECT employee_id, department_id, salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS percent_rank
FROM employees;
Output:
employee_id | department_id | salary | percent_rank |
---|---|---|---|
1 | 1 | 5000 | 0.0000 |
2 | 1 | 6000 | 0.5000 |
3 | 1 | 7000 | 1.0000 |
4 | 2 | 8000 | 0.0000 |
5 | 2 | 9000 | 1.0000 |
Example 3: Filtering Rows Based on PERCENT_RANK
To find employees whose salaries fall in the top 25% of the dataset:
SELECT employee_id, salary, percent_rank
FROM (
SELECT employee_id, salary,
PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees
) ranked
WHERE percent_rank >= 0.75;
Output:
employee_id | salary | percent_rank |
---|---|---|
4 | 8000 | 0.7500 |
5 | 9000 | 1.0000 |
Key Considerations
Zero-Based Index:
- The first row in each partition always has a
PERCENT_RANK()
of 0.
- The first row in each partition always has a
Uniform Distribution:
PERCENT_RANK()
assumes a uniform distribution of rows based on the specifiedORDER BY
clause.
Ties:
- If multiple rows have the same value in the
ORDER BY
column, they are assigned the same rank, and the next rank is calculated accordingly.
- If multiple rows have the same value in the
Single-Row Partition:
- If a partition contains only one row,
PERCENT_RANK()
returns 0 for that row.
- If a partition contains only one row,
Use Cases
Performance Analysis:
- Compare employee performance relative to their peers within a department or organization.
Grade Scaling:
- Assign percentile ranks to students based on their scores.
Statistical Analysis:
- Rank products, services, or entities based on performance metrics.
Outlier Detection:
- Identify rows in the top or bottom percentage of the dataset.
Conclusion
The PERCENT_RANK()
function is a powerful tool in MySQL 8.0 for calculating relative ranks as percentages. It is widely used in analytical queries to evaluate data distribution and rank entities within partitions or across an entire dataset. For pre-8.0 MySQL versions, it is not natively available and would require manual calculations or approximations using user-defined variables.