MySQL CUME_DIST Function

The CUME_DIST() function in MySQL is a window function introduced in MySQL 8.0. It calculates the cumulative distribution of a value within a result set: the relative position of a row compared to the other rows, expressed as a fraction of the total number of rows.

The result is a value between 0 (exclusive) and 1 (inclusive).

Syntax

CUME_DIST() OVER ( [PARTITION BY column_name] ORDER BY column_name [ASC | DESC] )
  • PARTITION BY: Divides the result set into partitions. The function calculates the cumulative distribution within each partition.
  • ORDER BY: Specifies the order of rows within the partition. This order determines the rank of the rows. Without ORDER BY, all rows are peers and the result is 1 for every row.

How It Works

  • CUME_DIST() calculates the proportion of rows in the partition that come before the current row in the window ordering or are tied with it (its peers). With ascending order, these are the rows with values less than or equal to the current row's value; with descending order, the rows with values greater than or equal to it.
  • Formula: \( \text{CUME\_DIST}(\text{row}) = \frac{\text{number of rows preceding or tied with the current row}}{\text{total rows in the partition}} \)
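
The formula above can be checked by hand. The following is a minimal sketch, assuming a hypothetical scores table created just for this illustration (the column types are an assumption, not taken from a real schema). It places the built-in CUME_DIST() next to a manual count of rows that precede or tie with the current row under ORDER BY score DESC, divided by the total row count.

```sql
-- Hypothetical setup for illustration only; column types are assumed.
CREATE TABLE scores (
    student_id INT PRIMARY KEY,
    subject    VARCHAR(20),
    score      INT
);

INSERT INTO scores (student_id, subject, score) VALUES
    (1, 'Math', 90),
    (2, 'Math', 80),
    (3, 'Math', 80),
    (4, 'Math', 70);

SELECT
    s.student_id,
    s.score,
    -- Built-in window function.
    CUME_DIST() OVER (ORDER BY s.score DESC) AS built_in,
    -- Manual version: rows at or above this score (preceding rows plus peers
    -- under descending order), divided by the total number of rows.
    (SELECT COUNT(*) FROM scores t WHERE t.score >= s.score)
        / (SELECT COUNT(*) FROM scores) AS by_hand
FROM scores s
ORDER BY s.score DESC, s.student_id;
```

The two columns should agree numerically for every row, although MySQL may display them with a different number of decimal places because the manual division returns a decimal while CUME_DIST() returns a double.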

Examples

Example 1: Basic Usage

Suppose you have a table scores:

student_id  subject  score
1           Math     90
2           Math     80
3           Math     80
4           Math     70

To calculate the cumulative distribution of scores:

SELECT student_id,
       score,
       CUME_DIST() OVER (ORDER BY score DESC) AS cumulative_distribution
FROM scores;

Output:

student_id  score  cumulative_distribution
1           90     0.25
2           80     0.75
3           80     0.75
4           70     1.00
  • For score = 90, one of the four rows has a score of 90 or higher: 1/4 = 0.25.
  • For score = 80, the two tied rows are peers and share one value, which counts both of them plus the row above: 3/4 = 0.75.
  • For score = 70, all rows are included: 4/4 = 1.00.

Example 2: Using PARTITION BY

If the scores table includes multiple subjects:

student_id  subject  score
1           Math     90
2           Math     80
3           Math     80
4           Math     70
5           Science  85
6           Science  80

To calculate the cumulative distribution of scores for each subject:

SELECT student_id,
       subject,
       score,
       CUME_DIST() OVER (PARTITION BY subject ORDER BY score DESC) AS cumulative_distribution
FROM scores;

Output:

student_id  subject  score  cumulative_distribution
1           Math     90     0.25
2           Math     80     0.75
3           Math     80     0.75
4           Math     70     1.00
5           Science  85     0.50
6           Science  80     1.00
  • The cumulative distribution is calculated separately for each subject.
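
To make the per-partition denominator visible, the sketch below adds the partition size next to the cumulative distribution. It assumes the six-row scores table from this example already exists; the rows_in_partition alias is a name chosen for this illustration.

```sql
SELECT
    student_id,
    subject,
    score,
    -- Denominator of the formula: the number of rows in the row's own partition.
    COUNT(*) OVER (PARTITION BY subject) AS rows_in_partition,
    CUME_DIST() OVER (PARTITION BY subject ORDER BY score DESC) AS cumulative_distribution
FROM scores
ORDER BY subject, score DESC, student_id;
```

With the data above, rows_in_partition is 4 for every Math row and 2 for every Science row, which is why the top Science score gets 1/2 = 0.50 while the top Math score gets 1/4 = 0.25.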

Example 3: Identifying Top Performers

To list students who are in the top 50% of scores, using the four-row Math table from Example 1:

SELECT student_id, score
FROM (
    SELECT student_id,
           score,
           CUME_DIST() OVER (ORDER BY score DESC) AS cumulative_distribution
    FROM scores
) ranked_scores
WHERE cumulative_distribution <= 0.5;

Output:

student_id  score
1           90
  • Only student 1 is returned. The two tied scores of 80 share a cumulative distribution of 0.75, which exceeds the 0.5 cutoff, so neither of them is included.
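
Because tied rows share one cumulative distribution value, a cutoff either admits all of them or none of them. The sketch below illustrates this by rewriting the query as a common table expression (also available from MySQL 8.0) and raising the cutoff to 0.75. It assumes the four-row Math table from Example 1.

```sql
WITH ranked_scores AS (
    SELECT
        student_id,
        score,
        CUME_DIST() OVER (ORDER BY score DESC) AS cumulative_distribution
    FROM scores
)
SELECT student_id, score, cumulative_distribution
FROM ranked_scores
-- 0.75 is the shared value of the two tied scores of 80, so both qualify.
WHERE cumulative_distribution <= 0.75
ORDER BY score DESC, student_id;
```

On that table this returns students 1, 2, and 3: the cutoff moved from 0.5 to 0.75, and both tied rows entered the result together.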

Use Cases

  1. Percentile Analysis:
    • Analyze the rank of a row compared to the total rows in a dataset.
  2. Top/Bottom Analysis:
    • Identify rows within specific percentiles (e.g., top 10%, bottom 25%).
  3. Cumulative Rankings:
    • Evaluate the cumulative distribution for reporting or analytical purposes.
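
As a sketch of the bottom-percentile case, the query below reverses the ordering so that the lowest scores come first and then keeps the rows in the bottom 25%. It assumes the four-row Math table from Example 1; the 0.25 cutoff is only an example value.

```sql
SELECT student_id, score
FROM (
    SELECT
        student_id,
        score,
        -- Ascending order: the lowest score receives the smallest value.
        CUME_DIST() OVER (ORDER BY score ASC) AS cumulative_distribution
    FROM scores
) AS ranked_scores
WHERE cumulative_distribution <= 0.25;
```

On that table the score of 70 has a cumulative distribution of 1/4 = 0.25, so only student 4 is returned.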

Conclusion

The CUME_DIST() function is a powerful tool for analyzing data distributions in MySQL. It is especially useful in percentile rankings and comparative analyses, helping users understand the relative position of rows within partitions or the entire result set. By using PARTITION BY and ORDER BY, you can customize its behavior for various scenarios.

Soeng Souy