How To Delete Duplicate Rows in MySQL

Duplicate rows in a table can cause data inconsistencies and undermine the integrity of your database. Deleting duplicates in MySQL means identifying rows that have identical values in one or more columns and keeping only one record from each group.

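Steps to Delete Duplicate Rows

Step 1 finds the duplicates. Steps 2 to 4 are alternative ways to remove them, so choose one based on your MySQL version and whether the table has a unique id column.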

Step 1: Identify Duplicates

First, identify the duplicate rows using a GROUP BY query with an aggregate function like COUNT(*).

SELECT column1, column2, COUNT(*) AS count
FROM table_name
GROUP BY column1, column2
HAVING count > 1;

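Replace column1, column2, etc., with the columns you want to check for duplicates. Each result row is a combination of values that appears more than once, and count shows how many copies of it exist.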
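The GROUP BY query shows each duplicated combination only once. To see the individual rows behind those groups, including their ids, you can join the grouped result back to the table. This is a minimal sketch that assumes the table has a unique id column (as Steps 3 and 4 also do); table_name, column1, and column2 are placeholders. The <=> operator is MySQL's NULL-safe equality, so groups whose values are NULL are matched too.

```sql
-- List every row that belongs to a duplicate group, with its id
SELECT t.*
FROM table_name t
INNER JOIN (
    -- Same grouping as the query above: combinations that occur more than once
    SELECT column1, column2
    FROM table_name
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
) AS d
    ON t.column1 <=> d.column1   -- NULL-safe match
   AND t.column2 <=> d.column2
ORDER BY t.column1, t.column2, t.id;
```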

Step 2: Delete Duplicates Using a Temporary Table

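Create a temporary table that holds one copy of each distinct row.
Delete all data from the original table.
Reinsert the unique rows into the original table.

This approach assumes duplicate rows are identical in every column. If the table has a unique id column, no two rows are exact duplicates, so SELECT DISTINCT across all columns keeps every row; in that case, use a method that keeps a specific id instead.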

-- 1. Copy one instance of each distinct row into a temporary table
CREATE TEMPORARY TABLE temp_table AS
SELECT DISTINCT column1, column2, column3
FROM table_name;

-- 2. Delete all rows from the original table
DELETE FROM table_name;

-- 3. Insert the unique rows back, naming the columns explicitly
INSERT INTO table_name (column1, column2, column3)
SELECT column1, column2, column3 FROM temp_table;

-- 4. Drop the temporary table (it is also dropped automatically when the session ends)
DROP TEMPORARY TABLE temp_table;
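When the table does have a unique id column, the temporary-table idea still works if you store the id to keep for each group rather than whole rows. A minimal sketch, assuming an id primary key; keep_ids is a hypothetical helper table, and the other names are the placeholders used above:

```sql
-- Record the lowest id in each group of duplicates
CREATE TEMPORARY TABLE keep_ids AS
SELECT MIN(id) AS id
FROM table_name
GROUP BY column1, column2;

-- Remove every row whose id was not chosen as the one to keep
DELETE FROM table_name
WHERE id NOT IN (SELECT id FROM keep_ids);

-- Clean up the helper table
DROP TEMPORARY TABLE keep_ids;
```

Unlike the DISTINCT version, this keeps the original ids and any columns that are not part of the GROUP BY.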

Step 3: Delete Duplicates Using `ROW_NUMBER()` (MySQL 8.0 and later)

You can use the ROW_NUMBER() window function to number the rows within each group of duplicates, then delete every row whose number is greater than 1.

WITH cte AS (
    SELECT id, column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
    FROM table_name
)
DELETE FROM table_name
WHERE id IN (
    SELECT id
    FROM cte
    WHERE row_num > 1
);

Replace id with the primary key or a unique column in your table.
The PARTITION BY clause groups rows with identical values in column1, column2, etc.
The ROW_NUMBER() function numbers the rows in each group starting at 1, in the order given by ORDER BY id, so the row with the smallest id gets 1 and is kept.
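Before running the DELETE, you can run the same CTE as a plain SELECT to preview exactly which rows it would remove. The sketch below uses the same placeholder names:

```sql
-- Preview: the rows the ROW_NUMBER() DELETE would remove (row_num > 1)
WITH cte AS (
    SELECT id, column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
    FROM table_name
)
SELECT *
FROM cte
WHERE row_num > 1
ORDER BY column1, column2, id;
```

The ORDER BY inside OVER decides which row survives: ORDER BY id keeps the lowest id, while ORDER BY id DESC keeps the highest, which is typically the most recently inserted row when ids are auto-incremented.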

Step 4: Delete Duplicates Without Using a Temporary Table or CTE

If your table has a unique identifier, you can use a self-join to delete every row in a group of duplicates except the one with the smallest (or largest) identifier.

DELETE t1
FROM table_name t1
INNER JOIN table_name t2
ON t1.column1 = t2.column1
   AND t1.column2 = t2.column2
   AND t1.id > t2.id;

This query compares the table to itself (t1 and t2 are aliases) to find duplicates.
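The condition t1.id > t2.id matches every row that has a duplicate with a smaller id, so all rows in each group except the one with the smallest id are deleted. Change it to t1.id < t2.id to keep the largest id instead.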
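The = operator never matches NULL values, so this self-join silently skips duplicates whose compared columns contain NULL. If those columns can be NULL, MySQL's NULL-safe equality operator <=> treats two NULLs as equal. A sketch with the same placeholder names:

```sql
-- Same self-join, but <=> also matches rows where both values are NULL
DELETE t1
FROM table_name t1
INNER JOIN table_name t2
    ON t1.column1 <=> t2.column1
   AND t1.column2 <=> t2.column2
   AND t1.id > t2.id;   -- keep the smallest id in each group
```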

Examples

Example Table: `products`

id	product_name	category	price
1	Laptop	Electronics	800
2	Laptop	Electronics	800
3	Phone	Electronics	500
4	Phone	Electronics	500
5	Tablet	Electronics	300
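To reproduce the examples, you could create and populate the products table as follows. The column types are assumptions chosen for illustration, since the original table definition is not shown.

```sql
-- Hypothetical definition of the example table (column types are assumptions)
CREATE TABLE products (
    id INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL,
    price INT NOT NULL
);

-- Sample data: rows 2 and 4 duplicate rows 1 and 3
INSERT INTO products (id, product_name, category, price) VALUES
    (1, 'Laptop', 'Electronics', 800),
    (2, 'Laptop', 'Electronics', 800),
    (3, 'Phone',  'Electronics', 500),
    (4, 'Phone',  'Electronics', 500),
    (5, 'Tablet', 'Electronics', 300);
```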

Deleting Duplicate Rows Using `ROW_NUMBER()`

WITH cte AS (
    SELECT id, product_name, category, price,
           ROW_NUMBER() OVER (PARTITION BY product_name, category, price ORDER BY id) AS row_num
    FROM products
)
DELETE FROM products
WHERE id IN (
    SELECT id
    FROM cte
    WHERE row_num > 1
);

Result:

id	product_name	category	price
1	Laptop	Electronics	800
3	Phone	Electronics	500
5	Tablet	Electronics	300
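After the DELETE, rerunning the Step 1 check against products is a quick way to confirm that no duplicates remain; it should return an empty result set.

```sql
-- Should return no rows once the duplicates are gone
SELECT product_name, category, price, COUNT(*) AS duplicate_count
FROM products
GROUP BY product_name, category, price
HAVING COUNT(*) > 1;
```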

Deleting Duplicate Rows Without `ROW_NUMBER()`

DELETE t1
FROM products t1
INNER JOIN products t2
ON t1.product_name = t2.product_name
   AND t1.category = t2.category
   AND t1.price = t2.price
   AND t1.id > t2.id;

Result:

id	product_name	category	price
1	Laptop	Electronics	800
3	Phone	Electronics	500
5	Tablet	Electronics	300
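A self-join DELETE can remove more than you expect if the join conditions are too loose, so it can help to run the same join as a SELECT first. On the original five-row products table, this preview lists the rows with ids 2 and 4, which are the ones the DELETE removes:

```sql
-- Preview the rows the self-join DELETE would remove
SELECT DISTINCT t1.*
FROM products t1
INNER JOIN products t2
    ON t1.product_name = t2.product_name
   AND t1.category = t2.category
   AND t1.price = t2.price
   AND t1.id > t2.id
ORDER BY t1.id;
```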

Best Practices

Backup Data: Always create a backup of your table before performing delete operations.
Test Query: Run a SELECT query first to ensure the rows you intend to delete are correct.

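Use Transactions: Wrap your delete operations in a transaction so you can roll back if something goes wrong. This requires a transactional storage engine such as InnoDB (the default since MySQL 5.5); changes to MyISAM tables cannot be rolled back.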

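START TRANSACTION;

DELETE t1
FROM table_name t1
INNER JOIN table_name t2
ON t1.column1 = t2.column1
   AND t1.id > t2.id;

-- Check that no duplicates remain before making the change permanent
SELECT column1, COUNT(*) FROM table_name GROUP BY column1 HAVING COUNT(*) > 1;

-- If the result looks right, COMMIT; otherwise run ROLLBACK instead
COMMIT;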
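For the Backup Data step, one simple approach is to copy the table inside the same database before deleting anything. This is a sketch; table_name_backup is a hypothetical name, and for large or critical tables a full dump with a tool such as mysqldump is the more robust option.

```sql
-- Copy the column definitions and indexes of the original table
CREATE TABLE table_name_backup LIKE table_name;

-- Copy all rows into the backup
INSERT INTO table_name_backup
SELECT * FROM table_name;
```

If the cleanup goes wrong, you can restore the rows from table_name_backup, and drop it once you are satisfied with the result.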

Conclusion

Deleting duplicate rows in MySQL can be achieved using various methods, depending on the version of MySQL and the structure of your data. For MySQL 8.0 and later, the ROW_NUMBER() function lets you choose exactly which row in each group to keep. For older versions, self-joins or temporary tables provide a reliable alternative. Always ensure data safety by testing and backing up your database before proceeding.

Souy Soeng