How to Remove Duplicates in SQL

Use duplicate-removal patterns when multiple rows represent the same logical record and only one should remain. The safest workflow is identify duplicates first, preview what will be deleted, then remove extras deterministically.

What you’ll build or solve

You’ll learn how to remove duplicates in SQL by keeping one row per duplicate group and deleting the rest. You’ll also know how to preview duplicates safely before deletion.

When this approach works best

This approach is the right choice when duplicate rows are accidental and one canonical row should stay.

Common real-world scenarios include:

  • Repeated email signups
  • Duplicate imported products
  • Multiple identical log rows
  • CRM merge cleanup
  • Migration overlap records

This is a bad idea when repeated rows are legitimate events, such as multiple orders from the same customer.

Prerequisites

You only need:

  • A SQL table that contains duplicates
  • A stable unique key such as id
  • Basic SELECT, GROUP BY, and DELETE knowledge

Step-by-step instructions

Step 1: Preview duplicate groups first

Start by identifying duplicate groups.

SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

This shows which values repeat.

Next, preview the rows you plan to keep versus remove.

SELECT *
FROM users
WHERE email = 'alex@example.com'
ORDER BY created_at;

A safe delete pattern is keeping the lowest id per duplicate group.

DELETE FROM users
WHERE id NOT IN (
  SELECT MIN(id)
  FROM users
  GROUP BY email
);

This keeps one row per email and removes the rest.

What to look for:

  • Always preview duplicates first
  • Keep one deterministic row
  • MIN(id) is a safe keep rule
  • Great for import cleanup
  • Add unique constraints afterward to prevent repeats

Examples you can copy

Remove duplicate emails

DELETE FROM users
WHERE id NOT IN (
  SELECT MIN(id)
  FROM users
  GROUP BY email
);

Duplicate SKUs

DELETE FROM products
WHERE id NOT IN (
  SELECT MIN(id)
  FROM products
  GROUP BY sku
);

Duplicate session tokens

DELETE FROM sessions
WHERE id NOT IN (
  SELECT MIN(id)
  FROM sessions
  GROUP BY token
);

Common mistakes and how to fix them

Mistake 1: Deleting before previewing

What the reader might do:

DELETE FROM users ...

Why it breaks: a wrong duplicate key can remove valid distinct rows.

Corrected approach:

Run the GROUP BY ... HAVING preview first.

Mistake 2: Keeping arbitrary rows without a rule

What the reader might do:

DELETE FROM users
WHERE email IN (...)

Why it breaks: the canonical record may be deleted.

Corrected approach:

Use MIN(id), MAX(created_at), or another clear keep rule.

Mistake 3: Not preventing future duplicates

What the reader might do:

Clean duplicates once and stop there.

Why it breaks: duplicates can reappear after the next import.

Corrected approach:

Add a unique index or constraint after cleanup.

Troubleshooting

If too many rows are deleted, verify the duplicate grouping column.

If the wrong row remains, change the keep rule.

If duplicates return, add a unique constraint.

If performance is slow, index the duplicate key column.

Quick recap

  • Find duplicates with GROUP BY ... HAVING
  • Preview rows before deleting
  • Keep one deterministic row
  • Delete the rest safely
  • Add unique constraints afterward