Step 1: Create a Database Table
Let's start by creating a simple table named users with columns for id, name, email, and age. We'll assume that id is the primary key.
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
age INT
);
Step 2: Insert Sample Data
Now, let's insert some sample data into the users table.
INSERT INTO users (id, name, email, age) VALUES
(1, 'John Doe', 'john@example.com', 25),
(2, 'Jane Smith', 'jane@example.com', 30),
(3, 'Bob Johnson', 'bob@example.com', 22),
(4, 'Alice Brown', 'alice@example.com', 28),
(5, 'Charlie Davis', 'john@example.com', 25); -- duplicate email (same as id 1)
Step 3: Find Duplicate Rows
To identify duplicate rows, group by the column or columns that define a duplicate and keep only the groups whose count is greater than 1. Here, a duplicate means two or more rows that share the same email.
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
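The query above returns only the duplicated email values and their counts. To inspect the full rows behind them, one option is to join the table back to that grouped result. This is a minimal sketch against the sample users table; the aliases u and d are illustrative names, not part of the original example.

```sql
-- List every row whose email appears more than once.
SELECT u.id, u.name, u.email, u.age
FROM users u
JOIN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) d ON u.email = d.email
ORDER BY u.email, u.id;
```

With the sample data from Step 2, this should return the rows with ids 1 and 5, which share john@example.com.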
Step 4: Delete Duplicate Rows
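Now, let's delete the duplicate rows. In this example, we'll keep the row with the lowest id for each email and remove the others. The self-join below pairs every row (u1) with any row that has the same email and a lower id (u2), so only the extra copies are deleted. Note that this multi-table DELETE syntax is MySQL's; other databases such as PostgreSQL do not accept it and need a different form.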
DELETE u1
FROM users u1
JOIN users u2 ON u1.email = u2.email
AND u1.id > u2.id;
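Before running the DELETE above, it can help to preview the rows it would remove. On databases that lack the multi-table DELETE syntax, a subquery form is a common alternative. Both statements below are sketches that assume the same users table; keep_id and k are illustrative names. One difference to be aware of: GROUP BY places rows with a NULL email into a single group, so the alternative would also collapse those rows, whereas the join above never matches NULL emails.

```sql
-- Preview: rows that have a lower-id row with the same email.
SELECT DISTINCT u1.id, u1.name, u1.email, u1.age
FROM users u1
JOIN users u2
  ON u1.email = u2.email
 AND u1.id > u2.id;

-- Alternative delete: keep the lowest id per email, remove the rest.
-- The extra derived table (k) is there because MySQL rejects a subquery
-- that reads directly from the table being deleted from.
DELETE FROM users
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM users
        GROUP BY email
    ) AS k
);
```

These are alternatives to the statement above, not extra steps. With the sample data, the preview should list only the row with id 5, and either form of the DELETE removes that same row.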
Step 5: Verify Deletion
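After deleting the duplicate rows, let's verify that the duplicates are gone by querying the table. With the sample data, the result should contain the four rows with ids 1 through 4; the row with id 5 has been removed.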
SELECT * FROM users;
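On a larger table, reading every row is not practical, so a count-based check may be easier. The sketch below compares the total number of rows with the number of distinct emails; the column aliases are illustrative. It assumes email is never NULL, because COUNT(DISTINCT ...) ignores NULL values.

```sql
-- If both numbers match, no email appears more than once.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT email) AS distinct_emails
FROM users;
```

For the sample data after Step 4, both values should be 4.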
Conclusion
Managing duplicate rows is an important part of maintaining data quality. In this guide, we created a table, inserted sample data, identified duplicate rows, and deleted them using SQL queries. Always make sure you have a backup before running DELETE statements on a production database. Additionally, consider using unique constraints and primary keys to prevent duplicate entries in the first place.
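The two precautions mentioned above can be sketched as follows. The statements are written with MySQL in mind and may need adjusting for other databases. The names users_backup and uq_users_email are hypothetical choices for illustration. A table copy like this is only a quick safety net, not a substitute for a real backup.

```sql
-- Copy the table before deleting anything.
-- This copies the data only, not the primary key or indexes.
CREATE TABLE users_backup AS
SELECT * FROM users;

-- After the duplicates are removed, reject future rows that reuse an email.
-- This statement fails if duplicate emails still exist in the table.
ALTER TABLE users
ADD CONSTRAINT uq_users_email UNIQUE (email);
```

Once the constraint is in place, an INSERT like the fifth row in Step 2 would be rejected instead of creating a new duplicate.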