deleting duplicate rows from a table

December 5th, 2006 by Rajender Singh · 5 Comments

One of the old way in which I normally do this by using GROUP BY and HAVING keywords.

Example:

DELETE FROM EMP
WHERE ROWID NOT IN (SELECT RWID FROM ( SELECT COUNT(1), MAX(ROWID) RWID FROM EMP
GROUP BY empno
HAVING COUNT(1) > 1 ))
AND empno IN (SELECT empno FROM ( SELECT empno, COUNT(1) FROM EMP
GROUP BY empno
HAVING COUNT(1) > 1 ))
/

Thanks Sumit for pointing out the mistake.

Other way which we can perform same function is by using Oracle’s analytic functions ROW_NUMBER() as follows
DELETE FROM emp
WHERE ROWID IN (SELECT ROWID
FROM ( SELECT ROWID, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY EMPNO) rn
FROM emp2)
WHERE rn > 1)
/

This function almost act like a ROWNUM pseudocolumn with only addition feature that it resets back to one for each partition that we define in out query (ROW_NUMBER() OVER (PARTITION BY empno ORDER BY EMPNO))

Bookmark this post

Hide

Tags: Interesting Coding Showcase · SQL and PL/SQL

5 responses so far ↓

1 sumitiitg // Dec 5, 2006 at 3:21 pm

The first way to delete duplicates from a table as mentioned here doesn’t seem to be correct. This way we will only be able to delete the row with the MAX rowid and not all duplicates. Consider this example table

Rowid EmpNo
——- ——–
12344 1
12345 2
12346 1
12347 1

Now the innermost query will give

Count Max (Rowid)
—— ————-
3 12347

This rowid will be the one which will be returned to the outermost query and thus
the last tuple will be the only one to be deleted. Please correct me if I am missing something.

Thanks,
Sumit
2 rajs // Dec 5, 2006 at 3:43 pm

Hi Sumit,

You are right it should be

DELETE FROM EMP
WHERE ROWID NOT IN (SELECT RWID FROM ( SELECT COUNT(1), MAX(ROWID) RWID FROM EMP
GROUP BY empno
HAVING COUNT(1) > 1 ))
AND empno IN (SELECT empno FROM ( SELECT empno, COUNT(1) FROM EMP
GROUP BY empno
HAVING COUNT(1) > 1 ))
/

But space for improvement is there, I am not fully satisfied with above DML.

Thanks
3 Mumtaz Ahmad // Dec 12, 2006 at 5:46 pm

We can use this sql command to remove duplicate rows with out using COUNT function. ROWID gives better performance than COUNT function.

DELETE FROM
WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM
GROUP BY duplicate_fields_names
4 Rajender Singh // Dec 12, 2006 at 6:15 pm

Hi Mumtaz,

With your query it will delete the rows which are not even duplicate!

Thanks
5 // Jul 29, 2008 at 7:18 pm

In a table named “Hotel”,there is a column named “Room” which contains rows RoomNo1,RoomNo2,RoomNo3,RoomNo2,RoomNo4.
What will be the command to delete the “RoomNo2″ from 2nd row?