Detecting Duplicate Rows
Administration Tips
Detecting Duplicate Rows In order to determine which rows duplicate each other in a table, you must first be prepared to state a definition of "duplicate". For example, the rows "1, FRED, DEPT_10" and '2,FRED,DEPT_20' sort of duplicate each other on the NAME column -though chances are, this is not a real duplication, but merely a case of having too many Freds working in different departments. But if we saw these two rows: '1,FRED, DEPT_10', and '1,FRED,DEPT_20', we begin to wonder if there is duplication, since two different physical Freds should not end up being allocated the same Employee Number. So, the first thing is to determine which COLUMNS in your table define uniqueness. In my case, it's Employee_Number, but I don't care about duplications on something like 'NAME'. Having done that, the quickest way to find duplicates is to stick a UNIQUE constraint on the relevant columns, and attempt to enable that constraint. First, the bare constraint definition: ALTER TABLE BLAH ADD
(CONSTRAINT
EMPID_UQ UNIQUE DISABLED NOVALIDATE);
That adds the constraint -but it's in its 'disabled' or "switched off" state (it has to be -there may already be duplicate records whose presence would have prevented the constraint being defined in the first place if it hadn't been). Now we create a special table to house the errors that arise if we attempt to enforce the constraint (i.e., switch it fully on). Oracle supplies a script to create that table (it's called utlexcpt.sql, and it's found in the ORACLE_HOME/rdbms/admin directory). But the script is easy enough to replicate yourself: CREATE TABLE EXCEPTIONS(ROW_ID ROWID, OWNER VARCHAR2(30), TABLE_NAME VARCHAR2(30), CONSTRAINT VARCHAR2(30));
You'll note that this table happens to be called "exceptions". It could be called anything we like (such as 'unique_errs'), but "exceptions" is Oracle's choice of name, so I guess we shall live with it. The next thing to do is try and enable the unique constraint, but specifying that any errors arising from doing so should be written into this new exceptions table we've just created.
Copyright Š Howard Rogers 2001
10/17/2001
Page 1 of 2
Detecting Duplicate Rows
Administration Tips
Try this: ALTER TABLE BLAH ENABLE CONSTRAINT EMPID_UQ EXCEPTIONS INTO EXCEPTIONS;
The enabling of the constraint failed, but the trick is that the exceptions table now contains details of what rows exist that are duplicates, and thus prevent the constraint being fully enabled. All we need to do now is select * from exceptions, and we shall be able to see the offending records: ROW_ID OWNER TABLE_NAME ------------------ ------------------------------ ----------------AAAH2NAABAAAPCMAAA SCOTT BLAH AAAH2NAABAAAPCMAAB SCOTT BLAH ...at which point, you'll realise that unless you are fluent in Base-64, you haven't got a hope in Hell of ever interpreting those ROWIDs into something you can actually work with! So now comes the clever bit: we use those ROWIDs as a hook by which to extract the data out of the main table, in a format we can recognise and work with. That involves a relatively simple bit of sub-select'ing, like this: SELECT
*
FROM BLAH WHERE ROWID IN
(SELECT
ROW_ID FROM EXCEPTIONS);
And now we get a report that looks like this: EMPNO ---------1 1
NAME ---------FRED FREDERIKA
...and it becomes obvious where the problem is. There's no automatic fix here. All we've done is use the EXCEPTIONS table to fish out a reference to the offending rows. How they stop being offenders is up to you: you need to do manual DML to knock the rows into shape. In this case, a simple SET EMPNO=2 WHERE NAME='FREDERIKA'; will do the trick -but in the real world, you need to work out the correct updates to perform (and investigate why the problem ever arose in the first place). When the issue has been resolved, you simply need to truncate the exceptions table (unless you feel you'll never need it again, in which case feel free to drop it. It's easy enough to re-create, after all).
Copyright Š Howard Rogers 2001
10/17/2001
Page 2 of 2