Solved

Remove duplicates from database table (keeping all unique record)

3 years ago
11 January 2021
6 replies
69 views

joeri
Contributor
3 replies

I have a database table in which a number of the records are duplicated. I'd like to remove the duplicates but like to keep 1 record (so similar to the duplicateFilter unique port).

I have no unique identifier in the table so if I match a record (based on a combination of attributes) I'll remove all of them from the table, I need to add 1 of them again afterwards.

Currently I use the following approach:

duplicateFilter to find the duplicates
remove the records from the table using fme_db_operation = DELETE (this re
re-insert the unduplicated record

Any suggestions to improve this methodology?

icon

Best answer by david_r 11 January 2021, 17:38

View original

6 replies

Userlevel 1

+21

ebygomm
Contributor
3079 replies
3 years ago
11 January 2021

Truncate the table and write all the non duplicates in again? Depends on what percentage of the table are duplicates

Userlevel 4

david_r
8177 replies
3 years ago
11 January 2021
Best Answer

The fastest is almost certainly to use regular SQL in a SQLExecutor. Here's an example using in SQL Server:

WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;

Assumes you have a table "sales.contacts" and you want to remove all duplicates of the combination of first_name + last_name + email. It only needs to be executed once. If you have the right indexes this should be very fast.

fme_can_do_it
Supporter
41 replies
17 days ago
9 April 2024

hi @david_r I am facing the same issue of duplicate records in the database table. Some records are being inserted multiple times in the database. I want to keep only one record and remove the duplicates, preserving the original or unique record in the table. I haven't used SQL Executor before and was wondering if the SQL query mentioned above would work for my situation as well.Thanks!

Userlevel 4

Hi @fme_can_do_it , that should work for Microsoft SQL Server. If you’re using anything else it might need to be adapted to that particular SQL flavor.

You can check the syntax by using this non-destructive query (i.e. nothing gets deleted, only selected):

WITH cte AS (
    SELECT 
        contact_id, -- modify as needed
        first_name, -- modify as needed 
        last_name,  -- modify as needed
        email,  -- modify as needed
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name,  -- modify as needed
                last_name,  -- modify as needed
                email -- modify as needed
            ORDER BY 
                first_name,  -- modify as needed
                last_name,  -- modify as needed
                email -- modify as needed
        ) row_num
     FROM 
        sales.contacts -- modify as needed
)
SELECT * FROM cte
WHERE row_num > 1;

Of course you’ll have to modify/add/remove the columns and also modify your table name, I’ve flagged those with the comment “--modify as needed”

If you’re happy with the selection of records, you can delete the returned records by simply modifying the second to last line into DELETE FROM cte as in the example above. If you’re unsure, I recommend making a backup of the table first.

Userlevel 4

Of course, you could also do the same thing very easily with only FME and no SQL, e.g.

Reader → DuplicateFilter (Port: Duplicate) → DatabaseDeleter

@david_r i used Duplicate filter in my workflow and routed all duplicate features to the writer. In the configuration of this writer, I specified the delete operation based on the objectID column, which serves as a unique identifier. It worked! thank you!

Remove duplicates from database table (keeping all unique record)

6 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded