Solved

Remove duplicates from database table (keeping all unique record)

4 years ago
January 11, 2021
6 replies
534 views

joeri
Contributor
3 replies

I have a database table in which a number of the records are duplicated. I'd like to remove the duplicates but like to keep 1 record (so similar to the duplicateFilter unique port).

I have no unique identifier in the table so if I match a record (based on a combination of attributes) I'll remove all of them from the table, I need to add 1 of them again afterwards.

Currently I use the following approach:

duplicateFilter to find the duplicates
remove the records from the table using fme_db_operation = DELETE (this re
re-insert the unduplicated record

Any suggestions to improve this methodology?

Best answer by david_r

The fastest is almost certainly to use regular SQL in a SQLExecutor. Here's an example using in SQL Server:

WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;

Assumes you have a table "sales.contacts" and you want to remove all duplicates of the combination of first_name + last_name + email. It only needs to be executed once. If you have the right indexes this should be very fast.

View original

Did this help you find an answer to your question?

+39

ebygomm
Influencer
3313 replies
4 years ago
January 11, 2021

Truncate the table and write all the non duplicates in again? Depends on what percentage of the table are duplicates

david_r
8355 replies
Best Answer
4 years ago
January 11, 2021

The fastest is almost certainly to use regular SQL in a SQLExecutor. Here's an example using in SQL Server:

WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;

+11

fme_can_do_it
Enthusiast
65 replies
1 year ago
April 9, 2024

hi @david_r I am facing the same issue of duplicate records in the database table. Some records are being inserted multiple times in the database. I want to keep only one record and remove the duplicates, preserving the original or unique record in the table. I haven't used SQL Executor before and was wondering if the SQL query mentioned above would work for my situation as well.Thanks!

david_r
8355 replies
1 year ago
April 9, 2024

Hi @fme_can_do_it , that should work for Microsoft SQL Server. If you’re using anything else it might need to be adapted to that particular SQL flavor.

You can check the syntax by using this non-destructive query (i.e. nothing gets deleted, only selected):

WITH cte AS (
    SELECT 
        contact_id, -- modify as needed
        first_name, -- modify as needed 
        last_name,  -- modify as needed
        email,  -- modify as needed
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name,  -- modify as needed
                last_name,  -- modify as needed
                email -- modify as needed
            ORDER BY 
                first_name,  -- modify as needed
                last_name,  -- modify as needed
                email -- modify as needed
        ) row_num
     FROM 
        sales.contacts -- modify as needed
)
SELECT * FROM cte
WHERE row_num > 1;

Of course you’ll have to modify/add/remove the columns and also modify your table name, I’ve flagged those with the comment “--modify as needed”

If you’re happy with the selection of records, you can delete the returned records by simply modifying the second to last line into DELETE FROM cte as in the example above. If you’re unsure, I recommend making a backup of the table first.

david_r
8355 replies
1 year ago
April 9, 2024

Of course, you could also do the same thing very easily with only FME and no SQL, e.g.

Reader → DuplicateFilter (Port: Duplicate) → DatabaseDeleter

+11

fme_can_do_it
Enthusiast
65 replies
1 year ago
April 9, 2024

@david_r i used Duplicate filter in my workflow and routed all duplicate features to the writer. In the configuration of this writer, I specified the delete operation based on the objectID column, which serves as a unique identifier. It worked! thank you!

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Remove duplicates from database table (keeping all unique record)

6 replies

Reply

Helpful Members This Week

Recently Solved Questions

How to get a list of Asana tasks with their corresponding custom field values?

Using one AttributeRounder for different accuracies

Create date segments of two table with overlap of times

Automate Fanout of columns/splitting attributes to different output by attribute name

Tracing Multiple Networks from Sources to Valves Without Python

Community Stats

Latest FME

Cookie policy

Cookie settings

Reply

Related Topics

How to read SDE layer in python startup script?icon

Overwriting featurres File Geodatabes and locksicon

OGCGEOPACKAGE writer: Unable to write feature, aborting write processicon

Tester filter not workingicon

FME Flow - remove queued jobs from fmeserver database table fme_jobs?icon

Helpful Members This Week

Recently Solved Questions

How to get a list of Asana tasks with their corresponding custom field values?

Using one AttributeRounder for different accuracies

Create date segments of two table with overlap of times

Automate Fanout of columns/splitting attributes to different output by attribute name

Tracing Multiple Networks from Sources to Valves Without Python

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings