Truncate the table and write all the non duplicates in again? Depends on what percentage of the table are duplicates
The fastest is almost certainly to use regular SQL in a SQLExecutor. Here's an example using in SQL Server:
WITH cte AS (
SELECT
contact_id,
first_name,
last_name,
email,
ROW_NUMBER() OVER (
PARTITION BY
first_name,
last_name,
email
ORDER BY
first_name,
last_name,
email
) row_num
FROM
sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;
Assumes you have a table "sales.contacts" and you want to remove all duplicates of the combination of first_name + last_name + email. It only needs to be executed once. If you have the right indexes this should be very fast.
hi @david_r I am facing the same issue of duplicate records in the database table. Some records are being inserted multiple times in the database. I want to keep only one record and remove the duplicates, preserving the original or unique record in the table. I haven't used SQL Executor before and was wondering if the SQL query mentioned above would work for my situation as well.Thanks!
Hi @fme_can_do_it , that should work for Microsoft SQL Server. If you’re using anything else it might need to be adapted to that particular SQL flavor.
You can check the syntax by using this non-destructive query (i.e. nothing gets deleted, only selected):
WITH cte AS (
SELECT
contact_id, -- modify as needed
first_name, -- modify as needed
last_name, -- modify as needed
email, -- modify as needed
ROW_NUMBER() OVER (
PARTITION BY
first_name, -- modify as needed
last_name, -- modify as needed
email -- modify as needed
ORDER BY
first_name, -- modify as needed
last_name, -- modify as needed
email -- modify as needed
) row_num
FROM
sales.contacts -- modify as needed
)
SELECT * FROM cte
WHERE row_num > 1;
Of course you’ll have to modify/add/remove the columns and also modify your table name, I’ve flagged those with the comment “--modify as needed”
If you’re happy with the selection of records, you can delete the returned records by simply modifying the second to last line into DELETE FROM cte
as in the example above. If you’re unsure, I recommend making a backup of the table first.
Of course, you could also do the same thing very easily with only FME and no SQL, e.g.
Reader → DuplicateFilter (Port: Duplicate) → DatabaseDeleter
@david_r i used Duplicate filter in my workflow and routed all duplicate features to the writer. In the configuration of this writer, I specified the delete operation based on the objectID column, which serves as a unique identifier. It worked! thank you!