
I'm working on removing duplicates in a list, but need to filter this based on multiple fields.

 

I can remove duplicates with only one field easily, but am having trouble where it relies on multiple fields.

 

 

Scenario:

 

Where a business has the same name and address, the duplicates need to be removed.

 

However if a Business has the same name, but a different address, I want it kept in the list.

 

 

Example:

 

 

Name         Address           Action
McDonalds    123 Brown Street  Keep
McDonalds    123 Brown Street  Remove
McDonalds    999 South Road    Keep
Burger King  999 South Road    Keep

 

 

 

At the moment, all I can output is:

McDonalds 123 Brown Street

Burger King 999 South Road

 

 

My actual list has about 15,000 records.

 

 

Any help would be much appreciated.

 

Hi @kieranodonnell,

I created an example workspace that removes the duplicates using the ListDuplicateRemover transformer.

The result in FME Data Inspector:

Thanks,

Danilo


Can you clarify what you mean by list? From the example it looks like you may have 15,000 records rather than a list.

If they are just records that you need to exclude duplicates from, a DuplicateFilter will handle multiple fields: just select name and address as your key attributes. The records you want to keep will exit the Unique port.
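
For anyone who wants to see the logic outside of FME, here is a minimal Python sketch of filtering duplicates on a combination of fields. It is only an illustration of the idea, not how the transformer is implemented; the function name `split_duplicates` and the field names `name` and `address` are made up for this example.

```python
def split_duplicates(records, key_fields):
    """Split records into first occurrences and repeats, keyed on several fields."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # The key is the combination of all selected fields, so two records
        # only count as duplicates when every key field matches.
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Sample data taken from the question (field names are assumptions).
records = [
    {"name": "McDonalds", "address": "123 Brown Street"},
    {"name": "McDonalds", "address": "123 Brown Street"},
    {"name": "McDonalds", "address": "999 South Road"},
    {"name": "Burger King", "address": "999 South Road"},
]

unique, duplicates = split_duplicates(records, ["name", "address"])
for record in unique:
    print(record["name"], record["address"])
```

Because the key is built from both fields, the second McDonalds at 999 South Road is kept, while the repeated 123 Brown Street record ends up in `duplicates`.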


Hi @kieranodonnell, I would recommend using the Matcher. (Detecting Matched Features with the Matcher)

If the business name and address are stored in two different attributes, you can set these two attributes as the "Selected Attributes" to match on. If they are stored together in a single attribute, you can always use an AttributeSplitter or a string transformer to parse out the business name from the address.


Another method would be to use the CRCCalculator to create a single checksum number representing the values of multiple attributes. Then you can do a simple DuplicateRemover on the number generated by it. It might be quicker that way than using the Matcher (though if it's a one-off task it wouldn't matter much).
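
The checksum idea can be sketched in plain Python as well. This is a rough illustration that uses the standard library's `zlib.crc32`, not the CRCCalculator itself; the helper names `crc_key` and `dedupe_by_crc`, the field names, and the separator character are all assumptions for the example. Keep in mind that a 32-bit CRC can in principle give the same value for different inputs, so this trades a small collision risk for a compact key.

```python
import zlib


def crc_key(record, fields, sep="\x1f"):
    """Return one CRC-32 number computed over several fields of a record."""
    # Join with a separator that is unlikely to appear in the data, so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same string.
    joined = sep.join(str(record[field]) for field in fields)
    return zlib.crc32(joined.encode("utf-8"))


def dedupe_by_crc(records, fields):
    """Keep the first record seen for each checksum value."""
    seen = set()
    kept = []
    for record in records:
        key = crc_key(record, fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


# Sample data taken from the question (field names are assumptions).
businesses = [
    {"name": "McDonalds", "address": "123 Brown Street"},
    {"name": "McDonalds", "address": "123 Brown Street"},
    {"name": "McDonalds", "address": "999 South Road"},
    {"name": "Burger King", "address": "999 South Road"},
]

kept = dedupe_by_crc(businesses, ["name", "address"])
```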



Thanks @TiaAtSafe, that worked perfectly. I thought this would work but was having issues initially. It turns out my data had trailing whitespace on the names and addresses, which I had to take care of first.
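
To illustrate why the whitespace mattered, here is a small Python sketch (not FME-specific). The helper name `trim_fields` and the field names are hypothetical. Two records that look identical on screen produce different keys until the values are trimmed.

```python
def trim_fields(record, fields):
    """Return a copy of the record with whitespace stripped from the given fields."""
    cleaned = dict(record)
    for field in fields:
        cleaned[field] = cleaned[field].strip()
    return cleaned


fields = ["name", "address"]

# Same business, but with stray trailing spaces in different places.
a = {"name": "McDonalds ", "address": "123 Brown Street"}
b = {"name": "McDonalds", "address": "123 Brown Street  "}

# Keys built from the raw values differ, so the records would not match.
raw_keys = {tuple(r[f] for f in fields) for r in (a, b)}

# After trimming, both records share a single key and match as duplicates.
clean_keys = {tuple(trim_fields(r, fields)[f] for f in fields) for r in (a, b)}

print(len(raw_keys), len(clean_keys))
```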
