Question

How to find duplicates in a specific column?

  • 20 April 2024
  • 3 replies
  • 36 views


Hello,

I would like to find duplicates in a specific column.

For example, Column A contains:

256789 - 4567

4567 - 256789

256789 - 4567

I would like to find all duplicate values in Column A, treating values as duplicates whether they appear as a-b or b-a.

In other words, the three values above should all be considered matches.

The goal is to ignore the order of the parts within each cell.

How can I achieve this?

Thanks in advance.

FME 2021

 

 


3 replies


Hello @soly 

 

You can use the DuplicateFilter transformer.

 

 

Thanks in Advance,

Danilo

Userlevel 4
Badge +13

These are the steps I used (FME 2021)

  • AttributeSplitter on Column A using the hyphen as the delimiter (trimming whitespace)
  • ListHistogrammer with the source being the list created by the AttributeSplitter
  • Two ListConcatenators, one concatenating the histogram counts and one concatenating the histogram values
  • Matcher checking both the concatenated count and value attributes

This also works with more than 2 values in the cell.
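
To make the logic of these steps easier to follow, here is a minimal sketch in plain Python. It is not FME code and does not call any FME API; it only approximates what the split, histogram, concatenate, and match steps do. The function names `histogram_key` and `group_matches` are hypothetical, and sorting the (value, count) pairs is my own assumption to make the key independent of order.

```python
from collections import Counter, defaultdict


def histogram_key(cell, delimiter="-"):
    """Build an order-independent key from the parts of one cell."""
    # Split on the delimiter and trim whitespace (the AttributeSplitter step).
    parts = [part.strip() for part in cell.split(delimiter)]
    # Count how often each part occurs (the histogram step).
    counts = Counter(parts)
    # Assumption: sort the (value, count) pairs so that "a-b" and "b-a"
    # produce the same key.
    return tuple(sorted(counts.items()))


def group_matches(cells):
    """Return lists of row indexes whose cells share the same key."""
    groups = defaultdict(list)
    for index, cell in enumerate(cells):
        groups[histogram_key(cell)].append(index)
    # Keep only keys seen more than once (the matching step).
    return [indexes for indexes in groups.values() if len(indexes) > 1]


cells = ["256789 - 4567", "4567 - 256789", "256789 - 4567"]
print(group_matches(cells))  # [[0, 1, 2]]
```

Because the key holds counts as well as values, a cell such as "a - a - b" is kept apart from "a - b - b", which is why the approach extends to more than two values per cell.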

 

Userlevel 4
Badge +13

An alternative method would be to use a ListSorter, instead of the ListHistogrammer, and follow that up with one ListConcatenator before going to the Matcher.

Both methods worked when I tested them with your sample data; however, the results varied once I made the values more complex. You could test both methods with your actual dataset to determine which you prefer.
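
The sort-based alternative can be sketched the same way. Again, this is plain Python rather than FME, and `sorted_key` and `find_matches` are hypothetical names chosen for illustration: split each cell, sort the parts, join them back into a single string, and group rows by that string.

```python
from collections import defaultdict


def sorted_key(cell, delimiter="-"):
    """Split a cell, sort its trimmed parts, and join them into one key."""
    parts = sorted(part.strip() for part in cell.split(delimiter))
    return delimiter.join(parts)


def find_matches(cells):
    """Map each normalized key to the original cells that share it."""
    groups = defaultdict(list)
    for cell in cells:
        groups[sorted_key(cell)].append(cell)
    # Keep only keys that occur more than once.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


cells = ["256789 - 4567", "4567 - 256789", "256789 - 4567"]
print(find_matches(cells))
```

Two things to check on real data with this sketch: the sort compares the parts as text, not as numbers, and a value that itself contains the delimiter would be split into extra parts.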
