Question

How to find duplicates in a specific column?

  • April 20, 2024
  • 3 replies
  • 207 views

soly
Contributor

Hello,

I would like to find duplicates in a specific column.

For example, Column A contains:

256789 - 4567

4567 - 256789

256789 - 4567

I would like to find all duplicate values in Column A, treating a-b and b-a as the same value.

In other words, the three values above should all be considered matches.

The goal is to ignore the order of the parts within each cell.

How can I achieve this?

Thanks in advance.

FME 2021

 

 

3 replies

danilo_fme
Celebrity
  • April 20, 2024

Hello @soly 

 

You can use the DuplicateFilter transformer.

 

 

Thanks in advance,

Danilo


liamfez
Influencer
  • April 20, 2024

These are the steps I used (FME 2021)

  • AttributeSplitter on Column A using the hyphen as the delimiter (trimming whitespace)
  • ListHistogrammer with the source being the list created by the AttributeSplitter
  • 2 ListConcatenators, one concatenating the histogram counts and the other the histogram values
  • Matcher comparing both concatenated attributes (count and value)

This also works with more than 2 values in the cell.
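
For anyone who wants to check the logic outside of a workspace, here is a rough Python sketch of the same idea: split, count each part, and match on the counts. It is not FME code, and the function names (histogram_key, group_duplicates) are made up for illustration. It assumes the hyphen is the only delimiter and that whitespace around each part can be trimmed.

```python
from collections import Counter, defaultdict


def histogram_key(cell, delimiter="-"):
    """Build an order-independent key from a cell such as '256789 - 4567'.

    Hypothetical helper: splits on the delimiter, trims whitespace, and
    counts how often each part occurs (similar in spirit to a histogram).
    """
    parts = [part.strip() for part in cell.split(delimiter)]
    counts = Counter(parts)
    # Sorting the (value, count) pairs removes any dependence on order.
    return tuple(sorted(counts.items()))


def group_duplicates(cells):
    """Return lists of row indexes whose cells share the same key."""
    groups = defaultdict(list)
    for index, cell in enumerate(cells):
        groups[histogram_key(cell)].append(index)
    # Keep only keys that occur more than once, i.e. the duplicates.
    return [rows for rows in groups.values() if len(rows) > 1]


column_a = ["256789 - 4567", "4567 - 256789", "256789 - 4567"]
duplicate_groups = group_duplicates(column_a)
print(duplicate_groups)
```

With the sample data from the question, all three rows end up in one group. Because the key is built from counts, a cell with three or more parts is handled the same way.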

 


liamfez
Influencer
  • April 20, 2024

An alternative method would be to use a ListSorter, instead of the ListHistogrammer, and follow that up with one ListConcatenator before going to the Matcher.
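
As a rough illustration of this sort-then-concatenate variant, here is a minimal Python sketch. Again, this is not FME code, and sorted_key is a hypothetical name. It assumes a hyphen delimiter and sorts the parts as text, so the order is alphabetical rather than numeric; that is fine for matching as long as every cell is sorted the same way.

```python
from collections import Counter


def sorted_key(cell, delimiter="-"):
    """Hypothetical helper: split, trim, sort, and re-join the parts.

    Cells that contain the same parts in any order produce the same key.
    """
    parts = sorted(part.strip() for part in cell.split(delimiter))
    return delimiter.join(parts)


column_a = ["256789 - 4567", "4567 - 256789", "256789 - 4567"]

# One normalized key per row; matching rows share a key.
keys = [sorted_key(cell) for cell in column_a]

# Flag every row whose key occurs more than once.
key_counts = Counter(keys)
is_duplicate = [key_counts[key] > 1 for key in keys]
print(list(zip(column_a, is_duplicate)))
```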

Both methods worked when I tested them with your sample data; however, the results varied once I made the values more complex. You could test both methods against your actual dataset to determine which one you prefer.