Question

How to find duplicates in a specific column?


soly
Contributor

Hello,

I would like to find duplicates in a specific column.

For example, Column A contains:

256789 - 4567

4567 - 256789

256789 - 4567

I would like to find all duplicate values in Column A, treating values as duplicates whether they appear as a-b or b-a.

In other words, the three values above should all be considered matches.

The goal is to ignore the order of the parts within each cell.

How can I achieve this?

Thanks in advance.

FME 2021

 

 

3 replies

danilo_fme
Evangelist
  • April 20, 2024

Hello @soly 

 

You can use the DuplicateFilter transformer.

 

 

Thanks in Advance,

Danilo


liamfez
Influencer
  • April 20, 2024

These are the steps I used (FME 2021)

  • AttributeSplitter on Column A using the hyphen as the delimiter (trimming whitespace)
  • ListHistogrammer with the source being the list created by the AttributeSplitter
  • 2 ListConcatenators, concatenating both the histogram count and value
  • Matcher checking both the concatenated count and value attributes

This also works with more than 2 values in the cell.
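For anyone who wants to see the logic outside of Workbench, here is a minimal Python sketch of the same idea: split each cell on the hyphen, trim whitespace, count how often each part occurs, and match cells whose counts agree. This is not how the transformers are implemented internally; the function names `histogram_key` and `group_matches` are made up for illustration, and the sample data is taken from the question.

```python
from collections import Counter, defaultdict


def histogram_key(cell, delimiter="-"):
    """Build an order-independent key from the parts of one cell.

    Hypothetical helper: mirrors the split + histogram + concatenate idea,
    not the actual behaviour of any FME transformer.
    """
    # Split on the delimiter and trim whitespace around each part.
    parts = [part.strip() for part in cell.split(delimiter)]
    # Count occurrences of each part (the "histogram").
    counts = Counter(parts)
    # Sort the (value, count) pairs so the key ignores the original order.
    return tuple(sorted(counts.items()))


def group_matches(cells):
    """Return lists of row indexes whose cells share the same key."""
    groups = defaultdict(list)
    for index, cell in enumerate(cells):
        groups[histogram_key(cell)].append(index)
    # Keep only keys that occur more than once, i.e. the duplicates.
    return [indexes for indexes in groups.values() if len(indexes) > 1]


column_a = ["256789 - 4567", "4567 - 256789", "256789 - 4567"]
matches = group_matches(column_a)
```

With the sample data, all three rows end up in one group. Because the key is built from counts, it also handles cells with more than two parts, including repeated parts.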

 


liamfez
Influencer
  • April 20, 2024

An alternative method would be to use a ListSorter, instead of the ListHistogrammer, and follow that up with one ListConcatenator before going to the Matcher.

Both methods worked when testing with your sample data; however, results varied once I made the values more complex. You could test both methods with your actual dataset to determine which you prefer.
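The sort-based alternative can be sketched in Python as well: split the cell, sort the parts, join them back into one string, and compare those strings. Again, this only illustrates the logic and is not what the transformers do internally; `sorted_key` and `flag_matched` are hypothetical names.

```python
from collections import Counter


def sorted_key(cell, delimiter="-"):
    """Normalise a cell by sorting its trimmed parts (hypothetical helper)."""
    # Parts are compared as strings, so the sort is lexical, not numeric.
    parts = sorted(part.strip() for part in cell.split(delimiter))
    return delimiter.join(parts)


def flag_matched(cells):
    """Return one boolean per cell: True if its key occurs more than once."""
    keys = [sorted_key(cell) for cell in cells]
    counts = Counter(keys)
    return [counts[key] > 1 for key in keys]


column_a = ["256789 - 4567", "4567 - 256789", "1 - 2"]
flags = flag_matched(column_a)
```

One thing to check with real data: if a value can itself contain the delimiter (a negative number or a hyphenated code, for example), the split produces extra parts and the keys may no longer line up as expected.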

