Question

How to remove the repeated words in FME

  • 4 September 2024
  • 5 replies
  • 77 views

Hi All,

Could anyone help me with this specific question in FME? I want to remove the repeated words in each row. For example, in a row containing A,B,C,A,B, A and B each appear twice, and I would like to remove only the repeated A and B. Please see the attached sample data. Many thanks.

 

5 replies

Badge +41

One way to do this:

  • Create a unique ID for each row. (Counter)
  • Create a list for all PlayName elements. (AttributeSplitter)
  • Explode list to features (ListExploder)
  • Clean up unique values per ID (Sampler, group by ID and the exploded value, so that one feature is kept per distinct value in each row).
  • Merge all rows back into one (Aggregator, group by ID, merge attributes)
  • Merge rows back to originals. (FeatureMerger, merge by ID)
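
For readers who want to see the logic of these steps outside FME, here is a minimal plain-Python sketch. It is not an FME workspace; the function name `dedupe_rows_by_explode` and the comma delimiter are assumptions made for illustration.

```python
from collections import defaultdict


def dedupe_rows_by_explode(rows, delimiter=","):
    """Mimic the Counter / split / explode / dedupe / aggregate flow."""
    # Counter + AttributeSplitter + ListExploder:
    # one (row_id, value) record per element of each row.
    exploded = [
        (row_id, value.strip())
        for row_id, row in enumerate(rows)
        for value in row.split(delimiter)
    ]

    # Keep only the first record for each (row_id, value) pair.
    seen = set()
    unique = []
    for record in exploded:
        if record not in seen:
            seen.add(record)
            unique.append(record)

    # Aggregator grouped by ID: collect the surviving values per row.
    grouped = defaultdict(list)
    for row_id, value in unique:
        grouped[row_id].append(value)

    # Merge back to the original rows by ID.
    return [delimiter.join(grouped[row_id]) for row_id in range(len(rows))]


print(dedupe_rows_by_explode(["A,B,C,A,B", "X,X,Y"]))
```

Note that the duplicate check is keyed on both the row ID and the value, so the same word may still appear in different rows.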
Userlevel 5
Badge +36

Hi @chaoluo 

 

You can use this logic:

 

 

Userlevel 5
Badge +40

Use a list and some list manipulation transformers:

  • Use an AttributeSplitter to create a list from the PlayName elements
  • Use a ListDuplicateRemover to remove the duplicate elements from the list
  • Use a ListConcatenator to fill an attribute with the remaining elements from the list
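
The same split, de-duplicate, and concatenate idea can be sketched in a few lines of plain Python. This is only an illustration of the logic, not what the transformers do internally; the function name `remove_repeated_words` and the comma delimiter are assumptions.

```python
def remove_repeated_words(play_name, delimiter=","):
    """Split a delimited string, drop repeats, and join it again."""
    # Split into elements (the AttributeSplitter step).
    parts = [part.strip() for part in play_name.split(delimiter)]
    # dict.fromkeys keeps the first occurrence of each element and
    # preserves insertion order (the duplicate-removal step).
    unique_parts = list(dict.fromkeys(parts))
    # Join the remaining elements (the ListConcatenator step).
    return delimiter.join(unique_parts)


print(remove_repeated_words("A,B,C,A,B"))
```

If scripting is preferred over list transformers, logic like this could probably be adapted for a PythonCaller, though that is not what this reply describes.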

 

Userlevel 4
Badge +12

@chaoluo The method I would use is essentially identical to those proposed by @nielsgerrits, @danilo_fme and @geomancer.

The only difference between the methods is whether to use ListExploder + DuplicateFilter after the AttributeSplitter, or instead just use ListDuplicateRemover after the AttributeSplitter.

I’ve used both approaches, and which one is better depends on how large the lists become and over how many features. ListExploder + DuplicateFilter, despite needing more transformers, can execute faster: ListDuplicateRemover can be slow to traverse all the list attributes, and it has the overhead of renumbering whatever list elements remain after the duplicates are removed.

So for a small to medium number of features without many duplicates, the ListDuplicateRemover approach above works fine as a general approach. If it performs slowly, try the ListExploder + DuplicateFilter method instead.

 

The only extra tip is to consider adding a Sorter before the Aggregator (@nielsgerrits' method) or a ListSorter before the ListConcatenator (@danilo_fme, @geomancer) so that the values in the final comma-delimited output are sorted alphabetically. I do this a lot to reduce the amount of random ordering that flows into, say, a ChangeDetector. If I didn’t sort the list before comma-delimiting it, the ChangeDetector would keep flagging a record as “changed” whenever the next run of the workspace produced the same text strings in a slightly different order. The database write would then perform an excess number of updates purely because of the sometimes random order of values in comma-delimited fields.
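
To illustrate why sorting before concatenation helps with change detection, here is a small plain-Python sketch. The function name `sorted_unique` and the comma delimiter are assumptions; it only demonstrates the idea and is not FME code.

```python
def sorted_unique(value, delimiter=","):
    """Drop repeats and return the elements in alphabetical order."""
    # A set removes repeats; sorted() then gives a deterministic order.
    parts = {part.strip() for part in value.split(delimiter)}
    return delimiter.join(sorted(parts))


# Two runs that deliver the same values in a different order...
run_1 = sorted_unique("C,A,B,A")
run_2 = sorted_unique("B,C,A")

# ...produce the same string, so a plain text comparison sees no change.
print(run_1 == run_2)
```

Without the sort, the two runs would yield different strings for the same set of values, which is the kind of false "change" described above.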

 

Badge +8

@chaoluo, this workspace should work.

It converts to lowercase while checking whether it's combined in the source, and it also checks the commonly used delimiters. However, it does not let you choose whether to keep, for example, Jurassic or jurassic if both are in the string, so that needs to be tweaked in the flow or changed in the source.
But this is something to start with.
I have no idea what HPNT or NPNT is, so this combination in this example is probably wrong :)
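
As a rough illustration of the case-insensitive comparison described here, the following plain-Python sketch compares lowercased values and splits on several delimiters. The delimiter set `,;/|`, the function name, and the choice to keep the first spelling encountered are all assumptions; the workspace itself may behave differently.

```python
import re


def dedupe_case_insensitive(text, delimiters=",;/|", joiner=","):
    """Remove repeats, ignoring case, across several delimiters."""
    # Build a character class from the (assumed) delimiter set.
    pattern = "[" + re.escape(delimiters) + "]"
    parts = [part.strip() for part in re.split(pattern, text)]

    seen = set()
    kept = []
    for part in parts:
        key = part.lower()  # compare in lowercase only
        if part and key not in seen:
            seen.add(key)
            kept.append(part)  # keep the first spelling encountered
    return joiner.join(kept)


print(dedupe_case_insensitive("Jurassic; jurassic, Triassic/Jurassic"))
```

Keeping the first spelling is just one possible rule; preferring the capitalised form, for instance, would need an extra step.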


