Solved

How to extract distinct values?

  • 9 April 2024
  • 4 replies
  • 85 views

Badge +1

Hi

I need to extract the distinct attribute values of each field/column. An example is provided below:

 

Input

 

   Owner Building_Type Building_ Age

Row 1:  Steve Office 1970-79

Row 2: Adam Office 1980-89

Row 3: Steve House 2001-5

 

Output:

   Owner Building_Type Building_ Age

Row 1:  Steve Office 1970-79

Row 2: Adam House 1980-89

Row 3: NULL NULL   2001-5

 

Any help will be appreciated.

 

icon

Best answer by jkr_wrk 16 April 2024, 16:47

View original

4 replies

Userlevel 3
Badge +19

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port 

A StatisticsCalculator with group by and using the string with a Min, should find each unique value and has the benefit of allowing you to also add count statistic to know how many times that values is in your dataset

I’m sure there are many other ways, but read up on group based processing

Badge +1

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port 

A StatisticsCalculator with group by and using the string with a Min, should find each unique value and has the benefit of allowing you to also add count statistic to know how many times that values is in your dataset

I’m sure there are many other ways, but read up on group based processing

I have tried something similar but this does this per field whereas I want every field. I can do this for every field but I feel like that is quite a lot of transfromers. 

Userlevel 4
Badge +18

If you need all distinct values for all attributes. First do an attributeExploder.

 

Be aware that FME shows attributes that are hidden after the attributeExploder like: fme_geometry,fme_type,fme_feature_type. There is no way to remove them all before the attributeExploder so you should test (filter) them out after the exploder.

 

Use an Aggregator (attributesOnly) with groupBy attr_name,attr_value as mentioned before. Or…
Use a Samples with First 1 Features and a groupBy on attr_name,attr_value. This is a non-blocking transformer unlike the Aggregator. Or use a DuplicateRemover.

 

Badge +1

I have solved this using this answer for inspiration 

 

  1. Attribute Exploder
  2. Duplicate Remover on both _attr_name and _attr_value 
  3. Counter grouped by _attr_name - so that the each instance of a given attribute for a given field has a different number.
  4. AttributeCreator with @Value(_attr_name):@Value(_attr_value). This generates a field for each attribute for each field.
  5. Aggregator grouped by _count with merge incoming attributes. 
  6. Sorter (Optional)
  7. Writer (schema from reader)

Reply