Skip to main content
Solved

How to extract distinct values?

  • April 9, 2024
  • 8 replies
  • 1491 views

me.aelmo
Contributor
Forum|alt.badge.img+1

Hi

I need to extract the distinct attribute values of each field/column. An example is provided below:

 

Input

 

   Owner Building_Type Building_ Age

Row 1:  Steve Office 1970-79

Row 2: Adam Office 1980-89

Row 3: Steve House 2001-5

 

Output:

   Owner Building_Type Building_ Age

Row 1:  Steve Office 1970-79

Row 2: Adam House 1980-89

Row 3: NULL NULL   2001-5

 

Any help will be appreciated.

 

Best answer by jkr_wrk

If you need all distinct values for all attributes. First do an attributeExploder.

 

Be aware that FME shows attributes that are hidden after the attributeExploder like: fme_geometry,fme_type,fme_feature_type. There is no way to remove them all before the attributeExploder so you should test (filter) them out after the exploder.

 

Use an Aggregator (attributesOnly) with groupBy attr_name,attr_value as mentioned before. Or…
Use a Samples with First 1 Features and a groupBy on attr_name,attr_value. This is a non-blocking transformer unlike the Aggregator. Or use a DuplicateRemover.

 

View original
Did this help you find an answer to your question?

8 replies

j.botterill
Influencer
Forum|alt.badge.img+34
  • Influencer
  • April 10, 2024

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port 

A StatisticsCalculator with group by and using the string with a Min, should find each unique value and has the benefit of allowing you to also add count statistic to know how many times that values is in your dataset

I’m sure there are many other ways, but read up on group based processing


me.aelmo
Contributor
Forum|alt.badge.img+1
  • Author
  • Contributor
  • April 15, 2024
j.botterill wrote:

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port 

A StatisticsCalculator with group by and using the string with a Min, should find each unique value and has the benefit of allowing you to also add count statistic to know how many times that values is in your dataset

I’m sure there are many other ways, but read up on group based processing

I have tried something similar but this does this per field whereas I want every field. I can do this for every field but I feel like that is quite a lot of transfromers. 


jkr_wrk
Influencer
Forum|alt.badge.img+28
  • Best Answer
  • April 16, 2024

If you need all distinct values for all attributes. First do an attributeExploder.

 

Be aware that FME shows attributes that are hidden after the attributeExploder like: fme_geometry,fme_type,fme_feature_type. There is no way to remove them all before the attributeExploder so you should test (filter) them out after the exploder.

 

Use an Aggregator (attributesOnly) with groupBy attr_name,attr_value as mentioned before. Or…
Use a Samples with First 1 Features and a groupBy on attr_name,attr_value. This is a non-blocking transformer unlike the Aggregator. Or use a DuplicateRemover.

 


me.aelmo
Contributor
Forum|alt.badge.img+1
  • Author
  • Contributor
  • April 17, 2024

I have solved this using this answer for inspiration 

 

  1. Attribute Exploder
  2. Duplicate Remover on both _attr_name and _attr_value 
  3. Counter grouped by _attr_name - so that the each instance of a given attribute for a given field has a different number.
  4. AttributeCreator with @Value(_attr_name):@Value(_attr_value). This generates a field for each attribute for each field.
  5. Aggregator grouped by _count with merge incoming attributes. 
  6. Sorter (Optional)
  7. Writer (schema from reader)

j.m-visma
Contributor
Forum|alt.badge.img+4
  • Contributor
  • January 14, 2025

 

@jkr_wrk You can also exclude unexposed attributes like fme_geometry in the AttributeExploder.

This exploder ignores the usually unexposed fme_ attributes

Since the AttributeExploder can be really slow I try to minimize the number of attributes whenever I use it. The Ignore Attributes Containing can really cut that down. You can use several searching strings separated by a |.


jkr_wrk
Influencer
Forum|alt.badge.img+28
  • January 14, 2025

@j.m-visma 

Nice new feature but not available to me:

 

 

Btw. I hope this works:

^fme_|^multi_

I would like to be sure the attribute starts with fme and it's not somewhere in the middle.

 


j.m-visma
Contributor
Forum|alt.badge.img+4
  • Contributor
  • January 14, 2025

@jkr_wrk Thank you, I just tested what adding ^does and it works for me. With the ^, the ignore function only selects attributes whose names begin with the searched values. That will make my workspaces a little more reliable. 

 

How did you know to try and do that? Does that mean the ignore logic uses regex?

 

BTW I don't know when this functionality was introduced but it was available in the 2023 editions I used as well.


jkr_wrk
Influencer
Forum|alt.badge.img+28
  • January 15, 2025

It was a wild guess that it would be regex based.

From the documentation:

 

Enter a regular expression, and matching attributes will be ignored.

For example, if the source data is CSV, you could use the regular expression ^fme_|^multi_|^csv_ to ignore any attributes starting with fme_, multi_, or csv_.

 

So for both of us… Read the #@ manual. 😉


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings