Solved

How to extract distinct values?

1 year ago
April 9, 2024
8 replies
2023 views

me.aelmo
Contributor
3 replies

I need to extract the distinct attribute values of each field/column. An example is provided below:

Input

Owner Building_Type Building_ Age

Row 1: Steve Office 1970-79

Row 2: Adam Office 1980-89

Row 3: Steve House 2001-5

Output:

Owner Building_Type Building_ Age

Row 1: Steve Office 1970-79

Row 2: Adam House 1980-89

Row 3: NULL NULL 2001-5

Any help will be appreciated.

Best answer by jkr_wrk

If you need all distinct values for all attributes. First do an attributeExploder.

Be aware that FME shows attributes that are hidden after the attributeExploder like: fme_geometry,fme_type,fme_feature_type. There is no way to remove them all before the attributeExploder so you should test (filter) them out after the exploder.

Use an Aggregator (attributesOnly) with groupBy attr_name,attr_value as mentioned before. Or…
Use a Samples with First 1 Features and a groupBy on attr_name,attr_value. This is a non-blocking transformer unlike the Aggregator. Or use a DuplicateRemover.

View original

Did this help you find an answer to your question?

+40

j.botterill
Influencer
326 replies
1 year ago
April 10, 2024

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port

A StatisticsCalculator with group by and using the string with a Min, should find each unique value and has the benefit of allowing you to also add count statistic to know how many times that values is in your dataset

I’m sure there are many other ways, but read up on group based processing

me.aelmo
Author
Contributor
3 replies
1 year ago
April 15, 2024

j.botterill wrote:

There are likely a number of ways to achieve this unique value

Aggregator, apecifying the column of interest as a group by

Add an AtrributeFilter and run with the feature cache turned on. This use the cache to populate the filter ports to sort features into each output port

I’m sure there are many other ways, but read up on group based processing

I have tried something similar but this does this per field whereas I want every field. I can do this for every field but I feel like that is quite a lot of transfromers.

+29

jkr_wrk
381 replies
Best Answer
1 year ago
April 16, 2024

If you need all distinct values for all attributes. First do an attributeExploder.

💡 Did you know... The FeatureWriter is more intuitive. 😏

me.aelmo
Author
Contributor
3 replies
1 year ago
April 17, 2024

I have solved this using this answer for inspiration

Keeping unique values from multiple Excel columns

Attribute Exploder
Duplicate Remover on both _attr_name and _attr_value
Counter grouped by _attr_name - so that the each instance of a given attribute for a given field has a different number.
AttributeCreator with @Value(_attr_name):@Value(_attr_value). This generates a field for each attribute for each field.
Aggregator grouped by _count with merge incoming attributes.
Sorter (Optional)
Writer (schema from reader)

j.m-visma
Contributor
12 replies
5 months ago
January 14, 2025

@jkr_wrk You can also exclude unexposed attributes like fme_geometry in the AttributeExploder.

This exploder ignores the usually unexposed fme_ attributes

Since the AttributeExploder can be really slow I try to minimize the number of attributes whenever I use it. The Ignore Attributes Containing can really cut that down. You can use several searching strings separated by a |.

+29

jkr_wrk
381 replies
5 months ago
January 14, 2025

@j.m-visma

Nice new feature but not available to me:

Btw. I hope this works:

^fme_|^multi_

I would like to be sure the attribute starts with fme and it's not somewhere in the middle.

💡 Did you know... The FeatureWriter is more intuitive. 😏

j.m-visma
Contributor
12 replies
5 months ago
January 14, 2025

@jkr_wrk Thank you, I just tested what adding ^does and it works for me. With the ^, the ignore function only selects attributes whose names begin with the searched values. That will make my workspaces a little more reliable.

How did you know to try and do that? Does that mean the ignore logic uses regex?

BTW I don't know when this functionality was introduced but it was available in the 2023 editions I used as well.

+29

jkr_wrk
381 replies
5 months ago
January 15, 2025

It was a wild guess that it would be regex based.

From the documentation:

Enter a regular expression, and matching attributes will be ignored.
For example, if the source data is CSV, you could use the regular expression ^fme_|^multi_|^csv_ to ignore any attributes starting with fme_, multi_, or csv_.

So for both of us… Read the #@ manual. 😉

💡 Did you know... The FeatureWriter is more intuitive. 😏

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

How to extract distinct values?