Skip to main content
Solved

How do I count distinct values of an attribute with identical values in another attribute?


pflegpet
Contributor
Forum|alt.badge.img+8

I'm almost too embarrassed to ask this question, because it seems like something that could be done with one or two statistic calculators, but I just can't find the right solution. Anyway, here is what I want to do:

 

 

I have a table with attribute 1 and attribute 2. I want to calculate the count of unique values of attribute 1 having the same attribute 2. Something like this:

With statistics calculator I can count the number of all belonging to attribute 2, even all with the same attribute 1 but not only the distinct ones.

I would very much appreciate any help with this very simple problem. Thank you!

Best answer by erik_jan

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

View original
Did this help you find an answer to your question?

10 replies

erik_jan
Contributor
Forum|alt.badge.img+17
  • Contributor
  • Best Answer
  • October 4, 2017

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.


dustin
Influencer
Forum|alt.badge.img+30
  • Influencer
  • October 4, 2017

I believe the Matcher transformer will work for you. Geometry will be 'none', Attributes to match will be your attribute1 and attribute2, and set the 'Match count attribute' as result.

EDIT: To expand on my answer with @erik_jan mentioning of FeatureMerger, here is the workflow I would use.


erik_jan
Contributor
Forum|alt.badge.img+17
  • Contributor
  • October 4, 2017

If you are into SQL you could also use the InlineQuerier with a query like:

Select att1, att2, count(distinct att1) as result

From <feature type> -- fill in the name

Group by att2;


pflegpet
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • October 4, 2017
dustin wrote:

I believe the Matcher transformer will work for you. Geometry will be 'none', Attributes to match will be your attribute1 and attribute2, and set the 'Match count attribute' as result.

EDIT: To expand on my answer with @erik_jan mentioning of FeatureMerger, here is the workflow I would use.

I think that will not work as the unique combinations like B-3 and C-3 don't count as a match.

 

 


pflegpet
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • October 4, 2017
erik_jan wrote:

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

It happens I just found a similar solution to yours using a ListBuilder, ListDuplicateRemover and a ListElementCounter. However it would be great if I could avoid the FeatureMerger as it slows things down quite a bit.

 

 


nathanatsafe
Safer
Forum|alt.badge.img+7

Hi @kasparlov, thanks for your question!

 

In line with @erik_jan's suggestion, I used an InlineQuerier and FeatureMerger to produce your desired result. It should be scalable with more data. The InlineQuerier counts the distinct results or attribute1 against attribute2. We need to also retain our original attribute2 values so we can merge our SQL results with our original in a FeatureMerger. Hope this helps!

sqlcountdistinct.fmw


pflegpet
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • October 4, 2017
erik_jan wrote:

If you are into SQL you could also use the InlineQuerier with a query like:

Select att1, att2, count(distinct att1) as result

From <feature type> -- fill in the name

Group by att2;

I just started learning SQL. Could be a good way to practice :) Thank you all for your great suggestions.

 

 


dustin
Influencer
Forum|alt.badge.img+30
  • Influencer
  • October 4, 2017
pflegpet wrote:
I think that will not work as the unique combinations like B-3 and C-3 don't count as a match.

 

 

@kasparlov Ahh, I miss-read your intentions. You are correct, this won't work.

 


takashi
Influencer
  • October 4, 2017
erik_jan wrote:

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

List operation could be the bottle neck on the performance in this case, rather than the FeatureMerger. As @erik_jan suggested, you can use the StatisticsCalculator in conjunction with a DuplicateFilter, instead of the list operation. Other than StatisticsCalculator, the Aggregator or the Mathcer can also be used. How about comparing their performance?

 


pflegpet
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • October 5, 2017
erik_jan wrote:

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

I went with the DuplicateFilter, StatisticsCalculator, FeatureMerger combination. Took about 5 seconds for 40 000 records.

 

 


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings