Skip to main content

I'm almost too embarrassed to ask this question, because it seems like something that could be done with one or two statistic calculators, but I just can't find the right solution. Anyway, here is what I want to do:

 

 

I have a table with attribute 1 and attribute 2. I want to calculate the count of unique values of attribute 1 having the same attribute 2. Something like this:

With statistics calculator I can count the number of all belonging to attribute 2, even all with the same attribute 1 but not only the distinct ones.

I would very much appreciate any help with this very simple problem. Thank you!

You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.


I believe the Matcher transformer will work for you. Geometry will be 'none', Attributes to match will be your attribute1 and attribute2, and set the 'Match count attribute' as result.

EDIT: To expand on my answer with @erik_jan mentioning of FeatureMerger, here is the workflow I would use.


If you are into SQL you could also use the InlineQuerier with a query like:

Select att1, att2, count(distinct att1) as result

From <feature type> -- fill in the name

Group by att2;


I believe the Matcher transformer will work for you. Geometry will be 'none', Attributes to match will be your attribute1 and attribute2, and set the 'Match count attribute' as result.

EDIT: To expand on my answer with @erik_jan mentioning of FeatureMerger, here is the workflow I would use.

I think that will not work as the unique combinations like B-3 and C-3 don't count as a match.

 

 


You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

It happens I just found a similar solution to yours using a ListBuilder, ListDuplicateRemover and a ListElementCounter. However it would be great if I could avoid the FeatureMerger as it slows things down quite a bit.

 

 


Hi @kasparlov, thanks for your question!

 

In line with @erik_jan's suggestion, I used an InlineQuerier and FeatureMerger to produce your desired result. It should be scalable with more data. The InlineQuerier counts the distinct results or attribute1 against attribute2. We need to also retain our original attribute2 values so we can merge our SQL results with our original in a FeatureMerger. Hope this helps!

sqlcountdistinct.fmw


If you are into SQL you could also use the InlineQuerier with a query like:

Select att1, att2, count(distinct att1) as result

From <feature type> -- fill in the name

Group by att2;

I just started learning SQL. Could be a good way to practice 🙂 Thank you all for your great suggestions.

 

 


I think that will not work as the unique combinations like B-3 and C-3 don't count as a match.

 

 

@kasparlov Ahh, I miss-read your intentions. You are correct, this won't work.

 


You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

List operation could be the bottle neck on the performance in this case, rather than the FeatureMerger. As @erik_jan suggested, you can use the StatisticsCalculator in conjunction with a DuplicateFilter, instead of the list operation. Other than StatisticsCalculator, the Aggregator or the Mathcer can also be used. How about comparing their performance?

 


You could use the StatisticsCalculator group by attribute 2 as you mentioned. But first use the DuplicateFilter on both attributes. After the StatisticsCalculater you can merge the caculated result on the original records by using the FeatureMerger.

I went with the DuplicateFilter, StatisticsCalculator, FeatureMerger combination. Took about 5 seconds for 40 000 records.

 

 


Reply