Question

Quick alternative to StatisticsCalculator?


ld
Participant

I'm calling 6 separate StatisticsCalculator transformers, and they each take a (relatively) long time to run. All I'm calculating each time is a min, max and count attribute so this seems like overkill.

Is there an easy alternative I'm missing? Is there a way to use a math function against the entire dataset for example?

15 replies

erik_jan
Contributor
  • May 16, 2018

Have a look at the InlineQuerier.

A SQL statement might be easier and quicker.
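As a rough illustration of the kind of SQL statement meant here, the sketch below runs one aggregate query using Python's built-in sqlite3 module. The table name `features`, the column `my_value`, and the sample values are all made up for the example, and this is plain SQLite rather than the InlineQuerier itself, so treat it only as a picture of the query shape.

```python
import sqlite3

# Hypothetical sample data standing in for one attribute of the input features.
values = [12.5, 3.0, 47.25, 8.0, 19.75]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (my_value REAL)")
conn.executemany("INSERT INTO features (my_value) VALUES (?)", [(v,) for v in values])

# A single SELECT returns min, max and count together.
row = conn.execute(
    "SELECT MIN(my_value), MAX(my_value), COUNT(*) FROM features"
).fetchone()
my_min, my_max, my_count = row
conn.close()

print(my_min, my_max, my_count)
```

The point is that all three statistics come back from one statement, instead of one transformer per statistic.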


jdh
Contributor
  • May 16, 2018

Depending on your data, it might also be quicker to sort on your attribute and sample the first and/or last feature for the min/max.

The quickest way I've found to get a total count attribute is to use the Aggregator (with List) with a count attribute, followed by a Deaggregator to restore the original features.


david_r
Evangelist
  • May 17, 2018

If your data source is a SQL database, I think it would be hard to beat the performance of a SQLExecutor with something like:

select 
  max(my_value) as my_max, 
  min(my_value) as my_min, 
  sum(my_value) as my_sum 
from 
  my_table
where
  ...


jdh
Contributor
  • May 17, 2018
david_r wrote:

If your data source is a SQL database, I think it would be hard to beat the performance of a SQLExecutor with something like:

select 
  max(my_value) as my_max, 
  min(my_value) as my_min, 
  sum(my_value) as my_sum 
from 
  my_table
where
  ...

Agreed. However, if the source data is not a database and is large enough that the StatisticsCalculator takes a long time to run, I wonder about the overhead of generating the SQLite database in the InlineQuerier.

 

 


jdh
Contributor
  • May 17, 2018
jdh wrote:

Depending on your data, it might also be quicker to sort on your attribute and sample the first and/or last feature for the min/max.

The quickest way I've found to get a total count attribute is to use the Aggregator (with List) with a count attribute, followed by a Deaggregator to restore the original features.

I changed my mind: the quickest way to get a total feature count for medium-sized datasets is via Python (10% more efficient than the Aggregator/Deaggregator approach with 10,000 features).

 

 

class FeatureProcessor(object):
    def __init__(self):
        self.Total = 0        # running count of features seen so far
        self.Features = []    # hold every feature until the total is known

    def input(self, feature):
        self.Total += 1
        self.Features.append(feature)

    def close(self):
        # All features have arrived: stamp the total on each one and output it.
        for feature in self.Features:
            feature.setAttribute("TotalCount", self.Total)
            self.pyoutput(feature)

I haven't done any benchmarking for large or complex datasets (lots of attributes, heavy geometry).

fmelizard
Contributor
  • May 18, 2018
jdh wrote:
Agreed. However, if the source data is not a database and is large enough that the StatisticsCalculator takes a long time to run, I wonder about the overhead of generating the SQLite database in the InlineQuerier.

 

 

I suspect that if you have a large amount of data, the InlineQuerier overhead may not be too bad compared to running 6 StatisticsCalculators one after the other. When you set up the InlineQuerier, be sure to define only the columns you want to select against, which should speed things up even more.

 

 


takashi
Supporter
  • May 18, 2018

Hi @ld, before thinking of an alternative, I'd like to know why you need to use 6 StatisticsCalculators. If you intend to perform statistics calculation simultaneously on 6 attributes in the same feature type, just a single StatisticsCalculator would be enough.


ld
Participant
  • Author
  • May 18, 2018
takashi wrote:

Hi @ld, before thinking of an alternative, I'd like to know why you need to use 6 StatisticsCalculators. If you intend to perform statistics calculation simultaneously on 6 attributes in the same feature type, just a single StatisticsCalculator would be enough.

Hi, they're all operating on separate datasets at different stages of the workbench. Believe me, I would put them all through the same transformer if that were an option!

 

 


ld
Participant
  • Author
  • May 18, 2018
david_r wrote:

If your data source is a SQL database, I think it would be hard to beat the performance of a SQLExecutor with something like:

select 
  max(my_value) as my_max, 
  min(my_value) as my_min, 
  sum(my_value) as my_sum 
from 
  my_table
where
  ...

Hi, the input data is a .dwg and an XML.

 

 


ld
Participant
  • Author
  • May 18, 2018
jdh wrote:
I changed my mind: the quickest way to get a total feature count for medium-sized datasets is via Python (10% more efficient than the Aggregator/Deaggregator approach with 10,000 features).

 

 

class FeatureProcessor(object):
    def __init__(self):
        self.Total = 0
        self.Features=[]
    def input(self,feature):
        self.Total+=1
        self.Features.append(feature)
        
    def close(self):
        for feature in self.Features:
            feature.setAttribute("TotalCount", self.Total)
            self.pyoutput(feature)
I haven't done any benchmarking for large or complex datasets (lots of attributes, heavy geometry)

 

Thank you, why didn't I think of Python? I will try this.

 


takashi
Supporter
  • May 18, 2018
takashi wrote:

Hi @ld, before thinking of an alternative, I'd like to know why you need to use 6 StatisticsCalculators. If you intend to perform statistics calculation simultaneously on 6 attributes in the same feature type, just a single StatisticsCalculator would be enough.

I got it. As other experts suggested, the InlineQuerier or the PythonCaller would be a good solution.

 

Here is another way. If you are using FME 2018.0, you can do this efficiently with the Sampler (sampling the last 1 feature). This custom transformer implementation might be useful.

 

quick-statisticscalculator.fmw (FME 2018.0.0.3)

 

 

 


takashi
Supporter
  • May 18, 2018
takashi wrote:

Hi @ld, before thinking of an alternative, I'd like to know why you need to use 6 StatisticsCalculators. If you intend to perform statistics calculation simultaneously on 6 attributes in the same feature type, just a single StatisticsCalculator would be enough.

Updated to add the source attribute name as prefix to the resulting attribute names.

 

quick-statisticscalculator-2.fmw (FME 2018.0.0.3)

 

 


ld
Participant
  • Author
  • May 18, 2018
takashi wrote:
I got it. As other experts suggested, the InlineQuerier or the PythonCaller would be a good solution.

 

Here is another way. If you are using FME 2018.0, you can do this efficiently with the Sampler (sampling the last 1 feature). This custom transformer implementation might be useful.

 

quick-statisticscalculator.fmw (FME 2018.0.0.3)

 

 

 

Thanks @takashi, I'm stuck with 2017.1 for a little while longer but will look at this when our organisation updates to 2018.

 


jdh
Contributor
  • May 18, 2018
ld wrote:

 

Thank you, why didn't I think of Python? I will try this.

 

It should be fairly simple to adapt this to also track the min and max values for any number of attributes.

 

 

The above code assumes that you want the values added to every feature. If you just want a summary feature instead, you don't need to build the feature list and can output a single feature in the close method. This will be a lot more efficient, since the features do not all need to be stored in memory.
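A minimal sketch of the summary-only variant, tracking min, max and count for several attributes in one pass. It is written as plain Python so it can be run outside FME: `FakeFeature` is a hypothetical stand-in for an FME feature, the attribute names are made up, the `emit` callback stands in for `self.pyoutput`, and converting values with `float()` is an assumption about the data. Inside a PythonCaller you would instead write the results onto a new feature in `close` and output that.

```python
class SummaryStats(object):
    """Tracks min, max and count per attribute without storing features."""

    def __init__(self, attr_names, emit):
        self.attr_names = attr_names
        self.emit = emit      # stand-in for self.pyoutput in a PythonCaller
        self.stats = {}       # attribute name -> [min, max, count]

    def input(self, feature):
        for name in self.attr_names:
            value = feature.getAttribute(name)
            if value is None:         # attribute missing on this feature
                continue
            value = float(value)      # assumes numeric (or numeric-string) values
            entry = self.stats.get(name)
            if entry is None:
                self.stats[name] = [value, value, 1]
            else:
                entry[0] = min(entry[0], value)
                entry[1] = max(entry[1], value)
                entry[2] += 1

    def close(self):
        # Emit one summary record instead of re-emitting every input feature.
        summary = {}
        for name, (lo, hi, count) in self.stats.items():
            summary[name + ".min"] = lo
            summary[name + ".max"] = hi
            summary[name + ".count"] = count
        self.emit(summary)


class FakeFeature(object):
    """Hypothetical stand-in exposing only getAttribute, for use outside FME."""

    def __init__(self, **attrs):
        self.attrs = attrs

    def getAttribute(self, name):
        return self.attrs.get(name)


results = []
proc = SummaryStats(["height", "width"], results.append)
for f in [FakeFeature(height=2, width=10), FakeFeature(height=5), FakeFeature(height=1, width=4)]:
    proc.input(f)
proc.close()
print(results[0])
```

Memory use here depends on the number of attributes tracked, not the number of features, which is the efficiency gain described above.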

 

 


jdh
Contributor
  • May 18, 2018
takashi wrote:
I got it. As other experts suggested, the InlineQuerier or the PythonCaller would be a good solution.

 

Here is another way. If you are using FME 2018.0, you can do this efficiently with the Sampler (sampling the last 1 feature). This custom transformer implementation might be useful.

 

quick-statisticscalculator.fmw (FME 2018.0.0.3)

 

 

 

That's a nice, efficient way of getting a summary feature.

 

 

