Solved

Statistics inside a python caller

  • 13 November 2018
  • 9 replies
  • 9 views

Badge +1

Hello,

 

I would like to read all attributes for some specific fields and then compute some statistics.

 

I know that it is possible with multiple statistics calculators and feature mergers.

How can I do this inside a python caller?

Thanks

Best answer by takashi 14 November 2018, 16:17

9 replies

Userlevel 2
Badge +17

The StatisticsCalculator supports calculating statistics on multiple attributes at once. Try adding all the attributes to the Attributes to Analyze parameter of a single StatisticsCalculator.

Userlevel 4
Badge +30

Hi @arthy

If you look at the custom transformer ListStatisticsCalculator (https://hub.safe.com/transformers/liststatisticscalculator), you will see that it contains a PythonCaller that uses a Python script to compute statistics on list attributes.

You could check this script to get an idea.

 

* @takashi thanks for this amazing and interesting custom transformer

 

Thanks,

Danilo

Userlevel 4

I agree with the others that it's easier to use e.g. the StatisticsCalculator if it can do what you need. However there are statistical operations that you may have to implement yourself, e.g. using Python. 

My recommendation would be to use the class interface. For example, here's how to calculate a running total of all the attribute values of my_number, then output the total as the_total_is in a separate feature at the end:

import fmeobjects

class CalculateStatistics(object):
    def __init__(self):
        # This method is called once before first feature enters the PythonCaller
        self.running_total = 0

    def input(self, feature):
        # Called once for every feature
        self.running_total += int(feature.getAttribute('my_number') or 0)
        self.pyoutput(feature)

    def close(self):
        # This method is called once after the last feature has exited the PythonCaller
        sum_feature = fmeobjects.FMEFeature()
        sum_feature.setAttribute('the_total_is', self.running_total)
        self.pyoutput(sum_feature)

I would also recommend reading the documentation for the PythonCaller, where much of this is explained in detail: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/pythoncaller.htm
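
The running total above extends naturally to other statistics that can be updated one value at a time. The following is a minimal sketch, not FME API code: `RunningStats` is a hypothetical helper class, and plain Python values stand in for what `feature.getAttribute('my_number')` would return, so it can be tried outside FME.

```python
class RunningStats(object):
    """Accumulates count, sum, min and max one value at a time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def add(self, value):
        # Skip missing values instead of counting them as 0
        if value is None or value == '':
            return
        value = float(value)
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def summary(self):
        # The mean is undefined when no values were seen
        mean = self.total / self.count if self.count else None
        return {'count': self.count, 'sum': self.total,
                'min': self.minimum, 'max': self.maximum, 'mean': mean}


# Stand-ins for attribute values read from four features (assumed sample data)
stats = RunningStats()
for raw in ['3', 5, None, '10']:
    stats.add(raw)
result = stats.summary()
```

Inside a PythonCaller, one would presumably create the helper in `__init__`, call `add()` from `input()`, and copy the entries of `summary()` onto a new feature with `setAttribute()` in `close()`, as in the running-total example.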

Badge +1

Hello @david_r,

Thanks for your reply. I was able to implement code with the Class Interface, similar to the example described in the PythonCaller documentation you linked above.

Now I would like to do something else.

Let's have a look at the example below, where the total area of all the features processed is calculated and then a new attribute containing the total area is created. What if I want to calculate the total area of all features processed, grouped by an attribute?

import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        self.featureList = []
        self.totalArea = 0.0

    def input(self, feature):
        # Buffer every feature and accumulate its area
        self.featureList.append(feature)
        self.totalArea += feature.getGeometry().getArea()

    def close(self):
        # Write the overall total onto every buffered feature
        for feature in self.featureList:
            feature.setAttribute("total_area", self.totalArea)
            self.pyoutput(feature)

 

Cheers

Userlevel 4

So that's getting squarely out of FME territory and into pure Python, which isn't wrong in itself, but I suspect you'd be better off using an AreaCalculator + a StatisticsCalculator with a Group By in this instance.

Would that work for you?

Badge +1

In fact, I want to compute some statistics that I can't really obtain with a simple StatisticsCalculator. Some of those statistics have to be computed with a Group By and others without.

My situation has nothing to do with the area calculation. I just gave that example as an illustration of what I want to do at the end.

Yes, you are right. It is more Python than FME, but it is not out of FME territory.

Userlevel 4

Ok, I see your point. My preference would be to use a dict for grouping your features, where each dict key would contain the "group by" attribute value, and the dict value would contain a second dict containing the total group area and a list of the features.

If we had the groups A, B and C, the dict might look something like this (not actual code):

areas = {
 'A': {'total_area': 123, 'features': [<feature1>, <feature2>, ...]},
 'B': {'total_area': 56,  'features': [<feature3>]},
 'C': {'total_area': 789, 'features': [<feature4>, <feature5>, ...]}
}

When accessing the total area and features associated with group A, you can do:

empty_group = {'total_area': 0, 'features': []}
my_group = areas.get('A', empty_group)  # Get items for group A, default to empty dict
group_area = my_group['total_area']     # returns 123
group_features = my_group['features']   # Returns [<feature1>, <feature2>, ...]

The second line returns either the nested dict for group A or, if group A hasn't been defined yet, the empty_group default.

To update group A, you can do:

new_area = group_area + feature.getGeometry().getArea()  # Calculate new total
group_features.append(feature)
areas['A'] = {'total_area': new_area, 'features': group_features}  # Update group A

One thing to be aware of is memory consumption: the way your code is currently constructed, you're effectively buffering all the features in memory until the workspace terminates. This will become a problem beyond a certain (large-ish) number of features, depending on the circumstances, but you can alleviate it by going with 64-bit FME. My recommendation would also be to use an AttributeKeeper just before your PythonCaller to keep the feature size down as much as possible by removing any unneeded attributes.
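
The nested-dict grouping described above can be sketched end to end as follows. This is an illustration under assumptions rather than FME code: plain dicts stand in for features, and `group_totals`, `zone` and `area` are hypothetical names. It uses `dict.setdefault` so that every group gets its own fresh `features` list, which avoids accidentally sharing one default list between groups.

```python
def group_totals(records, group_key, value_key):
    """Return {group: {'total_area': float, 'features': [records]}}."""
    areas = {}
    for record in records:
        # setdefault builds a new nested dict the first time a group is seen
        group = areas.setdefault(record[group_key],
                                 {'total_area': 0.0, 'features': []})
        group['total_area'] += record[value_key]
        group['features'].append(record)
    return areas


# Plain dicts stand in for FME features here (assumed sample data)
records = [
    {'zone': 'A', 'area': 100.0},
    {'zone': 'B', 'area': 56.0},
    {'zone': 'A', 'area': 23.0},
]
areas = group_totals(records, 'zone', 'area')
```

In a PythonCaller, the loop body would presumably live in `input()`, with the group value read via `feature.getAttribute()` and the area via `feature.getGeometry().getArea()`, and the buffered features would be written out in `close()`.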

Userlevel 2
Badge +17

I sometimes implement group-based processing with the following PythonCaller script framework, in conjunction with a Sorter that sorts the input features by the Group By attribute.

# Assume all the features have been sorted by "_group_id" beforehand.
class FeatureProcessor(object):
    def __init__(self):
        self.groupId = None
        self.features = [] # list of features

    def input(self,feature):
        id = feature.getAttribute('_group_id')
        if id != self.groupId:
            self.process()
            self.groupId = id # Update current group ID.
            self.features = [] # Reset the list of features.
        self.features.append(feature)

    def close(self):
        self.process()

    def process(self):
        if not self.features:
            return
        #
        # TODO: Process features in a group and output result.
        #

If you need to perform really advanced statistics calculation, however, consider learning the R language and leveraging the RCaller.
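
To make the sorted-group skeleton above concrete, here is a minimal sketch of the same control flow with the TODO filled in by a per-group mean. It is a stand-in rather than FME code: `SortedGroupMeans` is a hypothetical class, `(group, value)` pairs replace features, and a `results` list replaces `self.pyoutput()`.

```python
class SortedGroupMeans(object):
    """Same flush-on-group-change logic as the skeleton, on plain pairs."""

    def __init__(self):
        self.group_id = None
        self.values = []
        self.results = []  # stands in for self.pyoutput(...)

    def input(self, group_id, value):
        if group_id != self.group_id:
            self.process()            # flush the previous group
            self.group_id = group_id  # update the current group ID
            self.values = []          # reset the buffer
        self.values.append(value)

    def close(self):
        self.process()                # flush the last group

    def process(self):
        if not self.values:
            return
        mean = sum(self.values) / float(len(self.values))
        self.results.append((self.group_id, len(self.values), mean))


# Input must already be sorted by group, as the Sorter would guarantee
processor = SortedGroupMeans()
for group_id, value in [('A', 2), ('A', 4), ('B', 10), ('C', 1), ('C', 3)]:
    processor.input(group_id, value)
processor.close()
results = processor.results
```

Only one group is held in memory at a time, which is why this pattern depends on the input being sorted.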

Badge +22

@takashi's solution is the most memory efficient, but if you need to process features where you can't guarantee they are ordered by group, I like to use a defaultdict.

 

 

The for loop in the close method iterates over each group, providing the Group By attribute value and the list of features in that group.

import fmeobjects
from collections import defaultdict

class FeatureProcessor(object):

    def __init__(self):
        # Maps each group ID to the list of features in that group
        self.features = defaultdict(list)

    def input(self, feature):
        id = feature.getAttribute('_group_id')
        self.features[id].append(feature)

    def close(self):
        for id, featureList in self.features.items():
            # Do whatever you need with each group here.
            pass
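
As one possible way to fill in the "do whatever" step, the sketch below groups unordered input with a defaultdict and computes a median per group. It assumes a Python 3 interpreter (for the standard `statistics` module) and uses `(group, value)` pairs in place of features; `median_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import median


def median_by_group(pairs):
    """Group unordered (group_id, value) pairs and return a median per group."""
    groups = defaultdict(list)
    for group_id, value in pairs:
        groups[group_id].append(value)
    return {group_id: median(values) for group_id, values in groups.items()}


# Input does not need to be sorted by group (assumed sample data)
medians = median_by_group([('B', 7), ('A', 1), ('B', 3), ('A', 5), ('A', 9)])
```

Unlike the sorted approach, all values stay in memory until the end, so the memory caveat mentioned earlier in the thread applies here too.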
