Solved

Statistics inside a python caller

  • November 13, 2018
  • 9 replies
  • 84 views

arthy
Contributor
  • 101 replies

Hello,

 

I would like to read all attributes for some specific fields and then compute some statistics.

 

I know that it is possible with multiple StatisticsCalculators and FeatureMergers.

How can I do this inside a PythonCaller?

Thanks

9 replies

takashi
Celebrity
  • 7843 replies
  • November 13, 2018

The StatisticsCalculator supports calculating statistics on multiple attributes at once. Try specifying all the attributes in the Attributes to Analyze parameter of a single StatisticsCalculator.


danilo_fme
Celebrity
  • 2077 replies
  • November 13, 2018

Hi @arthy

If you look at the custom transformer ListStatisticsCalculator https://hub.safe.com/transformers/liststatisticscalculator, it contains a PythonCaller whose Python script computes statistics on list attributes.

You could check this script to get an idea.

 

* @takashi thanks for this amazing and interesting custom transformer

 

Thanks,

Danilo


david_r
Celebrity
  • 8394 replies
  • November 14, 2018

I agree with the others that it's easier to use e.g. the StatisticsCalculator if it can do what you need. However there are statistical operations that you may have to implement yourself, e.g. using Python. 

My recommendation would be to use the class interface. For example, here's how to calculate a running total of all the values of the attribute my_number, then output the total as the_total_is in a separate feature at the end:

import fmeobjects

class CalculateStatistics(object):
    def __init__(self):
        # This method is called once before first feature enters the PythonCaller
        self.running_total = 0

    def input(self, feature):
        # Called once for every feature
        self.running_total += int(feature.getAttribute('my_number') or 0)
        self.pyoutput(feature)

    def close(self):
        # This method is called once after the last feature has exited the PythonCaller
        sum_feature = fmeobjects.FMEFeature()
        sum_feature.setAttribute('the_total_is', self.running_total)
        self.pyoutput(sum_feature)

I would also recommend reading the documentation for the PythonCaller, where much of this is explained in detail: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/pythoncaller.htm
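
For statistics beyond a plain sum, the same running-aggregate idea extends to the mean and the variance. The following is only a minimal sketch using Welford's algorithm, not something taken from the PythonCaller documentation. The RunningStats helper and its method names are made up for illustration, and it is plain Python so that it can be tried outside FME.

```python
import math


class RunningStats(object):
    """Streaming count, mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # Sum of squared deviations from the current mean.

    def add(self, value):
        # Update the aggregates with one new value, without storing it.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two values.
        if self.count < 2:
            return None
        return self._m2 / (self.count - 1)

    def stdev(self):
        variance = self.variance()
        return None if variance is None else math.sqrt(variance)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
```

Inside a class like the one above, one could presumably create a RunningStats in __init__, call add() with the numeric attribute value in input, and write mean and stdev() to the summary feature in close. Because no values are stored, memory use stays constant however many features pass through.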


arthy
Contributor
  • Author
  • 101 replies
  • November 14, 2018

Hello @david_r,

Thanks for your reply. I was able to implement code with the class interface, similar to the example in the PythonCaller documentation you linked above.

Now I would like to do something else.

Let's have a look at the example below, where the total area of all the processed features is calculated and then written to a new attribute on each feature. What if I want to calculate the total area of the processed features grouped by an attribute?

import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        self.featureList = []
        self.totalArea = 0.0

    def input(self, feature):
        # Buffer every feature and accumulate its area.
        self.featureList.append(feature)
        self.totalArea += feature.getGeometry().getArea()

    def close(self):
        # Write the overall total to every buffered feature and output it.
        for feature in self.featureList:
            feature.setAttribute("total_area", self.totalArea)
            self.pyoutput(feature)

 

Cheers


david_r
Celebrity
  • 8394 replies
  • November 14, 2018

So that's getting squarely out of FME territory and into pure Python, which isn't wrong in itself, but I suspect you'd be better off using an AreaCalculator + a StatisticsCalculator with a Group By in this instance.

Would that work for you?


arthy
Contributor
  • Author
  • 101 replies
  • November 14, 2018

In fact, I want to compute some statistics that I can't really obtain with a simple StatisticsCalculator. Some of those statistics have to be computed with a Group By and others without.

My situation has nothing to do with the area calculation. I just gave that example as an illustration of what I want to do in the end.

Yes, you are right. It is more Python than FME, but it is not out of FME territory.


david_r
Celebrity
  • 8394 replies
  • November 14, 2018

Ok, I see your point. My preference would be to use a dict for grouping your features, where each dict key would contain the "group by" attribute value, and the dict value would contain a second dict containing the total group area and a list of the features.

If we had the groups A, B and C, the dict might look something like this (not actual code):

areas = {
 'A': {'total_area': 123, 'features': [<feature1>, <feature2>, ...]},
 'B': {'total_area': 56,  'features': [<feature3>]},
 'C': {'total_area': 789, 'features': [<feature4>, <feature5>, ...]}
}

When accessing the total area and features associated with group A, you can do:

empty_group = {'total_area': 0, 'features': []}
my_group = areas.get('A', empty_group)  # Get items for group A, default to empty dict
group_area = my_group['total_area']     # returns 123
group_features = my_group['features']   # Returns [<feature1>, <feature2>, ...]

The second line will either return the nested dict for group A, or an empty default dict if group A hasn't been defined yet.

To update group A, you can do:

new_area = group_area + feature.getGeometry().getArea()  # Calculate new total
group_features.append(feature)
areas['A'] = {'total_area': new_area, 'features': group_features}  # Update group A

One thing to be aware of is memory consumption: the way your code is currently constructed, you're effectively buffering all the features in memory until the workspace terminates. This will become a problem after a certain (large-ish) number of features, all things depending, but you can alleviate this by going with 64-bit FME. My recommendation would also be to use an AttributeKeeper just before your PythonCaller to keep the feature size down as much as possible by removing any unneeded attributes.
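
The lookup and update steps above can be folded into one small helper. This is only a sketch: the name add_to_group is made up for illustration, and plain strings stand in for FME features so that it runs outside FME. It uses dict.setdefault, which builds a fresh inner dict for each new group. That matters because a single empty_group default reused across several groups would make them all share the same 'features' list.

```python
def add_to_group(areas, group_key, area, feature):
    """Add one feature and its area to the nested dict for its group."""
    # setdefault stores a fresh inner dict (and a fresh list) the first time
    # a group key is seen, so no two groups share the same 'features' list.
    group = areas.setdefault(group_key, {'total_area': 0.0, 'features': []})
    group['total_area'] += area
    group['features'].append(feature)
    return group


areas = {}
# Plain strings stand in for FME features here (illustrative only).
add_to_group(areas, 'A', 100.0, 'feature1')
add_to_group(areas, 'A', 23.0, 'feature2')
add_to_group(areas, 'B', 56.0, 'feature3')
```

In a PythonCaller, the call would presumably sit in input, with the group key read from the Group By attribute and the area taken from feature.getGeometry().getArea().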


takashi
Celebrity
  • 7843 replies
  • Best Answer
  • November 14, 2018

I sometimes implement group-based processing with the following PythonCaller script framework, in conjunction with a Sorter that sorts the input features by the Group By attribute.

# Assume all the features have been sorted by "_group_id" beforehand.
class FeatureProcessor(object):
    def __init__(self):
        self.groupId = None
        self.features = [] # list of features

    def input(self,feature):
        id = feature.getAttribute('_group_id')
        if id != self.groupId:
            self.process()
            self.groupId = id # Update current group ID.
            self.features = [] # Reset the list of features.
        self.features.append(feature)

    def close(self):
        self.process()

    def process(self):
        if not self.features:
            return
        #
        # TODO: Process features in a group and output result.
        #

If you need to perform really advanced statistics calculation, however, consider learning the R language and leveraging the RCaller.
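
As a rough idea of what the TODO in process could do, here is a minimal sketch of a per-group summary. It assumes the PythonCaller runs a Python 3 interpreter, since it relies on the standard statistics module. The summarize function and the keys of the returned dict are made up for illustration.

```python
import statistics


def summarize(values):
    """Return basic statistics for one group's numeric values."""
    summary = {
        'count': len(values),
        'mean': statistics.mean(values),
        'median': statistics.median(values),
    }
    # The sample standard deviation needs at least two values.
    summary['stdev'] = statistics.stdev(values) if len(values) > 1 else None
    return summary


result = summarize([1.0, 2.0, 3.0, 4.0])
```

Inside process, one could collect the values of a numeric attribute from self.features, pass them to summarize, write the results back to each feature with setAttribute, and output the features with self.pyoutput. The early return for an empty list in process matters here, because statistics.mean raises an error on empty input.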


jdh
Contributor
  • 2002 replies
  • November 14, 2018

@takashi's solution is the most memory efficient, but if you can't guarantee that the features arrive ordered by group, I like to use a defaultdict.

The for loop in the close method iterates over each group, providing the Group By attribute value and the list of features in that group.

import fmeobjects
from collections import defaultdict

class FeatureProcessor(object):

    def __init__(self):
        self.features = defaultdict(list)

    def input(self, feature):
        id = feature.getAttribute('_group_id')
        self.features[id].append(feature)

    def close(self):
        for id, featureList in self.features.items():
            # Do whatever is needed with each group here.
            pass
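
Since some of the requested statistics are grouped and others are not, both can come out of the same defaultdict. The following is a minimal sketch in plain Python. The function totals_by_group is a made-up name, and (group_id, value) tuples stand in for features and their attribute values.

```python
from collections import defaultdict


def totals_by_group(pairs):
    """Return (per-group totals, grand total) for (group_id, value) pairs."""
    grouped = defaultdict(list)
    for group_id, value in pairs:
        grouped[group_id].append(value)
    # One pass over the groups yields the grouped statistic...
    group_totals = {gid: sum(vals) for gid, vals in grouped.items()}
    # ...and the ungrouped one follows from the group totals.
    grand_total = sum(group_totals.values())
    return group_totals, grand_total


# Input order does not matter, unlike with the sorted approach.
group_totals, grand_total = totals_by_group(
    [('B', 56.0), ('A', 100.0), ('C', 789.0), ('A', 23.0)]
)
```

The same loop structure would fit in the close method above, with the values read from each feature in featureList.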