Solved

Statistics inside a python caller

6 years ago
November 13, 2018
9 replies
65 views

arthy
Contributor
101 replies

Hello,

I would like to read all attributes for some specific fields and then compute some statistics.

I know that it is possible with multiple statistics calculators and feature mergers.

How can I do this inside a python caller?

Thanks

takashi
7715 replies
6 years ago
November 13, 2018

The StatisticsCalculator supports calculating statistics on multiple attributes at once. Try setting all the attributes to the Attributes to Analyze parameter in a single StatisticsCalculator.

danilo_fme
Evangelist
2057 replies
6 years ago
November 13, 2018

Hi @arthy

If you look at the custom transformer ListStatisticsCalculator (https://hub.safe.com/transformers/liststatisticscalculator), it contains a PythonCaller whose Python script computes statistics on list attributes.

You could check that script to get an idea.

* @takashi thanks for this amazing and interesting custom transformer

Thanks,

Danilo

david_r
8355 replies
6 years ago
November 14, 2018

I agree with the others that it's easier to use e.g. the StatisticsCalculator if it can do what you need. However, there are statistical operations that you may have to implement yourself, e.g. using Python.

My recommendation would be to use the class interface. For example, here's how to calculate a running total of all the attribute values of my_number, then output the total as the_total_is in a separate feature at the end:

import fmeobjects

class CalculateStatistics(object):
    def __init__(self):
        # This method is called once before first feature enters the PythonCaller
        self.running_total = 0

    def input(self, feature):
        # Called once for every feature
        self.running_total += int(feature.getAttribute('my_number') or 0)
        self.pyoutput(feature)

    def close(self):
        # This method is called once after the last feature has exited the PythonCaller
        sum_feature = fmeobjects.FMEFeature()
        sum_feature.setAttribute('the_total_is', self.running_total)
        self.pyoutput(sum_feature)

I would also recommend reading the documentation for the PythonCaller, where much of this is explained in detail: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/pythoncaller.htm
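
For statistics beyond a sum, the same class interface can keep running aggregates instead of buffering features. Below is a minimal sketch, not taken from the thread, that uses Welford's online algorithm to track the count, mean, and sample variance. `RunningStats` is a hypothetical helper name; inside a PythonCaller you would presumably call `push()` from `input()` and read the results in `close()`.

```python
import math


class RunningStats:
    """Hypothetical helper: streaming count, mean and sample variance (Welford)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # Sum of squared deviations from the current mean

    def push(self, value):
        # Update the aggregates with one new value.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance (n - 1 denominator); undefined for fewer than 2 values.
        if self.count < 2:
            return float("nan")
        return self._m2 / (self.count - 1)

    @property
    def stdev(self):
        return math.sqrt(self.variance)


# Plain numbers stand in for feature.getAttribute('my_number') values.
stats = RunningStats()
for number in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.push(number)
```

Because only three numbers are stored per statistic, memory use stays constant no matter how many features pass through.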

arthy
Author
Contributor
101 replies
6 years ago
November 14, 2018

Hello @david_r,

Thanks for your reply. I was able to implement code with the Class Interface similar to the example described in the PythonCaller documentation you linked above.

Now I would like to do something else.

Let's have a look at that example, where the total area of all the features processed is calculated and then a new attribute containing the total area is created. What if I want to calculate the total area of all features processed, grouped by an attribute?

import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        self.featureList = []
        self.totalArea = 0.0

    def input(self, feature):
        # Buffer every feature and accumulate its area.
        self.featureList.append(feature)
        self.totalArea += feature.getGeometry().getArea()

    def close(self):
        # Write the grand total onto each buffered feature and output it.
        for feature in self.featureList:
            feature.setAttribute("total_area", self.totalArea)
            self.pyoutput(feature)

Cheers

david_r
8355 replies
6 years ago
November 14, 2018

So that's getting squarely out of FME territory and into pure Python, which isn't wrong in itself, but I suspect you'd be better off using an AreaCalculator + a StatisticsCalculator with a Group By in this instance.

Would that work for you?

arthy
Author
Contributor
101 replies
6 years ago
November 14, 2018

david_r wrote:

Would that work for you?

In fact, I want to compute some statistics that I can't really obtain with a simple StatisticsCalculator. Some of those statistics have to be computed with a group by and others without.

My situation has nothing to do with the area calculation. I just gave that example as an illustration of what I want to do at the end.

Yes, you are right. It is more Python than FME, but it is not out of FME territory.

david_r
8355 replies
6 years ago
November 14, 2018

Ok, I see your point. My preference would be to use a dict for grouping your features, where each dict key would contain the "group by" attribute value, and the dict value would contain a second dict containing the total group area and a list of the features.

If we had the groups A, B and C, the dict might look something like this (not actual code):

areas = {
 'A': {'total_area': 123, 'features': [<feature1>, <feature2>, ...]},
 'B': {'total_area': 56,  'features': [<feature3>]},
 'C': {'total_area': 789, 'features': [<feature4>, <feature5>, ...]}
}

When accessing the total area and features associated with group A, you can do:

empty_group = {'total_area': 0, 'features': []}
my_group = areas.get('A', empty_group)  # Get items for group A, default to empty dict
group_area = my_group['total_area']     # returns 123
group_features = my_group['features']   # Returns [<feature1>, <feature2>, ...]

The second line will either return the nested dict for group A, or an empty default dict if group A hasn't been defined yet.

To update group A, you can do:

new_area = group_area + feature.getGeometry().getArea()  # Calculate new total
group_features.append(feature)
areas['A'] = {'total_area': new_area, 'features': group_features}  # Update group A

One thing to be aware of is memory consumption: the way your code is currently constructed, you're effectively buffering all the features in memory until the workspace terminates. This will become a problem after a certain (large-ish) number of features, depending on the circumstances, but you can alleviate this by going with 64-bit FME. My recommendation would also be to use an AttributeKeeper just before your PythonCaller to keep the feature size down as much as possible by removing any unneeded attributes.
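
The grouping pattern described above can be tried outside FME with plain Python values. This is a minimal sketch under assumptions: the `(group, area)` tuples are made-up stand-ins for features, and `dict.setdefault` is swapped in for the get-then-reassign steps so that every group gets its own fresh list rather than sharing one default.

```python
# Stand-in data: (group key, area) pairs instead of real FME features.
records = [("A", 100.0), ("B", 56.0), ("A", 23.0), ("C", 789.0)]

areas = {}
for group_key, area in records:
    # setdefault stores a new nested dict the first time a group is seen.
    group = areas.setdefault(group_key, {"total_area": 0.0, "features": []})
    group["total_area"] += area
    group["features"].append((group_key, area))

# Report each group's total area and how many records it holds.
for group_key, group in sorted(areas.items()):
    print(group_key, group["total_area"], len(group["features"]))
```

In a PythonCaller, the loop body would presumably sit in `input()` and the reporting loop in `close()`.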

takashi
7715 replies
Best Answer
6 years ago
November 14, 2018

I sometimes implement group-based processing using this PythonCaller script framework, in conjunction with a Sorter that sorts the input features by the Group By attribute.

# Assume all the features have been sorted by "_group_id" beforehand.
class FeatureProcessor(object):
    def __init__(self):
        self.groupId = None
        self.features = [] # list of features

    def input(self,feature):
        id = feature.getAttribute('_group_id')
        if id != self.groupId:
            self.process()
            self.groupId = id # Update current group ID.
            self.features = [] # Reset the list of features.
        self.features.append(feature)

    def close(self):
        self.process()

    def process(self):
        if not self.features:
            return
        #
        # TODO: Process features in a group and output result.
        #

If you need to perform really advanced statistics calculation, however, consider learning the R language and leveraging the RCaller.
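
To illustrate how the TODO in `process()` might be filled in, here is a minimal sketch that computes a per-group median with Python's `statistics` module. Several pieces are assumptions made so it runs outside FME: `StubFeature` is a hypothetical stand-in for `fmeobjects.FMEFeature`, the `value` and `group_median` attribute names are invented, and the `emitted` list replaces `self.pyoutput()`, which FME supplies at run time.

```python
import statistics


class StubFeature:
    """Hypothetical stand-in for fmeobjects.FMEFeature, for use outside FME."""

    def __init__(self, **attributes):
        self._attributes = dict(attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


class FeatureProcessor(object):
    def __init__(self):
        self.groupId = None
        self.features = []
        self.emitted = []  # Stand-in for self.pyoutput(), which FME provides

    def input(self, feature):
        group_id = feature.getAttribute('_group_id')
        if group_id != self.groupId:
            self.process()  # Flush the previous group before starting a new one.
            self.groupId = group_id
            self.features = []
        self.features.append(feature)

    def close(self):
        self.process()  # Flush the last group.

    def process(self):
        if not self.features:
            return
        values = [float(f.getAttribute('value')) for f in self.features]
        group_median = statistics.median(values)
        for f in self.features:
            f.setAttribute('group_median', group_median)
            self.emitted.append(f)  # In FME: self.pyoutput(f)


# Features must arrive sorted by '_group_id', as the framework assumes.
processor = FeatureProcessor()
for gid, value in [('A', 1), ('A', 3), ('A', 10), ('B', 4), ('B', 6)]:
    processor.input(StubFeature(_group_id=gid, value=value))
processor.close()
```

Only one group is held in memory at a time, which is why the input has to be sorted first.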

jdh
Contributor
1982 replies
6 years ago
November 14, 2018

@takashi's solution is the most memory efficient, but if you need to process features where you can't guarantee they are ordered by group, I like to use a defaultdict.

The for loop in the close method iterates over each group, providing the Group By attribute value and the list of features in that group.

import fmeobjects
from collections import defaultdict

class FeatureProcessor(object):

    def __init__(self):
        self.features = defaultdict(list)

    def input(self, feature):
        id = feature.getAttribute('_group_id')
        self.features[id].append(feature)

    def close(self):
        for id, featureList in self.features.items():
            # do whatever
            pass
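
When some statistics are needed per group and others over all features, the same defaultdict can serve both. The following is a minimal sketch, assuming that only the numeric values need to be buffered; the `(group, value)` tuples are made-up stand-ins for features and their attribute values.

```python
from collections import defaultdict
import statistics

# Stand-in data: (group key, value) pairs instead of real FME features.
records = [("A", 2.0), ("B", 10.0), ("A", 4.0), ("B", 20.0), ("A", 6.0)]

# Collect values per group; the input order does not matter.
values_by_group = defaultdict(list)
for group_key, value in records:
    values_by_group[group_key].append(value)

# Grouped statistic: one mean per group.
group_means = {key: statistics.mean(vals) for key, vals in values_by_group.items()}

# Ungrouped statistic: one mean over every value, regardless of group.
all_values = [v for vals in values_by_group.values() for v in vals]
overall_mean = statistics.mean(all_values)
```

Storing values rather than whole features keeps the memory footprint smaller, at the cost of not being able to write the results back onto the original features.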
