Question

PointOnAreaOverlay and TestFilter




34 replies

Userlevel 2
Badge +17
@takashi

 

This works perfectly. Thanks.

 

 

However, the result has uncovered another issue: some supermarkets and schools are being counted multiple times, because several point features can denote the same supermarket or school. Each feature has a unique ID, but they share a name field holding the name of the supermarket.

 

I would need to count the example below as 1 rather than 3.

 

 

e.g.

 

 

ID  Category     Name    Description
1   Supermarket  Costco  Costco Tyre Department
2   Supermarket  Costco  Costco Fuel
3   Supermarket  Costco  Costco Warehouse

Have a look at the Matcher (if both name and geometry match) or the DuplicateFilter (if name matches but geometry doesn't).
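
For reference, the idea of filtering duplicates on a key attribute can be sketched in plain Python. This only illustrates keeping the first feature per (Category, Name) key; it is not how either transformer is implemented. The dictionaries stand in for features, and the function name `drop_duplicates_by_key` is hypothetical.

```python
def drop_duplicates_by_key(features, key_attrs):
    """Keep only the first feature seen for each combination of key attributes."""
    seen = set()
    unique = []
    for feature in features:
        key = tuple(feature[attr] for attr in key_attrs)
        if key not in seen:
            seen.add(key)
            unique.append(feature)
    return unique


# The three Costco rows from the example above.
features = [
    {"ID": 1, "Category": "Supermarket", "Name": "Costco", "Description": "Costco Tyre Department"},
    {"ID": 2, "Category": "Supermarket", "Name": "Costco", "Description": "Costco Fuel"},
    {"ID": 3, "Category": "Supermarket", "Name": "Costco", "Description": "Costco Warehouse"},
]

unique = drop_duplicates_by_key(features, ["Category", "Name"])
print(len(unique))  # 1
```

In the real workflow the polygon ID would probably need to be part of the key as well, so that the same chain appearing in two different polygons is still counted once in each.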
Badge +6

@takashi

I don't have an attribute that denotes supermarket etc.; instead I need to apply a query like classification code in ('a','b',...) to get all supermarket point features.

I'll look into using the JSON/List options

Thank you
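
One way to turn such a code query into a class attribute is a simple lookup table. The sketch below is plain Python and purely illustrative: the codes 'a' and 'b' come from the query above, while the code 's1', the class names, and the function `classify` are made up for the example.

```python
# Hypothetical lookup: 'a' and 'b' come from the query in the post,
# every other code and all class names are invented for illustration.
CODE_TO_CLASS = {
    "a": "supermarket",
    "b": "supermarket",
    "s1": "school",
}


def classify(code):
    """Return the point class for a classification code, or None if unmapped."""
    return CODE_TO_CLASS.get(code)


codes = ["a", "s1", "b", "zz"]
classes = [classify(code) for code in codes]
print(classes)  # ['supermarket', 'school', 'supermarket', None]
```

With a class attribute derived this way, all categories can be counted in one pass instead of one filter per category.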

@takashi

 

Matcher seems to work for me. Thank you

 

I now have to introduce an additional attribute to count. I have an Excel spreadsheet, and have used a FeatureMerger to bring in the new class. I therefore end up with 2 lists in the PointOnAreaOverlayer, 2 ListHistogrammers, etc. Is there a way I could merge these 2 into 1 JSON document?

 

 

fme-counts2.png

 

Userlevel 2
Badge +17

If the input polygon features have a unique ID attribute, you can merge the features from the two JSONFlatteners with a FeatureMerger, using the ID attribute as the join key.
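
The join on the ID attribute can be pictured with a small plain-Python sketch. It is only an illustration of combining two sets of per-polygon counts on a shared key, not the FeatureMerger itself; the attribute name `_polygon_id`, the count attributes, and the helper `merge_by_id` are assumptions for the example.

```python
def merge_by_id(left, right, key="_polygon_id"):
    """Combine records from both inputs that share the same key value."""
    merged = {}
    for record in list(left) + list(right):
        # Start an empty record per ID, then copy attributes into it.
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


# Hypothetical flattened counts coming from the two branches.
shops = [
    {"_polygon_id": 1, "supermarket": 2},
    {"_polygon_id": 2, "supermarket": 1},
]
schools = [
    {"_polygon_id": 1, "school": 3},
]

merged = merge_by_id(shops, schools)
print(merged)
```

Note that this sketch keeps polygons that appear in only one input, which is closer to an outer join; what the FeatureMerger outputs for unmatched features depends on which ports are connected.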

 

Badge +6


@takashi

 

All working now. Thank you.

 

Performance is still an issue though, in particular reading 5 million point features.

 

The data is in an SDE geodatabase.

I have exposed only the attributes required, and ignored subtypes and domains.

At times it seems to hang and takes hours to read the data. I also get the message "Optimizing memory usage".

Any suggestions on how this can be improved? Would a SQLCreator perform better, even though I may need to create the geometry from the coordinate fields?
Userlevel 2
Badge +17


It's hard to discuss a performance issue based on partial information about your workspace, since several factors could affect performance.

Speaking only of the PointOnAreaOverlayer: if you can set the "Area First" parameter to "Yes", better performance can generally be expected. Note: if you use the "Area First" option, it has to be guaranteed that all the polygons have entered the PointOnAreaOverlayer before any points start to enter.
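
Why the ordering matters can be illustrated with a plain-Python sketch. This is not how the PointOnAreaOverlayer works internally; it only shows the general idea that once every area is loaded, points can be consumed one at a time without being held in memory. The function names and sample data are hypothetical, and the brute-force loop over all polygons stands in for whatever spatial index a real implementation would use.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points(polygons, points):
    """polygons: dict of id -> ring, fully loaded up front.
    points: any iterable of (x, y); it is consumed once and never stored."""
    counts = {polygon_id: 0 for polygon_id in polygons}
    for x, y in points:
        for polygon_id, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[polygon_id] += 1
    return counts


polygons = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}


def point_stream():
    # Stands in for a reader that yields points one by one.
    yield (5, 5)
    yield (25, 5)
    yield (6, 2)
    yield (15, 5)  # falls between the two squares


counts = count_points(polygons, point_stream())
print(counts)  # {'A': 2, 'B': 1}
```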

 

 

Badge +6


@takashi

 

I appreciate it is difficult to discuss performance. I have already set the Area First option to Yes. However, reading the 5 million point features takes a long time by itself. I was wondering whether there is an alternative way to read the point data instead of using Geodatabase SDE.

 

Userlevel 2
Badge +17

If performance is really critical, it might be better to reconsider the entire workflow and leverage Python scripting effectively.

Assuming that the polygon features have a unique ID attribute called "_polygon_id" and each point feature has its point type ("supermarket", "sport center", etc.) in an attribute called "_point_class", a workflow like this is possible, for example.

# PythonCaller Script Example
# Counts point features per polygon and per point class.
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        # {Polygon ID : {Point Class : Count}}
        self.histograms = {}

    def input(self, feature):
        # Read the polygon ID and point class from the incoming feature.
        polygonId = feature.getAttribute('_polygon_id')
        pointClass = feature.getAttribute('_point_class')
        # Create the per-polygon histogram on first sight, then increment.
        histogram = self.histograms.setdefault(polygonId, {})
        histogram[pointClass] = histogram.get(pointClass, 0) + 1

    def close(self):
        # Emit one feature per polygon, with one attribute per point class
        # holding the count for that class.
        for polygonId, histogram in self.histograms.items():
            feature = fmeobjects.FMEFeature()
            feature.setAttribute('_polygon_id', polygonId)
            for pointClass, count in histogram.items():
                feature.setAttribute(pointClass, count)
            self.pyoutput(feature)

0684Q00000ArLHPQA3.png
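
The counting logic of the script above can be tried outside FME with plain Python. The sketch below mirrors the nested {polygon ID: {point class: count}} structure using `collections.Counter`, and shows that several point classes are handled in a single pass. The sample records and the helper `build_histograms` are hypothetical; tuples stand in for the `_polygon_id` and `_point_class` attributes.

```python
from collections import Counter, defaultdict


def build_histograms(records):
    """records: iterable of (polygon_id, point_class) pairs."""
    histograms = defaultdict(Counter)
    for polygon_id, point_class in records:
        histograms[polygon_id][point_class] += 1
    return histograms


# Hypothetical points, already tagged with the polygon that contains them.
records = [
    (1, "supermarket"),
    (1, "school"),
    (1, "supermarket"),
    (2, "sport center"),
]

histograms = build_histograms(records)

# One output row per polygon, one column per point class,
# analogous to the features emitted in close().
rows = [{"_polygon_id": pid, **counts} for pid, counts in histograms.items()]
print(rows)
```

A polygon only gets a column for classes that actually occur inside it, so downstream steps may need to treat a missing attribute as a count of zero.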

Badge +6


@takashi

 

Hi, thanks for this suggestion. In your screenshot, you have the polygons set to be read first. Should I be configuring anything on the reader, or is it just on the Clipper?

Also, because I have multiple point classes, I guess I will need to handle this in the Python script.

 

 

Userlevel 2
Badge +17


I only meant to indicate that the polygons should enter the Clipper first, so that the "Clippers First" option can be used.

 
