Skip to main content
Hi,

 

 

I was wondering if there is an efficient way of pivoting many attributes and statistically calculating data into shapefile polygons at the same time. (I have 2.5 million records).

 

 

I actually have 10 attributes to pivot (with many different values to pivot), but in the following example I have shown two.

 

 

For example:

 

 

Hypothetical input data1.            

 

Category ¦ Priority ¦ x ¦ y 

 

A ¦ Red ¦ 10000 ¦ 53453

 

B ¦ Green ¦ 14646 ¦ 53897

 

C ¦ Amber ¦ 34545 ¦ 34634

 

A ¦ Green ¦ 34575 ¦ 67388

 

...¦..........¦...........¦...........¦

 

 

Input data 2.   - Shape polygons

 

District¦

 

AreaA¦

 

AreaB¦

 

 

Output data - shape polygons

 

Area¦A¦B¦C¦Red¦Amber¦Green

 

AreaA¦254¦164¦170¦584¦363

 

AreaB¦634¦636¦563¦255¦346
Output data required is more like the following - shape polygons

 

The numbers are a count of the occurances per district,

 

District¦A¦B¦C¦Red¦Amber¦Green

 

AreaA¦254¦164¦170¦584¦363

 

AreaB¦634¦636¦563¦255¦346
where does the district relate/link to the data? (shape to tabular data link)
Hi,

 

 

I would use a StatisticsCalculator and a simple Tcl script.

 

After migrating the input data 1 to the data 2:

 

 

1) Send the distric polygons to a StatisticsCalculator to create histograms.

 

Group By: District

 

Attributes to Analyze: Category, Priority

 

Histogram List Attribute: histogram

 

 

The StatisticsCalculator creates histograms for each attributes, like:

 

Category.histogram{n}.value: stores a Category values (A, B, or C)

 

Category.histogram{n}.count: stores the number of the value occurences.

 

Priority.histogram{n}.value: stores a Priority values (Red, Amber, or Green)

 

Priority.histogram{n}.count: stores the number of the value occurences.

 

 

2) Connect a TclCaller to the Summary port of the StatisticsCalculator. This script example will create new attributes whose names are the original attribute values; their values will be the number of occurences.

 

-----

 

proc pivot {} {

 

    set attributes {"Category" "Priority"}

 

    foreach {attr} $attributes {

 

        for {set i 0} {1} {incr i} {

 

            set newAttr ÂFME_GetAttribute "$attr.histogram{$i}.value"]

 

            if .string equal $newAttr {}] {break}

 

            FME_SetAttribute $newAttr ÂFME_GetAttribute "$attr.histogram{$i}.count"]

 

        }

 

    }

 

}

 

-----

 

In this method, you can add other attributes easily without adding any other transformers. That is, just add required attribute names to the "Attributes to Analyze" parameter of the StatisticsCalculator and the "attributes" list defined at the first line of the script.

 

 

After the TclCaller, you can expose the new attribute names (A, B, C, ... Red, Amber, Green, ...) with an AttributeExposer, if necessary.

 

 

Takashi
In addition, the Tcl script example assumes that there is no empty value in the original attributes. If there could be empty attributes, the script should be improved.

 

You can also use the "Attributes to Expose" parameter of the TclCaller To expose the new attribute names.

 

And, of course, a PythonCaller can be used as well.

 

-----

 

# Python script example

 

def pivot(feature):

 

    attributes = b'Category', 'Priority']

 

    for attr in attributes:

 

        i = 0

 

        while True:

 

            newAttr = feature.getAttribute('%s.histogram{%d}.value' % (attr, i))

 

            if newAttr == None:

 

                break

 

            count = feature.getAttribute('%s.histogram{%d}.count' % (attr, i))

 

            feature.setAttribute(newAttr, count)

 

            i += 1

 

-----
Thanks for the response Takashi.

 

 

I was wanting to take a step back and simplify this, but getting one pivoter working..

 

 

I have tried joining a points file and a shape file using PointonAreaOverlayer. 

 

But the Overlap count attribute is showing zero for each district.When I have checked the data and there are many points in each district.

 

 

Any ideas why this isn't counting the points in each polygon?

 

 
Sounds strange. How is the parameter setting of the transformer?

 

looks like there are no overlaps as the _overlaps field is 0 for every LSOA_code. Seems very odd when I load the the points and shaepfiles and there are many points which overlap the shapefiles .... 
Point on area will not propagate to the areas, i had this problem today as well. It should but it won't.

 

I replaced it with a spatialrelator and a tester to get it propagated to the areas.

 

 

(seems bugged?)

 

Anyway i had a message from safe that the spatialrelator is repaired...in fme2015beta...i think they will not put effort in repairing 2014 (?)
If you set "LSOA_CODE" to the "Group By" parameter, only points and polygons which have the same "LSOA_CODE" value will be compared. Is it your intention of the "Group By" parameter setting?
Plan B. I think this procedure is relatively efficient.

 

 

1) Send the polygons to the Clipper port, the points to the Clippee port of a Clipper, so that "District" will be added to inside points of each polygon.

 

Group By: <not set>

 

Merge Attributes: Yes

 

Merge Attribute Prefix: <blank>

 

 

The PointOnAreaOverlayer or the SpatialFilter can also be used. But in my experiences, the Clipper seems to be more efficient than both of them, when transferring attributes of a polygon to inside points.

 

 

2) Connect a Tcl/PythonCaller to the Inside port of the Clipper.

 

-----

 

# Tcl Script Example

 

proc pivot {} {

 

    set attributes {"Category" "Priority"}

 

    foreach {attr} $attributes {

 

        set newAttr  FME_GetAttribute $attr]

 

        if {!rstring equal $newAttr {}]} {

 

            FME_SetAttribute $newAttr {1}

 

        }

 

    }

 

}

 

-----

 

# Python Script Example

 

def pivot(feature):

 

    attributes = /'Category', 'Priority']

 

    for attr in attributes:

 

        newAttr = feature.getAttribute(attr)

 

        if newAttr:

 

            feature.setAttribute(newAttr, 1)

 

-----

 

The script creates new attributes. Their names will be original attribute values (A, B, C, Red, Amber, Green); the value is 1.

 

 

3) Expose the new attribute names. Set the "Attributes To Expose" parameter of the Tcl/PythonCaller, or use the AttributeExposer.

 

 

4) Connect an Aggregator to calculate number of occurences of the new attributes for each district.

 

Group By: District

 

Mode: Attributes Only

 

Keep Input Attributes: No

 

Attributes to Sum: A, B, C, Red, Amber, Green

 

 

5) Merge the resulting features to the original polygons using "District" as the "Join On" parameter.
If you need to get the number of points for each polygon, specify the "Count Attribute" parameter of the Aggregator.
Came back after Christmas and followed your Plan B. and this has worked like a dream. 

 

Thank you very much Takashi.

 

Happy New Year.

Reply