Solved

Group points into squares

  • 27 November 2014
  • 18 replies
  • 14 views

Badge
(I'm using FME Server 2011)

 

 

hi FME'ers,

 

 

I have a whole bunch of points, millions, and I want to create a grid of squares/rectangles (bounding boxes) covering the points so that each grid feature contains roughly the same number of points.  I'm then going to use this grid to set of a whole load of fme server jobs to process the points - so am using the squares as a way to split up the process into managable chunks, by passing in the min/max x/y values to my 'worker' workbench.

 

 

I've split this job up so i'm dealing with 10k chunks, so want to divide this 10k square into smaller squares (or rectangles) based on the density within the square.

 

 

I also want to reduce the number of jobs I create on the server, so only want to break up the tiles when they are over a certain point count - say 5000, and group them to between 3000-8000 say.

 

 

so in the picture below, I want all the green /amber tiles grouped together (but only if they form another square or rectangle below a certain threshold.

 

 

not sure I'm making sense....sorry.

 

 

 
icon

Best answer by takashi 28 November 2014, 11:50

View original

18 replies

Userlevel 4
Hi,

 

 

have you looked at the Tiler?

 

 

David
Badge
hi David,

 

 

Yes I'm using Tiler at the moment to create the squares in the picture.

 

 

at the moment i'm counting the number of points, then setting tiler to create smaller tiles depending on how many points are in the big tile, and this would work well - if the distribution of points was uniform.

 

 

I've been looking at how to merge the smaller 'tiles' back into squares, but to do this in vector tiles would mean some sort of self iterating loop i think, there must be a simpler way? (or how to do the loop to group the tiles into larger tiles).

 

 

maybe add two tiles, test the size, and next two tiles test the size, and keep going until i get within a threshold.
Userlevel 2
Badge +17
Hi,

 

 

In fact, FME is not so good at such a recursive processing. Scripting (Python) could be a relatively quick solution.

 

This image is a result generated by an experimental script. The script divides recursively the bounding box by four until the number of inside points becomes less than 6.

 

 

 

If the result matches to your requirement, I can provide the script for your reference.

 

 

Takashi
Badge +3
you can find scripts for "QUAD indexing" on the net, stackexchange for instance.

 

The python code for that is not very difficult.

 

 

And it is a recursive procedure of course, but that is the extent of your question.

 

Recusrion should not need be that scary.

 

 

And you can do that with custom loop transformers btw. (saved as it is a blocking transformer issue)

 

 
Badge
Both,

 

 

many thanks for your reponses.  I thought i'd seen this sort of algorithem before Gio, thanks for the reminder.  I was doing all sorts of searches for this!

 

 

Takashi,  If you don't mind sharing your python script that would be great.

 

 

I don't really like using python in FME as
  1. it feels like cheating
  2. hinders backwards compatibility
  3. make the workbench less self explanatory,
however I suspect this is one of those times.

 

 

I've used recursive custom transformers to some success before, but your right, in the quick go I had at doing this in FME i needed to use a blocker transformer and wasn't sure how to use an external linked custom transformer in a workbench running as a server job? (the job failed, saying unable to reference transformer, i had published the transformer to server.  a question for another day perhaps.

 

 

so gave up and asked here!

 

 

I've had to park this work now, but I will revisit it at a  later date as it sounds like a good tool to have in the FME Server arsnal to take advantage of multiple engines and divide system resource evenly.
Userlevel 2
Badge +17
This is the script.

 

-----

 

# PythonCaller Script Example (FME 2011)

 

# Designed for FME 2011, may not work with FME 2012 or later.

 

import pyfme

 

 

class NonUniformBoxGenerator(object):

 

    def __init__(self):

 

        self.points = [] # List of (x, y, point feature)

 

        self.boxId = 0 # Bounding Box ID

 

        

 

    def input(self, feature):

 

        # Extract coordinate and store the point feature.

 

        coord = feature.get2DCoord(0)

 

        self.points.append((coord[0], coord[1], feature))

 

        

 

    def close(self):

 

        # Calculate initial extent.

 

        xmin, ymin, p = self.points[0]

 

        xmax, ymax = xmin, ymin

 

        for x, y, p in self.points[1:]:

 

            if x < xmin:

 

                xmin = x

 

            elif xmax < x:

 

                xmax = x

 

            if y < ymin:

 

                ymin = y

 

            elif ymax < y:

 

                ymax = y

 

                

 

        # Start creating boxes.

 

        self.createBoundingBox((xmin, ymin, xmax, ymax), self.points)    

 

    

 

    # This method will be called recursively to divide box by four,

 

    # until the number of inside points becomes less than 6.

 

    def createBoundingBox(self, extent, points):

 

        xmin, ymin, xmax, ymax = extent

 

        

 

        if len(points) < 6:

 

            # Create a box polygon; output the polygon and inside points.

 

            # Output features can be classified by the GeometryFilter.

 

            xCoords = [xmin, xmax, xmax, xmin, xmin]

 

            yCoords = [ymin, ymin, ymax, ymax, ymin]

 

            box = pyfme.FMEFeature()

 

            box.addCoordinates(xCoords, yCoords)

 

            box.setGeometryType(pyfme.FME_GEOM_POLYGON)

 

            box.setIntAttribute('_box_id', self.boxId)

 

            self.pyoutput(box)

 

            

 

            for x, y, p in points:

 

                p.setIntAttribute('_box_id', self.boxId)

 

                self.pyoutput(p)

 

            self.boxId += 1

 

            

 

        else:

 

            # Calculate extent of divided boxes.

 

            xc = (xmin + xmax) * 0.5 # Center X

 

            yc = (ymin + ymax) * 0.5 # Center Y

 

            extentList = [

 

                (xmin, ymin, xc, yc), # bottom-left

 

                (xc, ymin, xmax, yc), # bottom-right

 

                (xmin, yc, xc, ymax), # top-left

 

                (xc, yc, xmax, ymax) # top-right

 

            ]

 

            

 

            # Collect inside points for each box.

 

            pointsList = [[], [], [], []]

 

            for x, y, p in points:

 

                if y < yc:

 

                    if x < xc: # bottom-left

 

                        pointsList[0].append((x, y, p))

 

                    else: # bottom-right

 

                        pointsList[1].append((x, y, p))

 

                else:

 

                    if x < xc: # top-left

 

                        pointsList[2].append((x, y, p))

 

                    else: # top-right

 

                        pointsList[3].append((x, y, p))

 

            del points # save memory

 

            

 

            # Call the method for each box.

 

            for extent, points in zip(extentList, pointsList):

 

                self.createBoundingBox(extent, points)

 

-----
Badge
Many thanks.  I'll give it a go.
Userlevel 2
Badge +17
This is a custom transformer example wrapping an enhanced Python script.

 

RecursiveTiler2011 (https://drive.google.com/file/d/0B0ufVP2t0eApYlVESGVPdTNpNzg/view?usp=sharing)

 

Requirement: FME 2011 SP4+

 

FYI.
Badge
This is great, but... ;)

 

 

Can you explain the input settings?

 

 

It looks like the root tile is actually the second level of tiling?

 

 

I'm having an issue where the tiles are scaled up a factor outside of the input points accumulated bounding box.

 

 

e.g. the picture below has a set of points extracted from within a 10km square, and I set the root tile size to 5000 in H and W, however the output size is 15,000 m accross and it outputs 9 'root' tiles of 5 km squares. (if i set root to 10,000 it outputs 4 * 10 km squares = 20KM 'root' tile.)

 

 

The behaviour I was expecting (or need) was that the output tiles would cover a predefined box (the root tile) (maybe as an input feature?) and this root feature would be split if the points exceeds 'max no of unique points' value.

 

 

note: the extent of the root tile will be differnt to the total extent of the input point features as the tile is itself a 'job'.  (i.e. i'm tiling the UK into 10km chunks)

 

 

Userlevel 2
Badge +17
The transformer basically creates only tiles within minimum area covering incoming points. Unfortunately there isn't a option that can be used to create tiles outside of the minimum bounding box.

 

 

Possibly this version works as expected for your requirement.

 

Ver. 2 (https://drive.google.com/file/d/0B0ufVP2t0eApbld4QThYQkdueFk/view?usp=sharing)

 

Try setting Minimum X / Y, Maximum X / Y of Creation Area parameters.
Userlevel 2
Badge +17
If you set the "Seed Coordinate X / Y" parameter to bottom-left of the predefined tile, the transformer perhaps could generate preferable result.

 

The seed coordinate is something like "origin" that will be used to align the root tile boundaries. If you leave the parameter blank, the transformer treats xmin/ymin of incoming points as the seed. The behavior is similar to the same parameter of the existing Tiler.
Userlevel 2
Badge +17
ah, the Tiler didn't have the "Seed Coordinate X/Y" parameter in FME 2011.

 

Maybe it's the reason for the confusion.

 

 

The transformer calculates extent (xmin, ymin, xmax, ymax) of the entire tiling area automatically based on the "Root Tile Width/Height" (initial tile sizes at recursion starting) and "Seed Coordinate X/Y".

 

For example, if you set 5000 to "Root Tile Width" and 1000 to "Seed Coordinate X", xmin of the tiling area will be (N * 5000 + 1000). Here, N is an integer (may be negative), it will be determined so that the xmin will be less than and also closest to, or equal to minimum X of all the incoming points.

 

Equally, the ymin will be less than and also closest to, or equal to minimum Y of all the incoming points; the xmax/ymax will be greater than and also closest to, or equal to maximum X/Y of all the incoming points.

 

Result, the tiling area covers all the input points, and the width/height of the area will always be minimum multiple of "Root Tile Width/Height".

 

 

Therefore, if you set the "Seed Coordinate X/Y" to the origin that you used to perform the 10km tiling, maybe you can get preferable result.

 

In that case, "Mimimum/Maximum X/Y of Creation Area" of the version 2 will not be necessary. You may leave them blank.
Badge
Many thanks for your help with this Takashi.

 

 

I'll digest this info and take a closer look at the code/transformer (probably over christmas as the project i'm doing this for hasn't started yet).  It sounds like your code is being cleverer than my requirments as I want to preset the root tile (like v2 would seem to do), and create an 'index' over the points within that root tile.  I don't want to use the accumulated bounding box of the points as that doesn't define my area - although it may well in some other situations.

 

 

I'll post my findings.
Badge
Hi,

 

 

In fact, FME is not so good at such a recursive processing. Scripting (Python) could be a relatively quick solution.

 

This image is a result generated by an experimental script. The script divides recursively the bounding box by four until the number of inside points becomes less than 6.

 

 

 

If the result matches to your requirement, I can provide the script for your reference.

 

 

Takashi
Hi @takashi,

 

 

I've finally gotten around to continuing this work!

 

 

I'm using the recursive Tiler2011 transformer I think you sent me, which uses your script posted above - it's great, thankyou.

 

 

However, to optimise the process (circa 60 million records) i'm having to split the data up into 10KM chunks(or so), so I'd like the tile extent to match the chunk size exactly so that i can merge all my tiles together.

 

 

I think this means altering the seed options, but i'm struggling to work it out? Could you explain what the seed parameters do?
Badge
Hi,

 

 

In fact, FME is not so good at such a recursive processing. Scripting (Python) could be a relatively quick solution.

 

This image is a result generated by an experimental script. The script divides recursively the bounding box by four until the number of inside points becomes less than 6.

 

 

 

If the result matches to your requirement, I can provide the script for your reference.

 

 

Takashi
sorry @takashi, I've figured out i need to use 0 as my seed coordinates for the behaviour I require. I was using a non square set of features, which confused me.

 

 

Still interested about what you had in mind for the seed parameters though.
Userlevel 2
Badge +17
Hi,

 

 

In fact, FME is not so good at such a recursive processing. Scripting (Python) could be a relatively quick solution.

 

This image is a result generated by an experimental script. The script divides recursively the bounding box by four until the number of inside points becomes less than 6.

 

 

 

If the result matches to your requirement, I can provide the script for your reference.

 

 

Takashi
Hi @nrich, Good to hear that you accomplished the job.

 

Regarding the Seed Coordinate X/Y parameters, I intended that they work to adjust the left-bottom corner location of the tiles. If you set (0,0) to them, for example, the left-bottom coordinates will not have decimal places. If you left blank to them, the left-bottom of the resulting tile will be equal to the left-bottom of the bounding box covering all the input points.

 

This behavior is similar to the same parameters of the Tiler in FME 2013 or later. See also the help on the latest Tiler transformer.
Userlevel 2
Badge +17
Hi,

 

 

In fact, FME is not so good at such a recursive processing. Scripting (Python) could be a relatively quick solution.

 

This image is a result generated by an experimental script. The script divides recursively the bounding box by four until the number of inside points becomes less than 6.

 

 

 

If the result matches to your requirement, I can provide the script for your reference.

 

 

Takashi

sorry @nrich, correction.

If you left blank to the Seed parameters, the left-bottom will be equal to (Minimum X, Minimum Y).

Hello!

I am trying to use this python script which would be very helpful for me. I have a huge amount of data from a WFS which needs to be tiled in a good way. @takashi had a very good code, but I can’t get it to work in 2017.10.154 edition (the version I have).

I have a example file with more than 2000 points and I want tiles with 230 points each.

I have python installed on my computer and I am using the python caller.

I get error messages:

” FME Configuration: No destination coordinate system set. PythonFactory failed to load python symbol `FeatureProcessor'. Factory proxy not initialized f_4(PythonFactory): PythonFactory failed to process feature”

Any suggestions?

Reply