Question

Optimising Clipper usage - Triggers in FME?


Hi All,

 

I'm having some problems with Clippers.

 

 

I have a set of features which I'm grouping together, and turning into a grid (BBoxAccumulator ->2D Gridder); I'm then using these features as Clippers against the original data (now clippees), using the same "grouping by".
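For anyone following along outside FME, here is a minimal Python sketch of the grouping-and-gridding idea described above. It is not the custom transformer itself: it assumes point features only (so "clipping" reduces to assigning each point to a grid cell), and the function names and the `cell_size` parameter are made up for illustration.

```python
import math
from collections import defaultdict

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def grid_shape(bbox, cell_size):
    """Number of (columns, rows) needed to cover bbox; always at least 1 x 1."""
    min_x, min_y, max_x, max_y = bbox
    n_cols = max(1, math.ceil((max_x - min_x) / cell_size))
    n_rows = max(1, math.ceil((max_y - min_y) / cell_size))
    return n_cols, n_rows

def clip_points_by_group(features, cell_size):
    """features: iterable of (group, x, y).

    For each group, build a grid over that group's bounding box and
    bucket the group's own points into the grid cells.
    Returns {(group, col, row): [(x, y), ...]}.
    """
    by_group = defaultdict(list)
    for group, x, y in features:
        by_group[group].append((x, y))

    result = defaultdict(list)
    for group, pts in by_group.items():
        bbox = bounding_box(pts)
        n_cols, n_rows = grid_shape(bbox, cell_size)
        min_x, min_y = bbox[0], bbox[1]
        for x, y in pts:
            # Clamp so points on the max edge fall into the last cell.
            col = min(int((x - min_x) // cell_size), n_cols - 1)
            row = min(int((y - min_y) // cell_size), n_rows - 1)
            result[(group, col, row)].append((x, y))
    return dict(result)
```

Note that the first loop has to buffer every feature before any grid can be built, which is the same blocking behaviour that makes the Clippers arrive late in the workspace.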

 

 

Because the same dataset is both Clipper and Clippee, and because the Clippers pass through two blocking transformers before they reach the Clipper (compared to none for the Clippees), I'm struggling to optimise the Clipper. There's enough data that I end up running out of RAM and FME starts "optimising" data to disk.

 

 

I can't use "Clippers First": no matter what combination of FeatureHolders I put before the Clippees, they always get there first.

 

I've tried using a Sorter, and then "Clippers First" and/or "Input is ordered by group", but the process then goes even slower, and uses slightly more RAM because of the Sorter.
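To illustrate why ordered input matters, here is a rough Python analogy (not FME code). It assumes that an "ordered by group" option lets a transformer release each group as soon as the group key changes, whereas unordered input forces it to hold everything until the end. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def process_ordered(features, handle_group):
    """features: iterable of (group, payload), already ordered by group.

    Only one group is held in memory at a time; each group is handled
    as soon as the next group's first feature is seen.
    """
    for group, members in groupby(features, key=itemgetter(0)):
        yield handle_group(group, [payload for _, payload in members])

def process_unordered(features, handle_group):
    """No ordering guarantee: every feature must be buffered before
    any group can safely be handled."""
    buffered = {}
    for group, payload in features:
        buffered.setdefault(group, []).append(payload)
    for group, payloads in buffered.items():
        yield handle_group(group, payloads)
```

The catch, as described above, is that producing the ordered stream (the Sorter) is itself a blocking step, so the memory saved downstream can be spent upstream.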

 

 

Any suggestions for how to optimise/do this?

 

 

Thanks,

 

Jonathan

10 replies

Hi,

 

 

As you've discovered, the Sorter can be rather expensive if you have lots of data, so I would avoid it in your case. If possible, consider replacing the reader with a SQLCreator and setting an ORDER BY clause instead. It is a lot quicker (make sure there's an index on the sort field, though). This would have the same effect as the Sorter without blocking your data flow, and you could set "Input is ordered by group" to yes.

 

 

You will also have to rethink your process to try and get the Clippers first, so that you can enable that option. It can make a big difference. I cannot tell you how to do that without knowing the workspace, though.

 

 

David

 

 
Hi David,

 

Thanks for your thoughts. Alas, the source dataset is a shapefile or File/Personal Geodatabase, so there's no option for ORDER BY.

 

 

If you want to see if you can rethink it, below (next post) is the custom transformer (my work blocks things like paste-bin, so I can't use that). It's entirely self-contained and fairly well documented. Input should be a vector dataset, probably in a Mercator projection (ideally British National Grid).

 

 

Thanks,

 

Jonathan

 

 
Hi Jonathan,

 

 

I sometimes use the InlineQuerier to sort a lot of features read from non-database datasets. Since the InlineQuerier creates a temporary database, it does take a certain amount of time. But the database is saved as files in the FME_TEMP folder, so it consumes almost no memory. As a pre-processing step before the clipping, it may be worth trying.
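The same trade-off (disk instead of RAM) can be sketched in Python with the standard `sqlite3` module. This is only an analogy for the approach, not a description of how the InlineQuerier works internally; the table name, column names and function name are invented for the example, and payloads are assumed to be plain text.

```python
import os
import sqlite3
import tempfile

def sorted_by_group_on_disk(features):
    """features: iterable of (group, payload_text).

    Spools the features into a temporary on-disk SQLite database and
    yields them back ordered by group (insertion order within a group),
    so the sort does not have to fit in RAM.
    """
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE features (grp TEXT, payload TEXT)")
        conn.executemany("INSERT INTO features VALUES (?, ?)", features)
        # Index on the sort field, as suggested earlier in the thread.
        conn.execute("CREATE INDEX idx_grp ON features (grp)")
        conn.commit()
        query = "SELECT grp, payload FROM features ORDER BY grp, rowid"
        for row in conn.execute(query):
            yield row
    finally:
        conn.close()
        os.remove(path)
```

The output is ordered by group, so it could feed a group-at-a-time consumer; whether the disk writes cost more than they save will depend on the data volume.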

 

 

Takashi

Hi Takashi,

 

Thanks for the suggestion. InlineQuerier crossed my mind after reading David's suggestion, but I suspect it'll be quite slow given it's writing to disk. I'll have to try it.

 

 

 

Incidentally, the custom transformer can be had from here if you or David want to take a look: https://transfer.hrwallingford.com/pkg?token=e7ec82ba-ca22-46e7-99ee-b8e1488ea682

 

 

Thanks,

Jonathan

Hi,

 

 

I encountered this issue when trying to set parallel processing via a parameter.

 

 

When, for instance, making an iterative NeighborFinder, all parameters can be set except parallel processing.

 

 

However, did you try to do this in the Navigator panel?

 

 

You can create a parallel process group-by attribute and then link the parallel process group-by to it in the Advanced section.

 

Like this:

 

 

 

 

If you don't link it, you get the macro access error.

 

 

This example is used in this:

 

 

 

to create this:

 

 

 
...alas, you cannot parametrize the level of parallel processing.
Badge
Hi Gio,

 

Interesting. I had tried setting parallelism as a parameter and got the "undefined macro" error you're reporting. It's an interesting solution, though in my experience things mostly don't go faster with parallelism, and it didn't help much when I tried it.

 

Cheers,

 

Jonathan
Userlevel 4
Jonathan,

 

 

For what it's worth, your experiences with parallel processing mirror mine. I have yet to find a compelling case where it makes a noticeable difference. It's a pity, really, as in principle it holds great promise.

 

 

David
Badge +3
Oh, but I have found that.

 

 

I have a workbench to read and process height data.

 

 

The trick is to set up a tiling strategy.

 

This tiling must be upheld (consistently) through all the streams; then, wherever there is a transformer that supports parallel processing, you can use this tiling to set a processing level.

 

 

Parallel processing does not do much if it's like "Hey! I can set it here... well, let's."

 

It is only useful when a workbench-wide strategy is set and upheld.

 

 

In the Task Manager you can see all cores go up, so it works neatly.

 

 

Lots of workbenches don't lend themselves to this, so they cannot benefit from it.
Badge
Hi Gio,

 

This custom transformer actually is a tiling strategy - I basically have the given custom transformer subsetting a dataset (using this clipper) into ever smaller grid squares.

 

 

While the CPU usage certainly does rocket in some situations (and I have 16 cores to play with here), unfortunately this is offset by the overhead of starting and stopping all those processes and of transferring the data, which is too much to make it worthwhile.

 

 

Furthermore, CPU use only rises in some cases. If I'm using "Input is Ordered by group" (which I've determined is the optimum for this situation) then the child process finishes so quickly there's no use of multiple cores.

 

Even with that set to no, I'm only seeing use of two cores because the benches finish so quickly.

 

 

From my testing, I think parallel processing works best for a smaller number of group-by features that are performing a task that requires lots of CPU power.

 

But after reasonably exhaustive testing, what I'm seeing here is a 20-50% slowdown when using any level of parallelism.
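A toy cost model makes the "few heavy groups versus many light groups" intuition concrete. This is a deliberately crude sketch: it assumes one child process per group, a fixed start-up/transfer overhead per process, and perfect scheduling across cores. All the numbers below are invented for illustration and are not measurements from this workspace.

```python
import math

def serial_seconds(n_groups, work_per_group):
    """Total time when every group is processed in a single process."""
    return n_groups * work_per_group

def parallel_seconds(n_groups, work_per_group, overhead_per_process, cores):
    """Total time when each group gets its own child process.

    Groups run in 'waves' of at most `cores` processes; every process
    pays a fixed overhead (start-up, data transfer, shutdown).
    """
    waves = math.ceil(n_groups / cores)
    return waves * (overhead_per_process + work_per_group)

# Hypothetical scenarios (illustrative values only).
scenarios = {
    "many light groups": dict(n_groups=1000, work_per_group=0.01),
    "few heavy groups": dict(n_groups=16, work_per_group=60.0),
}

for name, s in scenarios.items():
    serial = serial_seconds(**s)
    parallel = parallel_seconds(overhead_per_process=0.5, cores=16, **s)
    verdict = "faster" if parallel < serial else "slower"
    print(f"{name}: parallel is {verdict} than serial")
```

Under these assumptions, the per-process overhead dominates when each group finishes almost instantly, which is consistent with the slowdown described for the tiling transformer.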

 

 

Cheers,

 

Jonathan
