Simple setup:

A large number of Line features (millions) needs to be overlaid with a set of Area features (about 25,000), so a simple LineOnAreaOverlayer is used. This process is very slow, and I am running into memory problems even with 64-bit FME. I have an attribute on the Line features that divides them into about 250 groups, so I thought this was a case for parallel processing (which I have limited experience with).

My problem is with the Areas. I want to process each group of Line features against the same set of Area features, that is, read the Area features once and re-use them for every batch. I cannot think of a good way to accomplish this, even with a custom transformer. The Area features do not have the group attribute, and for various reasons each batch of Line features must be intersected with all Area features.

The only (fairly inelegant) solution I could come up with is to cross-join the Area features with the list of groups using a FeatureMerger, but that explodes my Area features into the millions.
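For scale: with about 25,000 areas and about 250 groups, the cross-join creates 25,000 × 250 = 6,250,000 area features. The Python sketch below uses plain dictionaries as stand-ins for FME features (it is not FME's API, and the attribute names `area_id`, `line_id` and `group` are hypothetical). It contrasts that duplication with bucketing the lines by group and pairing every bucket with one shared list of areas, which is the kind of re-use the question is after.

```python
from collections import defaultdict

def cross_join(areas, groups):
    # What the FeatureMerger workaround does: one copy of every area per group.
    return [dict(area, group=g) for g in groups for area in areas]

def group_batches(lines, areas, key="group"):
    # Bucket the lines by their (hypothetical) group attribute; every batch is
    # paired with the *same* areas list object, so no area is duplicated.
    buckets = defaultdict(list)
    for line in lines:
        buckets[line[key]].append(line)
    for group, batch in buckets.items():
        yield group, batch, areas

areas = [{"area_id": i} for i in range(25_000)]
groups = range(250)
lines = [{"line_id": i, "group": i % 250} for i in range(10_000)]

# Size of the full cross-join, computed rather than built.
print(len(areas) * len(groups))  # 6250000
print(len(cross_join(areas[:4], range(3))))  # 12 copies from 4 areas x 3 groups

batches = list(group_batches(lines, areas))
print(len(batches), all(a is areas for _, _, a in batches))  # 250 True
```

Only the line batches differ between groups; the areas are referenced, never copied.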

Any other ideas?

Regards
Have you tried using the Clipper transformer instead of the LineOnAreaOverlayer? The Clipper has a `Clipper Type: Clippers First` option. If you can load the areas into the Clipper first (change the order of the readers in the Navigator to have the areas read first; if the lines and areas are both in the same dataset, try using 2 separate readers), then the memory overhead will be lower.
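The streaming idea behind Clippers First can be sketched outside FME: all clippers are loaded before any candidate arrives, and each candidate line is then clipped and passed on immediately instead of being held until the end. The Python below only illustrates that pattern and is not FME's implementation. It assumes axis-aligned rectangles in place of real area polygons and single straight segments in place of polylines, and it clips with the Liang-Barsky algorithm.

```python
from typing import Iterable, Iterator, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def clip_segment(seg: Segment, rect: Rect) -> Optional[Segment]:
    """Liang-Barsky: return the part of seg inside rect, or None."""
    (x0, y0), (x1, y1) = seg
    xmin, ymin, xmax, ymax = rect
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
        elif p < 0:
            t0 = max(t0, q / p)  # where the segment enters the rectangle
        else:
            t1 = min(t1, q / p)  # where the segment leaves the rectangle
        if t0 > t1:
            return None
    if t0 == t1:
        return None  # only touches the rectangle at a single point
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))

def clip_stream(areas: list, lines: Iterable) -> Iterator:
    # The areas are a fully built list before the first line is read; the
    # lines are consumed one at a time and each result is yielded right away,
    # so no line has to be buffered.
    for line_id, seg in lines:
        for area_id, rect in areas:
            piece = clip_segment(seg, rect)
            if piece is not None:
                yield line_id, area_id, piece

areas = [("A", (0, 0, 10, 10)), ("B", (10, 0, 20, 10))]
lines = iter([("L1", ((-5, 5), (15, 5))), ("L2", ((30, 30), (40, 40)))])
for result in clip_stream(areas, lines):
    print(result)
# ('L1', 'A', ((0.0, 5.0), (10.0, 5.0)))
# ('L1', 'B', ((10.0, 5.0), (15.0, 5.0)))
```

Peak memory in this sketch is the area list plus one line at a time, which is why the areas have to arrive first.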
Thanks, I will definitely give this a try and let you know. I did not think of this!

I completely agree with @ryancragg's approach above and think it is a great way to solve the problem.

However, I wanted to plant the idea that you could make a custom transformer containing a FeatureReader that reads the areas. That way, every time the custom transformer is fired up (in a parallel processing situation), the areas would be read first and each instance would effectively get its own copy of them.

If there were some way to query your line features by "group" directly from their source (what format are they in?), then a second FeatureReader in the same custom transformer could read just that subset. That way all the processing could run in parallel: the main workspace would send in one feature per group, and each of those features would trigger the custom transformer to read that group's lines.
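A rough Python sketch of that layout, with loud assumptions: `read_areas()` and `read_lines(group)` are hypothetical stand-ins for the two FeatureReaders (the second one filtering by group, for example with a WHERE clause if the source supports it), and a bounding-box overlap test stands in for the actual clip. FME's parallel processing runs groups in separate processes; a thread pool is used here only to keep the sketch short and self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two data sources.
AREAS = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}
LINES = [
    {"id": 1, "group": "g1", "bbox": (1, 1, 2, 2)},
    {"id": 2, "group": "g1", "bbox": (12, 1, 13, 2)},
    {"id": 3, "group": "g2", "bbox": (50, 50, 60, 60)},
]

def read_areas():
    # Plays the role of the FeatureReader for areas: every invocation of the
    # "custom transformer" gets its own copy of all areas.
    return dict(AREAS)

def read_lines(group):
    # Plays the role of a FeatureReader filtered on the group value,
    # so only one group's lines are loaded.
    return [line for line in LINES if line["group"] == group]

def boxes_overlap(a, b):
    # Axis-aligned bounding-box test, a placeholder for the real clip.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def process_group(group):
    areas = read_areas()
    hits = []
    for line in read_lines(group):
        for area_id, box in areas.items():
            if boxes_overlap(line["bbox"], box):
                hits.append((line["id"], area_id))
    return group, hits

# The main workspace only sends one trigger feature per group.
groups = sorted({line["group"] for line in LINES})
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(process_group, groups))
print(results)  # {'g1': [(1, 'A'), (2, 'B')], 'g2': []}
```

Each call reads its own copy of the areas, trading repeated reads for never needing a group attribute on the areas.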

Could be an interesting scenario to mock up anyway.


Thanks for the help. I never clearly understood the implications of the Clippers First option in conjunction with Group By, but of course this can be set up to clip features in batches against the same set of areas, as you suggested. By collecting the features from both the Inside and Outside ports I can mimic the LineOnAreaOverlayer I had initially. I ran my workspace and 24 hours later I am happy: clipping took about 4 hours instead of bombing out after 12 with the LineOnAreaOverlayer. Thanks 1,000,000.
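Why collecting both ports can reproduce the overlay is easiest to see with distances along a single line. The sketch assumes non-overlapping areas and represents each clipped piece as a hypothetical `(start, end, area_id)` range along the line. The outside pieces are simply the gaps between the inside pieces, and the two sets together cover the whole line, split at the area boundaries.

```python
def outside_pieces(inside, length):
    # inside: (start, end, area_id) ranges along one line, as Inside-port
    # output might describe them; returns the gaps, i.e. the parts of the
    # line outside every area.
    gaps, cursor = [], 0.0
    for start, end, _ in sorted(inside, key=lambda piece: piece[0]):
        if start > cursor:
            gaps.append((cursor, start, None))  # None: no area attributes
        cursor = max(cursor, end)
    if cursor < length:
        gaps.append((cursor, length, None))
    return gaps

def overlay(inside, length):
    # Inside pieces plus outside pieces, ordered along the line.
    return sorted(inside + outside_pieces(inside, length), key=lambda piece: piece[0])

pieces = overlay([(2.0, 5.0, "A"), (5.0, 7.5, "B")], 10.0)
print(pieces)
# [(0.0, 2.0, None), (2.0, 5.0, 'A'), (5.0, 7.5, 'B'), (7.5, 10.0, None)]
```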
