
I have a couple of workbenches that process bulk updates each night. The point data is in NAD, and the users would like the eastings/northings for the relevant state plane zone added as attributes.

I created the attached transformer. I feed in the lat/lons and aggregate them by rounded coordinate, so that the next transformer can run with group-by parallel processing. That transformer intersects the points with the state plane polygons and then projects them to E/N before converting back to NAD.

It's still slow - anyone have any ideas how to speed this up or do it differently?

First quick ideas: be sure the PointOnAreaOverlayer has Areas First set to Yes. That should allow your points to go through straight away. It feels like the FeatureHolder shouldn't be needed to hold back the points; I think the Creator should cause that FeatureReader to send the polygons in before the points arrive. That, to me, is the key to getting this thing moving faster.

Out of curiosity, how many points are involved?



About 5k per night, but sometimes we do an ad hoc bulk update, and that's 1 to 5 million.

Made that change and it did improve performance a bit, though it's still relatively slow. I was hoping spatial intersections would be fast.


If the purpose is just to determine if each point overlaps an area, the Clipper (Clipper Type: Clippers First) might be more efficient than the PointOnAreaOverlayer.


I can think of a lot of ways to vary this, though which is faster I'm not sure.

Firstly, you can use the points as the Initiator features for the FeatureReader, i.e. each point queries the polygon dataset with a spatial filter (Initiator is Within Result), just as the PointOnAreaOverlayer does. Just set Geometry to "Use Initiator" and it should keep the point geometry and merge in the attributes of the source polygons. That saves having a two-step process.

Secondly, you don't need to reproject there and then back again. You can just use the CoordinateExtractor to fetch the coords in NAD and use the AttributeReprojector transformer to reproject those values to State Plane. That, I think, should save you some time; and means you don't have to risk changes to your geometry by round-tripping through a reprojection process.
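To illustrate the idea in plain Python rather than FME, here is a minimal sketch of reprojecting coordinate values as attributes while leaving the geometry alone. The feature layout (a dict with a `geometry` tuple), the function names, and `toy_projection` are all assumptions made for illustration; `toy_projection` is a placeholder and not a real state plane transformation.

```python
from typing import Callable, Dict, Tuple

# A projection is any function mapping (lon, lat) to (easting, northing).
Projection = Callable[[float, float], Tuple[float, float]]


def add_state_plane_attributes(feature: Dict, project: Projection) -> Dict:
    """Return a copy of the feature with easting/northing attributes added.

    The geometry is read but never modified, so there is no
    reproject-and-reproject-back round trip.
    """
    lon, lat = feature["geometry"]         # extract the coordinates as plain values
    easting, northing = project(lon, lat)  # reproject the values only
    result = dict(feature)                 # shallow copy; the geometry tuple is immutable
    result["easting"] = easting
    result["northing"] = northing
    return result


def toy_projection(lon: float, lat: float) -> Tuple[float, float]:
    """Placeholder only: NOT a real state plane transformation."""
    return (lon + 100.0) * 1000.0, lat * 1000.0


point = {"id": 1, "geometry": (-97.5, 30.25)}
projected = add_state_plane_attributes(point, toy_projection)
```

In a real workflow the placeholder would be replaced by a proper coordinate transformation for the zone each point falls in.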

Thirdly, as Takashi says, the Clipper is almost certainly quicker than the PointOnAreaOverlayer - at least in my experience - so even if the above ideas don't help, switching to a Clipper should be a benefit. Note, it's not that the Overlayer is inherently slower at doing the same thing, but it does carry out additional functions (such as counting the number of overlaps, or generating lists) that the Clipper doesn't do, and which I don't think you need here.
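The difference between "just find the containing area" and "build a full overlay result" can be sketched in plain Python. This is only an illustration of the general idea, not how FME implements either transformer; the zone names, the square polygons, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: toggle on each edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def first_containing_zone(pt: Point, zones: Dict[str, Polygon]) -> Optional[str]:
    """Stop at the first zone that contains the point."""
    for name, poly in zones.items():
        if point_in_polygon(pt, poly):
            return name
    return None


def overlay_all_zones(pt: Point, zones: Dict[str, Polygon]) -> Tuple[int, List[str]]:
    """Test every zone, returning an overlap count and a list of hits."""
    hits = [name for name, poly in zones.items() if point_in_polygon(pt, poly)]
    return len(hits), hits


# Hypothetical, non-overlapping square zones.
zones = {
    "zone_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "zone_b": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
```

When zones do not overlap, both functions identify the same zone, but the first one does less work per point and builds no list.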

Edit: Some other ideas/thoughts I just had...

Are these points always the same? Or re-used frequently? What I'm getting at is that if there is any way to save the state plane zone (even if it's in a lookup table with a foreign key match) future processing will be much quicker than re-doing a spatial filter. Or, if these are addresses with address info stored as attributes, then you already know which state the address falls in, just not which zone. Basically spatial filtering is an expensive process in my experience, and if you can avoid it, or minimize it, then you're saving a lot of time.
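As a rough sketch of the lookup-table idea, assuming each point carries some stable key (an address ID, say), the zone can be stored the first time it is found and reused afterwards. The class name, the key values, and `toy_locate` are hypothetical; `toy_locate` merely stands in for the real spatial filter.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Point = Tuple[float, float]


class ZoneLookup:
    """Remember the zone found for each key so the spatial query runs once per key."""

    def __init__(self, locate: Callable[[Point], Optional[str]]) -> None:
        self._locate = locate
        self._zone_by_key: Dict[Hashable, Optional[str]] = {}
        self.spatial_queries = 0  # how many times the expensive step actually ran

    def zone_for(self, key: Hashable, pt: Point) -> Optional[str]:
        if key not in self._zone_by_key:
            self.spatial_queries += 1
            self._zone_by_key[key] = self._locate(pt)
        return self._zone_by_key[key]


def toy_locate(pt: Point) -> str:
    """Placeholder for the real spatial filter: splits zones at x = 0."""
    return "west_zone" if pt[0] < 0 else "east_zone"


lookup = ZoneLookup(toy_locate)
requests = [("addr-1", (-5.0, 1.0)), ("addr-2", (3.0, 2.0)), ("addr-1", (-5.0, 1.0))]
found = [lookup.zone_for(key, pt) for key, pt in requests]
```

Here the third request reuses the stored zone for `addr-1`, so only two spatial queries run. For this to help across nightly runs, the table would need to be persisted somewhere (a database table, for example) rather than held in memory.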

I'd also add that I don't believe Clippers First or Areas First is going to save you much time here. Generally they reduce memory use rather than time. The time aspect only comes into play when you use up so much memory that FME has to cache to disk. That's when an "xxxx First" setting can save time, by preventing disk caching. But for 5k point features? It shouldn't be an issue.



This is awesome, I never knew about the AttributeReprojector - thanks for that! The points are always different, unfortunately, but the AttributeReprojector and the Clipper both helped. Thanks!

