Given a workspace that uses FeatureReader, with a Geodatabase feature class as the "initiator", does anyone have a suggestion for how to process one Geodatabase feature (from the initiator) at a time? I have a large geographic area currently as the initiator and the model goes on to process millions and millions of features from dozens of feature classes (via FeatureReader), and is a slug!

 

I am hoping that by clipping the data that represents my initiator in the above description, and finding a way to make Workbench process only one feature at a time from a single geodatabase feature class, it can process the data in a more timely manner. I have looked into WorkspaceRunner but I can't find any material on how to use it to process features from within the same feature class.

 

Thank you

I'm no WorkspaceRunner expert, and a lot depends on the type of processing you want to do, but some pointers:

  • As a start, you can initiate a process, including a FeatureReader, using a Creator transformer. You don't have to use a classic reader per se.
  • You want to build the Child workspace first.
  • Use Published Parameters as input for the FeatureReader.
    • Example: a FeatureReader reading a feature class with a where clause [where objectid = $(objectid)]
  • Then build the Parent, add the WorkspaceRunner, point it to the Child, and connect the Child's Published Parameters to the Parent's attributes (see the sketch after these pointers).
  • Now read all the originals and send them one by one to the Child.
  • Running a WorkspaceRunner is black-box processing: no connections with feature counts as in Workbench. This is why I often keep an extra worklist table with the parent-fed objectids, start and finish timestamps, some input and output metrics, and a log concatenated from all rejected / failed output ports.
  • You can speed up the Child process by only reading the candidates which overlap with the base to begin with, using the FeatureReader's spatial filter.

But again, this is just from my limited experience. Probably someone else has a better answer.
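To make the Parent/Child mechanics a bit more concrete, here is a minimal sketch of what the WorkspaceRunner is effectively doing under the hood: launching one FME process per initiator feature and passing the objectid as a published parameter. The fme.exe path, child.fmw and the objectid parameter name are assumptions for illustration only; in practice the WorkspaceRunner handles this for you inside the Parent workspace.

```python
import subprocess

# Hypothetical paths and parameter names -- substitute your own setup.
FME_EXE = r"C:\Program Files\FME\fme.exe"
CHILD_WORKSPACE = r"C:\fme\child.fmw"

# In practice these objectids would come from reading the initiator feature class.
objectids = [101, 102, 103]

for oid in objectids:
    # One separate FME process per initiator feature; the published parameter
    # feeds the Child's FeatureReader where clause (objectid = <oid>).
    result = subprocess.run(
        [FME_EXE, CHILD_WORKSPACE, "--objectid", str(oid)],
        capture_output=True, text=True
    )
    # Minimal version of the worklist idea above: record success/failure per run.
    print(f"objectid {oid}: {'ok' if result.returncode == 0 else 'failed'}")
```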



@nielsgerrits​ That sounds like an elegant solution. I will give it a shot and report back. Thank you!


Cheers, I would say start small and do some performance testing. Sometimes it's just not worth all the hassle, and I really like the Workbench feedback when running a job.

I don't have enough experience to say where the optimum lies between one big workspace running everything at once versus parent/child jobs, with all the disadvantages that come with them.


So, firstly I would say turn off feature caching. That will be causing a tremendous slowdown with the amount of data that you have.

Secondly, I do wonder about the connection from AreaOnAreaOverlayer to Dissolver. Would you get the same result sending the data directly to the Dissolver? I see you'd need the AOAOverlayer anyway, because its output is going elsewhere as well. But I think the Dissolver is probably spending a lot of time reversing the overlay operation. So send the data to the AOAOverlayer, AND send it to the Dissolver as a separate stream that bypasses the overlay operation. I think the result of the Dissolver will be the same, but faster.

Next... I'm not sure what your FeatureReader parameters are, but as Niels mentions, you might be able to use a WHERE clause in there (spatial or attribute-based) to reduce the amount of data being read.

But overall, you have a number of group-based transformers strung together, and that always causes a bit of a slowdown. Turning off caching will help, but as you figure, the WorkspaceRunner might be an alternative. The results are the same, but you're running a separate process per group, rather than one process for all groups. That can speed things up, and I think Niels has given you a good starting point there.
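To make the "separate process per group" idea concrete, here is a rough sketch of launching a few child runs in parallel, loosely analogous to the WorkspaceRunner's concurrent-processes setting. The workspace path, the objectid parameter and the worker count are assumptions for illustration, not anything prescribed by FME.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Same hypothetical setup as the earlier sketch: child.fmw exposes a
# published parameter named "objectid".
FME_EXE = r"C:\Program Files\FME\fme.exe"
CHILD_WORKSPACE = r"C:\fme\child.fmw"

def run_child(oid):
    """Run one child FME process for a single group / initiator feature."""
    result = subprocess.run(
        [FME_EXE, CHILD_WORKSPACE, "--objectid", str(oid)],
        capture_output=True, text=True
    )
    return oid, result.returncode

objectids = [101, 102, 103, 104]  # one entry per group

# A handful of FME processes at a time, each group isolated in its own run,
# rather than one process holding all groups in memory at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    for oid, rc in pool.map(run_child, objectids):
        print(f"objectid {oid}: {'ok' if rc == 0 else 'failed'}")
```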

 

Hope this helps

 

Mark


More questions than answers really, but why does the FeatureReader need to read the same Data Source twice? Is there something in the Initiator Features that changes how the FeatureReader reads on each iteration?

 

Apart from that, AoAOverlayer can be quite problematic in performance, and very slow, when dealing with polygon data that does not have very good topology, i.e. lots of duplicate vertices, lots of "slivers" from very tiny overlaps, etc. For example, ESRI's default Geodatabase topology accuracy is 0.001 metres, but this accuracy creates all sorts of problems when those features which aren't quuuuiiiiiite snapped together start getting processed in FME, i.e. ArcGIS considers them "snapped" when vertices are <0.001 m apart, but FME does not (not by default).

 

I did find I significantly improved AoAOverlayer's run times, and significantly reduced the number of features output, by first putting all the polygons through GeometryValidator to "clean" them of duplicate nodes, consecutive nodes <0.001 m apart, self-intersections, etc., and then using AnchoredSnapper to snap all the polygon vertices to a "Reference" polygon layer as much as possible, with a tolerance of say 0.001 metres. This produced a much cleaner topological polygon dataset for AoAOverlayer to deal with.
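Purely to illustrate the snapping idea (not how AnchoredSnapper is implemented, which also handles edges, topology and spatial indexing), here is a tiny conceptual sketch of tolerance-based vertex snapping to a reference layer; the coordinates and the 0.001 m tolerance are just example values.

```python
import math

TOLERANCE = 0.001  # metres, matching the Geodatabase XY tolerance mentioned above

# Made-up coordinates: two "anchor" vertices from the reference layer, and a
# candidate polygon whose first two vertices are almost, but not quite, snapped.
reference_vertices = [(100.000, 200.000), (100.000, 210.000)]
candidate_polygon = [(100.0004, 200.0003), (100.0006, 209.9998), (105.0, 205.0)]

def snap(vertex, anchors, tol):
    """Return the nearest anchor within tol of vertex, else the vertex unchanged."""
    best, best_d = vertex, tol
    for ax, ay in anchors:
        d = math.hypot(vertex[0] - ax, vertex[1] - ay)
        if d <= best_d:
            best, best_d = (ax, ay), d
    return best

snapped = [snap(v, reference_vertices, TOLERANCE) for v in candidate_polygon]
print(snapped)
# [(100.0, 200.0), (100.0, 210.0), (105.0, 205.0)]
```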

 

This sped up AoAOverlayer by a factor of ~100, because otherwise it was taking hours and producing millions upon millions of polygons with poor topological noding. For whatever reason, the alternative of just setting the tolerance in AoAOverlayer was nowhere near as performant.

