In your experience, would splitting the feature flow and directing it into multiple, simultaneous SurfaceDraper transformers significantly decrease the processing time for a workspace, all else being equal? Let me explain what I am dealing with and why I ask ...

[Screenshot: SurfaceDraper bottleneck]

I have 730K points (center points of square grid cells) and 2.1K triangular 3D polygons that overlap each other in both 2D and 3D. For each point, I must identify the triangles it intersects (in 2D), find the triangle that is lowest at that point's location, and extract that triangle's z-value as an attribute. For that, I use a SurfaceDraper (see the screenshot above). Points and triangles go through a PointOnAreaOverlayer that outputs both points and areas, and those two flows are directed through ListExploders to end up with two matched sets of equal size as input to a single SurfaceDraper transformer.

Functionally, the workspace works fine, but the processing time is massive. The SurfaceDraper appears to be the bottleneck. The point-on-triangle input to the SurfaceDraper can't be any simpler. So, I am wondering if splitting the two input flows to the SurfaceDraper into multiple pairs of flows, which are then directed into multiple SurfaceDraper transformers, would significantly decrease overall processing time for the workspace ... or is that just a parallel processing pipe dream?

The processing rate also appears to be slowing down over time, which I do not understand. It took about 15 minutes to get all 2M matched inputs to the two Sorters just before the SurfaceDraper. It then took about 7.5 hours for the SurfaceDraper to process 800K input pairs. Now, it has taken the balance of the 22-hour runtime so far to process the next 300K input pairs. This really makes no sense to me unless it is a memory/disk swap space thing (which it may be). Thoughts?
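For anyone following along, the per-point calculation described above (z of each overlapping triangle at the point's x/y, then keep the lowest) can be sketched in plain Python. This is only an illustration of the geometry, not how the SurfaceDraper works internally; the function names `z_on_triangle` and `lowest_z` are made up for this example, and it assumes each triangle is given as three (x, y, z) vertices.

```python
from typing import Iterable, Optional, Tuple

Point2D = Tuple[float, float]
Vertex = Tuple[float, float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


def z_on_triangle(p: Point2D, tri: Triangle, eps: float = 1e-12) -> Optional[float]:
    """Return the triangle's z at p (hypothetical helper), or None if p is
    outside the triangle in 2D or the triangle is degenerate in 2D."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    px, py = p
    # Denominator of the barycentric weights (twice the signed 2D area).
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if abs(det) < eps:
        return None  # vertical or collapsed triangle: no unique z at p
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -eps:
        return None  # point lies outside the triangle's 2D footprint
    # Interpolate z linearly across the triangle's plane.
    return w1 * z1 + w2 * z2 + w3 * z3


def lowest_z(p: Point2D, triangles: Iterable[Triangle]) -> Optional[float]:
    """Return the lowest z among all triangles covering p, or None."""
    zs = []
    for tri in triangles:
        z = z_on_triangle(p, tri)
        if z is not None:
            zs.append(z)
    return min(zs) if zs else None


if __name__ == "__main__":
    flat = ((0.0, 0.0, 10.0), (4.0, 0.0, 10.0), (0.0, 4.0, 10.0))
    sloped = ((0.0, 0.0, 0.0), (4.0, 0.0, 4.0), (0.0, 4.0, 8.0))
    print(lowest_z((1.0, 1.0), [flat, sloped]))
```

Each point/triangle pair is independent of every other pair here, which is why the work looks like it should split cleanly into batches; whether that helps inside a single workspace is a separate question.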

 

In the SurfaceDraper (assuming you use Group Processing) the Group By Mode is set to Process When Group Changes, correct?

 

I don't think splitting the stream across multiple SurfaceDrapers will help, because Workbench runs a workspace sequentially, not in parallel.

The only way I see parallel workflow processing working is with WorkspaceRunners. But this might generate a lot of I/O, so it is always hard to find the balance between time spent on processing and time spent on I/O.

 

If you see "...optimizing memory..." in the log, Workbench is swapping. Personally, when I see this in a log file, I stop the run and start redesigning the workflow.

