I have a process I need to run on point clouds. In short, I tile them, sort the tiles, and remerge the tiles back into a single cloud. This is for compatibility reasons with a piece of software that doesn't like clouds that are not well-sorted spatially.
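
Outside FME, the spatial sort I'm after is conceptually something like the sketch below (plain NumPy, purely illustrative; the tile size and the assumption that the cloud is an N×3 array are mine, not part of the workspace):

```python
# Conceptual sketch only: tile a point cloud on a 2D grid, order the tiles,
# and concatenate the points back so that points near each other in space
# end up near each other in the output.
import numpy as np

def spatially_sort(points: np.ndarray, tile_size: float = 10.0) -> np.ndarray:
    """points is an (N, 3) array of x, y, z coordinates."""
    # Assign each point to a grid cell (tile) based on its x/y position.
    tiles = np.floor(points[:, :2] / tile_size).astype(np.int64)
    # Sort by tile row, then tile column (scanline order); this collapses the
    # "tile, sort, remerge" steps into a single lexicographic sort.
    order = np.lexsort((tiles[:, 0], tiles[:, 1]))
    return points[order]

if __name__ == "__main__":
    cloud = np.random.rand(1_000_000, 3) * 100.0
    sorted_cloud = spatially_sort(cloud, tile_size=5.0)
```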


The workspace is fairly simple: input, Tiler, Sorter, PointCloudCombiner, output. It works well and does what I want when run on a single input cloud.


However, I need to run it in bulk on large numbers of clouds, and performance suffers. I realized this was because it was tiling every input cloud before sorting the tiles and assigning them back to the combined clouds, so I added Group Bys. Then I changed the Group Bys to "Process When Group Changes" to keep the data moving.


But with that setting, each step holds another group of data in process, so instead of keeping only one cloud in memory at a time, it keeps three. If I wrap my steps in a custom transformer and copy it as many times as I have input clouds, each copy runs one cloud at a time and overall performance is considerably better.


However, I can't make enough copies of the custom transformer for any task I might encounter in the future.


So my question is: how can I control the feature flow so that when a new feature enters the pipeline, it is cleared all the way through the chain of transformers and out to the writer before the next feature is loaded into memory? Or is that even possible? I thought about using the WorkspaceRunner, but that seems overly complicated.

I also think the WorkspaceRunner is the way to go, and it is not complicated; it is actually quite easy: a parent workspace to divide the workload, and one or more child workspaces to run the tasks.
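
If you prefer scripting the parent side instead of the WorkspaceRunner transformer, the same parent/child idea can be sketched with fmeobjects' FMEWorkspaceRunner. The paths and the published parameter name below are placeholders; substitute whatever your child workspace actually exposes:

```python
# Hedged sketch: a parent script launches the existing tile/sort/combine
# workspace once per input cloud, so only one cloud is in flight at a time.
import glob
import fmeobjects

runner = fmeobjects.FMEWorkspaceRunner()
for cloud_path in glob.glob(r"C:\data\clouds\*.las"):
    # Each call runs the child workspace to completion for a single cloud
    # before the loop moves on to the next one.
    runner.runWithParameters(
        r"C:\workspaces\tile_sort_combine.fmw",
        {"SourceDataset": cloud_path},  # assumed published parameter name
    )
```

The WorkspaceRunner transformer does the equivalent inside a parent workspace: read the list of input datasets, then pass each one to the child workspace as a parameter.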

