I've got a very big feature class containing more than 12,000,000 polygons, and I need to use TopologyBuilder and other transformers to do some processing, but TopologyBuilder takes far too long on the full dataset. So I came up with the idea of splitting the dataset into several smaller feature classes with fewer polygons each. Now I only need to run my workspace repeatedly through WorkspaceRunner, but WorkspaceRunner doesn't seem able to process the feature classes in the GDB one by one; I have to export the feature classes into separate GDBs to get it to work, and that export also takes a lot of time and makes my folder structure very messy.

So is there any way to iterate through the feature classes in a GDB? I'd appreciate any suggestion, thank you!

To partition your data you need a group id on the polygons, so first create one. Perhaps there is already a useful key; otherwise create a polygon index and add it to each polygon. Often this can just be a grid, with a cell size chosen to suit the edge cases of your processing. Assign each polygon to a cell by its centroid so that there are no overlaps.
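Here is a minimal sketch of the centroid-to-tile arithmetic (in FME this would typically be a CenterPointReplacer or CoordinateExtractor followed by an AttributeCreator; the cell size, origin and id format below are only illustrative):

```python
# Minimal sketch of the grid-index idea: derive a tile id from a polygon's
# centroid so every polygon lands in exactly one tile (no overlaps).

def tile_id(cx, cy, origin_x, origin_y, cell_size):
    """Return a grid tile id like 'R12_C7' for a centroid (cx, cy)."""
    col = int((cx - origin_x) // cell_size)
    row = int((cy - origin_y) // cell_size)
    return f"R{row}_C{col}"

# Example: a 10 km grid anchored at the dataset's lower-left corner.
print(tile_id(523750.0, 6412300.0,
              origin_x=500000.0, origin_y=6400000.0,
              cell_size=10000.0))   # -> 'R1_C2'
```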

Then you can run the WorkspaceRunner using the polygon index id. The id is passed into the child workspace as a published parameter, which is used in a query against the full database. You then either write each subset to its own output or append to an output whose name includes the tile id.
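As a rough illustration of what the WorkspaceRunner loop amounts to, here is a sketch that calls the child workspace from the command line once per tile id; the workspace path and the TILE_ID published parameter are hypothetical names you would replace with your own:

```python
import subprocess

# Hypothetical list of tile ids produced by the grid index above.
tile_ids = ["R0_C0", "R0_C1", "R1_C0", "R1_C1"]

for tid in tile_ids:
    # Equivalent of one WorkspaceRunner call: run the child workspace with the
    # tile id as a published parameter. Inside the child workspace the reader's
    # WHERE clause would be something like: tile_id = '$(TILE_ID)'.
    subprocess.run(
        ["fme", r"C:\fme\process_one_tile.fmw",   # hypothetical workspace path
         "--TILE_ID", tid],                        # hypothetical published parameter
        check=True,
    )
```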

I do this when using NeighborhoodFinder to limit the addresses to search; I also use a Group By on NeighborhoodFinder to restrict the search to addresses sharing each road name. If a process crashes due to faulty data it may be hard to detect, so I run a QA step afterwards to make sure every tile was successful.
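The QA step can be as simple as checking that every expected tile actually produced output. A sketch, assuming the child workspace writes one output per tile named out_<tile id>.gdb (that naming convention is just an assumption):

```python
import glob
import os

# Tiles we expect, taken from the grid index; outputs actually written on disk.
expected = {"R0_C0", "R0_C1", "R1_C0", "R1_C1"}
produced = {os.path.basename(p)[4:-4]                 # strip 'out_' and '.gdb'
            for p in glob.glob(r"C:\fme\output\out_*.gdb")}

missing = expected - produced
if missing:
    print("Tiles that did not finish:", sorted(missing))
else:
    print("All tiles completed.")
```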

This should run quickly because the tiles should run in parallel. You will see the CPU run at 100%!


Thank you for your suggestion, but I don't quite follow it; perhaps you can show me an example or some screenshots? Maybe I didn't describe the problem clearly: what I need is how to iterate through the feature classes in a GDB, not through the features in a feature class.


You don't need to split your feature class into several feature classes, but as kimo states you have to group your features into smaller sets. If you have an attribute 'groupAttribute', turn your TopologyBuilder into a custom transformer so you can use parallel processing, then use 'groupAttribute' as the Group By attribute.
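FME's parallel processing on a custom transformer is configured in the transformer dialog rather than in code, but conceptually it behaves like the following multiprocessing sketch: features are split by 'groupAttribute' and each group is processed independently. The feature data and the build_topology stand-in are purely illustrative:

```python
from collections import defaultdict
from multiprocessing import Pool

def build_topology(group):
    # Stand-in for the work TopologyBuilder does on one group of polygons.
    group_value, features = group
    return group_value, len(features)

if __name__ == "__main__":
    # Hypothetical features: (groupAttribute value, polygon geometry placeholder).
    features = [("A", "poly1"), ("A", "poly2"), ("B", "poly3"), ("B", "poly4")]

    groups = defaultdict(list)
    for group_value, geom in features:
        groups[group_value].append(geom)

    # Each group is processed independently, which is what the
    # Parallel Processing option on a custom transformer exploits.
    with Pool() as pool:
        for group_value, count in pool.map(build_topology, list(groups.items())):
            print(group_value, count)
```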

