
I'm using AreaGapAndOverlapCleaner to fill in holes created within or on the edge of polygons in a dataset of 1.7 million polygons. This takes some time (12 hours for the whole Workspace, although there are other transformers in there too). I'm considering using Group By in AreaGapAndOverlapCleaner, but I don't know whether gaps between polygons in different groups would still be cleaned.

For example, polygon 1 is in group A and polygon 2 is in group B. If there's a gap between these polygons, and I'm using Group By in AreaGapAndOverlapCleaner, will the gap be filled in?

This is on my to do list of experiments but if anyone knows the answer straight away, please let me know :-)

BTW "AreaGapAndOverlapCleaner" is not available as a topic to tag this post with...

No, it won't, unless there also happens to be a gap between polygon 1 and polygon 3, both in group A.
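
To make the grouping behaviour concrete, here is a toy sketch in Python. It is not FME's algorithm: polygons are simplified to 1D intervals along a line, and the function names, the `group` attribute and the tolerance value are all made up for illustration. The only point is that when each group is cleaned independently, only neighbours inside the same group are ever compared.

```python
from collections import defaultdict

def find_gaps(features, tolerance):
    """Return (left_id, right_id) for neighbouring intervals separated by
    a gap wider than 0 and no wider than `tolerance`."""
    ordered = sorted(features, key=lambda f: f["start"])
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        width = right["start"] - left["end"]
        if 0 < width <= tolerance:
            gaps.append((left["id"], right["id"]))
    return gaps

def find_gaps_grouped(features, tolerance, key):
    """Same check, but run separately per group (a stand-in for Group By)."""
    groups = defaultdict(list)
    for f in features:
        groups[f[key]].append(f)
    gaps = []
    for members in groups.values():
        gaps.extend(find_gaps(members, tolerance))
    return gaps

# Hypothetical data: intervals stand in for polygons, integer units.
features = [
    {"id": 1, "group": "A", "start": 0,   "end": 100},
    {"id": 2, "group": "B", "start": 102, "end": 200},
    {"id": 3, "group": "A", "start": 201, "end": 300},
    {"id": 4, "group": "A", "start": 303, "end": 400},
]

ungrouped = find_gaps(features, tolerance=5)
grouped = find_gaps_grouped(features, tolerance=5, key="group")
print(ungrouped)
print(grouped)
```

In this toy data the gap between polygon 1 (group A) and polygon 2 (group B) is only seen by the ungrouped run. The grouped run still finds the gap between polygons 3 and 4, because both are in group A.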


Thought so. So my next idea is some sort of iterative cleaning process where Group By or a child Workspace is used on the first pass. Then, on the second pass, something works out which polygons still have holes in them, or gaps between them and other polygons, and sends only those to the cleaning process. But it might just end up taking as much time as the basic process outlined above. Maybe this is one scenario where I just have to wait...


I think if you split your process into batches and process each batch via a workspace runner, running 7 concurrent processes, you'd be able to improve the speed considerably. Although it would then require a further cleanup process at the end, I'd still expect it to be quicker.

I've used this sort of workflow in the past to detect overlaps between 2.3 million polygons covering the whole of England. It took about 40 minutes.
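
As a rough sketch of how the batching could be set up, the Python below assigns each feature to a square grid tile by the centre of its bounding box and flags features that come within a margin of their tile's edge, since those are the ones whose neighbours may sit in a different batch. This is one possible scheme under assumptions of my own (the function name, tile size and margin are hypothetical), not necessarily the workflow described above, and it does not call any FME API. Each batch would then be handed to its own run, with the flagged features sent through a final cleanup pass.

```python
from collections import defaultdict

def assign_batches(bboxes, tile_size, margin):
    """Group feature ids by grid tile and flag features near a tile edge.

    bboxes: dict of feature id -> (xmin, ymin, xmax, ymax)
    Returns (batches, boundary_ids).
    """
    batches = defaultdict(list)
    boundary_ids = set()
    for fid, (xmin, ymin, xmax, ymax) in bboxes.items():
        # Use the bounding-box centre so each feature lands in exactly one batch.
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        col, row = int(cx // tile_size), int(cy // tile_size)
        batches[(col, row)].append(fid)
        # Extent of the tile this feature was assigned to.
        tx0, ty0 = col * tile_size, row * tile_size
        tx1, ty1 = tx0 + tile_size, ty0 + tile_size
        # Conservative flag: anything within `margin` of a tile edge
        # (or crossing it) goes to the final cleanup pass.
        if (xmin - margin < tx0 or ymin - margin < ty0
                or xmax + margin > tx1 or ymax + margin > ty1):
            boundary_ids.add(fid)
    return dict(batches), boundary_ids

# Hypothetical bounding boxes in map units.
bboxes = {
    1: (10, 10, 40, 40),
    2: (60, 20, 99.5, 50),    # ends just short of the tile edge at x = 100
    3: (100.3, 20, 140, 50),  # starts just past that same edge
    4: (150, 150, 180, 180),
}

batches, boundary_ids = assign_batches(bboxes, tile_size=100, margin=1)
print(batches)
print(sorted(boundary_ids))
```

Here features 2 and 3 end up in different batches but face each other across a tile edge, so both are flagged for the final pass, while features 1 and 4 are not. The flag is deliberately conservative: a feature near the outer edge of the whole dataset would also be flagged even if it has no neighbour there.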


Good idea. How do you select just the remaining gaps to clean up? Or do you run the whole dataset through a single Workspace and it doesn't take as long because a lot of the gaps have already been cleaned?


No, it won't, unless there also happens to be a gap between polygon 1 and polygon 3, both in group A.

I have verified this answer with some test data, but the real data I'm working with doesn't seem to have this problem, at least according to the analysis I've done using this article:

https://knowledge.safe.com/articles/55275/data-qa-identifying-slivers-overlaps-and-gaps-in-p.html

