Question

AreaGapAndOverlapCleaner with Group By - what happens to gaps between polygons in different groups?

  • December 19, 2018
  • 5 replies
  • 19 views

tim_wood
Contributor

I'm using AreaGapAndOverlapCleaner to fill in holes created within or on the edge of polygons in a dataset of 1.7 million polygons. This takes some time (12 hours for the whole Workspace although there are other transformers in there). I have wondered about using Group By in AreaGapAndOverlapCleaner but I don't know whether this would mean that gaps between polygons in different groups would be cleaned or not.

For example, polygon 1 is in group A and polygon 2 is in group B. If there's a gap between these polygons, and I'm using Group By in AreaGapAndOverlapCleaner, will the gap be filled in?

This is on my to do list of experiments but if anyone knows the answer straight away, please let me know :-)

BTW "AreaGapAndOverlapCleaner" is not available as a topic to tag this post with...

5 replies

jdh
Contributor
  • December 19, 2018

No, it won't, unless there also happens to be a gap between polygon 1 and polygon 3, both in group A.
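To illustrate the grouping behaviour described above, here is a toy sketch in one dimension, with integer intervals standing in for polygons. It is not how AreaGapAndOverlapCleaner is implemented; the function names and the `tolerance` parameter are assumptions made up for this example. It only shows why a gap whose two neighbours sit in different groups is never seen when each group is processed on its own.

```python
from collections import defaultdict

def find_gaps(intervals, tolerance):
    """Return gaps narrower than `tolerance` between neighbouring intervals.

    `intervals` are (start, end) pairs standing in for polygons (toy model).
    """
    ordered = sorted(intervals)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        width = next_start - prev_end
        if 0 < width <= tolerance:
            gaps.append((prev_end, next_start))
    return gaps

def find_gaps_grouped(features, tolerance):
    """Look for gaps separately inside each group, as a Group By would."""
    by_group = defaultdict(list)
    for group, interval in features:
        by_group[group].append(interval)
    return {g: find_gaps(ivs, tolerance) for g, ivs in by_group.items()}

# Hypothetical data: polygons 1 and 3 are in group A, polygon 2 is in group B.
features = [
    ("A", (0, 10)),   # polygon 1
    ("B", (11, 20)),  # polygon 2, small gap after polygon 1
    ("A", (21, 30)),  # polygon 3, small gap after polygon 2
]

ungrouped = find_gaps([iv for _, iv in features], tolerance=2)
grouped = find_gaps_grouped(features, tolerance=2)
print(ungrouped)  # both small gaps are found
print(grouped)    # no group sees a gap small enough to clean
```

In this toy data, group A only sees polygons 1 and 3, so the space between them looks like one wide hole that exceeds the tolerance, and group B has a single polygon with no neighbour at all.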


tim_wood
Contributor
  • Author
  • December 20, 2018

Thought so. So my next idea is some sort of iterative cleaning process where Group By or child Workspace is used on the first pass. Then on the second pass, something that can work out which polygons still have holes in them or gaps between them and other polygons and only send those to the cleaning process. But it might just end up taking as much time as the basic process outlined above. Maybe this is one scenario where I just have to wait...


ebygomm
Influencer
  • December 20, 2018

I think if you split your process into batches and process each batch via a workspace runner, running 7 concurrent processes, you'd be able to improve the speed considerably. Although it would then require a further cleanup process at the end, I'd still expect it to be quicker.

I've used this sort of workflow in the past to detect overlaps between 2.3 million polygons covering the whole of England. It took about 40 minutes.
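As a rough sketch of the batching idea above, outside FME: the snippet below assigns features to square tiles by centroid and hands each tile to a pool of 7 workers. Everything here is an assumption for illustration, including the tile-based split, the `clean_batch` stand-in (which only counts features rather than running a child workspace), and the sample centroids. It does not use the WorkspaceRunner API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def batch_by_tile(centroids, tile_size):
    """Group feature ids by the square tile that contains their centroid."""
    batches = defaultdict(list)
    for feature_id, (x, y) in centroids.items():
        tile = (int(x // tile_size), int(y // tile_size))
        batches[tile].append(feature_id)
    return dict(batches)

def clean_batch(item):
    """Hypothetical stand-in for running one child workspace on one batch."""
    tile, feature_ids = item
    return tile, len(feature_ids)  # pretend result: features processed

def run_batches(batches, max_workers=7):
    """Process all batches with at most `max_workers` running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(clean_batch, batches.items()))

# Hypothetical centroids keyed by feature id.
centroids = {1: (5, 5), 2: (15, 5), 3: (7, 8), 4: (25, 25)}
batches = batch_by_tile(centroids, tile_size=10)
results = run_batches(batches)
print(results)
```

Splitting spatially keeps neighbouring polygons in the same batch where possible, so only gaps that straddle a tile boundary are left for the cleanup pass at the end.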


tim_wood
Contributor
  • Author
  • December 20, 2018
ebygomm wrote:

I think if you split your process into batches and process each batch via a workspace runner, running 7 concurrent processes, you'd be able to improve the speed considerably. Although it would then require a further cleanup process at the end, I'd still expect it to be quicker.

I've used this sort of workflow in the past to detect overlaps between 2.3 million polygons covering the whole of England. It took about 40 minutes.

Good idea. How do you select just the remaining gaps to clean up? Or do you run the whole dataset through a single Workspace and it doesn't take as long because a lot of the gaps have already been cleaned?


tim_wood
Contributor
  • Author
  • June 21, 2019
jdh wrote:

No, it won't, unless there also happens to be a gap between polygon 1 and polygon 3, both in group A.

I have verified this answer with some test data, but the real data I'm working with doesn't seem to have this problem, at least according to the analysis I've done using this article:

https://knowledge.safe.com/articles/55275/data-qa-identifying-slivers-overlaps-and-gaps-in-p.html

