Question

How to process dataset in batches?

  • April 22, 2016
  • 2 replies
  • 110 views

Hi,

I have a workspace that reads from a large spatial dataset, then runs a dissolve and aggregate before writing to a PostGIS database. This process can take a very long time, and I would like to run it in batches, e.g. take the first 1000 unique objects and run the process on those, then the next 1000, and so on. Any ideas, please?

I have looked at some of the batch processing documentation, but I'm not sure it would help here.

thanks


2 replies

patrick_koning
Contributor

I think a WorkspaceRunner would be helpful.

Get only the IDs of the objects with a SQLCreator, split them up into portions, and feed them to the workspace you created.
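A minimal sketch of that idea, driven from outside FME with a small script rather than a WorkspaceRunner transformer (the batching logic is the same either way). The workspace name, the WHERE_CLAUSE published parameter, the table and column names, and the connection string are all placeholder assumptions, not something from the original post:

```python
import subprocess
import psycopg2

BATCH_SIZE = 1000
WORKSPACE = "batch_dissolve.fmw"  # hypothetical child workspace

# 1. Get only the IDs, as suggested above (the SQLCreator step).
conn = psycopg2.connect("dbname=gis user=fme")
with conn, conn.cursor() as cur:
    cur.execute("SELECT id FROM source_table ORDER BY id")
    ids = [row[0] for row in cur.fetchall()]

# 2. Split them into portions of 1000 unique objects.
batches = [ids[i:i + BATCH_SIZE] for i in range(0, len(ids), BATCH_SIZE)]

# 3. Feed each portion to the workspace (the WorkspaceRunner step), assuming
#    the workspace filters its reader with a WHERE_CLAUSE published parameter.
for batch in batches:
    where = "id IN ({})".format(",".join(str(i) for i in batch))
    subprocess.run(["fme", WORKSPACE, "--WHERE_CLAUSE", where], check=True)
```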


mark2atsafe
Safer
  • April 22, 2016

The Dissolver transformer has a parallel processing mode, which would be just as good as a WorkspaceRunner.

But in either case, the problem you'll have is: what happens when two features should be dissolved together but fall into two separate batches? You'd have to run everything through a second time to make sure those polygons get dissolved.

To be honest, the better route might be to load all the data into PostGIS first and use the ST_Union function to dissolve it there. The performance might well be better.
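For reference, a minimal sketch of that route, assuming the raw features have already been loaded into a source_table with a geom column and a region attribute to group by (all of those names, and the connection string, are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=gis user=fme")
with conn, conn.cursor() as cur:
    # Dissolve inside the database: one ST_Union per group, in a single pass,
    # so nothing can end up split across two batches.
    cur.execute("""
        INSERT INTO dissolved_table (region, geom)
        SELECT region, ST_Union(geom)
        FROM source_table
        GROUP BY region
    """)
```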