Question

How to process a dataset in batches?

  • 22 April 2016

Hi,

I have a workspace that reads a large spatial dataset, runs a dissolve and an aggregate, and then writes the result to a PostGIS database. The process can take a very long time, so I would like it to run in batches: for example, take the first 1000 unique objects and process them, then the next 1000, and so on. Any ideas, please?

I have tried looking at some batch processing documentation, but I am not sure it would help.

thanks


2 replies

I think a WorkspaceRunner would be helpful.

Get only the IDs of the objects with a SQLCreator, split them into portions, and feed them to the workspace you created.
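
The splitting step could look roughly like this in Python. This is only a sketch: the `chunk_ids` helper and the sample IDs are made up for illustration and are not part of FME, and in practice the IDs would come from the database query.

```python
from typing import Iterator, List, Sequence


def chunk_ids(ids: Sequence[int], size: int = 1000) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical input: stands in for the distinct object IDs read from the source.
object_ids = list(range(1, 2501))

# Each batch would then be handed to one run of the processing workspace.
batches = list(chunk_ids(object_ids, 1000))
```

Each batch can then be passed to the child workspace, for instance as a minimum and maximum ID or as a list of IDs, depending on how the workspace filters its input.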

The Dissolver transformer has a parallel processing mode, which would be just as good as a WorkspaceRunner.

But in either case, the problem is what happens when two features that should be dissolved together end up in two separate groups. You'd have to run everything through a second time to make sure those polygons get dissolved.
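
To illustrate the issue, here is a toy example in Python that uses 1-D intervals as a stand-in for polygons. It is not FME or PostGIS code, and the `dissolve` function and the sample batches are invented for the illustration. Two intervals that overlap but land in different batches survive the first pass unmerged, and only a second pass over the combined output joins them.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def dissolve(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals (a 1-D stand-in for a dissolve)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous result: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# (0, 2) and (1, 3) overlap but sit in different batches.
batch_1 = [(0, 2), (5, 6)]
batch_2 = [(1, 3), (8, 9)]

# Dissolving each batch on its own leaves the overlapping pair unmerged.
first_pass = dissolve(batch_1) + dissolve(batch_2)

# A second pass over the combined output merges them.
second_pass = dissolve(first_pass)
```

Here `first_pass` still holds four features, while `second_pass` holds three, which is why batching alone is not enough when features can span batch boundaries.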

To be honest, the better route might be to just load all the data into PostGIS and use the ST_Union function to dissolve the features in there. The performance might be better.
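
As a rough sketch of what that could look like, the Python below only builds the SQL text and does not connect to a database. The table and column names (`parcels_raw`, `parcels_dissolved`, `region_id`, `geom`) are assumptions, as is the idea that features are dissolved per value of a grouping column. `ST_Union` is used in its aggregate form.

```python
def build_dissolve_sql(source: str, target: str, group_col: str,
                       geom_col: str = "geom") -> str:
    """Return a CREATE TABLE statement that dissolves geometries per group."""
    # Names are interpolated directly, so accept simple identifiers only.
    for name in (source, target, group_col, geom_col):
        if not name.isidentifier():
            raise ValueError(f"not a simple identifier: {name!r}")
    return (
        f"CREATE TABLE {target} AS\n"
        f"SELECT {group_col}, ST_Union({geom_col}) AS {geom_col}\n"
        f"FROM {source}\n"
        f"GROUP BY {group_col};"
    )


# Hypothetical table and column names, for illustration only.
sql = build_dissolve_sql("parcels_raw", "parcels_dissolved", "region_id")
print(sql)
```

The resulting statement could be run from a SQL client or from a SQLExecutor in the workspace once the raw data has been loaded.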
