
I was considering the possibility of working on large datasets by doing transformations as needed, instead of bogging down the system with one big run. One way to do this would be to extract the required area, conduct the desired transformations and write the result to a new table. Repeat as required, and at some point in the future the new table would be completely "transformed".
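
In SQL terms, I imagine one such pass could look roughly like the sketch below (untested; the table names roads and roads_transformed, the reprojection to EPSG:4326, the SRID 25833 and the envelope coordinates are only placeholders for whatever area and transformation is actually needed, and the target table is assumed to already exist):

-- One incremental pass: transform the features inside one area of
-- interest and append them to a separate table, skipping features
-- that were already handled in an earlier pass.
INSERT INTO roads_transformed (id, geom)
SELECT r.id, ST_Transform(r.geom, 4326)
FROM roads r
WHERE r.geom && ST_MakeEnvelope(560000, 6640000, 580000, 6660000, 25833)
  AND NOT EXISTS (
        SELECT 1 FROM roads_transformed t WHERE t.id = r.id
      );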

Assume a dataset in PostGIS that has no county codes or similar attributes that would make it easy to group adjacent features, and assume it is huge: vector data spread over many tables and gigabytes of data. In my experience the Clipper transformer is pretty slow and would not perform well on datasets of this size.

Q: Is there an efficient way to create a system of tiles covering a huge area, like a country or similar, and then write these "tilecodes" to the features they spatially cover? Can this be done with SQL?
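
To illustrate what I mean by "tilecodes", something along these lines is what I have in mind (an untested sketch; it assumes PostGIS 3.1+ for ST_SquareGrid, a source table roads in a projected CRS with SRID 25833, and a 10 km tile size, all of which are placeholders):

-- 1. Build a square grid covering the extent of the data and give each
--    tile a code.
CREATE TABLE tiles AS
SELECT row_number() OVER () AS tile_code, grid.geom
FROM ST_SquareGrid(
       10000,
       (SELECT ST_SetSRID(ST_Extent(geom)::geometry, 25833) FROM roads)
     ) AS grid;

CREATE INDEX tiles_geom_idx ON tiles USING gist (geom);

-- 2. Stamp each feature with the code of the tile covering the centre of
--    its bounding box, so a feature spanning several tiles still gets
--    exactly one code.
ALTER TABLE roads ADD COLUMN tile_code bigint;

UPDATE roads r
SET tile_code = t.tile_code
FROM tiles t
WHERE r.geom && t.geom
  AND ST_Covers(t.geom, ST_Centroid(ST_Envelope(r.geom)));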

This is a follow-up question to one I asked a few days ago: https://community.safe.com/s/question/0D54Q00008YcnHzSAJ/strategies-to-process-huge-datasets

A possible solution could be to get the bounding box of the entire dataset (BoundingBoxAccumulator) and then create a set of tiles that covers the whole area (Tiler, using the seed coordinates from the bounding box). You can then loop over each tile, e.g. using a WorkspaceRunner or a custom transformer, and use the tile polygon as the initiator for a FeatureReader so that only the features intersecting that tile are retrieved. If you have features that span several tiles, you may have to add some logic to avoid processing them multiple times, e.g. an attribute that indicates whether the feature has already been processed.
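
If the data is read straight from PostGIS, that "processed or not" bookkeeping can also be pushed to the database side with something like the query below (a sketch only; the roads table, the processed column and the hard-coded tile envelope are assumptions for illustration, not something FME generates for you):

-- Run once up front: add a flag column that records which features
-- have already been handled.
ALTER TABLE roads ADD COLUMN IF NOT EXISTS processed boolean DEFAULT false;

-- Run once per tile: claim the not-yet-processed features intersecting
-- the current tile and return them for further processing, so features
-- spanning several tiles are only handled by the first tile that
-- reaches them.
UPDATE roads
SET processed = true
WHERE NOT processed
  AND geom && ST_MakeEnvelope(560000, 6640000, 570000, 6650000, 25833)
RETURNING id, geom;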

