Hi,
The problem I'm trying to solve is as follows:
- I have about 2,000 aerial imagery ECW files, averaging around 30MB each (around 60GB in total).
- Each file covers 1km x 1km on the ground at a resolution of 10cm, so each is 10,000px x 10,000px.
- I have been asked to downsample each file to 2,048px x 2,048px and output the results as a series of 4km x 4km tiles (each tile being 8,192px x 8,192px).
- I have an index feature class that lists each 'FileName' and its corresponding 'Tile' attribute.
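As a rough sanity check on these numbers, the sketch below recomputes the pixel sizes and compares the uncompressed size of one 4km tile mosaicked at full resolution against the same tile after downsampling. It assumes 3-band (RGB), 8-bit imagery. The post doesn't state the band count, so treat the byte figures as estimates only.

```python
# Back-of-envelope check of the figures above.
# ASSUMPTION: 3-band (RGB), 8-bit imagery; the band count is not stated in the post.
FILE_GROUND_M = 1000      # each ECW covers 1 km x 1 km
SOURCE_RES_M = 0.10       # 10 cm ground resolution
TARGET_FILE_PX = 2048     # requested size per source file after downsampling
FILES_PER_TILE_SIDE = 4   # a 4 km tile spans 4 source files in each direction
BANDS = 3                 # assumed RGB
BYTES_PER_SAMPLE = 1      # assumed 8-bit

source_px = round(FILE_GROUND_M / SOURCE_RES_M)     # pixels per side of one source file
target_res_m = FILE_GROUND_M / TARGET_FILE_PX       # output ground resolution in metres
tile_px = TARGET_FILE_PX * FILES_PER_TILE_SIDE      # pixels per side of one output tile


def uncompressed_bytes(width_px: int, height_px: int) -> int:
    """Raw in-memory size of a raster, ignoring any compression."""
    return width_px * height_px * BANDS * BYTES_PER_SAMPLE


# One 4 km tile mosaicked at the original 10 cm resolution...
full_res_tile_bytes = uncompressed_bytes(source_px * FILES_PER_TILE_SIDE,
                                         source_px * FILES_PER_TILE_SIDE)
# ...versus the same tile after each file is downsampled to 2,048 px.
output_tile_bytes = uncompressed_bytes(tile_px, tile_px)

print(source_px, tile_px, target_res_m)
print(f"{full_res_tile_bytes / 2**30:.1f} GiB vs {output_tile_bytes / 2**20:.0f} MiB")
```

Under these assumptions, resampling each file before mosaicking keeps each tile's working set in the hundreds of megabytes rather than several gigabytes.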
I've attached an image below that shows this spatially.
Large black squares and number labels represent the tiles that I want to output. The coloured squares with red outlines represent the original files.
Previously, when working with a much smaller amount of data, I was able to use my index feature class, in conjunction with a StringConcatenator, to provide the full path of each file to an 'ER Mapper ECW' FeatureReader.
From there, I used a RasterResampler and a RasterMosaicker, then a FeatureWriter to output the new data as a single JPEG file.
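The mosaicking step is what positions each resampled file within the output. The sketch below only illustrates why the numbers fit together: with 1km files and 4km tiles, every downsampled file occupies an exact 2,048px block of an 8,192px tile. It assumes coordinates in metres, source files aligned to a 1km grid, and tiles aligned to a 4km grid through a hypothetical origin `TILE_ORIGIN`. None of these details are given in the post.

```python
import math
from typing import Tuple

# ASSUMPTIONS: metre coordinates, files on a 1 km grid, tiles on a 4 km grid
# whose lines pass through TILE_ORIGIN (hypothetical; adjust to your tiling).
FILE_SIZE_M = 1000.0
TILE_SIZE_M = 4000.0
TILE_PX = 8192
PIXEL_M = TILE_SIZE_M / TILE_PX      # 0.48828125 m per output pixel
TILE_ORIGIN = (0.0, 0.0)             # hypothetical tile-grid origin


def placement(file_min_x: float, file_min_y: float) -> Tuple[int, int, int, int]:
    """Return (tile_col, tile_row, px_col, px_row) for a source file.

    The file is identified by its lower-left corner. px_col/px_row are offsets
    of the file's top-left corner from the tile's top-left pixel, with rows
    counting downward as in most raster formats.
    """
    ox, oy = TILE_ORIGIN
    # The lower-left corner falls unambiguously inside exactly one tile.
    tile_col = math.floor((file_min_x - ox) / TILE_SIZE_M)
    tile_row = math.floor((file_min_y - oy) / TILE_SIZE_M)
    tile_min_x = ox + tile_col * TILE_SIZE_M
    tile_max_y = oy + (tile_row + 1) * TILE_SIZE_M
    # Convert ground offsets to output pixels.
    px_col = round((file_min_x - tile_min_x) / PIXEL_M)
    px_row = round((tile_max_y - (file_min_y + FILE_SIZE_M)) / PIXEL_M)
    return tile_col, tile_row, px_col, px_row


print(placement(401000.0, 302000.0))
```

Every offset comes out as a multiple of 2,048, so the resampled files tile the output exactly, with no partial pixels at the seams.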
However, this approach is very slow, is likely to fall over with the volume of data described above, and doesn't account for tiling the output.
I believe I need a smarter method that processes one tile at a time and names each output file after its corresponding 'Tile' value.
What is the best method to do this?
I'm not sure whether I should be using a custom transformer, a WorkspaceRunner or a PythonCaller to group by 'Tile' and process one tile at a time. And if it is one of these, I'm not sure exactly how to implement it...
I feel like I should be able to wrap my raster processing in a custom transformer and pass in one tile at a time, but I can't figure out how.
Here is a screenshot of my work in progress:
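Whichever approach is used, the core of "one tile at a time" is grouping the index rows by 'Tile' and producing, for each tile, the list of source paths plus an output name. The plain-Python sketch below does only that grouping. It is not FME code, and the directories `SOURCE_DIR` and `OUTPUT_DIR` and the `.jpg` naming are hypothetical placeholders. Each yielded job corresponds to the work a single run of the resample/mosaic/write chain would need to do.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical locations; use PureWindowsPath for Windows-style paths.
SOURCE_DIR = PurePosixPath("/data/ecw")
OUTPUT_DIR = PurePosixPath("/data/tiles")


def group_by_tile(index_rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each Tile value to the full paths of the files it contains."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_name, tile in index_rows:
        # Same role as the StringConcatenator: directory + FileName -> full path.
        groups[tile].append(str(SOURCE_DIR / file_name))
    return dict(groups)


def tile_jobs(index_rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, List[str], str]]:
    """Yield (tile, source_paths, output_path), one job per tile."""
    for tile, paths in sorted(group_by_tile(index_rows).items()):
        yield tile, paths, str(OUTPUT_DIR / f"{tile}.jpg")


# Hypothetical index rows: (FileName, Tile).
rows = [("a.ecw", "12"), ("b.ecw", "12"), ("c.ecw", "13")]
for job in tile_jobs(rows):
    print(job)
```

Because each job only touches the handful of files in one tile, memory use stays bounded regardless of how many tiles there are in total.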
Any advice would be most appreciated!
Many thanks,
Lindsay.