Hi,

The problem I'm trying to solve is as follows:

  • I have about 2,000 aerial imagery ECW files, averaging around 30MB each (so around 60GB in total).
  • Each file covers 1km x 1km on the ground at a resolution of 10cm, so is 10,000px x 10,000px.
  • I have been asked to downsample each file to 2,048px x 2,048px and output the results as a series of 4km x 4km tiles (each tile being 8,192px x 8,192px).
  • I have an index feature class that lists each 'FileName' and its corresponding 'Tile' attribute.
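As a quick sanity check on the figures in the list above, the arithmetic can be sketched in Python. The numbers come from the post; the variable names are made up for illustration, and the tile count is only a rough estimate because the file count is approximate and edge tiles may be partly empty.

```python
import math

# Figures taken from the post; names are illustrative only.
SOURCE_EXTENT_M = 1_000        # each ECW covers 1km x 1km
SOURCE_RESOLUTION_M = 0.10     # 10cm ground resolution
TARGET_FILE_PX = 2_048         # requested size per downsampled file
TILE_EXTENT_M = 4_000          # each output tile covers 4km x 4km
APPROX_FILE_COUNT = 2_000      # "about 2,000" files

source_px = round(SOURCE_EXTENT_M / SOURCE_RESOLUTION_M)   # pixels per side of a source file
files_per_side = TILE_EXTENT_M // SOURCE_EXTENT_M          # source files along one tile edge
files_per_tile = files_per_side ** 2                       # source files per output tile
tile_px = files_per_side * TARGET_FILE_PX                  # pixels per side of an output tile
output_resolution_m = SOURCE_EXTENT_M / TARGET_FILE_PX     # ground size of one output pixel
scale_factor = TARGET_FILE_PX / source_px                  # linear downsampling factor

# Lower bound on the number of tiles, assuming every tile is completely filled.
min_tile_count = math.ceil(APPROX_FILE_COUNT / files_per_tile)

print(source_px, files_per_tile, tile_px, output_resolution_m, scale_factor, min_tile_count)
```

So each output tile is built from 16 source files, and each output pixel covers a little under 49cm on the ground.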

I've attached an image that shows this spatially, below.

Large black squares and number labels represent the tiles that I want to output. The coloured squares with red outlines represent the original files.

Previously, when working with a much smaller amount of data, I've been able to use my index feature class, in conjunction with a StringConcatenator, to provide the full path of each file to an 'ER Mapper ECW' FeatureReader.

From there, I used RasterResampler and RasterMosaicker before using a FeatureWriter to output the new data as a single JPEG file.

However, this approach is very slow, is likely to fall over when dealing with the large amount of data above, and doesn't account for tiling of the output.

I believe I need to use a smarter method to process one tile at a time and to name my output files with their corresponding 'Tile' name.

What is the best method to do this?

I'm not sure whether I should be using a custom transformer, WorkspaceRunner or PythonCaller to group by 'Tile' and process one at a time. And, if it is one of these, I'm not sure exactly how to implement it...

I feel like I should be able to group my raster processing into a custom transformer and pass one tile in at a time, but can't figure out how.

My work in progress screenshot, below:

Any advice would be most appreciated!

 

Many thanks,

Lindsay.

reshuffleresizeraster.fmw

Hi @lindsay, would it be an idea to first create the tiles you want to output as a feature class with rectangular geometry? You can then read all the raster files and replace them with their bounding boxes. Use a Clipper transformer to clip the bounding boxes, and sort the bounding boxes by the row and column of the tiles.

Now you can read the raster files in again, resample them down and this time clip the rasters using the tile features, grouping by row and column with the Group By Mode set to "Process When Group Changes". After that you won't need the RasterTiler any more, and the RasterMosaicker can also be used with a Group By and "Process When Group Changes". You might be doing some more raster reading, but you won't run into memory trouble.

I attached a mockup workspace that uses FME Training sample datasets to mimic this behaviour.
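To illustrate the row and column idea outside FME, here is a minimal Python sketch that assigns a source file to a tile from the lower-left corner of its bounding box. It assumes the tile grid is aligned to multiples of 4,000m from a chosen origin, which the thread does not state; `tile_index`, its parameters and the sample coordinates are hypothetical.

```python
import math

def tile_index(min_x, min_y, tile_size_m=4000.0, origin_x=0.0, origin_y=0.0):
    """Return (row, col) of the tile containing the point (min_x, min_y).

    Assumption: tiles form a regular grid of tile_size_m squares starting
    at (origin_x, origin_y). math.floor keeps the result correct for
    coordinates below the origin as well.
    """
    col = math.floor((min_x - origin_x) / tile_size_m)
    row = math.floor((min_y - origin_y) / tile_size_m)
    return row, col

# Made-up lower-left corners of three 1km x 1km source files, in metres.
corners = [(5000.0, 9000.0), (7000.0, 11000.0), (8000.0, 9000.0)]
assignments = [tile_index(x, y) for x, y in corners]
print(assignments)
```

The first two files fall in the same tile and the third falls in the next column, so sorting on (row, col) would bring the members of each tile together before they are processed.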


Hi @helmoet,

Thanks very much for your quick and thoughtful response!

I had a look at your attached workbench and I think I understand the logic.

But I'm not sure that I need to use the bounding box clipping, as the boundaries and attributes of both my initial image files and my intended output tiles are captured in my index feature class, meaning I can use that to input a list of features and group by 'Tile'.

I do like the look of the option to Group By in the RasterMosaicker but, again, I think if I can just feed in one tile at a time, even that shouldn't be necessary, as the grouping would have happened ahead of time and the RasterMosaicker should only ever operate on a discrete tile.

For example, in the attached screenshot, I've used a ListBuilder to group by 'Tile' and create a list of the individual 'FileName' attributes:

If I can isolate the 'Raster Processing' green bookmark and feed one tile at a time, I would expect a single mosaic to be output. You can see that on either side of that bookmark, the input and output match the number of tiles, rather than individual files.
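The grouping described above (one record per tile, carrying a list of file names) can be sketched in plain Python. This is only an analogy for what the ListBuilder produces, not FME code; the 'Tile' and 'FileName' keys follow the attribute names in the post, and the sample values are invented.

```python
from collections import defaultdict

def group_files_by_tile(index_rows):
    """Collect the FileName values of each Tile into a list, keyed by tile."""
    groups = defaultdict(list)
    for row in index_rows:
        groups[row["Tile"]].append(row["FileName"])
    return dict(groups)

# Invented index records standing in for the index feature class.
index_rows = [
    {"Tile": "T01", "FileName": "img_0001.ecw"},
    {"Tile": "T02", "FileName": "img_0017.ecw"},
    {"Tile": "T01", "FileName": "img_0002.ecw"},
]

files_by_tile = group_files_by_tile(index_rows)
for tile, file_names in files_by_tile.items():
    # One iteration per tile: this is the unit of work to hand to the raster processing.
    print(tile, file_names)
```

The loop runs once per tile rather than once per file, which matches the feature counts on either side of the bookmark.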

I've also attached an FMWT, which I hope contains a sample of the data I'm working with.

The imagery has been downsampled considerably and only covers 4 tiles, but the logic is what I am trying to achieve, except for processing one tile at a time.

I'm also now joining the index feature class back to the ECW files in order to maintain the 'Tile' attribute for naming the output.

Any further suggestions would be most appreciated!

tileaerialsample.fmwt


In that case you could just select the bookmark and turn it into a custom transformer. After that, in the custom transformer's Transformer Parameters, set Parallel Processing to at least "Minimal", which will give your custom transformer a Group By parameter. You can use that one to process tile by tile. Be sure to sort your data by tile and set the Group By Mode to "Process When Group Changes".
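Why the sorting matters can be shown with a small Python analogy: `itertools.groupby` also starts a new group whenever the key value changes, so unsorted input splits a tile into several groups. This sketch is not how FME implements the mode; the function name and sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def process_when_group_changes(features, key="Tile"):
    """Yield (tile, [file names]) every time the key value changes."""
    for tile, members in groupby(features, key=itemgetter(key)):
        yield tile, [m["FileName"] for m in members]

# Invented records; tile "A" is interrupted by tile "B".
unsorted_features = [
    {"Tile": "A", "FileName": "a1.ecw"},
    {"Tile": "B", "FileName": "b1.ecw"},
    {"Tile": "A", "FileName": "a2.ecw"},
]

# Without sorting, tile "A" is emitted twice as two partial groups.
split_groups = list(process_when_group_changes(unsorted_features))

# Sorting by tile first gives exactly one complete group per tile.
sorted_features = sorted(unsorted_features, key=itemgetter("Tile"))
whole_groups = list(process_when_group_changes(sorted_features))

print(split_groups)
print(whole_groups)
```

The benefit of this mode is that only one group needs to be held at a time, which is what keeps memory use down when mosaicking tile by tile.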

Your transformer in the main canvas will look like this:


@helmoet, that's perfect!

I just found a similar method, posted a couple of years ago, and it looks like it is even easier now in FME 2020 to 'Group By' in a custom transformer, provided you know the trick.

My workbench is running right now and seems to be doing exactly what I wanted. It will be some time before it completes, but I'm pretty confident it will get there without falling over!

Thanks again for your help.


Hi Lindsay

I am struggling with the same issue - any chance you would share your setup?


Hi @sbp,

It's been a little while since I developed it, but I've attached a version of my workbench, which I hope points you in the right direction.

 

I may have made some mistakes with the custom transformer settings, but it seems to do the job...

 

Let me know how you go.

