
Dear FME users,

I'm creating feature classes (polygons) from NetCDF data. Among other things I use the "RasterCellCoercer" transformer, and that transformation takes a very long time.

I have 504 input and 504 output files. The geometry is always the same; only the values of the input data differ.

Do you have an idea how to reduce the processing time? Maybe a predefined grid?

Thank you very much and best regards,

Konrad

Unfortunately, the RasterCellCoercer is really slow when dealing with large datasets.

You could also consider the PointCloudCoercer, which is supposedly a lot faster, but I've never used it myself.


Hi @UBA_KP, generally speaking, creating many features consumes a lot of resources and can take a long time. The RasterCellCoercer is a typical case: since it creates a large number of new features (number of columns × number of rows), it can take a long time to finish, as @david_r mentioned.

Although a Python API for raster manipulation was introduced in FME 2017, I don't think you can expect much of an improvement from it as long as the same large number of features has to be created as the translation result.

However, if you can significantly reduce the number of features that need to be created by applying some conditions, Python scripting could be a way to improve the performance, and I think it would be worth trying.
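As a rough illustration of that kind of condition-based filtering (this is not FME's actual raster API; the 3×3 list of lists below just stands in for one NetCDF band), one could keep only the cells that are worth turning into features:

```python
# Sketch: turning raster cells into table rows, keeping only cells
# that satisfy a condition (here: not NODATA). Fewer surviving cells
# means fewer features for FME to create downstream.

NODATA = -9999.0

band = [
    [1.5, NODATA, 2.0],
    [NODATA, 3.25, NODATA],
    [4.0, 5.5, NODATA],
]

def cells_to_rows(band, nodata):
    """Yield (cell_id, row, col, value) for every non-nodata cell."""
    ncols = len(band[0])
    for r, line in enumerate(band):
        for c, value in enumerate(line):
            if value == nodata:
                continue
            yield (r * ncols + c, r, c, value)

rows = list(cells_to_rows(band, NODATA))
# only 5 of the 9 cells survive the nodata filter
```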


The RasterToPolygonCoercer can be an alternative, but as mentioned before, that does not necessarily mean it will be faster.

If the resolution of the rasters is not very important, consider resampling before converting to polygons.
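To see why resampling helps: averaging 2×2 blocks, for example, leaves a quarter as many cells for the coercer to turn into polygons. A minimal stand-alone sketch (plain Python lists standing in for a raster band, not FME's resampler):

```python
# Sketch of the resampling idea: averaging 2x2 blocks halves each
# dimension, so a quarter as many polygons come out of the coercer.

band = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
]

def downsample_2x2(band):
    """Average each 2x2 block of cells into one output cell."""
    out = []
    for r in range(0, len(band) - 1, 2):
        row = []
        for c in range(0, len(band[0]) - 1, 2):
            total = band[r][c] + band[r][c + 1] + band[r + 1][c] + band[r + 1][c + 1]
            row.append(total / 4.0)
        out.append(row)
    return out

small = downsample_2x2(band)  # 1x2 cells instead of 2x4
```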


Thank you very much for answering my question @david_r, @takashi and @itay.

Maybe there is a possibility to factor out the spatial information and operations?

That would mean writing the attribute values of the NetCDF data into a list with an assigned ID. Afterwards the IDs could be combined with the IDs of a predefined grid. The processing time might be reduced this way.

What do you think about that? Maybe you have an idea. :)
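A minimal sketch of that ID idea, assuming the grid is built once and the per-file values arrive keyed by cell ID (all names here are hypothetical placeholders, with strings standing in for grid geometries):

```python
# Sketch of the ID/join idea: values keyed by cell ID, joined to a
# predefined grid that is only built a single time.

predefined_grid = {0: "POLYGON A", 1: "POLYGON B", 2: "POLYGON C"}

def join_values(grid, values_by_id):
    """Attach each value to the grid cell with the same ID."""
    return {cid: (geom, values_by_id.get(cid)) for cid, geom in grid.items()}

run_1 = {0: 1.5, 1: 2.0, 2: 3.25}  # values from one of the 504 input files
joined = join_values(predefined_grid, run_1)
```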

You're right @itay, the "RasterToPolygonCoercer" works much faster.

Thank you and best regards. :)

Konrad

It seems that my question was not clear enough. Is it possible to write NetCDF data into a table without spatial information?


Yes, remove the geometry with the GeometryRemover after extracting the data into attributes.


It seems that my question was not clear enough. Is it possible to write NetCDF data into a table without spatial information?

Which part of the NetCDF? Only the layer infos or also all the cell values? If you need the cell values, what's the database schema like?
Which part of the NetCDF? Only the layer infos or also all the cell values? If you need the cell values, what's the database schema like?
I only need the cell values for every column/line.


Yes, remove the geometry with the GeometryRemover after extracting the data into attributes.

I've done this, but the table is still empty. I'd like to have all the cell values, and for that I have to use the "RasterToPolygonCoercer" or "RasterToPointCoercer". This takes a lot of time, so I'm looking for alternatives.


I only need the cell values for every column/line.

Not sure I understand. You need all the cell values then?

It seems that my question was not clear enough. Is it possible to write NetCDF data into a table without spatial information?

What kind of table is required? For example, does a CSV table (comma separated cell values x rows) satisfy the requirement?
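For reference, a table like that (comma separated cell values × rows) can be written with Python's standard csv module alone; the 2×2 list here stands in for a band:

```python
# Sketch: writing one raster band as a CSV table, one raster row per
# CSV row. An in-memory buffer is used here instead of a file.

import csv
import io

band = [[1.0, 2.0], [3.0, 4.0]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(band)
text = buf.getvalue()
```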


Hi Konrad,

You have already seen how quick the list representation of NetCDF data is. If you can ignore the rasterness of the data in what you are trying to accomplish, I would do so. For slicing/chunking NetCDF there might be an RCaller recipe out there you could use, which would be more dynamic.

But if your NetCDF is a known size every time, have you tried using a RasterTiler? You could force the row/column layout of the tiles to be 1 row by however many columns your data has. This would let you limit the amount of dataset I/O you read (_tile_column = 1), extract the list of values into your features at that point, and then drop the geometry altogether, leaving simple table records.
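The effect of that 1-row tiling can be sketched in plain Python (lists standing in for the raster; this is not the RasterTiler API itself):

```python
# Sketch of the RasterTiler idea: one record per raster row, each
# carrying its row's cell values as a list attribute, instead of one
# feature per cell.

band = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

def tile_by_row(band):
    """Return one record per raster row: (row_index, list_of_values)."""
    return [(r, list(values)) for r, values in enumerate(band)]

records = tile_by_row(band)  # 2 records instead of 6 cell features
```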



Thank you very much for your input @scyphers. I'll try this very soon.


Not sure I understand. You need all the cell values then?
Thank you @david_r. I'll try what @scyphers has recommended and I'll give you feedback afterwards.
What kind of table is required? For example, does a CSV table (comma separated cell values x rows) satisfy the requirement?

Thank you @takashi, I don't need any particular table format like CSV. I'm interested in creating a table-based process without geometry, because I think it would be faster. At the end of my process I'd like to write all the table values into a predefined grid. I'll try what @scyphers has recommended and give you feedback afterwards.


I'm thinking that some of our new technology for handling large numbers of identical-schema features would really, really shine here (someday). In the meantime, about the best I can recommend is to run FME in parallel across all these files, using either the WorkspaceRunner or a custom transformer that does the work and runs in parallel...
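As a plain-Python analogue of that advice (convert() below is a hypothetical stand-in for one workspace run per NetCDF file; the real setup would use the WorkspaceRunner):

```python
# Sketch: the same conversion applied to many inputs concurrently.
# Threads are used here for simplicity; FME would typically run
# separate worker processes instead.

from concurrent.futures import ThreadPoolExecutor

def convert(filename):
    """Pretend to convert one NetCDF file; returns an output name."""
    return filename.replace(".nc", ".gdb")

inputs = ["run_%03d.nc" % i for i in range(504)]

with ThreadPoolExecutor(max_workers=8) as pool:
    outputs = list(pool.map(convert, inputs))
```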



Hi @daleatsafe, I'm keen to see how large numbers of identical-schema features will be handled in the future. Thank you very much and best regards.
