Question

How to work properly with COGs?


tb09114
Supporter

Currently I have 23 COGs in an Azure Blob container, and I want to retrieve only the area of those COGs covered by a rectangle that I use as the initiator for a FeatureReader.

Unfortunately, the FeatureReader first reads all the COGs in full before determining which of them the bounding box falls within, and then it returns the entire COG, whereas I expected it to return only the part the bounding box covers.

The COGs originate from the Copernicus DEM GLO-30 coverage and contain overviews, etc.

It takes almost 3 minutes for the FeatureReader to process the request. The bounding box is within a single COG.

Looking at the parameters of the COG reader, I can see that it is actually possible to set a search envelope. I could feed the bounding box to that envelope, but sadly the option is missing in the FeatureReader.

Is there a way to speed things up? I only want the parts of the COGs covered by the bounding box used as the initiator, which basically means performing a range request.

19 replies

kailinatsafe
Safer

Hello ​@tb09114, thanks for posting! The functionality you’re after has just been added to the FeatureReader in FME Form 2024.2 b24783 (FMEENGINE-8214). You are now able to clip using the initiator geometry: https://fme.safe.com/downloads/. Hope this helps, Kailin. 

 


tb09114
Supporter
  • Author
  • Supporter
  • December 17, 2024

Hi ​@kailinatsafe, thanks for your answer!

I just installed the new Form version and gave it a try. The clip to the initiator’s envelope works as it should. 🙂
However, looking at the log file, I can see that the FeatureReader is still reading all GeoTIFF (or COG) files in the folder before evaluating whether one or more of them intersect the initiator. The effect is that basically all the processing time of the entire workspace is spent in this step.

I was wondering if this would be something for the Ideas section: GeoTIFF and COG files all come with information about the coordinate system and their corner coordinates. It takes a split second to retrieve this information using e.g. gdalinfo. Wouldn’t it be much faster to first read the metadata from the files, evaluate which files’ bounding boxes intersect the envelope of the initiator, and thereafter continue only with the files the initiator intersects?
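A minimal sketch of that pre-filtering idea, in plain Python. It assumes the extent of every file has already been collected once (for example from the header information that gdalinfo reports); the file names, extents and envelope below are hypothetical, and this is not how FME works internally.

```python
from typing import Dict, List, Tuple

# (min_x, min_y, max_x, max_y), all in the same coordinate system
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """Return True when two axis-aligned boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_files(extents: Dict[str, BBox], envelope: BBox) -> List[str]:
    """Keep only the files whose extent overlaps the initiator envelope."""
    return sorted(name for name, box in extents.items() if intersects(box, envelope))


# Hypothetical extents, gathered once from each file's metadata.
extents = {
    "tile_a.tif": (8.0, 55.0, 9.0, 56.0),
    "tile_b.tif": (9.0, 55.0, 10.0, 56.0),
    "tile_c.tif": (10.0, 55.0, 11.0, 56.0),
}

# Hypothetical initiator envelope that lies inside a single tile.
envelope = (9.2, 55.3, 9.4, 55.5)

selected = select_files(extents, envelope)
print(selected)
```

Only the files in `selected` would then need to be opened at all, so the cost of the pixel read no longer grows with the number of files in the container.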


virtualcitymatt
Celebrity

Oh man - I had assumed that FME would handle this. Have you got a hosted dataset I can try out?


virtualcitymatt
Celebrity

Yeah, I just found a dataset and tested it (in FME 2024.1). Indeed, FME does seem to download the whole image when it should just be requesting parts of the file based on the header info in the file(s).

The benefit of a COG is being able to request just parts of the data, as internally it’s already pre-tiled into chunks - it should work in a similar way to a WMS or WMTS.

FME should definitely have the capability to support reading the data in this way via URL.  
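To illustrate what "pre-tiled into chunks" buys you, here is a small sketch that works out which internal tiles a pixel window touches. The 512-pixel block size, the 3600 x 3600 image size and the window are assumptions made for the example; real files declare their own tiling in the header.

```python
import math
from typing import List, Tuple


def tiles_for_window(col_off: int, row_off: int, width: int, height: int,
                     block: int = 512) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) indices of all internal tiles the window touches."""
    first_col = col_off // block
    last_col = (col_off + width - 1) // block
    first_row = row_off // block
    last_row = (row_off + height - 1) // block
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Hypothetical request: a 600 x 300 pixel window starting at column 700, row 100.
tiles = tiles_for_window(col_off=700, row_off=100, width=600, height=300)

# Hypothetical 3600 x 3600 image cut into 512 x 512 tiles.
tiles_per_side = math.ceil(3600 / 512)
total_tiles = tiles_per_side * tiles_per_side

print(tiles, "of", total_tiles, "tiles")
```

A reader that streams properly would look up the byte offsets of just those tiles in the header and fetch them with HTTP range requests, instead of transferring every tile in the file.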


tb09114
Supporter
  • Author
  • Supporter
  • December 18, 2024
virtualcitymatt wrote:

Oh man - I had assumed that FME would handle this. Have you got a hosted dataset I can try out?

Yes I was under the same impression. Though I think the intention is/was to support the full functionality of COG.


virtualcitymatt
Celebrity

I hope so - They’ve even added a specific reader for COG but I think the implementation is either not working as expected (e.g., a bug) or it was never properly implemented in the first place.  


evoteck
Enthusiast
  • Enthusiast
  • December 18, 2024

The FeatureReader in FME Form 2024.2 b24783 (FMEENGINE-8214) has the necessary fix; as a user, you just need to update to a recent build.


tb09114
Supporter
  • Author
  • Supporter
  • December 18, 2024
evoteck wrote:

FME Form 2024.2 b24783 (FMEENGINE-8214) has the necessary solutions...

@evoteck, since yesterday I have been running build 24801, and I cannot see any way to influence what the FeatureReader reads before it considers the initiator’s envelope.

Could you elaborate on how you get the FeatureReader to read only the COGs the initiator intersects?


virtualcitymatt
Celebrity

Yes I am also skeptical - as far as I’m aware the only addition is the check box to actually clip the image to the input request area.


tb09114
Supporter
  • Author
  • Supporter
  • December 18, 2024

It’s not only the FeatureReader. As ​@virtualcitymatt pointed out earlier, the COG reader does the same.
I tried it in build 24801: I set the bounding box for the envelope, ticked the check box to clip to the envelope, and ran it. → All 23 COGs are read. The whole thing takes 2:36 minutes. All in all, the 23 COGs have a size of 406 MB, and the biggest of the 23 has a size of 41.8 MB.

Essentially, when working with COGs there should be no need to read a) everything the directory contains, or b) the whole of each COG that intersects the envelope/initiator.


kailinatsafe
Safer

Hello @tb09114 @virtualcitymatt @evoteck, I decided to look into this more. COG/GeoTIFF is read via the GDAL library. Streaming is used in the COG reader if all of the following are true:

  • The file has the appropriate COG headers
  • The file is being read via HTTP or one of the WaaFS connectors we support (S3, Azure, or Google Cloud)
  • GDAL's cloud virtual FS infrastructure decides it should stream the file (I am unsure of the exact criteria here; it may always decide to do so.)
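For orientation, GDAL itself addresses remote files through path prefixes for its network virtual file systems: /vsicurl/ for plain HTTP(S), /vsis3/ for S3, /vsiaz/ for Azure Blob Storage and /vsigs/ for Google Cloud Storage. The sketch below only builds such path strings; the helper name and the example locations are hypothetical, and how FME's connectors map onto these prefixes internally is an assumption.

```python
# Prefixes of GDAL's network-based virtual file systems.
VSI_PREFIXES = {
    "http": "/vsicurl/",  # plain HTTP(S): prefix + full URL
    "s3": "/vsis3/",      # Amazon S3: prefix + bucket/key
    "azure": "/vsiaz/",   # Azure Blob Storage: prefix + container/blob
    "gcs": "/vsigs/",     # Google Cloud Storage: prefix + bucket/key
}


def gdal_remote_path(kind: str, location: str) -> str:
    """Build the path string GDAL expects for a remote file (hypothetical helper)."""
    try:
        prefix = VSI_PREFIXES[kind]
    except KeyError:
        raise ValueError(f"unsupported source kind: {kind!r}") from None
    if kind == "http":
        # /vsicurl/ takes the complete URL, scheme included.
        return prefix + location
    # The cloud file systems take "container-or-bucket/object" without a leading slash.
    return prefix + location.lstrip("/")


# Hypothetical locations, for illustration only.
print(gdal_remote_path("http", "https://example.com/dem/tile.tif"))
print(gdal_remote_path("azure", "my-container/dem/tile.tif"))
```

When a path of this kind shows up in the log, the file is being opened through GDAL's network layer, which is a precondition for streaming but, as the list above notes, not a guarantee of it.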

However, there are also some “wrinkles” to be aware of here:

  • If you know a large proportion of the file will be used, it is more likely that downloading the file in advance will be faster. This can be done via one of the connector transformers to pull it to a local filesystem. This will almost certainly be faster than issuing several small requests to read the same amount of data.
  • Even though the COG reader is used, this does not guarantee that a GeoTIFF can be read in an optimized manner. The dataset requires additional optional headers that allow us to read the file in subsets.
  • Depending on how the file is organized, we may need to download more than is needed.
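The first wrinkle can be made concrete with a very rough cost model: transfer time is the number of requests times the per-request latency, plus the bytes transferred divided by the bandwidth. All numbers below (file size, chunk size, latency, bandwidth) are made up for illustration, and the model assumes requests are issued one after another, which real clients may improve on by merging or parallelizing them.

```python
def transfer_seconds(n_requests: int, total_bytes: int,
                     latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Rough estimate: one latency penalty per request plus raw transfer time."""
    return n_requests * latency_s + total_bytes / bandwidth_bytes_per_s


LATENCY_S = 0.1           # assumed round-trip cost per request
BANDWIDTH = 10_000_000    # assumed 10 MB/s
FILE_BYTES = 40_000_000   # assumed 40 MB file
CHUNK_BYTES = 500_000     # assumed size of one range request

# Download the whole file in a single request.
full_download = transfer_seconds(1, FILE_BYTES, LATENCY_S, BANDWIDTH)

# Stream 90% of the file as individual chunk requests.
stream_most = transfer_seconds(72, 72 * CHUNK_BYTES, LATENCY_S, BANDWIDTH)

# Stream a small window: four chunks only.
stream_small = transfer_seconds(4, 4 * CHUNK_BYTES, LATENCY_S, BANDWIDTH)

print(f"full: {full_download:.1f}s, most: {stream_most:.1f}s, small: {stream_small:.1f}s")
```

With these made-up numbers, the single full download (about 4.1 s) beats streaming 90% of the file (about 10.8 s), while a small window (about 0.6 s) clearly favours streaming.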

If you are willing to share a dataset and reader configuration that are showing poor performance, happy to take a look & see if we can improve things.

The FeatureReader parameter does seem to be working/clipping as expected. Best, Kailin. 


tb09114
Supporter
  • Author
  • Supporter
  • December 18, 2024

Hi ​@kailinatsafe, thanks for looking further into this. I cannot grant access to the container, but I can certainly share the files within the container. 


kailinatsafe
Safer

Hello ​@tb09114, awesome - happy if you share them here or if you’d prefer to submit a support case! Whatever works best, Kailin. 


virtualcitymatt
Celebrity

@kailinatsafe and ​@tb09114 I was able to get one to work as expected. 

Good news. It would seem that in my testing I had an environment variable set which was messing up HTTPS requests, and for some reason FME was falling back to downloading the whole image.

It does seem like more information about these “wrinkles” could be written down somewhere, perhaps along with warnings or an option to raise an error when the file can’t be streamed for whatever reason.

Needless to say I’m now very happy that this is working. 

 


tb09114
Supporter
  • Author
  • Supporter
  • December 19, 2024

All right @kailinatsafe and @virtualcitymatt, here is a minimal workspace, a log file, and two of the 23 COGs. The initiator only intersects one of the COGs, but both of them are downloaded and read when using Azure (Blob or FileShare container) as the source.
If I use a local source directory for the COGs, the workspace finishes in 1.8 seconds. For comparison, the Azure source finishes in 45 seconds with only two COGs in the container, but it takes almost 7 minutes when there are 23 COGs.
There will be more than 26,000 COGs in the final container.


tb09114
Supporter
  • Author
  • Supporter
  • December 19, 2024

 

 

When I use the Azure Blob Storage and a web connection to authenticate then there is no problem getting access to the COGs.

When I use the URL https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk or https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif instead, I get the following returned when using the same web connection to authenticate as before:

  Worker 12328 > COG reader: Failed to open the dataset '/vsicurl/https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif'. Please ensure source data is valid and correct reader is selected
  Worker 12328 > Failed to obtain any schemas from reader 'COG' from 1 datasets. This may be due to invalid datasets or format accessibility issues due to licensing, dependencies, or module loading. See logfile for more information
  Worker 12328 > Failed to read schema features from dataset 'https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif' using the 'COG' reader

kailinatsafe
Safer

Hello ​@tb09114, thank you for providing the files, workspace and explanation. I’ll throw your DEM on S3 and do some testing. I will hopefully reach out with updates in a bit. Best, Kailin.


kailinatsafe
Safer

Hello ​@tb09114, thanks for your patience - so sorry for the delay! I played around with your workspace & data. Tested the same process described earlier in the thread.


While reading with a bbox that intersects 2 COGs, I see files 'Opened', but not 'Retrieved'. I believe in this case FME is streaming data, as no caching was noted in the logfile. Only the bbox data was returned.


With a spatial envelope, I believe each chunk that the bounding box overlaps will need to be read, even if most of that chunk does not overlap the box. This means that if the same request is made against 2 different data volumes, the request made against the larger set of tiles is expected to take longer (e.g. more data to process in general).
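A quick way to get a feel for this effect is to compare the pixels that must be read (whole chunks) with the pixels actually requested. The sketch below assumes 512 x 512 pixel chunks and hypothetical windows; it ignores compression and partially filled edge chunks.

```python
def read_amplification(col_off: int, row_off: int, width: int, height: int,
                       block: int = 512) -> float:
    """Ratio of pixels read (whole chunks) to pixels requested by the window."""
    n_cols = (col_off + width - 1) // block - col_off // block + 1
    n_rows = (row_off + height - 1) // block - row_off // block + 1
    pixels_read = n_cols * n_rows * block * block
    return pixels_read / (width * height)


# Hypothetical 100 x 100 window that straddles the corner of four chunks.
straddling = read_amplification(col_off=500, row_off=500, width=100, height=100)

# Hypothetical window that coincides exactly with one chunk.
aligned = read_amplification(col_off=512, row_off=512, width=512, height=512)

print(straddling, aligned)
```

In the first case roughly a hundred times more pixels are read than requested, while a chunk-aligned window reads nothing extra, so a small bounding box can still cost several full chunks.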


Can you share how these COGs were produced? FME Form should produce COGs in a way where streaming conditions are met and GDAL’s streaming infrastructure can be utilized.


I suspect the FeatureReader 'Worker' errors are being thrown because network authentication was not provided when you tried to select feature types. In the FeatureReader, the network authentication section is dynamically exposed within the FeatureReader COG parameters when a URI or similar is provided as the dataset. Please try adding network authentication and let me know if the issue is resolved.


Right now, unfortunately, it looks like the COG reader lost the ability to use WaaFS / read files from the web in FME Form 25.0. I’ve filed an issue to address this (FMEENGINE-85362).


If you are seeing different reading behaviour please reach out via support case and share a complete logfile. Happy to help, Kailin.


kailinatsafe
Safer

Hey @tb09114, I finally tested COG reading from Azure Blob Storage and am also seeing the data downloaded, as opposed to streamed. This issue has been filed as FMEENGINE-85670. Best, Kailin

