I currently have 23 COGs in an Azure Blob container, and I want to retrieve only the area of those COGs covered by a rectangle that I use as the initiator for a FeatureReader.
Unfortunately, the FeatureReader first reads all the COGs before deciding which one contains the bounding box, and then it returns the entire COG, whereas I expected it to return only the part the bounding box covers.
The COGs originate from the Copernicus DEM GLO-30 coverage and contain overviews, etc.
It takes almost 3 minutes for the FeatureReader to process the request. The bounding box is within a single COG.
Looking at the parameters of the COG reader, I can see that it is possible to set a search envelope. I could feed the bounding box to that envelope, but sadly this option is missing in the FeatureReader.
Is there a way to speed things up? I only want the parts of the COGs covered by the bounding box used as initiator, basically performing a range request.
Hello @tb09114, thanks for posting! The functionality you’re after has just been added to the FeatureReader in FME Form 2024.2 b24783 (FMEENGINE-8214). You are now able to clip using the initiator geometry: https://fme.safe.com/downloads/. Hope this helps, Kailin.
Hi @kailinatsafe, thanks for your answer!
I just installed the new Form version and gave it a try. The clip to the initiator's envelope works as it should. However, looking at the log file I can see that the FeatureReader still reads all GeoTIFF (or COG) files in the folder before evaluating whether one or more of them intersect the initiator. The effect is that basically all the processing time of the entire workspace is spent in this step.
I was wondering if this would be something for the ideas board: GeoTIFF and COG files all come with information about the coordinate system and their corner coordinates. It takes a split second to retrieve this information using e.g. gdalinfo. Wouldn’t it be much faster to first read the meta-info from the files, evaluate which files' bounding boxes intersect the envelope of the initiator, and thereafter continue only with the files the initiator intersects?
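Roughly, the metadata-first filter described above could look like the following Python sketch. It is only an illustration of the logic, not how FME works internally: the file names and extents are made up, and in practice the extents would come from each file's header (for example from gdalinfo output).

```python
from typing import Dict, List, Tuple

# (min_x, min_y, max_x, max_y) in the same coordinate system for all boxes
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap; touching edges count as overlapping."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def files_to_read(file_extents: Dict[str, BBox], initiator: BBox) -> List[str]:
    """Return only the files whose extent intersects the initiator envelope."""
    return sorted(
        name for name, extent in file_extents.items()
        if intersects(extent, initiator)
    )


# Hypothetical extents, as they might be read from the file headers.
extents = {
    "tile_N55_E008.tif": (8.0, 55.0, 9.0, 56.0),
    "tile_N55_E009.tif": (9.0, 55.0, 10.0, 56.0),
    "tile_N56_E008.tif": (8.0, 56.0, 9.0, 57.0),
}
initiator = (9.2, 55.3, 9.6, 55.7)  # hypothetical initiator envelope

selected = files_to_read(extents, initiator)
print(selected)
```

With a filter like this, only the selected files would be handed on to the actual raster read.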
Oh man - I had assumed that FME would handle this. Have you got a hosted dataset I can try out?
Yeah, I just found a dataset and tested it (in FME 2024.1). Indeed, FME does seem to download the whole image when it should just be requesting parts of the file based on the header info in the file(s).
The benefit of a COG is being able to request just parts of the data, as internally it’s already pre-tiled into chunks - it should work in a similar way to a WMS or WMTS.
FME should definitely have the capability to support reading the data in this way via URL.
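To illustrate why the internal tiling matters, here is a small Python sketch that works out which internal tiles a pixel window touches. The 512-pixel tile size, the 3600 x 3600 image size, and the window are assumptions chosen for the example, not values read from the actual files.

```python
import math
from typing import List, Tuple


def tiles_for_window(col_off: int, row_off: int, width: int, height: int,
                     tile_size: int = 512) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) indices of every tile the window overlaps."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Assumed image and window sizes, in pixels.
image_size = 3600
tiles_per_side = math.ceil(image_size / 512)
total_tiles = tiles_per_side * tiles_per_side

needed = tiles_for_window(col_off=1000, row_off=1200, width=600, height=400)
print(f"{len(needed)} of {total_tiles} tiles are needed for this window")
```

A reader that streams properly would only have to fetch the header plus the byte ranges of those few tiles, rather than the whole file.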
Yes I was under the same impression. Though I think the intention is/was to support the full functionality of COG.
I hope so - They’ve even added a specific reader for COG but I think the implementation is either not working as expected (e.g., a bug) or it was never properly implemented in the first place.
The FeatureReader in FME Form 2024.2 b24783 (FMEENGINE-8214) has the necessary solutions; as a user you just need to update to a recent build.
@evoteck, since yesterday I have been running build 24801, and I cannot see any way to influence what the FeatureReader reads before it considers the initiator's envelope.
Could you elaborate on how you get the FeatureReader to read only the COGs the initiator intersects with?
Yes I am also skeptical - as far as I’m aware the only addition is the check box to actually clip the image to the input request area.
It’s not only the FeatureReader. As @virtualcitymatt pointed out earlier, the COG reader does the same. I tried it in build 24801: I set the bounding box for the envelope, ticked the check box to clip to the envelope, and ran it. → All 23 COGs are read. The whole thing takes 2:36 minutes. In total the 23 COGs have a size of 406 MB, and the biggest of them is 41.8 MB.
Essentially, when working with COGs there should be no need to read a) everything the directory contains, or b) the whole of a COG that intersects with the envelope/initiator.
Hello @tb09114 @virtualcitymatt @evoteck, I decided to look into this more. So, COG/GeoTIFF is read via the GDAL library. Streaming is used in the COG reader if all of the following are true:
The file has the appropriate COG headers
The file is being read via HTTP or one of the WaaFS connectors we support (S3, Azure, or Google Cloud)
GDAL's cloud virtual FS infrastructure decides it should stream the file (I'm unsure of the exact criteria here; it may always decide to do so.)
However, there are also some “wrinkles” to be aware of:
If you know a large proportion of the file will be used, it is more likely that downloading the file in advance will be faster. This can be done via one of the connector transformers to pull it to a local filesystem. This will almost certainly be faster than issuing several small requests to read the same amount of data.
Even though the COG reader is used, this does not guarantee that the GeoTIFF can be read in an optimized manner. The dataset requires additional optional headers that allow us to read the file in subsets.
Depending on how the file is organized, we may need to download more than is needed.
If you are willing to share a dataset and reader configuration that are showing poor performance, happy to take a look & see if we can improve things.
The FeatureReader parameter does seem to be working/clipping as expected. Best, Kailin.
Hi @kailinatsafe, thanks for looking further into this. I cannot grant access to the container, but I can certainly share the files within the container.
Hello @tb09114, awesome - happy if you share them here or if you’d prefer to submit a support case! Whatever works best, Kailin.
@kailinatsafe and @tb09114 I was able to get one to work as expected.
Good news. It would seem in my testing I had an environment variable set which was messing up HTTPS requests and for some reason it was falling back to downloading the whole image for no reason.
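For anyone who wants to rule out the same kind of problem, a quick Python sketch like this lists environment variables that commonly influence HTTPS requests made through libcurl/GDAL. The thread does not say which variable was the culprit, so the names and prefixes below are only an assumed starting set, not a definitive list.

```python
import os
from typing import Dict, Mapping

# Assumed set of variables worth checking; extend as needed.
SUSPECT_NAMES = {
    "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY",
    "CURL_CA_BUNDLE", "SSL_CERT_FILE", "SSL_CERT_DIR",
}
# GDAL configuration options can also be supplied as environment variables.
SUSPECT_PREFIXES = ("GDAL_", "CPL_", "VSI_")


def suspect_variables(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the variables in env that match the assumed suspect list."""
    return {
        name: value
        for name, value in sorted(env.items())
        if name.upper() in SUSPECT_NAMES
        or name.upper().startswith(SUSPECT_PREFIXES)
    }


# Report whatever is set in the current process environment.
for name, value in suspect_variables(os.environ).items():
    print(f"{name}={value}")
```

Running it in the same environment that launches FME shows whether anything of this kind is set at all.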
It does seem like more information about these “wrinkles” could be written down somewhere, and perhaps there could also be warnings, or an option to raise an error, in the case where the file can’t be streamed for whatever reason.
Needless to say I’m now very happy that this is working.
All right @kailinatsafe and @virtualcitymatt, here is a minimal workspace, a log file, and two of the 23 COGs. The initiator only intersects with one of the COGs, but both of them are downloaded and read when using Azure (Blob or FileShare container) as the source. If I use a local source directory for the COGs, the workspace finishes in 1.8 seconds. For comparison, the Azure source finishes in 45 seconds with only two COGs in the container, but it takes almost 7 minutes when there are 23 COGs. There will be more than 26,000 COGs in the final container.
When I use the Azure Blob Storage and a web connection to authenticate then there is no problem getting access to the COGs.
When I use the URL https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk or https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif, I get the following when using the same web connection to authenticate as before:
Worker 12328 > COG reader: Failed to open the dataset '/vsicurl/https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif'. Please ensure source data is valid and correct reader is selected
Worker 12328 > Failed to obtain any schemas from reader 'COG' from 1 datasets. This may be due to invalid datasets or format accessibility issues due to licensing, dependencies, or module loading. See logfile for more information
Worker 12328 > Failed to read schema features from dataset 'https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk/*.tif' using the 'COG' reader
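The log suggests the `*.tif` wildcard was passed through literally in the `/vsicurl/` path, and a plain HTTP request treats it as a literal file name rather than expanding it. One possible workaround, sketched below in Python, is to expand the pattern against a list of blob names first and build one path per file. The blob names here are hypothetical placeholders (they would have to come from a container listing), and whether the reader accepts such a list is an assumption that has not been verified in this thread.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

BASE_URL = "https://fmestoragetest.blob.core.windows.net/world-dsm/dsm_dk"


def expand_pattern(base_url: str, blob_names: Iterable[str],
                   pattern: str) -> List[str]:
    """Build one /vsicurl/ path per blob name that matches the pattern."""
    base = base_url.rstrip("/")
    return [
        f"/vsicurl/{base}/{name}"
        for name in sorted(blob_names)
        if fnmatchcase(name, pattern)
    ]


# Hypothetical container listing, for illustration only.
listing = ["dsm_b.tif", "dsm_a.tif", "readme.txt"]

paths = expand_pattern(BASE_URL, listing, "*.tif")
for path in paths:
    print(path)
```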
Hello @tb09114, thank you for providing the files, workspace and explanation. I’ll throw your DEM on S3 and do some testing. I will hopefully reach out with updates in a bit. Best, Kailin.