Error reading large zip files on FME Cloud

  • 17 November 2015

Why am I getting a "Dataset does not exist" error when reading a large ESRI Grid within a zip file?

I’ve been able to run the clip, zip and ship service successfully on several test zips of varying size.

The top one works, as does the middle one. When I try to run a job using the third one (tas5m), I get the following error:

2015-11-16 23:52:43| 16.8| 0.0|ERROR |ARCVIEWGRID reader: Dataset '/data/fmeserver/Data/ElvIS/test/00e86b75-tas5m-4b3b-b273-a047da02ea65.zip/00e86b75-tas5m-4b3b-b273-a047da02ea65' does not exist

All these zips have the same internal structure, and all have been downloaded from FME Cloud and tested with FME Desktop to ensure they are valid (they all are).

This is the structure inside the zip:

The top directory is the ESRI Grid, and the Ancillary folder contains some metadata documents. I’m not sure why it throws the ‘does not exist’ error when it’s clearly there when I download and unpack it.

When a job runs, bounding coordinates are passed in and used to extract the requested extent from the ESRI Grid using the reader’s Search Envelope parameters. This extract is written out to a zip along with the entire Ancillary directory.

When the job runs successfully, this is what the ArcView Grid reader lines in the log file look like:

2015-11-17 04:03:45| 13.9| 0.0|WARN |ARCVIEWGRID reader: A dataset opened by GDALOpenShared should have the same filename (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65\\w001001.adf) and description (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65)
2015-11-17 04:03:45| 13.9| 0.0|INFORM|Using MultiWriter $Revision$ ( $Date$ ) with keyword `MULTI_WRITER' to output data (ID_ATTRIBUTE is `multi_writer_id')
2015-11-17 04:03:45| 13.9| 0.0|INFORM|Writer output will be ordered by value of multi_writer_id
2015-11-17 04:03:45| 13.9| 0.0|INFORM|Loaded module 'Python_func' from file '/data/fmeserver/Server/fme/plugins/python_func.so'
2015-11-17 04:03:45| 13.9| 0.0|INFORM|FME API version of module 'Python_func' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.0| 0.1|INFORM|Loaded module 'Geometry_func' from file '/data/fmeserver/Server/fme/plugins/geometry_func.so'
2015-11-17 04:03:46| 14.0| 0.0|INFORM|FME API version of module 'Geometry_func' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.0| 0.0|INFORM|NoFeaturesTester_StatisticsCalculator_Exploder(ElementFactory): LEAN_AND_MEAN processing enabled
2015-11-17 04:03:46| 14.0| 0.0|INFORM|Loaded module 'RasterProperties_func' from file '/data/fmeserver/Server/fme/plugins/rasterproperties_func.so'
2015-11-17 04:03:46| 14.0| 0.0|INFORM|FME API version of module 'RasterProperties_func' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.1| 0.0|INFORM|Loaded module 'RasterSubsetFactory' from file '/data/fmeserver/Server/fme/plugins/rastersubsetfactory.so'
2015-11-17 04:03:46| 14.1| 0.0|INFORM|FME API version of module 'RasterSubsetFactory' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.1| 0.0|INFORM|... Last line repeated 2 times ...
2015-11-17 04:03:46| 14.1| 0.0|WARN |ARCVIEWGRID reader: A dataset opened by GDALOpenShared should have the same filename (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65\\w001001.adf) and description (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65)
2015-11-17 04:03:46| 14.1| 0.0|INFORM|Predefined coordinate system `LL-WGS84' (WGS84 Lat/Longs) matches dataset coordinate system
2015-11-17 04:03:46| 14.1| 0.0|INFORM|The OGC definition of the FME coordinate system 'LL-WGS84' is 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9108"]],AUTHORITY["EPSG","4326"]]'
2015-11-17 04:03:46| 14.2| 0.1|INFORM|MULTI_WRITER: Output will be zipped
2015-11-17 04:03:46| 14.2| 0.0|INFORM|Creating writer for format: GeoTIFF (Geo-referenced Tagged Image File Format)
2015-11-17 04:03:46| 14.2| 0.0|INFORM|Trying to find a DYNAMIC plugin for writer named `GEOTIFF'
2015-11-17 04:03:46| 14.2| 0.0|INFORM|FME API version of module 'GEOTIFF' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.2| 0.0|INFORM|FME Configuration: Destination coordinate system set to `LL-WGS84'
2015-11-17 04:03:46| 14.2| 0.0|INFORM|Coordinate System `LL-WGS84' parameters: CS_NAME=`LL-WGS84' DESC_NM=`WGS84 Lat/Longs' DT_NAME=`WGS84' GROUP=`LL' MAP_SCL=`1' PROJ=`LL' QUAD=`1' SCL_RED=`1' UNIT=`DEGREE'
2015-11-17 04:03:46| 14.2| 0.0|INFORM|FME API version of module 'GEOTIFF' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.2| 0.0|INFORM|Writer `GEOTIFF_1' of type `GEOTIFF' using group definition keyword `GEOTIFF_1_DEF'
2015-11-17 04:03:46| 14.3| 0.0|INFORM|FME API version of module 'GEOTIFF' matches current internal version (3.7 20150407)
2015-11-17 04:03:46| 14.3| 0.0|INFORM|GEOTIFF writer: Writing to destination dataset '/mnt/fme_temp/fmeengines/localhost_Engine4/_auto_zip_dataset_1447733026462_405/DEM_CLIP.tif'
2015-11-17 04:03:46| 14.3| 0.0|WARN |ARCVIEWGRID reader: A dataset opened by GDALOpenShared should have the same filename (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65\\w001001.adf) and description (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65)
2015-11-17 04:03:46| 14.3| 0.0|STATS |ARCVIEWGRID_1ClippingFactoryPipeline::CLIPPER(ClippingFactory): Processed 1 input feature(s), of which 1 feature(s) were clipped, 0 feature(s) were totally inside and 0 feature(s) were totally outside

I can’t figure out why the larger zip files aren’t working. Is FME copying the data to a temp location before processing it (see the log line below)? Perhaps it’s running out of memory when trying to process the larger zip files. Note that the 00e86b75-tas5m-4b3b-b273-a047da02ea65 ESRI Grid is 28.7 GB when unpacked.

ARCVIEWGRID reader: A dataset opened by GDALOpenShared should have the same filename (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65\\w001001.adf) and description (/mnt/fme_temp/fmeengines/localhost_Engine4/00e86b75-5920-4b3b-b273-a047da02ea65.zip_1447733006760_405/00e86b75-5920-4b3b-b273-a047da02ea65)
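One way to check the disk-space theory before running a job: a zip's central directory records each member's uncompressed size, so the space an extraction will need can be compared with what is free in the temp location without unpacking anything. A minimal Python sketch, assuming the temp location is the `/mnt/fme_temp` path seen in the log; the function names and the 10% `headroom` margin are illustrative choices, not part of FME:

```python
import shutil
import zipfile


def uncompressed_size(zip_path: str) -> int:
    """Total uncompressed bytes, read from the zip's central directory (no extraction)."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def fits_in(zip_path: str, target_dir: str, headroom: float = 1.1) -> bool:
    """True if target_dir's filesystem has room for the unpacked contents.

    headroom=1.1 (10% extra) is an arbitrary, illustrative safety margin.
    """
    needed = uncompressed_size(zip_path) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

For example, `fits_in("00e86b75-tas5m-4b3b-b273-a047da02ea65.zip", "/mnt/fme_temp")` returning `False` would point at temp disk space rather than memory as the limiting factor.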

I’m using a Starter instance of FME Cloud with the following specs:

Any thoughts?

Rob

2 replies

We've created a support case for this. Answers are still welcome, of course.

Working through this issue in support, we are fairly sure the problem comes from the fact that when FME reads source data from zip files, we unzip the data into FME's temp directory.

Normally this is fine, but these files are very large (28.7 GB), and on FME Cloud, FME's temp directory is on the local instance disk rather than on the larger, resizable EBS disk. The instance disk is limited in size, so we are most likely running out of temp space.

A solution we are exploring is to use a Python scripted parameter that unzips the source dataset into FME Server's Resources/Data directory, which resides on the large, resizable EBS disk, before FME reads it.
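A minimal sketch of the unzip step in plain Python, assuming the archive holds a top-level grid folder named after the zip (as the path in the error message suggests). `extract_once`, `grid_path` and the `.extracted` marker file are hypothetical names; how the zip path and the Resources/Data location would be obtained inside an FME scripted parameter (for example through `fme.macroValues`) is not shown here and should be checked against the FME documentation:

```python
import os
import shutil
import zipfile


def extract_once(zip_path: str, data_root: str) -> str:
    """Unpack zip_path into data_root/<zip stem> and return that folder.

    Hypothetical helper, not an FME API. A marker file written after a
    successful extraction lets later jobs on the same archive skip the
    slow, multi-GB unzip step.
    """
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    target = os.path.join(data_root, stem)
    marker = os.path.join(target, ".extracted")
    if os.path.exists(marker):
        return target  # already unpacked by an earlier job

    with zipfile.ZipFile(zip_path) as zf:
        # Refuse to start if the destination disk cannot hold the contents.
        needed = sum(info.file_size for info in zf.infolist())
        free = shutil.disk_usage(data_root).free
        if free < needed:
            raise OSError(f"need {needed} bytes in {data_root}, only {free} free")
        zf.extractall(target)

    open(marker, "w").close()  # written only after extractall succeeds
    return target


def grid_path(zip_path: str, data_root: str) -> str:
    """Path to the ESRI Grid folder, assumed to share the zip's stem."""
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    return os.path.join(extract_once(zip_path, data_root), stem)
```

The value returned by `grid_path` is what the ARCVIEWGRID reader's dataset parameter would then point at. Two jobs starting on the same archive at the same moment could both extract it; a file lock would be needed if that case matters.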
