
We are uploading large zip files containing a folder structure like the one below, but with many more folders and files in it.

Folder1
- file1
- file2

Folder2
- file3
- file4

The upload is done with the Data Upload Service and we are using the parameter opt_extractarchive to automatically unzip all files.
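For reference, the upload call looks roughly like this (a minimal sketch in Python; the host, repository, workspace, and token are placeholders, and the exact endpoint shape is my reading of the Data Upload Service docs, so treat it as an assumption):

```python
import requests

# Placeholders -- substitute your own FME Server host, repository,
# workspace, and security token.
HOST = "https://fmeserver.example.com"
REPOSITORY = "MyRepository"
WORKSPACE = "process_files.fmw"
TOKEN = "my-fme-token"

url = f"{HOST}/fmedataupload/{REPOSITORY}/{WORKSPACE}"

with open("upload.zip", "rb") as f:
    response = requests.post(
        url,
        params={
            "opt_extractarchive": "true",  # ask the service to unzip on arrival
            "token": TOKEN,
        },
        files={"file": ("upload.zip", f, "application/zip")},
    )

response.raise_for_status()
print(response.text)  # the success response we wait for before starting the job
```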

In the next step, an FME service goes through the unzipped file structure on the server and processes these files. We have noticed that FME does not process some of the uploaded files; there is always a list of files that are not processed. And with every upload, the unprocessed files change: sometimes the first 30 files are processed fine and the next 25 are missing, sometimes the first 35 files are OK...

Could it be that the unzip happens after the upload, and that when we start the processing job not all files have been unzipped yet?

We only have this issue with large zip folder structures (they contain a lot of PDF files and images).

How is that processing triggered? It could very well be a timing issue, as unzipping a large archive can take quite a bit of time.


Hi @nsulzberger, could you share a few more details about the issue, such as how many files are in each folder? If you could share the log file from the workspace, that would also be very helpful.


@daraghbroderick one example zip file consists of 160 folders (all in the same parent folder), and each folder contains one PDF file. The size of the zip is ~60 MB.

@redgeographics the unzip is triggered with the parameter opt_extractarchive during the upload. After we get a successful response from the upload, we start the processing job. I would like to know whether the unzip is part of the upload job, or whether we get the upload success message before the unzip has actually happened.
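For context, right after that success response we submit the job, roughly like this (again a sketch: the v3 submit endpoint and fmetoken header are how I understand the REST API, so the details are assumptions):

```python
import requests

HOST = "https://fmeserver.example.com"  # placeholder host
REPOSITORY = "MyRepository"
WORKSPACE = "process_files.fmw"
TOKEN = "my-fme-token"

# Submit the processing job as soon as the upload call returns.
# If extraction is still running server-side, this is where the race starts.
submit_url = f"{HOST}/fmerest/v3/transformations/submit/{REPOSITORY}/{WORKSPACE}"
response = requests.post(
    submit_url,
    headers={
        "Authorization": f"fmetoken token={TOKEN}",
        "Content-Type": "application/json",
    },
    json={"publishedParameters": []},  # real published parameters would go here
)
response.raise_for_status()
print(response.json())  # job id and status
```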

 


Really interesting question. I suspect that FME sends the response upon successful upload rather than after unzipping, although you would hope it was synchronous.

 

 

As a test, I would suggest using a PATH reader to see whether everything has been extracted; if files are missing, that's a good sign there is a problem. If you really wanted to give this a proper try, you could read the directory multiple times every second (use a Decelerator) and see if you get more and more files with each read of the directory.
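Outside of FME you could script the same test; here's a rough sketch of the idea (the extraction path is a placeholder):

```python
import os
import time

# Hypothetical path to the server directory the archive is extracted into.
EXTRACT_DIR = "/data/fmeserver/uploads/extracted"

def count_files(root):
    """Count every file under root, recursing into subfolders."""
    return sum(len(files) for _, _, files in os.walk(root))

previous = -1
while True:
    current = count_files(EXTRACT_DIR)
    print(f"{time.strftime('%H:%M:%S')}  {current} files visible")
    if current == previous:
        # No new files since the last poll -- extraction has likely finished.
        break
    previous = current
    time.sleep(1)  # the in-workspace equivalent is a Decelerator between reads
```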

 

 

I have had some issues with reading multiple files in a single reader when making a REST call, so it could be some funny business with the readers and your REST call.

 

 


I guess I need to refactor it and do the unzipping as part of the processing job with the Unzipper transformer. Only that way can I make sure the files are all unzipped before the processing starts.



Yeah, the Unzipper works well; it's what I use. I can also then check the extension and map it to a format for the generic reader, which makes it nice and clean and lets you process a lot of formats with a similar workflow.
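For anyone wanting to see the idea outside of FME, here is a rough plain-Python sketch of unzip-then-dispatch (paths and the extension-to-format map are made-up examples; in the workspace itself this is the Unzipper followed by a FeatureReader with the generic reader):

```python
import os
import zipfile

ARCHIVE = "/data/uploads/upload.zip"     # placeholder archive path
TARGET = "/data/uploads/extracted"       # placeholder extraction target

# Map file extensions to reader format keywords for the generic reader.
# These keywords are illustrative -- check the actual FME format list.
FORMAT_MAP = {
    ".pdf": "PDF",
    ".jpg": "JPEG",
    ".tif": "GEOTIFF",
}

with zipfile.ZipFile(ARCHIVE) as zf:
    zf.extractall(TARGET)  # blocks until every member is written to disk

# Walk the extracted tree and decide how each file should be read.
for root, _, files in os.walk(TARGET):
    for name in files:
        ext = os.path.splitext(name)[1].lower()
        fme_format = FORMAT_MAP.get(ext)
        if fme_format:
            print(f"{os.path.join(root, name)} -> read as {fme_format}")
```

Because extractall returns only after the whole archive is written, the processing step can never start on a half-extracted directory, which is exactly the guarantee the upload-time unzip doesn't seem to give.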

