Solved

Extracting CSV files from large zip file

  • 26 March 2018
  • 8 replies
  • 81 views

I'm simply trying to get FME to use the File Copy writer to copy the CSV files inside a large zip file (about 4 GB) to a network folder. I only used a reader with the format set to CSV, the dataset set to the path of the zip file, and workflow options set to Individual, connected to a File Copy writer. Are there any issues with FME and large zip files? Is there a better method for extracting a specific file type from a zip than the one I used? Thanks for any help.

The latest error states: "failed to open file. failed to get any schema from reader". I am able to do this manually with FileZilla or 7-Zip.

Best answer by takashi 27 March 2018, 05:12

Hi @jvickrey656,

Could you share with us your template file and logfile?

Thanks,

Danilo

Hi @jvickrey656, if you need to extract a CSV file from the Zip archive and save it to a specific directory, the ZipExtractor custom transformer from FME Hub might help you.

If you just need to copy the Zip to a specific directory without extracting, the File Copy writer does that.

Anyway, you don't need to read data from the CSV file using the CSV reader.

By the way, what do you mean by "network folder" here? If you intend to upload the extracted CSV file to a directory on an FTP server, you need to save the extracted CSV file to a folder on your local machine temporarily and then upload it to the server with the FTPCaller.

 

Hi takashi, if I use ZipExtractor, how do I specify that I want the CSV files out of the zip? I'll attach my fmw and log. What I meant by "network folder" is just a shared folder on our office network (server). Based on your explanation, it sounds like I need to use ZipExtractor, but I need the correct syntax to tell that transformer to grab every CSV file within the zip and copy them all to a directory.

 

 

csv22filecopyheader.fmw
csv22filecopyheader.txt

 

 

Basically, just set the zip file path to the "Source Zip File" parameter and set the destination folder path to the "Destination Root Folder" parameter. However, I found the error message "BadZipfile: zipfiles that span multiple disks are not supported" in the log you attached.

The ZipExtractor contains a Python script that uses the Python standard "zipfile" module, and as I read this message, the module unfortunately doesn't support extracting files from one disk to another.
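
For context, here is a minimal sketch of how the archive could be checked outside FME with the same standard `zipfile` module. It assumes Python 3 (where the exception is spelled `BadZipFile`); the function names `list_csv_members` and `describe_archive` are hypothetical and are not part of the ZipExtractor. As far as I know, CPython raises the "span multiple disks" message while reading the archive's own end-of-central-directory data, when the disk fields there indicate a multi-part archive, so it may be worth confirming that the file opens at all before building a workflow around it.

```python
import zipfile


def list_csv_members(zip_path):
    """Return the names of the .csv members in the archive at zip_path.

    Raises zipfile.BadZipFile if the standard library cannot read the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(".csv")
        ]


def describe_archive(zip_path):
    """Return a short report about the archive instead of raising."""
    try:
        names = list_csv_members(zip_path)
    except zipfile.BadZipFile as error:
        # Covers messages such as "zipfiles that span multiple disks are not supported".
        return f"zipfile cannot read this archive: {error}"
    return f"{len(names)} CSV member(s) found"
```

If `describe_archive` reports the same error for a copy of the zip on a local disk, the problem is more likely in the archive itself than in where it is stored.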

 

A workaround I can think of is:

  1. Copy the source zip file to the local disk (FeatureWriter with File Copy writer).
  2. Extract the CSV files from the zip file and save them to the same disk temporarily (ZipExtractor).
  3. Read the paths of the CSV files (Directory and File Pathnames reader), then copy them to the destination folder (File Copy writer).

In addition, the TempPathnameCreator is convenient for making a temporary folder/file path. FME will automatically remove all files saved in the temporary path after the translation has completed.
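
The three steps above could look roughly like this in plain Python, for anyone who wants to try the idea outside a workspace. This is only a sketch under assumptions: it is not the script inside the ZipExtractor, the function name `extract_csvs_via_local_temp` is hypothetical, and `tempfile.TemporaryDirectory` stands in for the temporary path that the TempPathnameCreator would provide.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_csvs_via_local_temp(source_zip, destination_dir):
    """Copy the zip to a local temp folder, extract its CSVs there, then copy them on.

    Returns the list of files written under destination_dir.
    """
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []

    # The temporary folder and its contents are removed when this block exits.
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_root = Path(temp_dir)

        # Step 1: copy the source zip file to the local disk.
        local_zip = temp_root / "source.zip"
        shutil.copy2(source_zip, local_zip)

        # Step 2: extract only the CSV members, on the same local disk.
        extract_root = temp_root / "extracted"
        with zipfile.ZipFile(local_zip) as archive:
            extracted_paths = [
                Path(archive.extract(info, extract_root))
                for info in archive.infolist()
                if not info.is_dir() and info.filename.lower().endswith(".csv")
            ]

        # Step 3: copy each extracted CSV to the destination folder.
        # Subfolders inside the zip are flattened, so same-named files overwrite each other.
        for path in extracted_paths:
            target = destination / path.name
            shutil.copy2(path, target)
            copied.append(target)

    return copied
```

Filtering on the member name is what restricts the extraction to CSV files; everything else in the archive is left untouched.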

 

 


Hi Takashi, do you have an example fmw I can look at to see what you mean by your workaround? Did you mean FeatureReader to FileCopy? What you are saying makes sense, but I'm having some trouble implementing that workaround.

 

 

This workflow will probably work. The Directory and File Pathnames (PATH) reader turned out not to be essential.

If the source zip file is saved on the local disk, the first TempPathnameCreator and the FeatureWriter are not necessary. You can remove them and then set the source zip file path to the Source Zip File parameter in the ZipExtractor.

Thanks takashi
