Solved

Read contents of a Zip from inside a zip

  • January 18, 2016
  • 10 replies
  • 220 views

davideagle
Contributor

We have a need to read some spatial data that is zipped up, but several of the zips are then nested inside another zip, just to make things fun. Like this:

c:\temp\top.zip\1.zip\1.gml

c:\temp\top.zip\2.zip\2.gml

FME can read, say, GML from inside a zip, but not GML from a zip inside a zip. Has anyone got any magic to unpack the top-level zip? I suspect the answer is Python, either as a PythonCaller or as a startup script; I could then use a FeatureReader to get at the data at the next level.

It seems a bit contrived, but it's a genuine requirement because it's the way a product is shipped. 1 zip file was clearly not enough!

Thanks

10 replies

erik_jan
  • Contributor
  • 2179 replies
  • January 18, 2016

Have you tried c:\temp\top.zip\**\1.gml?


davideagle
  • Author
  • Contributor
  • 578 replies
  • January 18, 2016

Yep, tried that first and a few variations in different versions of FME including the Beta... parsing error when trying to get inside the nested Zip.


erik_jan
  • Contributor
  • 2179 replies
  • January 18, 2016

David,

You can use the following:

Create a workspace reading the GML.

Create a master workspace that uses the Directory and File Pathnames reader, as in the picture.

I tried this and it works:


takashi
Celebrity
  • 7843 replies
  • Best Answer
  • January 19, 2016

Hi @1spatialdave, I also sometimes come across a nested zip. I have posted an Idea before: Add ability to read features from Archived Dataset within Nested Zip File

As you mentioned, Python can do that. e.g.

# Script example for PythonCreator.
# Extracts the zip files nested in a top-level zip file and outputs one feature
# per extracted zip, with an attribute that stores the extracted zip file path.
# Not recursive: handles only one level of nesting.
import os
import zipfile

import fmeobjects


class NestedZipUnpacker(object):
    def close(self):
        # Both values come from user parameters defined in the workspace.
        folder = FME_MacroValues['OUTPUT_FOLDER_PATH']
        try:
            # Extract all files archived in the specified zip file.
            with zipfile.ZipFile(FME_MacroValues['ZIPFILE_PATH']) as z:
                z.extractall(folder)
            # Create a feature for each zip file found in the output folder.
            for fname in os.listdir(folder):
                path = os.path.join(folder, fname)
                if zipfile.is_zipfile(path):
                    feature = fmeobjects.FMEFeature()
                    feature.setAttribute('_zip_path', path)
                    self.pyoutput(feature)
        except Exception as ex:
            # Report any failure in the FME log instead of raising.
            logger = fmeobjects.FMELogFile()
            logger.logMessageString('%s' % ex, fmeobjects.FME_ERROR)

Assume that these two user parameters are defined in the workspace.

  • OUTPUT_FOLDER_PATH: existing folder path into which the extracted zip files will be saved
  • ZIPFILE_PATH: the top level zip file path

FYI.
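
The script above handles one level of nesting. If the zips were nested more deeply, a recursive variant might look like the sketch below. It is plain Python with no FME dependency, so it is only an illustration of the idea and not the code used in any FME transformer. The function name extract_nested_zips, the "_extracted" folder suffix, and the max_depth limit are all made up for this example.

```python
import os
import zipfile


def extract_nested_zips(zip_path, out_dir, max_depth=5):
    """Extract zip_path into out_dir, then unpack any zips found inside it.

    Returns the paths of all extracted files that are not themselves zips.
    max_depth is a hypothetical safety limit against very deep nesting.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(out_dir)
        names = z.namelist()
    for name in names:
        path = os.path.join(out_dir, name)
        if os.path.isdir(path):
            continue  # directory entry, nothing to collect
        if max_depth > 0 and zipfile.is_zipfile(path):
            # Unpack each nested zip into its own folder next to the zip.
            sub_dir = path + "_extracted"
            extracted.extend(extract_nested_zips(path, sub_dir, max_depth - 1))
        else:
            extracted.append(path)
    return extracted
```

Note that zipfile.is_zipfile also returns True for other zip-based formats, so a stricter check on the file extension may be wanted for real data.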


takashi
Celebrity
  • 7843 replies
  • January 23, 2016

Inspired by this discussion, I published a custom transformer named ZipExtractor in the FME Store.


davideagle
  • Author
  • Contributor
  • 578 replies
  • January 29, 2016

@takashi, thanks so much for going to the trouble to create that Transformer. It does the job perfectly. All the best, David.


erik_jan
  • Contributor
  • 2179 replies
  • January 29, 2016

@1spatialdave Did you ever get it to work this way? I used it to process over 300K GML files (in nested zips) and it works without unzipping.
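
For anyone who wants the same "no unzipping to disk" behaviour outside a workspace, here is a rough plain-Python sketch. It is a different technique from the workspace approach described in this post: it opens each inner zip in memory with the standard zipfile and io modules. The function name iter_nested_members and the suffix parameter are invented for the example, and it assumes inner archives are named with a .zip extension and are small enough to hold in memory.

```python
import io
import zipfile


def iter_nested_members(top_zip_path, suffix=".gml"):
    """Yield (inner_zip_name, member_name, data_bytes) for matching files."""
    with zipfile.ZipFile(top_zip_path) as outer:
        for inner_name in outer.namelist():
            if not inner_name.lower().endswith(".zip"):
                continue
            # Load the inner archive into memory and open it as a zip.
            inner_bytes = io.BytesIO(outer.read(inner_name))
            with zipfile.ZipFile(inner_bytes) as inner:
                for member in inner.namelist():
                    if member.lower().endswith(suffix):
                        yield inner_name, member, inner.read(member)
```

Each yielded data_bytes value holds the raw content of one GML file, which could then be parsed or written wherever it is needed.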


takashi
Celebrity
  • 7843 replies
  • January 30, 2016

@1spatialdave, good to hear! It's my pleasure. Takashi


lau
  • 65 replies
  • July 19, 2016

Thanks a lot for this custom transformer. It works perfectly!


ottadini
  • Supporter
  • 28 replies
  • June 27, 2018

Hi @takashi, I made some mods to the ZipExtractor. I added some comments to the item on the Hub. If you like, I can post you all the changes.

 

ben