Question

How to know what is inside zip file for processing

  • 19 September 2016
  • 11 replies
  • 124 views

Hi all,

New on the forum but not to FME 🙂 (although I'm not an expert). I've come across a question about working with zip files. I know FME reads zip files directly, but only if you know what kind of file is inside. So if I have URLs pointing to zip files that can contain DWG, SHP, PNG... I can't really figure out how to tell FME "OK, shapefiles go down this path, DWGs down that other path...". I know I could use the HTTPCaller to store them, unzip them, and then work with the unzipped files by filtering the inputs, but I want to save disk space and don't really want to unzip.

Any idea will be welcome.

Thanks!

Hi @geodavid76, I think the Python ZipFile.namelist() method would be a quick way. See here to learn more: Python documentation | ZipFile.namelist()
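A minimal standalone sketch of `ZipFile.namelist()`, run outside FME. It only reads the archive's table of contents, so nothing is extracted to disk. The function name and the small in-memory archive in the demo are hypothetical, created only so the snippet runs on its own.

```python
import io
import zipfile


def list_zip_members(source):
    """Return the names of the files stored in a zip archive.

    `source` may be a file path or a binary file-like object.
    Directory entries (names ending in '/') are skipped.
    """
    with zipfile.ZipFile(source, "r") as z:
        # namelist() reads the archive's central directory only;
        # no member is extracted to disk.
        return [name for name in z.namelist() if not name.endswith("/")]


if __name__ == "__main__":
    # Build a small in-memory archive for demonstration (hypothetical contents).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("roads.shp", b"")
        z.writestr("plans/", b"")
        z.writestr("plans/site.dwg", b"")
    buf.seek(0)
    print(list_zip_members(buf))  # ['roads.shp', 'plans/site.dwg']
```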


Hi @geodavid76, you could also use the list command of 7-Zip (7z l) and read the contents of the archive back into the workspace.


I have tried zipfile.namelist(), and although it seems to be working, I can't manage to get the data out of it.

from zipfile import ZipFile

def fmeUnzip(fmeFeature):
    zipFile = fmeFeature.getAttribute("_response_file_path")
    z = ZipFile(zipFile)
    # namelist() returns the list of archived file names,
    # but the result is not stored or written to the feature here
    z.namelist()

How can I write a value from Python to a field (attribute) in FME? This is the part I'm not getting right...



@geodavid76, the namelist method returns a Python list whose elements are the archived file names, so you have to retrieve each element. For example, a PythonCaller with this script outputs one feature per archived file, with the file name stored in an attribute called "_name".

 

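# PythonCaller Script Example
from zipfile import ZipFile

class FeatureProcessor(object):
    def input(self, feature):
        # Path of the zip file downloaded by the HTTPCaller
        path = feature.getAttribute("_response_file_path")
        with ZipFile(path, 'r') as z:
            # Output one feature per archived file name
            for name in z.namelist():
                out = feature.clone()  # copy so each output feature is independent
                out.setAttribute('_name', name)
                self.pyoutput(out)

    def close(self):
        pass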

It works perfectly now!
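To get back to the original goal (shapefiles down one path, DWGs down another), one option is to also store each member's extension as an attribute and route on it, for example with an AttributeFilter. The helpers below are plain Python so they can be tried outside FME; the function names and the `_extension` attribute name are hypothetical. Inside the PythonCaller loop, you would set an `_extension` attribute alongside `_name` using `member_extension(name)`.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def member_extension(name):
    """Lower-case extension of an archived file name, without the dot ('' if none)."""
    # Zip member names always use '/' as the separator, so posixpath fits.
    return posixpath.splitext(name)[1][1:].lower()


def group_by_extension(source):
    """Map each extension (e.g. 'shp', 'dwg') to the archived files that have it."""
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as z:
        for name in z.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[member_extension(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical archive mixing the formats from the question.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for n in ["roads.shp", "roads.dbf", "plans/", "plans/site.DWG", "photo.png"]:
            z.writestr(n, b"")
    buf.seek(0)
    print(group_by_extension(buf))
    # {'shp': ['roads.shp'], 'dbf': ['roads.dbf'], 'dwg': ['plans/site.DWG'], 'png': ['photo.png']}
```

Lower-casing the extension means `.DWG` and `.dwg` end up on the same route.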


Sadly the File/Directory reader won't read a list of files from a zip archive. It has already been filed as an enhancement request (PR#69077), and if/when we implement it, it will solve this problem without the need for Python.

 



Is this still the case? Is Python the only option to validate the contents of a zip file?



I think so. I can see a number of requests for this functionality spread across several issues in our database, so I've pinged the developers to let them know and to see if we can get a fix scheduled. FYI, the new reference number is FMEENGINE-38193.

But for now, I think Python is the most reliable option 😒



I thought I might be able to do something clever with the schema output of the FeatureReader, but I think I'd need some sort of "schema first" option. Otherwise I'd have to build a fairly convoluted workflow to make sure no features start being written before confirming that all the files in the zip file are present and correct. It's only a line of Python to get the file list...
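As a rough illustration of that approach, the sketch below checks an archive before anything is read from it: it confirms that a set of required files is present and calls `ZipFile.testzip()`, which reads every member and returns the name of the first one with a bad CRC (or `None`). The function name and the required file names are hypothetical; substitute whatever your zip is supposed to contain.

```python
import io
import zipfile


def validate_zip(source, required):
    """Return a list of problems found in the archive (empty list means OK)."""
    problems = []
    try:
        with zipfile.ZipFile(source) as z:
            names = set(z.namelist())
            # Report every required member that is absent, in a stable order.
            for name in sorted(set(required) - names):
                problems.append("missing: " + name)
            bad = z.testzip()  # first member with a bad CRC, or None
            if bad is not None:
                problems.append("corrupt: " + bad)
    except zipfile.BadZipFile:
        problems.append("not a valid zip archive")
    return problems


if __name__ == "__main__":
    # Hypothetical shapefile archive that is missing its .shx component.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("parcels.shp", b"")
        z.writestr("parcels.dbf", b"")
    buf.seek(0)
    print(validate_zip(buf, ["parcels.shp", "parcels.shx", "parcels.dbf"]))
    # ['missing: parcels.shx']
```

In a PythonCaller, the returned list could be written to an attribute so that a downstream Tester stops the feature before any reading or writing starts.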



@mark2atsafe Hi Mark.

 

Is there any update on this? I have exactly the same issue as @geodavid76.


It seems to work now, at least inside a FeatureReader (FME Form 2024.1.1.0 (20240729 - Build 24619 - WIN64) and FME Flow 2024.1.1.1 Build 24620 - linux-x64).

The ‘Directory and File Pathnames’ reader may need to be tricked by setting its Dataset parameter to an attribute holding the path to the .zip file. It then nicely lists the contents of the .zip file.

