Question

How to know what is inside zip file for processing



Hi all,

I'm new to the forum but not to FME :) (although not an expert). I have a question about how to work with zip files. I know FME reads zip files directly, but only if you know what kind of file is inside. So if I have some URLs pointing to zip files that can contain dwg, shp, png..., I can't figure out how to tell FME: shapefiles go down this path, dwg files down that other path... I know I could use an HTTPCaller to store them, unzip them, and then filter the inputs, but I want to save disk space and would rather not unzip.

Any idea will be welcome.

Thanks!

11 replies

takashi
Influencer
  • September 19, 2016

Hi @geodavid76, I think Python ZipFile.namelist() method would be a quick way. See here to learn more: Python documentation | ZipFile.namelist()
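As a minimal sketch of that suggestion outside FME: `ZipFile.namelist()` returns the entry names from the archive's directory, so nothing has to be extracted to disk. The names `list_zip_contents` and `build_sample_zip`, and the sample file names, are illustrative assumptions and not part of the original post.

```python
import io
from zipfile import ZipFile


def list_zip_contents(source):
    """Return the names of all entries in a zip archive without extracting it.

    `source` may be a file path or a binary file-like object.
    """
    with ZipFile(source, "r") as archive:
        return archive.namelist()


def build_sample_zip():
    """Build a small in-memory archive purely for demonstration."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("roads.shp", b"")
        archive.writestr("plan.dwg", b"")
        archive.writestr("photo.png", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Prints the entry names in the order they were stored.
    print(list_zip_contents(build_sample_zip()))
```

Because `ZipFile` also accepts a file-like object, the same call should work on a downloaded response held in memory as well as on a file saved to disk.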


itay
Supporter
  • Supporter
  • September 19, 2016

Hi @geodavid76, you could also use the list command in 7z and read the contents of the archive back into the workspace.


  • Author
  • September 20, 2016

I have tried zipfile.namelist(), but I can't get any data out of it, although it seems to be working.

from zipfile import ZipFile

def fmeUnzip(fmeFeature):
    zipFile = fmeFeature.getAttribute("_response_file_path")
    z = ZipFile(zipFile)
    z.namelist()

How can I print from Python to a field in FME? That is the part I'm not getting right at the moment...


takashi
Influencer
  • September 20, 2016
geodavid76 wrote:

I have tried zipfile.namelist(), but I can't get any data out of it, although it seems to be working.

from zipfile import ZipFile

def fmeUnzip(fmeFeature):
    zipFile = fmeFeature.getAttribute("_response_file_path")
    z = ZipFile(zipFile)
    z.namelist()

How can I print from Python to a field in FME? That is the part I'm not getting right at the moment...

@geodavid76, the namelist method returns a Python list containing archived file names as its elements, so you will have to retrieve each element. For example, a PythonCaller with this script outputs features having an attribute called "_name" that stores a file name for each.

 

# PythonCaller Script Example
from zipfile import ZipFile

class FeatureProcessor(object):
    def input(self, feature):
        # Path of the downloaded zip file, read from the incoming feature
        path = feature.getAttribute("_response_file_path")
        with ZipFile(path, 'r') as z:
            # Output the feature once per archived file name
            for name in z.namelist():
                feature.setAttribute('_name', name)
                self.pyoutput(feature)
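To get from a list of names to "shapefiles go this way, dwg files that way", one option is to derive the extension of each entry and group on it. The following is only a sketch in plain Python; the function names and the sample archive are assumptions made for illustration. Inside a PythonCaller, the extension could be written to a second attribute next to "_name" and tested downstream.

```python
import io
import os
from collections import defaultdict
from zipfile import ZipFile


def group_entries_by_extension(source):
    """Map each lower-cased file extension to the archive entries that use it.

    `source` may be a file path or a binary file-like object.
    Directory entries are skipped.
    """
    groups = defaultdict(list)
    with ZipFile(source, "r") as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            extension = os.path.splitext(info.filename)[1].lower()
            groups[extension].append(info.filename)
    return dict(groups)


def build_mixed_zip():
    """Build an in-memory archive with mixed content, for demonstration only."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("data/roads.shp", b"")
        archive.writestr("data/roads.dbf", b"")
        archive.writestr("plan.DWG", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for extension, names in group_entries_by_extension(build_mixed_zip()).items():
        print(extension, names)
```

Lower-casing the extension keeps `plan.DWG` and `plan.dwg` in the same group, which matters when archives come from different sources.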

  • Author
  • September 21, 2016

It works perfectly now!


mark2atsafe
Safer
  • Safer
  • September 21, 2016
Sadly, the File/Directory reader won't read a list of files from a zip archive. It's already been filed as an enhancement request (PR#69077), and if/when we implement that it would solve this problem without the need for Python.

 


ebygomm
Influencer
  • Influencer
  • October 21, 2020
mark2atsafe wrote:
Sadly, the File/Directory reader won't read a list of files from a zip archive. It's already been filed as an enhancement request (PR#69077), and if/when we implement that it would solve this problem without the need for Python.

 

Is this still the case? Is Python the only option to validate the contents of a zip file?


mark2atsafe
Safer
  • Safer
  • October 21, 2020
ebygomm wrote:

Is this still the case? Is Python the only option to validate the contents of a zip file?

I think so. I can see a number of different requests for this functionality, spread across several issues in our database, so I've pinged the developers to let them know and to see if we can get a fix scheduled. FYI, the new reference number is FMEENGINE-38193.

But for now, I think Python is the most reliable option 😒


ebygomm
Influencer
  • Influencer
  • October 21, 2020
ebygomm wrote:

Is this still the case? Is Python the only option to validate the contents of a zip file?

I thought I might be able to do something clever by using the schema output on the FeatureReader, but I think I'd need some sort of schema-first option. Otherwise I'd have to introduce a fairly convoluted workflow to ensure that no features have started to be written before confirming that all the files in the zip file are present and correct. It's only a line of Python to get the file list...
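For the "present and correct" check, a rough sketch in plain Python could compare the file list against the names that are expected, and optionally run the standard library's CRC check. The function names and the sample file names below are hypothetical, chosen only for illustration.

```python
import io
from zipfile import ZipFile


def missing_entries(source, required_names):
    """Return the required names that are absent from the archive, sorted."""
    with ZipFile(source, "r") as archive:
        present = set(archive.namelist())
    return sorted(set(required_names) - present)


def first_corrupt_entry(source):
    """Return the name of the first entry that fails its CRC check, or None.

    ZipFile.testzip() reads every entry to verify it, but writes nothing to disk.
    """
    with ZipFile(source, "r") as archive:
        return archive.testzip()


def build_incomplete_zip():
    """Build an in-memory shapefile bundle without its .shx, for demonstration."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("parcels.shp", b"geometry")
        archive.writestr("parcels.dbf", b"attributes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    required = ["parcels.shp", "parcels.shx", "parcels.dbf"]
    print(missing_entries(build_incomplete_zip(), required))
    print(first_corrupt_entry(build_incomplete_zip()))
```

Running a check like this before any reader or writer is triggered would avoid the convoluted workflow described above, since the decision is made from the file list alone.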


arnold_bijlsma
Enthusiast
mark2atsafe wrote:
Sadly, the File/Directory reader won't read a list of files from a zip archive. It's already been filed as an enhancement request (PR#69077), and if/when we implement that it would solve this problem without the need for Python.

 

@mark2atsafe​ Hi Mark.

 

Is there any update on this? I have exactly the same issue as @geodavid76.


revesz
Contributor
  • Contributor
  • September 2, 2024

It seems to be working now, at least inside a FeatureReader (FME Form 2024.1.1.0 (20240729 - Build 24619 - WIN64) and FME Flow 2024.1.1.1 Build 24620 - linux-x64).

The ‘Directory and File Pathnames’ reader may need to be tricked by setting its Dataset parameter to an attribute that holds the path to the .zip file. It then lists the contents of the .zip file nicely.

