Hi,

 

I would like to read a zipped CSV file from an HTTPS site and expose its "date modified" value. I found this thread explaining how to use the Directory and File Pathnames reader, but that does not work for HTTPS sources: https://knowledge.safe.com/answers/32678/view.html

 

Is there any workaround without downloading the file first? Thank you for your help.

Most filesystems have metadata associated with each file object, such as the date of the last modification, and the Directory and File Pathnames reader uses the filesystem API to retrieve these values.

HTTP(S), however, has no equivalent standardized mechanism for file metadata.

If the web server doesn't explicitly provide you with that information in some documented way, I don't think this information will be available to you.
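
One place a server may choose to expose this is the optional Last-Modified response header. The sketch below only assumes that the server sends that header, which is not confirmed for the site in question; the function names are made up for illustration. Note that the value would describe the ZIP on the server, not necessarily the CSV inside it.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if it is absent."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    return parsedate_to_datetime(value)


def remote_last_modified(url, timeout=30):
    """Send a HEAD request (the body is not downloaded) and parse the header."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)
```

If remote_last_modified() returns None, the server simply does not provide the information this way.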


I see. Is there a way to include the download process in the workspace? It's not that I want to avoid downloading the file, I just don't want to do it as an extra step before running the workspace.

 


You can do something like the following:

 

  • Creator
  • TempPathnameCreator to create a temporary filename (file is automatically deleted when the workspace ends)
  • HTTPCaller, save response body to file
  • FeatureReader, read from file
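
For comparison, the same sequence (create a temporary filename, save the response body to it, read from the file) can be sketched in plain Python with the standard library only. This is not what the transformers do internally; the function names are hypothetical, and the encoding and delimiter of the CSV are assumptions you would have to adjust.

```python
import csv
import io
import os
import shutil
import tempfile
import zipfile
from urllib.request import urlopen


def download_to_temp(url, suffix=".zip"):
    """Save the response body to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as target, urlopen(url) as response:
            shutil.copyfileobj(response, target)
    except Exception:
        os.remove(path)  # do not leave a partial file behind
        raise
    return path


def read_csv_member(zip_path, member, encoding="utf-8", **csv_kwargs):
    """Yield each row of one CSV file inside the ZIP as a dict."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text, **csv_kwargs)


# Usage (the caller deletes the temporary file when done):
#   path = download_to_temp(url_of_the_zip)
#   try:
#       for row in read_csv_member(path, "CH.csv", delimiter=";"):
#           ...
#   finally:
#       os.remove(path)
```

Unlike TempPathnameCreator, nothing here deletes the temporary file automatically, hence the explicit os.remove() in the usage comment.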

I'm afraid I will need some help with your instructions. This is the ZIP containing the CH.csv file I want to use to extract the date modified value.

 

https://data.geo.admin.ch/ch.bfs.gebaeude_wohnungs_register/CSV/CH/CH.zip

 

Can you share a workspace? Only if it's not too much of a hassle of course.

You can either point the CSV reader directly at https://data.geo.admin.ch/ch.bfs.gebaeude_wohnungs_register/CSV/CH/CH.zip, in which case FME will automatically download and unzip the file for you.

Or you can do it "manually" in your workspace, see the attached example. Just be aware that in this case you'll have to manually expose the CSV attributes in the FeatureReader.

downlad-and-read-csv.fmwt


Thanks, but how do I expose the "date modified" attribute shown in Windows Explorer after using the CSV reader? I'm facing the same problem as in this thread, but I cannot use the Directory and File Pathnames reader. https://knowledge.safe.com/answers/32678/view.html

 

 


What do you want to accomplish? The modification date of a locally downloaded file will be the timestamp of when the file was downloaded to your PC, not the last time the file was updated on the server.

Are you sure? When I download the file and extract the ZIP, the CSV shows 11.03.2018 as its modification date. I know that that's the currency of the data, and as no other metadata exists, that would be the only way to determine the currency. I hoped that there is an automatic way to fetch this info.

 

 


Have you perhaps downloaded (and overwritten) the same file multiple times, and the first time was on 11.03.2018?

If I download CH.zip to an empty folder on my PC, the modification date equals the download date and time.

To the best of my knowledge, file downloads over HTTP don't include file metadata.

The ZIP file's modification date equals the download date for me as well, but the contained CSV still keeps its original date. I'm aware that that's not "real" metadata, but it's the closest thing until official metadata is added.

 

 


Insert the following into a PythonCaller:

import datetime
import zipfile

from fmeobjects import FMEFeature


class GetZipFileDates(object):

    def input(self, feature):
        # Full local path to the zip file, read from the incoming feature
        zip_file = feature.getAttribute('zip_file')
        # The with-block closes the archive even if an error occurs
        with zipfile.ZipFile(zip_file) as z:
            for f in z.infolist():
                # f.date_time is a (year, month, day, hour, minute, second) tuple
                dt = datetime.datetime(*f.date_time)
                # Output one new feature per entry in the zip file
                new_feature = FMEFeature()
                new_feature.setAttribute('filename', f.filename)
                new_feature.setAttribute('date_time', dt.strftime('%Y%m%d%H%M%S'))
                self.pyoutput(new_feature)

    def close(self):
        pass

The PythonCaller will expect a feature with an attribute "zip_file" that contains the complete path and filename to a valid zip file. It will output one feature for each file inside the zip file, containing the filename and the modification date (FME datetime format), e.g.

Attribute(string): `date_time' has value `20180311125412'
Attribute(string): `filename' has value `CH.csv'

You can then e.g. use a Tester on "filename" to get only the file(s) you want.

The PythonCaller should be configured as follows:

0684Q00000ArKx2QAF.png


Here's a small demo workspace:

 

zip-contents-lister.fmwt

That's an excellent point. Have a look at the PythonCaller code I've posted above; hopefully it can help extract the CSV timestamp from the zip file.

David, thank you for putting so much effort into helping me. When I run your script, it works if the ZIP file is stored locally, but pointing the Creator directly to the HTTPS path unfortunately does not work. Should that even be possible?

 

 


No, you'll have to download the file first, either using the workspace or by implementing it in the Python code. As it is, the file must exist locally; you cannot use a URL.
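
If you prefer to do the download in Python, a minimal sketch could look like the following. It uses only the standard library and assumes the archive is small enough to hold in memory; the function names are made up for illustration and are not part of FME. The timestamps are whatever the ZIP stores for each entry, which is a local time without timezone information.

```python
import io
import zipfile
from datetime import datetime
from urllib.request import urlopen


def zip_entry_dates(zip_bytes):
    """Map each entry name in a ZIP (given as bytes) to its timestamp string."""
    dates = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # info.date_time is (year, month, day, hour, minute, second)
            modified = datetime(*info.date_time)
            dates[info.filename] = modified.strftime("%Y%m%d%H%M%S")
    return dates


def zip_entry_dates_from_url(url, timeout=60):
    """Download the whole ZIP into memory, then read the entry timestamps."""
    with urlopen(url, timeout=timeout) as response:
        return zip_entry_dates(response.read())
```

Inside a PythonCaller you could call zip_entry_dates_from_url() with the URL from an attribute and set the result on the output features, in the same way as the code posted above. For a large archive, saving to a temporary file first is probably the safer choice.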

Ok, thanks again for your help.

 

 

