Solved

Use PythonCaller to get list of files in directory on FTP site


Badge +7

OK I give up - help me!

This is my first attempt to use the PythonCaller (since FTPCaller can't yet download directories or accept wildcards).  I've cobbled together some Python to retrieve a directory listing from an FTP site which I want to output (probably as a list) so I can generate the name of the file I want to download and pass it to FTPCaller.

I've done a whole lot of Googling but I can't fix this error on the feature.setAttribute line:

2016-10-12 17:46:48|   1.9|  0.1|WARN  |Python Exception <TypeError>: not all arguments converted during string formatting

2016-10-12 17:46:48|   1.9|  0.0|WARN  |Traceback (most recent call last):

  File "<string>", line 30, in input

TypeError: not all arguments converted during string formatting

2016-10-12 17:46:48|   1.9|  0.0|ERROR |Error encountered while calling method `input'

2016-10-12 17:46:48|   1.9|  0.0|FATAL |f_32(PythonFactory): PythonFactory failed to process feature

import fme
import fmeobjects
import ftplib
# Template Function interface:
# When using this function, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
def processFeature(feature):
    pass
# Template Class Interface:
# When using this class, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
class GetCSVs(object):
    def __init__(self):
        CSVlist = []
    def input(self,feature):
        ftp = ftplib.FTP("REDACTED")
        ftp.login("REDACTED", "REDACTED")
        ftp.cwd("REDACTED")
        try:
            CSVlist = ftp.nlst()
        except ftplib.error_perm, resp:
            if str(resp) == "550 No files found":
                print "No files found"
            else:
                raise
        ftp.quit()
        for i in enumerate(CSVlist):
            feature.setAttribute('_list_CSVs{%d}' % i)
    def close(self):
        self.pyoutput(feature)


Best answer by jdh 12 October 2016, 21:13


44 replies

Badge

On line 28 of your snippet, you're calling feature.setAttribute(), but it's missing the required second argument, which is the attribute's value.

This is unrelated to the error you're seeing, but you should consider moving the CSVlist instantiation out of __init__ and making it the first line in input() instead, to ensure that it's defined even in your handled exception case.

Badge +22

Try

for i, v in enumerate(CSVlist):
    feature.setAttribute('_list_CSVs{%d}' % i, v)

You're trying to format a tuple that is part text, hence the error.
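
For anyone who wants to reproduce the error outside FME, here is a minimal sketch. The filenames are made up for illustration. enumerate() yields (index, value) tuples, so passing the whole tuple to a format string with a single %d leaves the filename unconverted, which is what the TypeError is complaining about.

```python
# Hypothetical filenames standing in for the FTP listing.
CSVlist = ["a.csv", "b.csv"]

def format_whole_tuple(pair):
    # pair is (index, filename): %d consumes the index,
    # but nothing in the format string consumes the filename.
    return '_list_CSVs{%d}' % pair

def format_index_only(pair):
    # Unpack first, then format only the index.
    i, v = pair
    return '_list_CSVs{%d}' % i, v

first = next(enumerate(CSVlist))  # (0, 'a.csv')

try:
    format_whole_tuple(first)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(format_index_only(first))
```

The second function returns the attribute name and value as a pair, which matches the two arguments that setAttribute() expects.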

Userlevel 2
Badge +17

On line 30 (the "close" method), you are trying to output a "feature", but no local variable called "feature" is defined in the "close" method. That raises a NameError at run time. Perhaps your intention is to output "feature" within the "input" method?

Badge +7
Thanks everyone for your answers so far. I'll try out your suggestions.

 

Part of the problem is my lack of knowledge about Python syntax. For example, I thought the "v" was a second pointer for nested arrays because that was the scenario I saw in most of the topics I found.

 

Apart from places like this and StackExchange which are answering specific problems, I haven't found any general Python help on the net that's particularly user friendly.

 

 

Badge +7

 

No Python errors now I've implemented all the suggestions, but I'm not getting a list. I've connected the output of the PythonCaller to a ListExploder but in the ListExploder properties it says there are "No List Attributes Available".

 

Userlevel 2
Badge +17

 

Workbench doesn't expose attribute/list names that you defined in the script. You have to expose the list name ("_list_CSVs{}" in this case) manually through the "Attributes to Expose" parameter in the PythonCaller.

 

Userlevel 4

I took the liberty of "fixing up" your code based on the suggestions here. This works for me using FME 2016:

import fmeobjects
import ftplib

class GetCSVs(object):
    def input(self, feature):
        CSVlist = []
        ftp = ftplib.FTP("REDACTED")
        ftp.login("REDACTED", "REDACTED")
        ftp.cwd("/pub")
        try:
            CSVlist = ftp.nlst()
        except ftplib.error_perm as resp:
            if str(resp) == "550 No files found":
                print("No files found")
            else:
                raise
        ftp.quit()
        for i, v in enumerate(CSVlist):
            feature.setAttribute('_list_CSVs{%d}' % i, v)
        self.pyoutput(feature)


Badge +7

Thanks.  This is pretty much where I'm at except I hadn't removed the other template code you get in the PythonCaller by default.  I've now done that.

 

Badge +7
Hooray! Thanks Takashi :-)

 

I got so close - I exposed "_list_CSVs" not "_list_CSVs{}".

 

 

Userlevel 2
Badge +17

 

Hi, if your final goal is to create features, each of which has a single filename attribute, you can replace the last three lines of @david_r's script example with this code snippet. If you do so, the ListExploder will no longer be necessary.

 

        for v in CSVlist:
            feature.setAttribute('_CSV', v)
            self.pyoutput(feature)
Userlevel 4
Agreed. It's probably also slightly faster like that.

 

 

Badge +7
Awesome.  Thanks everyone.

 

Incidentally, what is the {%d} bit in the list version?

 

 

Userlevel 2
Badge +17

 

In fact, a list element can be treated in the same manner as a non-list attribute. The difference is that the name of a list element is qualified by an index (a 0-based sequential number) surrounded by curly brackets. Here, {%d} is the index part, and %d is replaced with the value of i (i.e. 0, 1, 2 ...) at run time. That is the behavior of the % operator. See the Python documentation to learn more ;)
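
As a small illustration of the names this produces, here is a sketch that only builds the strings and does not touch FME. The str.format() variant is shown for comparison; it needs doubled braces because braces are special in that style.

```python
# Build the first three list element names with the % operator.
names = ['_list_CSVs{%d}' % i for i in range(3)]
print(names)

# Same result with str.format(): '{{' and '}}' are literal braces,
# the inner '{}' is the placeholder for the index.
names_fmt = ['_list_CSVs{{{}}}'.format(i) for i in range(3)]
print(names_fmt)
```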

 

Userlevel 4
It's just string formatting, it's explained in great detail here: https://pyformat.info/
Badge +7
In some ways this is actually better than the proposed download/upload of a folder:

 

https://knowledge.safe.com/questions/32041/download-and-upload-folder-from-ftp.html

 

I've added a Tester after the PythonCaller which tests if the CSV filename ends in ".csv":

 

@UpperCase(@Right(@Value(CSVFileName),4)) Like .CSV

 

So if the FTP directory has a mix of files, and the filename/extension values allow you to distinguish different types of file, you can process these in different ways, or ignore files you don't want to download.

 

The directory listing from the code I've used in the PythonCaller will include sub-directory names, so one of the things my Tester does is ignore sub-directories. Only filenames are sent to the FTPCaller.

 

Userlevel 2
Badge +17
For what it's worth, if you replace the 11th line of David's script example with this statement, it filters the filenames by extension.

 

            CSVlist = [f for f in ftp.nlst() if f[-4:].lower()=='.csv']
See the List Comprehensions to learn more about the syntax.
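
Here is a standalone sketch of the same filter that can be tried without an FTP connection. The listing is invented for illustration, and the helper name filter_by_extension is hypothetical. It uses str.endswith() instead of slicing, which is equivalent here but also works for extensions of other lengths.

```python
def filter_by_extension(names, ext='.csv'):
    """Keep only names ending with ext, ignoring case."""
    return [f for f in names if f.lower().endswith(ext)]

# Made-up directory listing, including a sub-directory name and a near miss.
listing = ['data.csv', 'REPORT.CSV', 'readme.txt', 'archive', 'old.csv.bak']

csv_files = filter_by_extension(listing)
print(csv_files)
```

In the PythonCaller this would be called as filter_by_extension(ftp.nlst()).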

 

Badge +7
I guess it depends on whether you just want CSVs or want to send different files along different routes in your workspace. For example, if there were several unzipped ShapeFiles in the FTP directory (e.g. Rivers, Roads, Towns), you could use a Tester/TestFilter to download the 6+ files that comprise each ShapeFile to a different folder.

If the destination folder had the same name as the ShapeFile, you could dispense with the Tester/TestFilter and use the ShapeFile name (minus extension, via something like the @Left function) in the FTPCaller Target File parameter to download each ShapeFile to its own folder, e.g. C:\@Value(SHPName)\@Value(FTPFileName)

 

Userlevel 2
Badge +17
The statement fetches only filenames ending with '.csv' (case-insensitively) and stores them in the "CSVlist" list. It's just an example. There are many possible variations depending on the actual requirement, and of course the approach using the Tester or TestFilter is also a good solution. There is always more than one way ;)

 

Badge +7
The FTP Workspace is now called by a Workspace Runner. If the FTP directory is empty (no CSV files or any other type of file), nothing gets downloaded (obviously) and therefore no features get output from the PythonCaller. While this is not an issue for the FTP Workspace itself, I need to capture this scenario because it affects what happens after the Workspace Runner.

 

There's another Workspace Runner which will fail if there are no CSV files in the input folder it reads. I want to skip that subsequent Workspace Runner if no CSV files have been downloaded.

 

So I've added the code below to my FTP Python script:

 

        if not CSVlist:
            feature.setAttribute('FTPErr', 'No CSV files to download')
            self.pyoutput(feature)

 

This ensures that if the FTP directory is empty, there will be 1 feature output from the PythonCaller which I can test and use a Terminator to terminate the translation.
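
The branching can be tried outside FME with a rough sketch like the one below. Plain dictionaries stand in for FME features, the function name build_outputs is hypothetical, and the '_CSV' attribute name is borrowed from the per-file snippet earlier in the thread, so treat all of that as assumptions rather than the actual workspace.

```python
def build_outputs(CSVlist):
    """Return one attribute dict per feature the PythonCaller would output."""
    if not CSVlist:
        # Empty directory: emit a single marker feature to test downstream.
        return [{'FTPErr': 'No CSV files to download'}]
    # Otherwise one feature per filename.
    return [{'_CSV': v} for v in CSVlist]

print(build_outputs([]))
print(build_outputs(['a.csv', 'b.csv']))
```

Either way at least one feature leaves the transformer, so a downstream test on FTPErr always has something to evaluate.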

 

Do you have an example of how to take a set of shapefiles, download them, and pass them on to the next step grouped by shapefile and its corresponding downloaded files? So far I have had to send the list to the FTPCaller and then set up a reader on the download directory.

Badge
I've implemented the Python code from @david_r, which works fine. However, if I exchange ftp.nlst() for ftp.dir() to get a full directory listing (I need the file modified date), FME gives me the error in the screenshot below. I spent all afternoon trying to get around this without success. Any ideas?

 

Thanks

 

James

 

Userlevel 4
According to the documentation: "This method returns None" :-)

 

 

So basically, you cannot use this method as a replacement for nlst() without any further changes.

 

 

You can, however, specify a callback function for dir() which will then be called once for every line returned by the ftp server. There's a short example here. You will then have to split up each line yourself to get the filenames, dates etc.
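
A minimal sketch of the callback approach might look like this. It assumes the server returns Unix "ls -l" style lines, which is common but not guaranteed, so the parsing will need adjusting for other servers. The helper names and the FakeFTP class are hypothetical; the fake object only exists so the sketch runs without a real connection.

```python
def parse_list_line(line):
    """Split one Unix-style LIST line into a few named fields."""
    # maxsplit=8 keeps filenames containing spaces in one piece.
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None  # e.g. a "total 2" header line
    return {
        'is_dir': parts[0].startswith('d'),
        'size': parts[4],
        'modified': ' '.join(parts[5:8]),
        'name': parts[8],
    }

def list_directory(ftp):
    """Collect the lines that dir() passes to the callback, then parse them."""
    lines = []
    ftp.dir(lines.append)  # dir() calls lines.append once per line
    entries = [parse_list_line(line) for line in lines]
    return [e for e in entries if e is not None]

class FakeFTP(object):
    """Stand-in for ftplib.FTP with made-up listing lines."""
    def dir(self, callback):
        for line in [
            'total 2',
            'drwxr-xr-x   2 owner group     4096 Oct 11 09:00 archive',
            '-rw-r--r--   1 owner group     1234 Oct 12 17:46 report 2016.csv',
        ]:
            callback(line)

entries = list_directory(FakeFTP())
for entry in entries:
    print(entry)
```

With a real connection, list_directory(ftp) would replace the ftp.nlst() call, and each entry's 'name' and 'modified' values could be written to attributes.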

 

Badge
Thanks for the prompt reply. Are you able to elaborate on that a bit as being new to Python I can't see how that example relates to the code example you posted in this thread.

 

Thanks

 

 

Userlevel 4
Maybe you could tell me what you intend to accomplish first? That way we can hopefully avoid going down the wrong path.

 

As you're implying, using callback functions is indeed a somewhat advanced topic if you're new to programming.

 

Badge
I'm trying to retrieve a file from a Unix server. There are hundreds of similarly named files, but I just need to set up an FME workflow that processes the file produced on the current day, which is why I was trying to get a full directory listing with the file created/modified date. Once I'd used FME to run through the returned directory listing and find the correct file, I was going to use the PythonCaller again to download the file over FTP and pull it back into FME for some string replacement functions I need to run on it.
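
One possible way to pick out the current day's file without parsing a full listing is to ask the server for each file's modification time with the FTP MDTM command. This is only a sketch: it assumes the server supports MDTM (not all do), the helper names and the FakeFTP class are hypothetical, and the timestamps are normally reported in UTC, so "today" may need adjusting for your time zone.

```python
import ftplib
from datetime import datetime, date

def modified_time(ftp, name):
    """Ask the server for a file's modification time via MDTM."""
    resp = ftp.sendcmd('MDTM ' + name)  # e.g. '213 20161012174648'
    return datetime.strptime(resp[4:18], '%Y%m%d%H%M%S')

def files_modified_on(ftp, names, day):
    """Return the names whose modification date equals day."""
    matches = []
    for name in names:
        try:
            if modified_time(ftp, name).date() == day:
                matches.append(name)
        except ftplib.error_perm:
            # Typically a directory or a name the server refuses; skip it.
            continue
    return matches

class FakeFTP(object):
    """Stand-in for ftplib.FTP with made-up timestamps."""
    stamps = {'a.csv': '20161011090000', 'b.csv': '20161012174648'}

    def sendcmd(self, cmd):
        name = cmd.split(' ', 1)[1]
        if name not in self.stamps:
            raise ftplib.error_perm('550 not a plain file')
        return '213 ' + self.stamps[name]

todays = files_modified_on(FakeFTP(), ['a.csv', 'b.csv', 'sub'], date(2016, 10, 12))
print(todays)
```

With a real connection the call would be files_modified_on(ftp, ftp.nlst(), date.today()), and each matching name could be output as its own feature for the FTPCaller.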

 

Thanks

 

James

 

 
