Question

Reading multiple files using a variable file path name


My workflow requires reading .zip files, each containing multiple XML files with the data I require.

The .zip file name is a concatenation of the following information:

VariableProcessname1_VariableProcessname2_TransactionID_Date_Variable.zip

At this stage I have only been successful reading in multiple files using the direct file path followed by *.zip to reference all files.

However, the folders that store the .zip files hold thousands of items, and for this particular process I only need the .zip files that fall within a set date range, i.e. the last two days.

I have tried (unsuccessfully) the following methods to read the files I need:

  • 1. Use the StringConcatenator transformer to create an attribute to insert in the file path. I get the following error.

  • 2. Create a user parameter; however, again I have only been successful using a wildcard as the file name.

I am thinking some Python code (possibly in conjunction with a file path reader) may be the best solution. If someone is able to provide an example of the code, I would really appreciate it.


3 replies


I think you are on the right track using a Directory and File Path reader, filtering the results and passing them to the FeatureReader.


There are two options that should work.

1. Create a feature for each file you want to read in, with an attribute containing the full file path, and set the FeatureReader dataset to that attribute.

2. Use a Directory and File Path reader to read all the (xml) files in the directory and a Tester to filter out only the ones you want to read in, then send those to the FeatureReader.
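Since the question asked for Python, the filtering step in either option could be sketched roughly as below. This is only an illustration, not FME-specific code: `recent_zips` is a hypothetical helper, the date token is assumed to be formatted as YYYYMMDD (the thread does not say), "the last two days" is assumed to mean today and yesterday, and the trailing Variable part is assumed to contain no underscores.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def recent_zips(folder, days=2, today=None, date_format="%Y%m%d"):
    """Return .zip paths in `folder` whose file-name date is within the last `days` days.

    Assumed name layout (from the question):
    Process1_Process2_TransactionID_Date_Variable.zip
    The date format is an assumption; adjust `date_format` to the real one.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days - 1)  # days=2 -> yesterday and today
    matches = []
    for path in sorted(Path(folder).glob("*.zip")):
        # Split from the right, so underscores in the leading parts do no harm.
        parts = path.stem.rsplit("_", 2)
        if len(parts) != 3:
            continue  # name does not follow the expected layout
        try:
            file_date = datetime.strptime(parts[1], date_format).date()
        except ValueError:
            continue  # date token is not in the assumed format
        if cutoff <= file_date <= today:
            matches.append(str(path))
    return matches
```

Each returned path could then be placed in an attribute on its own feature and used as the FeatureReader dataset, as in option 1.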

Following on from using the file path as an attribute for the reader (thanks for your help), I am now getting another error, which is possibly a separate issue related to handling the XML.

See image below - XML Parse error:

If I insert a * with the text of the first instance from the XML file into Elements to Match and run the workspace, it returns some information, but the actual data is missing.

I'm not sure how to get around this.
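One way to see which element names actually occur in the files, and so which one might hold the data, is to list them outside the workspace. The sketch below is a rough illustration using only the Python standard library; `element_counts` is a hypothetical helper, and it assumes the problem lies in choosing the element to match, which the thread does not confirm. If an XML file is malformed, `ET.ParseError` is raised, which may itself be informative.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def element_counts(zip_path):
    """Count element tags for each .xml member inside a .zip archive."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".xml"):
                continue  # skip non-XML members
            tags = Counter()
            with archive.open(member) as handle:
                # iterparse streams the document and yields each element on its end tag
                for _event, elem in ET.iterparse(handle):
                    tags[elem.tag] += 1
                    elem.clear()  # free memory for large files
            counts[member] = tags
    return counts
```

Elements that repeat many times are often the records that carry the data; namespaced tags appear in the form `{namespace}name`.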

 
