
Reading and processing multiple files in a directory (but not together)

  • 12 December 2013
  • 10 replies

Hi,

Can anyone advise me how to configure a workbench to read all the files in a directory? However, I do not want to read them all in at the same time (using the 'directory' reader feature), because then the features from the different datasets would all run through the workbench together.

In essence, I want to read the first file in the directory, process it and write out the results from the workbench, and then and only then read in the next file and process it, repeating until all files have been processed. (I think the processes I want to run would be too difficult to apply to all the files at once, so I would rather the datasets were handled in isolation.)

Thanks in advance,

Rob

10 replies

Hi Rob,

I think batch processing with the WorkspaceRunner suits your requirement. Create a second workspace; add a "Directory and File Pathnames (PATH)" reader to read every file path in the specified directory; then add a WorkspaceRunner to run the main workspace once per file.

This documentation may be helpful:

FMEpedia > Batch Processing Using the WorkspaceRunner http://fmepedia.safe.com/articles/Samples_and_Demos/WorkspaceRunner

Just be aware that transformer names etc. might differ from those in current FME, since this documentation is relatively old.

Takashi
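
Outside of Workbench, the same one-file-at-a-time pattern can be sketched in Python. This is only an illustration, not the WorkspaceRunner itself. It assumes that the `fme` command-line executable is on the PATH, that the main workspace is saved under the hypothetical name `main.fmw`, and that its source dataset parameter is published under the hypothetical name `SourceDataset_TEXTLINE`. Each call blocks until the run has finished, which roughly mirrors setting the WorkspaceRunner to wait for the job to complete.

```python
import subprocess
from pathlib import Path


def build_command(fme_exe, workspace, param_name, file_path):
    """Return the command line for one run of the main workspace."""
    # Published parameters are passed on the FME command line as: --NAME value
    return [fme_exe, str(workspace), f"--{param_name}", str(file_path)]


def run_one_by_one(directory, pattern, workspace, param_name,
                   fme_exe="fme", runner=subprocess.run):
    """Run the main workspace once per matching file, strictly in sequence."""
    processed = []
    for file_path in sorted(Path(directory).glob(pattern)):
        command = build_command(fme_exe, workspace, param_name, file_path)
        # The default runner blocks until the process exits, so the next
        # file is not started until the current one has finished.
        runner(command, check=True)
        processed.append(file_path.name)
    return processed
```

For example, `run_one_by_one("input_dir", "*.txt", "main.fmw", "SourceDataset_TEXTLINE")` would start one run per text file, in file-name order. The `runner` argument exists only so the loop can be tried out without FME installed.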
Hi Takashi,

Thanks for the response. I have completed a test using the WorkspaceRunner, but I am not sure I have set everything up quite right. Can you please clarify a couple of points?

1. I have created a WorkspaceRunner bench. Within it I have used the path and filename reader to look at my input directory, which contains *.txt files, and linked this to the WorkspaceRunner. In the settings of this transformer I have entered the main processing workspace name and set "wait for job to complete" to yes, as I want the first file to run and complete before the next one is called.

2. In the main process bench (called by the runner) I have used a 'normal' text file reader, pointed at the same directory as in the previous bench and set to read all files in the directory.

I have run the WorkspaceRunner and it appears to be working. However, I am not sure whether the following is happening: the runner reads the first file and calls the main bench, which then reads in all 3 files, processes them and writes 3 outputs (instead of 1). Then the runner reads the second file and calls the main bench, which again opens all 3 input files, processes them and overwrites the original output. Then the runner repeats this again for my final test file.

This may not be happening, but I do not quite understand how I need to set up the reader in the main process. The various pages say that the runner passes the first filename as a parameter to the main process, but when I look at the main bench I am not sure where to check whether the reader options listed come from the passed parameter or from the normal reader settings in the main bench.

I hope that sort of makes sense?

Thanks again,

Rob

 

I guess you have not yet set the WorkspaceRunner parameter that passes the file path to the main workspace. You will have to pass the file path ("path_windows" or "path_unix") read by the PATH reader to the Source Text File parameter of the TEXT reader in the main workspace through the WorkspaceRunner. This image shows an example of the WorkspaceRunner settings.
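
The mapping described above can be sketched in Python as a rough model of what the WorkspaceRunner does for each incoming feature. The attribute names `path_windows` and `path_unix` come from the PATH reader; the published parameter name `SourceDataset_TEXTLINE` and the sample paths are assumptions made up for illustration, so check the actual parameter name in the main workspace.

```python
# Hypothetical name of the main workspace's published source parameter.
SOURCE_PARAM = "SourceDataset_TEXTLINE"


def parameters_for_feature(feature, use_windows_path=True):
    """Map one PATH-reader feature to the parameters for one child run."""
    attribute = "path_windows" if use_windows_path else "path_unix"
    # Exactly one file path is handed over, so the main workspace reads
    # only that file instead of everything in the directory.
    return {SOURCE_PARAM: feature[attribute]}


# Illustrative features, one per file found in the input directory.
features = [
    {"path_windows": r"C:\in\file1.txt", "path_unix": "C:/in/file1.txt"},
    {"path_windows": r"C:\in\file2.txt", "path_unix": "C:/in/file2.txt"},
    {"path_windows": r"C:\in\file3.txt", "path_unix": "C:/in/file3.txt"},
]

# Three features give three runs, each with its own single source file.
runs = [parameters_for_feature(feature) for feature in features]
```

If the parameter is left unset in the runner, each child run presumably falls back to the reader's own saved setting, which is why every run would read all the files in the directory.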

 

Hi Takashi,

Many thanks for the further explanation and screenshot; you are right, I had not set the parameter in the runner workspace. I had assumed that it would automatically be passed from workspace to workspace, but I understand how the elements fit together now.

Thanks again for your help.

Best wishes,

Rob
I take a TIFF file as an example. Here is what I usually do to read and process a TIFF file. Maybe it will help you solve this problem.
Badge

Hi, I have a follow-up question on this topic. I have a WorkspaceRunner that runs a workspace in which I hope to use an attribute fanout to create a new gdb for each run. It is not doing this and only creates a single gdb, as per the runner's Destination Geodatabase parameter. What have I missed?

Badge +16

If the destination parameter is always the same, then this behavior is to be expected.

Try using a writer fanout as the destination parameter.

Badge

Yes, agreed; unfortunately I don't know how to do as you say. How do you set a fanout on a destination parameter when using a WorkspaceRunner? I have done this with a different workspace running standalone, but I can't get the combination to work.

Badge +10

Create a user parameter in the child workbench, set the fanout on the writer to use this parameter.

Then, within the WorkspaceRunner, you can specify the value of this parameter, i.e. the value to be used to name the gdb.
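
As a rough sketch of that idea in Python: each run gets its own value for the naming parameter, so no two runs write to the same gdb. The parameter names `SOURCE_FILE` and `GDB_NAME` are hypothetical, and deriving the name from the input file name is just one possible choice; in Workbench the value would be set in the WorkspaceRunner, typically from an attribute.

```python
from pathlib import PurePosixPath

# Hypothetical parameter names, for illustration only.
SOURCE_PARAM = "SOURCE_FILE"
NAME_PARAM = "GDB_NAME"  # user parameter that the writer fanout refers to


def parameters_for_run(source_path):
    """Build the parameter values for one child run."""
    # Use the input file name without its extension as the gdb name,
    # so every input file ends up in a gdb of its own.
    gdb_name = PurePosixPath(source_path).stem
    return {SOURCE_PARAM: source_path, NAME_PARAM: gdb_name}


runs = [parameters_for_run(path)
        for path in ("/data/in/roads.txt", "/data/in/rivers.txt")]
```

The point is that the value changes per run; if the runner always passes the same destination, every run writes to the same gdb.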

Badge +16


As stated above, all parameters in the child workspace become available in the WorkspaceRunner.

 

 
