
I have two very large sets of files from two different years that I need to compare with each other. I already have a workspace set up that performs the comparison and gives me the desired results. Reader_1 in the workspace takes a file from the year_2013 set and Reader_2 takes the corresponding file from the year_2014 set:

Reader_1 - file_ABC_year_2013.tif

Reader_2 - file_ABC_year_2014.tif

What I need to do now is run the process as a batch job on all of the data so that each file from year_2013 is processed with the corresponding file from year_2014. The two sets of data contain exactly the same number of files and share a naming convention with the year appended to the end of the file name.

Do I create a CSV file that lists all of the files from year_2013 and the corresponding files from year_2014? If so, how are these files fed to the workspace? Via a WorkspaceRunner?

Batch processing using the WorkspaceRunner seems reasonably easy when only one reader is required; however, setting up two readers that require files to be matched up does not appear to be as straightforward.

Hi,

Use published parameters and supply the parameter values to the WorkspaceRunner from an Excel or CSV file. Each record in the file supplies the input and output paths for one run of the workspace. If you want to run it for 38 files, the file should have 38 rows, each holding the paths for the readers and writers.

Pratap
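As a rough illustration of the "one record per run" idea, here is a minimal Python sketch that builds such a CSV outside FME. It assumes the two years live in separate folders and that the only difference between paired names is `year_2013` versus `year_2014`, as in the question. The function names and the column headers `SOURCE_2013` and `SOURCE_2014` are hypothetical; they would need to match whatever published parameters your workspace actually exposes.

```python
import csv
from pathlib import Path


def pair_by_year(folder_2013, folder_2014, old="year_2013", new="year_2014"):
    """Return (path_2013, path_2014) pairs matched by swapping the year suffix."""
    pairs = []
    for src in sorted(Path(folder_2013).glob("*.tif")):
        partner = Path(folder_2014) / src.name.replace(old, new)
        if partner.exists():
            pairs.append((str(src), str(partner)))
        else:
            # Report files without a partner instead of silently dropping them.
            print(f"no {new} partner for {src.name}")
    return pairs


def write_pairs_csv(pairs, csv_path):
    """Write one row per pair; the header names are hypothetical placeholders."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["SOURCE_2013", "SOURCE_2014"])
        writer.writerows(pairs)
```

Each row of the resulting file could then drive one run of the comparison workspace, with the two columns mapped to the two reader parameters.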



This is a good approach. Another way to drive things that might be easier is to again read from a CSV file, one record per pair you want to read, and then feed that into two separate FeatureReader transformers. In recent versions (for sure FME 2016, I think also FME 2015) you can set the dataset in the FeatureReader to be an attribute or a string-edited concatenation of attributes; just check the pull-down menu next to the dataset field. From there you'll have the data emerging for you to test, etc.


Or you can use a file/directory reader.

Use a StringSearcher (regexp = (.*)(\d{4})) and a Tester

_matched_parts{1} = 2013 (or vice versa) to separate the years.

etc.

Or use an Aggregator grouped by _matched_parts{0}, which will give you pairs (assuming filenames are unique within a year...).

Then you don't first have to create a csv/txt file.
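To see what that pattern does to the file names from the question, here is a small Python sketch of the same split-and-group idea. It is only an illustration outside FME: `group_pairs` is a hypothetical helper, and Python's `re` module stands in for the StringSearcher. The prefix captured by `(.*)` plays the role of `_matched_parts{0}` and the four digits play the role of `_matched_parts{1}`.

```python
import re
from collections import defaultdict

# Same idea as the StringSearcher pattern: a greedy prefix, then a 4-digit year.
PATTERN = re.compile(r"(.*)(\d{4})")


def group_pairs(filenames):
    """Group file names by the text before the year, keyed year -> name."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            prefix, year = match.groups()
            groups[prefix][year] = name
    return dict(groups)


names = [
    "file_ABC_year_2013.tif",
    "file_ABC_year_2014.tif",
    "file_XYZ_year_2013.tif",
    "file_XYZ_year_2014.tif",
]
pairs = group_pairs(names)
```

Because `(.*)` is greedy, the year captured is the last run of four digits in the name, so a prefix that itself contains digits would still be grouped on the trailing year.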


You can use the Directory and File Path reader to scan the existing files in the directory.

This will return the filenames.

Then use the WorkspaceRunner to run your comparison workspace with the two selected files.

This will enable you to loop through a complete directory without having to list the filenames yourself.



Thanks for the help...

I ended up using two FeatureReader transformers to bring the file information in via *.csv files. I broke the data down so that each CSV file contains only one pair of files (I can play around with this later). Once the CSV files are fed in via a WorkspaceRunner, 8 processes were able to run at once, making things very quick.

Had a few issues fanning out the data at the end as I initially had nothing to work with for the fanout attribute. Eventually overcame this by using the AttributeExposer to get hold of the fme_basename.
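For anyone who prefers to drive the runs from outside FME, the "8 processes at once" pattern can also be sketched in Python. This is not the WorkspaceRunner itself but a command-line alternative, and it rests on assumptions: that the FME executable is on the PATH as `fme`, and that the workspace publishes two parameters, here given the hypothetical names `SOURCE_2013` and `SOURCE_2014`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(workspace, src_2013, src_2014, fme_exe="fme"):
    """Build one command line; the parameter names are hypothetical placeholders."""
    return [
        fme_exe,
        workspace,
        "--SOURCE_2013", src_2013,
        "--SOURCE_2014", src_2014,
    ]


def run_all(commands, max_workers=8):
    """Run the commands with at most max_workers at a time; return exit codes."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    # Threads are enough here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A list of commands built with `build_command`, one per file pair, can be passed to `run_all`; a non-zero exit code in the returned list marks a pair worth re-checking.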

