
Hi all,

I'm sure I'm missing something simple here, but I can't see a way to limit the number of features that are read when the Reader is pointing to a folder of files, e.g. c:\folder\**\*.jpg

As far as I can tell, the Max Features to Read options are only applied within a dataset, not across a number of datasets. I'm currently using a Sampler to throttle the number of features being processed at any one time, but it still has to read every file into the workspace first. As the folder in question has thousands of image files, I would like an easy way to ensure that the process doesn't collapse!

Cheers,

Barrett

Hi @barrett_h

 

Can you use the Directory and File Pathnames reader first to read in the folders, and then pass the first X through to a FeatureReader?

 

Set it to read in folder names only, and set the maximum number of features to read (folders).

 

If you want to read only X features per folder, you can also set the number of features to read in the FeatureReader; that limit is then applied to each input folder coming from the PATH reader.

 

 



Great suggestion @jlutherthomas

 


Hi @barrett_h,

Just to illustrate what @jlutherthomas wrote:

1) Reader: Directory and File Pathnames

2) Counter transformer to generate an ID for each file in your folder

3) Tester transformer to filter on the counter attribute: counter <= max_feature (a published parameter)

4) FeatureReader transformer to read the files using the path_windows attribute
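For readers who think more easily in code, the logic of the four steps above can be sketched outside FME in plain Python. This is only an illustration of the idea (list the paths cheaply, keep the first N, then open only those files), not what the attached workspace actually runs. The function names `list_image_paths` and `limit_paths` are hypothetical, and the sketch assumes the counter starts at 1 so that `counter <= max_feature` keeps exactly `max_feature` paths.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def list_image_paths(root: str, pattern: str = "*.jpg") -> Iterator[Path]:
    """Lazily yield matching file paths under root, recursing into subfolders.

    Plays the role of the Directory and File Pathnames reader: only the
    paths are produced here, the files themselves are not opened.
    """
    return (p for p in Path(root).rglob(pattern) if p.is_file())


def limit_paths(paths: Iterable[Path], max_features: int) -> List[Path]:
    """Keep only the first max_features paths.

    Plays the role of the Counter + Tester steps: islice stops pulling
    paths from the iterator once the limit has been reached.
    """
    return list(islice(paths, max_features))
```

A call such as `limit_paths(list_image_paths(r"c:\folder"), 100)` would then return at most 100 paths to hand on to whatever actually reads the images. Note that the order in which `rglob` yields paths is not guaranteed, so "the first N" here is whichever N the file system returns first.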

The workspace is attached: workspace-max-features-to-read.fmw

Thanks,

Danilo


Thank you both, @jlutherthomas and @danilo_inovacao. It's not exactly the solution I had in mind, but it works great nonetheless! I was intent on ensuring the workspace read only a set number of features; with the Directory and File Pathnames reader it still reads all the features, but with little impact on the performance of the workspace. I ended up with the following setup (the Directory and File Pathnames reader has the Allowed Path Types set to FILE):

I found that the Max Features to Read parameter in the FeatureReader does not work the way I want, i.e. it counts within a dataset, not across all the datasets.
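To make the difference concrete, here is a small Python sketch of the two counting behaviours. It is a toy model only, assuming each image file is its own dataset containing one feature; the function names and sample data are made up for illustration and are not part of FME.

```python
from itertools import chain, islice
from typing import Dict, List


def per_dataset_limit(datasets: Dict[str, List[str]], limit: int) -> List[str]:
    """Apply the limit separately inside each dataset (the behaviour observed)."""
    return [f for features in datasets.values() for f in features[:limit]]


def global_limit(datasets: Dict[str, List[str]], limit: int) -> List[str]:
    """Apply one limit across all datasets combined (the behaviour wanted)."""
    return list(islice(chain.from_iterable(datasets.values()), limit))


# Hypothetical sample: three image files, one feature each.
datasets = {"a.jpg": ["a1"], "b.jpg": ["b1"], "c.jpg": ["c1"]}

print(per_dataset_limit(datasets, 1))  # every file still contributes a feature
print(global_limit(datasets, 1))       # only one feature overall
```

With a limit of 1, the per-dataset version still returns a feature from every file, which is why the limit has to be applied to the list of paths before the files are read.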

Thanks very much for the suggestions.

