Question

Pathreader slow when recursive option is yes

4 years ago
July 28, 2020
7 replies
37 views

lizzygradidge
31 replies

When I run a simple Pathreader on files on a network share, it runs fine. But when I choose the recursive parameter, the translation just gets to "retrieve file properties 'false'" and makes no further progress.

The two warnings are:

Feature Caching is ON

The workspace may run slower because features are being recorded on all output ports.

Any ideas why this happens?

takashi
7683 replies
4 years ago
July 28, 2020

Hi @lizzygradidge, have you tried running the workspace with the Enable Feature Caching option unchecked?

See here to learn more about Feature Caching: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Workbench/Workbench/feature_inspection_about.htm

lizzygradidge
Author
31 replies
4 years ago
July 28, 2020

This does the same thing, just without the warnings.

takashi
7683 replies
4 years ago
July 28, 2020

Did performance not improve?

fmelizard
Safer
3725 replies
4 years ago
July 31, 2020

Just curious whether this directory has a huge number of subdirectories and files under it. How many are you expecting? Possibly we're tunneling our way down a very deep and large set before we start returning anything (not good), and we're not giving any feedback while we're thinking. Could you try starting the path reader further down, at a deeper subdirectory, and see if it returns anything there? And then, bit by bit, keep backing it out, starting it at a higher and higher level?
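If it helps to see where the time goes before touching the workspace, the same "start deep, then back out" check can be done outside FME. This is only a rough sketch in plain Python, not anything FME does internally; the function names `count_entries` and `profile_subdirectories` are made up for illustration, and it assumes the share is reachable from the machine running the script.

```python
import os
import time


def count_entries(root):
    """Count directories and files under root, walking iteratively."""
    dirs = files = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        dirs += 1
                        stack.append(entry.path)
                    else:
                        files += 1
        except OSError as err:
            # Unreadable folders are reported and skipped, not fatal.
            print(f"skipped {current}: {err}")
    return dirs, files


def profile_subdirectories(top):
    """Yield (path, dirs, files, seconds) for each direct child folder of top."""
    with os.scandir(top) as entries:
        children = sorted(
            entry.path for entry in entries if entry.is_dir(follow_symlinks=False)
        )
    for child in children:
        start = time.perf_counter()
        dirs, files = count_entries(child)
        yield child, dirs, files, time.perf_counter() - start
```

Looping over `profile_subdirectories(...)` with the top-level folder and printing each tuple gives per-subdirectory feedback as it goes, so a single very large or very slow branch should stand out.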

lizzygradidge
Author
31 replies
4 years ago
August 6, 2020

That has helped identify the bottleneck.

I can get quick results with the following file structure:

\\..volume1\000\016\690\ - lists the 115 files in the 690 directory.

\\..volume1\000\016\ - lists the 115 directories in the 016 directory and the 437 files in these directories.

Then when I get to the top-level directory \\..volume1\000\

it just sits there and has made no progress, despite being left running overnight.

There are 345 directories in \000, each of which could have over 100 directories, some with one file and some with hundreds of files.

I obviously need to run this in stages, as it can't deal with this huge number. How would I iterate through the subdirectories of \000? My task is to get a CSV file of the filenames so I can identify a list of missing files against a data export.

How do I configure the Pathreader to read the contents of the first directory \\..volume1\000\016\ and then read the contents of the next directory \\..volume1\000\017\, etc.?

If I can get the list of directories in \000, can the Pathreader be configured to read a CSV file?

hollyatsafe
719 replies
4 years ago
August 18, 2020

Hi @lizzygradidge ,

I just came across this post while helping another user who was experiencing the same 'hanging' workspace when using the File and Pathnames Reader to read a large folder with many subdirectories, and I wanted to share my suggestion with you.

1. In the File and Pathnames Reader, set the Reader Parameters Recurse Into Subfolders: Yes and Allowed Path Type: Directory. This way you get just a list of all the directories you will eventually retrieve files from.

2. Add a FeatureReader after the PATH Reader and again set its format to Directory and File Pathnames. Set the Dataset to the attribute path_windows, and for the other parameters use Recurse Into Subfolders: No and Allowed Path Type: File.

3. By default the features come out of the <Generic> port of the FeatureReader, so you'll want to expose the attributes you will be using, either in the FeatureReader or afterwards with the AttributeExposer transformer. Alternatively, in the FeatureReader change the Output to Specified: PATH, and it should expose these automatically.

This will perform similarly to your idea of reading from a CSV list and will hopefully let you read in all the files without hanging. Please see the attached workspace for an example of how you would set this up. If you set the Source Dataset published parameter to point to your folder, you should be good to go.
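For anyone who wants to see the shape of the two-stage approach without opening a workspace, here is a minimal sketch of the same idea in plain Python. It is not FME code and does not use any FME API; the function names and the CSV column names are made up for illustration. Stage 1 collects directories only, and stage 2 lists the files directly inside each directory without recursing. Unlike the steps above, this sketch also includes the starting folder itself in stage 1, so files sitting at the top level are not missed.

```python
import csv
import os


def list_directories(root):
    """Stage 1: yield root and every directory beneath it (no files)."""
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)


def list_files(directory):
    """Stage 2: yield the files directly inside one directory (no recursion)."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                yield entry.path


def write_file_list(root, csv_path):
    """Write one CSV row per file and return the number of rows written."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "filename"])
        for directory in list_directories(root):
            for path in list_files(directory):
                writer.writerow([directory, os.path.basename(path)])
                count += 1
    return count
```

Write the CSV somewhere outside the folder being scanned, otherwise the output file will show up in its own listing.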

lizzygradidge
Author
31 replies
4 years ago
August 25, 2020

Thanks for this. However, I need to upgrade FME to look at it, and I am having difficulty downloading the latest FME. Working from home in rural Hampshire, my internet speed is fairly poor, but once I succeed I will try this.

I resolved my issue previously by installing FME Desktop on my server and running the translation there. I also edited the Source Folder and path names folder settings by listing the subdirectories in the text editor for each.

Initially I ran it successfully in batches of 100.

Then I discovered that it would run listing all the subdirectories from the top level, and it completed successfully in 20 minutes, outputting a file of 520,259 records.

Thank you.
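For the follow-on task mentioned earlier in the thread (checking the list of filenames against a data export to find missing files), a comparison along these lines might be enough. This is a sketch under assumptions: both inputs are CSV files with a header row, both have a column called `filename` (a hypothetical name), and names are compared exactly, so case differences would count as mismatches.

```python
import csv


def read_column(csv_path, column):
    """Return the set of non-empty values in one named column of a CSV file."""
    values = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if value:
                values.add(value)
    return values


def missing_files(expected_csv, found_csv, column="filename"):
    """Return expected filenames that are absent from the found list, sorted."""
    expected = read_column(expected_csv, column)
    found = read_column(found_csv, column)
    return sorted(expected - found)
```

Calling `missing_files` with the data export first and the file listing second returns the names that appear in the export but not on disk. If the share is case-insensitive, lowercasing both sides before comparing may be the safer choice.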
