Question

PERFORMANCE OF FILE AND DIRECTORY PATH


gisbradokla
Enthusiast

If I run this command in the command prompt (OS), it takes about 15 seconds.

M:\Survey\'COGO DATA>DIR /S *.DWG

If I run the Directory and File Pathnames Reader with essentially the same parameters, it takes 5 hours! Why can't FME do this any faster?

16 replies

hlouie
Contributor
  • September 2, 2020

I read a directory of 62,000+ PDFs nightly, comparing Date Modified against an ESRI SDE feature class to find delta (changed) dates. I think the slowness has something to do with an OS restriction, similar to files with a creation/modified date before 1970; it's an OS problem. To overcome the native PATH reader delay, I did the following.

PATH reader, with the attributes to be exposed limited:

(Screenshots: PATH Reader, PATH Reader Para)

Then on to the custom transformer FilePropertyExtractor (Python in the background):

(Screenshot: FilePropertyExtractor)

This takes about 4 minutes on FME Desktop or 3 minutes with FME Server, versus 38 minutes using the native PATH reader with just one additional exposed attribute (path_modified_date). It takes even longer if I select more dates.


gisbradokla
Enthusiast
  • Author
  • September 2, 2020
Thank you for the direction but that did not make any difference for me.

The other thing is that I have never seen any valid values come through the FilePropertyExtractor for mtime, ctime, or atime.


gisbradokla
Enthusiast
  • Author
  • September 2, 2020
Did you not have to recurse into the folders? Are all your files in one folder?


hlouie
Contributor
  • September 2, 2020
The recurse folders check box is off.


gisbradokla
Enthusiast
  • Author
  • September 3, 2020
I guess that is the part that won't work for me. My files are in many subfolders. It has currently been running for 18 hours on this drive.


ebygomm
Influencer
  • September 3, 2020
Do you want a list of file paths, file properties, or something else?

Have you tried using Python to list the files?
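
For anyone unsure what listing the files with Python might look like, here is a minimal sketch using only the standard library. The function name list_files and the *.dwg default are illustrative assumptions, not part of FME; inside FME something like this would presumably sit in a PythonCaller or PythonCreator, which is not shown here.

```python
import fnmatch
import os


def list_files(root, pattern="*.dwg"):
    """Yield full paths of files under root whose names match pattern.

    Matching is case-insensitive, so X.DWG and x.dwg are both returned.
    """
    pattern = pattern.lower()
    # os.walk visits root and every subfolder beneath it.
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name.lower(), pattern):
                yield os.path.join(folder, name)


# Example use (the folder is a placeholder, not a real path):
# for path in list_files(r"C:\some\folder"):
#     print(path)
```

This only returns paths. If file properties are needed as well, each path would still have to be stat'ed, which is where much of the time can go on a network drive.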


mark2atsafe
Safer
  • September 3, 2020

Is there any other content in your workspace besides the Directory/Path reader? The log would be misleading as FME would start to process the files while it is still reading the list (i.e. it won't read the list and then start processing). If so, could you try with just the Directory/Path reader, and no other transformers? If FME is truly taking 5 hours to read a list of files, then I'm very alarmed and would love to be able to replicate that here.


gisbradokla
Enthusiast
  • Author
  • September 3, 2020
I am using the file path and properties in a ChangeDetector to keep a list synchronized.

I am not experienced enough with Python to make that happen.


gisbradokla
Enthusiast
  • Author
  • September 3, 2020
(Screenshot: cogoscanner)

I am not sure why I would have a workbench with nothing but the path reader, but mine does have the following: FilePropertyExtractor, AttributeFilter, CRCCalculator, DateTimeStamper, FeatureMerger, DateTimeConverter, StringReplacer, Sorter, ChangeDetector, AttributeManager, AttributeRemover, AttributeSplitter, ListElementExtractor, and an XLS writer.

I have run it with caching turned off but did not get the correct number of features, so I am currently running with caching (more than 18 hours to complete!). I can see that it is not processing past the FeatureMerger. I am also currently running a workspace with ONLY the Directory/Path reader in it, and there is no change in the performance. The following is the configuration.

(Screenshots: cogoscanner2, cogoscanner3)


mark2atsafe
Safer
  • September 4, 2020
The reason for just running the Path reader is that sometimes the log (and workspace) doesn't really show where time is being used. But there's nothing else in your workspace that I would expect to take hours to run, no matter how many features there are.

So... it looks like a serious issue. I'm going to escalate this to a full support case. I've not tried this using our new community setup, but I hope that you'll get an email alerting you to the case. It would be really helpful if you could submit a full log file to us.

I'm suspecting that the M drive is a mapped network drive, and that is causing the problem. I can't say for sure, but I think with the log file the developers will be able to help.


bwn
Evangelist
  • September 5, 2020

I've similarly noticed that in FME 2018.1 the File and Path reader performs quite poorly compared with getting the same list via the Windows API (I am also recursing subfolders in my use case). So I've had the same experience, although for me it makes a difference of ~2 minutes over ~4,000 files, so I tolerate it. It's surprising, because I would have thought the reader just hooked into the OS API to return the file/folder list.

 

A very "hacky" workaround that takes a little effort, but SHOULD work, is to use a SystemCaller to run the native operating system file listing command, send its output to a temporary text file, and read that file in using a Text File FeatureReader. If you want FME to create and discard this text file, this can be achieved with a TempPathnameCreator.

 

Options in the SystemCaller include, for example, running the Windows command line DIR /S /B FolderPath > TempResultFilePath, which recursively lists all file paths through the specified folder's subfolders and redirects the list to a file. Something similar can be achieved by calling a PowerShell command, which gives further options on which file attributes to expose and how to list and sort, using a variation on the PowerShell scriptlet: Get-ChildItem -Path FolderPath
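
The DIR approach can also be driven from Python rather than a SystemCaller. The sketch below is only an illustration under several assumptions: the helper names are hypothetical, it captures the output directly instead of writing a temporary file, it adds /A-D to leave directories out of the list, and it only works on Windows because it relies on cmd.exe.

```python
import subprocess


def build_dir_command(folder, pattern="*.dwg"):
    """Return the cmd.exe command line for a bare, recursive DIR listing."""
    # /s recurses into subfolders, /b prints bare full paths,
    # /a-d leaves directories out of the result.
    target = folder.rstrip("\\") + "\\" + pattern
    return 'dir /s /b /a-d "{}"'.format(target)


def parse_dir_output(text):
    """Split bare DIR output into a list of non-empty path strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_with_dir(folder, pattern="*.dwg"):
    """Run DIR through cmd.exe (Windows only) and return the matching paths."""
    # shell=True is needed because DIR is a cmd.exe built-in, not a program.
    # Only pass trusted folder names, since the string goes to the shell.
    result = subprocess.run(
        build_dir_command(folder, pattern),
        shell=True,
        capture_output=True,
        text=True,  # decodes with the default encoding; non-ASCII names may need care
    )
    # DIR returns a non-zero exit code when nothing matches; treat that as empty.
    if result.returncode != 0:
        return []
    return parse_dir_output(result.stdout)
```

A call would look like list_with_dir("M:\\Survey"), with the folder replaced by whatever root needs scanning. The bare format returns paths only, so dates and sizes would need a separate step.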

 

Or... another variation with the SystemCaller is to use JAM Software's excellent FileList command line utility, which is one of the fastest file/directory listers available. I've used this before on projects needing to recurse through millions of network files. You do need to download the EXE and place it somewhere the SystemCaller can run it, but it does all the heavy lifting for you in getting a detailed, attributed file list in CSV format.

https://www.jam-software.com/filelist


gisbradokla
Enthusiast
  • Author
  • September 9, 2020
If the problem is a mapped drive, and I can go to the Windows command prompt, type dir /s *.dwg, and get the entire list in a matter of just a few minutes, but it takes 21 hours to finish the workbench, then I need a different way to get the information. I am not sure why being a mapped drive makes a difference to FME when it doesn't to the OS.


mark2atsafe
Safer
  • September 9, 2020
To be honest, I don't know why that would cause an issue; I'm just speculating that it might be the difference. Anyway, I escalated this to a support case, so I'm hoping someone has been in touch with you about it. You should have received an automated email at least. The developers are definitely aware of your issue, because I posted the info to them, so I hope we'll have a reason or solution for you shortly.


gisbradokla
Enthusiast
  • Author
  • September 10, 2020
Thanks Mark, I did hear from support yesterday, and I had several exchanges with him today.

gisbradokla
Enthusiast
  • Author
  • December 8, 2022
Still no fix for this issue. It is hard to determine whether or not it is the recurse feature, because when you don't recurse, the number of files is naturally drastically reduced. One way or the other, I require many files in many folders.

If someone has an elegant Python solution, I would be willing to try it. I am not Python savvy, but I may have a resource onsite who could help, as long as I can see it in FME. They are Python savvy (not FME savvy).
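
As one possible starting point for such a Python solution, the sketch below walks a folder tree with os.scandir and writes path, size, and modified date to a CSV file, which could then be read into FME with a CSV reader and fed to the ChangeDetector. The function names, column names, and .dwg filter are assumptions for illustration, and it has not been timed against the mapped drive discussed here.

```python
import csv
import os
from datetime import datetime


def scan_tree(root, extension=".dwg"):
    """Yield (path, size_bytes, modified) for matching files under root."""
    stack = [root]
    while stack:
        folder = stack.pop()
        try:
            entries = os.scandir(folder)
        except OSError:
            continue  # skip folders that cannot be read
        with entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # visit this subfolder later
                elif entry.name.lower().endswith(extension):
                    info = entry.stat()
                    modified = datetime.fromtimestamp(info.st_mtime)
                    yield (entry.path, info.st_size,
                           modified.isoformat(timespec="seconds"))


def write_listing(root, csv_path, extension=".dwg"):
    """Write one CSV row per matching file and return the number of files."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes", "modified"])
        for row in scan_tree(root, extension):
            writer.writerow(row)
            count += 1
    return count
```

os.scandir is used because, per the Python documentation, on Windows the size and timestamps usually come back with the directory listing itself, so no extra call per file is needed. Whether that holds up over a network share would need to be tested.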


albinepro
Supporter
  • January 16, 2025

I'm still encountering this issue while using the Directory and File Pathnames reader to recurse over subfolders on a network drive. While a cmd call using a PythonCaller takes a minute and a half to list the files, the native reader ran for more than 15 minutes without finishing.

