If I run this command at the command prompt (OS), it takes about 15 seconds.

M:\Survey\'COGO DATA>DIR /S *.DWG

If I run the Directory and File Pathnames reader with essentially the same parameters, it takes 5 hours! Why can't FME do this any faster?

Every night I read a directory of 62,000+ PDFs and compare their Date Modified values against an Esri SDE feature class to detect changes. I think this has something to do with an OS restriction, similar to files with Creation/Modified dates before 1970; it's an OS problem. To overcome the native PATH reader delay, I did ...

 

PATH reader: limit the attributes to be exposed

[Screenshots: PATH reader; PATH reader parameters]

 

 

Then pass the features to the custom transformer FilePropertyExtractor (Python in the background)

[Screenshot: FilePropertyExtractor]

 

 

This takes about 4 minutes on FME Desktop, or 3 minutes on FME Server, versus 38 minutes using the native PATH reader with just one additional exposed attribute (path_modified_date). It takes even longer if I select more dates.


Thank you for the direction, but that did not make any difference for me.

The other thing is that I have never seen any valid values come through the FilePropertyExtractor for mtime, ctime, or atime.
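For reference, this is how those three values come out of Python's os.stat: as seconds since 1 January 1970 (epoch), not as date strings. I can't say what the FilePropertyExtractor does with them internally, so treat this as a minimal sketch that reads the times directly and converts them to FME's YYYYMMDDHHmmss datetime format. The function names fme_datetime and file_times are illustrative, not part of any FME API.

```python
import os
from datetime import datetime, timezone


def fme_datetime(epoch_seconds, utc=False):
    """Convert an epoch timestamp (as returned by os.stat) to an
    FME-style YYYYMMDDHHmmss string, in local time unless utc=True."""
    tz = timezone.utc if utc else None
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime("%Y%m%d%H%M%S")


def file_times(path, utc=False):
    """Return a file's modified, accessed and 'ctime' values as FME-style strings.

    Note: on Windows st_ctime has historically been the creation time,
    while on Unix-like systems it is the last metadata change time.
    """
    st = os.stat(path)
    return {
        "mtime": fme_datetime(st.st_mtime, utc),
        "atime": fme_datetime(st.st_atime, utc),
        "ctime": fme_datetime(st.st_ctime, utc),
    }
```

If the raw epoch numbers are what arrive in FME, a DateTimeConverter (or a conversion like the one above) is needed before they look like dates.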


Did you not have to recurse into the folders? Are all your files in one folder?


The recurse-into-subfolders check box is off.


I guess that is the part that won't work for me. My files are in many subfolders. The job on this drive has currently been running for 18 hours.


Do you want a list of file paths, file properties, or something else?

Have you tried using Python to list the files?
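A minimal sketch of that idea, assuming a standard Python 3 install: walk the folder tree with os.walk and write one CSV row per matching file (path, size, modified date), which FME could then read with a CSV reader. The function name, the column names, and the .dwg filter are illustrative choices, not anything FME requires.

```python
import csv
import os
from datetime import datetime


def write_file_list(root, csv_path, extensions=(".dwg",)):
    """Walk root recursively and write one CSV row per matching file.

    Columns: path, size_bytes, modified (YYYYMMDDHHmmss, local time).
    Returns the number of file rows written.
    """
    exts = tuple(e.lower() for e in extensions)
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes", "modified"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.lower().endswith(exts):  # case-insensitive match
                    continue
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                modified = datetime.fromtimestamp(st.st_mtime).strftime("%Y%m%d%H%M%S")
                writer.writerow([full, st.st_size, modified])
                count += 1
    return count
```

For example, write_file_list(r"M:\Survey\'COGO DATA", r"C:\temp\dwg_list.csv") would list the drive from the original question; the output path there is hypothetical.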


Is there any other content in your workspace besides the Directory/Path reader? The log could be misleading, because FME starts to process the files while it is still reading the list (i.e., it doesn't read the whole list first and then start processing). If so, could you try with just the Directory/Path reader and no other transformers? If FME is truly taking 5 hours to read a list of files, then I'm very alarmed and would love to be able to replicate that here.


I am using the file path and properties in a ChangeDetector to keep a list synchronized.

I am not experienced enough with Python to make that happen.
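The synchronization step boils down to comparing two snapshots keyed by file path, for example last night's listing and tonight's. The sketch below is a plain-Python analogue of that comparison, not how the ChangeDetector is implemented; it assumes each snapshot is a dict mapping a file path to its modified date string, and diff_snapshots is a hypothetical name.

```python
def diff_snapshots(previous, current):
    """Compare two {file_path: modified_date} snapshots.

    Returns three sorted lists of paths:
      added   - present now but not last time
      removed - present last time but gone now
      changed - present in both, with a different modified date
    """
    prev_paths = set(previous)
    curr_paths = set(current)
    added = sorted(curr_paths - prev_paths)
    removed = sorted(prev_paths - curr_paths)
    changed = sorted(
        p for p in prev_paths & curr_paths if previous[p] != current[p]
    )
    return added, removed, changed
```

Using set differences keeps the comparison to a single pass over each snapshot, so it stays fast even with tens of thousands of paths.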


[Screenshot: cogoscanner]

I'm not sure why I would have a workbench with nothing but the Path reader, but mine has the following: FilePropertyExtractor, AttributeFilter, CRCCalculator, DateTimeStamper, FeatureMerger, DateTimeConverter, StringReplacer, Sorter, ChangeDetector, AttributeManager, AttributeRemover, AttributeSplitter, ListElementExtractor, XLSWriter.

I have run it with caching turned off but did not get the correct number of features, so I am currently running it with caching (more than 18 hours to complete!). I can see that it is not processing past the FeatureMerger. I am also currently running a workspace with ONLY the Directory/Path reader in it, and there is no change in performance. Here is the configuration:

[Screenshots: cogoscanner2, cogoscanner3]


The reason for just running the Path reader is that sometimes the log (and workspace) doesn't really show where time is being used. But there's nothing else in your workspace that I would expect to take hours to run, no matter how many features there are.

So... it looks like a serious issue. I'm going to escalate this to a full support case. I've not tried this using our new community setup, but I hope that you'll get an email alerting you to the case. It would be really helpful if you could submit a full log file to us.

I suspect that the M: drive is a mapped network drive and that this is causing the problem. I can't say for sure, but I think the developers will be able to help once they have the log file.


I've also noticed that in FME 2018.1 the performance of the File and Path reader is quite poor compared with getting the same list via the Windows API (I am also recursing subfolders in my use case). In my case it only makes a difference of ~2 minutes over ~4,000 files, so I tolerate it. It's surprising, because I would have thought the reader just hooked into the OS API to return the file/folder list.

 

A very "hacky" workaround that takes a little effort, but SHOULD work, is to use a SystemCaller to run the native operating system file-listing commands, send the output of that OS-level command to a temporary text file, and read it back in using a FeatureReader set to Text File. If you want FME to create and discard this text file for you, you can do that with a TempPathnameCreator.

 

Options in SystemCaller include, for example, running the Windows command line: DIR /s /b FolderPath > TempResultFilePath, which recursively lists all file paths under the specified folder and its subfolders and redirects the output to a file. Similar results can be achieved by calling a PowerShell command, which gives further options for which file attributes to expose and how to list and sort them, using a variation on the PowerShell cmdlet Get-ChildItem -Path FolderPath (add -Recurse to descend into subfolders).
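The same SystemCaller idea, sketched in plain Python so it can be tried outside FME: run the OS lister, redirect its output to a temporary file, read the lines back, then delete the file. The /a-d switch (files only, no folder names) is my addition to the DIR /s /b command above, find is used as a rough stand-in on non-Windows machines, and list_files_via_os is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def list_files_via_os(folder):
    """List every file under folder (recursively) using the OS's own lister.

    Mirrors the SystemCaller approach: run the command, redirect its output
    to a temporary file, read the paths back, then delete the temporary file.
    """
    if sys.platform == "win32":
        # DIR /s = recurse, /b = bare format (full paths only), /a-d = files only
        cmd = ["cmd", "/c", "dir", "/s", "/b", "/a-d", folder]
    else:
        # Rough POSIX equivalent, so the sketch can be tried off Windows
        cmd = ["find", folder, "-type", "f"]

    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as out:
            # check=False: DIR exits non-zero when it finds no files,
            # which should just give an empty list here.
            subprocess.run(cmd, stdout=out, check=False)
        # DIR writes in the console code page, so non-ASCII names may come
        # back mangled; errors="replace" keeps the read from failing on them.
        with open(tmp_path, errors="replace") as f:
            return [line.rstrip("\r\n") for line in f if line.strip()]
    finally:
        os.remove(tmp_path)  # discard the temp file (the TempPathnameCreator's role)
```

Because the heavy lifting is done by the same command that runs in about 15 seconds at the prompt, this separates "how long does listing take" from "how long does FME take to process the list".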

 

Or... another variation with SystemCaller is to use JAM Software's excellent FileList command-line utility, which is one of the fastest file/directory listers available. I've used it before on projects that needed to recurse through millions of network files. You do need to download the EXE and place it somewhere you can run it from the SystemCaller, but it does all the heavy lifting of producing a detailed, attributed file list in CSV format.

https://www.jam-software.com/filelist


Suppose the problem is the mapped drive: I can still go to the Windows command prompt, type dir /s *.dwg, and get the entire list in just a few minutes, but it takes 21 hours to finish the workbench. I need a different way to get the information. I am not sure why being a mapped drive makes a difference to FME when it doesn't to the OS.


To be honest, I don't know why that would cause an issue; I'm just speculating that it might be the difference. Anyway, I escalated this to a support case, so I'm hoping someone has been in touch with you about it. You should have received an automated email at least. The developers are definitely aware of your issue, because I posted the info to them, so I hope we'll have a reason or solution for you shortly.


Thanks Mark,
I did hear from support yesterday, and I had several exchanges with him today.

Still no fix for this issue. It is hard to determine whether or not recursion is the cause, because when you don't recurse, the number of files is naturally drastically reduced. Either way, I need many files in many folders.

If someone has an elegant Python solution, I would be willing to try it. I am not Python-savvy, but I may have a resource on site who could help, as long as I can see it in FME. They are Python-savvy (not FME-savvy).
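A minimal sketch for a PythonCreator, so the result shows up inside FME as ordinary features. It assumes the transformer's class template (a FeatureCreator class whose input() emits features through self.pyoutput()); please check that against the template in your FME version. ROOT, EXTENSIONS, and the attribute names file_path, file_size, and file_modified are hypothetical. The scan uses os.scandir, which on Windows can usually fill in size and modified time from the directory listing itself instead of one extra call per file; that may help on a mapped network drive, but whether it beats the PATH reader on your share is untested.

```python
import os
from datetime import datetime

try:
    import fmeobjects  # only importable inside FME
except ImportError:
    fmeobjects = None  # lets the scan logic be tested outside FME


def scan(root, extensions=(".dwg",)):
    """Yield (path, size_bytes, modified_epoch) for matching files, recursively.

    os.scandir returns DirEntry objects; on Windows their stat() data can
    usually come from the directory listing, with no extra call per file.
    """
    exts = tuple(e.lower() for e in extensions)
    stack = [root]
    while stack:
        folder = stack.pop()
        try:
            entries = os.scandir(folder)
        except OSError:
            continue  # unreadable or missing folder: skip it rather than fail
        with entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # descend into this subfolder later
                elif entry.is_file() and entry.name.lower().endswith(exts):
                    st = entry.stat()
                    yield entry.path, st.st_size, st.st_mtime


class FeatureCreator(object):
    """PythonCreator class: outputs one feature per matching file.

    ROOT, EXTENSIONS and the attribute names below are hypothetical;
    change them to suit your workspace.
    """

    ROOT = r"M:\Survey\'COGO DATA"
    EXTENSIONS = (".dwg",)

    def __init__(self):
        pass

    def input(self, feature):
        for path, size, mtime in scan(self.ROOT, self.EXTENSIONS):
            out = fmeobjects.FMEFeature()
            out.setAttribute("file_path", path)
            out.setAttribute("file_size", size)
            # FME-style datetime string YYYYMMDDHHmmss, local time
            out.setAttribute(
                "file_modified",
                datetime.fromtimestamp(mtime).strftime("%Y%m%d%H%M%S"),
            )
            self.pyoutput(out)  # provided by FME at run time

    def close(self):
        pass
```

The features can then go straight into the existing FeatureMerger/ChangeDetector chain in place of the Directory/Path reader output, after renaming the attributes to whatever that chain expects.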

