Question

directory and pathname reader performance lag

  • 1 December 2022
  • 9 replies
  • 12 views

Badge +9

I have been using this transformer for quite some time and have been living with the pain.

I have done some more testing with it and discovered that if I don't use recurse, there is a significant performance gain but a giant loss of functionality!

It can take anywhere from 5 minutes to HOURS to get a response from the initial read in Desktop. It is slightly better on Server; however, I am usually not waiting on it at that point (if I need it to work, I publish to Server and walk away). It may be tomorrow before it completes.

Are there any workarounds? I have considered using a SystemCaller to just run a dir, get really bad data back, and rebuild it from that.
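The "run a dir and rebuild it" idea could look roughly like this. It is a minimal sketch, assuming the listing was produced with `dir /s /b /a-d` (recursive, bare format, files only) and has already been captured as text. The function name `parse_bare_dir_listing` and the sample paths are hypothetical; `ntpath` is used so Windows paths split the same way on any platform.

```python
import ntpath


def parse_bare_dir_listing(listing_text):
    """Turn bare `dir` output (one full path per line) into simple records."""
    records = []
    for line in listing_text.splitlines():
        path = line.strip()
        if not path:
            continue  # skip blank lines in the captured output
        directory, filename = ntpath.split(path)
        rootname, extension = ntpath.splitext(filename)
        records.append({
            "path": path,
            "directory": directory,
            "filename": filename,
            "rootname": rootname,
            "extension": extension,
        })
    return records


# Hypothetical sample of what the captured listing might contain.
sample_listing = "\n".join([
    r"\\server\share\data\a.tif",
    r"\\server\share\data\sub\b.shp",
])

for record in parse_bare_dir_listing(sample_listing):
    print(record["directory"], record["filename"], record["extension"])
```

Bare format carries no sizes or timestamps, so file properties would still have to come from a separate step.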

 


9 replies

Userlevel 3
Badge +26

Could you post your workspace, or at least the portion where you are using the reader? I've never had any performance issues, but I wasn't reading large directories either.

Badge +9

Yes, but it doesn't matter how large the directory is. I am ultimately reading large datasets and using a file filter as well. I realize both of these are going to add overhead, as will another requirement: the final form runs on Server, so it is a network drive and I am searching on the UNC path, not the mapped drive.

 

Badge +9

This reads the UNC path, converts it to a mapped drive letter, creates some more attributes from the path, and then writes out to SQL (I deleted that connection).
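For readers without the workspace, the UNC-to-mapped-letter step and the derived attributes could be sketched like this outside FME. The share `\\server\share`, the drive letter `Z:`, and the attribute names are all hypothetical stand-ins, not values from the actual workspace.

```python
from pathlib import PureWindowsPath

# Hypothetical mapping; substitute the real share and drive letter.
UNC_ROOT = PureWindowsPath(r"\\server\share")
MAPPED_ROOT = PureWindowsPath("Z:/")


def unc_to_mapped(unc_path):
    """Swap the UNC share prefix for the mapped drive letter."""
    relative = PureWindowsPath(unc_path).relative_to(UNC_ROOT)
    return MAPPED_ROOT / relative


def path_attributes(unc_path):
    """Derive a few attributes from the path, ready to write to a table."""
    mapped = unc_to_mapped(unc_path)
    return {
        "unc_path": str(PureWindowsPath(unc_path)),
        "mapped_path": str(mapped),
        "folder": str(mapped.parent),
        "filename": mapped.name,
        "extension": mapped.suffix,
    }


print(path_attributes(r"\\server\share\data\sub\b.shp"))
```

`PureWindowsPath` only manipulates the path text and never touches the filesystem, so this step adds essentially nothing to the scan time.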

Userlevel 4

Did you perhaps enable Retrieve File Properties in the reader parameters? That can make a substantial difference if there are a lot of files. Also make sure to leverage the Path Filter, if you can, rather than using e.g. a Tester in the workspace.

Badge +9

I do require the file properties. However, I can watch the difference between recurse yes and recurse no: yes takes several hours, while no runs immediately.

Userlevel 4

I'm not sure that recurse=yes/no is causing the issue on its own; rather, the number of files whose properties must be queried is much larger that way. On Windows, retrieving file properties is relatively slow, and when you multiply that by a large number of files it makes a difference.

I haven't retried with the most recent versions of FME, but some years ago I found that iterating over large directories using Python (os.walk) was substantially faster than FME, even when requesting the file properties.
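A minimal sketch of that `os.walk` approach, assuming only size and modification time are needed. The function name `walk_with_properties` and the record keys are hypothetical; the real workspace may need other properties.

```python
import os
from datetime import datetime, timezone


def walk_with_properties(root):
    """Yield one record per file under root, with basic file properties."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield {
                "path": full_path,
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ),
            }
```

Because this is a generator, records are available as soon as the first directory has been read, for example `for record in walk_with_properties(r"\\server\share\data"): ...` with a hypothetical share path. Whether it beats the reader on a given network drive would need to be measured.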

Badge +9

If I set the number of features to 1 and set recurse to yes, it takes hours. If I set the number of features to 1 and set recurse to no, it takes seconds.

 

Userlevel 4

The bottleneck is when reading the filesystem, which I believe happens before FME limits the number of features.
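To illustrate why a feature limit of 1 would not help if the whole tree is enumerated first, here is a generic sketch. It is not FME's actual implementation, which this thread does not describe; `scan_then_limit` and `limit_while_scanning` are hypothetical names for the two orderings.

```python
import os
from itertools import islice


def iter_files(root):
    """Lazily yield file paths under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def scan_then_limit(root, limit):
    """Enumerate the whole tree first, then keep the first `limit` paths."""
    all_paths = list(iter_files(root))  # cost grows with the total file count
    return all_paths[:limit]


def limit_while_scanning(root, limit):
    """Stop walking as soon as `limit` paths have been produced."""
    return list(islice(iter_files(root), limit))
```

Both functions return at most `limit` paths, but only the second avoids visiting the rest of the tree, which is the behaviour a user would expect from a feature limit.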

Badge +9

I understand there is a bottleneck. Will the bottleneck ever be fixed?

There is a tool there. Obviously, if I learned to use Python (as suggested), a dir command, or one of many other methods, I would not encounter the bottleneck that the directory and pathnames reader does. If it is not a pathnames reader for bulk data, then name it the single pathname reader.

EDIT: After 5 days, no response... Just left hanging. Not sure where to turn.

[Image: scan using pathname reader]

Current time is 2022-12-7 9:04, still waiting.

[Image: scan using pathname reader2]