Question

Directory and pathname reader performance lag


gisbradokla
Enthusiast

I have been using this reader for quite some time and have been living with the pain.

I have done some more testing with it and discovered that if I don't use Recurse, there is a significant performance gain, along with a giant loss of functionality!

It can take anywhere from 5 minutes to HOURS to get a response from the initial read in Desktop. It is slightly better on Server, but I am usually not waiting on it at that point (if I need it to work, I publish to Server and walk away); it may be the next day before it completes.

Are there any workarounds? I have considered using a SystemCaller to just run a dir command, get really bad data back, and rebuild the attributes from that.
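
For the dir idea, the bare format (dir /s /b /a-d) prints one full path per file, which is much easier to rebuild than the default listing. Below is a minimal Python sketch of the rebuild step, assuming that output has already been captured as text. parse_dir_bare and the record keys are hypothetical names. The bare format carries no sizes or dates, so file properties would still have to come from somewhere else.

```python
from pathlib import PureWindowsPath


def parse_dir_bare(output: str) -> list:
    """Rebuild path records from `dir /s /b /a-d` output (one full path per line)."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip any blank lines
        # PureWindowsPath parses Windows/UNC paths on any OS without touching disk.
        p = PureWindowsPath(line)
        records.append({
            "path": str(p),
            "folder": str(p.parent),
            "filename": p.name,
            "extension": p.suffix.lower(),
        })
    return records


# Hypothetical captured output; real dir output uses CRLF line endings.
sample = "\\\\server\\share\\gis\\roads.shp\r\n\\\\server\\share\\gis\\2022\\parcels.dbf\r\n"
for rec in parse_dir_bare(sample):
    print(rec["folder"], rec["filename"])
```

On Windows the text could probably be captured with subprocess.run(["cmd", "/c", "dir", "/s", "/b", "/a-d", folder], capture_output=True, text=True).stdout, where folder is a placeholder; that step has not been tested against a UNC path here.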

 

9 replies

dustin
Influencer
  • December 1, 2022

Could you post your workspace, or at least the portion where you are using the reader? I've never had any performance issues with it, but I wasn't reading large directories either.


gisbradokla
Enthusiast
  • Author
  • December 1, 2022

Yes. But it doesn't matter how large the directory is. I am ultimately reading large datasets and using a file filter as well, and I realize both of these are going to add overhead. There is also another requirement: in its final form this runs on Server, so it is a network drive and I am searching on the UNC path, not the mapped drive.

 


gisbradokla
Enthusiast
  • Author
  • December 1, 2022

This reads the UNC path, converts it to a mapped drive letter, creates some more attributes from the path, and then writes out to SQL (I deleted that connection).
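
The UNC-to-drive-letter swap and the path-derived attributes are straightforward to reproduce outside FME if that ever helps. Below is a minimal Python sketch using pathlib.PureWindowsPath, which parses Windows and UNC paths on any OS. The share-to-letter mapping and the attribute names are hypothetical placeholders, not what the attached workspace uses.

```python
from pathlib import PureWindowsPath

# Hypothetical mapping of UNC shares to drive letters; adjust to the real mappings.
DRIVE_MAP = {r"\\server\share": "Z:"}


def unc_to_mapped(path, drive_map=DRIVE_MAP):
    """Replace a known UNC share prefix with its mapped drive letter."""
    p = PureWindowsPath(path)
    for unc, letter in drive_map.items():
        # For a UNC path, p.drive is the "\\server\share" portion (case-insensitive match).
        if p.drive.lower() == unc.lower():
            return str(PureWindowsPath(letter + "\\", *p.parts[1:]))
    return str(p)  # unknown share or already a drive path: leave as-is


def path_attributes(path):
    """Derive a few extra attributes from a full path (names are illustrative)."""
    p = PureWindowsPath(path)
    return {
        "mapped_path": unc_to_mapped(path),
        "parent_folder": p.parent.name,
        "base_name": p.stem,
        "extension": p.suffix.lower(),
        "depth": len(p.parts) - 1,  # components below the share/drive root
    }


print(path_attributes(r"\\server\share\gis\2022\parcels.shp"))
```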


david_r
Celebrity
  • December 2, 2022

Did you perhaps enable Retrieve File Properties in the reader parameters? That can make a substantial difference if there are a lot of files. Also make sure to leverage the Path Filter, if you can, rather than using e.g. a Tester in the workspace.
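
The general principle behind filtering early is that anything excluded during the scan is never read at all, while a Tester can only discard features after the reader has already produced them. Below is a minimal Python illustration of filtering during a recursive walk: it prunes folders in os.walk's dirnames list and matches file names with fnmatch. The patterns and skipped folder names are hypothetical, and this is not a description of how FME's Path Filter works internally.

```python
import fnmatch
import os


def walk_filtered(root, file_patterns=("*.shp",), skip_dirs=("archive", "backup")):
    """Yield paths of files matching file_patterns, never descending into skip_dirs.

    Names are lowercased before comparison, so pass patterns and folder names in lowercase.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # With the default topdown=True, editing dirnames in place stops
        # os.walk from listing the removed subfolders at all.
        dirnames[:] = [d for d in dirnames if d.lower() not in skip_dirs]
        for name in filenames:
            if any(fnmatch.fnmatch(name.lower(), pat) for pat in file_patterns):
                yield os.path.join(dirpath, name)
```

The pruning step matters most on deep trees: a skipped folder costs one name comparison instead of a full listing of everything beneath it.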


gisbradokla
Enthusiast
  • Author
  • December 2, 2022
david_r wrote:

Did you perhaps enable Retrieve File Properties in the reader parameters? That can make a substantial difference if there are a lot of files. Also make sure to leverage the Path Filter, if you can, rather than using e.g. a Tester in the workspace.

I do require the file properties. However, I can see the difference between Recurse = Yes and Recurse = No: Yes takes several hours, while No runs immediately.


david_r
Celebrity
  • December 2, 2022
gisbradokla wrote:

I do require the file properties. However, I can see the difference between Recurse = Yes and Recurse = No: Yes takes several hours, while No runs immediately.

I'm not sure that Recurse = Yes/No is causing the issue on its own; rather, the number of files whose properties have to be queried is much larger that way. On Windows, retrieving file properties is relatively slow, and when you multiply that by a large number of files, it makes a difference.
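
One way to test that explanation outside FME is to time a recursive scan that only lists names against one that also fetches properties for every file. Below is a rough Python sketch using the standard os.scandir and os.stat; the explicit os.stat() call stands in for a per-file property lookup. The results depend heavily on the share, the network and any antivirus scanning, and they say nothing definitive about FME's internals.

```python
import os
import time


def scan(root, with_stat=False):
    """Count files under root, walking folders with an explicit stack.

    If with_stat is True, also call os.stat() on every file, which stands in
    for retrieving file properties one file at a time.
    """
    count = 0
    stack = [root]
    while stack:
        folder = stack.pop()
        try:
            with os.scandir(folder) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        count += 1
                        if with_stat:
                            try:
                                os.stat(entry.path)  # one extra lookup per file
                            except OSError:
                                pass  # file vanished or is locked; keep counting
        except OSError:
            continue  # folder could not be opened (permissions, deleted, ...)
    return count


def compare(root):
    """Print how long a name-only scan and a scan with per-file stat calls take."""
    for with_stat in (False, True):
        start = time.perf_counter()
        n = scan(root, with_stat)
        elapsed = time.perf_counter() - start
        print(f"with_stat={with_stat}: {n} files in {elapsed:.1f} s")
```

Usage would be something like compare(r"\\server\share\gis") with a placeholder path. Run it more than once, or swap the order, because the second pass may benefit from file system caching. os.scandir is used instead of os.listdir because its DirEntry objects can often answer is_dir()/is_file() from the directory listing itself, without an extra call per entry.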

I haven't retried with the most recent versions of FME, but some years ago I found that iterating over large directories using Python (os.walk) was substantially faster than FME, even when requesting the file properties.
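
Below is a minimal sketch of that os.walk approach. It collects size and modification time per file and writes a CSV that FME could then read in one pass with a CSV reader. The column names are hypothetical rather than the reader's own attribute names, and whether this is faster on this particular share is untested.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column names for the output CSV.
FIELDS = ["path", "folder", "filename", "extension", "size_bytes", "modified_utc"]


def list_files(root):
    """Walk root recursively with os.walk and return one dict per file."""
    rows = []
    # os.walk silently skips folders it cannot open unless onerror is given.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)  # size and timestamps for this file
            except OSError:
                continue  # file vanished or is inaccessible; skip it
            rows.append({
                "path": full,
                "folder": dirpath,
                "filename": name,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": st.st_size,
                "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
            })
    return rows


def write_csv(rows, out_path):
    """Write the rows to a UTF-8 CSV with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A call might look like write_csv(list_files(r"\\server\share\gis"), r"C:\temp\listing.csv"); both paths are placeholders.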


gisbradokla
Enthusiast
  • Author
  • December 2, 2022

If I set the number of features to 1 and set Recurse to Yes, it takes hours. If I set the number of features to 1 and set Recurse to No, it takes seconds.

 


david_r
Celebrity
  • December 2, 2022
gisbradokla wrote:

I do require the file properties. However, I can see the difference between Recurse = Yes and Recurse = No: Yes takes several hours, while No runs immediately.

The bottleneck is reading the file system, which I believe happens before FME limits the number of features.
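
In general terms, a feature limit only saves time if the listing is produced lazily and the consumer stops pulling from it early. If the whole tree is collected first, the limit is applied after the expensive part. Below is a small Python illustration of the two patterns using os.walk and itertools.islice. It is an analogy for the behaviour described above, not a claim about how the FME reader is implemented.

```python
import itertools
import os


def files_lazy(root):
    """Yield file paths one at a time; walking stops when the caller stops asking."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def first_n_eager(root, n):
    """Collect every path in the tree first, then keep n (the limit comes too late)."""
    return list(files_lazy(root))[:n]


def first_n_lazy(root, n):
    """Stop pulling from the generator after n paths."""
    return list(itertools.islice(files_lazy(root), n))
```

With a large tree, first_n_lazy(root, 1) can return as soon as os.walk has listed the first folder that contains a file, whereas first_n_eager(root, 1) still walks every folder before slicing.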


gisbradokla
Enthusiast
  • Author
  • December 2, 2022

I understand there is a bottleneck. Will it ever be fixed?

There is a tool there. Obviously, if I use Python (which, as suggested, I would have to learn), a dir command, or any of many other methods, they do not run into the bottleneck that the Directory and Pathnames reader does. If it is not a pathnames reader for bulk data, then name it the single pathname reader.

EDIT: After 5 days, no response... Just left hanging. Not sure where to turn.

[Screenshots: scan using pathname reader; scan using pathname reader 2. Current time is 2022-12-7 9:04, still waiting.]

