
Hi

I am having trouble speeding up the ShortestPathFinder. The input is about 7 million from-to lines, and I was hoping to speed up the calculations by setting "Parallel Processing" to Medium on my 8-core system. According to https://docs.safe.com/fme/2019.0/html/FME_Desktop_Documentation/FME_Transformers/parallel_processing/parallel_processing.htm , that should launch exactly as many processes as there are cores. Still, I don't notice any speed improvement, and the Windows Task Manager reports about 6% CPU load.


At http://docs.safe.com/fme/2018.1/html/FME_Desktop_Documentation/FME_Transformers_HelpPane/Transformers/shortestpathfinder.htm , it says "If it is enabled, a process will be launched for each group specified by the Group By parameter." However, the most recent documentation, http://docs.safe.com/fme/2019.0/html/FME_Desktop_Documentation/FME_Transformers/Transformers/shortestpathfinder.htm , does not mention parallel processing at all.


Am I missing something? How do I enable parallel processing in this case?


Regards,

Magnus

Do you have a single network and 7 million from-to lines? No Group By parameter?


If the above is the case, I recommend splitting your from-to lines into groups and then using a WorkspaceRunner to run the original workspace independently for each group of from-to lines.
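
As a rough, non-FME sketch of that splitting step (assuming the from-to lines can be exported to a CSV called fromto_lines.csv with a header row; the file names and chunk size here are made up for illustration), each chunk could become the input for one WorkspaceRunner run:

```python
# Minimal sketch: split a large from-to CSV into fixed-size group files.
# Assumptions: source file "fromto_lines.csv" has a header row; output names
# and CHUNK_SIZE are illustrative only.
import csv

CHUNK_SIZE = 100_000  # from-to lines per group; tune for your machine


def write_group(chunk_id, header, rows):
    with open(f"fromto_group_{chunk_id:04d}.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)


with open("fromto_lines.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    chunk, chunk_id = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            write_group(chunk_id, header, chunk)
            chunk, chunk_id = [], chunk_id + 1
    if chunk:  # write the final, partial group
        write_group(chunk_id, header, chunk)
```

Each generated fromto_group_*.csv could then be passed to the child workspace as a published parameter from the WorkspaceRunner.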


FYI: if you have either feature caching or breakpoints activated, parallel processing won't work.


@magnus If you have 7 million from-to vectors, you probably also have a large network, so as @ebygomm suggests, try to be smart about splitting up both your network and your from-to vectors. If your from-to vectors tend to converge on the same node, you could use that node, and then the boundary of those common vectors, to group portions of the data and send each group through to a shortest-path worker via WorkspaceRunner.
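
To illustrate the "group by common node" idea outside of FME: if each from-to vector carries the ID of the node it converges on, the vectors can simply be bucketed by that node ID and each bucket handed to its own worker run. The attribute names below (from_node, to_node) are invented for the example:

```python
# Sketch: bucket from-to vectors by the node they converge on.
# Attribute names are illustrative, not actual FME attribute names.
from collections import defaultdict

vectors = [
    {"from_node": "A", "to_node": "N1"},
    {"from_node": "B", "to_node": "N1"},
    {"from_node": "C", "to_node": "N2"},
]

groups = defaultdict(list)
for vec in vectors:
    groups[vec["to_node"]].append(vec)

for node_id, group in groups.items():
    # In practice each group would be written out and processed by its own
    # ShortestPathFinder run, e.g. launched via WorkspaceRunner.
    print(node_id, len(group), "vectors")
```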

Also, if your from-to features are not just simple vectors but include intermediate points, or if you're using the optimization, you can expect the transformer to be slower.

We removed parallel processing from the ShortestPathFinder (and all other transformers) in 2019.0, so perhaps try upgrading your transformer.


I think I have a solution that will help. See today's Question-of-the-Week for more information.


From a recent project, I found that by far the largest performance increase in the ShortestPathFinder came from using "Group By" rather than from Parallel Processing.

However, the reason I could do this is that my network logically split into much smaller subnetworks, as my tracing paths were largely restricted to a single street's subnetwork. In this project I put artificial "breaks" in the network to divide it into smaller pieces, then used TopologyCalculator to attribute each network link with a "SubNetworkID", which then became the "Group By" attribute in ShortestPathFinder. Dividing the network into subnetworks and using these as the group in ShortestPathFinder sped up the transformer perhaps ~100 times.
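
For anyone wondering what the SubNetworkID attribution boils down to conceptually: it is essentially labelling each link with the connected component it belongs to, based on shared nodes. A rough Python sketch of that idea (a toy link list and a union-find; this only illustrates the grouping, it is not how TopologyCalculator is implemented):

```python
# Sketch: label each network link with a subnetwork (connected component) ID,
# purely to illustrate what the Group By attribute represents.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

links = [("A", "B"), ("B", "C"), ("D", "E")]  # toy network: two subnetworks

parent = {}
for a, b in links:
    parent.setdefault(a, a)
    parent.setdefault(b, b)
    union(parent, a, b)

# Assign a compact SubNetworkID per component root
roots = {}
subnetwork_id = {}
for a, b in links:
    root = find(parent, a)
    roots.setdefault(root, len(roots))
    subnetwork_id[(a, b)] = roots[root]

print(subnetwork_id)  # {('A', 'B'): 0, ('B', 'C'): 0, ('D', 'E'): 1}
```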

By comparison, parallel processing on the subnetwork ID attribute didn't give me much improvement in performance, and as @mark2atsafe has blogged, with Feature Caching/Partial Runs enabled performance can actually go backwards, which is a consideration when doing partial runs while first troubleshooting things like the "NoPathFound" results in ShortestPathFinder.

I guess it shows the one thing that is largely true in all data analytics: a highly constrained query running on a single CPU thread can outperform an unconstrained query running over several CPU threads by a handy margin.


In the FME 2021 beta, ShortestPathFinder has been updated with several changes to improve performance:

  • We have changed the underlying algorithm to A*. A simple discussion comparing algorithms can be found here; see the sketch at the end of this post for the general idea.
  • Added multi-threading - this will use all CPUs and consume more memory.
  • Added an optimization for attribute accumulation - if you don't need the lists and do NOT check Generate List, then ShortestPathFinder is smarter.
  • Added an optimization for single-node from-to vectors, i.e. if all the vectors terminate at a common node - which is your case.

Example performance gain: for about 4,000 from-to vectors, runtime went from about 170 seconds to about 15 seconds.
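
For readers unfamiliar with A*, here is a rough Python sketch of the general idea on a toy weighted graph; the graph, coordinates, and straight-line heuristic are invented for illustration and say nothing about the actual FME implementation. The difference from plain Dijkstra is the heuristic term that biases the search toward the destination:

```python
# Minimal A* sketch on a toy weighted graph with (x, y) node coordinates.
# Everything here (graph, coordinates, heuristic) is illustrative only.
import heapq
import math

coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
graph = {            # adjacency list: node -> [(neighbour, edge_cost), ...]
    "A": [("B", 1.0), ("C", 2.5)],
    "B": [("C", 1.0), ("D", 2.2)],
    "C": [("D", 1.0)],
    "D": [],
}

def heuristic(n, goal):
    # Straight-line distance; admissible as long as edge costs never undercut it
    (x1, y1), (x2, y2) = coords[n], coords[goal]
    return math.hypot(x2 - x1, y2 - y1)

def a_star(start, goal):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nxt, edge_cost in graph[node]:
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                priority = new_cost + heuristic(nxt, goal)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None

print(a_star("A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

With an admissible heuristic (one that never overestimates the remaining cost), A* still returns an optimal path but typically explores far fewer nodes than plain Dijkstra.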

