Question

How to use the PointOnPointOverlayer (POPO) with 600,000+ records without having "creating spatial index" writing to the logfile for 7+ hours.

  • 26 October 2021
  • 3 replies
  • 1 view

Badge +6

When using PointOnPointOverlayer (POPO), after FME 2018, I experience performance issues. With FME 2018.1.2.0 (20190225 - Build 18590 - WIN64) I could use the POPO with 600,000 records and this portion of the job use to run in 30+ minutes. Now with FME 2020.2.1.0 (20201130 - Build 20806 - WIN64) it takes 7+ hours. I also tested briefly with FME 2021.1.3 (Build 21631 - win64) but saw no significant improvement.

What options are available for removing this additional 6.5 hours of processing time?

 

Similar issue to this article from 2017.

 

Thank you in advance for your time :)

 

Julia

 

Highlights from the logfile:

2021-10-21 14:11:59|  0.4| 0.4|INFORM|FME 2020.2.1.0 (20201130 - Build 20806 - WIN64)

2021-10-21 14:11:59|  0.4| 0.0|INFORM|FME_HOME is 'C:\\Program Files\\FME2020\\'

2021-10-21 14:11:59|  0.4| 0.0|INFORM|FME Desktop ESRI Edition (floating)

.

.

.

2021-10-21 14:12:50| 50.4| 0.0|INFORM|Creating spatial index of 2 point features

2021-10-21 14:12:50| 50.4| 0.0|INFORM|Completed spatial index creation

2021-10-21 14:12:50| 50.4| 0.0|INFORM|Creating spatial index of 1 point features

2021-10-21 14:12:50| 50.4| 0.0|INFORM|Completed spatial index creation

.

.

.

2021-10-21 21:25:57|4641.4| 0.0|INFORM|Creating spatial index of 1 point features

2021-10-21 21:25:57|4641.4| 0.0|INFORM|Completed spatial index creation

2021-10-21 21:25:57|4641.4| 0.0|INFORM|Creating spatial index of 1 point features

2021-10-21 21:25:57|4641.4| 0.0|INFORM|Completed spatial index creation

2021-10-21 21:25:57|4641.4| 0.0|INFORM|PointOnPointOverlayer_10 (OverlayFactory): Group 626779 / 626779: Finished processing group 626779 / 626779

2021-10-21 21:25:57|4641.4| 0.0|INFORM|PointOnPointOverlayer_10 (OverlayFactory): Group 626779 / 626779: 100% complete

 

Took 7+ hours to complete PoPo on 626,000+ records


3 replies

Userlevel 4
Badge +26

Hmm, are you using the "process by group" here in the POPO? From the log it looks like each point is being treated as its own group?​

If you disable everything after this part of the process is it still really slow? In some cases performance issues can be further on but appear to be at a different spot.

Another thing to note, you can see the number of seconds in the log there "​4641.4". This suggests that FME process time (the time fme has been actively using the CPU) was only 77mins. The rest of the time is likely something else - if a database or network is involved these are common culprits.

To see where the time is going you need to take a look at the timestamp. Look for big jumps in the timestamps with minimal cpu time.

The end of the log file should provide a summary of job time vs cpu time. If the cpu and total time are different it means that fme is waiting at some point during the process.

Badge +6

Hmm, are you using the "process by group" here in the POPO? From the log it looks like each point is being treated as its own group?​

If you disable everything after this part of the process is it still really slow? In some cases performance issues can be further on but appear to be at a different spot.

Another thing to note, you can see the number of seconds in the log there "​4641.4". This suggests that FME process time (the time fme has been actively using the CPU) was only 77mins. The rest of the time is likely something else - if a database or network is involved these are common culprits.

To see where the time is going you need to take a look at the timestamp. Look for big jumps in the timestamps with minimal cpu time.

The end of the log file should provide a summary of job time vs cpu time. If the cpu and total time are different it means that fme is waiting at some point during the process.

Thank you @virtualcitymatt​ for your questions & comments. Here are my responces:

 

Yes, the POPO is using a "Group By".

 

It is definitely the POPO that is being processed when writing out the spatial index information to the logfile. At the bottom of FME Desktop there is status that says the following while processing the POPO:

PointOnPointOverlayer_10 (OlverlayFactory): Group 1000 /2844: Finished processing group 1000 / 2844 (this was with a subset of the full dataset).

 

When you mentioned the CPU run time, I wondered if it might be our network that is slowing things down: The Workbench is saved on the network, not locally. So, I did a quick test:

CACHED JOB ON THE NETWORK:

2021-10-26 13:35:33| 37.2| 0.0|INFORM|FME Session Duration: 2 minutes 44.1 seconds. (CPU: 19.6s user, 15.4s system)

 

CACHED JOB SAVED LOCALLY:

2021-10-26 13:37:58| 27.7| 0.0|INFORM|FME Session Duration: 42.9 seconds. (CPU: 15.1s user, 10.5s system)

 

This seems to have highlighted a significant portion of the Performance Issue.

Userlevel 4
Badge +26

Thank you @virtualcitymatt​ for your questions & comments. Here are my responces:

 

Yes, the POPO is using a "Group By".

 

It is definitely the POPO that is being processed when writing out the spatial index information to the logfile. At the bottom of FME Desktop there is status that says the following while processing the POPO:

PointOnPointOverlayer_10 (OlverlayFactory): Group 1000 /2844: Finished processing group 1000 / 2844 (this was with a subset of the full dataset).

 

When you mentioned the CPU run time, I wondered if it might be our network that is slowing things down: The Workbench is saved on the network, not locally. So, I did a quick test:

CACHED JOB ON THE NETWORK:

2021-10-26 13:35:33| 37.2| 0.0|INFORM|FME Session Duration: 2 minutes 44.1 seconds. (CPU: 19.6s user, 15.4s system)

 

CACHED JOB SAVED LOCALLY:

2021-10-26 13:37:58| 27.7| 0.0|INFORM|FME Session Duration: 42.9 seconds. (CPU: 15.1s user, 10.5s system)

 

This seems to have highlighted a significant portion of the Performance Issue.

Interesting finding about running from a local workspace vs running from one on the network.

 

While it may look like the POPO is causing the slowdown here because of the log it doesn't always mean that it is. FME processes by feature (or in some case, group). It will process a feature as far as it can go before starting on the next feature. For the POPO, it's hard to say what is really happening. The POPO will process the first group and these features will continue to be processed until they are blocked by either a blocking transformer or (depending on the format) a writer. This means that there are other features waiting in the POPO not being processed. At least that it my understanding.

 

The only way to really know is to disable everything after the POPO and just add a logger in. If you still get 7 hours then then it's probably fair to assume the POAO. If you get an hour or less then there is something else going on.

 

The best test would be to try and run the same workspace in FME 2018 and then again in FME 2021. This would without a doubt point to the newer version of FME being the issue here.

Reply