Question

Performance enhancements with UpdateDetector and FeatureMerger

  • 23 April 2018
  • 7 replies
  • 13 views


I have a workbench that has to process about 600,000 lines and apply updates to a master dataset (.gdb). The attached workbench is a very small sample of what has to be completed. The main issue I am running into is that the UpdateDetector is extremely slow: a sample run of about 1,000 lines took just over 2 hours. Any suggestions for things I can do to speed it up would be much appreciated.

 

I have also found that when I try to run the workbench with the complete dataset, the FeatureMerger rejects the suppliers with the rejection code EXTRA_REFERENCEE_FEATURE. This only seems to occur when the number of suppliers gets quite large. I have seen this rejection code mentioned in reference to duplicates, but I don't have any duplicates in the complete supplier dataset.

Any help on either issue would be much appreciated.

Sample of bench

Pic of whole bench


7 replies


As you're dealing with polyline data in GDB, then at the risk of the Safe moderators' ire, you might like to take a look at Esri's own Detect Feature Changes tool.


Thanks @bruceharold, fair point, but unfortunately not all of the inputs are GDB polylines.

 


I'm surprised 1,000 lines would take that long. Any chance you could send us a sample so we could examine it (via support@safe.com or through this forum)?

Without digging into your workflow: if there is any way to use the new FeatureJoiner to reduce the amount of data going into the UpdateDetector, that might help. FeatureJoiner is drastically faster than FeatureMerger. It uses a different model for doing the work, but in practice most FeatureMerger problems can be expressed with a FeatureJoiner (if you have FME 2018 handy).
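The performance gap largely comes down to the matching model. As a rough illustration only (this is not FME internals, and `hash_join` is a hypothetical helper), a join-style transformer can index the suppliers once and then probe that index in constant time per requestor instead of re-scanning:

```python
def hash_join(requestors, suppliers, key):
    """Sketch of hash-join matching: build an index over the suppliers
    once, then look up each requestor in O(1) instead of re-scanning."""
    index = {}
    for s in suppliers:
        index.setdefault(s[key], []).append(s)
    for r in requestors:
        for s in index.get(r[key], []):
            # Merge attributes; requestor values win on key conflicts.
            yield {**s, **r}

# Only the supplier with k == 1 matches the single requestor.
rows = list(hash_join([{"k": 1, "name": "req"}],
                      [{"k": 1, "src": "sup"}, {"k": 2, "src": "other"}],
                      "k"))
```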


@cj You might try using CRCCalculator as described here.

 

If you are matching geometry for your updates, then reducing the geometry to a single CRC attribute value lets you use an attribute match instead of a geometry match, which is generally more efficient.
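The idea can be sketched outside FME in a few lines of Python (assuming the vertices are available as (x, y) tuples; `geom_crc` is a hypothetical helper, not an FME function):

```python
import zlib

def geom_crc(coords, ndigits=3):
    """Collapse a vertex list into one CRC32 integer. Matching on this
    single attribute replaces a full (and slower) geometry comparison."""
    # Round first so tiny floating-point noise doesn't change the CRC.
    rounded = [(round(x, ndigits), round(y, ndigits)) for x, y in coords]
    return zlib.crc32(repr(rounded).encode("utf-8"))

# Two geometries that differ only by numeric noise get the same CRC.
a = geom_crc([(1.0000001, 2.0), (3.0, 4.0)])
b = geom_crc([(1.0, 2.0), (3.0, 4.0)])
assert a == b
```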
Thanks @MarkAtSafe. I am already using a CRC code for the geometry matching, as you describe: I extract the true geometry into the _geom attribute, then round the coordinates before creating a geometry-only CRC.

 


Thanks @daleatsafe. I need to take a look at the FeatureJoiner, as that sounds like a good option. Since I was using the CRC code as the join key / key attribute in both the FeatureMerger and the UpdateDetector, I figured (and tested) that any features exiting the FeatureMerger through the Unmerged-requestor port did not need to go through the UpdateDetector: it performs the same match, which will fail, so they will exit via the Insert port anyway. So I by-passed the UpdateDetector for those features, which improved the performance a bit.
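The by-pass amounts to routing features on the join result before change detection ever runs. A minimal Python sketch, with hypothetical field names (`crc`, `id`):

```python
def route(incoming, master_crcs):
    """Split incoming features before change detection: anything whose
    CRC has no match in the master is a guaranteed Insert, so only the
    matched features need the expensive update check."""
    inserts, candidates = [], []
    for feat in incoming:
        if feat["crc"] in master_crcs:
            candidates.append(feat)   # merged: may be Update or NoChange
        else:
            inserts.append(feat)      # unmerged requestor -> Insert
    return inserts, candidates

master = {101, 102}
new = [{"id": "a", "crc": 101}, {"id": "b", "crc": 999}]
ins, cand = route(new, master)
assert [f["id"] for f in ins] == ["b"]
assert [f["id"] for f in cand] == ["a"]
```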

 


Thanks for your suggestions @bruceharold, @MarkAtSafe, and @daleatsafe. I found some significant performance improvements by creating a second CRC code based only on the selected attributes I wanted to compare. I create this CRC code between the FeatureMerger and the UpdateDetector, so in the UpdateDetector I can use the geometry-based CRC field as the key attribute and check for updates only in the geometry CRC field and the attribute CRC field. It then only has to check two fields instead of the 60+ fields in the full dataset. That, plus by-passing the UpdateDetector for all features that exit the FeatureMerger through the Unmerged-requestor port, got the processing time down from 2+ hours to 3 minutes for about 1,000 lines. I would still like to look into the FeatureJoiner, as that sounds like it could improve things further.
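The resulting two-CRC comparison can be sketched like this (hypothetical field names `geom_crc` and `attr_crc`; a plain dict keyed by the geometry CRC stands in for the master dataset):

```python
def classify(feat, master_by_geom_crc):
    """Classify one incoming feature against the master using just two
    hash attributes instead of comparing 60+ raw fields."""
    old = master_by_geom_crc.get(feat["geom_crc"])
    if old is None:
        return "Insert"          # no geometry match in the master
    if old["attr_crc"] != feat["attr_crc"]:
        return "Update"          # same geometry, changed attributes
    return "NoChange"

master = {111: {"attr_crc": 7}}
assert classify({"geom_crc": 111, "attr_crc": 7}, master) == "NoChange"
assert classify({"geom_crc": 111, "attr_crc": 8}, master) == "Update"
assert classify({"geom_crc": 222, "attr_crc": 7}, master) == "Insert"
```

Note that with the geometry CRC as the key, a feature whose geometry changed will not match at all and so surfaces as an Insert rather than an Update, which is consistent with keying the match on geometry.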
