I've been working on a new workflow, and in 2020 I've been excited to see the increased support for Bulk Mode. It's awesome.

However, in the workflow I've been building, Bulk Mode seems to be causing some performance issues.

In a few places in the log I see things like this:

2020-06-23 09:47:51|   5.8|  1.4|INFORM|Aggregator_4 (AggregateFactory): Group 1 / 1: Preparing to divide 11 features into groups
2020-06-23 09:47:52|   6.1|  0.3|INFORM|Aggregator_4 (AggregateFactory): Group 1 / 1: Dividing 11 features into groups
2020-06-23 09:47:52|   6.3|  0.2|INFORM|Aggregator_4 (AggregateFactory): Group 1 / 1: Splitting bulk mode features into features
2020-06-23 09:47:53|   8.0|  1.8|INFORM|Aggregator_6 (AggregateFactory): Group 1 / 1: Preparing to divide 4 features into groups
2020-06-23 09:47:54|   8.5|  0.5|INFORM|Aggregator_6 (AggregateFactory): Group 1 / 1: Dividing 4 features into groups
2020-06-23 09:47:54|   8.7|  0.2|INFORM|Aggregator_6 (AggregateFactory): Group 1 / 1: Splitting bulk mode features into features
2020-06-23 09:48:05|  19.9| 11.2|INFORM|SQLExecutor_15 (QueryFactory): Splitting bulk mode features into features

The SQLExecutor here takes 11 seconds. One feature goes in and ~20,000 come out (I'm merging attributes).
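
For anyone wanting to find these spots in a longer log, here is a minimal Python sketch. It assumes the third pipe-separated column is the number of seconds since the previous log line, which is what the numbers in the excerpt above suggest (for example, 19.9 - 8.7 = 11.2). The function name `slowest_steps` and the `SAMPLE_LOG` variable are made up for illustration; they are not part of FME.

```python
# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
2020-06-23 09:47:52|   6.3|  0.2|INFORM|Aggregator_4 (AggregateFactory): Group 1 / 1: Splitting bulk mode features into features
2020-06-23 09:47:53|   8.0|  1.8|INFORM|Aggregator_6 (AggregateFactory): Group 1 / 1: Preparing to divide 4 features into groups
2020-06-23 09:48:05|  19.9| 11.2|INFORM|SQLExecutor_15 (QueryFactory): Splitting bulk mode features into features
"""


def slowest_steps(log_text, top=3):
    """Return the `top` (delta_seconds, message) pairs with the largest delta.

    Assumption: each line looks like
    timestamp|cumulative seconds|delta seconds|severity|message
    """
    rows = []
    for line in log_text.splitlines():
        # Split on the first four pipes only, so pipes in the message survive.
        parts = line.split("|", 4)
        if len(parts) != 5:
            continue  # not a log line in the assumed format
        try:
            delta = float(parts[2])
        except ValueError:
            continue  # third column is not a number
        rows.append((delta, parts[4].strip()))
    # Largest delta first.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


for delta, message in slowest_steps(SAMPLE_LOG, top=2):
    print(f"{delta:6.1f}s  {message}")
```

Note that the delta is attached to the line that was written after the time was spent, so a large value points at the work that finished just before that message, which may or may not be the split itself.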

Admittedly, I'm working with quite a large number of attributes here.

I can't process everything in Bulk Mode because some of the transformers I need don't support it.

What I'm after is any tips people have for getting the most out of Bulk Mode and reducing, as much as possible, the cost of splitting the feature tables up into individual features.

Thanks in advance

I know @daleatsafe was looking into things that 'break' bulk mode, things like conditional testing in AttributeManager. I like InlineQuerier myself: at the cost of figuring out a fancy SQL statement, you can get amazing performance. PythonCaller similarly, if your data lends itself to that.


@virtualcitymatt For those not familiar with Bulk Mode in FME check the FME documentation: How FME Improves Performance with Bulk Mode

Not all transformers, readers or writers support Bulk Mode. At the time of writing (FME 2020), the general rule for Bulk Mode (Feature Tables) is that most transformers that support attribute transformations are Bulk Mode enabled. The exceptions are AttributeManager / AttributeCreator / Tester where more complex FME Functions or Conditionals are used. (AttributeKeeper will support Bulk Mode in FME 2020.1.)

Most transformers that carry out geometry operations (Aggregator, Offsetter and PointOnAreaOverlayer are examples) are not Bulk Mode enabled and will split the Feature Tables into single features.


Thanks Mark, yeah I read that doc page and it was pretty helpful for sure. Ironically, I've been using the Aggregator (process by ordered group) to reduce the number of calls to the database and improve performance, but maybe there is a smarter way to build an IN statement from a list of IDs. I'll ask the community.
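
One possible way to build the IN statement in plain Python (for example, from inside a PythonCaller) is sketched below. This is only an illustration under a few assumptions: the IDs are integers, the function name `build_in_clauses`, the `column` name and the `chunk_size` parameter are all hypothetical, and the FME side (collecting the IDs and passing the SQL to a transformer) is left out. Chunking is included because some databases cap the number of items in an IN list (Oracle, for example, allows at most 1000).

```python
def build_in_clauses(ids, column="id", chunk_size=1000):
    """Return a list of 'column IN (...)' strings, one per chunk of unique IDs.

    Assumptions: IDs are integers (int() rejects anything else, which also
    keeps arbitrary text out of the SQL), and `column` is a trusted name.
    """
    # Remove duplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(int(i) for i in ids))

    clauses = []
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        id_list = ", ".join(str(i) for i in chunk)
        clauses.append(f"{column} IN ({id_list})")
    return clauses


# Usage: four IDs (one duplicate) with a small chunk size for demonstration.
for clause in build_in_clauses([3, 1, 3, 2], column="parcel_id", chunk_size=2):
    print(clause)
```

Each returned clause could be used in its own query, or the clauses could be joined with OR into a single WHERE condition, depending on what the database handles best.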

As far as I can tell, there is little chance of preserving the feature tables all the way through my workflow. Good news about the AttributeKeeper though: that will help with my workflow I think, especially for building the IN statement, as I only need one attribute there.

I found that using Group By with the Sampler also split the feature tables. The DuplicateFilter gave a much better result.

Do you guys have specifics on how much the number (and type) of attributes might affect the process?

I'm reading from a number of related tables which have quite a few different attributes, so there is a big mixture of missing and populated attributes. Some are character varying(4000)...

I just want to know if I should put more effort into pruning the unneeded attributes when I can or if the effort will produce


Hiya @bruceharold - Thanks a bunch for your input. We kiwis got to help each other out ;-)

Yes, great tip about the InlineQuerier. I will have to see if there is a use for it in the workflow; I suspect there are some good improvements to be made. As for the PythonCaller, though, I'm sure there are more places where it could really help.
