Solved

Workspace Performance Tuning: When and how to use Attribute Management Transformers

  • 13 December 2019
  • 13 replies
  • 12 views

Badge +3

So I'm usually fairly aggressive in large workspaces in keeping the in-workspace Schema tidy and as small as possible after running data through various Transformers that all like to add their own default Attributes and Lists etc.

I trim the Attributes and Lists at mid-points in the Workspace in the belief that passing features through it with attributes and lists that are no longer required would cause significant overall performance penalties. This makes something like AttributeKeeper a very common Transformer in my large or data-heavy projects.

However, I find that AttributeKeeper, AttributeManager and AttributeRemover all have their own overheads to run, and you could slow your Workspace down by using them too often (well, so I believe anyway!).

The question is: When and how often should you use these in large projects/workspaces? Is there a best practice in FME for getting a balance on these?

Best answer by markatsafe 19 December 2019, 23:23

13 replies

Userlevel 4

I rarely remove unused attributes more than a couple of times per workspace, but of course it depends. I'm particularly aggressive about removing attributes with large values, e.g. blobs or serialized geometries, as well as lists with a lot of members.

Mostly this is done before merge/aggregate or list building operations, where the benefits of a reduced schema are the greatest.
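To make the "large values first" idea concrete, here is a minimal sketch in plain Python. It is not FME code and does not use the FME Python API: features are modelled as ordinary dicts, and the names `approx_size`, `heaviest_attributes` and the sample attribute names are all made up for illustration. It only shows one way to rank attributes by approximate payload so you know which ones are worth dropping before a merge.

```python
def approx_size(value):
    """Rough payload size: length for text/bytes, sum of members for lists.

    Assumption: every other value counts as a nominal 8 bytes.
    """
    if isinstance(value, (bytes, str)):
        return len(value)
    if isinstance(value, (list, tuple)):
        return sum(approx_size(member) for member in value)
    return 8


def heaviest_attributes(features, top=3):
    """Return the names of the `top` attributes with the largest total size."""
    totals = {}
    for feature in features:
        for name, value in feature.items():
            totals[name] = totals.get(name, 0) + approx_size(value)
    return sorted(totals, key=totals.get, reverse=True)[:top]


# Hypothetical sample: two features with a blob and a long list attribute.
sample = [
    {"id": 1, "name": "a", "blob": b"x" * 50_000, "parts{}": list(range(1000))},
    {"id": 2, "name": "b", "blob": b"y" * 50_000, "parts{}": list(range(1000))},
]

# The blob and the list dominate, so they are the candidates to remove first.
print(heaviest_attributes(sample, top=2))
```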

Badge +21

Hi @bwn

I always have an AttributeRemover/AttributeKeeper right after a reader and right before a writer. I also try to remove any lists as soon as they are no longer needed.

Badge +3

I similarly do this, although the step prior to the writer is more for final schema tidy-up than for performance reasons. I don't know whether it actually causes any performance issues when the transformer(s) connected to the writer inputs put out more raw data on their ports than is needed. I would have thought the writer itself would perform the same regardless, but I haven't tested.

Badge +3

My limited performance tests suggest that, yes, keeping lists on features is expensive if they are not removed before feeding into most Transformers, so I try to get rid of them at the earliest opportunity. The presence of lists particularly seems to slow down things like FeatureJoiner. Similarly, I fork complicated geometries out into their own geometry-only pathway and do the geometry-less attribute manipulation on a separate track before merging back in at the end. I tend not to remove the simple geometries of Point features, though, since data-wise that only removes a couple of floating-point coordinates, and GeometryRemover seems to be one of the slower schema manipulation transformers.

But that's only after some trial and error. I still don't fundamentally understand what gives the greatest performance gains or penalties: schema clean-ups in the workflow, or just letting the built-in parallelism in FME do its thing with maybe untidier data but fewer transformers having to trim the data down periodically in the middle of workflows.
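The geometry fork described above can be sketched in plain Python. This is only a model of the pattern, not FME code: features are dicts, and `split_geometry`, `merge_geometry`, the `id` key and the `geometry` field are hypothetical names chosen for the example.

```python
def split_geometry(features, key="id", geom="geometry"):
    """Fork each feature into a geometry store and a geometry-less record."""
    geometries = {}
    attributes = []
    for feature in features:
        geometries[feature[key]] = feature[geom]
        attributes.append({k: v for k, v in feature.items() if k != geom})
    return geometries, attributes


def merge_geometry(attributes, geometries, key="id", geom="geometry"):
    """Re-attach the stored geometry to each record by its key."""
    return [{**record, geom: geometries[record[key]]} for record in attributes]


features = [
    {"id": 1, "name": "parcel a", "geometry": [(0, 0), (1, 0), (1, 1)]},
    {"id": 2, "name": "parcel b", "geometry": [(5, 5), (6, 5), (6, 6)]},
]

geoms, attrs = split_geometry(features)

# Attribute-only work happens on the light records; geometry is never copied.
attrs = [{**record, "name": record["name"].upper()} for record in attrs]

merged = merge_geometry(attrs, geoms)
print(merged[0])
```

The pattern needs a unique key on every feature before the split, otherwise the merge at the end cannot match records back to their geometry.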

Badge +21

I like to be "schema independent"

So I always try to use the "Automatic" setting. This way I have to remove the unnecessary attributes and be aware of the attributes at every stage.

The other approach is to be "schema dependent" and try to define the schema.

Both approaches have their advantages and disadvantages.

Userlevel 4

If you haven't found it already, there are some great tips here:

https://knowledge.safe.com/articles/579/performance-tuning-fme.html

In particular, I suspect that you'll find the tip about the almost-undocumented FME_PROFILE_RESULT_CSV header directive to be of interest (see under "Monitoring Performance & Profiling").

Badge +2

@bwn In 2019, pretty much all attribute-handling transformers have been upgraded to use Bulk Mode. This will greatly improve performance if your readers (most databases, CSV, Shapefile, etc.) also read the data in Bulk Mode. There are some cases where an attribute-handling transformer will not support Bulk Mode (if it uses certain functions or adjacent feature handling).

The one exception is AttributeKeeper. AttributeKeeper will split Feature Tables that have been created by Bulk Mode enabled readers or transformers. So if performance is an issue - avoid AttributeKeeper for now. Use AttributeManager to drop attributes.

Why? Supporting Bulk Mode in AttributeKeeper requires some tough underlying code changes to FME. In FME 2020, however, it will be one of the few transformers that creates Feature Tables, so other transformers will be able to perform at their best - using Bulk Mode.

At the moment, it's pretty safe to say that most attribute-handling transformers support Bulk Mode (AttributeKeeper excepted). Any transformer (or function) that touches geometry (e.g. Offsetter, PointOnAreaOverlayer) does not support Bulk Mode and will split Feature Tables.
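As a rough analogy only (this says nothing about how FME implements Feature Tables internally), the sketch below contrasts dropping attributes feature by feature with dropping them once from a table-like structure. Everything here is plain Python with made-up names (`drop_rowwise`, `drop_tablewise`); the point is just that the per-feature cost grows with the feature count while the table-level cost does not.

```python
def drop_rowwise(features, names):
    """Per-feature model: every feature is visited once per dropped attribute."""
    touched = 0
    for feature in features:
        for name in names:
            feature.pop(name, None)
            touched += 1
    return touched


def drop_tablewise(table, names):
    """Table model: one operation per dropped column, whatever the row count."""
    touched = 0
    for name in names:
        table.pop(name, None)
        touched += 1
    return touched


# The same 1000 hypothetical features, stored two ways.
rows = [{"id": i, "tmp": i * 2, "_list": [i]} for i in range(1000)]
table = {
    "id": list(range(1000)),
    "tmp": [i * 2 for i in range(1000)],
    "_list": [[i] for i in range(1000)],
}

row_ops = drop_rowwise(rows, ["tmp", "_list"])
table_ops = drop_tablewise(table, ["tmp", "_list"])
print(row_ops, table_ops)  # 2000 2
```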

 

Badge +3

That's an extremely valuable answer @markatsafe, with the comparison of AttributeManager vs AttributeKeeper! Since I'd (it seems incorrectly) viewed AttributeManager as a more expensive, more complexly coded AttributeKeeper, I'd been biasing my workspaces towards AttributeKeeper when choosing between the two.

What about similar transformers like GeometryRemover, AttributeRemover, AttributeRenamer, BulkAttribute...? Are any of these similarly performance limited?

Badge +3

My initial testing on using AttributeManager shows this is a great performance tuning tip, so I will mark this as Best Answer :)

The only caveat I find is that you have to use AttributeManager without any attribute creation in it; otherwise it acts just like an AttributeKeeper plus AttributeCreator, without any performance benefit.

Badge +21

This is very valuable. I wish there were a way to get this information directly within FME. Perhaps adding a new color - or a mark - on Transformers that are "Bulk Mode compatible"?

Badge +22

So if I understand correctly, right now, AttributeKeepers take a big performance hit compared to AttributeManagers, but in 4 months or so when FME 2020 is released that will no longer be the case?

Badge +2

@jdh yes - I hope that will be the case. In addition, AttributeKeeper will build the Feature Table, so all downstream transformers will be able to benefit by operating in Bulk Mode.

Badge +2

Is my transformer operating in Bulk Mode? Just watch the feature counts. Readers & Transformers operating in Bulk Mode tend to emit features in blocks of 100000. NonBulked transformers emit features individually. The attached video illustrates this:

2019 (AttributeKeeper) vs 2019 (AttributeManager): [video not included]
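If you note down the feature counts on a connection while the workspace runs, the "blocks of 100000" rule of thumb can be checked with a few lines of Python. This is a hypothetical helper, not part of FME: `looks_bulk` and its inputs are made up for the example, and the block size is taken from the description above.

```python
def looks_bulk(counts, block=100_000):
    """Guess whether successive feature-count readings grew in whole blocks.

    `counts` is a list of counts read off one connection over time.
    The final step may be a partial block, so it is not checked.
    Returns False when there are fewer than two increases to judge from.
    """
    steps = [b - a for a, b in zip(counts, counts[1:]) if b != a]
    if len(steps) < 2:
        return False
    return all(step % block == 0 for step in steps[:-1])


# Counts jumping by 100000 at a time, with a smaller final block.
print(looks_bulk([0, 100_000, 200_000, 250_000]))  # True

# Counts ticking up one feature at a time.
print(looks_bulk([0, 1, 2, 3]))  # False
```

Small datasets that never fill a single block give too little evidence either way, which is why the helper returns False for them rather than guessing.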
