
I've just had a support request which reminded me of an issue I had a few months ago when using a looping transformer to page through HTTP requests, where I was ending up with more data than expected. The cause was reusing a list attribute, and I'd like the community's opinion on whether this behavior is a bug or just something to be aware of.

You can see what is happening with a simple workspace: when the _list attribute is reused in the second AttributeSplitter, it is not cleared beforehand, so it retains the 4 and 5 from the first AttributeSplitter.

It's easy to correct once you know what is happening, but it can cause some head scratching to debug, particularly in a looping transformer. Should anything that creates a list clear out any existing data first, or are there situations where this behavior is wanted?

Hmmm, interesting thought. It depends whether you are used to thinking of Lists as arrays, or as just a special type of attribute that otherwise behaves like a conventional FME Feature Attribute. I similarly found I had to adjust my thinking from Lists being arrays to Lists being just a collection of Attributes that additionally happen to have an Index property.

Outside FME, we are used to operations that create an "array" typically destroying the original array object, but the data concepts in FME are a bit different. For instance, in FME, renaming an Attribute to the name of an existing Attribute doesn't necessarily destroy all the original Attributes, since an Attribute is an object local to each individual feature, whereas a Table Field is an object global to an entire table and all the records it contains (at least for conventional database formats that use this concept).

Setting Transformers to output to Attributes with a name that has already been used acts more like an Update operation. For any given Feature, if the "Updating" Transformer doesn't output an Attribute value, then even though the user has set the output Attribute Name to one that already exists, there is nothing to update or replace it with. The Target Attribute of that Feature will not get a "Missing" value (i.e. it won't be destroyed); it will simply retain its original value and remain intact.

Similarly with Lists, it could be argued that the second AttributeSplitter isn't actually deleting and replacing a List "object" from scratch, and is displaying behaviour consistent with how almost all other Transformers that output Feature-by-Feature Attribute values work. What it is instead doing is updating any List Attributes of the same name, but only if it outputs a List Attribute of the same name. If AttributeSplitter_2 outputs an Attribute named "_list{4}" then it will update any existing Attribute named "_list{4}", but if it doesn't output an Attribute named "_list{4}" then no update will occur and any existing Attribute of that name will remain unaffected.
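To make that concrete, here is a rough sketch in plain Python (not the FME API) that models a feature as a dictionary of attribute names to values. The helper names split_into_list and read_list, and the input strings, are made up for illustration; the only assumption taken from the thread is that a splitter writes "_list{n}" attributes and never touches indexes it doesn't output.

```python
# Plain-Python model of a feature: a dict of attribute name -> value.
# This is NOT the FME API; split_into_list and read_list are hypothetical helpers.

def split_into_list(feature, value, delimiter=",", list_name="_list"):
    """Stand-in for an AttributeSplitter: writes one attribute per part."""
    for index, part in enumerate(value.split(delimiter)):
        # Each element is just an attribute whose name carries an index.
        feature[f"{list_name}{{{index}}}"] = part
    return feature

def read_list(feature, list_name="_list"):
    """Collect consecutive list elements starting at index 0."""
    values = []
    index = 0
    while f"{list_name}{{{index}}}" in feature:
        values.append(feature[f"{list_name}{{{index}}}"])
        index += 1
    return values

feature = {}
split_into_list(feature, "1,2,3,4,5")  # first splitter: five elements
split_into_list(feature, "a,b,c")      # second splitter: only three elements
result = read_list(feature)
print(result)  # ['a', 'b', 'c', '4', '5']
```

Indexes 0 to 2 are updated, while "_list{3}" and "_list{4}" are left over from the first split, which matches the extra 4 and 5 described in the question.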

The same behaviour happens if you use AttributeCreator or AttributeManager to calculate a value but choose to output to an Attribute with the same Name as an already existing Attribute. If they evaluate to "Missing" for any particular feature, then the original Attribute Value (if it exists) will remain unchanged, i.e. it won't be removed.
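The single-attribute case can be sketched the same way. This is only a plain-Python model of the behaviour described above, not FME code: set_attribute is a hypothetical helper, and None is used as a stand-in for a "Missing" result.

```python
# Plain-Python model, not the FME API. None stands in for "Missing".

def set_attribute(feature, name, value):
    """Hypothetical update step: only writes when there is a value to write."""
    if value is None:
        # Nothing was output, so there is nothing to update with.
        return feature
    feature[name] = value
    return feature

feature_a = {"status": "old"}
feature_b = {"status": "old"}

set_attribute(feature_a, "status", "new")  # a value was produced: attribute is updated
set_attribute(feature_b, "status", None)   # "Missing": original value is retained

print(feature_a["status"], feature_b["status"])
```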

It's just that we are so used to thinking of data as "structured" and consisting of "tables", "fields", and "arrays" that we expect FME operations to behave the way we are used to seeing. That is not how FME treats data at all; it deals with it in an unstructured way.

The workarounds of course are to either:

  • Don't reuse list names unless you want to do a List update
  • Remove lists with AttributeRemover as soon as you are finished with them. This is good general practice anyway, because Lists/Array Collections can often be performance killers if a Transformer that doesn't need them still has to process and output them.
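The second workaround can be sketched in the same plain-Python model (again, not the FME API). Here remove_list is a hypothetical stand-in for removing the list with an AttributeRemover before the name is reused, and split_into_list is an illustrative stand-in for a splitter.

```python
# Plain-Python model, not the FME API; both helpers are hypothetical.

def split_into_list(feature, value, delimiter=",", list_name="_list"):
    """Stand-in for a splitter: writes one indexed attribute per part."""
    for index, part in enumerate(value.split(delimiter)):
        feature[f"{list_name}{{{index}}}"] = part
    return feature

def remove_list(feature, list_name="_list"):
    """Stand-in for removing a list: drops every 'name{n}' attribute."""
    prefix = list_name + "{"
    # Build the key list first so the dict is not modified while iterating.
    for key in [k for k in feature if k.startswith(prefix)]:
        del feature[key]
    return feature

feature = {"id": 7}
split_into_list(feature, "1,2,3,4,5")
remove_list(feature)                 # clear the old elements before reuse
split_into_list(feature, "a,b,c")

remaining = sorted(k for k in feature if k.startswith("_list{"))
print(remaining)  # ['_list{0}', '_list{1}', '_list{2}']
```

Only the three elements from the second split remain, and unrelated attributes such as "id" are untouched.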

 

