We all want better performance from our workspaces, and often that comes down to design. Sometimes a very subtle design change can make a big performance difference, and this is one example.
Workspace Comparison
Here is a workspace that renames incoming attributes with the AttributeManager:
That workspace takes 1.1 seconds to run. Here's another workspace that produces exactly the same output:
Instead of renaming the attributes with the AttributeManager, this workspace renames them on the writer schema and assigns each renamed attribute the value of the original attribute.
That workspace takes 5.5 seconds to run!
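Comparing two run times is simple arithmetic, but the ratio ("N times slower") and the percentage increase are easy to mix up. Here is a minimal Python sketch of both calculations using the two timings above. The `slowdown` helper is hypothetical, written just for this comparison.

```python
def slowdown(fast_seconds: float, slow_seconds: float) -> tuple[float, float]:
    """Return (ratio, percent_increase) for two run times of the same job."""
    # "N times slower" is simply the ratio of the two run times
    ratio = slow_seconds / fast_seconds
    # The percentage increase measures only the extra time: (ratio - 1) * 100
    percent_increase = (ratio - 1) * 100
    return ratio, percent_increase


ratio, pct = slowdown(1.1, 5.5)
print(f"{ratio:.1f}x as long, {pct:.0f}% longer")  # 5.0x as long, 400% longer
```

So the second workspace takes five times as long, which is 500% of the original run time, or a 400% increase.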
Explanation
So the second workspace is five times slower (it takes 500% of the original run time, which is 400% longer). Unsurprisingly, the time lost in the second workspace is caused by the writer:
Fast Workspace
2022-01-06 13:17:22| 0.8| 0.1|INFORM|Creating writer for format: CSV (Comma Separated Value)
2022-01-06 13:17:22| 0.8| 0.0|INFORM|CSV writer: Writing 100000 feature(s) to file 'CSV.csv'
2022-01-06 13:17:23| 1.0| 0.0|INFORM|CSV writer: Writing 58691 feature(s) to file 'CSV.csv'
Slow Workspace
2022-01-06 13:16:20| 3.8| 3.3|INFORM|Creating writer for format: CSV (Comma Separated Value)
2022-01-06 13:16:20| 3.8| 0.0|INFORM|CSV writer: Writing 100000 feature(s) to file 'CSV.csv'
2022-01-06 13:16:22| 5.9| 1.8|INFORM|CSV writer: Writing 58691 feature(s) to file 'CSV.csv'
Strictly speaking, it's the RoutingFactory rather than the writer itself, but the effect is the same.
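If you want to pull these numbers out of a log rather than reading them by eye, the pipe-delimited layout is easy to parse. The sketch below assumes the second column is the cumulative processing time and the third column is the time since the previous log entry (that is how I read these logs; check it against your own FME version). `LogEntry`, `parse_line`, and `writer_time` are hypothetical names written for this example, not part of any FME API.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    cumulative: float  # assumed: cumulative processing time in seconds
    delta: float       # assumed: time since the previous log entry in seconds
    level: str
    message: str


def parse_line(line: str) -> LogEntry:
    # Split on the first four pipes only, so a '|' inside the message survives
    timestamp, cumulative, delta, level, message = line.split("|", 4)
    return LogEntry(timestamp.strip(), float(cumulative), float(delta),
                    level.strip(), message.strip())


def writer_time(lines: list[str]) -> float:
    """Sum the deltas reported on every log line that mentions the writer."""
    entries = [parse_line(line) for line in lines]
    return sum(e.delta for e in entries if "writer" in e.message.lower())


fast_log = [
    "2022-01-06 13:17:22| 0.8| 0.1|INFORM|Creating writer for format: CSV (Comma Separated Value)",
    "2022-01-06 13:17:22| 0.8| 0.0|INFORM|CSV writer: Writing 100000 feature(s) to file 'CSV.csv'",
    "2022-01-06 13:17:23| 1.0| 0.0|INFORM|CSV writer: Writing 58691 feature(s) to file 'CSV.csv'",
]
slow_log = [
    "2022-01-06 13:16:20| 3.8| 3.3|INFORM|Creating writer for format: CSV (Comma Separated Value)",
    "2022-01-06 13:16:20| 3.8| 0.0|INFORM|CSV writer: Writing 100000 feature(s) to file 'CSV.csv'",
    "2022-01-06 13:16:22| 5.9| 1.8|INFORM|CSV writer: Writing 58691 feature(s) to file 'CSV.csv'",
]

print(f"fast: {writer_time(fast_log):.1f} s, slow: {writer_time(slow_log):.1f} s")
# fast: 0.1 s, slow: 5.1 s
```

Most of the slow workspace's extra time shows up on the "Creating writer" line, which fits the RoutingFactory doing the extra work before the writer starts writing.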
The problem is that the writer doesn't understand that it is renaming attributes. The writer believes that it is reassigning values, and that takes more effort.
Also, I don't think that functionality can use "bulk mode" (feature tables) in the same way. So the lesson is that when designing a workspace, we should use the proper attribute renaming tools rather than a workaround like this.
It's a very specific scenario, but it's the sort of technicality that might stick in your memory and actually be useful in other setups.
Other Scenarios
This got me wondering about a number of other scenarios.
Q) What if only one attribute is reassigned in that way? Is that still a problem?
A) Yes, but not as much. This one ran in 2.5 seconds. So the cost clearly depends on how many attributes are reassigned.
Q) So does that mean the whole Edit Value tool is something to avoid?
A) Actually, no. It's fine if you set a fixed value. It's only when you reference another value that the issue occurs.
Q) What if I reassign values the same way, but in the AttributeManager?
A) That's fine too. I can't explain why, but I suspect that the AttributeManager is smarter than the CSV writer and understands what is happening.
Q) What about manual connections?
A) That's fine too. FME understands that the attributes are being renamed, not reassigned.
Q) But we're only talking about a few seconds, right?
A) In my scenario (150,000 features and 8 attributes), yes. But if you have millions of features with a hundred or so attributes, the time lost will be far more than a few seconds. One example I saw took 32 hours instead of 2 hours!