Feature Caching woes - reuse large attributes / turn some caching off?

Related products: FME Form

Hi.

I just experienced FME crashing in a fairly limited translation. The problem was the feature caching.

I used an HTTPCaller to fetch a GeoTIFF into an attribute, then tested the result twice (once for the HTTP result code, and once for it being an exception) before writing it to a file. In other words, the output from the HTTPCaller is cached by 4 different transformers.

The problem is that the GeoTIFF in question was 1.5 GB in size, and I had 6 features carrying this attribute. That means FME needed to cache approximately 36 GB (1.5 GB × 6 features × 4 caches) in memory. This made it crash, of course. Running without feature caching solved the problem.
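As a quick sanity check on that figure, here is a minimal sketch of the arithmetic in Python. The function name and parameters are made up for illustration and are not part of FME. It also shows roughly what the footprint would be if each feature's large value were stored only once instead of once per cache.

```python
def cached_footprint_gb(attr_size_gb, feature_count, cache_count):
    """Approximate size of all cached copies of one large attribute, in GB."""
    return attr_size_gb * feature_count * cache_count


# Numbers from the post: a 1.5 GB GeoTIFF, 6 features, 4 caching transformers.
per_cache_copies = cached_footprint_gb(1.5, 6, 4)

# Hypothetical case: each feature's value is stored once and shared by all caches.
stored_once = cached_footprint_gb(1.5, 6, 1)

print(per_cache_copies, stored_once)
```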

However, caching the value 4 times for each feature is really unnecessary, since it is the exact same attribute value for each feature.

My suggestions are:

- Cache such very large feature attributes only once, and just link to that single copy from each transformer cache.

- Alternatively, and maybe simpler, allow feature caching to be turned off on individual transformers.
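To illustrate the first suggestion, here is a rough Python sketch of "store once, link from each cache". It is not how FME implements feature caching. The class, the function, the size threshold and the attribute name are all assumptions made up for this example. Large byte values go into a shared store keyed by their SHA-256 hash, and each transformer cache keeps only a small reference.

```python
import hashlib

LARGE_THRESHOLD = 1024  # bytes; hypothetical cut-off for a "very large" value


class SharedBlobStore:
    """Keeps each distinct large value once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key):
        return self._blobs[key]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


def cache_feature(feature, store):
    """Return a cache entry in which large byte values are replaced by references."""
    cached = {}
    for name, value in feature.items():
        if isinstance(value, bytes) and len(value) >= LARGE_THRESHOLD:
            cached[name] = ("blob_ref", store.put(value))
        else:
            cached[name] = value
    return cached


store = SharedBlobStore()
# Stand-in for a feature carrying a downloaded GeoTIFF (attribute name is illustrative).
feature = {"id": 1, "response_body": b"\x00" * 4096}
# Four transformer caches each hold the same feature.
caches = [cache_feature(feature, store) for _ in range(4)]
print(store.stored_bytes())  # size of one copy, not four
```

The trade-off in a design like this is the cost of hashing every large value, plus some way of knowing when a stored value is no longer referenced by any cache.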

Cheers

I second your suggestions, and would add one more:

- Allow feature caching to be disabled on individual output ports. Many transformers repeatedly output (and cache) unused, filtered-out data, which can slow down authoring of the translation.

Another hack that I learned from @RyanAtSafe is to put the heavy caching transformers inside a bookmark and collapse it (FME 2018). While collapsed, the bookmark acts a bit like a single transformer, so any disconnected output ports inside it will not cache their data.


We were afraid of this scenario. There is work ongoing to share large attributes so they'd only be written once, and that would help a great deal. We will keep examining this, but in the meantime the best bet is to use a collapsed bookmark to turn caching off.


I just posted an article about Feature Caching and performance that might be of help. In general, there are many ways to manage caching, but really the key here is to avoid caching too much data. Until we get Dale's idea implemented, large datasets may always cause issues like this.


https://knowledge.safe.com/articles/79739/feature-caching-and-performance.html?


I, too, would love to have caching controllable by transformer, and control by output port would be even better. I typically work with massive datasets that translate into millions of features being processed in a workspace. I also typically use caching to test workspaces, and to reduce processing time during testing I use sampling as well. Testing workspaces by sampling only goes so far, though, because significant feature variations are lost during sampling, which makes it difficult to identify all the cases that need handling. For testing, I would like to be able to run all features but narrowly target caching to one or a few transformers.