I have a workspace that works for 300k entries but seems to run out of memory for 1500k entries. The problem happens at the StatisticsCalculator level, where I ask it to calculate a mean using a group-by on an attribute. Does anyone have an idea how I can proceed? Thanks.
Would trying the 64-bit version of FME be an option for you?
Would it be possible to read just one group at a time using the WHERE clause in the Reader (see the sketch below)?
Or write the groups to different FFS files and use the StatisticsCalculator on each group in a separate workspace.
Both methods are designed to limit the total number of objects in the workspace at any one time, looping through the same workspace multiple times (once per group).
You could use the WorkspaceRunner to loop.
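A minimal sketch of the per-group WHERE clause idea, assuming a hypothetical table readings with a grouping attribute region (both names are illustrative, not from the original post). The Reader's WHERE clause would restrict each run to a single group, and the group value would come in as a published parameter set by the WorkspaceRunner loop:

-- What the Reader effectively executes on each pass: one group only.
-- Table and column names (readings, region) are hypothetical, and the
-- literal 'NORTH' stands in for a published parameter such as
-- $(GROUP_VALUE) supplied by the WorkspaceRunner.
SELECT *
FROM readings
WHERE region = 'NORTH';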
Have you tried running it with parallel processing?
Hi @lam, If the source dataset is a database, I would try to calculate mean values for each group with a SQL statement using the SQLCreator or SQLExecutor. Even if the source is not a database, creating a temporary database for statistics calculation might be effective.
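A sketch of the kind of statement the SQLCreator could issue, again assuming a hypothetical table readings with a grouping attribute region and a numeric attribute measurement. The database does the aggregation, so only one row per group ever enters the workspace:

-- Compute the mean per group inside the database so that only the
-- aggregated rows (one per group) are loaded into FME.
-- Table and column names are hypothetical.
SELECT region,
       AVG(measurement) AS mean_value
FROM readings
GROUP BY region;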
I can run 2.3 million features through the StatisticsCalculator, calculating a mean with a group-by, with no issue.
It does take 19 minutes, versus 6.87 seconds doing it via a SQLCreator, however.
Another suggestion, not mentioned here before: remove all attributes and geometry that are not used in the process. Unused attributes and geometries that remain on the feature consume memory.
The transformers to use are AttributeKeeper and GeometryRemover.