Question

StatisticsCalculator Limits

  • 11 August 2016
  • 9 replies
  • 4 views

I have a workbench that works for 300k entries but seems to run out of memory for 1500k entries. The problem happens at the StatisticsCalculator, where I calculate a mean grouped by an attribute. Does anyone have an idea how I can proceed? Thanks.


9 replies

Userlevel 4

Would trying the 64-bit version of FME be an option for you?

Userlevel 2
Badge +16

Would it be possible to read just one group at a time using the WHERE clause in the Reader?

Or write the groups to different FFS files and use the StatisticsCalculator on each group in a separate workspace.

Both methods are designed to limit the total number of objects in the workspace and loop through the same workspace multiple times.

You could use the WorkspaceRunner to loop.
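To illustrate the one-group-at-a-time idea outside FME, here is a minimal Python sketch using the standard `sqlite3` module. It is not FME's Reader or WorkspaceRunner API; the table `parcels` and the columns `zone` and `area` are hypothetical names chosen for the example. The point is only that each pass holds a single group's rows, selected with a WHERE clause.

```python
import sqlite3


def mean_per_group_one_at_a_time(conn, table, group_col, value_col):
    """Compute one mean per group, reading only one group's rows per pass."""
    # Table and column names are interpolated directly, so they must be
    # fixed, trusted identifiers (they are hypothetical names in this sketch).
    groups = [row[0] for row in conn.execute(
        f"SELECT DISTINCT {group_col} FROM {table} ORDER BY {group_col}")]
    means = {}
    for group in groups:
        # The WHERE clause restricts this pass to a single group.
        cursor = conn.execute(
            f"SELECT {value_col} FROM {table} WHERE {group_col} = ?", (group,))
        total, count = 0.0, 0
        for (value,) in cursor:
            total += value
            count += 1
        if count:  # skip groups that match no rows (for example NULL keys)
            means[group] = total / count
    return means


# Small in-memory sample standing in for the real source dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (zone TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?)",
    [("A", 10.0), ("A", 20.0), ("B", 5.0), ("B", 7.0), ("B", 9.0)],
)
group_means = mean_per_group_one_at_a_time(conn, "parcels", "zone", "area")
print(group_means)
```

In FME itself the loop would be the WorkspaceRunner and the WHERE clause would be set on the Reader; the sketch only mirrors that control flow.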

Userlevel 4
Badge +25
Can you share the log file with us? It would really help to debug what is going on. Does FME just slow down or does it fail with an "out of memory" error (which can indicate bad data as much as a memory issue)?

 

 

Badge +16

Have you tried running it with parallel processing?

Userlevel 2
Badge +17

Hi @lam, if the source dataset is a database, I would try calculating the mean values for each group with a SQL statement using the SQLCreator or SQLExecutor. Even if the source is not a database, creating a temporary database for the statistics calculation might be effective.
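As a rough sketch of what such a SQL statement could look like, the Python snippet below loads rows into a temporary in-memory SQLite database and lets the database compute the grouped mean with `AVG` and `GROUP BY`. The table `features` and the columns `group_key` and `value` are assumptions made for the example; in FME the equivalent query would go into the SQLCreator or SQLExecutor against your own table.

```python
import sqlite3


def group_means_via_sql(rows):
    """Load (group, value) rows into a temporary database and average per group."""
    conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
    try:
        # Hypothetical schema: one grouping attribute and one numeric attribute.
        conn.execute("CREATE TABLE features (group_key TEXT, value REAL)")
        conn.executemany("INSERT INTO features VALUES (?, ?)", rows)
        query = (
            "SELECT group_key, AVG(value) "
            "FROM features GROUP BY group_key ORDER BY group_key"
        )
        # Only one row per group comes back to Python, not every feature.
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


sample = [("north", 2.0), ("north", 4.0), ("south", 10.0)]
sql_means = group_means_via_sql(sample)
print(sql_means)
```

The design choice here is to push the aggregation into the database engine, so the calling process only ever receives the per-group results.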

Userlevel 4
Badge +25


Good one. And the InlineQuerier could be used to create a temporary database inside FME.

 

Userlevel 1
Badge +21

I can run 2.3 million features through the StatisticsCalculator, calculating a mean with a group by, with no issue.

It does take 19 minutes, however, versus 6.87 seconds when doing it via a SQLCreator.

Userlevel 2
Badge +16

Another suggestion that has not been mentioned here yet is to remove all attributes and geometry that are not used in the process. Unused attributes and geometries that remain on the features still consume memory.

The transformers to use are AttributeKeeper and GeometryRemover.

@lam did you manage to find a solution to your problem with the StatisticsCalculator? I am experiencing issues with a Python overflow error, which may indicate memory issues.

 

 
