Question

StatisticsCalculator Limits

  • August 11, 2016
  • 9 replies
  • 52 views

I have a workbench that works for 300k entries but seems to run out of memory for 1500k entries. The problem happens at the StatisticsCalculator, where I calculate a mean grouped by an attribute. Does anyone have an idea how I can proceed? Thanks.


9 replies

david_r
Celebrity
  • 8394 replies
  • August 11, 2016

Would trying the 64-bit version of FME be an option for you?


erik_jan
Contributor
  • 2179 replies
  • August 11, 2016

Would it be possible to read just one group at a time using the WHERE clause in the Reader?

Or write the groups to different FFS files and use the StatisticsCalculator on each group in a separate workspace.

Both methods are designed to limit the total number of objects in the workspace and loop through the same workspace multiple times.

You could use the WorkspaceRunner to loop.
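
As a rough illustration of the one-group-at-a-time idea, here is a minimal sketch in plain Python. It is not FME code: `read_records`, `mean_for_group`, and the `zone`/`value` field names are hypothetical stand-ins, and the `if` test only mimics what a Reader WHERE clause would do at the source.

```python
def read_records():
    """Hypothetical source; each call re-reads the data from the start."""
    yield {"zone": "A", "value": 10.0}
    yield {"zone": "B", "value": 4.0}
    yield {"zone": "A", "value": 20.0}


def mean_for_group(read, group_value, group_key="zone", value_key="value"):
    """Compute the mean for a single group in one pass over the source."""
    total = 0.0
    count = 0
    for record in read():
        # Stand-in for a WHERE clause: skip records from other groups.
        if record[group_key] == group_value:
            total += record[value_key]
            count += 1
    return total / count if count else None


# One pass per group, similar in spirit to looping a workspace per group.
means = {group: mean_for_group(read_records, group) for group in ("A", "B")}
print(means)
```

Only a running total and a count are held per pass, which is the point of splitting the work by group.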


mark2atsafe
Safer
  • 2554 replies
  • August 11, 2016
Can you share the log file with us? It would really help to debug what is going on. Does FME just slow down or does it fail with an "out of memory" error (which can indicate bad data as much as a memory issue)?

 

 


itay
Supporter
  • 1442 replies
  • August 11, 2016

Have you tried running it with parallel processing?


takashi
Celebrity
  • 7843 replies
  • August 13, 2016

Hi @lam, If the source dataset is a database, I would try to calculate mean values for each group with a SQL statement using the SQLCreator or SQLExecutor. Even if the source is not a database, creating a temporary database for statistics calculation might be effective.
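
To give a feel for the kind of statement this means, here is a minimal sketch using Python's built-in `sqlite3` module with a temporary in-memory database. The table name `features` and the columns `zone` and `value` are assumptions for illustration; in FME the same `SELECT ... GROUP BY` would go into the SQLCreator or SQLExecutor against the real source table.

```python
import sqlite3


def grouped_means_sql(rows):
    """Load (zone, value) rows into a temporary database and average per zone."""
    conn = sqlite3.connect(":memory:")  # temporary database, gone on close
    try:
        conn.execute("CREATE TABLE features (zone TEXT, value REAL)")
        conn.executemany("INSERT INTO features VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT zone, AVG(value) FROM features GROUP BY zone ORDER BY zone"
        )
        return cursor.fetchall()
    finally:
        conn.close()


result = grouped_means_sql([("A", 10.0), ("A", 20.0), ("B", 4.0)])
print(result)
```

The database engine does the aggregation, so only one row per group comes back to the caller.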


mark2atsafe
Safer
  • 2554 replies
  • August 15, 2016

Good one. And the InlineQuerier could be used to create a temporary database inside FME.

 


ebygomm
Influencer
  • 3434 replies
  • August 15, 2016

I can run 2.3 million features through the statistics calculator, calculating a mean with a group by with no issue.

It does take 19 minutes, versus 6.87 seconds doing it via a SQLCreator, however.


erik_jan
Contributor
  • 2179 replies
  • August 15, 2016

Another suggestion that has not been mentioned here yet is to remove all attributes and geometry that are not used in the process. If unused attributes and geometries are part of the feature, they consume memory.

The transformers to use are AttributeKeeper and GeometryRemover.
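
The effect can be sketched in plain Python (again, not FME code; `keep_attributes` and the field names are hypothetical). Each record is reduced to the group-by attribute and the value being averaged before any statistics are computed:

```python
def keep_attributes(record, names):
    """Return a copy of the record holding only the named attributes."""
    return {name: record[name] for name in names if name in record}


feature = {
    "zone": "A",
    "value": 10.0,
    "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 0))",  # not needed for a mean
    "notes": "long free-text field",  # not needed for a mean
}

slim = keep_attributes(feature, ("zone", "value"))
print(slim)
```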


  • 5 replies
  • December 15, 2017
@lam did you manage to find a solution to your problem with the StatisticsCalculator? I am experiencing issues with a Python overflow error, which may indicate memory issues.