Question

Workspace temp files management


langley.select
Contributor

I have a workspace that is moving 18 million records between databases. Part of the translation involves updating geometry and reprojecting. I am running the workspace WITHOUT the option to cache features; however, the GeometryReplacer transformer seems to be writing quite a bit of data to memory (expected) and that is not purged (reading 100k features at a time).

 

Because it's not cleaning up after itself, the workspace is just erroring out because it's running out of disk space. I can stop the translation and update the start feature number to continue, but this is cumbersome.

 

Any ideas how to fix this so I can run this unattended?

 

 

16 replies

nielsgerrits
VIP

I would try a parent workspace with a WorkspaceRunner to run the actual workspace in smaller chunks, to check/test whether the temp files are cleared once each partial run is done.


hkingsbury
Celebrity
  • April 14, 2025

My understanding of how FME works with handling memory/temp is:

  • A feature is read in from the source into memory
  • It is processed until it either
    • gets written out with a writer, or
    • reaches a blocking transformer
  • When a feature is written out, it is no longer needed and is therefore removed from memory
  • If a feature reaches a blocking transformer, it needs to be kept in memory until all features are received
  • When memory is exhausted, data is written to the temp drive

What I suspect is that you have a blocking transformer somewhere that is causing all 18m records to be written to a temp location. A couple of potential solutions are:

  • Optimise the order of operations to reduce the need for blocking transformers
  • Reduce the size of each feature by removing unneeded attributes
  • Batch the process into smaller chunks. Use a parent workspace with a workspace runner to run subsets (a command-line sketch follows this list)
  • Increase temp storage
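
For the batching idea above, here is a rough sketch (not from this thread) of driving the chunks from the command line instead of a parent workspace. It assumes the workspace exposes the reader's Start Feature and Max Features to Read as published parameters named START_FEATURE and MAX_FEATURES; those names and the paths are placeholders.

    # Sketch: run the workspace in chunks so each chunk gets its own FME process
    # and its temp files are released when that process exits.
    # START_FEATURE / MAX_FEATURES are assumed published parameter names; paths are placeholders.
    import subprocess

    FME = r"C:\Program Files\FME\fme.exe"
    WORKSPACE = r"C:\workspaces\migrate.fmw"
    TOTAL = 18_000_000
    CHUNK = 500_000

    for start in range(0, TOTAL, CHUNK):
        subprocess.run(
            [FME, WORKSPACE,
             "--START_FEATURE", str(start),
             "--MAX_FEATURES", str(CHUNK)],
            check=True,
        )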

nielsgerrits
VIP
hkingsbury wrote:

...What I suspect is that you have a blocking transformer somewhere that is causing all 18m records to be written to a temp location…

Ah yes, if it is a design “flaw”, optimising the workflow is a better solution.

 


takashi
Influencer
  • April 14, 2025

Hi @langley.select,

See the translation log to check the currently available temporary disk space. There should be a message around the 30th line, something like this.

System Status: 812.72 GB of disk space available in the FME temporary folder (...)

 

If the available disk space is unexpectedly small, increasing the space as @hkingsbury mentioned could be a quick solution.

cf. Setting the Temporary Folder
https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Form/QuickTranslator/Temporary_Folder_Determination.htm
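
If the temp folder needs to move to a drive with more free space, FME also respects the FME_TEMP environment variable (see the page above). A minimal sketch, with placeholder paths:

    # Point FME's temporary folder at a drive with more free space via FME_TEMP,
    # then launch the translation. All paths here are placeholders.
    import os
    import subprocess

    env = os.environ.copy()
    env["FME_TEMP"] = r"D:\fme_temp"   # assumed: D: has plenty of free space

    subprocess.run(
        [r"C:\Program Files\FME\fme.exe", r"C:\workspaces\migrate.fmw"],
        env=env,
        check=True,
    )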


langley.select
Contributor

I have 30GB of disk space available to me and 32GB memory.  The workflow is:

 

postgres reader → attribute manager (30 attributes) → coordinate system setter → geometry replacer → reprojector → sde writer

I am able to “batch” it by incrementing the Start Feature and Max Features to Read parameters on the postgres reader, but it takes an increasingly long time to read with each iteration. It doesn't seem to matter what values I use, as it seems to be reading in and processing 100k features at a time. I actually don't know where that is specified. I tried setting the “Features to Read at a Time” parameter to a lower value, but it seems like that only bundles them together at the reader before passing them on to the attribute manager.

 

The values above reflect my attempt to manually ‘batch’ things; it's not a great solution though.

It's not reading all 18 million at a time, because I can see them flowing to the writer 100k at a time. I can also see that the memory/disk space issue is related to the GeometryReplacer transformer, as that's where it starts triggering the “optimizing memory” messages in the log.

Does anyone have any examples of batching this that I can reference?


takashi
Influencer
  • April 15, 2025

I don't think the GeometryReplacer causes the memory issue, since it's not a blocking transformer. Rather, the SDE writer could cause the issue, since it caches features depending on the transaction size.

Do you need to keep the "shape" attribute value after replacing it with geometry?
If not, you can set the Remove Attribute parameter in the GeometryReplacer to Yes, which could help increase performance.

If you would like to batch the process, first create published user parameters that link to the reader parameters Start Feature and Max Features to Read. You can then pass the desired values to those parameters through a WorkspaceRunner in another workspace.

To create a published user parameter linked to a reader parameter, right click on the reader parameter > Create User Parameter.

See also the attached two workspaces that demonstrate how you can run a workspace through WorkspaceRunner in another one for each chunk.
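
For reference, the parent workspace is doing roughly the equivalent of this Python sketch using the fmeobjects API that ships with FME; the workspace path and parameter names are placeholders for your own published parameters.

    # Rough equivalent of the parent-workspace/WorkspaceRunner pattern, using fmeobjects.
    # Workspace path and parameter names are placeholders.
    import fmeobjects

    runner = fmeobjects.FMEWorkspaceRunner()
    chunk = 500_000
    for start in range(0, 18_000_000, chunk):
        runner.runWithParameters(
            r"C:\workspaces\migrate.fmw",
            {"START_FEATURE": str(start), "MAX_FEATURES": str(chunk)},
        )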


virtualcitymatt
Celebrity
langley.select wrote:

I have 30GB of disk space available to me and 32GB memory.

This could indeed be a little higher - I always like to have at least 3 to 4 times as much disk space available as I have memory.

Windows will use your disk as virtual memory. “Memory” is constantly swapped into and out of RAM via a process called swapping. 

When your data hits the writer, memory use increases; this increase will force the “memory” of other unused (but still running) apps to be swapped onto the disk.

In addition, FME has its own built-in memory management mechanisms that do essentially the same thing. If FME detects a low-memory configuration, it will actively start dumping data to the FME_TEMP location if it can.

From what I remember, writers will wait until all features are processed before they begin to be written/committed. I do know that FeatureWriters can help work around this issue in some cases. You could look at swapping out your writer for a FeatureWriter instead, but it may not help.

If it were me I would look to start by freeing up space before trying anything else here.

 


takashi
Influencer
  • April 15, 2025

I agree with @virtualcitymatt; 30 GB feels a bit small for a large-scale data translation. I would keep hundreds of GB to a few TB of FME temporary disk space if possible.


langley.select
Contributor

I honestly don't understand why anything is being cached. The workflow doesn't care about other features; each feature should, ideally, be able to flow through as it is read. Several things confuse me: most significantly, why the workflow waits until 100k features are read before passing them to the next transformer, and second, why it's holding them in memory at all. Once they are written, I don't care about them anymore.

Someone said earlier that it was probably a design flaw.  I probably agree, but as I'm relatively new to FME I don’t have any idea where to modify things.  


virtualcitymatt
Celebrity

Right, the 100,000 feature count is an indicator that FME is processing the data in bulk mode. It's much, much faster than the old method in the majority of cases. Previously FME would process each Feature one by one and it was just slow by comparison. 

 

I think you should find that the writer in your workspace won't start writing the data to the DB until all features are read and processed (I could be wrong though). This is definitely not ideal. Is that what you've seen?

 

This is where my suggestion of trying out the FeatureWriter transformer as an alternative may help. The idea is that it may start writing data sooner, preventing the memory clog. Certainly the option to process by ordered group should help there.

 


langley.select
Contributor
virtualcitymatt wrote:

Right, the 100,000 feature count is an indicator that FME is processing the data in bulk mode. It's much, much faster than the old method in the majority of cases. Previously FME would process each Feature one by one and it was just slow by comparison. 

 

I think you should find that the writer in your workspace won't start writing the data to the DB until all features are read and processed (I could be wrong though). This is definitely not ideal. Is that what you've seen?

 

This is where my suggestion of trying out the FeatureWriter transformer as an alternative may help. The idea is that it may start writing data sooner, preventing the memory clog. Certainly the option to process by ordered group should help there.

 

I have 18M records. It's reading and processing them in batches of 100k. The problem, though, is that it's holding on to the processed features in memory even though I have feature caching disabled and no blocking transformers. I'm trying to find a way to process all the data efficiently.

I've increased the drive space to 70GB, which allows me to run 800k records before it runs out of space. Each time I restart, though, it takes longer and longer to get to where it left off - I'm setting the “Start Feature” parameter on the reader, but it has to read up to that feature before it starts. At 3M it's taking about an hour to resume translation. Ideally I would just pass an offset argument to the SQL statement, but I'm not sure I can do that with the database reader. Basically I'm just getting increasingly stuck here and not sure I have a viable way forward.
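
One way around the Start Feature re-scan (a sketch, not something suggested in this thread): page on the table's primary key instead, so each chunk starts exactly where the previous one ended. Assuming a table my_table with an integer primary key id (both placeholders), the pattern looks like this in Python with psycopg2; in FME the same id bounds could instead be fed to the reader's WHERE Clause parameter for each batched run.

    # Keyset pagination sketch: bound each chunk by the primary key rather than
    # skipping N rows, so resuming a later chunk does not re-read earlier rows.
    # Table name, column names and connection string are placeholders.
    import psycopg2

    CHUNK = 100_000
    conn = psycopg2.connect("dbname=mydb user=me")
    cur = conn.cursor()

    last_id = 0
    while True:
        cur.execute(
            "SELECT id, shape FROM my_table WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, CHUNK),
        )
        rows = cur.fetchall()
        if not rows:
            break
        # ... process/translate this chunk; e.g. pass "id > {low} AND id <= {high}"
        # to the reader's WHERE Clause parameter in the batched workspace runs.
        last_id = rows[-1][0]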


virtualcitymatt
Celebrity

Ok, looking at the data types here there is something indeed funny going on. That's a huuuuge string - this must be where all the memory is going. I've never seen a varchar of that length before; I didn't even know that could happen. This would also explain why it's so slow.

Indeed, if you're able to convert these into FME geometries and then extract them again as strings, I think that should "fix" the strings.

But there must be a better way to request these data from the DB.
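
One quick way to sanity-check that guess (an assumption on my part: that the varchar holds hex-encoded WKB, which is what the GeometryReplacer would be parsing) is to decode a value and compare sizes, e.g. with shapely; the hex value below is just a toy point standing in for a real column value.

    # Decode a hex-encoded WKB string and compare its size to the binary geometry.
    # The hex value is a toy POINT(1 2), standing in for a real column value.
    from shapely import wkb

    hex_string = "0101000000000000000000F03F0000000000000040"
    geom = wkb.loads(bytes.fromhex(hex_string))
    print(geom.geom_type, len(hex_string), len(geom.wkb))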

 

 


langley.select
Contributor
virtualcitymatt wrote:

Ok, looking at the data types here there is something indeed funny going on. That's a huuuuge string - this must be where all the memory is going. I've never seen a varchar of that length before; I didn't even know that could happen. This would also explain why it's so slow.

Indeed, if you're able to convert these into FME geometries and then extract them again as strings, I think that should "fix" the strings.

But there must be a better way to request these data from the DB.

 

 

You know what? I think that's it! Well, in part. There's still the question of why it's not releasing stuff from memory…


nielsgerrits
VIP

I think you want to use a PostGIS reader, not a postgres reader. This will just read features with geometry. Then set the source coordinate system in the reader and the target coordinate system in the writer. This should eliminate all transformers?


virtualcitymatt
Celebrity
nielsgerrits wrote:

I think you want to use a PostGIS reader, not a postgres reader. This will just read features with geometry. Then set the source coordinate system in the reader and the target coordinate system in the writer. This should eliminate all transformers?

What he said


nielsgerrits
VIP
virtualcitymatt wrote:
nielsgerrits wrote:

I think you want to use a PostGIS reader, not a postgres reader. This will just read features with geometry. Then set the source coordinate system in the reader and the target coordinate system in the writer. This should eliminate all transformers?

What he said

Well, you started it by suggesting the large varchars were… special :)

