Archived

Add option 'Data is ordered by fanout' to Writers

Related products: Transformers

While working with the Text Writer I noticed a large difference in memory use between having dataset fanout on and off. With fanout, peak memory usage was 563332 kB; without it, only 154180 kB. I'm assuming FME starts writing only the first fanout file and keeps the other features in memory until the end, just as with multiple Writers only the first Writer starts writing immediately.
In my case the data was grouped by fanout value, meaning it could be written sequentially, with only a single file open at a time and each file opened only once. I think it would be good to have an option 'Input is ordered by fanout' on the Writer, similar in functionality to 'Input is ordered by group' on the Aggregator. This would remove the need to keep all features other than those of the first fanout group in memory.
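
To illustrate the idea outside FME, here is a minimal Python sketch. It assumes features arrive as (fanout value, text line) pairs that are already grouped by fanout value. The function name `write_ordered_fanout` and the file naming are hypothetical, and this is not how FME's writers are implemented; it only shows that with ordered input a single file is open at a time and no features need to be buffered.

```python
import itertools
import os
import tempfile


def write_ordered_fanout(features, out_dir):
    """Write (fanout_value, line) pairs to one text file per fanout value.

    Assumes the input is already grouped by fanout value, so each file is
    opened once, written sequentially, and closed before the next one starts.
    Returns the file paths in the order they were written.
    """
    written = []
    seen = set()  # only the fanout values are remembered, never the features
    for key, group in itertools.groupby(features, key=lambda f: f[0]):
        if key in seen:
            # The ordering assumption was violated; fail instead of overwriting.
            raise ValueError(f"input is not ordered by fanout value: {key!r}")
        seen.add(key)
        path = os.path.join(out_dir, f"{key}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for _, line in group:
                handle.write(line + "\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # A generator stands in for the feature stream; it is consumed lazily.
    stream = ((name, f"{name}-{i}") for name in ("a", "b", "c") for i in range(3))
    with tempfile.TemporaryDirectory() as tmp:
        print(write_ordered_fanout(stream, tmp))
```

If the input turns out not to be ordered, the sketch raises an error rather than silently reopening and overwriting a file, which is roughly the contract an 'Input is ordered by' option implies.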

8 replies

jdh
Contributor
  • December 20, 2017

I'm running into a similar issue, where I actually run out of memory due to a dataset fanout. Without the fanout there is no issue processing the data.


fmelizard
Safer
  • December 22, 2017
This is a fantastic idea.

 

 


lottegis
  • February 26, 2018

+1

Better fanout handling would be nice. It would let me avoid a parent-child workspace setup, which not only adds workspace management overhead but also starts an additional process that another job could otherwise use.


soeren
Contributor
  • December 3, 2018

I think this is a very important feature.

The next step would be a parameter that allows the writer to write multiple files at the same time.


thijsknapen
Contributor
  • January 8, 2020

I would also really like an option like this. I've encountered two situations where it would be very useful:

a) I had a workspace that periodically checked an API to monitor running jobs, and I wanted to save the result of each newly finished job. So features would need to be written out one by one as they passed along.

(With a regular writer, job results would only be stored once the workspace detects that all jobs are finished, i.e. once no new 'finished job' features enter the writer.)
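
As a rough illustration of the behaviour wanted in (a), here is a small Python sketch that stores each result the moment it arrives instead of waiting for the stream to end. The name `persist_as_finished` and the JSON-lines output are assumptions made for the example; it does not reflect how any FME writer works.

```python
import json
import os
import tempfile


def persist_as_finished(results, path):
    """Append each finished job result to `path` as soon as it arrives.

    Yields the result after it has been written, so the caller can see that
    earlier results are already on disk while later ones are still pending.
    """
    for result in results:
        # Open, write, and close per result so nothing waits for the stream to end.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(result) + "\n")
        yield result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "results.jsonl")
        for job in persist_as_finished(iter([{"job": 1}, {"job": 2}]), out):
            with open(out, encoding="utf-8") as handle:
                print(job, "-> lines on disk:", len(handle.readlines()))
```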

 

I ended up tackling this issue by using a separate WorkspaceRunner just to write out the job results. I guess this workaround could also be used to write out features ordered by group.

 

Unfortunately this workaround introduced a new issue for me. I was running the main workspace in multiple parallel streams, so with, say, 5 main workspaces running at the same time, only 2 FME.exe processes were left for their subworkspaces. So when more than 3 of the main workspaces were processing the subworkspace, the maximum of 7 allowed FME.exe processes became a problem, and not all of the features were written out. (Of course I might be able to do something with looping failed subworkspaces, but I didn't go so far as to build a workaround within a workaround.)

 

b) I had a fairly large amount of features/data that I wanted to write out to multiple formats. Writing to one format was possible, but connecting multiple writers to the same large feature stream caused the features to be kept in memory until no new features were passed to any of the writers. Unfortunately this default multi-writer behaviour of FME took up too much of my PC's memory, causing the workspace to crash.

 

For this issue I initially tried to use FME's 'connection runtime order' to regulate the order in which features leave an outgoing connection junction. Although this is a nice option mid-translation, at the end of the translation (with the multiple writers) there is, to my knowledge, no option to write out features one writer at a time. So, as mentioned, the number of features (or rather the total volume of data) at the end of the translation became so large that it couldn't be held in the available system memory, and the workspace crashed.

For this case I ended up using a Recorder - FeatureWriter - FeatureReader setup.

First I used a Recorder to back up the features as an FFS file. Then I wrote out the features using a FeatureWriter; the benefit is that once the FeatureWriter has finished, it frees the memory it used. I then used the output port of this first FeatureWriter (in combination with a Sampler) to trigger a FeatureReader that reads the backed-up FFS file and passes the features to the next FeatureWriter (and so on if needed, though luckily not in my case). This workaround ensures that only a single set of the large feature stream is kept in memory at a time.
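
The same spool-then-replay pattern can be sketched in plain Python. This is only an analogy under stated assumptions: a pickle file stands in for the FFS file, and `spool`, `replay`, `write_csv`, `write_jsonl`, and `write_all_formats` are hypothetical names, not FME functionality. The point is that the stream is consumed once, and each output format is then written in turn from disk rather than from memory.

```python
import csv
import json
import os
import pickle
import tempfile


def spool(features, spool_path):
    """Stream features to a spool file on disk; return how many were stored."""
    count = 0
    with open(spool_path, "wb") as handle:
        for feature in features:
            pickle.dump(feature, handle)
            count += 1
    return count


def replay(spool_path):
    """Yield features back from the spool file one at a time."""
    with open(spool_path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def write_csv(features, path):
    """Write dict features as CSV, taking the header from the first feature."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = None
        for feature in features:
            if writer is None:
                writer = csv.DictWriter(handle, fieldnames=list(feature))
                writer.writeheader()
            writer.writerow(feature)


def write_jsonl(features, path):
    """Write dict features as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for feature in features:
            handle.write(json.dumps(feature) + "\n")


def write_all_formats(features, out_dir):
    """Spool the stream once, then write each format sequentially from disk."""
    spool_path = os.path.join(out_dir, "spool.bin")
    spool(features, spool_path)
    csv_path = os.path.join(out_dir, "out.csv")
    jsonl_path = os.path.join(out_dir, "out.jsonl")
    # Each writer finishes and closes its file before the next one starts.
    write_csv(replay(spool_path), csv_path)
    write_jsonl(replay(spool_path), jsonl_path)
    os.remove(spool_path)
    return csv_path, jsonl_path


if __name__ == "__main__":
    stream = ({"id": i, "name": f"feature-{i}"} for i in range(5))
    with tempfile.TemporaryDirectory() as tmp:
        print(write_all_formats(stream, tmp))
```

The trade-off is extra disk I/O: the data is written once to the spool and read back once per output format, in exchange for never holding the full stream in memory.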


thijsknapen
Contributor
  • January 7, 2021

Just noticed that this idea seems to be linked to: https://community.safe.com/s/idea/0874Q000000Tl0vQAC/detail


LizAtSafe
Safer
  • April 5, 2025
Open → Archived

LizAtSafe
Safer
  • April 5, 2025
Using a FeatureWriter with Group By enabled allows for data to be ordered when using fanout.
