Archived

Add option 'Data is ordered by fanout' to Writers

Related products: Transformers

9 years ago
April 5, 2016
8 replies
16 views

kim
54 replies

While working with the Text Writer I noticed a large difference in memory usage between having dataset fanout on or off. With fanout, peak memory usage was 563332 kB; without it, only 154180 kB. I'm assuming FME starts writing only the first fanout file and keeps all other features in memory until the end, just as with multiple Writers, where only the first Writer starts writing immediately.
In my case the data was grouped by fanout value, so it could have been written sequentially with only a single file open at a time, opening each file only once. I think it would be good to have an option 'Input is ordered by fanout' on the Writer, similar in functionality to 'Input is ordered by group' on the Aggregator. This would remove the need to keep every feature outside the first fanout group in memory.
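To illustrate why pre-sorted input needs no buffering, here is a minimal plain-Python sketch of the proposed behaviour (not FME's implementation). It assumes features arrive as dicts already ordered by the fanout attribute; `write_fanout_sorted` and the `region` attribute in the usage example are hypothetical. Each fanout group is written to its own CSV file with only one file open at a time, and the function refuses input in which a fanout value reappears after its group has ended.

```python
import csv
import os
from itertools import groupby
from operator import itemgetter


def write_fanout_sorted(features, key, out_dir):
    """Write one CSV file per fanout value (hypothetical helper, not an FME API).

    Assumes `features` is ordered by `key`, so each fanout group is contiguous.
    Features are streamed: only the current file is open, and the only state
    kept between groups is the set of fanout values already written.
    """
    written = []
    seen = set()
    for value, group in groupby(features, key=itemgetter(key)):
        # A repeated value means the input was not ordered by the fanout key.
        if value in seen:
            raise ValueError(f"input not ordered by {key!r}: {value!r} reappeared")
        seen.add(value)
        path = os.path.join(out_dir, f"{value}.csv")
        with open(path, "w", newline="") as fh:
            writer = None
            for feature in group:
                if writer is None:
                    # Take the column names from the first feature in the group.
                    writer = csv.DictWriter(fh, fieldnames=list(feature))
                    writer.writeheader()
                writer.writerow(feature)
        written.append(path)
    return written
```

For example, `write_fanout_sorted(rows, "region", "out")` with rows sorted by `region` produces one file per region. Without the ordering guarantee, a writer would have to keep later groups somewhere (in memory, or as open files) until the stream ends, which matches the memory behaviour described above.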

+28

jdh
Contributor
1981 replies
7 years ago
December 20, 2017

I'm running into a similar issue, where I actually run out of memory due to a dataset fanout. Without the fanout there is no issue processing the data.

+19

fmelizard
Safer
3725 replies
7 years ago
December 22, 2017

This is a fantastic idea.

lottegis
31 replies
7 years ago
February 26, 2018

Better fanout handling would be nice. It would let me avoid a parent-child workspace setup, which not only adds to workspace management but also starts an additional process that could otherwise be used to run another job.

soeren
Contributor
10 replies
6 years ago
December 3, 2018

I think this is a very important feature.

A next step would be a parameter that allows the writer to write multiple files at the same time.

+10

thijsknapen
Contributor
154 replies
5 years ago
January 8, 2020

I would also really like an option like this. I've encountered two situations where it would be very useful:

a) I had a workspace that periodically checked an API to monitor running jobs, and I wanted to save the result of each newly finished job. So features needed to be written out one by one as they passed through.

(With a regular writer, job results would only be stored once the workspace detected that all jobs were finished, i.e. once no new 'finished job' features would enter the writer.)

I ended up tackling this issue by using a separate WorkspaceRunner just to write out the job results. I guess this workaround could also be used to write out features ordered by group.

Unfortunately, this workaround introduced a new issue. I was running the main workspace in multiple parallel streams: when I was running, say, 5 main workspaces at the same time, only 2 FME.exe processes were left for their subworkspaces. So when more than 3 of the main workspaces were processing the subworkspace, the maximum of 7 FME.exe processes became a problem, and not all of the features were written out. (Of course I might be able to handle this by looping over failed subworkspace runs, but I didn't go as far as building a workaround within a workaround.)

b) I had a fairly large amount of features/data that I wanted to write out to multiple formats. Writing to one format was possible, but connecting multiple writers to the same large feature stream caused the features to be kept in memory until no new features were passed to any of the writers. Unfortunately, this default multi-writer behaviour of FME used too much of my PC's memory, causing the workspace to crash.

For this issue I initially tried to use FME's 'connection runtime order' to control the order in which features leave an outgoing connection junction. Although this is a nice option mid-translation, at the end of the translation (with the multiple writers) there is, to my knowledge, no option to write out features one writer at a time. So, as mentioned, at the end of the translation the number of features (i.e. the total volume of the data) became so large that it couldn't fit in the available system memory, and the workspace crashed.

For this case I ended up using a Recorder / FeatureWriter / FeatureReader setup.

First I used a Recorder to create a backup of the features in the form of an FFS file. Then I wrote out the features using a FeatureWriter. The benefit is that once the FeatureWriter has finished, it frees the memory it used. I then used the output port of this first FeatureWriter (in combination with a Sampler) to read the backed-up FFS file with a FeatureReader and send the features to the next FeatureWriter (and so on if needed; luckily not in my case). This workaround ensures that only a single copy of the large dataset is kept in memory during the translation.
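The same record-then-replay idea can be sketched in plain Python, as an analogy rather than FME code. It assumes features are JSON-serializable dicts and that each "writer" is a callable consuming an iterable of features; `spool`, `replay` and `write_sequentially` are hypothetical names, and the temporary JSON-lines file stands in for the FFS backup. The stream is recorded to disk once, then replayed to each writer in turn, so no two writers ever hold the data at the same time.

```python
import json
import os
import tempfile


def spool(features, path):
    """Record the feature stream to a JSON-lines file (the 'Recorder' role)."""
    count = 0
    with open(path, "w") as fh:
        for feature in features:
            fh.write(json.dumps(feature) + "\n")
            count += 1
    return count


def replay(path):
    """Re-read the recorded stream one feature at a time (the 'FeatureReader' role)."""
    with open(path) as fh:
        for line in fh:
            yield json.loads(line)


def write_sequentially(features, writers):
    """Feed the same features to each writer in turn instead of all at once.

    Each writer gets a fresh replay of the spooled file and finishes before
    the next one starts; the temporary file is removed afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        spool(features, path)
        return [write(replay(path)) for write in writers]
    finally:
        os.remove(path)
```

The trade-off is extra disk I/O (one write plus one read per writer) in exchange for memory use that no longer grows with the number of writers.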

+10

thijsknapen
Contributor
154 replies
4 years ago
January 7, 2021

Just noticed that this idea seems to be related to: https://community.safe.com/s/idea/0874Q000000Tl0vQAC/detail

+15

LizAtSafe
Safer
1505 replies
2 months ago
April 5, 2025

Open→Archived

+15

LizAtSafe
Safer
1505 replies
2 months ago
April 5, 2025

Using a FeatureWriter with Group By enabled allows for data to be ordered when using fanout.
