
How would I alter a workspace to write files out as they are finished, rather than holding them all in memory / temp files until everything is processed?

The input is about 200 GB of MapInfo files, so this is taxing my system quite a bit. FME seems not to write anything until it has gone through all the files.

I have a FeatureReader that processes all the *.tab files in a folder and clips them where they intersect an input polygon. It then saves the files into an identical folder / name structure.

FME 2016 ESRI Edition, 16 GB RAM, i7-4790K

You could use a master/child approach. A master workspace with a "Directory and File Pathnames" reader calls a child workspace using a WorkspaceRunner. That child workspace is called once per feature, i.e. once for every .tab file, with that filename as a parameter. The child workspace has a MapInfo reader and does the actual work. That way you don't have to read all of the 200 GB into memory at the same time.
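For anyone who prefers to see the idea outside of Workbench, here is a rough Python sketch of the same "one child run per .tab file" pattern. It only builds the command lines; it does not run FME. It assumes the child workspace is started from the FME command line as `fme <workspace> --<PARAMETER> <value>`, and the parameter names `SOURCE_TAB`, `CLIP_DATASET` and `DEST_DIR` are made up for illustration. Your child workspace would need published parameters with whatever names you choose.

```python
from pathlib import Path


def build_child_commands(src_root, dst_root, clip_path,
                         workspace="child.fmw", fme_exe="fme"):
    """Return one FME command (as an argument list) per .tab file under src_root."""
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    commands = []
    for tab in sorted(src_root.rglob("*.tab")):
        # Mirror the input folder structure under the output root.
        out_dir = dst_root / tab.parent.relative_to(src_root)
        commands.append([
            fme_exe, workspace,
            "--SOURCE_TAB", str(tab),          # hypothetical published parameter
            "--CLIP_DATASET", str(clip_path),  # hypothetical published parameter
            "--DEST_DIR", str(out_dir),        # hypothetical published parameter
        ])
    return commands
```

Each command covers exactly one input file, so the child process only ever holds that one file's features and writes its output before the next one starts.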


I was going to suggest using a FeatureWriter, but upon testing I believe it exhibits the same behaviour -- in a dataset fanout case, we only have 1 actual writer going at a time. For now, the best option does seem to be @redgeographics above. Sorry. We'll keep working...


@redgeographics @daleatsafe

I've tried to get this working using a WorkspaceRunner and I can't quite make it work.

The inputs are a MapInfo reader, which is just a polygon with my region of interest, and, more importantly, the FeatureReader that contains the path and wildcard to parse through all the .tab files in the directory.

When I try to use a WorkspaceRunner with the path and directory reader, I think it's only using the variable (windows_path) for the input clip region. As soon as I run it, it says complete but doesn't seem to actually do anything.


Here's what I came up with, using the FME sample data which has a collection of shapefiles with contours of Vancouver, that I'm clipping using a neighborhood boundary. So different formats and a lot less data but in broad terms it's the same as what you're trying to do.

All of the reading happens in the child workspace; the master workspace is just there to start the child workspace once per input file. The important thing to keep in mind here is that I've set the WorkspaceRunner to wait for the current job (child process) to complete before starting a new one. You can try running multiple jobs simultaneously to save time, but you may run into memory problems that way.
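The "wait for the current job before starting the next" behaviour can be sketched in a few lines of Python, in case you end up driving the child workspace from a script instead of a WorkspaceRunner. This is only an illustration: `run_sequentially` is a hypothetical helper, and `commands` is assumed to be a list of argument lists, one per input file.

```python
import subprocess


def run_sequentially(commands, runner=subprocess.run):
    """Run each command to completion before starting the next.

    Returns the commands that exited with a non-zero return code.
    """
    failed = []
    for cmd in commands:
        # subprocess.run blocks until the child process exits, so only one
        # job is using memory at any time.
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Running the jobs one after another trades speed for a predictable memory footprint, which is the same trade-off as the wait setting described above.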

master.fmw

child.fmw

