You could use a master/child approach. A master workspace with a "Directory and File Pathnames" reader calls a child workspace using a WorkspaceRunner. That child workspace is called once per feature, i.e. once for every .tab file, with that filename as a parameter. The child workspace has a MapInfo reader and does the actual work. That way you don't have to read all 200 GB into memory at the same time.
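For anyone more comfortable with scripting, here is a minimal sketch of that same pattern outside FME, assuming the child workspace exposes a published parameter named SRC_TAB for the source file and that the fme command-line executable is on the PATH. The parameter name and both paths are placeholders, not anything from the thread.

```python
import glob
import os
import subprocess

TAB_DIR = r"C:\data\mapinfo"           # hypothetical source directory
CHILD_WORKSPACE = r"C:\fme\child.fmw"  # hypothetical child workspace

# Run the child workspace once per .tab file, passing the filename as a
# published parameter. Runs are serial, so only one file's worth of data
# is in memory at any time.
for tab_file in glob.glob(os.path.join(TAB_DIR, "*.tab")):
    subprocess.run(["fme", CHILD_WORKSPACE, "--SRC_TAB", tab_file],
                   check=True)
```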
I was going to suggest using a FeatureWriter, but upon testing I believe it exhibits the same behaviour: in a dataset fanout case, only one writer is actually running at a time. For now, the best option does seem to be the approach from @redgeographics above. Sorry. We'll keep working...
@redgeographics @daleatsafe
I've tried to get this working using a WorkspaceRunner and I can't quite seem to make it work.
The inputs are a MapInfo reader (just a polygon with my region of interest)
and, more importantly, the FeatureReader that contains the path and wildcard to parse through all the .tab files in the directory.
When I try to run the WorkspaceRunner with the path and directory reader, I think it's only picking up the variable (windows_path) for the input clip region. As soon as I run it, it says complete but doesn't seem to actually do anything.
Here's what I came up with, using the FME sample data, which has a collection of shapefiles with contours of Vancouver that I'm clipping using a neighborhood boundary. So: different formats and a lot less data, but in broad terms it's the same as what you're trying to do.
All of the reading happens in the child workspace; the master workspace is just there to start the child workspace once per input file. It's important to keep in mind that I've set the WorkspaceRunner to wait for the current job (child process) to complete before starting a new one. You can try running multiple jobs simultaneously to save time, but you may run into memory problems that way.
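To make that serial-versus-parallel trade-off concrete, here is a small sketch in the same scripted style as above, again assuming a hypothetical SRC_TAB published parameter and placeholder paths. Setting max_workers to 1 corresponds to "wait for the current job to complete"; raising it runs several child jobs at once, at the cost of higher peak memory use.

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

CHILD_WORKSPACE = r"C:\fme\child.fmw"  # hypothetical path, as before

def run_child(src_file):
    # Launch one child-workspace run per source file.
    subprocess.run(["fme", CHILD_WORKSPACE, "--SRC_TAB", src_file],
                   check=True)

files = glob.glob(r"C:\data\contours\*.shp")  # hypothetical sample data

# max_workers=1 mirrors the WorkspaceRunner "wait for job to complete"
# setting; try e.g. 4 to run jobs in parallel, memory permitting.
with ThreadPoolExecutor(max_workers=1) as pool:
    list(pool.map(run_child, files))  # list() forces completion and surfaces errors
```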
master.fmw
child.fmw