
I use FME to extract geometry from thousands of CAD files and save the results in a single SHP file. The source files are listed in a CSV file, and they sit on servers in three different cities, all connected over a VPN. It takes a while for FME to read each CAD file, filter out the required layers and geometry, and then save them to a SHP file with attributes that point back to the original source CAD file. Last time I ran this, it took about 6 hours to complete, and we have added a lot of data since then. I am worried about what would happen if the VPN connection to a server goes down. Will FME time out and lose all the data that may have been written to the SHP file?

 

I am thinking that perhaps I should add a counter and create a new SHP file for every 200 or so source files. That way if something goes wrong, I can figure out where to re-start the process.

 

 

I'm not sure what will happen if your VPN connection goes down...

 

If you have doubt, it might indeed be safe to create output in stages. Depending on the type of writer, FME waits until all data is received, before it creates the output dataset. I'm not sure if this is also the case for a Shapefile. I think your idea to 'add a counter and create a new SHP file for every 200 or so source files' is a good one.

In that case I would make sure to create an 'incremental' Batch_id attribute, defined e.g. as '@Evaluate(@floor(@Value(_count)/200))'. If you then use a FeatureWriter, you can enable 'Group Processing', group by this 'Batch_id' attribute, and set Complete Groups to 'When Group Changes (Advanced)'. In that way, for each batch a dataset is created.
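The bucketing that '@Evaluate(@floor(@Value(_count)/200))' performs can be sketched in plain Python, just to make the arithmetic concrete (the batch size of 200 is simply the figure from this thread):

```python
BATCH_SIZE = 200  # source files per output Shapefile, as suggested above

def batch_id(count: int, batch_size: int = BATCH_SIZE) -> int:
    """Mimic @Evaluate(@floor(@Value(_count)/200)):
    files 0-199 -> batch 0, files 200-399 -> batch 1, and so on."""
    return count // batch_size
```

Every feature carrying the same Batch_id then lands in the same output dataset when you group by that attribute.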

 

That said, if your workspace takes that long to run, maybe it's a good idea to see if you can improve performance somehow. Maybe there are options to apply parallel processing? Easiest would probably be to use a WorkspaceRunner.
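As a rough illustration of what a WorkspaceRunner-style setup achieves, here is a hypothetical Python sketch that splits the source list into batches and processes them in parallel. `run_batch` is only a stand-in for launching an FME workspace on one batch of CAD files; the names and batch size are assumptions for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(batch_num, files):
    # Stand-in for running an FME workspace on one batch of CAD files;
    # a real setup would hand the batch's file list to the workspace here.
    return f"batch_{batch_num}.shp ({len(files)} files)"

def run_parallel(source_files, batch_size=200, workers=4):
    # Split the full source list into fixed-size batches...
    batches = [source_files[i:i + batch_size]
               for i in range(0, len(source_files), batch_size)]
    # ...and process several batches at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda args: run_batch(*args), enumerate(batches)))
```

A nice side effect is that each batch produces its own output, so a failed batch can be re-run on its own without touching the others.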


Thanks for the suggestions. I never knew about the WorkspaceRunner before and will try it next time. The grouping option means I don't have to split the output manually, which is what I'm doing now!

The process did crash on me when the FeatureReader came across a DWG file that was corrupt. FME seems to time out while trying to read the file, and even though I have the workspace set to continue processing on errors, it terminates. My current workaround is to remove that file from the input list once I discover which one it is, and then re-run the workspace.
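One way to catch obviously broken files before FME ever opens them is a quick header check: DWG files begin with a six-byte version signature such as 'AC1018' or 'AC1032'. A minimal Python sketch (the 'AC' prefix test is a coarse heuristic, not full validation, and won't catch every kind of corruption):

```python
def looks_like_dwg(path):
    """Coarse sanity check: valid DWG files start with an 'AC' version
    signature (e.g. b'AC1018', b'AC1032'). Truncated or garbage files
    usually fail this test, though a file can pass and still be corrupt."""
    try:
        with open(path, "rb") as f:
            return f.read(2) == b"AC"
    except OSError:
        return False
```

Filtering the CSV source list through a check like this could at least skip empty or truncated files instead of letting one of them kill the whole run.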
