
I use FME to extract geometry from thousands of CAD files and save the results in a single SHP file. The source files are listed in a CSV file, and they sit on servers in three different cities, all connected over a VPN. It takes a while for FME to read each CAD file, filter out the required layers and geometry, and then save them to a SHP file with attributes that point back to the original source CAD file. Last time I ran this, it took about 6 hours to complete, and we have added a lot of data since then. I am worried about what would happen if the VPN connection to a server goes down. Will FME time out and lose all the data that may have been written to the SHP file?

 

I am thinking that perhaps I should add a counter and create a new SHP file for every 200 or so source files. That way if something goes wrong, I can figure out where to re-start the process.

 

 

I'm not sure what will happen if your VPN connection goes down...

 

If you have doubt, it might indeed be safe to create output in stages. Depending on the type of writer, FME waits until all data is received, before it creates the output dataset. I'm not sure if this is also the case for a Shapefile. I think your idea to 'add a counter and create a new SHP file for every 200 or so source files' is a good one.

In that case I would make sure to create an 'incremental' Batch_id attribute, defined e.g. as '@Evaluate(@floor(@Value(_count)/200))'. If you then use a FeatureWriter, you can enable 'Group Processing', group by this 'Batch_id' attribute, and set Complete Groups to 'When Group Changes (Advanced)'. In that way, for each batch a dataset is created.
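The bucketing that '@Evaluate(@floor(@Value(_count)/200))' performs can be sketched in plain Python, just to make the arithmetic concrete (the batch size of 200 is simply the figure from this thread):

```python
BATCH_SIZE = 200  # source files per output Shapefile, as suggested above

def batch_id(count: int, batch_size: int = BATCH_SIZE) -> int:
    """Mimic @Evaluate(@floor(@Value(_count)/200)):
    files 0-199 -> batch 0, files 200-399 -> batch 1, and so on."""
    return count // batch_size
```

Every feature carrying the same Batch_id then lands in the same output dataset when you group by that attribute.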

 

That said, if your workspace takes that long to run, maybe it's a good idea to see if you can improve performance somehow. Maybe there are options to apply parallel processing? Easiest would probably be to use a WorkspaceRunner.
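As a rough illustration of what a WorkspaceRunner-style setup achieves, here is a hypothetical Python sketch that splits the source list into batches and processes them in parallel. `run_batch` is only a stand-in for launching an FME workspace on one batch of CAD files; the names and batch size are assumptions for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(batch_num, files):
    # Stand-in for running an FME workspace on one batch of CAD files;
    # a real setup would hand the batch's file list to the workspace here.
    return f"batch_{batch_num}.shp ({len(files)} files)"

def run_parallel(source_files, batch_size=200, workers=4):
    # Split the full source list into fixed-size batches...
    batches = [source_files[i:i + batch_size]
               for i in range(0, len(source_files), batch_size)]
    # ...and process several batches at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda args: run_batch(*args), enumerate(batches)))
```

A nice side effect is that each batch produces its own output, so a failed batch can be re-run on its own without touching the others.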


Thanks for the suggestions. I never knew about the WorkspaceRunner before and will try it next time. The grouping option means I don't have to split the output manually, which is what I'm doing now!

The process did crash on me when the FeatureReader came across a DWG file that was corrupt. FME seems to time out while trying to read the file, and even though I have the workspace set to continue processing on errors, it terminates. My current workaround is to remove that file from the input list once I discover which one it is, and then re-run the workspace.
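One way to catch obviously broken files before FME ever opens them is a quick header check: DWG files begin with a six-byte version signature such as 'AC1018' or 'AC1032'. A minimal Python sketch (the 'AC' prefix test is a coarse heuristic, not full validation, and won't catch every kind of corruption):

```python
def looks_like_dwg(path):
    """Coarse sanity check: valid DWG files start with an 'AC' version
    signature (e.g. b'AC1018', b'AC1032'). Truncated or garbage files
    usually fail this test, though a file can pass and still be corrupt."""
    try:
        with open(path, "rb") as f:
            return f.read(2) == b"AC"
    except OSError:
        return False
```

Filtering the CSV source list through a check like this could at least skip empty or truncated files instead of letting one of them kill the whole run.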
