Question

Is batch deploy with attribute fanout possible?

  • 15 January 2016
  • 4 replies
  • 1 view

Badge

I am converting Ordnance Survey MasterMap in GML format (which comes in chunks) to .dwg format for use in AutoCAD, with the added twist of creating a separate .dwg file for every OS 1:1250 grid tile. To achieve this I am reading in a grid file and then using fanout by attribute (the OS grid, e.g. SE0007NE). I would like to produce a series of folders named after the OS chunk, with the corresponding .dwg files inside.

When I use batch deploy and select 'retain source dataset basename in destination path', a folder is created for the first record but it is empty, and all the .dwg files fall outside it. I have seen other posts on here suggesting using fme_basename for the fanout, but I'd like to retain my original fanout (OS grid). Is there a way of fanning out by basename directory and another attribute?

Any advice will be much appreciated. Thanks :)


4 replies

Badge +14

A few things to be aware of:

1) My preference for this kind of task is to use the WorkspaceRunner. It's simpler to use and will allow you to process up to 7 DWG files concurrently.

2) An initial load to a set of staging tables will allow your process to be more efficient, without the need for blocking transformers.

3) Fanout will also need to be handled at the dataset level when writing to DWG.

Consider the following approach (though the exact workflow depends on the volume of GZs you're processing):

  • Workspace 1: load all GZs for your region into a database. PostGIS, SQLite or even FME's FFS will do. Use this process to tag every feature with the 1:1250 tile it falls inside. Read the grid in as the first reader, then use the Clipper with 'Clippers First' set to cut each feature and assign the grid tile. Apply any necessary styling and any other values to the OSMM features.
  • Workspace 2: set up to read the tile names from your grid file and form, on a per-tile basis, a SQL statement or value that populates the child workspace parameters and is passed to the WorkspaceRunner. Each child workspace then reads in ONLY 1 tile's worth of data at a time and writes it to DWG, but you'll be able to run up to 7 at once.
  • Workspace 3: the child workspace, with the database reader and a DWG writer. Set the writer to write out in dynamic mode so all necessary layers are written to each DWG. Then set Fanout on the dataset and use the tilename attribute so that you get 1 DWG per tile.
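
As a rough sketch of the per-tile values Workspace 2 would hand to the WorkspaceRunner, the Python below turns a list of tile names into one WHERE clause per tile. In FME itself this would more likely be built with attribute transformers; the column name `tile_name` and the function name are assumptions made for illustration only.

```python
def tile_where_clauses(tile_names, column="tile_name"):
    """Return {tile: WHERE clause}, one entry per distinct tile name.

    `column` is a hypothetical name for the attribute that Workspace 1
    wrote when it tagged each feature with its 1:1250 tile.
    """
    clauses = {}
    for name in tile_names:
        tile = name.strip()
        # Skip blank rows and tiles that have already been seen.
        if not tile or tile in clauses:
            continue
        # Double any single quotes so the value is a safe SQL string literal.
        escaped = tile.replace("'", "''")
        clauses[tile] = f"{column} = '{escaped}'"
    return clauses


if __name__ == "__main__":
    for tile, where in tile_where_clauses(["SE0007NE", "SE0007NW"]).items():
        print(tile, "->", where)
```

Each clause would then be passed to one child workspace run, so that run reads a single tile's features.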

Please note there is both a GML Reader and an OS (GB) MasterMap Reader in FME. The former reads the GML in its raw state; the latter does some schema mapping behind the scenes and also deals with some of the geometry, for example it does some work on CartoText for you.

Badge

Many thanks for your advice @1spatialdave. I have managed to achieve what I wanted to with this. I ended up calling FME from a .bat file and specifying the --SourceDataset and --DestDataset for each OS GML chunk. The resultant .bat is long and probably a bit clumsy, but it gets the job done! I have taken your comments on board for future workspaces. Thanks!
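
For anyone who would rather generate a .bat like this than type it out, here is a minimal Python sketch. It builds one command line per chunk and names the destination folder after the chunk's basename. The workspace name, the paths and the function name are hypothetical, and the `--SourceDataset` / `--DestDataset` parameter names simply follow the post; the published parameter names in a real workspace may differ.

```python
from pathlib import PureWindowsPath


def build_commands(workspace, chunks, out_root, fme_exe="fme.exe"):
    """Return one FME command line per source chunk.

    Each chunk gets its own destination folder under `out_root`,
    named after the chunk file with any .gz / .gml suffix removed.
    """
    commands = []
    for chunk in chunks:
        source = PureWindowsPath(chunk)
        stem = source.name
        # Strip a trailing .gz first, then a trailing .gml, if present.
        for suffix in (".gz", ".gml"):
            if stem.lower().endswith(suffix):
                stem = stem[: -len(suffix)]
        dest = PureWindowsPath(out_root) / stem
        commands.append(
            f'{fme_exe} "{workspace}" '
            f'--SourceDataset "{source}" --DestDataset "{dest}"'
        )
    return commands


if __name__ == "__main__":
    # Hypothetical workspace and paths, for illustration only.
    for line in build_commands(
        "osmm_to_dwg.fmw",
        [r"C:\osmm\chunk_001.gml.gz", r"C:\osmm\chunk_002.gml.gz"],
        r"C:\out",
    ):
        print(line)
```

With the OS grid fanout kept inside the workspace, each run would then be expected to write its .dwg files into the folder for its own chunk.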

Badge +14

No problem, your approach is absolutely acceptable and a good way to manage memory use. One tweak to your current method that may increase your speed (hardware dependent) is to create several batch files and split the jobs across them. Then run them all at the same time.

In FME Desktop you can launch 8 concurrent fme.exe instances before you hit the license cap. So let's say you have a 6-core machine: you could comfortably build 4 batch files, split the jobs across them and run them all at once. You'll need to see how your RAM holds up, but in theory a process that took 8 hours could now take 2ish.
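
A small sketch of that split, assuming you already have the job command lines as a list of strings. The function names and the round-robin scheme are my own choices for illustration, not anything FME provides.

```python
def split_jobs(commands, n_batches):
    """Deal the commands round-robin into at most `n_batches` lists."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [commands[i::n_batches] for i in range(n_batches)]
    # Drop empty batches when there are fewer jobs than batch files.
    return [batch for batch in batches if batch]


def batch_text(commands):
    """Render one batch file's contents with Windows line endings."""
    return "\r\n".join(["@echo off", *commands]) + "\r\n"


if __name__ == "__main__":
    # Placeholder job strings; real entries would be full fme.exe calls.
    jobs = [f"job{i}" for i in range(10)]
    for number, batch in enumerate(split_jobs(jobs, 4), start=1):
        print(f"--- batch_{number}.bat ---")
        print(batch_text(batch))
```

Round-robin keeps the batches within one job of each other in length. If some chunks are much larger than others, sorting by file size before dealing them out may balance the run times better. When writing each batch's text to disk, opening the file with `newline=""` avoids doubling the line endings on Windows.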

Hope that helps.
