Question

Downloaded zipped GML files to file geodatabase


Hi all,

I am trying to create a workbench that will download 52 individual URLs and write them as 52 individual feature classes in a file geodatabase, and also as 52 individual DWGs. Both outputs should use part of the original URL as the file name and, for the file geodatabase, also as an attribute on the feature class. I hope to run this monthly as a batch file.

I have a list of 52 URLs that are stored in a spreadsheet (example of a few attached). Each URL is a download link for a .zip file that contains an individual GML file. I can pass the URLs successfully to my HTTPCaller which saves the response body to file.

An example of an original URL is:

http://data.inspire.landregistry.gov.uk/Kensington_and_Chelsea.zip

Whereas the response file that is saved is called

'http_download_1497369123968_8888.zip'

I need a way to carry the actual URL name over from the spreadsheet so that it can be an attribute of my output, but I'm not sure how to do this.

Below is a screenshot of my workbench as it is currently, when I try to run it the three problems I get are:

  1. An error stating “FileGDB Writer: A feature with feature type `PREDEFINED' could not be written”.
  2. I don’t know how to relate the response file path to the URL name so that I know which local authority each file refers to.
  3. I don’t know how to write each as an individual feature class. I haven’t even looked at the dwg part yet.

Any help anybody can give me would be greatly appreciated, thanks.


14 replies


Hi @dunuts

That's an interesting task, and I thought at first that you were really close.

The FeatureReader says for attributes "Only Use Result". If you want to keep Actual_URL you need to make that "Merge Initiator and Result". Then you'll get that attribute. So that's one problem out of the way.

Because this is a dynamic translation you'll need to tell the writer where to get the schema from. I thought you could just use the attribute for that, but apparently you can't. I think we'll need a schema reader somewhere to get that information, and a FeatureReader can't pass it on to the dynamic writer.

OK, this is getting more complicated. Give me a short while and I'll look into how best to do this. It might be that we need a two-workspace solution here.

Mark


OK. This is complicated and there are a number of questions and issues.

I can't get this to be truly dynamic. Truly dynamic would mean that we download the GML file, read its schema with the schema reader, and apply that schema to the output (dynamic) dataset; simply using two FeatureReader transformers.

However, I can't get that to work. I think one problem is that FME won't read the schema from a GML dataset if it's in a zip file (I've filed that problem with our developers as PR#78109).

Additionally, do you want the output to all go into a table called PREDEFINED? I can get it to go into such a table, with an attribute that defines the authority the data comes from, but I can't get it to write to a separate table for each authority because the source layer/table does not have that name (so the dynamic writer can't find a schema to match).

Anyway, I got it to work as a two-workspace solution. The first downloads the data, then passes the filename on to the second to process it. It's the only way I could make it dynamic. I don't know if you can get it to work in a single workspace because to set the output schema requires a dataset path that isn't supplied by a FeatureReader.

So, workspaces are attached: authorityprocessing.zip.

Notice the published parameter in the child workspace to pass the authority name. The FilenameExtractor transformer is something I use to decode the URL and extract the authority name from it. It seems to work quite well!

Also notice that - because it runs the same workspace multiple times - you can't use the Overwrite Existing Geodatabase setting. Or, at least, you can't set it to yes. So you need to make sure the Geodatabase is deleted before you run the process, or you could add a Python startup script to the first workspace to delete it.
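The startup-script idea mentioned above could look roughly like the following. This is only a minimal sketch: the helper name `remove_geodatabase` is hypothetical, and it assumes the output is a file geodatabase, which is stored on disk as a folder ending in `.gdb`.

```python
import shutil
from pathlib import Path


def remove_geodatabase(gdb_path):
    """Delete a file geodatabase folder if it exists.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    gdb = Path(gdb_path)
    # Guard against deleting anything that is not a .gdb folder.
    if gdb.suffix.lower() != ".gdb":
        raise ValueError(f"Refusing to delete non-geodatabase path: {gdb}")
    if gdb.is_dir():
        shutil.rmtree(gdb)
        return True
    return False
```

In a startup script you would call it once with the writer's real dataset path, for example `remove_geodatabase(r"C:\temp\inspire_output.gdb")` (a made-up location).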

The WorkspaceRunner I have set to wait until one process is done before the next starts. Geodatabase being a database, you might be able to have multiple connections all writing simultaneously, but I didn't want to chance it.
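For anyone who would rather drive the child workspace from a script than from a WorkspaceRunner, the same "one at a time" behaviour can be sketched in Python. This assumes the FME command line is on the PATH as `fme` and that published parameters are passed as `--NAME value`; the parameter name used in the usage note is hypothetical and not taken from the attached workspaces.

```python
import subprocess


def build_fme_command(workspace, parameters, fme_exe="fme"):
    """Return the argument list for one run: fme <workspace> --NAME value ..."""
    command = [fme_exe, workspace]
    for name, value in parameters.items():
        command += [f"--{name}", str(value)]
    return command


def run_sequentially(workspace, jobs, fme_exe="fme"):
    """Run the workspace once per job and collect the return codes."""
    return_codes = []
    for parameters in jobs:
        # subprocess.run blocks until the process exits, so runs never overlap.
        result = subprocess.run(build_fme_command(workspace, parameters, fme_exe))
        return_codes.append(result.returncode)
    return return_codes
```

A call such as `run_sequentially("child.fmw", [{"AUTHORITY": "Islington"}])` would start the next run only after the previous one has finished.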

Hope this helps! You might need to do some minor touch-ups to the output (e.g. the authority field is set to only 20 characters), but this should be a good start.

Mark


Hi @dunuts, it seems that all the GML datasets only have a single feature type called PREDEFINED, and its schema is identical and fixed. I suppose that you intend to read the PREDEFINED features from the 52 URLs (zip files) and write them into each feature class with the same name as the zip file root name (e.g. "Islington", "Kensington_and_Chelsea", etc.).

If I understand your requirement correctly, you can write the features into your desired destination feature classes using the Feature Type Fanout mechanism, e.g.:

0684Q00000ArLDDQA3.png

If you set this expression as the destination writer feature type name (Feature Class name), each feature will be written into a feature class whose name is the source zip file name without the extension.

@ReplaceRegEx(@Value(Actual_URL),^.+/(.+?)\.zip$,\1)

Result from the 4 sample URLs.

0684Q00000ArLVtQAN.png
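To check what that expression extracts outside of FME, the same pattern can be tried with Python's `re` module. This is just an illustrative equivalent, not part of the workspace; the function name `zip_rootname` is made up for the example.

```python
import re

# Same pattern as in the @ReplaceRegEx expression: keep only the part between
# the last "/" and the ".zip" extension.
ROOTNAME_PATTERN = re.compile(r"^.+/(.+?)\.zip$")


def zip_rootname(url):
    """Return the zip file name without path or extension.

    A string that does not match the pattern is returned unchanged.
    """
    return ROOTNAME_PATTERN.sub(r"\1", url)


print(zip_rootname("http://data.inspire.landregistry.gov.uk/Kensington_and_Chelsea.zip"))
# prints: Kensington_and_Chelsea
```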


You can also configure a dynamic writer feature type if necessary.

0684Q00000ArMoBQAV.png

Dynamic Writer Feature Type Configuration

  • Feature Class: @ReplaceRegEx(@Value(Actual_URL),^.+/(.+?)\.zip$,\1)
  • Geometry: First Feature Defines Geometry Type
  • Schema Source: "Schema From Schema Feature"
  • Schema Definition Name: fme_feature_type

 

Hi @Mark2AtSafe, regarding PR#78109, it seems that the FeatureReader can read schema features if the zip file has been downloaded by the preceding HTTPCaller and saved into a temporary path created by the TempPathnameCreator. However, if you set the URL directly as the Dataset parameter of the FeatureReader, the FeatureReader can download the zip file and read data features successfully, but won't read schema features.

 

Thank you very much. Do you know if there is a way, in the same workspace, to convert the feature classes in the file geodatabase to separate AutoCAD DWGs? I do not need to keep any of the attributes, just the geometry.

 

 

Thank you Mark for your work, much appreciated. For some reason (perhaps because I'm running FME 2016) the parent runs but the child workspace will not. The log file just says that a fatal error occurred and doesn't give any further detail (that I can see). Should the log file contain more detailed information? Thanks

 

 

Do you need to create 52 DWG files, one for each source URL (zipped GML file)? What file names and layer names do you want?

 

Yes, I need 52 individual DWG files, one per source. The file name and layer name (if possible) should contain the local authority name, following the same naming convention as the feature classes you helped with earlier. Any help would be great, thank you very much.

 

 

Branch the output from the FeatureTypeFilter and connect it to a DWG writer feature type. To separate the destination DWG files (datasets) according to the source zip file name, set the Fanout Expression in the writer to configure Dataset Fanout. The destination layer (feature type) name can be set with Feature Type Fanout. To learn more about Dataset Fanout and Feature Type Fanout, see: FME Workbench | Separating Output Data with Fanout

In this workflow, the zip file name without extension is saved as an attribute called "_rootname" by the AttributeCreator, and it is used in the GDB / DWG writer feature types and in the DWG Dataset Fanout Expression.
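Conceptually, Dataset Fanout groups features by an attribute value and writes each group to its own file. The sketch below only mimics that grouping in plain Python to show the idea; it is not how FME implements it. The features are represented as simple dictionaries and the output folder is a made-up example.

```python
from collections import defaultdict


def dataset_fanout(features, output_folder, attribute="_rootname"):
    """Group features by an attribute so each distinct value maps to one DWG path."""
    groups = defaultdict(list)
    for feature in features:
        # One destination file per distinct attribute value.
        path = f"{output_folder}/{feature[attribute]}.dwg"
        groups[path].append(feature)
    return dict(groups)


features = [
    {"_rootname": "Islington", "id": 1},
    {"_rootname": "Kensington_and_Chelsea", "id": 2},
    {"_rootname": "Islington", "id": 3},
]
for path, group in dataset_fanout(features, "Inspire_CAD").items():
    print(path, len(group))
```

With the three sample features above, two destination paths are produced, one per authority name.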

 

 

@takashi

 

Thank you very much, that worked perfectly. Last question: if I run this as a batch file every so often, what do I need to do to get it to overwrite the previously created DWGs and file geodatabase? Thanks

DWG files are always overwritten when you write features to files with the same name. For the GDB dataset, you can enable overwriting in the GDB writer through the Overwrite Existing Geodatabase parameter shown in the Navigator.

 

 

@takashi It all worked fine and was writing the CAD files until I got the following error:

 

 

AutoCAD Writer: An error occurred for dataset 'I:\\M drive copy\\Drawings\\LR INSPIRE index polygons\\Automate\\Inspire_CAD\\Lewisham.dwg', in function RealDWGWriter::saveDatabaseAs: 'eFileSharingViolation'

 

AutoCAD Writer: Failed to open file file path 'I:\\M drive copy\\Drawings\\LR INSPIRE index polygons\\Automate\\Inspire_CAD\\Lewisham.dwg'. It is possible that the file is open or in use by AutoCAD or another program. Please terminate any external program access to the file and try again

 

A fatal error has occurred. Check the logfile above for details

 

... Last line repeated 2 times ...

 

 

I'm going to try it again and hope it works. It worked for the previous 31 DWGs and wrote them perfectly.

 

 

I suspect another application has the DWG file open exclusively. First, make sure that you have not opened the file in any application.
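If this keeps happening in a scheduled run, a small pre-flight check before the translation might help spot a file that is still held open. The following is only a sketch: it assumes that on Windows a file opened exclusively by another program raises `PermissionError` when opened for writing, which is usually but not always the case, and a read-only file would be reported as locked too. The function name `is_locked` is hypothetical.

```python
import os


def is_locked(path):
    """Return True if an existing file cannot be opened for writing."""
    if not os.path.exists(path):
        return False  # nothing to overwrite, so nothing can be locked
    try:
        # Append mode opens the file for writing without changing its contents.
        with open(path, "ab"):
            return False
    except PermissionError:
        return True
```

A batch run could call this for each expected DWG path and skip or report any file for which it returns True.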

 

 
