Solved

Reading UK Land Registry data

  • 23 September 2014
  • 8 replies
  • 14 views

Hey everyone,

I am trying to read some INSPIRE UK Land Registry data (.GML).

I have read this very helpful doc:

http://cdn.safe.com/training/course-materials/fmeuc/Complex-GML-manual.pdf

It says I need to load a local custom .XSD.

I have downloaded the FMEDATA2014 folder, but I can't seem to find the schema they use as the example.

Can anyone point me in the right direction to use the .XSD that they use in this example?

Cheers, thanks a lot for your help.

For your reference:

https://www.gov.uk/government/collections/download-inspire-index-polygons

Best answer by davideagle 24 September 2014, 12:01

Hi Jimbob - It's much easier than that. You'll note that the Land Registry data includes an incorrect hardcoded XSD path, clearly a mistake made when the data was published. Safe Software got wise to that, so in the INSPIRE reader make sure you turn on the 'Ignore Schema Location in Dataset' parameter. FME will then root around in the install directory and pull out the XSD for the Land Registry data that Safe now ship with FME.

When the data was hosted on the Land Registry's own site things weren't quite so easy, but since it's moved to data.gov.uk you can download the whole country and process it all with one FME workspace. That saves you A LOT of clicking, given the way the downloads have been structured!

Cheers, Dave
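
For anyone who wants to see the hardcoded path for themselves, here is a minimal Python sketch (outside FME) that lists the xsi:schemaLocation pairs declared on the root element of a GML document. The sample document, its namespace URN, and the XSD path in it are made up for illustration; they are not taken from the Land Registry data.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, which carries schemaLocation
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(gml_text):
    """Return {namespace: xsd_location} declared on the root element."""
    root = ET.fromstring(gml_text)
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = raw.split()
    # xsi:schemaLocation is a whitespace-separated list of
    # namespace / location pairs, so pair up alternate tokens
    return dict(zip(tokens[0::2], tokens[1::2]))


# Hypothetical sample: the URN and the path are invented for this sketch
SAMPLE = """<wfs:FeatureCollection
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:example:parcels C:/not/on/your/machine/parcels.xsd"/>
"""

if __name__ == "__main__":
    for namespace, location in schema_locations(SAMPLE).items():
        print(namespace, "->", location)
```

If the location reported for your own file points somewhere that does not exist on your machine, that would be consistent with the problem described above, which the 'Ignore Schema Location in Dataset' parameter is said to work around.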
@1spatialdave

Hi Dave,

You mentioned that, with the way the data is structured on the gov.uk site, it's possible to download and process it all in one workbench. Any pointers on this? I've tried a combination of a CSV reader and either a FeatureReader or an HTTPCaller, but I can't seem to get anywhere. Also, would it take some form of splitter and string concatenator to build the correct URL for each file?

Thanks,

Dónal


@1spatialdave thanks for making me aware of the INSPIRE GML Reader. I was trying to use the standard GML Reader and having problems trying to read the data from inside a ZIP file.

I've also tried to read the data direct from the Gov.UK website by using the URL in my Reader e.g. http://data.inspire.landregistry.gov.uk/Elmbridge.zip/Land_Registry_Cadastral_Parcels.gml

I've tried a backslash and a forward slash after the Zip file name but neither works. I get "HTTP/1.1 403 Forbidden" in the FME log. Is it possible to do this sort of thing? Are there some other Reader parameters I need to set?

If not, I can simply have a text file e.g. CSV containing the list of local authorities I want to download matching the Gov.UK Zip file names and use the HTTPCaller to download them. Then another Workspace can process the local copy of the files. A parent Workspace with WorkspaceRunners can bring it all together.

I discovered that if you expand the Parameters section of the Reader, there's an "INSPIRE Themes" box where you can choose from a whole number of themes. I picked "CadastralParcels (v4.0)" rather than "CadastralParcels (v3.0)" but I can't find anything that tells me which one I should use for the latest data (August 2017). Any ideas?

I've tried both with and without your tip about "Ignore Schema Location in Dataset" and just get this sort of error when processing a local file:

2017-08-14 18:07:33| 1.0| 0.0|ERROR |XML Parser error: 'unable to open primary document entity 'X:\\LR_polys\\Elmbridge.zip\\Land_Registry_Cadastral_Parcels.gml''
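
The CSV-driven download idea mentioned above can be sketched outside FME in a few lines of Python. This is only an illustration under assumptions: the base URL is the host quoted earlier in this post (which may since have moved or may refuse requests), the CSV is assumed to hold one local authority name per row in its first column, and "Guildford" in the usage example is a hypothetical entry. The names must match the zip file names exactly.

```python
import csv
import io
import urllib.request
from pathlib import Path

# Host copied from the URL quoted in the post; it may no longer be valid
BASE_URL = "http://data.inspire.landregistry.gov.uk"


def zip_urls(csv_file):
    """Yield one zip URL per non-empty name in the first CSV column."""
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            yield f"{BASE_URL}/{row[0].strip()}.zip"


def download_all(urls, target_dir):
    """Download each zip into target_dir and return the local paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last URL segment (e.g. Elmbridge.zip) as the file name
        dest = target / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, dest)
        paths.append(dest)
    return paths


if __name__ == "__main__":
    # Only prints the URLs; call download_all(...) to fetch them
    names = io.StringIO("Elmbridge\nGuildford\n")
    for url in zip_urls(names):
        print(url)
```

Keeping the URL building separate from the download step means the list can be checked before any request is made, which is roughly what a CSV reader feeding an HTTPCaller would do in a workspace.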


@tim_wood

 

Before the Land Registry forbade access to the XML listing page of their Amazon S3 bucket (I've currently got a ticket raised with them about this), I used a workbench to get all of the download links for each local authority. In my workflow I pass the URLs for the local authorities I need from a spreadsheet to a FeatureReader, which reads the GML files directly from the zip files, and I output the result to a file geodatabase. I use an AttributeCreator to create the local authority name that goes in the attribute table, and I expose the relevant attributes in the FeatureReader. I have it automated now and it's working quite well, though it took a bit of time to set up.
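
As a rough illustration of the attribute step in that workflow (not the actual workbench), the local authority name can be derived from the zip URL and attached to each record. The sketch below assumes URLs that end in <name>.zip, as in the example quoted earlier in the thread; the attribute name local_authority and the dict-based "features" are hypothetical stand-ins for what the AttributeCreator does in FME.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def authority_from_url(url):
    """Return the zip file name without its extension, e.g. 'Elmbridge'."""
    return PurePosixPath(unquote(urlparse(url).path)).stem


def tag_features(features, url):
    """Return copies of the feature dicts with a local_authority attribute."""
    name = authority_from_url(url)
    # 'local_authority' is a hypothetical attribute name for this sketch
    return [{**feature, "local_authority": name} for feature in features]


if __name__ == "__main__":
    url = "http://data.inspire.landregistry.gov.uk/Elmbridge.zip"
    print(tag_features([{"id": 1}, {"id": 2}], url))
```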

 


Hi @tim_wood, as @dunuts says you can't now just parse the Land Registry's file paths from their XML any more so I use the attached approach to get the file names required and pass them to a WorkspaceRunner or a FMEServerJobSubmitter that in turn hands off the zip files to a waiting slave Workspace that concurrently loads the data to my database.

Here you go to see how I do it.

readindexpolypaths.fmw


Thanks @1spatialdave but I can do that bit OK - I only need 11 districts so I can just store the names in a CSV. Once I've downloaded the Zip files, what I'm struggling with is using the INSPIRE GML Reader. Either I get zero schemas found and no Feature Types, or (if I use one of the CadastralParcels Themes), I get Feature Types but no data is read when I run the translation - I get errors in the log file like the one I posted previously.

 

 

Previously, I used the standard GML Reader and got a few Feature Types, from which I worked out that I needed the one called PREDEFINED. If I can't get the INSPIRE GML Reader to work, I'll go back to the standard one. But I feel the INSPIRE one ought to give me something better...

 


 

No Theme is required if By Theme is used, or you can also read By XSD. Either way it works; see attached.

 

readlr.fmw

 

 


 

 

Thanks again @1spatialdave

 

So it seems there's no difference between using the standard GML Reader and the INSPIRE one. Either way you specify the "PREDEFINED" Feature Type.

 

So I'll have one Workspace that downloads the Zip files, then another which loads the data. Now I've reversed out of the Themes cul-de-sac, loading is pretty easy - just point the GML Reader at "X:\\LR_polys\\*.zip\\Land_Registry_Cadastral_Parcels.gml"
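
For a quick sanity check of the downloaded files outside FME, the same "every zip, one GML member" pattern can be sketched in Python. This is only an assumption-laden illustration: it uses the member name from the path above, reads the GML straight from each zip without extracting it, and simply counts XML elements, since nothing here depends on the real element names in the data.

```python
import glob
import xml.etree.ElementTree as ET
import zipfile

# Member name taken from the reader path quoted in the thread
GML_MEMBER = "Land_Registry_Cadastral_Parcels.gml"


def count_elements_per_zip(pattern):
    """Map each matching zip path to the element count of its GML member."""
    counts = {}
    for zip_path in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(zip_path) as zf:
            # Open the member in place; no need to unpack the archive
            with zf.open(GML_MEMBER) as gml:
                total = 0
                for _event, elem in ET.iterparse(gml):
                    total += 1
                    # Drop each element's content once counted to limit memory
                    elem.clear()
                counts[zip_path] = total
    return counts


if __name__ == "__main__":
    # Hypothetical folder; adjust the pattern to where the zips were saved
    for path, total in count_elements_per_zip("LR_polys/*.zip").items():
        print(path, total)
```

A zip that lacks the expected member raises KeyError from zf.open, which is a useful early warning if a download was incomplete or a file was named differently.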

 
