
Hi all,

I've got a problem that I'm trying to solve. I need to download a large number (circa 50) of zip files from this website. They all follow the same format, for example:

http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip

Each zip file contains a GML file for that local authority area. I need to download each of them and merge them into one feature class in a file geodatabase. I have already tried going through them manually, saving the URLs to a CSV and then using the workbench below.

1. CSV Reader.

2. HTTPCaller with the following settings:

3. FeatureReader with the following settings:

4. AttributeCreator using 'fme_dataset' as the value of a new attribute called 'Local Authority'. This creates a long file path in the following format:

I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml

I would then like to use a StringSearcher to strip out everything except 'Gravesham' so that I only have the local authority name as an attribute.
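For the regex, one option is to match the path component sitting directly in front of `.zip`. The pattern below is a sketch I've only checked with Python's `re` module; it uses a lookahead, which most regex engines support, but test it in the StringSearcher before relying on it. Because the `.zip` part is inside the lookahead, the whole match is just the name, so no capture group is needed. The function name is made up for illustration.

```python
import re
from typing import Optional

# [^\\/]+ matches one or more characters that are neither "\" nor "/",
# so the match cannot run across a folder boundary. The lookahead (?=...)
# requires ".zip" followed by a separator or the end of the string,
# without including it in the match.
PATTERN = r"[^\\/]+(?=\.zip(?:[\\/]|$))"
ZIP_NAME = re.compile(PATTERN, re.IGNORECASE)


def local_authority(fme_dataset: str) -> Optional[str]:
    """Return the zip file's base name from a dataset path, or None if there is none."""
    match = ZIP_NAME.search(fme_dataset)
    return match.group(0) if match else None


if __name__ == "__main__":
    path = (r"I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire"
            r"\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml")
    print(local_authority(path))  # Gravesham
```

The text to paste into the StringSearcher would be the raw pattern itself: `[^\\/]+(?=\.zip(?:[\\/]|$))`.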

I then use an EsriReprojector to set the projection and finally a File Geodatabase writer with the geometry set to polygon and the user attributes set to automatic.

This is the whole workbench (minus the StringSearcher, because I haven't worked on the regex yet):

When I try to run this I get the following error:

XML Parser error: 'Error in input dataset: file:///xxx/yyyy/zzzz/GIS/Data/Inspire/http_download_1493802040496_7056.html' line:1 column:103 message:unable to connect socket for URL

Along with the error, the features get as far as the FeatureReader, but they all end up at the Rejected port, and when I inspect them there's no geometry present.

The questions I have are:

1. Is there something obvious that I'm doing wrong?

2. Is there any way to get a list of all the URLs together to be able to download?

3. Is there any source of help I can get for the StringSearcher regex described above?

Thanks for any help anybody can give me.

The file extension for the HTTPCaller should be .zip instead of .zp above.


Hi @dunuts,

I have downloaded the zip file you mention and read it correctly with the following settings in a GML reader:

This should also work in the FeatureReader for all the downloaded files.

Hope this helps.


The fact that you got an HTML file rather than a zip file back from the HTTPCaller suggests that your HTTP request was badly formed, or that some other error occurred. You could set a breakpoint (called an inspection point in earlier versions of FME) just after the HTTPCaller and run the workspace. When it stops at the breakpoint, look at the contents of the file given in _response_file_path.
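If you'd rather check the saved files outside FME, a small script can tell you whether each one is really a zip or an HTML/XML page such as an error document. This is a minimal sketch that only inspects the first bytes: a non-empty zip archive starts with the signature `PK\x03\x04`. The function names are hypothetical.

```python
import sys

# Every non-empty zip archive begins with this local-file-header signature.
ZIP_SIGNATURE = b"PK\x03\x04"


def classify_response(head: bytes) -> str:
    """Guess what the HTTPCaller saved, from the first bytes of the file."""
    if head.startswith(ZIP_SIGNATURE):
        return "zip"
    text = head.lstrip().lower()
    # Error pages (HTML) and S3 error responses (XML) are text, not zip data.
    if text.startswith((b"<!doctype html", b"<html", b"<?xml")):
        return "html/xml"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 512 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_response(f.read(512))


if __name__ == "__main__":
    # Usage: python check_response.py <path from _response_file_path> ...
    for p in sys.argv[1:]:
        print(p, classify_file(p))
```

A result of `html/xml` for a file that should be a zip means it's worth opening it in a text editor to read the server's message.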


Agreed. The URL http://data.inspire.landregistry.gov.uk/ is actually an S3 bucket, and accessing it via the HTTPCaller doesn't work as expected; it stops with the following error:

2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): HTTP/FTP transfer error: 'Couldn't connect to server'
2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): Please ensure that your network connection is properly set up
2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): No proxy settings have been entered.  If you require a proxy to access external URLs, please ensure the appropriate information has been entered

Since I don't know whether your connection is set up properly, I would suggest trying the S3 transformers to get the data.

For the Local Authority attribute, I would use the AttributeSplitter on the path, grab the correct element and clean it up:

I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml > split

Gravesham.zip > clean (remove .zip) > result is Gravesham
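In plain Python, the split-and-clean logic might look like the sketch below (the function name is made up). Rather than relying on a fixed element index, it looks for the element ending in `.zip`, which keeps working if the folder depth changes. In FME itself you would pick the element from the AttributeSplitter's list output and strip `.zip`, for example with a StringReplacer.

```python
import re


def local_authority_by_split(path: str) -> str:
    """Split the path on backslash or slash, find the .zip part, drop the extension."""
    parts = re.split(r"[\\/]", path)
    for part in parts:
        if part.lower().endswith(".zip"):
            return part[: -len(".zip")]
    raise ValueError(f"no .zip component in {path!r}")


if __name__ == "__main__":
    path = (r"I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire"
            r"\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml")
    print(local_authority_by_split(path))  # Gravesham
```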

Hope this helps.



This does help. Can you recommend any resource to get started with the S3Downloader etc.? I'm not sure where to even begin. Thanks

 



This is a good starting point:

 

https://knowledge.safe.com/articles/24146/s3objectlister-s3downloader-and-s3uploader-transfo.html


The XML document seems to be just a list of the contents of an S3 bucket, and I think each item (*.zip) can be downloaded with an ordinary HTTP GET request.

In fact, I was able to download a zip file from this URL using the HTTPCaller:

http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip

Anyway, first of all make sure that the URL read from the CSV table is the correct location of a zip file on the web.

 



In addition, this workflow extracts the 350 Contents elements from the XML returned by this URL:

 

URL: http://data.inspire.landregistry.gov.uk/
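The XMLFragmenter step can be sketched in Python with the standard library, assuming the URL returns a standard S3 `ListBucketResult` document (the thread is from 2017, so the endpoint may no longer behave this way). The function names are hypothetical. S3 returns at most 1,000 keys per listing, so 350 fits in one response; a larger bucket would report `<IsTruncated>true</IsTruncated>` and need further requests.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by Amazon S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
# URL taken from this thread; it may have changed since.
LISTING_URL = "http://data.inspire.landregistry.gov.uk/"


def keys_from_listing(xml_data: bytes) -> List[str]:
    """Return the text of every Contents/Key element in a bucket listing."""
    root = ET.fromstring(xml_data)
    return [key.text for key in root.findall("s3:Contents/s3:Key", S3_NS) if key.text]


def fetch_listing(url: str = LISTING_URL) -> bytes:
    """Download the raw listing XML."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()
```

Usage: `keys = keys_from_listing(fetch_listing())` gives one string per Contents element, the same values the Key attribute holds in FME.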

 



@takashi Thank you. I used your workflow below and got the 350 zip file names, which I wrote to a CSV. Do you know how I could connect the XMLFragmenter to a FeatureReader etc. in order to download the zip files? Thank you.

[Screenshot]



That's easy: just concatenate the beginning of the URL (http://data.inspire.landregistry.gov.uk/) with the Key attribute to form the correct download link. This can all be done in the HTTPCaller's Request URL parameter.

 

Hope this helps.

 

 



Downloading each zip with a plain HTTP GET was my initial idea, but I got the error mentioned above...

 

 



This screenshot illustrates the following procedure:

 

Request URL: http://data.inspire.landregistry.gov.uk/@Value(Key)

 

Note that there could be non-zip files among the contents, so you will have to filter them out, for example by checking the extension.

 

[Screenshot]
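For comparison, the same procedure outside FME could look like the Python sketch below: prefix each Key with the bucket URL (the equivalent of `http://data.inspire.landregistry.gov.uk/@Value(Key)`), keep only `.zip` keys, then download each one. The percent-encoding with `urllib.parse.quote()`, the function names and the target folder are my own assumptions, not part of the workspace above.

```python
import os
import urllib.parse
import urllib.request
from typing import Iterable, List

BASE_URL = "http://data.inspire.landregistry.gov.uk/"


def zip_urls(keys: Iterable[str], base_url: str = BASE_URL) -> List[str]:
    """Base URL + Key for every key ending in .zip (case-insensitive)."""
    # quote() percent-encodes characters such as spaces but leaves "/" alone.
    return [base_url + urllib.parse.quote(key) for key in keys
            if key.lower().endswith(".zip")]


def download_all(urls: Iterable[str], folder: str) -> None:
    """Save each URL into folder, reporting failures instead of stopping."""
    os.makedirs(folder, exist_ok=True)
    for url in urls:
        name = urllib.parse.unquote(url.rsplit("/", 1)[-1])
        target = os.path.join(folder, name)
        try:
            urllib.request.urlretrieve(url, target)
            print("saved", target)
        except OSError as err:
            # URLError and HTTPError are OSError subclasses: log and carry on.
            print("failed", url, err)
```

Usage, with a hypothetical folder: `download_all(zip_urls(keys), r"C:\temp\inspire")`. Continuing past failures makes it easy to see which individual files the server refuses.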


@takashi, @ I've tried the exact workbench and settings you've described and I get the error below. Any ideas how I can fix this? Thank you.

[Screenshot]



It looks like the HTTP requests for some zip files did not get the expected response from the server. I can't identify the exact reason, but I wonder whether those zip files actually exist at the URLs used. Check whether the zip file can be downloaded using a web browser.
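To tell the possible causes apart, it helps to see whether the server answers at all. A minimal sketch with Python's standard library: an HTTP error status (such as 404) means the server was reached but the URL is wrong, while a connection-level failure points to the network, proxy or firewall, which matches the "Couldn't connect to server" message quoted earlier in the thread. It sends a HEAD request (headers only); if a server rejects HEAD, change the method to GET. The `opener` parameter is only there so the function can be exercised without a network; the name `diagnose` is made up.

```python
import urllib.error
import urllib.request
from typing import Callable


def diagnose(url: str, opener: Callable = urllib.request.urlopen,
             timeout: float = 30) -> str:
    """Report whether a URL is reachable and what the server returned."""
    request = urllib.request.Request(url, method="HEAD")  # headers only, no body
    try:
        with opener(request, timeout=timeout) as resp:
            return f"ok: HTTP {resp.status}, Content-Type {resp.headers.get('Content-Type')}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 403 or 404).
        return f"server reached, HTTP {err.code}: check the URL"
    except OSError as err:
        # URLError (an OSError subclass) or a timeout: no usable HTTP answer,
        # which points to DNS, proxy or firewall problems.
        return f"connection problem: {getattr(err, 'reason', err)}"
```

Running `print(diagnose("http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip"))` on the same machine as FME shows which of the two situations you are in.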

 

 


@takashi I have tried one or two of the URLs in the browser and they download the zip files without a problem. Would you have any idea if there's anything else I can try? Thanks



Well, does the HTTPCaller download a zip file if you set a known URL as the Request URL? If it can't, there could be an issue with the network environment.

 

 


@takashi I can download a normal file, but I'm still getting the error with this. Thanks for all your help.
