Question

Downloading Multiple ZIP files from URLs

8 years ago
May 3, 2017
17 replies
309 views

dunuts
18 replies

Hi all,

I've got a problem that I'm trying to solve. I need to download a large number (circa 50) of zip files from this website. They all follow the same format, for example:

http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip

In each zip file is a GML for that local authority area. I need to download each of them and merge them as one feature in a file geodatabase. I have already tried going through manually and saving the URLs to a CSV and then using the workbench below.

1. CSV Reader.

2. HTTP Caller with the following settings:

3. Feature Reader with the following settings:

3. Attribute Creator using the 'fme_dataset' as the new attribute called 'Local Authority'. This obviously creates a large file path in the following format:

I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml.

I would then like to use a StringSearcher to strip out everything except 'Gravesham' so that I only have the local authority name as an attribute.

I then use an ESRIReprojector to set the projection and finally a file geodatabase writer with the geometry as a polygon and the user attributes set to automatic.

This is the whole workbench (minus the StringSearcher, because I haven't worked on the regex yet).

When I try and run this I get the following error:

XML Parser error: 'Error in input dataset: file:///xxx/yyyy/zzzz/GIS/Data/Inspire/http_download_1493802040496_7056.html' line:1 column:103 message:unable to connect socket for URL

Along with the error, the files get read as far as the FeatureReader but all end up at the rejected port and when I inspect them there's no geometry present.

The questions I have are:

1. Is there something obvious that I'm doing wrong?

2. Is there any way to get a list of all the URLs together to be able to download?

3. Is there any source of help I can get for the regex on my point 3 above?

Thanks for any help anybody can give me.

dunuts
Author
18 replies
8 years ago
May 3, 2017

The file extension for the HTTPCaller should be .zip instead of .zp above.


itay
Supporter
1441 replies
8 years ago
May 3, 2017

Hi @dunuts,

I have downloaded the zip file you mention and read it correctly with the following settings in a GML reader:

This should also work in the FeatureReader for all the files downloaded.

Hope this helps.

david_r
8355 replies
8 years ago
May 3, 2017

The fact that you got an html file rather than a zip file back from the HTTPCaller makes it seem like your http request was badly formed, or that some other error occurred. You could set a breakpoint (also called inspection point in earlier versions of FME) just after the HTTPCaller and run the workspace. When it stops at the breakpoint, look at the file contents of the file given in _response_file_path.
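As a rough way to do the same check outside of FME, the sketch below looks at a saved response file and reports whether it is really a zip archive or something else, such as an HTML or XML error page. The function name `describe_download` is made up for illustration; only the Python standard library is assumed.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return 'zip' for a readable zip archive, otherwise a short preview of the file."""
    p = Path(path)
    if zipfile.is_zipfile(p):
        return "zip"
    # Not a zip: the first bytes usually reveal an HTML or XML error page.
    head = p.read_bytes()[:200]
    return "not a zip, starts with: " + head.decode("utf-8", errors="replace")
```

Pointing this at the file named in _response_file_path should show quickly whether the server returned an error document instead of the archive.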


itay
Supporter
1441 replies
8 years ago
May 3, 2017

Agreed, the URL http://data.inspire.landregistry.gov.uk/ is actually an S3 bucket, and accessing it via the HTTPCaller doesn't work as expected; it stops with the following error:

2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): HTTP/FTP transfer error: 'Couldn't connect to server'

2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): Please ensure that your network connection is properly set up

2017-05-03 13:53:11|  45.6|  0.0|ERROR |HTTPCaller(HTTPFactory): No proxy settings have been entered.  If you require a proxy to access external URLs, please ensure the appropriate information has been entered

Since I don't have any idea if the connection is set up properly, I would suggest trying the S3 transformers to get the data.

For the LocalAuthority I would use the AttributeSplitter on the path and grab the correct element and clean it up.

I:\UK\OutsideLondon\Land_Ownership\Out_East_1\Data\Inspire\Gravesham.zip\Land_Registry_Cadastral_Parcels.gml. > split

Gravesham.zip > clean (remove .zip) > result is Gravesham
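For anyone who prefers to see the split-and-clean idea, or the regex asked about in the question, written out, here is a minimal Python sketch. The function names are hypothetical, and whether the same pattern behaves identically inside a StringSearcher has not been checked here.

```python
import re
from pathlib import PureWindowsPath

# One path segment (no slashes) followed by ".zip"; the segment is captured.
ZIP_NAME = re.compile(r"([^\\/]+)\.zip", re.IGNORECASE)


def local_authority(dataset_path):
    """Regex approach: return the zip file's base name, or None if no .zip is present."""
    match = ZIP_NAME.search(dataset_path)
    return match.group(1) if match else None


def local_authority_by_split(dataset_path):
    """Split approach: find the path part ending in .zip and drop the extension."""
    for part in PureWindowsPath(dataset_path).parts:
        if part.lower().endswith(".zip"):
            return part[:-4]
    return None
```

Both functions return Gravesham for the example path above. The split version mirrors the AttributeSplitter suggestion, while the regex version is closer to what a StringSearcher would do.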

Hope this helps.

dunuts
Author
18 replies
8 years ago
May 3, 2017

This does help. Can you recommend any resource to get started with S3 downloader etc.? Not sure where to even begin. Thanks

david_r
8355 replies
8 years ago
May 3, 2017

dunuts wrote:

This does help. Can you recommend any resource to get started with S3 downloader etc.? Not sure where to even begin. Thanks

This is a good starting point:

https://knowledge.safe.com/articles/24146/s3objectlister-s3downloader-and-s3uploader-transfo.html

takashi
7718 replies
8 years ago
May 3, 2017

The XML document seems to be just a listing of the contents of an S3 bucket, and I think each item (*.zip) can be downloaded via a general HTTP GET request.

Actually I was able to download a zip file from this URL using the HTTPCaller.

http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip

Anyway, first of all, make sure that the URL read from the CSV table is the correct location of a zip file on the web.

takashi
7718 replies
8 years ago
May 3, 2017

In addition, this workflow extracts 350 Contents from this URL (XML).

URL: http://data.inspire.landregistry.gov.uk/
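The workflow itself was posted as a screenshot, so as a stand-in here is a minimal Python sketch of the same idea: read the bucket listing XML and collect every Key value. The sample XML below is invented for illustration and only assumes the usual S3 listing layout (Contents elements that each hold a Key); the real document should be checked before relying on it.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the usual S3 listing layout; not the real bucket contents.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>Abertawe_-_Swansea.zip</Key><Size>123</Size></Contents>
  <Contents><Key>Gravesham.zip</Key><Size>456</Size></Contents>
  <Contents><Key>index.html</Key><Size>7</Size></Contents>
</ListBucketResult>"""


def list_keys(listing_xml):
    """Return the text of every Key element, ignoring the XML namespace."""
    root = ET.fromstring(listing_xml)
    keys = []
    for element in root.iter():
        # Tags look like '{namespace}Key'; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Key" and element.text:
            keys.append(element.text)
    return keys
```

Matching on the local tag name avoids hard-coding the namespace, which is an assumption about the real listing.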

dunuts
Author
18 replies
8 years ago
May 3, 2017

@takashi Thank you. I used your workflow below and got the 350 zip files listed, which I wrote to a CSV. Do you know how I could connect the XMLFragmenter to a FeatureReader etc. in order to download the zip files? Thank you.


itay
Supporter
1441 replies
8 years ago
May 3, 2017

dunuts wrote:

That's easy: just concatenate the beginning of the URL (http://data.inspire.landregistry.gov.uk/) with the Key attribute to form the correct download link. This can all be done in the HTTPCaller's Request URL parameter.

Hope this helps.


itay
Supporter
1441 replies
8 years ago
May 3, 2017

takashi wrote:

The XML document seems to be just a list of contents within a S3 bucket, and I think each content (*.zip) can be downloaded via general HTTP GET request.

Actually I was able to download a zip file from this URL using the HTTPCaller.

http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip

Anyway, first of all, make sure that the URL read from the CSV table is the correct location of a zip file on web.

That was my initial idea, but I got the error mentioned above...

takashi
7718 replies
8 years ago
May 3, 2017

This screenshot illustrates the following procedure.

Request URL: http://data.inspire.landregistry.gov.uk/@Value(Key)

Note that there could be non-zip files among the contents, so you will have to filter them by checking the extension, for example.
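The concatenate-and-filter step can be sketched in Python as follows. The function name `zip_urls` is hypothetical, and percent-encoding the key with `quote` is an added precaution for keys containing spaces or other special characters, not something the thread confirms is needed.

```python
from urllib.parse import quote

BASE_URL = "http://data.inspire.landregistry.gov.uk/"


def zip_urls(keys, base_url=BASE_URL):
    """Build one download URL per key, skipping anything that is not a .zip."""
    urls = []
    for key in keys:
        if not key.lower().endswith(".zip"):
            continue  # e.g. index pages or other non-zip objects in the bucket
        urls.append(base_url + quote(key))
    return urls
```

In FME terms the filter would sit between the XMLFragmenter and the HTTPCaller, for instance as a Tester on the Key attribute.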

dunuts
Author
18 replies
8 years ago
May 4, 2017

@takashi, I've tried the exact workbench and settings you've described and I get the below error. Any ideas how I can fix this? Thank you.

takashi
7718 replies
8 years ago
May 4, 2017

dunuts wrote:

@takashi, I've tried the exact workbench and settings you've described and I get the below error. Any ideas how I can fix this? Thank you.

It looks like the HTTP requests for some zip files did not get the expected response from the server. The exact reason cannot be identified from this, but I'm wondering whether the zip files actually exist at those URLs. Check whether the zip file can be downloaded using a web browser.

dunuts
Author
18 replies
8 years ago
May 5, 2017

@takashi I have tried one or two of the URLs in the browser and they download the zip files without a problem. Would you have any idea if there's anything else I can try? Thanks

takashi
7718 replies
8 years ago
May 5, 2017

dunuts wrote:

@takashi I have tried one or two of the URLs in the browser and they download the zip files without a problem. Would you have any idea if there's anything else I can try? Thanks

Well, does the HTTPCaller download a zip file if you set a known URL as the Request URL? If it can't, there could be an issue with the network environment.
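One way to separate a network problem from an FME problem is to request a single known URL from a plain script on the same machine. The sketch below is only that, a sketch: `try_download` is a made-up helper built on Python's standard `urllib`, and a script may pick up different proxy settings than FME or the browser does, so the results are a hint rather than proof.

```python
import urllib.error
import urllib.request


def try_download(url, timeout=30):
    """Fetch one URL and return (ok, detail) so HTTP and connection errors can be told apart."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return False, f"HTTP error {err.code}"
    except OSError as err:
        # No usable answer at all: DNS, proxy, firewall, socket or timeout problems.
        return False, f"connection problem: {err}"
    # Zip archives normally start with the bytes 'PK'.
    looks_like_zip = body[:2] == b"PK"
    return True, f"{len(body)} bytes, looks like zip: {looks_like_zip}"
```

Calling `try_download("http://data.inspire.landregistry.gov.uk/Abertawe_-_Swansea.zip")` from the affected machine would show whether the failure is specific to the HTTPCaller or affects any client on that network.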

dunuts
Author
18 replies
8 years ago
May 10, 2017

takashi wrote:

Well, does the HTTPCaller download a zip file if you set a known URL as the Request URL? If it can't, there could be an issue with the network environment.

@takashi I can download a normal file but I'm still getting the error with this. Thanks for all your help.
