Question

Download multiple files using HTTPCaller from URLs stored in CSV

  • 17 November 2015
  • 9 replies
  • 240 views

Hello!

 

I'm a first-time poster and a newbie FME user, hoping someone might have some ideas to help me solve what is a fairly simple task but an incredibly time-intensive one without coding or FME.

 

 

I need to download and then process just over 150 shapefiles that are posted on an open website. The state of Massachusetts posts building footprints on their website for download. I need to download each file and combine them into one large geodatabase. I have some additional processing and attribute changes I need to do in the middle but have a good handle on that stuff.

 

 

What I'm thinking is:

 

- Store the URL strings for each of the file downloads in a CSV file

 

- Somehow leverage the HTTPCaller to iterate through each of the URLs in the CSV and pull the shapefiles into my workspace.

 

- From there feed each file into a shapefile reader and then move forward with my processing and writing to geodatabase.

 

 

Here is the website where the data are housed - http://goo.gl/Jzo0QK

 

 

Any thoughts on how to pull each file and combine it into one geodatabase would be greatly appreciated.

 

 

Thanks,

 

~Ben

 

 

 


9 replies

That should be relatively easy.

 

 

Read in the CSV, then send each feature to the HTTPCaller (save the response body to a file, with a new file per feature).

Use a FeatureReader to read in the downloaded shapefiles, then process as desired.

 

 

You will probably need a few transformers to parse your downloaded file name to send to the feature reader as an attribute.
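For readers who want to see the same read-then-download loop outside FME, here is a minimal Python sketch. It is only an illustration of the idea, not what the HTTPCaller does internally. The column name `url`, the helper names `read_urls` and `download_all`, and the `download_NNN.zip` naming scheme are all assumptions made for this example.

```python
import csv
import io
import urllib.request
from pathlib import Path


def read_urls(csv_text, column="url"):
    """Return the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        # A short row can leave the column as None, so guard before stripping.
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


def download_all(urls, out_dir):
    """Download each URL into out_dir, one file per URL (needs network access)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        # Hypothetical naming scheme; the explicit .zip extension matters later.
        target = out / f"download_{i:03d}.zip"
        with urllib.request.urlopen(url) as resp:
            target.write_bytes(resp.read())
        saved.append(target)
    return saved
```

`read_urls` works on text so it can be tried without touching the network; `download_all` is the part that corresponds to one HTTP request per CSV row.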
Thank you @jdh

 

That is incredibly straightforward! I am running my first-attempt workspace right now to see what I get. I really appreciate this forum and the chance to get answers from folks like yourself!

 

 

Thanks,

 

~Ben
@jdh

 

Well, I guess I tragically overlooked a few things and now must come back for another round of assistance. I am struggling with how to go from the HTTPCaller to a reader. When I use the HTTPCaller, it seems to be storing my data as something other than a shapefile. How do I hook up the HTTPCaller transformer to a reader that will recognize the downloaded data for what it truly is, and then push it to a merged geodatabase?

 

 

Hi,

 

 

The HTTPCaller downloads the zip file into the folder you specify and stores the file path in the "File Path Attribute" (called "_response_file_path" by default), but the downloaded file name will not have an extension unless you set the "Output Filename" parameter explicitly. The FeatureReader cannot recognize the downloaded file as a zip file unless it has the ".zip" extension.

Therefore, you may have to create an attribute beforehand that stores the download file name (full path) corresponding to each URL. For example:

 

  1. AttributeSplitter: Split the URL by / (slash) and store the components in a list attribute.
  2. ListIndexer: Set the "List Index" parameter to -1 to demote the last element of the list (i.e. the zip file name) to a regular attribute.
  3. StringConcatenator: Concatenate a directory path on your disk, a path delimiter, and the zip file name (*.zip) to create a full path attribute.

Then connect the HTTPCaller. Set the "Create a New File Per Feature" parameter to "No", and set the "Output Filename" parameter to the full path attribute. You can then pass the full path to the "Dataset" parameter of the FeatureReader to read the shapefile archived in the downloaded zip file.
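The three steps above (split on the slash, take the last element, join it to a folder) can be sketched in a few lines of Python to make the logic concrete. This is not FME code. The function name `output_path_for` is hypothetical, and appending ".zip" when the URL lacks it is an assumption added for this example.

```python
from pathlib import Path
from urllib.parse import urlsplit


def output_path_for(url, download_dir):
    """Build a local path that keeps the zip file name from the URL."""
    # Steps 1 and 2: split the URL path on "/" and keep the last component.
    file_name = urlsplit(url).path.split("/")[-1]
    # Assumption: force a .zip extension so a reader can recognize the archive.
    if not file_name.lower().endswith(".zip"):
        file_name += ".zip"
    # Step 3: join the directory and the file name into a full path.
    return Path(download_dir) / file_name
```

Using `urlsplit` first drops any query string, so a URL such as `http://example.com/a/b.zip?x=1` still yields `b.zip`.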

 

 

Another thought.

If you don't need to keep the downloaded zip files, you can also pass the URL directly to the "Dataset" parameter of the FeatureReader. FME readers can generally read features directly from a URL.

Takashi

Takashi's last point that you can just feed the http://.....zip straight into the feature reader as the dataset is bang on correct. You'd be setting the dataset from the attribute holding the URL -- you do this by clicking on the v menu to the right of Dataset.

FME will take any dataset that is a downloadable zip file, and automatically pull it down, and unzip it, read it, and then clean it up. I think his solution should *just work*.
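To picture the "pull it down, unzip it, read it" sequence, here is a rough Python analogy. It says nothing about how FME implements this internally; the helper names `fetch_bytes` and `shapefiles_in_zip` are made up for this example, and it only lists the shapefile members rather than reading features.

```python
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Download a URL and return the response body (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def shapefiles_in_zip(zip_bytes):
    """Return the sorted names of .shp members in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".shp")
        )
```

Because the archive is held in memory, nothing is left on disk to clean up afterwards, which loosely mirrors the automatic cleanup described above.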

@takashi

 

Thank you! My first thought was to pass the URLs to the feature reader and then I could not figure a way to do that. Your enlightenment is greatly appreciated! Thanks for the ideas. I'll post back if I run into any other hurdles but think this will get me pointed in the right direction! This is a workflow that I need to leverage frequently so this is a huge help!

 

 

Thanks for the help!

 

~Ben

 

 

Oh, it's great!

 


To simplify the workspace further, you can even get the URL list directly from the table on the web page instead of from a CSV, and then use @takashi's solution.

In this case, you will always get the latest data, even if the set of files on the website grows or shrinks.
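As an illustration of collecting the URL list from the page itself, here is a small Python sketch that extracts every link ending in ".zip" from an HTML string. It is a sketch under assumptions, not an FME workflow: the class name `ZipLinkParser` and the function `zip_links` are hypothetical, and it assumes the download links are plain `<a href>` anchors in the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of anchors whose href ends in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def zip_links(html, base_url):
    """Return all .zip link targets found in the given HTML, in page order."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The resulting list plays the same role as the CSV of URLs, but it is rebuilt from the page on every run.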


Hi all

I tried to follow your workflow, but I'm not able to download the data: when I arrive at the download page there is more than one resource, so I don't know how to select my file.

I've attached the CSV that I used to try to download the data.

Thanks in advance

test-trento.zip
