Question

Automate Download, Extract and Prep as .csv for FME Job


I have a process I need to automate.

Let me describe what is usually done manually:

 

1) I go to this website and download a zip file. The site is not a true FTP server, as had been suggested, so the automated download does not work.

Is there a way to figure out the JavaScript behind the download, or is there another way to do this?

2) I extract the zip file. The data inside is in .tsv format, and all I do is rename it to .csv.

The FME job cannot read the .tsv file as is; perhaps it is not a well-formed .tsv. The renamed file makes a very clunky .csv, but it usually works.

3) Once the .csv is ready, the FME job updates the parcel data based on it.

 

Where should I start with this? It was suggested that I look into Selenium, but I am open to anything.


2 replies


For getting data from websites, FME works best if there is a service or API to supply it. If the website builds the download for you, it may not work: an HTTPCaller can GET the contents of a web page and the links within it, but it can't simulate a user clicking through a browser if that's what the JavaScript requires. When you do the download, you could inspect the network traffic with F12 in your browser and have FME make the same request your browser made. If the request relies on something like a unique ID for your browser, though, it's probably not repeatable.
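To the question of doing this step outside FME: the same "replay the browser's request" idea can be sketched in Python with only the standard library. This is a rough sketch, not a tested recipe for this particular site. The URL and header values are placeholders that would have to be copied from the F12 network tab, and the helper names (`build_request`, `looks_like_zip`, `download_zip`) are made up for illustration. The zip check is there because a request that "succeeds" with no errors can still return an HTML page instead of the archive.

```python
import io
import urllib.request
import zipfile


def build_request(url, headers=None):
    """Build a GET request carrying headers copied from the browser's request."""
    # Assumption: the site only needs ordinary headers (User-Agent, Referer,
    # Cookie, ...). A per-session token would make this non-repeatable.
    merged = {"User-Agent": "Mozilla/5.0"}
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged, method="GET")


def looks_like_zip(payload):
    """Return True if the downloaded bytes are actually a zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def download_zip(url, dest_path, headers=None, timeout=60):
    """Download url to dest_path, refusing to save anything that is not a zip."""
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = response.read()
    if not looks_like_zip(payload):
        # Typically an HTML login or error page was returned instead.
        raise ValueError("Response is not a zip archive; check the URL and headers")
    with open(dest_path, "wb") as out:
        out.write(payload)
    return dest_path
```

If `download_zip` raises the `ValueError`, comparing the headers and cookies of the scripted request with the ones the browser sent is usually the next thing to check. If the site really does need JavaScript to run, a browser-automation tool would be the fallback.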

 

You also don't need to unzip the file. For example, if you downloaded the file to C:\Data\Download.zip and it contained a parcels.tsv file, a CSV-format FeatureReader can simply read from C:\Data\Download.zip\parcels.tsv

A .tsv is normally a tab-separated csv, so it may be useful to set the reader's delimiter to tab explicitly, rather than relying on the default auto-detect, which will look for commas.
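If a real comma-separated file is still wanted (rather than a renamed .tsv), the conversion can also be done outside FME. Here is a minimal Python sketch that reads a tab-separated member straight out of the zip and re-writes it with commas. It assumes the file is UTF-8 and that quote characters in the .tsv are literal text, which may not hold for your data. The function name is made up for illustration.

```python
import csv
import io
import zipfile


def tsv_member_to_csv_text(zip_source, member_name):
    """Read a tab-separated file inside a zip and return it as CSV text."""
    out = io.StringIO(newline="")
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Assumption: UTF-8 text; adjust the encoding if the source differs.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE treats quote characters in the TSV as ordinary text.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            # The writer quotes any field that contains a comma.
            writer = csv.writer(out)
            for row in reader:
                writer.writerow(row)
    return out.getvalue()
```

Simply renaming .tsv to .csv leaves the tabs in place and does nothing about commas inside values, which may be why the renamed file feels clunky. Writing the rows back out through `csv.writer` quotes those values properly.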

Step 3, updating the parcel data, is then what you make the FME workspace do once the file has been read in.

 

It sounds automatable except for the file download. So if the download can't be automated, then your workflow could be to download it yourself, save it to a location, then run an FME workspace.
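That fallback workflow (download by hand, save to a known location, then run the workspace) could be wrapped in a small script. The sketch below assumes FME is started from the command line as `fme.exe <workspace> --<PARAMETER> <value>`. The workspace name and the `SOURCE_ZIP` parameter are hypothetical and would have to match the published parameters of your own workspace.

```python
import subprocess
from pathlib import Path


def build_fme_command(fme_exe, workspace, parameters):
    """Build the argument list for running a workspace with published parameters."""
    command = [str(fme_exe), str(workspace)]
    for name, value in parameters.items():
        # Assumption: parameters are passed as "--NAME value" pairs.
        command += [f"--{name}", str(value)]
    return command


def run_workspace_if_ready(download_path, fme_exe, workspace, parameters):
    """Run the workspace only if the downloaded file is present."""
    if not Path(download_path).is_file():
        return None  # Nothing downloaded yet, so there is nothing to process.
    return subprocess.run(build_fme_command(fme_exe, workspace, parameters), check=True)
```

For example, `run_workspace_if_ready("C:/Data/Download.zip", "fme.exe", "update_parcels.fmw", {"SOURCE_ZIP": "C:/Data/Download.zip"})` would do nothing until the zip has been saved, and a scheduled task could call it periodically.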

Is there a way to download the zip with Python code, or does that pose the same issue?

 

I tried to reproduce the download with an HTTPCaller. No errors were given, but the download doesn't produce any results. Could it be that I am just not setting something up correctly?

 

The request looks like this:

[Screenshot of the request: image.png]
