Question

Download zip file from password protected website


I am trying to download a zip file containing some shapefiles from a website that requires logging in with a username and password. From what I've read so far, it appears that the HTTPCaller transformer is what I should use. I've pasted my settings below, but nothing seems to download when I run the workspace. Has anyone else figured out a way to download shapefiles from a password-protected site?


19 replies


Hi @mlayman09,

I have successfully used the HTTPCaller to download files from the web. Some configurations:

Is there an error message in the log file?

Thanks,

Danilo

Hi @danilo_fme, no error in the log file for now. Right now, I just have the HTTPCaller in a workspace, and when I run it, it says it's successful. However, the file that I'm trying to download does not show up in the directory that I'm specifying.

 

 

Please try changing the output directory.

 

 

I tried several different directories, but nothing downloads even though the log says "translation was successful": 0 total features read, 0 total features written. I've attached the translation log.

 

Thanks for the log file.

I didn't see any information about the HTTPCaller transformer in it.

Is there anything upstream that triggers your HTTPCaller?

In my example I used a Creator to send a trigger to the HTTPCaller.

 

 

 

 

 

I did not have a Creator, but I just added one. Now the workspace creates an empty zip file. I've attached a screenshot of the log file. What are your Creator transformer settings? I just used the default settings.

 

 

Great, @mlayman09, your HTTPCaller is working now!

Could you send the workspace?

I've attached the workspace: 1derrick-testing.fmw.

 

 

Thanks.

If you open this request URL in your web browser, does it work?

 

 

In the browser, does the download succeed, and can you extract the zip file?

 

 

Yes, it does. But that's because I've logged in and checked the "stay logged in" box. If I hadn't signed in, the link wouldn't work.

 

 

Hi @mlayman09, my web browser was redirected to this web page when I pasted your download URL (.zip) into the address bar.

0684Q00000ArKXBQA3.png

This means that you cannot access the download URL with the Basic Authentication method.

First, you will have to analyze the HTML source of the sign-in page to find out what is required to sign in to the site. If posting some form data is enough to sign in, you could possibly download the zip file with a chain of these three HTTPCallers.

  1. HTTPCaller (GET method, save cookie): Access the sign-in page to start the session.
  2. HTTPCaller (POST method, save cookie): Post the required form data to perform the sign-in. You will have to analyze the HTML source of the sign-in page to find the URL and form data to set in this HTTPCaller.
  3. HTTPCaller (GET method, save cookie): Download the zip file.

Depending on the structure of the site, more HTTPCallers could be required, but basically you can consider the chain of HTTPCallers a simulation of your manual operations in a web browser. Also, it may not be possible to achieve the goal, depending on the site structure. Good luck.
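
For anyone who wants to prototype the same flow outside FME, the three-step chain above can be sketched in Python with only the standard library. This is a rough sketch under assumptions, not the workspace from this thread: the function names are made up for illustration, every URL passed in is a placeholder, and the form is assumed to accept plain URL-encoded fields (some sites need a multipart body or extra hidden fields instead).

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener():
    """Return an opener that keeps cookies between requests (the 'save cookie' idea)."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(post_url, form_fields):
    """Build the step-2 POST that submits the sign-in form as URL-encoded data."""
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    return urllib.request.Request(post_url, data=body, method="POST")


def download_after_login(signin_page_url, post_url, form_fields, file_url, out_path):
    """Run the three requests in order and return the number of bytes saved."""
    opener = make_cookie_opener()
    # Step 1: GET the sign-in page so the server can set its session cookie.
    opener.open(signin_page_url).close()
    # Step 2: POST the form data; the stored cookies are sent back automatically.
    opener.open(build_login_request(post_url, form_fields)).close()
    # Step 3: GET the protected file using the same cookie-carrying opener.
    with opener.open(file_url) as response, open(out_path, "wb") as out:
        data = response.read()
        out.write(data)
    return len(data)
```

The single opener plays the role of the shared cookie store: if each request used a fresh opener, the session started in step 1 would be lost before step 3.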

You can find the requirements for signing in to the site in this <form> element in the source of the sign-in page.

 

<form method="post" action= ... > ... </form>
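
If reading the page source by eye is tedious, the same information can be pulled out with a few lines of Python. The sketch below is only an illustration: `FormInspector`, `inspect_forms`, and the sample page are made up here, and a real sign-in page may build its form with JavaScript, in which case this approach finds nothing.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect the method, action, and named <input> fields of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attributes.get("method") or "get").lower(),
                "action": attributes.get("action") or "",
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = attributes.get("name")
            if name:
                # Hidden inputs often carry values that must be posted back unchanged.
                self._current["fields"][name] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inspect_forms(html_text):
    """Return one dict per <form> found in the given HTML text."""
    parser = FormInspector()
    parser.feed(html_text)
    parser.close()
    return parser.forms


# Made-up sign-in page, for illustration only.
SAMPLE_PAGE = """
<form method="post" action="/session/new">
  <input type="text" name="loginid">
  <input type="password" name="password">
  <input type="hidden" name="token" value="t0k3n">
</form>
"""

for form in inspect_forms(SAMPLE_PAGE):
    print(form["method"], form["action"], sorted(form["fields"]))
```

The `action` value is where the form data should be posted, and the field names are the keys the POST request has to send.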

Thanks @takashi. I've got the 3 HTTPCaller transformers set up in the workspace, but I keep getting errors for the POST one. I've attached a screenshot of my settings for the first HTTPCaller, as well as the second (with the username and password values removed for privacy's sake). I've also attached a screenshot of the HTML source. Do you know what I should be putting in the Multipart Upload portion of the POST transformer? Am I even on the right track with this?

0684Q00000ArN9NQAV.jpg

0684Q00000ArN6eQAF.jpg

0684Q00000ArN57QAF.jpg

 
The Request URL is wrong. What URL is shown in the address bar when you sign in to the site manually with a web browser? I think that is the correct URL to which you should post the form data (loginid and password).

It's "https://www.1derrick.com/us-onshore-maps.php", which I have now set as the request URL in the POST transformer. Now when I run the translation, no errors occur, but the zip file doesn't appear to download (a blank zip file is created instead). I've attached a screenshot of the third HTTPCaller transformer settings that I have set up, as well as the log file.

logfile.zip
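
When a run reports success but the saved zip is blank or will not open, it can help to check what was actually written to disk. The helper below is a hypothetical diagnostic, not part of FME: it assumes the usual failure modes are a zero-byte file or an HTML sign-in page saved under a .zip name, and it only looks at the start of the file to guess.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Classify a downloaded file as 'empty', 'zip', 'html', or 'unknown'."""
    file_path = Path(path)
    if file_path.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(file_path):
        return "zip"
    # Look at the first bytes only; enough to spot an HTML page.
    head = file_path.read_bytes()[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"  # most likely a sign-in or error page saved under a .zip name
    return "unknown"
```

A result of "html" usually means the sign-in step did not succeed and the server answered the download request with a page instead of the file.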

 

Oops, my bad. The page seems to be redirected. Try posting the login data to this URL, which is constructed from the "action" attribute of the <form> element.

 

https://derrick.quickbase.com/db/main?a=signin&what;=

 

That did the trick!! One final question: I see where you got the "main?a=signin&what;=" part of the URL, but what about the "/db/" part? How did you know to add the "/db" portion to the URL? Thanks so much for all your help @takashi!

 

 

It's a kind of hack. Try signing in to the site in the web browser with a fake username and password, e.g. "abc" and "xyz", then have a look at the address bar ;-)
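
One possible explanation for an extra path segment like this: a browser resolves a form's "action" relative to the URL of the page that served the form, after any redirects. The sketch below shows that resolution rule with made-up `example.com` URLs. The real sign-in page URL is not shown in this thread, so whether this is where "/db/" comes from on that site is only a guess, and `resolve_form_action` is a name invented for this example.

```python
from urllib.parse import urljoin


def resolve_form_action(page_url, action):
    """Resolve a form's action attribute against the URL of the page that served it."""
    return urljoin(page_url, action)


# Hypothetical URLs for illustration; not the real site's pages.
# A relative action inherits the directory of the page URL.
print(resolve_form_action("https://example.com/db/signin-page", "main?a=signin"))
# An action starting with "/" replaces the whole path.
print(resolve_form_action("https://example.com/maps.php", "/db/main?a=signin"))
```

Submitting fake credentials and reading the address bar gives the same answer without having to work out the resolution by hand.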

 

 
