Solved

HTTPCaller does not work with a simple GET url

  • June 16, 2022
  • 7 replies
  • 231 views

juanmahere
Supporter

Hi,

Could someone explain why the HTTPCaller does not return any data for a simple GET request to this website:

 

https://www.puntopack.es/buscar-el-punto-pack-mas-cercano/?codePays=ES&codePostal=32005

 

Not sure if there are any parameters that I'd need to add.

The REJECTED ERROR value: "HTTP/2 503"

_fme_rejection_code: "NO_RESULT"

 

Uploaded a screenshot of the HttpCaller.

Many thanks in advance!

J.

 

7 replies

kailinatsafe
Safer
  • 720 replies
  • June 16, 2022

Hello @juanmahere, thanks for providing the URL! If I make the same GET request in Postman, I get a very similar result. It looks like this website doesn't like being scraped. After a quick search, it looks like they have an API you can use (see page 8; I think this is the most up-to-date doc, but it may be worth double-checking). Would you mind trying the API? Best, Kailin.


hkingsbury
Celebrity
  • 1620 replies
  • June 16, 2022

Further to that, a 503 error indicates that the server is unable to handle the request, which normally suggests a problem on the server side. In this case, though, it seems they are blocking scraping. You could possibly get around it by sending a different User-Agent or Referer header.


juanmahere
Supporter
  • Author
  • 47 replies
  • June 17, 2022

Thanks @kailinatsafe and @hkingsbury for your tips.

I made some progress: in Postman, the request succeeds using this structure (screenshot), but I cannot reproduce it in FME. Any ideas?

get_function2_postman


hkingsbury
Celebrity
  • 1620 replies
  • June 19, 2022

Looking at the above, the only header you should need is requestverificationtoken. It goes in the Headers section of the HTTPCaller.

image

I would hope you don't need the Cookie header; if you do, that is going to need some additional calls and configuration.
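
For anyone who wants to check the same request outside FME, here is a minimal Python sketch of a GET request that carries the requestverificationtoken header. The token value is a placeholder (an assumption; the real one would have to be copied from the working Postman request), and `build_request` is a hypothetical helper name. The sketch only builds the request and does not send it, since the site may still answer with the 503 described above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the original question, without its query string.
BASE_URL = "https://www.puntopack.es/buscar-el-punto-pack-mas-cercano/"


def build_request(country: str, postcode: str, token: str) -> Request:
    """Build (but do not send) a GET request with the token header set."""
    query = urlencode({"codePays": country, "codePostal": postcode})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"requestverificationtoken": token},
        method="GET",
    )


# "PLACEHOLDER_TOKEN" is an assumption, not a real token.
req = build_request("ES", "32005", "PLACEHOLDER_TOKEN")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

In the HTTPCaller, the equivalent is a single row in the Headers table with the name requestverificationtoken and the token as its value.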

 


juanmahere
Supporter
  • Author
  • 47 replies
  • June 20, 2022

Hi @hkingsbury​ ,

Thanks for replying to this issue. Unfortunately, I'm sending the request as you suggested and still cannot get it through.

Here is an example, passing the values as published parameters and using "requestverificationtoken" in the header:

get_function_header

 

 


hkingsbury
Celebrity
  • 1620 replies
  • Best Answer
  • June 21, 2022

Hmm, having a look at the site, it's sitting behind Cloudflare. The response I'm getting shows that Cloudflare has blocked the request. I can get around it by setting a referer, cookie, user-agent and the token. However, I imagine that both the token and the cookie are time-based and will expire eventually.

Basically, it's been set up to stop exactly what you're doing, i.e. scraping. If you need the data from it, your best bet is to contact the company directly and see if they can supply it as a file, or whether they have a developer portal you can sign up to.
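
One way to tell this kind of block apart from a genuine server outage is to look at the response headers. The Python sketch below is a heuristic only: it assumes that a blocked response carries a `Server: cloudflare` header or a `CF-RAY` header, which is typical of Cloudflare-proxied sites but was not confirmed for this site in the thread. The function name `looks_like_cloudflare_block` is hypothetical.

```python
from typing import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 403/503 that was answered by Cloudflare, not the origin site."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered
    )
    return status in (403, 503) and served_by_cloudflare


# Example values below are made up for illustration.
blocked = looks_like_cloudflare_block(503, {"Server": "cloudflare", "CF-RAY": "example"})
normal = looks_like_cloudflare_block(200, {"Server": "cloudflare"})
other_503 = looks_like_cloudflare_block(503, {"Server": "nginx"})
print(blocked, normal, other_503)
```

In FME, the same check could be made on the status code and response header attributes that the HTTPCaller exposes, before deciding whether a retry is worthwhile.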


juanmahere
Supporter
  • Author
  • 47 replies
  • June 22, 2022

@hkingsbury many thanks for the investigation. As you suggested, it will probably be less painful to contact the company directly. That said, I was honestly more interested in the HTTPCaller setup in FME than in the data itself.

Thanks everybody for your tips!

Juanma,