Solved

HTTPCaller and encoding failure

  • October 24, 2016
  • 5 replies
  • 147 views


Hi all,

I'm running into an issue with the HTTPCaller or AttributeFileWriter and encoding. I have created a WB that loops over a series of complete URLs and downloads each file to a folder. Everything goes fine until we reach a URL with special characters, for example the one below:

http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip

(If you open this URL directly in a browser, it works fine.)

HTTPCaller fails with 404 Not Found because it requests a different URL:

http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS%20DE%20JUAN%20NUÑEZ/A.ES.SDGC.AD.02021.zip

The above URL does not work in a browser.

I have tried encoding the URL before passing it to the HTTPCaller, and also encoding only the part of the URL with the special characters, and it fails every time, even if the path passed looks OK.

It also fails with "couldn't resolve host name".

Any tips on this? I guess it is related to how the URL is passed to the HTTPCaller or the file writer, but I can't solve it (I don't think it is a bug in the HTTPCaller).


5 replies

david_r
Evangelist
  • Best Answer
  • October 24, 2016

Looks like the HTTPCaller tries to encode the URL as well, but it looks a bit buggy.

A PythonCaller with the following code will do the same for you:

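import fmeobjects
import urllib  # Python 2 urllib: URLopener is used below

def DownloadAndSave(feature):
    url = feature.getAttribute('url')
    if url:
        local_dir = feature.getAttribute('local_dir')
        # Use the last path segment of the URL as the local file name
        filename = url.split('/')[-1]
        myfile = urllib.URLopener()
        # Encode to cp1252 bytes so that Ñ is sent as a single byte
        myfile.retrieve(url.encode('cp1252'), local_dir + '/' + filename)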

Assumes that incoming features have the attributes "url" and "local_dir", e.g.:

url: http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip
local_dir: c:\mydownloadshere
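For anyone on Python 3, where the urllib module is laid out differently, the same idea might look roughly like the sketch below. It assumes, as the cp1252 encoding in the answer above suggests, that this server expects Ñ as the single byte 0xD1 (%D1) rather than its UTF-8 form. The function names encode_url and download_and_save are hypothetical and not part of FME or the original answer.

```python
import os
import urllib.parse
import urllib.request


def encode_url(url, encoding="cp1252"):
    """Percent-encode the path of `url` using the given byte encoding."""
    parts = urllib.parse.urlsplit(url)
    # Keep "/" and existing "%" escapes; spaces and non-ASCII characters are escaped.
    path = urllib.parse.quote(parts.path, safe="/%", encoding=encoding)
    return urllib.parse.urlunsplit(parts._replace(path=path))


def download_and_save(url, local_dir):
    """Download `url` into `local_dir`, named after the last URL segment."""
    filename = url.split("/")[-1]
    target = os.path.join(local_dir, filename)
    urllib.request.urlretrieve(encode_url(url), target)
    return target
```

Inside a PythonCaller, download_and_save could be called with the feature's "url" and "local_dir" attribute values. Passing encoding="utf-8" to encode_url would produce %C3%91 instead of %D1, which may be what other servers expect.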


  • Author
  • October 24, 2016

Yep! This works perfectly. I always forget to keep Python in mind. Thanks for this. I think I will report this as a bug in the HTTPCaller, since I couldn't find a way to do it with that transformer (and I don't think this case is very uncommon).


takashi
Influencer
  • October 24, 2016

Hi @geodavid76, in my quick test, the TextEncoder (Encoding Type: URL (Percent Encoding)) was able to encode international characters into their UTF-8 hex representation prefixed by the % symbol. However, colon : and slash / were also changed to %3A and %2F, so you have to restore them.
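The encode-then-restore step described above can be sketched in plain Python 3. This is only an illustration of the same percent-encoding idea using the standard library, not the TextEncoder itself; the variable names are made up for the example.

```python
from urllib.parse import quote

url = "http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip"

# Escape everything, as a generic percent-encoder would: ":" and "/" are escaped too.
fully_encoded = quote(url, safe="")

# Restore the URL delimiters afterwards...
restored = fully_encoded.replace("%3A", ":").replace("%2F", "/")

# ...or leave them alone in the first place by marking them as safe.
kept = quote(url, safe=":/")

print(restored)
print(kept)
```

Both variants give the same string, with spaces as %20 and Ñ as the UTF-8 sequence %C3%91.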


david_r
Evangelist
  • October 24, 2016
takashi wrote:

Hi @geodavid76, in my quick test, the TextEncoder (Encoding Type: URL (Percent Encoding)) was able to encode international characters into their UTF-8 hex representation prefixed by the % symbol. However, colon : and slash / were also changed to %3A and %2F, so you have to restore them.

Yes, I also tried that initially. The problem seems to be that even if your URL is correct before it gets to the HTTPCaller, it still doesn't work. I suspect the HTTPCaller also tries to encode the URL, but in a way that doesn't work in this scenario.
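To illustrate how a second encoding pass can break a URL that was already correct, here is a small Python 3 sketch. It is not a claim about what the HTTPCaller does internally; it only shows the general failure mode, and the already-encoded URL is an assumed example input.

```python
from urllib.parse import quote

already_encoded = (
    "http://www.catastro.minhap.es/INSPIRE/Addresses/02/"
    "02021-CASAS%20DE%20JUAN%20NU%D1EZ/A.ES.SDGC.AD.02021.zip"
)

# Encoding again without treating "%" as safe escapes the escapes themselves.
double_encoded = quote(already_encoded, safe=":/")

# Treating "%" as safe leaves the existing escapes untouched.
unchanged = quote(already_encoded, safe=":/%")

print(double_encoded)
print(unchanged == already_encoded)
```

In the double-encoded result every % has become %25, so the server is asked for a path that does not exist.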

 

 


  • Author
  • October 25, 2016

Yes, same experience here. Even if the URL sent was correct, the HTTPCaller was still encoding it in an odd way.

