Solved

HTTPCaller and encoding failure

  • 24 October 2016
  • 5 replies
  • 34 views

Hi all,

I'm coming across an issue with the HTTPCaller or AttributeFileWriter and encoding. I have created a WB that loops over a series of complete URLs and downloads each file to a folder. Everything goes fine until we reach a URL with special characters, for example the one below:

http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip (if you download this URL directly in a browser, it works fine).

The HTTPCaller fails with 404 Not Found, since it requests a different URL:

http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS%20DE%20JUAN%20NUÑEZ/A.ES.SDGC.AD.02021.zip

The URL above does not work in a browser.

I have tried encoding the URL before passing it to the HTTPCaller, and also encoding only the part of the URL that contains the special characters, but it fails every time, even when the path passed in looks OK.

It also fails with "couldn't resolve host name".

Any tips on this? I guess this is related to how the URL is passed to the HTTPCaller or the file writer, but I can't solve it (I don't think it is a bug in the HTTPCaller).

Best answer by david_r 24 October 2016, 10:17

5 replies

Looks like the HTTPCaller tries to encode the URL as well, but it looks a bit buggy.

A PythonCaller with the following code will do the same for you:

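# Python 2 code for a PythonCaller (urllib.URLopener is the Python 2 API)
import fmeobjects
import urllib

def DownloadAndSave(feature):
    # Full URL of the file to download; features without one are skipped
    url = feature.getAttribute('url')
    if url:
        # Target folder; the file keeps the last URL segment as its name
        local_dir = feature.getAttribute('local_dir')
        filename = url.split('/')[-1]
        myfile = urllib.URLopener()
        # Encode the URL as cp1252 bytes before requesting it
        myfile.retrieve(url.encode('cp1252'), local_dir + '/' + filename)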

Assumes that incoming features have the attributes "url" and "local_dir", e.g.:

url: http://www.catastro.minhap.es/INSPIRE/Addresses/02/02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip
local_dir: c:\mydownloadshere
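The snippet above relies on the Python 2 urllib API. For a Python 3 interpreter, a minimal sketch of the same idea might look like the following. It assumes, as the cp1252 encode in the answer suggests, that the server wants the path percent-encoded from cp1252 bytes; the helper names encode_url and download_and_save are made up for illustration, and the function expects a URL that has not been percent-encoded yet.

```python
import os
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url, encoding="cp1252"):
    """Percent-encode only the path of `url`, using the given byte encoding.

    Hypothetical helper: the cp1252 default mirrors the answer above and is
    an assumption about what the server expects.
    """
    parts = urlsplit(url)
    # Keep '/' as the path separator; spaces and non-ASCII characters are escaped.
    path = quote(parts.path, safe="/", encoding=encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def download_and_save(url, local_dir):
    """Download `url` into `local_dir`, naming the file after the last URL segment."""
    filename = url.split("/")[-1]
    target = os.path.join(local_dir, filename)
    urllib.request.urlretrieve(encode_url(url), target)
    return target
```

With the example URL from the question, encode_url turns the spaces into %20 and Ñ into %D1 while leaving the scheme and host untouched. A URL that already contains percent escapes would be escaped a second time, so only raw URLs should be passed in.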

Yep! This works perfectly. I always forget to keep an eye on Python. Thanks for this. I think I will report this as a bug in the HTTPCaller, since I didn't find a way to do it with that transformer (and I don't think this case is very uncommon).

Hi @geodavid76, in my quick test, the TextEncoder (Encoding Type: URL (Percent Encoding)) was able to encode international characters into their UTF-8 hex representation, prefixed by the % symbol. However, the colon : and slash / were also changed to %3A and %2F, so you have to restore them.
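The effect described here can be reproduced outside FME with Python's urllib.parse.quote, used purely as a stand-in for the TextEncoder (an assumption; the transformer's internals are not shown in this thread). The sketch shows the delimiters being escaped, the manual restore, and a way to avoid escaping them in the first place.

```python
from urllib.parse import quote

url = ("http://www.catastro.minhap.es/INSPIRE/Addresses/02/"
       "02021-CASAS DE JUAN NUÑEZ/A.ES.SDGC.AD.02021.zip")

# Encoding with no safe characters also escapes ':' and '/'.
encoded_all = quote(url, safe="")

# Restore the two delimiters afterwards, as described above.
restored = encoded_all.replace("%3A", ":").replace("%2F", "/")

# Alternative: tell quote() to leave ':' and '/' untouched.
encoded_safe = quote(url, safe=":/")

print(encoded_all)
print(restored)
print(encoded_safe)
```

Both routes give the same URL, with Ñ written as %C3%91 (its UTF-8 bytes). The code in the best answer encodes with cp1252 instead, under which Ñ is the single byte D1, so the two approaches do not send the same request.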

Yes, I also tried that initially. The problem seems to be that even if your URL is correct before it gets to the HTTPCaller, it still doesn't work. I suspect the HTTPCaller also tries to encode the URL, but in a way that doesn't work in this scenario.

 

 

Yes, same feedback here. Even if the URL sent was correct, the HTTPCaller was still encoding it in an odd way.
