Hi @danilo_inovacao
I've taken a look at the URL you provided. Once you can extract the paths to the .xls files, the issue is solved. But since the page uses an iframe, the HTTPCaller doesn't seem to find those paths.
I can, however, view the page, including the .xls paths, in the following way:
view-source:http://www.economia-sniim.gob.mx/nuevo/AdministracionSitio/ListaServicios.aspx but I haven't succeeded in parsing this page in FME, neither via the HTTPCaller nor via the urllib2 module from Python. Maybe someone else knows how to do this?
What you need is what's called web scraping. I guess it could be possible to accomplish your goal using an HTTPCaller combined with a StringSearcher and regular expressions, but I don't think it's going to be straightforward, seeing as the site in question uses iframes and some server-side redirects just to make your life difficult.
For your specific web site you may have to resort to Python and some fancy parsing to get to your links.
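To give an idea of what that parsing could look like, here is a minimal sketch using only the Python 3 standard library (so `html.parser` rather than the older urllib2-era tooling). It assumes you already have the HTML of the page as a string; the names `LinkCollector` and `collect_links` are made up for this example, and the real page may need extra handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect iframe sources and links to .xls files from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.iframes = []    # absolute URLs of embedded frames
        self.xls_links = []  # absolute URLs whose path ends in .xls

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            self.iframes.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            href = attrs["href"]
            # Ignore any query string when checking the extension.
            if href.split("?")[0].lower().endswith(".xls"):
                self.xls_links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    """Return (iframe_urls, xls_urls) found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.iframes, parser.xls_links
```

The idea would be to call `collect_links` on the outer page first, fetch each URL in the returned iframe list, and then call it again on those inner pages to get the .xls links. Whether the site serves the inner page to a script at all is a separate question.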
If you try to read that URL using the HTTPCaller you won't get what you expect: the site automatically redirects you to its site map, which is what the HTTPCaller returns.
It looks like the site admins made an effort to make it difficult to scrape...
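If you want to confirm the redirect from Python, one rough way is to compare the URL you asked for with the URL that was finally served. This is only a sketch with the Python 3 standard library; `fetch` and `same_page` are hypothetical helper names, and it only detects ordinary HTTP redirects, not a server that silently returns different content under the same URL.

```python
import urllib.request
from urllib.parse import urlsplit


def same_page(requested_url, final_url):
    """Return True if both URLs share the same host and path."""
    a, b = urlsplit(requested_url), urlsplit(final_url)
    # Host names are case-insensitive; paths are compared as given.
    return (a.netloc.lower(), a.path.rstrip("/")) == (
        b.netloc.lower(),
        b.path.rstrip("/"),
    )


def fetch(url, timeout=30):
    """Fetch url, following redirects; return (final_url, body_text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.geturl(), body
```

For example, after `final_url, body = fetch(url)`, a `False` result from `same_page(url, final_url)` would mean the body you got back belongs to some other page, such as the site map.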
Have a look at the HTML extractor in the 2017 RC.
Hi @jeroenstiers, thanks for your help.
The link you sent me is what I need to send to my customer for now, but I will try using Python. Thanks
Hi @david_r, thanks for your help too. You are right, the site in question uses iframes and it was very difficult to extract any information. I will look into web scraping in Python, as you suggested.
Thanks
Hi @itay, thanks for your help. I will look at this new transformer in FME 2017. Thank you