Question

Extract HTML table

  • January 20, 2020
  • 3 replies
  • 102 views

pflegpet
Contributor

Hi,

I'm trying to extract the table from http://skpos.gku.sk/en/stanice.php with the HTML Table reader or the HTMLExtractor, but in the output I only see the column headers, and there is no list I could explode further. How can I get the complete table? Thank you for your help!


3 replies

takashi
Celebrity
  • 7843 replies
  • January 20, 2020

Hi @kasparlov, FME can read HTML content only if it is provided statically in the HTML source. As far as I can see, the table body on this site is generated dynamically by JavaScript in the client's browser, so unfortunately I don't think you can read it directly with FME.
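
To check this kind of diagnosis yourself, you can look at the raw (non-rendered) page source: if it contains header cells but no data rows, the rows are presumably filled in later by JavaScript. Below is a minimal Python sketch of that heuristic using only the standard-library `html.parser`. The names `TableRowCounter` and `looks_client_rendered` are hypothetical, the sample markup in the usage is made up rather than taken from the actual site, and fetching the source (for example with `urllib.request`) is left out.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts <th> header cells and table rows that contain at least one <td>."""

    def __init__(self):
        super().__init__()
        self.header_cells = 0
        self.data_rows = 0
        self._row_has_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts; it has no data cells yet.
            self._row_has_td = False
        elif tag == "th":
            self.header_cells += 1
        elif tag == "td" and not self._row_has_td:
            # Count each row once, on its first data cell.
            self._row_has_td = True
            self.data_rows += 1


def looks_client_rendered(html: str) -> bool:
    """Heuristic: header cells in the static source but no data rows at all."""
    counter = TableRowCounter()
    counter.feed(html)
    counter.close()
    return counter.header_cells > 0 and counter.data_rows == 0
```

This is only a heuristic: a page could also ship an empty table for other reasons, but "headers only, no rows" matches the symptom described in the question.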

A possible workaround is to first save the page as an HTML file using a web browser, and then read the table with the HTML Table reader or the HTMLExtractor.

In my quick test, FME could read the HTML file saved with Google Chrome, although the header (column names) needs to be modified.
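
Once the browser has saved the rendered page, the table can also be parsed outside FME. The following Python sketch, using only the standard-library `html.parser`, collects the cell text row by row and optionally replaces the first row with your own column names. It assumes the input holds a single, non-nested table with explicit closing `</td>`/`</th>` tags; `TableExtractor`, `read_table` and its `new_header` parameter are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self.rows.append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None


def read_table(html, new_header=None):
    """Return the table as a list of rows; optionally overwrite the header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]  # drop empty rows
    if new_header is not None and rows:
        rows[0] = list(new_header)
    return rows
```

Reading the saved file (any name, e.g. a hypothetical `stanice.html`) and passing its contents to `read_table` returns a list of rows that could then be written out with the standard `csv` module; passing `new_header` is one way to handle the column names that need fixing.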


pflegpet
Contributor
  • Author
  • 62 replies
  • January 20, 2020

Thank you, Takashi! Very helpful answer, as always :)


bruceharold
Supporter
  • 346 replies
  • January 21, 2020

Takashi, as always, has some insights, but another workaround might be to investigate manually where the data is ultimately hosted and grab it from there. For example, clicking on one station name, I find this log link:

 

ftp://epncb.oma.be/pub/center/oper/GKU.OC
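
If the underlying files are reachable over FTP, they could be fetched directly instead of scraping the page. A minimal sketch with Python's standard `ftplib`, assuming anonymous access and that the path in the link is a single file rather than a directory (if it is a directory, `FTP.nlst()` would list its contents instead). The helper names `split_ftp_url` and `download_ftp_file` are hypothetical, and the sketch does not check whether the server is still online.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp URL: {url}")
    return parts.hostname, parts.path


def download_ftp_file(url, local_path):
    """Download one file over FTP with an anonymous login (assumes the path is a file)."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {path}", out.write)
```

For example, `download_ftp_file("ftp://epncb.oma.be/pub/center/oper/GKU.OC", "GKU.OC")` would save the linked file locally, provided the assumptions above hold.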