Question

HTML reader

  • June 21, 2018
  • 3 replies
  • 7 views

boubcher
Contributor

Hello there, I am looking to extract data from a public website.

I used the HTML Table reader, but I am not getting the expected values in the table.

Website: https://www.stats.gov.sa/en/160

 


3 replies

redgeographics
Celebrity
  • Celebrity
  • 3703 replies
  • June 21, 2018

That table is generated dynamically with JavaScript, so the HTML Table reader isn't able to access the cell contents. If you want the Excel files, you could probably read them directly: just plug the URL into an Excel reader. The URL pattern looks fixed, so that should be OK.
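If the file names really do follow a fixed pattern, the list of URLs to feed an Excel reader can be generated rather than collected by hand. A minimal Python sketch, assuming a year/quarter naming scheme; the `URL_TEMPLATE` below is hypothetical, since the actual URL pattern for these files isn't given in this thread:

```python
from itertools import product

# Hypothetical template: the real stats.gov.sa file naming is NOT known here.
URL_TEMPLATE = "https://example.org/data/report_{year}_Q{quarter}.xlsx"


def build_file_urls(years, quarters=(1, 2, 3, 4)):
    """Expand the fixed URL pattern into one URL per (year, quarter) pair."""
    return [
        URL_TEMPLATE.format(year=year, quarter=quarter)
        for year, quarter in product(years, quarters)
    ]


if __name__ == "__main__":
    # Each printed URL could be passed to an Excel reader as its source.
    for url in build_file_urls([2017, 2018], quarters=(1, 2)):
        print(url)
```

The generated list could then be written out (for example as a CSV with one URL per row) and used to drive the Excel reader, so adding a new year only means extending the `years` argument.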


boubcher
Contributor
  • Author
  • Contributor
  • 212 replies
  • June 21, 2018


 

You are right.

 

The only problem is that we would need to collect all those links manually. I was hoping to extract them automatically; any suggestions?
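When the file links are present as plain `<a href>` tags in the page HTML, they can be collected automatically. A minimal sketch using Python's standard-library `html.parser`, run on a hypothetical inline sample page; note that the reply below finds this site's file URLs are not in the page source, so on this particular page the list would likely come back empty:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed file types of interest.
SPREADSHEET_EXTENSIONS = (".xls", ".xlsx")


class SpreadsheetLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets ending in a spreadsheet extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(SPREADSHEET_EXTENSIONS):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_spreadsheet_links(html, base_url):
    """Return spreadsheet links found in the raw (non-JavaScript) HTML."""
    parser = SpreadsheetLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page; not the real stats.gov.sa markup.
SAMPLE_HTML = """
<ul>
  <li><a href="/files/report_2017.xlsx">2017</a></li>
  <li><a href="https://example.org/files/report_2018.XLS">2018</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

if __name__ == "__main__":
    for link in extract_spreadsheet_links(SAMPLE_HTML, "https://example.org/en/160"):
        print(link)
```

This only sees what is in the static HTML; anything a browser adds later by running JavaScript will not be found.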

redgeographics
Celebrity
  • Celebrity
  • 3703 replies
  • June 22, 2018


I'm afraid not. On closer inspection, there are pretty big differences from year to year in how the data is offered. The file URLs don't actually appear in the page source either, so scraping it isn't going to work.