Question

Extract a Table from webpage and write out as excel or similar

  • January 13, 2021
  • 5 replies
  • 519 views

elpone
Contributor

Hi,

I am trying to extract a table that is embedded within a webpage returned by an endpoint. I can already get the table via the HTML extractor as a lump of HTML, but what I want to do is write the data out as an Excel table.

One option is to write it out as an HTML file and then use an HTML Table reader, but I would like to cut out the step of writing it out and reading it back in.

In essence, I would like to work directly with the attribute that holds the snippet of HTML representing the table data I want to manipulate.

 

Would the approach outlined in this post: attributes-from-kml-tag @deanatsafe​ apply?

However, I think I am missing something, as I cannot get the last step to work.

Attached are the workspace and sample HTML data.

Regards

Justin.

 

url example:

https://www.nratrafficdata.ie/c2/tfdaysreport.asp?sgid=ZvyVmXU8jBt9PJE$c7UXt6&spid=NRA_000000001803&reportdate=2021-01-01&enddate=2021-01-08&dir=-1&dim1bin=7

 


5 replies

ebygomm
Influencer
  • January 13, 2021

You can use the URL directly in an HTML Table reader.


elpone
Contributor
  • Author
  • January 13, 2021
ebygomm wrote:

You can use the URL directly in an HTML Table reader.

Hi @ebygomm, that is almost what I am looking for, but I have a feature class that contains the URL, which is built from parameters. This is a test case to illustrate what I would like to happen. The real workflow will need to handle tens to hundreds of calls to the endpoint with different parameters, so in effect it would need to take these URLs as inputs.

What this reader does is exactly what I would like to do with the HTML fragment I have, but I need to use information that comes from the feature in other parts of the workflow.
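To illustrate what "a URL built from parameters" means here, the following is a minimal Python sketch, outside of FME, that assembles the report URL from the query parameters visible in the example URL above. The function name `build_report_url` and its default values are hypothetical; only the base URL and the parameter names come from the example.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example URL in the question.
BASE_URL = "https://www.nratrafficdata.ie/c2/tfdaysreport.asp"


def build_report_url(sgid, spid, reportdate, enddate, direction=-1, dim1bin=7):
    """Assemble one report URL (hypothetical helper, for illustration only)."""
    params = {
        "sgid": sgid,
        "spid": spid,
        "reportdate": reportdate,
        "enddate": enddate,
        "dir": direction,
        "dim1bin": dim1bin,
    }
    # safe="$" keeps the "$" in sgid unescaped, as in the example URL.
    return BASE_URL + "?" + urlencode(params, safe="$")


url = build_report_url(
    "ZvyVmXU8jBt9PJE$c7UXt6",
    "NRA_000000001803",
    "2021-01-01",
    "2021-01-08",
)
print(url)
```

Calling the helper once per feature, with the parameter values read from that feature's attributes, would give one URL per call to the endpoint.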


ebygomm
Influencer
  • January 13, 2021

So you just use a FeatureReader with the HTML Table format if you need the URL to come from input features.


elpone
Contributor
  • Author
  • January 13, 2021

@ebygomm Thanks for your help. I'm new to this, but I don't think that will work. I have successfully got the HTML snippet as an attribute, which is what I want to push into an Excel sheet. From what I can tell, a FeatureReader will get me to the same place I am currently stuck at. Anyway, I'm going to go with the two-part solution of pushing the table data into an HTML file and then using an HTML Table reader in another workspace.


deanatsafe
Safer
  • January 23, 2021

See the attached workspace. I downloaded your workspace and added the logic from the article you mentioned here: https://community.safe.com/s/article/how-to-expose-feature-attributes-from-kml-tag

I used the XMLFragmenter rather than the XQueryExtractor, because the former is easier to configure and doesn't depend on an exact XQuery expression. The other trick is that there are several different ways to store HTML tables, and KML tables are structured slightly differently from your HTML table. So first, it's a matter of extracting the table rows into separate features with the XMLFragmenter. Next, we use the AttributeExposer to expose the newly extracted attributes we want to work on. Then we use an AttributeCreator to define the row values from td{0} to td{10} in your case. Finally, we write this out to CSV.
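For readers who want to see the same idea outside of FME, here is a rough Python sketch of the steps described above: split the HTML fragment into rows, collect the cell values of each row, and write them out as CSV without a temporary HTML file. It is not the attached workspace. The names `TableRowParser` and `html_table_to_csv` are hypothetical, the sample fragment is made up for illustration, and the sketch assumes a single simple table with no nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(fragment):
    """Return the table rows found in an HTML fragment as CSV text."""
    parser = TableRowParser()
    parser.feed(fragment)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample fragment, standing in for the HTML held in the attribute.
sample = (
    "<table><tr><th>Hour</th><th>Count</th></tr>"
    "<tr><td>00:00</td><td>12</td></tr></table>"
)
csv_text = html_table_to_csv(sample)
print(csv_text)
```

Each entry in `parser.rows` plays the role of one fragmented feature, and the position of a value within the row corresponds to the td{0}, td{1}, ... list index.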


