Question

How to extract a specific element from an HTML table


Hello, I want to get the value of the <a> tag in the following HTML table. Does anyone have an idea how I can get it? I tried the HTMLExtractor and the FeatureReader, but neither returns "Website". Thanks.

<table id="myTable">
        <tr>
            <td>hello</td>
            <td>it's me!</td>
        </tr>
        <tr>
            <td>it's just a table <a class ="myClass" href="https://example.com">Website</a></td>
            <td>with two columns</td>
        </tr>
</table>

17 replies

ebygomm
Influencer
  • Influencer
  • March 3, 2020

In the HTML Extractor

Use a[href]

Or

.myClass

Either will work with your sample data. If the real data is more complicated, you might need something else.

Or

a[class=myClass]

(all links with class myClass)
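
For readers who want to check these selectors outside FME, here is a minimal Python sketch run against the sample table from the question. It assumes the snippet is well-formed enough to parse as XML, and it uses the limited XPath support in the standard-library `xml.etree.ElementTree` as a stand-in for the CSS selectors `a[href]` and `a[class=myClass]`. It is an illustration only, not a description of how the HTMLExtractor works internally.

```python
import xml.etree.ElementTree as ET

# Sample table from the question (well-formed, so an XML parser accepts it).
SAMPLE = """<table id="myTable">
  <tr>
    <td>hello</td>
    <td>it's me!</td>
  </tr>
  <tr>
    <td>it's just a table <a class="myClass" href="https://example.com">Website</a></td>
    <td>with two columns</td>
  </tr>
</table>"""

root = ET.fromstring(SAMPLE)

# XPath counterparts of the CSS selectors suggested above:
#   a[href]           -> .//a[@href]
#   a[class=myClass]  -> .//a[@class='myClass']
links_with_href = root.findall(".//a[@href]")
links_with_class = root.findall(".//a[@class='myClass']")

link_texts = [a.text for a in links_with_class]
link_urls = [a.get("href") for a in links_with_href]

print(link_texts)
print(link_urls)
```

Real web pages are often not valid XML, so a tolerant HTML parser would be needed there; the selector logic stays the same.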


  • Author
  • March 4, 2020
ebygomm wrote:

In the HTML Extractor

Use a[href]

Or

.myClass

Either will work with your sample data. If the real data is more complicated, you might need something else.

Or

a[class=myClass]

(all links with class myClass)

Hi @ebygomm, I'm trying the .myClass selector, but I have the feeling that the HTMLExtractor doesn't find my class!

Would you like the URL so you can try it?

ebygomm
Influencer
  • Influencer
  • March 4, 2020
gisdev13 wrote:

Hi @ebygomm, I'm trying the .myClass selector, but I have the feeling that the HTMLExtractor doesn't find my class!

Would you like the URL so you can try it?

Sure, post the link

 


  • Author
  • March 4, 2020
ebygomm wrote:

Sure, post the link

 

https://www.infoclimat.fr/observations-meteo/archives/3/aout/2019/leran/000AJ.html

 

 

 

I want to get these values. With the inspector, I see this class: tipsy-trigger button-rr-soleil

Thanks for your help!

ebygomm
Influencer
  • Influencer
  • March 4, 2020

I'd look at using the HTMLTABLE reader to read the tableau_releves table first, then extract the data from the Biométéo attribute.

This would get you straight to this sort of data.

If you need the UV index as well, I'd still use the table reader, but don't remove the HTML formatting, and use the HTMLExtractor on the Biométéo attribute.

Are you looking to get the information displayed in the button or the information shown in the pop-up?


  • Author
  • March 4, 2020

It seems to work? Thank you! Yes, I want to get the value next to the temperature.

With the HTMLExtractor, can you get the value through its ID or class? Does that work?

In my TargetAttribute I get <null> at the end of the process.

 

 

 


  • Author
  • March 5, 2020
ebygomm wrote:

Sure, post the link

 

In the web page's CSS inspector I found what seems to be the tag I'm looking for, but the HTMLExtractor doesn't populate my attribute. In the page, I find ".button-rr-soleil span" or a.link; it's ambiguous!


ebygomm
Influencer
  • Influencer
  • March 5, 2020
gisdev13 wrote:

It seems to work? Thank you! Yes, I want to get the value next to the temperature.

With the HTMLExtractor, can you get the value through its ID or class? Does that work?

In my TargetAttribute I get <null> at the end of the process.

 

 

 

It doesn't look like you are using the HTMLReader. It is much easier to use this than to use the HTMLExtractor on the website HTML directly; see attached.

htmltable.fmw


  • Author
  • March 6, 2020
ebygomm wrote:

It doesn't look like you are using the HTMLReader. It is much easier to use this than to use the HTMLExtractor on the website HTML directly; see attached.

htmltable.fmw

Yes, your example works well and I understand your logic :) But since I have a "url" attribute in which I store different URLs, I use a FeatureReader. After that, is it possible to extract my radiation value the way you showed me?

 


ebygomm
Influencer
  • Influencer
  • March 6, 2020
gisdev13 wrote:

Yes, your example works well and I understand your logic :) But since I have a "url" attribute in which I store different URLs, I use a FeatureReader. After that, is it possible to extract my radiation value the way you showed me?

 


You can use exactly the same process as after the reader in the example: an HTMLExtractor to get three things.

The UV value - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] b

Everything in the span - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] span

Everything in the b tag - Tag Part 'Whole'

a[class="tipsy-trigger button-rr-soleil"] b

Then use a StringReplacer to remove the b element from everything in the span.

(there may be better ways to do this, but I'm not sure on css selectors for everything but
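
The three extractions and the final replacement can be sketched in Python as follows. The markup and values in `CELL_HTML` are made up for illustration (the real page may be structured differently), `inner_html` and `outer_html` are hypothetical helpers that are only assumed to behave like the Tag Part options 'Value' and 'Whole', and `str.replace` stands in for the StringReplacer.

```python
import xml.etree.ElementTree as ET

# Made-up cell content for illustration; the real cell markup may differ.
CELL_HTML = (
    '<td><a class="tipsy-trigger button-rr-soleil" href="#">'
    '<span><b>6</b> 812 W/m2</span></a></td>'
)

def inner_html(el):
    """Text and child markup inside el (assumed analogue of Tag Part 'Value')."""
    return (el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )

def outer_html(el):
    """el including its own tags, without trailing text (assumed analogue of 'Whole')."""
    tail, el.tail = el.tail, None
    try:
        return ET.tostring(el, encoding="unicode")
    finally:
        el.tail = tail  # restore the tree to its original state

cell = ET.fromstring(CELL_HTML)
link = cell.find(".//a[@class='tipsy-trigger button-rr-soleil']")
b = link.find(".//b")
span = link.find(".//span")

uv_value = inner_html(b)        # selector "... b", Tag Part 'Value'
span_value = inner_html(span)   # selector "... span", Tag Part 'Value'
b_whole = outer_html(b)         # selector "... b", Tag Part 'Whole'

# StringReplacer step: drop the whole <b> element from the span content.
remainder = span_value.replace(b_whole, "").strip()

print(uv_value, "|", remainder)
```

Matching on the full class string mirrors the `a[class="tipsy-trigger button-rr-soleil"]` selector, which compares the whole attribute value rather than a single class name.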


  • Author
  • March 6, 2020
ebygomm wrote:

You can use exactly the same process as after the reader in the example: an HTMLExtractor to get three things.

The UV value - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] b

Everything in the span - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] span

Everything in the b tag - Tag Part 'Whole'

a[class="tipsy-trigger button-rr-soleil"] b

Then use a StringReplacer to remove the b element from everything in the span.

(there may be better ways to do this, but I'm not sure on css selectors for everything but

I try to insert 

This is the result of my FeatureReader (it uses the same reader as the HTMLTable in your example):

If I then insert an HTMLExtractor, it doesn't find the value I want to get. Maybe I forgot a simple parameter in my FeatureReader?

 

 


ebygomm
Influencer
  • Influencer
  • March 6, 2020
gisdev13 wrote:

I try to insert

This is the result of my FeatureReader (it uses the same reader as the HTMLTable in your example):

If I then insert an HTMLExtractor, it doesn't find the value I want to get. Maybe I forgot a simple parameter in my FeatureReader?

 

 

You need the parameter "Remove HTML Formatting" to be set to No


  • Author
  • March 6, 2020
ebygomm wrote:

You need the parameter "Remove HTML Formatting" to be set to No

Yes, I found it. Do I have to modify the "output" parameters or something else?

 


  • Author
  • March 6, 2020
ebygomm wrote:

You need the parameter "Remove HTML Formatting" to be set to No

As you can see, I set "Remove HTML Formatting" to No as you said:

and I confirmed this change. Now I get a warning about the output.

 


ebygomm
Influencer
  • Influencer
  • March 6, 2020
gisdev13 wrote:

As you can see, I set "Remove HTML Formatting" to No as you said:

and I confirmed this change. Now I get a warning about the output.

 

Just select one of your URLs, which will then generate the tableau_releves output port (I've assumed you are reading the same-named table for each URL).


  • Author
  • March 6, 2020
ebygomm wrote:

Just select one of your URLs, which will then generate the tableau_releves output port (I've assumed you are reading the same-named table for each URL).

Right, OK! It works. So now I have to rebuild my table, because my attributes now contain the HTML markup. That's expected, since "Remove HTML Formatting" is set to No!

It's not a problem, I'm going to clean all my attributes. As a last example, can you give me the CSS selector for the "HEURE" column that I can set in a new HTMLExtractor to get something like "10h" and not <span.........>, please? Thanks!


ebygomm
Influencer
  • Influencer
  • March 6, 2020
gisdev13 wrote:

Right, OK! It works. So now I have to rebuild my table, because my attributes now contain the HTML markup. That's expected, since "Remove HTML Formatting" is set to No!

It's not a problem, I'm going to clean all my attributes. As a last example, can you give me the CSS selector for the "HEURE" column that I can set in a new HTMLExtractor to get something like "10h" and not <span.........>, please? Thanks!

A CSS selector of span with Tag Part set to Value will give you 14h30, etc.
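
As a rough illustration of that last step, the Python sketch below pulls the text out of a span inside one table cell. The cell markup in `heure_cell` (class name and tooltip text included) is hypothetical, since the real HEURE cells are not shown in this thread, and `xml.etree.ElementTree` is used only as a stand-in for the HTMLExtractor with the selector `span`.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of one HEURE cell, as it might look when the table
# is read with "Remove HTML Formatting" set to No.
heure_cell = '<td><span class="tipsy-trigger" title="example tooltip">14h30</span></td>'

cell = ET.fromstring(heure_cell)

# Counterpart of the CSS selector `span`: first span anywhere below the cell.
span = cell.find(".//span")

# The text inside the span, or None when the cell contains no span.
heure = span.text if span is not None else None

print(heure)
```

The `None` fallback matters when some rows have an empty or differently structured cell, which would otherwise raise an error.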

