Question

How to extract a specific element from an HTML table


Hello, I want to get the value of the <a> tag in the following HTML table. Does anyone have an idea how I can get it? I tried the HTMLExtractor and the FeatureReader, but neither returns "Website". Thanks.

<table id="myTable">
        <tr>
            <td>hello</td>
            <td>it's me!</td>
        </tr>
        <tr>
            <td>it's just a table <a class ="myClass" href="https://example.com">Website</a></td>
            <td>with two columns</td>
        </tr>
</table>

17 replies


In the HTML Extractor

Use a[href]

Or

.myClass

Either will work with your sample data; if the real data is more complicated, you might need something else.

Or

a[class=myClass]

(all links with class myClass)
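
For anyone who wants to check what these selectors match outside FME, here is a minimal Python sketch using only the standard library. It is not how the HTMLExtractor works internally: `LinkCollector` and `select_links` are hypothetical helper names, and the sample is the table from the question (lightly reindented). The three calls roughly mimic `a[href]`, `.myClass` (restricted to links) and `a[class=myClass]`.

```python
from html.parser import HTMLParser

# Sample table from the question (whitespace normalised).
SAMPLE = """
<table id="myTable">
  <tr><td>hello</td><td>it's me!</td></tr>
  <tr>
    <td>it's just a table <a class="myClass" href="https://example.com">Website</a></td>
    <td>with two columns</td>
  </tr>
</table>
"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (attributes, text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished links as (attrs dict, text)
        self._attrs = None  # attributes of the <a> currently open, if any
        self._text = []     # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            self.links.append((self._attrs, "".join(self._text)))
            self._attrs = None


def select_links(html, require_href=False, class_token=None, exact_class=None):
    """Return (href, text) for links that pass the given filters."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    result = []
    for attrs, text in parser.links:
        cls = attrs.get("class") or ""
        if require_href and "href" not in attrs:
            continue  # a[href]: the attribute must be present
        if class_token is not None and class_token not in cls.split():
            continue  # .myClass: one of the space-separated class names
        if exact_class is not None and cls != exact_class:
            continue  # a[class=myClass]: the whole attribute value must be equal
        result.append((attrs.get("href"), text))
    return result


print(select_links(SAMPLE, require_href=True))       # like a[href]
print(select_links(SAMPLE, class_token="myClass"))   # like .myClass
print(select_links(SAMPLE, exact_class="myClass"))   # like a[class=myClass]
```

All three calls print `[('https://example.com', 'Website')]` for the sample. The difference only shows up when an element carries several classes, such as `tipsy-trigger button-rr-soleil`: a class selector matches any one of the names, while `[class=...]` has to match the full attribute value.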

Hi @ebygomm, I'm trying the .myClass method, but I have the feeling that the HTMLExtractor doesn't find my class!

Do you want a URL to try?

Sure, post the link

 

https://www.infoclimat.fr/observations-meteo/archives/3/aout/2019/leran/000AJ.html

 

 

 

I want to get these values. In the inspector, the element has this class: tipsy-trigger button-rr-soleil

Thanks for your help!

I'd look at using the HTMLTABLE reader to read the tableau_releves table first, then look at extracting the data from the Biométéo attribute.

This would get you straight to this sort of data.

If you need to get the UV index as well, I'd still use the table reader, but don't remove the HTML formatting, and use the HTMLExtractor on the Biométéo attribute.

Are you looking to get the information displayed in the button or the information shown in the pop up?

It seems to work, thank you! Yes, I want to get the value next to the temperature.

With the HTMLExtractor you get the value by its ID/class. Does that work?

In my TargetAttribute I get <null> at the end of the process.

 

 

 

In the web page's CSS inspector I found what seems to be the tag I'm looking for, but the HTMLExtractor doesn't fill in my attribute. In the page I find both ".button-rr-soleil span" and a.link, so it's ambiguous!

It doesn't look like you are using the HTMLReader. It is much easier to use this than to use the HTMLExtractor on the website HTML directly; see attached:

htmltable.fmw

Yes, your example works well and I understand your logic :) But since I have a "url" attribute in which I store different URLs, I use a FeatureReader. After that, is it possible to extract my radiation value as you showed me?

 

You can use exactly the same process as after the Reader in the example, so an HTMLExtractor to get 3 things:

the UV value - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] b

everything in the span - Tag Part 'Value'

a[class="tipsy-trigger button-rr-soleil"] span

everything in the b tag - Tag Part 'Whole'

a[class="tipsy-trigger button-rr-soleil"] b

Then a StringReplacer to remove the b element from everything in the span

(there may be better ways to do this, but I'm not sure on css selectors for everything but
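
To make the three extractions and the replacement step concrete, here is a rough Python sketch of the same idea outside FME. The `CELL` fragment is made up for illustration (the real markup and numbers on the page may differ), `first_element` is a hypothetical helper, and a regular expression is only good enough for a tiny, well-behaved fragment like this one.

```python
import re

# Made-up cell content for illustration; the real page markup may differ.
CELL = '<a class="tipsy-trigger button-rr-soleil" href="#"><span>612<b>7</b></span></a>'


def first_element(fragment, tag):
    """Hypothetical helper: return (whole element, inner content) of the first <tag>."""
    match = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", fragment, re.S)
    if match is None:
        return None, None
    return match.group(0), match.group(1)


# 1. The b element with Tag Part 'Value': just the UV number.
# 3. The b element with Tag Part 'Whole': the element including its tags.
b_whole, uv_value = first_element(CELL, "b")

# 2. The span with Tag Part 'Value': everything inside the span, b element included.
_, span_value = first_element(CELL, "span")

# StringReplacer step: drop the whole b element from the span content.
radiation = span_value.replace(b_whole, "").strip()

print(uv_value)    # 7
print(b_whole)     # <b>7</b>
print(span_value)  # 612<b>7</b>
print(radiation)   # 612
```

The 'Whole' extraction exists only so that the replacement step knows exactly which substring to remove from the span content.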

I try to insert 

This is the result of my FeatureReader (the FeatureReader uses the same reader as the HTMLTable in your example):

If I then insert an HTMLExtractor, it doesn't find the value I want to get. Maybe I forgot a simple parameter in my FeatureReader?

 

 

You need the parameter "Remove HTML Formatting" to be set to No

Yes, I found it. Do I have to modify the "output" parameters or something else?

 

As you can see, I set "Remove HTML Formatting" to No as you said:

and I confirmed this change. Now I get an alert about the output.

 

Just select one of your urls which will then generate the tableau_releves output port (I've presumed you are reading the same named table for each url)

Right, OK, it works! So now I have to rebuild my table, because I now have HTML markup in my attributes. That's expected, since "Remove HTML Formatting" is set to No.

It's not a problem, I'm going to clean all my attributes. As a last example, can you give me the CSS selector for the "HEURE" column that I can set in a new HTMLExtractor to get something like "10h" instead of <span.........>? Thanks!

A CSS selector of span with Tag Part 'Value' will give you 14h30 etc.
