Question

How to use the CSS selector in HTML extractor

  • October 25, 2022
  • 4 replies
  • 235 views

checcosisani
Contributor

Hi

I would like to extract some text from a website. The text is inside a p tag, which is inside a div with class paragraphs_item__body_paragraph_bundle.

The website is:

https://www.padovanet.it/notizia/20221025/strade-chiuse

See the picture below.

Thanks for your support.

[Picture: webscraping]

Francesco

4 replies

geomancer
Evangelist
  • 932 replies
  • October 26, 2022
.paragraphs_item__body_paragraph_bundle p

Select all p elements inside class paragraphs_item__body_paragraph_bundle (see HTMLExtractor and CSS Selector Reference).

[Picture: HTMLExtractor_strade_chiuse]
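For readers outside FME, here is a minimal Python sketch of what the descendant selector `.paragraphs_item__body_paragraph_bundle p` matches. It uses only the standard library, and it is not how HTMLExtractor works internally. The class `DescendantParagraphs`, the helper `select_paragraphs`, and the sample HTML are all hypothetical, and the sketch assumes well-formed markup with every non-void tag closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DescendantParagraphs(HTMLParser):
    """Collect the text of every <p> that has an ancestor with a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.stack = []       # one bool per open element: does it carry the class?
        self.current = None   # text pieces of the <p> being read, or None
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # A <p> matches when any element that is still open carries the class.
        if tag == "p" and any(self.stack):
            self.current = []
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(self.class_name in classes)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if tag == "p" and self.current is not None:
            self.paragraphs.append("".join(self.current).strip())
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def select_paragraphs(html, class_name):
    parser = DescendantParagraphs(class_name)
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical sample markup, not copied from the real page.
SAMPLE_HTML = """
<div class="paragraphs_item paragraphs_item__body_paragraph_bundle">
  <p>Via Roma: closed from 8:00 to 18:00.</p>
  <div><p>Via Milano: closed all day.</p></div>
</div>
<p>Footer text outside the target div.</p>
"""

result = select_paragraphs(SAMPLE_HTML, "paragraphs_item__body_paragraph_bundle")
print(result)
```

Note that the space in the selector means "any descendant", so the second p is matched even though it is nested one level deeper. With `>` instead of the space, only direct children would match.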


checcosisani
Contributor
  • Author
  • 66 replies
  • October 26, 2022

Thanks!


checcosisani
Contributor
  • Author
  • 66 replies
  • September 28, 2024

Hi

Do you know if there is any way to extract the text inside a br tag?

I use this selector:

table > tbody > tr:nth-child(-n+10) > td:nth-child(2) > strong:nth-child(5)

but I can't get at the text inside the br.

This is the website:

https://cloud.urbi.it/urbi/progs/urp/ur1ME001.sto?DB_NAME=wt00038560&w3cbt=S&StwEvent=9100030

 

Thanks

Francesco


geomancer
Evangelist
  • 932 replies
  • September 30, 2024

You can just use an HTTPCaller, a few HTMLExtractors and ListExploders, and an AttributeSplitter.

Note that there is no ‘inside a <br> tag’, as <br> has no corresponding </br> tag. <br> signifies a line break (a new line starts after <br>), so the text you are after sits next to the <br>, not inside it. FME turns <br> into <br/> (I found this out by testing).
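The point about `<br>` can be illustrated outside FME with a small Python sketch: since nothing lives inside a `<br>`, the cell content is split at each line break instead. This is only a rough analogue of splitting on `<br/>` with an AttributeSplitter, not the actual workspace. The function `split_on_br` and the sample cell are hypothetical, and the regex-based tag stripping is only meant for small fragments like this one.

```python
import re
from html import unescape

# Matches <br>, <br/> and <br /> in any letter case.
BR_PATTERN = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Crude tag stripper, adequate for a short table-cell fragment.
TAG_PATTERN = re.compile(r"<[^>]+>")


def split_on_br(fragment):
    """Return the text pieces that are separated by <br> in an HTML fragment."""
    parts = BR_PATTERN.split(fragment)
    cleaned = (unescape(TAG_PATTERN.sub("", part)).strip() for part in parts)
    return [text for text in cleaned if text]


# Hypothetical cell content, not copied from the real page.
SAMPLE_CELL = (
    "<td><strong>Order no. 12</strong><br>"
    "Published 2024-09-27<br/>Office: Public Works<br /></td>"
)

lines = split_on_br(SAMPLE_CELL)
print(lines)
```

Each piece of text before, between, or after the line breaks comes out as its own list item, which is the same idea as exploding the split result into separate features.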