Question

Web scraping


checcosisani
Contributor

Hi

Can you help me understand whether I can extract info from this webpage?

 

https://viabilita.autostrade.it/it/viabilita/previsioni

What I need is the info in the blue box

thx

Francesco

image 

6 replies

hkingsbury
Celebrity
  • Celebrity
  • July 4, 2023

It looks like this url, https://viabilita.autostrade.it/json/previsioni.json, will return the results in JSON format
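For anyone reading that endpoint outside FME, here is a minimal Python sketch using only the standard library. The layout of the returned JSON is not shown in this thread, so `describe` (a hypothetical helper) only reports the top-level shape instead of assuming any field names. The `User-Agent` value is an arbitrary placeholder.

```python
import json
from urllib.request import Request, urlopen

URL = "https://viabilita.autostrade.it/json/previsioni.json"


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    # Some servers reject requests without a User-Agent (placeholder value).
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(data):
    """Summarise the top-level shape of decoded JSON without assuming field names."""
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


# Usage (needs network access):
# print(describe(fetch_json(URL)))
```

Looking at the shape first makes it easier to decide which keys to expose as attributes afterwards.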


checcosisani
Contributor
  • Author
  • Contributor
  • July 9, 2023

Hi, thanks.

Can you also help me understand how to extract the values inside tbody tr on this website? I don't know how to handle the id:

#ID15762803 > td:nth-child(1) > a

https://alboonline.comune.genova.it/albopretorio/#/albo/140

 

thx

 

Francesco

 


hkingsbury
Celebrity
  • Celebrity
  • July 9, 2023

This one (like the original) requests data from an API. When you use an HTML Reader/Extractor you get the webpage before the API call is made. All it returns is the JS code to make those calls - in other words, there is no data in the raw HTML.
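One way to confirm this for a given page is to look at the text that is actually present in the HTML as served, ignoring scripts and styles. The sketch below is an illustration in Python with the standard library only; `VisibleText` and `visible_text` are hypothetical names, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the human-readable text contained in an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

If `visible_text` returns little or nothing for the downloaded page while the browser shows a full table, the data is being loaded afterwards by JavaScript.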

 

To find the API calls, open your browser's Developer Tools, look at the network requests, and work out which call returns the data you need. In this case, the URL that returns the data (in JSON) is:

https://alboonline.comune.genova.it/albopretorio/dispatcher/alboPretorioServlet/invoke
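A rough Python sketch of replaying such a call once it has been found in the Network tab. This thread does not say which HTTP method, headers, or request body the page sends, so everything beyond the URL is an assumption: copy the real values from the captured request. `build_request` and `call_api` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

URL = "https://alboonline.comune.genova.it/albopretorio/dispatcher/alboPretorioServlet/invoke"


def build_request(url, payload=None):
    """Build a GET request, or a JSON POST when a payload is supplied."""
    # Header values are assumptions; mirror what the browser actually sent.
    headers = {"Accept": "application/json", "User-Agent": "Mozilla/5.0"}
    body = None
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers)


def call_api(request, timeout=30):
    """Send the request and decode the JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access; the payload must come from the captured request):
# data = call_api(build_request(URL, payload_copied_from_devtools))
```

Keeping request construction separate from sending makes it easy to compare what you built against what the browser sent before hitting the server.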


checcosisani
Contributor
  • Author
  • Contributor
  • July 29, 2023

Hi,

 

Sorry to bother you, but I'm not a programmer. I'm trying to understand, but every website is different.

Here, https://serviziweb.comune.avellino.it/kweb/ap/avellino?npage=0, I received the HTML with the data, but I'm not able to extract it.

This is my CSS selector, but it returns no results:

div:nth-child(2) > div > div > div > div:nth-child(-n+10) > div > h3 > span

 

thx for help

 

 

Francesco


hkingsbury
Celebrity
  • Celebrity
  • July 31, 2023

Whilst you can pull information out of the HTML (and my preference in this instance would be the HTML Table reader), it isn't pretty. I'd reach out to the local council (I'm assuming that is who runs the site, as I don't read Italian) and ask whether they have an API that can deliver this data.

 

It will make things a lot easier
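If the HTML route is the only option, the idea behind a table reader can be sketched in Python with the standard library. The markup of the Avellino page is not shown in this thread, so this assumes the records sit in ordinary `<tr>` rows with `<td>`/`<th>` cells and that tables are not nested; `TableRows` and `extract_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

This avoids long positional selectors such as `div:nth-child(2) > div > ...`, which tend to break whenever the page layout changes.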


elhamejtehadi19
Contributor

Hey Francesco,

Yes, you can extract the info from the blue box on that page using a tool like rvest in R or BeautifulSoup in Python. You'll need to inspect the webpage and target the correct CSS selector for the blue box content.

For example, with rvest:

library(rvest)

url <- "https://viabilita.autostrade.it/it/viabilita/previsioni"
page <- read_html(url)
blue_box_info <- page %>%
  html_nodes("your_css_selector") %>%
  html_text()
print(blue_box_info)

If the page uses JavaScript to load its content, as the earlier replies indicate this one does, read_html will not see the data, and you may need Selenium for dynamic scraping. Also, if you run into issues with scraping or get blocked, tools like Multilogin can help avoid detection by masking the browser fingerprint.

