Question

Web scraping

1 year ago
July 4, 2023
6 replies
196 views

+12

checcosisani
Contributor
66 replies

Can you help me understand whether I can extract info from this webpage?

https://viabilita.autostrade.it/it/viabilita/previsioni

What I need is the info in the blue box

thx

Francesco

+55

hkingsbury
Celebrity
1503 replies
1 year ago
July 4, 2023

It looks like this url, https://viabilita.autostrade.it/json/previsioni.json, will return the results in JSON format
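A minimal sketch of reading that JSON URL from Python using only the standard library. The structure of the returned JSON is not described in this thread, so the `describe` helper below (a name made up for this example) just lists the keys and value types it finds rather than assuming any field names. The `User-Agent` header is an assumption too; some servers refuse requests without one.

```python
import json
from urllib.request import Request, urlopen

# URL taken from the reply above; the shape of the JSON it returns is not
# documented in this thread, so nothing below assumes particular keys.
URL = "https://viabilita.autostrade.it/json/previsioni.json"


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    # Assumption: a browser-like User-Agent avoids being rejected outright.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(data, depth=0, max_depth=2):
    """Return text lines summarising the structure of decoded JSON."""
    indent = "  " * depth
    if isinstance(data, dict):
        lines = []
        for key, value in data.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(describe(value, depth + 1, max_depth))
        return lines
    if isinstance(data, list) and data:
        # Only the first element is inspected, assuming the items are alike.
        lines = [f"{indent}[list of {len(data)} items, first shown]"]
        if depth < max_depth:
            lines.extend(describe(data[0], depth + 1, max_depth))
        return lines
    return []
```

Running `print("\n".join(describe(fetch_json(URL))))` should show which keys hold the values displayed in the blue box, provided the URL is still reachable and still returns JSON.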

+12

checcosisani
Author
Contributor
66 replies
1 year ago
July 9, 2023

Hi thx

Can you also help me understand how to extract the value inside tbody tr on this website? I don't know how to handle the id.

#ID15762803 > td:nth-child(1) > a

https://alboonline.comune.genova.it/albopretorio/#/albo/140

thx

Francesco

+55

hkingsbury
Celebrity
1503 replies
1 year ago
July 9, 2023

This one (like the original) requests data from an API. When you use an HTML Reader/Extractor you get the webpage before the API call is made. All it returns is the JS code to make those calls - in other words, there is no data in the raw HTML.
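One way to check this for any page is to count the tags in the HTML that comes back before any JavaScript runs. The sketch below uses Python's standard `html.parser`. The `SHELL` string is a made-up stand-in for the kind of page shell a JavaScript-driven site sends, not the actual response from the site above.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to number of occurrences."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


# HYPOTHETICAL page shell: a script reference and an empty mount point.
SHELL = """
<html><head><script src="app.js"></script></head>
<body><div id="app"></div></body></html>
"""

counts = count_tags(SHELL)
# No table rows, but at least one script: the data arrives later via an API.
print(counts.get("tr", 0), counts.get("script", 0))
```

If the raw HTML of a real page shows no `tr` elements while the browser displays a full table, the rows are almost certainly being filled in by a later request.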

To find the API calls, open your browser's Developer Tools, look at the network requests, and work out which call returns the data you need. In this case, the URL that returns the data (in JSON) is:

https://alboonline.comune.genova.it/albopretorio/dispatcher/alboPretorioServlet/invoke
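A rough sketch of replaying such a call from Python once it has been found in the Network tab. The thread does not say whether this endpoint expects GET or POST, which headers it needs, or what the request body looks like, so everything about the payload here is a placeholder. Copy the real method, headers and body from the request shown in Developer Tools. `build_request` and `call_api` are names made up for this example.

```python
import json
from urllib.request import Request, urlopen

# Endpoint named in the reply above.
API_URL = (
    "https://alboonline.comune.genova.it/albopretorio/"
    "dispatcher/alboPretorioServlet/invoke"
)


def build_request(url, payload=None, headers=None):
    """Build a GET request, or a JSON POST when a payload is given."""
    all_headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
    all_headers.update(headers or {})
    body = None
    if payload is not None:
        # Assumption: the server accepts a JSON body. If DevTools shows
        # form data instead, encode it with urllib.parse.urlencode.
        body = json.dumps(payload).encode("utf-8")
        all_headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=all_headers)


def call_api(request, timeout=30):
    """Send the request and decode the response body as JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# HYPOTHETICAL payload: the keys below are placeholders, not the site's API.
example_payload = {"placeholder_key": "placeholder_value"}
request = build_request(API_URL, payload=example_payload)
print(request.get_method(), request.full_url)
```

With the real payload in place, `call_api(request)` would return the decoded JSON. The sketch stops short of sending anything because the placeholder body is unlikely to be accepted.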

+12

checcosisani
Author
Contributor
66 replies
1 year ago
July 29, 2023

Hi,

Sorry to bother you, but I'm not a programmer. I'm trying to understand, but every website is different.

Here, https://serviziweb.comune.avellino.it/kweb/ap/avellino?npage=0, I received the HTML with the data, but I'm not able to extract it.

This is my CSS selector, but it returns no results: div:nth-child(2)> div > div > div > div:nth-child(-n+10) > div > h3 > span

thx for help

Francesco

+55

hkingsbury
Celebrity
1503 replies
1 year ago
July 31, 2023

Whilst you can pull information out of the HTML (and my preference in this instance would be the HTML Table reader), it isn't pretty. I'd reach out to the local council (an assumption on my part, as I don't know Italian) and ask whether they have an API that can deliver this data.

It will make things a lot easier.
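If the data really is in the HTML, a shorter selector is usually more robust than a long chain of `div > div > div`. The sketch below is a swapped-in alternative to the HTML Table reader mentioned above: it uses Python's standard `html.parser` to collect the text of every `span` found inside an `h3`, which is how the selector in the question ends. It matches any `span` nested in an `h3`, not only direct children. The `SAMPLE` markup is invented for illustration, since the real page structure is not shown in this thread.

```python
from html.parser import HTMLParser


class HeadingSpanExtractor(HTMLParser):
    """Collect the text of every <span> nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.h3_depth = 0    # greater than 0 while inside an <h3>
        self.span_depth = 0  # greater than 0 while inside a <span> in an <h3>
        self.current = []    # text fragments of the span being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.h3_depth += 1
        elif tag == "span" and self.h3_depth:
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth:
            self.span_depth -= 1
            if self.span_depth == 0:
                # Collapse runs of whitespace and newlines into single spaces.
                text = " ".join("".join(self.current).split())
                if text:
                    self.results.append(text)
                self.current = []
        elif tag == "h3" and self.h3_depth:
            self.h3_depth -= 1

    def handle_data(self, data):
        if self.span_depth:
            self.current.append(data)


def extract_h3_spans(html):
    """Return the text of each <span> that sits inside an <h3>."""
    parser = HeadingSpanExtractor()
    parser.feed(html)
    parser.close()
    return parser.results


# HYPOTHETICAL markup standing in for the real page.
SAMPLE = """
<div><h3><span>Notice 12/2023</span></h3><p><span>ignored</span></p></div>
<div><h3><span>  Notice
 13/2023 </span></h3></div>
"""

titles = extract_h3_spans(SAMPLE)
print(titles)
```

Spans outside an `h3` are skipped, so the result depends only on the heading structure and not on how many wrapper `div` elements surround it.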

elhamejtehadi19
Contributor
2 replies
8 months ago
October 16, 2024

Hey Francesco,

Yes, you can extract the info from the blue box on that page using a tool like rvest in R or BeautifulSoup in Python. You'll need to inspect the webpage and target the correct CSS selector for the blue box content.

For example, with rvest:

library(rvest)

url <- "https://viabilita.autostrade.it/it/viabilita/previsioni"
page <- read_html(url)
blue_box_info <- page %>%
  html_nodes("your_css_selector") %>%
  html_text()
print(blue_box_info)

If the page uses JavaScript to load content, you may need Selenium for dynamic scraping. Also, if you run into issues with scraping or get blocked, tools like Multilogin can help avoid detection by masking the browser fingerprint.
