
Hi there, I would like to scrape the data from this page. Specifically, I want the lat/longs of the pins on the map and the underlying water outage data. I can't find an easy way to do this and keep going around in circles trying to extract the data.

https://www.ccc.govt.nz/services/water-and-drainage/water-supply/maintenance-and-repairs/water-status-map/#12

 

It looks like the site you are trying to access is JavaScript-based. Unfortunately, there is not much FME can do to retrieve data directly from that URL. If the site has an API, you will likely be able to access the data through its endpoints using the HTTPCaller.


There is open data for water assets: https://sdi.ccc.govt.nz/apollo-portal/ApolloPro.aspx

 


Scraping data from JavaScript-based sites, like the one you mentioned, can be a bit tricky since the content is often loaded dynamically. Here are a few steps you can take to extract the lat/longs of the pins and the related data:

1. Check for API Endpoints

  • Sometimes, the data on a map is served through an API. Use your browser's Developer Tools (usually accessed with F12) and navigate to the "Network" tab. Refresh the page and look for any requests that contain the data you're after, like JSON files or API endpoints.

2. Use Web Scraping Tools

  • If no API is available, you may need a browser automation tool such as Selenium, Playwright, or Puppeteer. These drive a real browser, let the page's JavaScript run, and then let you read the dynamically loaded content from the rendered page.

3. Check Open Data Portals

  • There appears to be an open data portal for water assets, the Apollo Portal (https://sdi.ccc.govt.nz/apollo-portal/ApolloPro.aspx). You might find the relevant data there instead of scraping it from the website, possibly including spatial data that matches what is displayed on the map.

4. Manual Extraction for Smaller Data Sets

  • If the number of data points is small, you can manually extract them using tools like QGIS or even Google Earth if you can export the map data.
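Step 1 can be sketched in Python once a candidate request has been spotted in the Network tab. This is only a minimal sketch: the endpoint URL in the usage comment is a placeholder, not a known endpoint of the site, and the helper names `fetch_json` and `geometry_summary` are made up for illustration. The idea is to download the response and quickly check whether it is a GeoJSON FeatureCollection worth processing further.

```python
import json
import urllib.request
from collections import Counter


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def geometry_summary(payload):
    """Count geometry types if the payload is a GeoJSON FeatureCollection.

    Returns an empty Counter for anything else, so it is safe to call on
    arbitrary JSON responses found in the Network tab.
    """
    if not isinstance(payload, dict) or payload.get("type") != "FeatureCollection":
        return Counter()
    return Counter(
        (feature.get("geometry") or {}).get("type", "none")
        for feature in payload.get("features", [])
    )


# Usage (placeholder URL, replace with a request copied from the Network tab):
# payload = fetch_json("https://example.org/path/to/outages.json")
# print(geometry_summary(payload))
```

If the summary comes back empty, the response is probably not GeoJSON and another request in the Network tab may be the one carrying the map data.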

If you also need to manage multiple web profiles while scraping JavaScript-based sites, an anti-detect browser such as Multilogin is one option. It lets you run several separate browsing environments, which helps with anonymity.

If you have any specific data-related questions or need help with tools, feel free to ask!


This appears to be doable. All the map data is available as GeoJSON inside the HTML page.

As all the GeoJSON is contained on a single line, it is easier to read the HTML page as a text file and identify the right feature.

Some string juggling followed by some JSON processing gets you the features of the map.

See the attached workspace. Many attributes still need to be exposed.
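Outside FME, the same idea (read the page as text, locate the embedded GeoJSON, then process it as JSON) could be sketched in Python roughly as follows. This is a hedged illustration, not the contents of the attached workspace: the function names are hypothetical, and it assumes each FeatureCollection appears in the page as a complete, valid JSON object that is not nested inside another JSON object. It tries to decode a JSON value at each opening brace and keeps the ones that turn out to be FeatureCollections.

```python
import json
import urllib.request


def load_page(url, timeout=30):
    """Read the HTML page as plain text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def find_feature_collections(html_text):
    """Return every top-level GeoJSON FeatureCollection embedded in the text."""
    decoder = json.JSONDecoder()
    found = []
    pos = html_text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns the value plus the index where it ended.
            obj, end = decoder.raw_decode(html_text, pos)
        except json.JSONDecodeError:
            pos = html_text.find("{", pos + 1)  # not JSON here, try next brace
            continue
        if isinstance(obj, dict) and obj.get("type") == "FeatureCollection":
            found.append(obj)
        pos = html_text.find("{", end)  # skip past the object just parsed
    return found


def point_coordinates(collection):
    """Yield (lat, lon, properties) for each Point feature."""
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            lon, lat = geometry["coordinates"][:2]  # GeoJSON stores lon, lat
            yield lat, lon, feature.get("properties", {})


# Usage, with the page URL from the question:
# html = load_page("https://www.ccc.govt.nz/services/water-and-drainage/"
#                  "water-supply/maintenance-and-repairs/water-status-map/")
# for collection in find_feature_collections(html):
#     for lat, lon, props in point_coordinates(collection):
#         print(lat, lon, props)
```

Scanning from every opening brace is slower than searching for a known variable name, but it avoids guessing how the page assigns the GeoJSON, which may change over time.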

 

