Question

Web Scraping COVID Data from U.S. Department of State

  • 20 March 2020
  • 2 replies
  • 3 views

Hello FME community,

I have never used FME to scrape web pages. I would like to, but I don't know where to start. The web source I am looking at provides country-specific COVID information.

What I would like to do is scrape each of the country links provided on the site and pull data into a tabular format. Eventually, I would like to join this data to global admin boundaries.

Would this be possible using FME? Does anyone know where I could start? Thanks, FME Family!

2 replies

Here's something to get you started:

covid19scraper.fmw

Use an HTML Table reader to grab the table from that web page and split it into country and url attributes, then use an HTTPCaller to fetch each URL separately (you'll probably want to add a Decelerator before the HTTPCaller to avoid hammering the server). Then use a StringSearcher to start parsing the web pages. That's where my regex skills fall a bit short...
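
For anyone who wants to prototype the same steps outside the workspace (or in a PythonCaller), here is a minimal Python sketch of that flow. It is not what covid19scraper.fmw does internally. It assumes the index page is an HTML table whose cells contain one link per country; `CountryLinkParser`, `extract_country_links`, and `fetch_all` are made-up names, and `fetch` is whatever function you supply to download a URL.

```python
import time
from html.parser import HTMLParser


class CountryLinkParser(HTMLParser):
    """Collect (country, url) pairs from links found inside table cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.current_href = None
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            # Links without an href leave current_href as None and are ignored.
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            country = "".join(self.current_text).strip()
            self.links.append((country, self.current_href))
            self.current_href = None
        elif tag == "td":
            self.in_cell = False


def extract_country_links(html):
    """Return a list of (country, url) tuples from the index page HTML."""
    parser = CountryLinkParser()
    parser.feed(html)
    return parser.links


def fetch_all(links, fetch, delay_seconds=1.0):
    """Call fetch(url) for each link, pausing between requests
    (the same idea as the Decelerator: don't hammer the server)."""
    pages = {}
    for i, (country, url) in enumerate(links):
        if i > 0:
            time.sleep(delay_seconds)
        pages[country] = fetch(url)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library, so you can try the parsing on a saved copy of the page first.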

From what I could quickly find out, most of those pages follow a fairly similar format, but not all of them. I'm also not sure exactly what you are looking for. The number of confirmed cases seems to appear in a sentence like "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."
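
As a rough starting point for the regex, here is a small Python sketch that pulls the number out of a sentence of that shape. It assumes the wording is close to the example above; pages that phrase it differently will simply return no match. The country name is not captured because it is already known from the index table. `CASES_PATTERN` and `extract_confirmed_cases` are made-up names.

```python
import re

# Matches e.g. "has 1,234 confirmed cases of COVID-19".
# The digits may contain thousands separators; "case" or "cases" both match.
CASES_PATTERN = re.compile(
    r"has\s+(\d[\d,]*)\s+confirmed\s+cases?\s+of\s+COVID-19",
    re.IGNORECASE,
)


def extract_confirmed_cases(text):
    """Return the confirmed case count as an int, or None if the
    expected sentence is not found in the page text."""
    match = CASES_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

For example, `extract_confirmed_cases("Italy has 1,234 confirmed cases of COVID-19 within its borders.")` returns `1234`. A similar pattern should be usable in the StringSearcher, though I have not tested it there.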

Hope this helps!

This is such a great start. I will take a closer look. Thank you, redgeographics!
