Web Scraping COVID Data from U.S. Department of State

Question

Hello FME community,

I have never used FME to scrape web pages, but I would like to, and I don't know where to start. The web source I am looking at provides country-specific COVID information.

What I would like to do is scrape each of the country links provided on the site and pull data into a tabular format. Eventually, I would like to join this data to global admin boundaries.

Would this be possible using FME? Does anyone know where I could start? Thanks, FME Family!

redgeographics · Answer

Here's something to get you started: covid19scraper.fmw

Use an HTML Table reader to grab the table from that web page and split it out into country and url attributes, then use an HTTPCaller to fetch every url separately (you'll probably want to add a Decelerator before the HTTPCaller to avoid hammering the server too much). After that, a StringSearcher can start parsing the web pages. That's where my regex skills fall a bit short...

From what I was quickly able to find out, most of those pages follow a fairly similar format, but not all of them. I'm also not sure what exactly you are looking for. The number of confirmed cases seems to be in a sentence like "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."

Hope this helps!
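For the parsing step, here is a rough Python sketch of a regular expression for the confirmed-cases sentence quoted above. It is not taken from the attached workspace: the pattern is modeled only on that one example sentence, the function name `extract_confirmed_cases` is made up for illustration, and pages that word things differently will simply return no match.

```python
import re

# Pattern modeled on the single example sentence from the thread
# ("[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders").
# Assumption: the number uses digits with optional comma separators.
CASES_PATTERN = re.compile(
    r"has\s+(\d[\d,]*)\s+confirmed\s+cases\s+of\s+COVID-19",
    re.IGNORECASE,
)


def extract_confirmed_cases(page_text):
    """Return the confirmed-case count as an int, or None if not found."""
    match = CASES_PATTERN.search(page_text)
    if match is None:
        # The page does not follow the expected wording.
        return None
    # Strip thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    # Hypothetical sample text, not real data from the site.
    sample = "Freedonia has 1,234 confirmed cases of COVID-19 within its borders."
    print(extract_confirmed_cases(sample))
```

The same pattern could probably be used as the regular expression in the StringSearcher, with the captured group holding the number. Since not all pages follow the same format, it is worth routing the features that don't match to an Inspector to see what wording they use instead.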
