Web Scraping COVID Data from U.S. Department of State

Question

Hello FME community,

I have never used FME to scrape web pages, but I would like to and don't know where to start. The web source I am looking at provides country-specific COVID information.

What I would like to do is scrape each of the country links provided on the site and pull data into a tabular format. Eventually, I would like to join this data to global admin boundaries.

Would this be possible using FME? Does anyone know where I could start? Thanks, FME Family!

redgeographics · Answer

Here's something to get you started: covid19scraper.fmw

It uses an HTML Table reader to grab the table from that web page and splits it into country and url attributes, then an HTTPCaller to fetch every URL separately (you'll probably want to add a Decelerator before the HTTPCaller to avoid hammering the server too much). After that, a StringSearcher starts parsing the web pages. That's where my regex skills fall a bit short...

From what I could quickly find out, most of those pages follow a fairly similar format, but not all of them. I'm also not sure exactly what you are looking for. The number of confirmed cases seems to appear in a sentence like "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."

Hope this helps!
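For the parsing step, here is a rough sketch of a regular expression that targets the "has x,xxx confirmed cases of COVID-19" sentence quoted in the answer. It is written in Python only to make the pattern easy to try out; the function name `extract_confirmed_cases` and the sample country are made up for illustration, and the pattern assumes the pages really do use that wording. Pages that phrase it differently will simply return no match.

```python
import re

# Assumed sentence shape, taken from the example quoted in the answer:
# "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."
CASES_PATTERN = re.compile(
    r"has\s+(\d[\d,]*)\s+confirmed\s+cases\s+of\s+COVID-19",
    re.IGNORECASE,
)


def extract_confirmed_cases(page_text):
    """Return the confirmed case count as an int, or None if the sentence is absent."""
    match = CASES_PATTERN.search(page_text)
    if match is None:
        return None
    # Strip thousands separators before converting, e.g. "1,234" becomes 1234.
    return int(match.group(1).replace(",", ""))


# Hypothetical sample text, not taken from the real site.
sample = "Freedonia has 1,234 confirmed cases of COVID-19 within its borders."
print(extract_confirmed_cases(sample))
```

The same pattern text could probably be pasted into a regex-based transformer as a starting point, with the captured group holding the number, but that is untested here.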
