Question

Web Scraping COVID Data from U.S. Department of State


scobierob

Hello FME community,

I have never used FME to scrape web pages. I would like to, but I don't know where to start. The web source I am looking at provides country-specific COVID information.

What I would like to do is scrape each of the country links provided on the site and pull data into a tabular format. Eventually, I would like to join this data to global admin boundaries.

Would this be possible using FME? Does anyone know where I could start? Thanks, FME Family!

2 replies

redgeographics

Here's something to get you started:

covid19scraper.fmw

 

Use an HTML Table reader to grab the table from that web page, split it out into country and url attributes, then use an HTTPCaller to fetch every URL separately (you'll probably want to add a Decelerator before the HTTPCaller to avoid hammering the server too much). Then use a StringSearcher to start parsing the web pages. That's where my regex skills fall a bit short...
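
For anyone who wants to prototype the fetch step outside FME first, here is a rough Python sketch of the same idea: request each country URL one at a time with a pause in between, much like a Decelerator in front of an HTTPCaller. The names `fetch`, `fetch_all`, and `delay_seconds` are made up for illustration, and the one-second default is only a guess at a polite delay.

```python
import time
import urllib.request


def fetch(url, timeout=30):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(country_urls, delay_seconds=1.0, fetcher=fetch):
    """Fetch each (country, url) pair in order, pausing between requests.

    The pause plays the role of the Decelerator; the fetcher argument
    lets you swap in a stub so the loop can be tried without network access.
    """
    pages = {}
    for index, (country, url) in enumerate(country_urls):
        if index > 0:
            time.sleep(delay_seconds)  # throttle: no sleep before the first request
        pages[country] = fetcher(url)
    return pages
```

Passing a stub as `fetcher` makes it easy to check the loop before pointing it at the real site.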

From what I could quickly find out, most of those pages follow a fairly similar format, but not all of them. I'm also not sure what exactly you are looking for. The number of confirmed cases seems to appear in a sentence like "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."
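
As a starting point for the regex part, here is a small Python sketch that pulls the number out of a sentence shaped like the one quoted above. It assumes the wording is exactly "has x,xxx confirmed cases of COVID-19"; pages that phrase it differently will simply return no match. The function name `extract_confirmed_cases` and the sample sentence are invented for illustration. A similar pattern could probably be tried in the StringSearcher, though I have not tested that.

```python
import re

# Assumed sentence shape: "... has 1,234 confirmed cases of COVID-19 ..."
# The capture group holds the number, thousands separators included.
CASES_RE = re.compile(
    r"has\s+(\d[\d,]*)\s+confirmed\s+cases\s+of\s+COVID-19",
    re.IGNORECASE,
)


def extract_confirmed_cases(page_text):
    """Return the confirmed-case count as an int, or None if no match."""
    match = CASES_RE.search(page_text)
    if match is None:
        return None
    # Strip the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


# Invented sample text, not real data.
sample = "Exampleland has 1,234 confirmed cases of COVID-19 within its borders."
print(extract_confirmed_cases(sample))
```

Since the country name already comes from the table, the sketch only extracts the count; pages that return `None` are the ones to inspect by hand.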

Hope this helps!


scobierob
  • Author
  • March 20, 2020

This is such a great start. I will take a closer look. Thank you, redgeographics.

