Question

Web Scraping COVID Data from U.S. Department of State

  • March 20, 2020
  • 2 replies
  • 49 views

scobierob

Hello FME community,

I have never used FME to scrape web pages, but I would like to, and I don't know where to start. The web source I am looking at provides country-specific COVID information.

What I would like to do is scrape each of the country links provided on the site and pull data into a tabular format. Eventually, I would like to join this data to global admin boundaries.

Would this be possible using FME? Does anyone know where I could start? Thanks, FME Family!

 



redgeographics

Here's something to get you started:

covid19scraper.fmw

 

Use the HTML Table reader to grab the table from that web page and split it out into country and url attributes, then use an HTTPCaller to fetch every URL separately (you'll probably want to add a Decelerator before the HTTPCaller to avoid hammering the server too much). After that, a StringSearcher can start parsing through the web pages. That's where my regex skills let me down a bit...
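If it helps to see the same flow outside of a workspace, here is a rough Python sketch of those steps: pull (country, url) pairs out of a table, then fetch each page with a pause in between. It is only an illustration, not what the attached workspace does. The names `LinkTableParser`, `country_links` and `fetch_all` are made up for this example, and it assumes each country is an ordinary link inside a table cell, which may not hold for the real page.

```python
import time
from html.parser import HTMLParser


class LinkTableParser(HTMLParser):
    """Collect (link text, href) pairs from anchors found inside table cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            # Assumption: the country name is the text of a link in a cell.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag in ("td", "th"):
            self._in_cell = False


def country_links(html):
    """Return a list of (country, url) tuples found in the table markup."""
    parser = LinkTableParser()
    parser.feed(html)
    return parser.rows


def fetch_all(rows, fetch, delay_seconds=1.0):
    """Call fetch(url) for every row, sleeping between requests.

    The sleep plays the role of the Decelerator: it keeps the requests
    from hitting the server back to back.
    """
    pages = {}
    for i, (country, url) in enumerate(rows):
        if i:
            time.sleep(delay_seconds)
        pages[country] = fetch(url)
    return pages
```

The `fetch` argument is left to the caller so the throttling can be tried without touching the network. For real requests, something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8", "replace")` from the standard library should do.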

From what I could quickly find out, most of those pages follow a fairly similar format, but not all of them. I'm also not sure exactly what you are looking for. The number of confirmed cases seems to appear in a sentence like "[COUNTRY] has x,xxx confirmed cases of COVID-19 within its borders."
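For the regex part, a pattern along these lines might be a starting point. It is a minimal Python sketch that assumes the sentence really has the shape quoted above; pages that word it differently will simply return no match. The names `CASES_RE` and `extract_cases` are invented for this example. The country is not captured, since it is already available as an attribute from the table. The same expression can probably be adapted for the StringSearcher, but I have not tested that.

```python
import re

# Assumed sentence shape: "... has 1,234 confirmed cases of COVID-19 ...".
# The number is either grouped with commas (1,234) or plain digits (12).
CASES_RE = re.compile(
    r"\bhas\s+(\d{1,3}(?:,\d{3})+|\d+)\s+confirmed\s+cases?\s+of\s+COVID-19",
    re.IGNORECASE,
)


def extract_cases(text):
    """Return the confirmed case count as an int, or None if no match."""
    match = CASES_RE.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    sample = "France has 1,234 confirmed cases of COVID-19 within its borders."
    print(extract_cases(sample))
```

Returning `None` for pages that don't match makes it easy to list the countries that need a different pattern or a manual look.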

Hope this helps!


scobierob
  • Author
  • March 20, 2020

This is such a great start. I will take a closer look. Thank you, redgeographics!