Question

WEBSITE HTML Table Read, Parsing to specific table format

  • 30 July 2019
  • 1 reply
  • 10 views

Hey FME power users, I've been working through how to scrape websites with the HTTPCaller and the HTML Table Reader, which are very powerful. Parsing and storing the data in a table format has been challenging, though. Eventually, the goal is to scrape or directly read the website, store the data in a table for updates, and then update or build the website.

From the information provided below, I'm having difficulty associating each COMM (col2) with its PRODUCT_NAME (col1) to create the relationship. I'm not trying to make it elegant, just trying to walk through how to parse strings effectively while following best practices from the FME community's experience.

Here's a snapshot of how far I have gotten with the output parsed by the HTML Table Reader:

My goal is to create a table with the following data in this form, based on the table snapshot above:

ALPHA_CODE  COMM     PRODUCT_NAME                       HAZ_RATING
A           ALFALFA  ROUNDUP POWERMAX HERBICIDE         (1 1 1)
A           ALFALFA  BRANDT ONSITE                      (X X X)
A           ALFALFA  PLANT HEALTH TECHNOLOGIES LOAD-UP  (X X X)
A           ALMOND   INTREPID 2F                        (0 0 0)
A           ALMOND   ROUNDUP POWERMAX HERBICIDE         (1 1 1)
A           ALMOND   DUPONT ALTACOR INSECT CONTROL      (1 0 0)

I have tried chaining various transformers, such as the SubstringExtractor, AttributeTrimmer, and StringReplacer, without success. I'm currently using FME Workbench 2019.

Thanks in advance - Suzanne

@takashi


1 reply

Hi Suzanne,

As you read the file in, you can detect whether a line is an Alpha code (with a Tester checking whether Col2 starts with "-"); if so, extract just the letter and then use a VariableSetter to set an Alpha_Code variable.

Similarly use a Tester to detect the Comm line (Col1 = "INDEX") and set a COMM variable.

For the data lines, use a SubstringExtractor to split the attribute into Product_Name and Haz_Rating, and then use VariableRetrievers to attach the Alpha Code and Comm to them.
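
To make the carry-forward idea concrete, here is a rough Python sketch of the same logic outside of FME. It is only an illustration: the row layout in `SAMPLE_ROWS` is a guess (the snapshot is not reproduced here), and the names `flatten`, `DATA_LINE`, and `SAMPLE_ROWS` are hypothetical. Two plain variables play the role of the VariableSetter and VariableRetriever pair, and a regular expression stands in for the SubstringExtractor.

```python
import re

# Hypothetical (col1, col2) rows; the real HTML Table Reader output may differ.
SAMPLE_ROWS = [
    ("", "- A -"),                               # Alpha code line: col2 starts with "-"
    ("INDEX", "ALFALFA"),                        # Comm line: col1 equals "INDEX"
    ("ROUNDUP POWERMAX HERBICIDE (1 1 1)", ""),  # data line: name plus rating
    ("BRANDT ONSITE (X X X)", ""),
    ("INDEX", "ALMOND"),
    ("INTREPID 2F (0 0 0)", ""),
]

# Assumption: a data line is a product name followed by a trailing
# parenthesised hazard rating such as "(1 1 1)".
DATA_LINE = re.compile(r"^(?P<name>.+?)\s*(?P<rating>\([^()]*\))\s*$")


def flatten(rows):
    """Attach the most recent Alpha code and Comm to every data line."""
    alpha_code = None  # plays the role of the Alpha_Code variable
    comm = None        # plays the role of the COMM variable
    records = []
    for col1, col2 in rows:
        col1, col2 = col1.strip(), col2.strip()
        if col2.startswith("-"):
            # Alpha code line: keep just the first letter, reset the Comm.
            letter = re.search(r"[A-Za-z]", col2)
            alpha_code = letter.group(0) if letter else None
            comm = None
            continue
        if col1 == "INDEX":
            # Comm line: remember the commodity for the data lines that follow.
            comm = col2
            continue
        match = DATA_LINE.match(col1)
        if match:
            records.append({
                "ALPHA_CODE": alpha_code,
                "COMM": comm,
                "PRODUCT_NAME": match.group("name"),
                "HAZ_RATING": match.group("rating"),
            })
    return records


if __name__ == "__main__":
    for record in flatten(SAMPLE_ROWS):
        print(record)
```

The important part is the order of the checks: header lines only update the remembered values, and data lines read them back, which is why the features need to be processed in their original order.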
