Solved

How can I isolate HTML Table values <td> based on attribute name?

  • October 28, 2020
  • 2 replies
  • 111 views

johnk
Contributor

This question is similar to one already asked below, but I couldn't crack the answer: https://community.safe.com/s/question/0D54Q000080hAQuSAM/css-selcetors-in-html-extractor?t=1603847778190

 

I would like to take the table as displayed in the xlsx example (a mock-up of an HTML table on a webpage) and turn the values into attributes, with one row per ID, so that I can build an easy-to-use list of IDs with their associated attributes.

The tables (there are many across different URLs) are structured as in the 'HTML Table code example' (one table per ID), and I am having trouble isolating the values of the associated attributes (in bold in the xlsx table example). The attribute names are always the same and in the same order, but not necessarily on the same row numbers every time. I tried isolating them by _element_index (using HTMLExtractor and ListExploder), but the indexes are not consistent across all tables, because attributes such as location can have multiple values. So I need to be able to recognise where, for example, td = ID and then retrieve the next td value, which would be the ID value, and so on.

 

Hope this makes sense, I'm very new to HTML and CSS selectors! Thanks, FME team.


2 replies

ebygomm
Influencer
  • Best Answer
  • October 28, 2020

There may be better ways, but I would probably go with the following

  • HTMLExtractor to get each table row: set the CSS selector to tr and return the matches as a list
  • Explode the list so you have one feature per table row
  • A second HTMLExtractor on each row to get the Name and Value
    • td:nth-of-type(1) to get the value of the first td in the row
    • td:nth-of-type(2) to get the value of the second td in the row

 

The location values will probably need some further tidying up, but that should be straightforward.
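
For anyone who wants to see the same row-by-row idea outside of FME, here is a minimal Python sketch using only the standard library. It is not what the transformers do internally, just an analogue of the selectors above: take each tr, read the first td as the name and the second td as the value, and build one record per table. The sample table, the attribute names (ID, Location, Status) and the helper names `RowCellParser` and `table_to_record` are all made up for illustration, and joining repeated names with "; " is only one possible way to handle attributes that have several values.

```python
from html.parser import HTMLParser


class RowCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text chunks of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_record(html):
    """Return {name: value} from the first and second <td> of each row."""
    parser = RowCellParser()
    parser.feed(html)
    record = {}
    for cells in parser.rows:
        if len(cells) < 2 or not cells[0]:
            continue  # skip rows without a name/value pair (e.g. header rows)
        name, value = cells[0], cells[1]
        if name in record:
            # Assumption: a repeated name means a multi-valued attribute.
            record[name] += "; " + value
        else:
            record[name] = value
    return record


# Hypothetical table, invented for this example (one table per ID).
SAMPLE = """
<table>
  <tr><td>ID</td><td>A-001</td></tr>
  <tr><td>Location</td><td>North</td></tr>
  <tr><td>Location</td><td>South</td></tr>
  <tr><td>Status</td><td>Open</td></tr>
</table>
"""

record = table_to_record(SAMPLE)
print(record)
```

Because the lookup is by name rather than by row position, it does not matter which row number an attribute lands on, which is the same reason the nth-of-type approach avoids the _element_index problem.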


johnk
Contributor
  • Author
  • October 29, 2020
Thank you @ebygomm, this is great and has solved my issue :)

