Solved

How can I isolate HTML Table values <td> based on attribute name?

4 years ago
October 28, 2020
2 replies
129 views

johnk
Contributor
22 replies

This question is similar to one already asked (linked below), but I couldn't work out the answer from it: https://community.safe.com/s/question/0D54Q000080hAQuSAM/css-selcetors-in-html-extractor?t=1603847778190

I would like to take the table as displayed in the xlsx example (a mock-up of an HTML table on a webpage) and turn the values into attributes, one row per ID, so that I can build an easy-to-use list of IDs with their associated attributes.

The tables (there are many across different URLs) are structured as in the 'HTML Table code example' (one table per ID), and I am having trouble isolating the values of the associated attributes (in bold in the xlsx table example). The attribute names are always the same and in the same order, but not necessarily on the same row numbers every time. I tried isolating them by _element_index (using HTMLExtractor and ListExploder), but the indices are not consistent across all tables (attributes such as location can have multiple values), so I need to be able to recognise where, for example, td = ID and retrieve the next td value, which would be the ID value, and so on for the other attributes.

Hope this makes sense, I'm very new to HTML and CSS selectors! Thanks FME team.


ebygomm
Influencer
3306 replies
Best Answer
4 years ago
October 28, 2020

There may be better ways, but I would probably go with the following:

1. HTMLExtractor to get each table row: CSS selector tr, return format list
2. Explode the list so you have a feature per table row
3. A second HTMLExtractor to get the Name and Value
   - td:nth-of-type(1) to get the value of the first td in the row
   - td:nth-of-type(2) to get the value of the second td in the row

The location values will probably need some further tidying up, but that should be straightforward.
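For anyone who wants to see what those selectors pick out, here is a minimal Python sketch of the same idea outside FME: split the table into rows, then treat the first td of each row as the name and the second as the value. It is only an illustration, not the workspace itself. The sample HTML, the attribute names and the function name table_to_record are made up for the example (the real attachments are not reproduced here), and it assumes a multi-valued attribute such as location shows up as repeated rows with the same name, which may not match the real tables.

```python
from html.parser import HTMLParser


class RowCellParser(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> it sits in."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text chunks of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def table_to_record(html):
    """Map first-td text (name) to a list of second-td texts (values)."""
    parser = RowCellParser()
    parser.feed(html)
    record = {}
    for cells in parser.rows:
        if len(cells) < 2:
            continue  # skip rows without a name/value pair
        name, value = cells[0], cells[1]
        record.setdefault(name, []).append(value)
    return record


# Hypothetical table; the real structure is in the thread's attachments.
SAMPLE = """
<table>
  <tr><td>ID</td><td>A-001</td></tr>
  <tr><td>Location</td><td>North yard</td></tr>
  <tr><td>Location</td><td>South yard</td></tr>
  <tr><td>Status</td><td>Open</td></tr>
</table>
"""

record = table_to_record(SAMPLE)
```

Keeping every value in a list means a name that appears on several rows does not overwrite itself, which is roughly the "tidying up" the location values would need before being joined into a single attribute.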

johnk
Author
Contributor
22 replies
4 years ago
October 29, 2020

Thank you @ebygomm, this is great and has solved my issue :)

2 Attachments