Solved

Extract table from webpage and hyperlinks in table

4 years ago
January 13, 2021
3 replies
114 views

johnscreekga
2 replies

I'm trying to extract all of the data from a table on a webpage, along with the hyperlinks embedded in one of its columns. The HTML table reader only returns the data displayed on the webpage, not the hyperlinks behind it. On the flip side, I can only get the HTMLExtractor transformer to collect the first record in the table. Attached is my workbench... what can I do to collect all of the records in the table?


warrendev
Enthusiast
119 replies
4 years ago
January 13, 2021

Hi @johnscreekga ,

One approach that should work is to set the output in the HTMLExtractor to return a list. Then you can explode the list items and join them by element index. That should return all items in the table.
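For readers who find code easier to follow than a workspace, here is a rough sketch of the "explode, then join by element index" idea in plain Python. It is an illustration outside FME, not FME code: the helper names `explode` and `join_on_index` and the sample values are made up for the example, and the `_element_index` key simply mirrors the attribute a ListExploder adds.

```python
from typing import Dict, List


def explode(values: List[str], name: str) -> List[Dict[str, object]]:
    """Turn one list into one record per element, tagged with its position."""
    return [{"_element_index": i, name: v} for i, v in enumerate(values)]


def join_on_index(requestor, *suppliers):
    """Merge supplier records onto requestor records sharing the same index."""
    merged = []
    for rec in requestor:
        out = dict(rec)  # copy so the input records are left untouched
        for supplier in suppliers:
            for s in supplier:
                if s["_element_index"] == rec["_element_index"]:
                    out.update(s)
        merged.append(out)
    return merged


# Hypothetical sample data: one list of cell texts, one list of link targets.
names = ["Alpha", "Beta", "Gamma"]
links = ["/a", "/b", "/c"]

rows = join_on_index(explode(names, "name"), explode(links, "href"))
for row in rows:
    print(row)
```

Each list becomes its own stream of records, and the shared index is the only thing that ties a cell's text back to its link, which is why the join key matters.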


caracadrian
Contributor
570 replies
Best Answer
4 years ago
January 14, 2021

@Chris Warren You are on the right track.

By modifying your workspace a bit you can obtain the desired result.

Set your second HTMLExtractor to Output: Return List Attribute.

Add a ListExploder for every list that you need, then merge the exploded lists with a FeatureMerger, using one of them as the Requestor and the rest as Suppliers. Join On _element_index, and set it to Process Duplicate Suppliers.

Then you can continue to your AttributeSplitter.

Explode HTML Lists

By setting a ListExploder for every list, you can obtain something like this:

Exploded Lists
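As a cross-check outside FME, the same goal (every row of the table, with both the displayed text and the link target) can be sketched with Python's standard-library `html.parser`. This is only an assumption-laden illustration, not part of the workspace above: the `TableLinkParser` class and the sample HTML are hypothetical, and the sketch expects well-formed markup with explicit closing `</td>` and `</tr>` tags.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect every table row as a list of cells with text and first href."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read, if any
        self._cell = None  # cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = {"text": "", "href": None}
        elif tag == "a" and self._cell is not None:
            href = dict(attrs).get("href")
            # Keep only the first link found in a cell.
            if href and self._cell["href"] is None:
                self._cell["href"] = href

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._cell["text"] = self._cell["text"].strip()
            self._row.append(self._cell)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Hypothetical sample table; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Report</th></tr>
  <tr><td>Alpha</td><td><a href="/reports/alpha.pdf">View</a></td></tr>
  <tr><td>Beta</td><td><a href="/reports/beta.pdf">View</a></td></tr>
</table>
"""

parser = TableLinkParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print([(cell["text"], cell["href"]) for cell in row])
```

Because text and link are captured per cell while the row is being read, no index-based join is needed here; in the workspace, the ListExploder plus FeatureMerger step plays that role.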


johnscreekga
Author
2 replies
4 years ago
January 14, 2021


@Chris Warren and @caracadrian ... thank you both so much for the push in the right direction. I never would have thought of this on my own.
