Question

HTML Table & Hyperlink Parser

8 years ago
May 8, 2017
6 replies
77 views

natehewes13
2 replies

FMErs I need some help!

I've been beating my head against the wall trying to figure out how to properly parse these HTML tables. I have used some of the examples on here (@takashi) as go-bys, but I just can't get it right.

http://webapps.rrc.state.tx.us/CMPL/publicSearchAction.do?packetSummaryId=171477&formData.methodHndlr.inputValue;=loadPacket&formData.hrefValue;=%257C1003%253Dhome%257C1005%253Dhome%257C1007%253D0&searchArgs.paramValue;=%257C0%253D04%252F01%252F2017%257C1%253D04%252F27%252F2017%257C2%253D01&pager.paramValue;=%7C1%3D1%7C2%3D100%7C3%3D221%7C4%3D0%7C5%3D3%7C6%3D10&pager.offset;=&publicUser;=

What we have

And want

I've tried different combinations of

HTTPCaller > HTMLExtractor (but with the table CSS selector, I can't get the result to flatten without extracting substrings)
FeatureReader (HTML Table) (works great for the tables, but it loses the hyperlinks)

Any help would be greatly appreciated!

takashi
7708 replies
8 years ago
May 9, 2017

Hi @natehewes13, I looked at the HTML source document and found that the required data are stored in two <table> elements, which can be identified by their class names, "GroupBox1" and "DataGrid". You can therefore extract the <table> elements with the HTMLExtractor.

Unfortunately, the current HTMLExtractor (FME 2017.0) cannot match class names that contain upper case characters. It's a known issue, and I hope it will be fixed in the near future.

As a workaround in the interim, change "GroupBox1" and "DataGrid" within the response body to lower case (the StringReplacer can be used), then extract the two <table> elements using the HTMLExtractor with this setting. You can then parse the extracted <table> elements as XML fragments.
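For readers who want to see the idea outside of Workbench, here is a rough Python sketch of the same goal: pick out the tables by class name (compared case-insensitively, so no lower-casing of the source is needed) and keep each cell's hyperlink next to its text. This is not the HTMLExtractor workflow described above, just an illustration using the standard library's html.parser. The class TableLinkParser and the sample HTML (including its values and the href) are made up for the example, and the sketch assumes the wanted tables do not contain nested tables.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect rows from <table> elements whose class is in wanted_classes.

    Each cell becomes a (text, href) tuple; href is None when the cell
    contains no <a> element. Hypothetical helper, not part of FME.
    """

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = {c.lower() for c in wanted_classes}
        self.in_table = False   # True while inside a wanted table
        self.rows = []          # finished rows
        self.row = None         # cells of the row being read
        self.cell = None        # [text_parts, href] of the cell being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            # Compare class names case-insensitively.
            classes = (attrs.get("class") or "").lower().split()
            self.in_table = bool(self.wanted.intersection(classes))
        elif not self.in_table:
            return
        elif tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = [[], None]
        elif tag == "a" and self.cell is not None and self.cell[1] is None:
            # Keep the first link found in the cell.
            self.cell[1] = attrs.get("href")

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th") and self.cell is not None and self.row is not None:
            text = " ".join("".join(self.cell[0]).split())
            self.row.append((text, self.cell[1]))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[0].append(data)


# Made-up sample that mimics the two class names mentioned above.
sample = """
<table class="GroupBox1"><tr><th>Operator</th><td>Example Operator</td></tr></table>
<table class="other"><tr><td>ignored</td></tr></table>
<table class="DataGrid">
  <tr><th>Form</th><th>View Form/Attachment</th></tr>
  <tr><td>W-2</td><td><a href="viewPdf.do?id=1">View</a></td></tr>
</table>
"""

parser = TableLinkParser(["GroupBox1", "DataGrid"])
parser.feed(sample)
for row in parser.rows:
    print(row)
```

Keeping the href alongside the cell text is the part that the HTML Table reader approach in the question was missing.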


lars_de_vries
388 replies
8 years ago
May 9, 2017

Additionally, if you want to create a working hyperlink in Excel, you will have to create two attributes:

1. 'View Form/Attachment' with value 'View'

2. 'View Form/Attachment.hyperlink' with the URL of the hyperlink.

Only the first one needs to be defined as an attribute on the writer. The .hyperlink suffix is documented for the reader but not for the writer.

One thing to keep in mind: there is a limit of 66,530 hyperlinks per worksheet (Excel specifications).
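To make the attribute pairing concrete, here is a small Python sketch. Plain dictionaries stand in for feature attributes, and the helper names add_excel_hyperlink and check_hyperlink_limit, as well as the example URL, are invented for illustration. In a workspace the same two attributes would be created with ordinary attribute transformers.

```python
EXCEL_MAX_HYPERLINKS = 66530  # per-worksheet limit quoted in the post above


def add_excel_hyperlink(attributes, column, url, label="View"):
    """Return a copy of attributes with the display text and its link target.

    column              -> text shown in the cell (e.g. 'View')
    column + .hyperlink -> URL the cell should point to
    """
    out = dict(attributes)
    out[column] = label
    out[column + ".hyperlink"] = url
    return out


def check_hyperlink_limit(rows, column):
    """Count rows carrying a hyperlink; raise if a worksheet could not hold them."""
    count = sum(1 for r in rows if r.get(column + ".hyperlink"))
    if count > EXCEL_MAX_HYPERLINKS:
        raise ValueError(
            f"{count} hyperlinks exceed the limit of {EXCEL_MAX_HYPERLINKS} per worksheet"
        )
    return count


# Hypothetical row; the URL is a placeholder.
row = add_excel_hyperlink(
    {"Form": "W-2"}, "View Form/Attachment", "http://example.com/doc.pdf"
)
print(row)
print(check_hyperlink_limit([row, {"Form": "G-1"}], "View Form/Attachment"))
```

If the count gets close to the limit, splitting the output across several worksheets is one option.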

stephenwu
8 replies
8 years ago
June 7, 2017

The issue where upper case characters were not allowed in class names has been addressed, and they should be allowed in FME 2017.1 build 17504.

takashi
7708 replies
8 years ago
June 7, 2017

Good to hear :-) Thanks for the update, @stephenwu.


mygis
Supporter
307 replies
8 years ago
June 7, 2017

Hi @natehewes13 ,

Could you please provide another link? I just want to check the consistency of the tags.

Thanks.

Lyes

natehewes13
Author
2 replies
8 years ago
June 8, 2017

This worked great! Thank you @takashi (and sorry this post is so late).
