Question

How to extract links inside a javascript toggle from HTML page

  • December 19, 2018
  • 3 replies
  • 112 views

asellars
Participant

Hi All,

I am trying to extract links from an HTML page, but they are embedded inside a JavaScript toggle. The workflow I have used in the past calls the page with the HTTPCaller, then uses the HTMLExtractor to pull out any links with CSS Selector = a[href] and Tag Part/HTML Attribute = href. A ListExploder then puts all the links into a table. The problem is that on this page the links I need are inside a link of the form <a href="javascript:toggleObj.()>. When I look at the HTML returned by the HTTPCaller, it does not show any of the links inside that toggle. Does anyone have any ideas on how to pull these out? I have attached a couple of images to illustrate the problem.

Thanks in advance!

-Adrian

Example of the link I need to pull out and concatenate:

HTMLExtractor:


3 replies

takashi
Celebrity
  • 7843 replies
  • December 19, 2018

If you set the Return Format parameter of the HTMLExtractor to "List Attribute", the "href" attribute values of all the "a" elements will be stored in a list attribute. I think you can then extract the required links from the list elements.

Alternatively, if you need to extract links from "a" elements belonging to "nlink" class, this CSS Selector might help you.

a[class=nlink]

See here to learn more about CSS Selector:  CSS Selector Reference
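
For readers who want to test the two selectors outside FME, here is a minimal Python sketch of the same idea using only the standard library. It is not how the HTMLExtractor works internally; the `LinkCollector` class, the `extract_links` helper, and the sample HTML are all hypothetical and only illustrate what a[href] and a[class=nlink] would select.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> elements, optionally limited to one class value."""

    def __init__(self, class_value=None):
        super().__init__()
        self.class_value = class_value  # None means "any a[href]"
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return  # skip anchors that carry no href value
        # a[class=nlink] is an exact match on the whole class attribute value
        if self.class_value is not None and attr_map.get("class") != self.class_value:
            return
        self.links.append(href)


def extract_links(html, class_value=None):
    """Return the href values as a list, similar in spirit to a list attribute."""
    parser = LinkCollector(class_value)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, not the page from the question
SAMPLE = """
<ul>
  <li><a href="javascript:toggle()">Section</a></li>
  <li><a class="nlink" href="/docs/a.pdf">A</a></li>
  <li><a class="nlink" href="/docs/b.pdf">B</a></li>
  <li><a name="anchor-only">no href</a></li>
</ul>
"""

print(extract_links(SAMPLE))           # every a element with an href
print(extract_links(SAMPLE, "nlink"))  # only a elements whose class is exactly "nlink"
```

Note that this only sees links that are present in the HTML text it is given, which is also true of any selector applied to the HTTPCaller response.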


asellars
Participant
  • Author
  • 2 replies
  • December 19, 2018

Thank you for the response, Takashi (@takashi). The problem I am running into is that when I look at the output of the HTTPCaller, the values shown in the Google Chrome inspect window are not present. They seem to be inside whatever the JavaScript object is calling. I have attached an image below of what the output looks like.

0684Q00000ArMXmQAN.png

takashi
Celebrity
  • 7843 replies
  • December 20, 2018

From the screenshot you first posted, I thought the required links were written statically in the HTML document.

FME doesn't support executing JavaScript code, so unfortunately I don't think the links can be extracted.
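
One way to confirm this diagnosis is to check what the raw response actually contains. The Python sketch below (standard library only) separates `javascript:` pseudo-links from ordinary links in a piece of HTML and resolves the ordinary ones against a base URL. The `split_hrefs` helper, the sample strings, and the example.com base URL are hypothetical and stand in for the real page. If the raw response yields only `javascript:` entries, the links are being added by the browser at run time and are not in the static HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect every non-empty href found on <a> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_hrefs(html, base_url):
    """Return (javascript pseudo-links, ordinary links resolved against base_url)."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    script_links, real_links = [], []
    for href in collector.hrefs:
        if href.strip().lower().startswith("javascript:"):
            script_links.append(href)
        else:
            real_links.append(urljoin(base_url, href))
    return script_links, real_links


# Hypothetical inputs, not taken from the page in the question
BASE = "https://example.com/page/"
RAW_RESPONSE = '<a href="javascript:toggleSection()">Reports</a><div id="reports"></div>'
STATIC_PAGE = '<a href="files/a.pdf">Report A</a>'

print(split_hrefs(RAW_RESPONSE, BASE))  # only the toggle itself is found
print(split_hrefs(STATIC_PAGE, BASE))   # a static link is found and made absolute
```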