Question

How to extract links inside a JavaScript toggle from an HTML page

  • 19 December 2018
  • 3 replies
  • 5 views

Hi All,

I am trying to extract links from an HTML page, but they are embedded inside a JavaScript toggle. The workflow I have used in the past fetches the page with the HTTPCaller, then uses the HTMLExtractor to pull out any links with CSS Selector = a[href] and Tag Part/HTML Attribute = href, and finally uses the ListExploder to put all the links into a table. The problem is that on this page the links I need are inside a link of the form <a href="javascript:toggleObj.()">. When I look at the HTML returned by the HTTPCaller, it does not show any of the links inside that toggle. Does anyone have any ideas how to pull these out? I have attached a couple of images to illustrate the problem.

Thanks in advance!

-Adrian

Example of the link I need to pull out and concatenate:

 

HTMLExtractor:


3 replies


If you set the Return Format parameter in the HTMLExtractor to "List Attribute", the "href" attribute values of all the "a" elements are stored in a list attribute. You can then pick the required links out of the list elements.

Alternatively, if you only need the links from "a" elements belonging to the "nlink" class, this CSS Selector might help. Note that it matches only when the class attribute is exactly "nlink".

a[class=nlink]

See here to learn more about CSS Selector:  CSS Selector Reference
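
To illustrate what the two selectors pick up, here is a minimal Python sketch using only the standard library. It is not FME's implementation; `LinkCollector`, `extract_links`, and the sample HTML are hypothetical names and data made up for illustration. One assumption to note: the class filter below matches "nlink" as one class among several (like `a.nlink`), which is slightly looser than `a[class=nlink]`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> elements, optionally limited to one class."""

    def __init__(self, required_class=None):
        super().__init__()
        self.required_class = required_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # anchor without a usable href, nothing to collect
        # Split the class attribute into individual class names.
        classes = (attributes.get("class") or "").split()
        if self.required_class is None or self.required_class in classes:
            self.links.append(href)


def extract_links(html, required_class=None):
    """Return href values in document order (roughly the a[href] selector)."""
    parser = LinkCollector(required_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical static HTML, standing in for an HTTP response body.
SAMPLE_HTML = """
<a href="javascript:toggle()">Show files</a>
<a class="nlink" href="/files/a.zip">A</a>
<a class="nlink extra" href="/files/b.zip">B</a>
"""

all_links = extract_links(SAMPLE_HTML)             # every <a> with an href
nlink_links = extract_links(SAMPLE_HTML, "nlink")  # only the "nlink" anchors
print(all_links)
print(nlink_links)
```

Either approach only sees anchors that are present in the HTML text that was actually downloaded.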

Thank you for the response, Takashi (@takashi). The problem is that when I look at the output of the HTTPCaller, the values shown in the Google Chrome inspect window are not present. They seem to be inside whatever the JavaScript object is calling. I have attached an image below of what the output looks like.

0684Q00000ArMXmQAN.png

From the first screenshot you posted, I thought the required links were written statically in the HTML document.

FME does not execute JavaScript, so unfortunately I don't think the links can be extracted.
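
One thing that may still be worth checking is whether the link targets appear anywhere in the raw response text, for example as string literals inside an inline script. The sketch below is a rough Python check under that assumption. `find_link_candidates`, the regular expression, and the sample markup are all hypothetical, and it will find nothing if the script fetches the links with a separate request.

```python
import re

# Quoted strings that look like links: absolute http(s) URLs or
# root-relative paths starting with "/". The pattern is an illustrative
# guess, not a complete URL grammar.
LINK_PATTERN = re.compile(r"""["']((?:https?://|/)[^"'\s<>]+)["']""")


def find_link_candidates(raw_html):
    """Return unique link-like quoted strings in document order."""
    seen = []
    for match in LINK_PATTERN.finditer(raw_html):
        url = match.group(1)
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical response body: the toggle anchor itself carries no real
# link, but the inline script happens to list the targets.
SAMPLE = """
<a href="javascript:toggleFiles()">Files</a>
<script>
var files = ['/data/roads.zip', "https://example.com/data/parcels.zip"];
</script>
"""

candidates = find_link_candidates(SAMPLE)
print(candidates)
```

If this kind of scan returns nothing for the real page, the links are most likely not in the downloaded document at all, which is consistent with what the HTTPCaller output shows.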
