Hello,

 

I am trying to extract HTML tags from an attribute, but the documentation is unclear and I need further guidance. Page: https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Transformers/Transformers/htmlextractor.htm

It states: “In the HTMLExtractor, a query is constructed to find the div tag with the id “article” (CSS Selector = #article). The contents of that tag will be extracted (Tag Part/HTML Attribute = Value), and output to the new attribute articleText.”

Where is the mapping document that maps FME CSS selector ‘ids’ to HTML tags? I am trying to extract <br>. What should the CSS selector for this be? Using “br” as the selector and then using a ListExploder produces an attribute with empty fields.

Hi @alsherren,

See the CSS Selector Reference, which is linked from the Help on HTMLExtractor, to learn more about CSS selectors.

You can extract <br> tags from an HTML document with this setting. However, the <br> tag has no contents, so this just populates the list with "<br/>" strings.
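To illustrate why selecting br gives empty values, here is a minimal sketch in Python (standard library only). It is not how HTMLExtractor works internally. The class `TagTextCollector`, the helper `contents_of`, and the sample HTML are all made up for this example. It collects the text contained in each element with a given tag name, and shows that <br>, being a void element, never contains anything, while the enclosing <p> does.

```python
from html.parser import HTMLParser

# Void elements (per the HTML standard) can never contain text or children.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every element named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # number of matching elements currently open
        self.results = []   # one text string per matched element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        self.results.append("")
        if tag not in VOID_ELEMENTS:
            self.depth += 1  # void elements are never "open"

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def contents_of(tag, html):
    """Return the text contents of each `tag` element found in `html`."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Invented sample input, not taken from the thread.
sample = "<p>Line one<br>Line two<br/>Line three</p>"
br_contents = contents_of("br", sample)
p_contents = contents_of("p", sample)
print(br_contents)  # ['', '']
print(p_contents)   # ['Line oneLine twoLine three']
```

So if the goal is the text around the line breaks, it is probably more useful to select the element that contains the <br> tags rather than the <br> tags themselves.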

 


You can also copy the CSS selector from your browser's developer tools.

If you inspect the page source and find the element(s) you want to select, you can then copy the selector from there:
 

 

This is the CSS selector for the image @takashi included in his reply:

div.topic-view-content-wrapper:nth-child(3) > div:nth-child(1) > figure:nth-child(4) > img:nth-child(1)


Note that a copied selector like this is best for selecting one specific element.

 

The inspector can also be used to help refine the selection using classes. For example, to select all text in all replies to this thread, the CSS selector would be
 

div.post p


div.post is derived from the class of each reply box, and p is the paragraph element inside each reply.
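If it helps to see what a descendant selector like div.post p actually matches, here is a rough sketch in Python (standard library only). It emulates just this one selector and is not a general CSS engine or the HTMLExtractor implementation. The class `PostParagraphs` and the sample HTML are invented for illustration. Only the class name post comes from this thread.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class PostParagraphs(HTMLParser):
    """Hypothetical emulation of `div.post p`: every <p> that has a
    <div class="... post ..."> somewhere among its ancestors."""

    def __init__(self):
        super().__init__()
        # Open elements as (tag, is_post_div, is_matched_p) tuples.
        self.stack = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return  # void elements never stay open
        classes = (dict(attrs).get("class") or "").split()
        is_post_div = tag == "div" and "post" in classes
        # Descendant combinator: any ancestor may be the div.post.
        has_post_ancestor = any(flag for _, flag, _ in self.stack)
        is_match = tag == "p" and has_post_ancestor
        if is_match:
            self.paragraphs.append("")
        self.stack.append((tag, is_post_div, is_match))

    def handle_endtag(self, tag):
        # Close the most recent open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to a match if any open element is a matched <p>.
        if any(is_match for _, _, is_match in self.stack):
            self.paragraphs[-1] += data


# Invented sample page, assuming replies are wrapped in div.post.
sample = """
<div class="post first"><p>Reply one.</p><span><p>Nested <b>bold</b> text.</p></span></div>
<div class="sidebar"><p>Not a reply.</p></div>
<p>Outside any post.</p>
"""

parser = PostParagraphs()
parser.feed(sample)
parser.close()
result = parser.paragraphs
print(result)  # ['Reply one.', 'Nested bold text.']
```

Note that the second paragraph is matched even though it sits inside a span: a space in a selector means "any descendant", whereas the > used in the copied selector above means "direct child only".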