Solved

How to extract substring from attribute?

  • December 6, 2022
  • 4 replies
  • 96 views

limo
Supporter
  • Supporter
  • 52 replies

Hi,

 

I have an attribute "descripion" that contains some HTML code:

<figure class="bs-special-article-teaser-image bs-asset--full bs-asset--square" id="SP-NjExNTY5NjMw"><div><div class="[ js-bs-lazy-image-loading ] [ bs-fixed-size__content ]" data-noscript><noscript><div><img class="bs-fixed-size__content" src="/tourismus/rundgaenge/stadtrundgang_stationen/Burgplatz287.jpg.scaled/586dd9dd596cf8b34ecc51781c490af2.jpg" alt="Burgplatz"/></div></noscript></div></div><figcaption class="bs-asset__caption"><span class="bs-asset__caption__copyright">&copy;&nbsp;Braunschweig Stadtmarketing GmbH/Frank Sperling</span></figcaption></figure> <p class="bs-paragraph"> Der Burgplatz ist ein Ensemble von hoher geschichtlicher und kultureller Bedeutung. Seit dem 9. Jahrhundert lag hier der Fürstensitz der Brunonen. Herzog Heinrich der Löwe hat den Burgplatz im 12. Jahrhundert zum Zentrum der welfischen Macht ausgebaut. Er wird umgrenzt von der Burg Dankwarderode, dem Dom St. Blasii, dem klassizistischen Vieweghaus (Landesmuseum) und schönen Fachwerkbauten. Im Mittelpunkt des Platzes steht das Löwenstandbild. Der Bronzeguss aus der Zeit um 1166, einst vergoldet, wurde von Heinrich dem Löwen als Wahrzeichen seiner Macht und seiner Gerichtsbarkeit als erste freistehende Plastik nördlich der Alpen errichtet. Das Original und Teile des Welfenschatzes können in der Burg Dankwarderode, der ehemaligen Residenz Heinrichs des Löwen, besichtigt werden. Die Burg wurde 1887 nach dem Original-Grundriss von 1175 rekonstruiert und wieder errichtet. 
</p> <div class="[ bs-content-element bs-content-element--overflow-scroll ] bs-link-list"><div class="bs-link-list__items"><div class="bs-link-list__item"><a class="bs-link-list__link bs-link--internal bs-iconized bs-iconized--left" href="/tourismus/ueber-braunschweig/sehenswuerdigkeiten/burgplatz.php" target="_blank" rel="noopener"><i class="bs-icon-wrapper bs-icon-wrapper--link bs-link-list__link__icon bs-iconized__icon"> <svg xmlns="http://www.w3.org/2000/svg" class="bs-icon" aria-hidden="true" role="img"> <use xlink:href="/WEB-IES/braunschweig-module/1.9.13/img/svg-sprite-main.symbols.svg#link" /> </svg> </i><!--googleoff: index--><span class="bs-link-list__link__text bs-iconized__text">Weitere Informationen anzeigen…<span class="SPu-access"> (Öffnet in einem neuen Tab)</span></span><!--googleon: index--></a></div></div></div>

I want to extract the text inside the <p class="bs-paragraph"> Der Burgplatz ...</p> element.

How can I do this? With the SubstringExtractor I can extract strings, but I do not know the start and end indices.

I hope someone can help me or has an idea.

Thanks in advance!



4 replies

david_r
Celebrity
  • 8391 replies
  • Best Answer
  • December 6, 2022

Have a look at the HTMLExtractor:

https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/htmlextractor.htm

You can also use the StringSearcher with a regular expression, but it's probably going to be more finicky to get right for all possible edge cases.
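To illustrate why the regular-expression route can be finicky, here is a minimal Python sketch run outside FME. It is not the StringSearcher itself, and the `html` string is a shortened, hypothetical stand-in for the attribute value. The pattern works for the markup as posted but silently misses a harmless variation of the same tag.

```python
import re

# Shortened, hypothetical stand-in for the HTML held in the attribute.
html = (
    '<figure class="bs-asset--full"><figcaption>Caption</figcaption></figure> '
    '<p class="bs-paragraph"> Der Burgplatz ist ein Ensemble. \n</p> '
    '<div class="bs-link-list"><a href="/burgplatz.php">Mehr</a></div>'
)

# Non-greedy match between the opening tag and the next </p>.
# re.DOTALL lets "." also match the newline before </p>.
pattern = re.compile(r'<p class="bs-paragraph">(.*?)</p>', re.DOTALL)

match = pattern.search(html)
paragraph = match.group(1).strip() if match else None
print(paragraph)

# Same kind of element, but with single quotes and a second class:
# the literal pattern no longer matches.
variant = "<p class='bs-paragraph intro'>Text</p>"
missed = pattern.search(variant) is None
print(missed)
```

The pattern could be loosened to cover such cases, but each edge case adds complexity, which is the reason an HTML-aware tool is usually the safer choice.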


limo
Supporter
  • Author
  • Supporter
  • 52 replies
  • December 6, 2022

Thanks David, it works now with the HTMLExtractor. Sorry, I had never used it before.

With the CSS selector "p.bs-paragraph" I can now extract the value inside :)
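For anyone who wants to see what that selector picks out, here is a rough Python sketch using only the standard library. It is not how the HTMLExtractor is implemented; the class name `ParagraphText` and the shortened `html` sample are made up for illustration. It collects the text of every `<p>` element whose class list contains `bs-paragraph`.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> elements that carry a given class."""

    def __init__(self, css_class):
        super().__init__(convert_charrefs=True)
        self.css_class = css_class
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or hold several class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)

    def text(self):
        # Join the collected pieces and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


# Shortened, hypothetical stand-in for the HTML held in the attribute.
html = (
    '<figure><figcaption>&copy;&nbsp;Caption</figcaption></figure> '
    '<p class="bs-paragraph"> Der Burgplatz ist ein Ensemble. \n</p> '
    '<div class="bs-link-list"><a href="/burgplatz.php">Mehr</a></div>'
)

parser = ParagraphText("bs-paragraph")
parser.feed(html)
parser.close()
print(parser.text())
```

Because the match is on the parsed element and its class list rather than on the literal tag text, extra classes or different quoting in the markup do not break it.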


danilo_fme
Celebrity
  • Celebrity
  • 2077 replies
  • December 6, 2022

Good job!


danilo_fme
Celebrity
  • Celebrity
  • 2077 replies
  • December 6, 2022

Nice answer.