Solved

How to extract a substring from an attribute?


limo
Supporter

Hi,

I have an attribute "descripion" that contains some HTML code:

<figure class="bs-special-article-teaser-image bs-asset--full bs-asset--square" id="SP-NjExNTY5NjMw"><div><div class="[ js-bs-lazy-image-loading ] [ bs-fixed-size__content ]" data-noscript><noscript><div><img class="bs-fixed-size__content" src="/tourismus/rundgaenge/stadtrundgang_stationen/Burgplatz287.jpg.scaled/586dd9dd596cf8b34ecc51781c490af2.jpg" alt="Burgplatz"/></div></noscript></div></div><figcaption class="bs-asset__caption"><span class="bs-asset__caption__copyright">&copy;&nbsp;Braunschweig Stadtmarketing GmbH/Frank Sperling</span></figcaption></figure> <p class="bs-paragraph"> Der Burgplatz ist ein Ensemble von hoher geschichtlicher und kultureller Bedeutung. Seit dem 9. Jahrhundert lag hier der Fürstensitz der Brunonen. Herzog Heinrich der Löwe hat den Burgplatz im 12. Jahrhundert zum Zentrum der welfischen Macht ausgebaut. Er wird umgrenzt von der Burg Dankwarderode, dem Dom St. Blasii, dem klassizistischen Vieweghaus (Landesmuseum) und schönen Fachwerkbauten. Im Mittelpunkt des Platzes steht das Löwenstandbild. Der Bronzeguss aus der Zeit um 1166, einst vergoldet, wurde von Heinrich dem Löwen als Wahrzeichen seiner Macht und seiner Gerichtsbarkeit als erste freistehende Plastik nördlich der Alpen errichtet. Das Original und Teile des Welfenschatzes können in der Burg Dankwarderode, der ehemaligen Residenz Heinrichs des Löwen, besichtigt werden. Die Burg wurde 1887 nach dem Original-Grundriss von 1175 rekonstruiert und wieder errichtet. 
</p> <div class="[ bs-content-element bs-content-element--overflow-scroll ] bs-link-list"><div class="bs-link-list__items"><div class="bs-link-list__item"><a class="bs-link-list__link bs-link--internal bs-iconized bs-iconized--left" href="/tourismus/ueber-braunschweig/sehenswuerdigkeiten/burgplatz.php" target="_blank" rel="noopener"><i class="bs-icon-wrapper bs-icon-wrapper--link bs-link-list__link__icon bs-iconized__icon"> <svg xmlns="http://www.w3.org/2000/svg" class="bs-icon" aria-hidden="true" role="img"> <use xlink:href="/WEB-IES/braunschweig-module/1.9.13/img/svg-sprite-main.symbols.svg#link" /> </svg> </i><!--googleoff: index--><span class="bs-link-list__link__text bs-iconized__text">Weitere Informationen anzeigen…<span class="SPu-access"> (Öffnet in einem neuen Tab)</span></span><!--googleon: index--></a></div></div></div>

I want to extract the text inside the <p class="bs-paragraph"> ... </p> element, i.e. the paragraph that starts with "Der Burgplatz ...".

How can I do this? With the SubstringExtractor I can extract substrings, but I do not know the start and end index of the paragraph.

I hope someone can help me or has an idea.

Thanks in advance!


4 replies

david_r
Evangelist
  • Best Answer
  • December 6, 2022

Have a look at the HTMLExtractor:

https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/htmlextractor.htm

You can also use the StringSearcher with a regular expression, but it's probably going to be more finicky to get right for all possible edge cases.
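To illustrate the regular-expression route and why it tends to be finicky, here is a minimal sketch in plain Python rather than FME. The sample string is a shortened, hypothetical stand-in for the attribute value in the question, and the pattern is only one possible expression, not a recommended one.

```python
import re

# Hypothetical, shortened stand-in for the attribute value from the question.
html = (
    '<figure class="bs-asset--full"><figcaption>Caption</figcaption></figure> '
    '<p class="bs-paragraph"> Der Burgplatz ist ein Ensemble. \n</p> '
    '<div class="bs-link-list"><a href="/burgplatz.php">Link</a></div>'
)

# Non-greedy capture between the opening tag and the first closing </p>.
# re.DOTALL lets "." match the line break inside the paragraph.
pattern = re.compile(r'<p class="bs-paragraph">(.*?)</p>', re.DOTALL)

match = pattern.search(html)
paragraph = match.group(1).strip() if match else None
print(paragraph)
```

This pattern only matches when the opening tag is written exactly as shown. An extra class, a different attribute order, or single quotes around the class name would all make it fail, which is the kind of edge case an HTML-aware tool handles for you.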


limo
Supporter
  • Author
  • December 6, 2022

Thanks David, it works now with the HTMLExtractor. Sorry, I had never used it before.

With the CSS selector "p.bs-paragraph" I can now extract the text inside :)
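For anyone who wants to see what the selector "p.bs-paragraph" picks out without opening FME, here is a rough sketch using only Python's standard html.parser module. It is not how the HTMLExtractor works internally. The class name ParagraphText and the shortened sample string are made up for this illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text of every <p> element that carries a given class."""

    def __init__(self, css_class):
        super().__init__(convert_charrefs=True)
        self.css_class = css_class
        self.in_match = False   # True while inside a matching <p>
        self.parts = []         # text chunks of the current paragraph
        self.results = []       # cleaned text of finished paragraphs

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.in_match = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_match:
            self.in_match = False
            # Join the chunks and collapse runs of whitespace.
            self.results.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.in_match:
            self.parts.append(data)


# Hypothetical, shortened stand-in for the attribute value from the question.
html = (
    '<figure class="bs-asset--full"><figcaption>Caption</figcaption></figure> '
    '<p class="bs-paragraph"> Der <span>Burgplatz</span> ist ein Ensemble. \n</p> '
    '<div class="bs-link-list"><a href="/burgplatz.php">Link</a></div>'
)

parser = ParagraphText("bs-paragraph")
parser.feed(html)
print(parser.results)
```

Because the parser looks at the tag name and the list of classes rather than the literal text of the tag, it keeps working when the element has extra classes or attributes.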


danilo_fme
Evangelist
  • December 6, 2022
limo wrote:

Thanks David, it works now with the HTMLExtractor. Sorry, I had never used it before.

With the CSS selector "p.bs-paragraph" I can now extract the text inside :)

Good job!


danilo_fme
Evangelist
  • December 6, 2022
david_r wrote:

Have a look at the HTMLExtractor:

https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Transformers/Transformers/htmlextractor.htm

You can also use the StringSearcher with a regular expression, but it's probably going to be more finicky to get right for all possible edge cases.

Nice answer.

