Solved

Regex in StringSearcher

  • March 15, 2020
  • 6 replies
  • 30 views


Hello,

I'm trying to find only the letter D on its own in a string such as the following: B,G,D,DM,TD,DM,TD,D. In this example, that would be 2 D's (the third D and the last D I've marked in red).

Unfortunately, all the regular expressions I've tried also match the D within TD (e.g. using D,|D$).

Thanks.

Tony


6 replies

takashi
Celebrity
  • 7843 replies
  • Best Answer
  • March 15, 2020

Hi @aviveiro, the metacharacter '\b', which matches at a word boundary (the position between a word character and a non-word character such as a space, comma, or period, or the start/end of the text), might help you. For example, this regex matches a single character 'D' sandwiched between word boundaries.

(?<=\b)D(?=\b)
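
As a quick check outside FME, here is a minimal Python sketch comparing the original attempt (D,|D$) with the expression above on the sample string. Using Python's re module is an assumption made purely for illustration; the StringSearcher's regex engine may differ in details.

```python
import re

text = "B,G,D,DM,TD,DM,TD,D"

# Original attempt: "D," also matches the D at the end of each "TD".
naive = [m.start() for m in re.finditer(r"D,|D$", text)]

# Suggested expression: a D with a word boundary on both sides.
bounded = [m.start() for m in re.finditer(r"(?<=\b)D(?=\b)", text)]

print(naive)    # [4, 10, 16, 18]
print(bounded)  # [4, 18]
```

Offsets 4 and 18 are the third item and the last item of the list, i.e. the two standalone D's.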

bwn
Evangelist
  • 562 replies
  • March 15, 2020

@aviveiro Don't forget that for simple situations, such as dealing with delimiter-separated strings, there is also the AttributeSplitter.
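
The split-then-compare idea can be sketched in plain Python. This is only an analogy for what splitting on the comma achieves, not how the AttributeSplitter is implemented; the variable names are illustrative.

```python
text = "B,G,D,DM,TD,DM,TD,D"

# Split on the delimiter, then compare whole items rather than characters.
items = text.split(",")
standalone_d = [i for i, item in enumerate(items) if item == "D"]

print(items)
print(standalone_d)       # [2, 7]
print(len(standalone_d))  # 2
```

Because each item is compared as a whole, "DM" and "TD" can never be mistaken for "D", and no regex is needed.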


arnold_bijlsma
Enthusiast
  • 126 replies
  • March 17, 2020

@takashi: Excellent answer. I am still not fully familiar with the power of \b.

Just to note: In 'normal' RegEx you indeed need the lookahead and lookbehind assertions. But in the StringSearcher, you don't need them, as it will capture all instances, and \b by definition captures nothing, so you just use

\bD\b

and specify the first list in the Advanced section.
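
For comparison, a small Python sketch (again an assumption for illustration; FME's engine is not Python's re) suggests that \b contributes nothing to the matched text: every match spans exactly one character, and the result is the same as with the lookaround form.

```python
import re

text = "B,G,D,DM,TD,DM,TD,D"

# \b is zero-width, so each match covers only the D itself.
spans = [m.span() for m in re.finditer(r"\bD\b", text)]

# The same matches come back with the lookaround form.
plain = re.findall(r"\bD\b", text)
lookaround = re.findall(r"(?<=\b)D(?=\b)", text)

print(spans)                # [(4, 5), (18, 19)]
print(plain == lookaround)  # True
```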



takashi
Celebrity
  • 7843 replies
  • March 17, 2020

@arnold_bijlsma, you are right. Lookbehind and lookahead aren't essential here. Thanks for pointing it out.


gio
Contributor
  • 2252 replies
  • March 17, 2020

@takashi @aviveiro @arnold_bijlsma

The word boundary \b marks the position between a word character and a non-word character, so \bD\b matches the string because it matches some part(s) of it.

The following is not correct, to say the least:

In 'normal' RegEx you indeed need the lookahead and lookbehind assertions

 

To capture a match, you need to enclose it in parentheses: \b(D)\b

(Of course, there is also a non-capturing version, (?:...), which groups but does not report the capture.)

Contrary to popular belief, \b is a zero-length assertion, so if you enclose it in parentheses, the group captures an empty string. The same goes for the beginning and end anchors.

 

A regexp result always reports the entire match; that is the All Matches List Name.

To get the individual captured D's, you need to enclose the D in parentheses and use the Subexpression Matches List Name.
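
The difference between the whole match and a captured group can be sketched in Python. The pattern below is hypothetical, chosen so that the two differ: it consumes the leading comma but captures only the D. Treating group(0) as roughly the "All Matches" list and group(1) as roughly the "Subexpression Matches" list is an assumption for illustration, not a statement about FME internals.

```python
import re

text = "B,G,D,DM,TD,DM,TD,D"

# Consume the leading delimiter, capture only the D, and peek at what follows.
pattern = re.compile(r"(?:^|,)(D)(?=,|$)")

whole_matches = [m.group(0) for m in pattern.finditer(text)]
captured = [m.group(1) for m in pattern.finditer(text)]

print(whole_matches)  # [',D', ',D']
print(captured)       # ['D', 'D']
```

Consuming the delimiter and capturing the part of interest is also one way to do without lookbehind in a flavor that lacks it.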

 

Furthermore, not every flavor supports lookbehind. Python's version does, which will please you guys, I guess.

There is a site that lists all the flavors and what they support.

Lookbehind can be emulated with lookahead and some more regexp fiddling.

Of course, this may have changed since I last read up on it.

Read up on the matter: Jan Goyvaerts's work is awesomely suited for that (see RegexBuddy; his documentation is there). But there are plenty of good ones.

 


arnold_bijlsma
Enthusiast
  • 126 replies
  • March 17, 2020

The key thing is that, for the RegEx implementation in the StringSearcher, you need neither the lookahead/lookbehind assertions nor any grouping brackets to capture both instances of the single letter D in the test string.

 

But you're right that other implementations outside FME could give a different output.