Solved

Regex in StringSearcher



Hello,

I'm trying to find only the letter D on its own in a string such as the following: B,G,D,DM,TD,DM,TD,D. In this example, that would be 2 D's (the standalone D in the third position and the final D).

Unfortunately, every regex I've tried also matches the D within TD (e.g. D,|D$).
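To see why `D,|D$` over-matches, here is a minimal sketch that uses Python's `re` module as a stand-in (an assumption; FME's regex engine may differ in details). It lists where each match starts and which character precedes it: nothing in the pattern checks what comes before the D, so the D in `TD` matches too.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# Naive pattern: a D followed by a comma, or a D at the very end.
# Nothing checks the character *before* the D, so "TD" also matches.
naive = [m.start() for m in re.finditer(r"D,|D$", s)]

# Character preceding each matched D (None if the match starts at 0).
context = [s[i - 1] if i > 0 else None for i in naive]

print(naive)    # [4, 10, 16, 18]
print(context)  # [',', 'T', 'T', ',']
```

Four matches instead of two: the ones at positions 10 and 16 are the D's inside `TD`.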

Thanks.

Tony

6 replies

takashi
Influencer
  • Best Answer
  • March 15, 2020

Hi @aviveiro, the metacharacter '\b', which represents a word boundary (at a space, the start/end of the text, a comma, a period etc.), might help you. For example, this regex matches a single character 'D' sandwiched between word boundaries.

(?<=\b)D(?=\b)
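A minimal sketch of this pattern, again assuming Python's `re` as a stand-in for the StringSearcher: only the standalone D's (the third item and the last item) match, while the D in `DM` and `TD` is rejected because there is no word boundary between two letters.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# A D with a word boundary on each side. \b is zero-width, so placing it
# inside a lookbehind is allowed in Python (fixed width of 0).
pattern = re.compile(r"(?<=\b)D(?=\b)")

spans = [m.span() for m in pattern.finditer(s)]
print(spans)  # [(4, 5), (18, 19)]
```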

bwn
Evangelist
  • March 15, 2020

@aviveiro Don't forget that in the case of simple situations such as dealing with value delimited strings, there is also AttributeSplitter.
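The splitting idea can be sketched in plain Python. This only mirrors the approach; it is not AttributeSplitter's API. Split on the comma, then compare each item to `D` exactly, so `TD` and `DM` never match.

```python
s = "B,G,D,DM,TD,DM,TD,D"

# Split the delimited string into items, then keep the positions of the
# items that are exactly "D". No regex needed: "TD" and "DM" differ from "D".
items = s.split(",")
d_positions = [i for i, item in enumerate(items) if item == "D"]

print(items)             # ['B', 'G', 'D', 'DM', 'TD', 'DM', 'TD', 'D']
print(d_positions)       # [2, 7]
print(len(d_positions))  # 2
```

If the real data has spaces after the commas, the items would need stripping (e.g. `item.strip() == "D"`) before the comparison.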


arnold_bijlsma
Enthusiast
takashi wrote:

Hi @aviveiro, the meta character '\b' that represents a word boundary (including space, start/end of a text, comma, period etc.) might help you. For example, this regex matches a single character 'D' sandwiched by word boundaries. 

(?<=\b)D(?=\b)

@takashi: Excellent answer. I am still not fully familiar with the power of \b.

Just to note: In 'normal' RegEx you indeed need the lookahead and lookbehind assertions. But in the StringSearcher, you don't need them, as it will capture all instances, and \b by definition captures nothing, so you just use

\bD\b
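A quick check of this point, using Python's `re` as a stand-in (an assumption; the StringSearcher's engine may behave differently in edge cases): both patterns give identical match spans, because \b consumes no characters and each match covers only the D itself.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

plain = [m.span() for m in re.finditer(r"\bD\b", s)]
lookaround = [m.span() for m in re.finditer(r"(?<=\b)D(?=\b)", s)]

# \b is zero-width, so the lookaround wrappers change nothing here.
print(plain)                    # [(4, 5), (18, 19)]
print(plain == lookaround)      # True
print(re.findall(r"\bD\b", s))  # ['D', 'D']
```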

and specify the first list in the Advanced section.



takashi
Influencer
  • March 17, 2020
takashi wrote:

Hi @aviveiro, the meta character '\b' that represents a word boundary (including space, start/end of a text, comma, period etc.) might help you. For example, this regex matches a single character 'D' sandwiched by word boundaries. 

(?<=\b)D(?=\b)

@arnold_bijlsma, you are right. Lookbehind and lookahead aren't essential here. Thanks for pointing it out.


gio
Contributor
  • March 17, 2020

@takashi @aviveiro @arnold_bijlsma

The word boundary \b matches at any position between a word character and a non-word character (or at the start/end of the string), so \bD\b matches the string because it matches some part(s) of it.

The following is not correct, to say the least:

In 'normal' RegEx you indeed need the lookahead and lookbehind assertions

 

To capture the D, you need to enclose it in parentheses (a capturing group): \b(D)\b

(Of course, there is also a non-capturing version, (?:...), which groups but does not report the match.)
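The difference between a capturing and a non-capturing group can be sketched with Python's `re` (a stand-in; the FME side of this is not shown here): the capturing version reports group 1 for every match, while the non-capturing version defines no groups at all.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# Capturing group: each match also reports the subexpression (group 1).
captured = [m.group(1) for m in re.finditer(r"\b(D)\b", s)]

# Non-capturing group: it groups the D but defines no capturing groups.
noncap = re.search(r"\b(?:D)\b", s)

print(captured)          # ['D', 'D']
print(noncap.group(0))   # D
print(noncap.re.groups)  # 0  (number of capturing groups in the pattern)
```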

Contrary to popular belief, \b is a (zero-length) assertion. So if you enclose it in parentheses, it will be captured, as an empty string. The same goes for the beginning and end anchors.
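What "capturing an assertion" returns can be checked in Python's `re` (other engines may report it differently, which is an assumption here): the group does match, but only an empty string at a single position.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# Put the zero-length assertions themselves inside capturing groups.
m = re.search(r"(\b)D(\b)", s)

# Each group "captures" an empty string with a zero-width span.
print(repr(m.group(1)), m.span(1))  # '' (4, 4)
print(repr(m.group(2)), m.span(2))  # '' (5, 5)

# The same holds for the ^ anchor.
start = re.match(r"(^)B", s)
print(repr(start.group(1)), start.span(1))  # '' (0, 0)
```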

 

A regex result always reports the entire match; in the StringSearcher that goes to the All Matches List Name.

To get the individual captured D's, you need to enclose the D in parentheses and use the Subexpression Matches List Name.
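As an analogy only, in Python's `re` the full match is `group(0)` and a captured subexpression is `group(1)`. The mapping of these to the StringSearcher's All Matches and Subexpression Matches lists is an assumption based on the description above. This variant consumes the preceding comma, so the two differ. Note that it would miss a D at the very start of the string.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# Consume the comma before the D, capture only the D itself.
matches = list(re.finditer(r",(D)\b", s))

full = [m.group(0) for m in matches]  # whole match (comma included)
sub = [m.group(1) for m in matches]   # captured subexpression only

print(full)  # [',D', ',D']
print(sub)   # ['D', 'D']
```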

 

Furthermore, there are not many flavors that support lookbehind. Python's does, which will please you guys, I guess.

There is a site that compares all the flavors and what they support.

Lookbehind can be emulated by lookahead and some more regexp fiddling.

Of course, that may have changed since I last read up on it.
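A small sketch of both points in Python's `re`: it supports lookbehind, but only with a fixed width, so `(?<=^|,)` (0 or 1 characters wide) is rejected. One common workaround, not necessarily the fiddling meant above, is to consume the start-or-comma in a non-capturing group and capture the D. The lookahead leaves the trailing comma unconsumed, so adjacent items like `D,D` still both match.

```python
import re

s = "B,G,D,DM,TD,DM,TD,D"

# Fixed-width lookbehind (exactly one comma) is accepted.
fixed = re.findall(r"(?<=,)D(?=,|$)", s)

# Variable-width lookbehind (0 chars for ^, 1 for ",") raises re.error.
try:
    re.compile(r"(?<=^|,)D")
    variable_ok = True
except re.error:
    variable_ok = False

# Workaround: consume ^ or "," without capturing it, capture the D itself.
emulated = re.findall(r"(?:^|,)(D)(?=,|$)", s)

print(fixed)        # ['D', 'D']
print(variable_ok)  # False
print(emulated)     # ['D', 'D']
```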

 

Read up on the matter: Jan Goyvaerts' work is excellent for that (see RegexBuddy; his documentation is there), but there are plenty of other good resources.

 


arnold_bijlsma
Enthusiast
Forum|alt.badge.img+14

The key thing is that for the RegEx implementation in the StringSearcher you don't need the lookahead/-behind assertions nor any grouping brackets to capture both instances of the single letter D in the test string.

 

But you're right that other implementations outside FME could give a different output.
