Question

Regular expression for finding a pattern in StringSearcher


Badge +2

I've been trying, without success, to use a regular expression in the StringSearcher to search for this pattern: one to three digits, followed by either a double quote character or two single quote (') characters in succession, followed by an x, followed by one to three digits, followed again by either a double quote character or two single quote characters in succession. I put together this expression:

^[0-9]{1,3}[",'']{1,2}+[x][0-9]{1,3}[",'']{1,2}+

It works in the regex101.com tester, but for some reason FME fails when I put that expression in the StringSearcher: I get a 'failed to evaluate expression' message. What am I doing wrong?



Badge +7

Hi @alpheus

Does this one work for you?

^[0-9]{1,3}(\"|\'{2})x[0-9]{1,3}(\"|\'{2})$

This is what I have changed:

1/ I have removed the [] around the x. [] defines a character class (a set of characters, any one of which can match), but x is just a normal character, so it doesn't need one.

2/ I have changed [",''] into ("|'') because you want one of two options: a double quote or two single quotes. A character class can't express that, since [",''] matches only a single character from the set ", comma and '. I have escaped the quotes (\") so they are treated as literal characters and no longer signal the start of a string, and I have put {2} right after the single quote, since only that character may appear twice.

3/ I have removed all plus signs. The plus sign means 'at least one of the previous character' (e.g. ^4+$ means at least one '4', possibly more). Since the curly brackets already handle the repetition, the plus signs aren't needed.
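As an illustration only, here is the corrected pattern written out piece by piece with Python's `re` module and `re.VERBOSE`, so each part of the requirement sits on its own commented line. This assumes Python's regex engine, which is not FME's; the quotes are left unescaped (escaping them is optional in Python), and `is_dimension` and the sample strings are hypothetical names chosen for the sketch.

```python
import re

# Corrected pattern, one requirement per line. re.VERBOSE ignores the
# whitespace and the "#" comments inside the pattern.
DIMENSION = re.compile(
    r"""
    ^
    [0-9]{1,3}   # one to three digits
    ("|'{2})     # a double quote OR two single quotes
    x            # a literal x; no [] needed
    [0-9]{1,3}   # one to three digits
    ("|'{2})     # a double quote OR two single quotes again
    $
    """,
    re.VERBOSE,
)


def is_dimension(text: str) -> bool:
    # Hypothetical helper: True if the whole string has the requested shape.
    return DIMENSION.match(text) is not None


# Hypothetical samples: the first three should match, the last two should not.
for sample in ['100"x200"', "100''x200''", "12\"x3''", '1234"x200"', "100'x200''"]:
    print(repr(sample), is_dimension(sample))
```

The verbose layout is just a readability choice; the compiled pattern is the same single-line expression from the answer.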

Badge +7


FYI, it's great that you are using regex101 to check your regular expressions; I think it is the best website out there. When I am creating a regex to be used in FME, I tend to select Python as the 'Flavor' (on the left side). I have noticed that it 'debugs' similarly to FME.

Badge +2


Works great, much obliged! Was it the missing quote escapes that were confusing the FME compiler?

Badge +7

Yes, I think that was what made the compiler fail.

Userlevel 2
Badge +17


No, I don't think it's essential to escape double/single quotation marks in a regex for the StringSearcher. This regex should work as well:

^[0-9]{1,3}("|'{2})x[0-9]{1,3}("|'{2})$

With the StringSearcher (FME 2016, 2017), your original regex matches this string:

100"x200''

However, it also matches this string:

100',x200""

This is because [",'']{1,2} matches one or two characters from the set double quotation mark, comma, single quotation mark, which I don't think is what you want.
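To see this difference side by side, a small Python sketch runs both strings through the original expression, the corrected one, and the escaped variant from the earlier answer. Python's `re` stands in for FME's engine here, so treat it as an approximation. The + signs are dropped from the original because Python's `re` before 3.11 rejects `{1,2}+` outright; in engines that accept them (as possessive quantifiers) they don't change the outcome for these two strings. `ORIGINAL`, `CORRECTED`, `ESCAPED`, `GOOD` and `BAD` are just labels for this example.

```python
import re

# Original expression from the question with the "+" signs removed
# (Python's re before 3.11 rejects "{1,2}+" as a "multiple repeat").
ORIGINAL = re.compile(r"""^[0-9]{1,3}[",'']{1,2}x[0-9]{1,3}[",'']{1,2}""")
# Corrected expression: alternation instead of a character class.
CORRECTED = re.compile(r"""^[0-9]{1,3}("|'{2})x[0-9]{1,3}("|'{2})$""")
# Same as CORRECTED, but with the quotes escaped; \" and \' are just literals.
ESCAPED = re.compile(r"""^[0-9]{1,3}(\"|\'{2})x[0-9]{1,3}(\"|\'{2})$""")

GOOD = "100\"x200''"   # the string 100"x200''
BAD = "100',x200\"\""  # the string 100',x200""

for name, pattern in [("original", ORIGINAL), ("corrected", CORRECTED), ("escaped", ESCAPED)]:
    # Columns: does the pattern match GOOD, does it match BAD?
    print(name, bool(pattern.match(GOOD)), bool(pattern.match(BAD)))
```

Only the original expression accepts the comma variant, and escaping the quotes makes no difference to what matches.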
Badge +7


You are right @takashi, the unescaped quote also works. I have tested the original expression in both FME 2016.1 and 2017, and in neither version do I get the 'failed to evaluate expression' message. @alpheus, which version are you working in?

Badge +2

I'm using the ESRI-licensed Data Interoperability version; the About box says FME 20150114 - Build 15245.

Badge +3

[] defines a character class.
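A quick Python illustration of what that means for the expression in the question. Python's `re` is used only as a stand-in; basic character classes and alternation behave the same way in the common regex flavors. `QUOTE_CLASS` and `QUOTE_ALT` are illustrative names.

```python
import re

# [",''] is a character class: it matches exactly ONE character taken from
# the set {", comma, '}. Listing ' twice does not mean "two quotes".
QUOTE_CLASS = re.compile(r"""[",'']""")

# Alternation, by contrast, can express "a double quote OR two single quotes".
QUOTE_ALT = re.compile(r"""("|'{2})""")

for text in ['"', ",", "'", "''"]:
    print(
        repr(text),
        "class:", QUOTE_CLASS.fullmatch(text) is not None,
        "alternation:", QUOTE_ALT.fullmatch(text) is not None,
    )

# [x] is a class with a single member, so it behaves like a plain x.
print(re.fullmatch(r"[x]", "x") is not None, re.fullmatch(r"x", "x") is not None)
```

The class accepts a lone comma or a lone single quote but never two quotes in a row, which is exactly the mismatch with the requirement.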

For a very good tutorial, check out Regular Expressions: The Complete Tutorial by Jan Goyvaerts (Belgian). It's still out there to be downloaded...

Python uses an advanced regex engine (it has lookbehind, for instance), so regexes created in that flavor will often fail in FME.

The regex tester in the StringSearcher works pretty well nowadays. I prefer using Rubular or the one in Notepad++.

Badge +7
Hi @alpheus

Could you close your question if you've got an answer to it?

Userlevel 2
Badge +17


The regex engine in FME was upgraded in FME 2016. Your original regex could not be compiled with FME 2015 and earlier because the expression contains unnecessary + symbols. The regex engine in FME 2016 and later seems to just ignore the + symbols, but be aware that this is not a correct use of + anyway.
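The same text can compile in one regex engine and fail in another. A minimal sketch of this, using Python only as a stand-in for FME's engine: before Python 3.11, `re` rejects `{1,3}+` with a "multiple repeat" error, while Python 3.11 and later accept it as a possessive quantifier, which is also how PCRE (one of the flavors regex101 offers) reads it. For the pattern in this thread a possessive quantifier gives the same matches as the plain one, which may be why newer FME versions appear to ignore the + signs. `try_compile` is a hypothetical helper.

```python
import re
import sys


def try_compile(pattern: str) -> str:
    # Hypothetical helper: report whether this Python's re accepts the pattern.
    try:
        re.compile(pattern)
        return "compiled"
    except re.error as exc:
        return f"error: {exc}"


# "{1,3}+": "multiple repeat" error before Python 3.11,
# a possessive quantifier (as in PCRE) from Python 3.11 on.
print(sys.version_info[:2], try_compile(r"[0-9]{1,3}+"))
# Without the "+", every version compiles it.
print(sys.version_info[:2], try_compile(r"[0-9]{1,3}"))
```

Dropping the + is the portable choice: `{1,3}` alone already says "one to three", in every engine.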

 
