I hope the title is already fairly self-explanatory.
To illustrate the issue:
- I generated some random lowercase text strings of 10 to 50 characters, using the method described by cwarren in the topic https://knowledge.safe.com/questions/88643/random-letter-generator.html.
- I first used a StringSearcher transformer with the pattern '.*?a.*'.
- I then used the FindRegEx string function in an AttributeManager with the same expression, trying various combinations of the optional parameters:
  - with and without double quotes around the regex pattern
  - with and without the ^ and $ anchors (start and end of string)
  - searching for only part of the string
  - with various combinations of greedy and non-greedy patterns
    - e.g. '.*?a' is a non-greedy (lazy) pattern: it matches zero or more ('*') of any single character ('.'), and the '?' after the '*' makes that quantifier match as few characters as possible, so the match ends at the first 'a'. The greedy '.*a' matches as many characters as possible instead, so the match ends at the last 'a'.
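For reference, the lazy/greedy difference described above can be checked outside FME. The sketch below uses Python's `re` module, which is an assumption: FME's regex engine may differ in details. The helper `random_lowercase_string` is hypothetical; it produces 10 to 50 character test strings in Python and is not a reproduction of cwarren's FME method.

```python
import random
import re
import string


def random_lowercase_string(rng: random.Random, min_len: int = 10, max_len: int = 50) -> str:
    """Hypothetical helper: random lowercase ASCII string of min_len..max_len chars (inclusive)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def match_span(pattern: str, text: str):
    """Span of the match anchored at the start of text, or None if there is none."""
    m = re.match(pattern, text)
    return m.span() if m else None


# Lazy: '.*?' consumes as few characters as possible, so the match ends at the FIRST 'a'.
print(match_span(r".*?a", "banana"))  # (0, 2) -> "ba"
# Greedy: '.*' consumes as many characters as possible, so the match ends at the LAST 'a'.
print(match_span(r".*a", "banana"))   # (0, 6) -> "banana"

# A few random test strings with their lazy and greedy match spans (None if no 'a').
rng = random.Random(42)  # fixed seed so the run is reproducible
for _ in range(3):
    s = random_lowercase_string(rng)
    print(s, match_span(r".*?a", s), match_span(r".*a", s))
```

Both patterns start matching at index 0; only the end of the match differs.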
With the StringSearcher transformer I did get matches, but with the FindRegEx string function and the same search expression (where I would expect identical results) I did not get a single match: every search returned -1.
My first guess was that the function might check whether the entire string matches the pattern, rather than whether the string contains a match (i.e. contains an 'a' character). However, an 'identity search' (searching a string for itself) does return the expected index 0.
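The full-string hypothesis can be reasoned about with a standard engine. In the sketch below, `find_regex` is a hypothetical Python stand-in that returns the start index of the first match or -1, mirroring the return values described above; it assumes FindRegEx uses search semantics and is not FME's implementation. It also shows that '.*?a.*' matches the whole string anyway, so a full-string requirement alone would not explain the -1 results.

```python
import re


def find_regex(text: str, pattern: str) -> int:
    """Hypothetical stand-in for FindRegEx: index of the first match, or -1 if none."""
    m = re.search(pattern, text)
    return m.start() if m else -1


s = "xyzabc"

# Identity search: the string trivially matches itself, starting at index 0.
print(find_regex(s, s))  # 0

# Searching for '.*?a.*' succeeds at index 0 for any string containing an 'a' ...
print(find_regex(s, r".*?a.*"))  # 0
# ... and the pattern also matches the ENTIRE string, so a full-match rule
# would still report a match here.
print(re.fullmatch(r".*?a.*", s) is not None)  # True

# Only a string without any 'a' yields -1.
print(find_regex("xyz", r".*?a.*"))  # -1
```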
Investigating further, I found that searching for the regex '.*?a' with the FindRegEx string function also returns -1, whereas searching for 'a.*' does work. This suggests that non-greedy (lazy) quantifiers simply do not work in the FindRegEx string function, which is supported by the fact that the greedy pattern '.*a.*' works as expected.
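As a baseline for the four patterns mentioned above, here is what a standard engine returns. This is a sketch using Python's `re` with the same hypothetical `find_regex` stand-in (start index of the first match, or -1); the test string is made up for illustration.

```python
import re


def find_regex(text: str, pattern: str) -> int:
    """Hypothetical stand-in for FindRegEx: index of the first match, or -1 if none."""
    m = re.search(pattern, text)
    return m.start() if m else -1


text = "qwertyabcda"  # contains two 'a' characters, at indices 6 and 10

patterns = (r".*?a", r"a.*", r".*a.*", r".*?a.*")
results = {p: find_regex(text, p) for p in patterns}
for p, idx in results.items():
    print(f"{p!r:10} -> {idx}")
```

With Python's `re`, the lazy patterns '.*?a' and '.*?a.*' return 0 just like the greedy '.*a.*', and 'a.*' returns 6 (the first 'a'). None of them returns -1, whereas in FME 2019.2 the lazy variants did.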
Below are some screenshots to illustrate the issue, as well as the workspace that was used.
To me this looks like a bug in the FindRegEx string function. However, I would be more than happy to hear any suggestions, ideas or other feedback in case I'm overlooking something.
@Safe: if it is indeed a bug, could you also check whether it is limited to the FindRegEx string function, or also affects e.g. the ReplaceRegEx string function?
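To check ReplaceRegEx for the same behaviour, it may help to compare against a standard engine. The sketch below uses Python's `re.sub` purely as a reference point (an assumption: it is not FME's implementation) and shows the expected output for a lazy and a greedy pattern on a short test string.

```python
import re

text = "banana"

# Lazy: each match stops at the nearest 'a' ("ba", "na", "na").
lazy_all = re.sub(r".*?a", "X", text)
# Replacing only the first lazy match leaves the rest untouched.
lazy_first = re.sub(r".*?a", "X", text, count=1)
# Greedy: a single match swallows everything up to the last 'a'.
greedy = re.sub(r".*a", "X", text)

print(lazy_all)    # XXX
print(lazy_first)  # Xnana
print(greedy)      # X
```

If ReplaceRegEx returns "banana" unchanged for the lazy pattern while handling the greedy one, that would point to the same issue.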
Kind regards,
Thijs
N.B. I am using FME(R) 2019.2.3.2 (20200320 - Build 19825 - WIN64)
[Screenshots: 1, 2a, 2b, 3a, 3b, 4a, 4b]