
I hope the title is already quite self-explanatory.

To illustrate the issue,

  1. I generated some (lowercase) text strings with 10 to 50 characters, using the methodology that was described by cwarren in the topic https://knowledge.safe.com/questions/88643/random-letter-generator.html.
  2. I first used a StringSearcher transformer with the pattern '.*?a.*'
  3. I then used the FindRegEx string function in an AttributeManager with the same expression, trying various combinations of the optional parameters
    • while including or excluding "" characters around the regex pattern
    • while including or excluding the ^ and $ anchors (for 'start of string' and 'end of string')
  4. I tried searching for only part of the string in the string function
    • while also trying various combinations of greedy vs non-greedy searches.
    • i.e. '.*?a' is a non-greedy (lazy) search: it matches 'zero or more' ('*') of 'any single character' ('.'), but as few as possible ('?'), so the match stops at the first 'a' character. '.*a' is the greedy equivalent: it matches as many characters as possible, so the match stops at the last 'a' character.
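To make the greedy vs non-greedy difference concrete, here is a minimal sketch in Python. Note the assumptions: it uses Python's `re` module, which is not the regex engine FME uses, and the sample string is made up for illustration. It only shows the behaviour I would expect from any engine that supports the lazy quantifier.

```python
import re

# Made-up sample string containing two 'a' characters.
text = "xxaxxaxx"

# Lazy: '.*?' consumes as few characters as possible, so the match ends at the first 'a'.
lazy = re.search(r".*?a", text)

# Greedy: '.*' consumes as many characters as possible, so the match ends at the last 'a'.
greedy = re.search(r".*a", text)

print(lazy.group(0))    # xxa
print(greedy.group(0))  # xxaxxa
print(lazy.start(), greedy.start())  # 0 0
```

Both matches start at index 0; only the end of the match differs. A function that reports the start index of the match should therefore return the same value for both patterns.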

The result was that the StringSearcher transformer did return matches, but the FindRegEx string function did not return any matches for the same search expression, where I would expect identical results. Every search with the string function returned '-1'.

My first gut feeling was that it might have something to do with the regex checking whether the entire string complies with a pattern (i.e. contains an 'a' character). However, when I perform an 'identity search' (searching whether the string contains itself), it does return the expected index '0'.

Then I investigated a bit more, and found that searching for the regex '.*?a' in the FindRegEx string function also didn't work, whereas searching for the regex 'a.*' does seem to work. That got me thinking that maybe non-greedy searching just doesn't work in the FindRegEx string function. I feel that's indeed the case, as the greedy regex '.*a.*' does work as expected in the FindRegEx string function.
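For comparison, the test above can be sketched outside FME. This is only an illustration under stated assumptions: it uses Python's `re` module rather than FME's regex engine, and `find_index` and `random_lowercase` are hypothetical helpers written for this post, not FME functions. `find_index` mimics what I would expect from a 'find' function: the start index of the first match, or -1 when there is none.

```python
import random
import re
import string


def find_index(pattern, text):
    """Hypothetical stand-in for a regex 'find': start index of the first match, or -1."""
    match = re.search(pattern, text)
    return match.start() if match else -1


def random_lowercase(rng, min_len=10, max_len=50):
    """Generate a random lowercase string of min_len to max_len characters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [random_lowercase(rng) for _ in range(100)]

# The four patterns discussed in this post.
patterns = [".*?a.*", ".*?a", ".*a.*", "a.*"]

# For each sample, record the index reported for every pattern.
results = [(text, {p: find_index(p, text) for p in patterns}) for text in samples]

for text, indexes in results[:5]:
    print(text, indexes)
```

In this sketch, the lazy and greedy variants agree for every string: a string containing an 'a' gives index 0 for the three patterns that start with '.*', and a string without an 'a' gives -1 for all four. That is the consistency I expected between the StringSearcher and FindRegEx.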

Below are some screenshots to illustrate the issue, as well as the workspace that was used.

To me this feels like a bit of a bug in the FindRegEx stringfunction. However, I would be more than happy to hear any suggestions, ideas or other feedback in case I'm overlooking something.

@Safe: if it's indeed a bug, please also check whether it is isolated to the FindRegEx string function, or also occurs in e.g. the ReplaceRegEx string function.

Kind regards,

Thijs

N.b. I am using FME(R) 2019.2.3.2 (20200320 - Build 19825 - WIN64)

[Screenshots 1, 2a, 2b, 3a, 3b, 4a and 4b]

@thijsknapen Thanks for the example. We've reproduced the issue of FindRegEx not recognizing the lazy (?) quantifier. We'll try to make the StringSearcher and the regex functions compatible.


Quick addendum: today I noticed that the string function ReplaceRegEx is able to handle the lazy quantifier '?'.

Although I'm glad this is the case, I do think it's a bit peculiar that one regex string function is affected by this issue and the other isn't.

Jira issue: FMEENGINE-28864


Regular expression (regex) functions in the Text Editor - String Functions have been updated in FME 2021.1 to be compatible with StringSearcher & StringReplacer. For backward compatibility reasons, the functions have been renamed:

  • @FindRegularExpression() replaces @FindRegEx()
  • @ReplaceRegularExpression() replaces @ReplaceRegEx()
  • @SubstringRegularExpression() is new and matches the substring functionality of StringSearcher

The old functions will continue to work. Only the new functions will be available when you create new expressions.

Example attached:


@Mark Stoakes That's great to hear. Thanks. I look forward to exploring and using these revised regex string functions!

