Solved

RegEx in FME v Rubular v RegExr


Hi,

From the following sample text string...

<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>
<abc>7:45<def>21<def>56<def>99<def>0<def>11<xyz>
<abc>10:30<def>0<def>0<def>44<def>0<def>33<xyz>

...I'm trying to extract the times and the data, i.e. end up with the times (e.g. 7:45) and the numeric data (e.g. 21, 56, etc.).

As a test bed I sometimes use Rubular or RegExr for testing regular expressions; both are pretty good.

The following expression does exactly what I want:

\d+:\d+|\d+|\n

Both test beds, Rubular and RegExr, return the required data.

However, putting this expression in the FME StringSearcher transformer only returns the first match, i.e. 1:00, then 7:45, etc.

My question is: why is there a difference between what the regex returns in FME and what it returns in Rubular and RegExr?

See attached screen dumps.

Any help much appreciated.

PS
The above expression also works fine in RegexBuddy (same as Rubular).

Cheers

Howard L'

4 replies

takashi
Influencer
  • Best Answer
  • March 21, 2015
Hi Howard,

The pipe (|) means "OR", so your regex matches any one of "\d+:\d+", "\d+" or "\n".

Rubular shows all the matched parts, but the StringSearcher assigns only the first matched part to the "_matched_characters" attribute.

I think that's the reason for the difference between them.
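
To make the "first match" versus "all matches" distinction concrete, here is a minimal sketch in Python's re module. It runs outside FME and is only an analogy for the behaviour described above, not the StringSearcher itself; the function names are made up for the illustration, and the "\n" alternative is left out because a single line is tested.

```python
import re

# Same alternation as in the question, minus the "\n" branch.
PATTERN = re.compile(r"\d+:\d+|\d+")
LINE = "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>"

def first_match(text):
    """Return only the first match, like a single-match search."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

def all_matches(text):
    """Return every non-overlapping match, like the online test beds display."""
    return PATTERN.findall(text)

print(first_match(LINE))  # 1:00
print(all_matches(LINE))  # ['1:00', '34', '12', '9', '0', '8']
```

The pattern is identical in both calls; only the way the engine is asked to apply it differs.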

 

 

For example, this expression can be used if you want to extract every matched part.

The StringSearcher assigns the matched parts to elements of the "_matched_parts" list.

-----
(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)
-----

Input: <abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>

Result:

`_matched_characters' has value `1:00<def>34<def>12<def>9<def>0<def>8'
`_matched_parts{0}' has value `1:00'
`_matched_parts{1}' has value `34'
`_matched_parts{2}' has value `12'
`_matched_parts{3}' has value `9'
`_matched_parts{4}' has value `0'
`_matched_parts{5}' has value `8'
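
The same capture-group idea can be tried outside FME. This is a rough sketch using Python's re module, where the whole match and the groups play roles similar to "_matched_characters" and "_matched_parts"; the helper name matched_parts is an assumption made for this example.

```python
import re

# One group for the time, then five groups for the numeric values.
GROUPED = re.compile(r"(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)")

def matched_parts(text):
    """Return (whole match, list of captured groups), or None if no match."""
    m = GROUPED.search(text)
    if m is None:
        return None
    return m.group(0), list(m.groups())

whole, parts = matched_parts("<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>")
print(whole)  # 1:00<def>34<def>12<def>9<def>0<def>8
print(parts)  # ['1:00', '34', '12', '9', '0', '8']
```

Note that this pattern expects exactly one time followed by five numbers, so a line with fewer values would not match at all.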

 

 

Takashi

  • Author
  • March 21, 2015
Takashi,

Thanks very much for your reply.

Yes, I was aware of the alternation symbol [pipe (|) = OR]. The main issue in the original question was that three completely different test beds (Rubular, RegExr and RegexBuddy) all show matched characters one way (by showing all possible matches), while it seems so far that FME alone shows them another way, by only showing the first match (in the _matched_characters attribute). It would be really good if the test beds and FME were all consistent. That way you would be able to move from test bed to FME a little more easily. If Safe could change this I think it would be really useful.

Your tip about using the _matched_parts list is really helpful.

Thanks once again for your reply.

Howard L'

takashi
Influencer
  • March 21, 2015
The TclCaller with this Tcl expression will return all the matched parts as a space-delimited string (a similar result to the other tools).

Tcl Expression Example:

-----
return [regexp -all -inline -- {\d+:\d+|\d+} [FME_GetAttribute "text_line_data"]]
-----
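
For anyone who wants to check the expected output before wiring up the TclCaller, the following is a small Python sketch of the same "all matches, space-delimited" result applied to the three sample lines from the question. It is only an approximation written with Python's re module, not the Tcl engine, and the names SAMPLE and space_delimited are invented for the example.

```python
import re

TOKEN = re.compile(r"\d+:\d+|\d+")

# The three sample lines from the original question.
SAMPLE = """<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>
<abc>7:45<def>21<def>56<def>99<def>0<def>11<xyz>
<abc>10:30<def>0<def>0<def>44<def>0<def>33<xyz>"""

def space_delimited(line):
    """Join every match on the line with single spaces."""
    return " ".join(TOKEN.findall(line))

results = [space_delimited(line) for line in SAMPLE.splitlines()]
for r in results:
    print(r)
# 1:00 34 12 9 0 8
# 7:45 21 56 99 0 11
# 10:30 0 0 44 0 33
```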

 

 

FYI

gio
Contributor
  • March 23, 2015
There are many regexp flavors.

There is a list of which flavor has which facility.

Rubular can show 3 versions atm.

Rubular by default shows results as if the "-all" switch were on.

I find this very handy.

Switches can be used in AttributeCreators, though FME 2015 has an issue with expression outputs when they contain non-numerals: you cannot remove the @Evaluate() icon. In 2014 you could just remove it because it was not a fixed icon.

 

 

If you use the same regexp but with the switches "-all", "-indices" and "-inline" you can extract all the hits. There are posts on this in this forum; I found a couple I made in 2010. Shame to bug the evaluator like this... :(
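
As a rough analogy for asking for every hit together with its position, here is a short Python sketch using re.finditer. It is not Tcl: as far as I recall, Tcl's -indices reports the index of the last matched character (inclusive), whereas Python's end offset is exclusive. The function name matches_with_indices is made up for the example.

```python
import re

TOKEN = re.compile(r"\d+:\d+|\d+")

def matches_with_indices(text):
    """Return (matched text, start offset, end offset) for every hit.

    Offsets are 0-based and the end offset is exclusive (Python convention).
    """
    return [(m.group(0), m.start(), m.end()) for m in TOKEN.finditer(text)]

for value, start, end in matches_with_indices("<abc>7:45<def>21<xyz>"):
    print(value, start, end)
# 7:45 5 9
# 21 14 16
```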

 

 

At www.tcl.tk/ you can find all you wish to know about it, in all flavors.

 

 
