Hi,

From the following sample text string........

<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>

<abc>7:45<def>21<def>56<def>99<def>0<def>11<xyz>

<abc>10:30<def>0<def>0<def>44<def>0<def>33<xyz>

......I'm trying to extract the times and the data, i.e. end up with the times (e.g. 7:45) and the numeric data (e.g. 21, 56, etc.).

.............

As a test bed I sometimes use Rubular or RegExr for testing regular expressions; both are pretty good.

The following expression does exactly what I want:

\d+:\d+|\d+|\n

Both test beds, Rubular and RegExr, return the required data and do exactly what I want.

..........

However, putting this expression in the FME StringSearcher transformer only returns the first match, i.e. 1:00, then 7:45, etc.

.......

My question is: why is there a difference between what the regex in FME returns and what Rubular and RegExr return?

........

See attached screen dumps.

Any help much appreciated.

.........

PS

The above expression also works fine in RegexBuddy (same as Rubular).

.........

Cheers

Howard L'

Hi Howard,

The pipe (|) means "OR", so your regex matches one of "\d+:\d+", "\d+" or "\n".
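To see why the order of the alternatives matters, here is a minimal sketch using Python's `re` module, a backtracking engine like the Ruby engine behind Rubular. At each position the alternatives are tried left to right, so putting the time alternative first keeps "1:00" together as one token. This is an illustration outside FME; Tcl's `regexp` documents a longest-match preference for alternations, so it may not show the split.

```python
import re

sample = "<abc>1:00<def>34<def>12<xyz>"

# Time alternative first: "1:00" is consumed as a single token.
time_first = re.findall(r"\d+:\d+|\d+", sample)

# Plain-number alternative first: at the "1", \d+ succeeds immediately,
# so the time is split into "1" and "00".
number_first = re.findall(r"\d+|\d+:\d+", sample)

print(time_first)    # ['1:00', '34', '12']
print(number_first)  # ['1', '00', '34', '12']
```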

Rubular shows all the matched parts, but the StringSearcher assigns only the first matched part to the "_matched_characters" attribute.

I think that's the reason for the difference between them.
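The difference between a single match and global matching can be sketched in Python (this is not FME's own code): `re.search` returns only the leftmost match, comparable to the single `_matched_characters` value described above, while `re.findall` returns every match, which is what the online testers highlight. The two sample lines are taken from the question.

```python
import re

text = (
    "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>\n"
    "<abc>7:45<def>21<def>56<def>99<def>0<def>11<xyz>\n"
)

pattern = re.compile(r"\d+:\d+|\d+|\n")

# A single search reports only the leftmost match.
first = pattern.search(text).group(0)

# Global matching reports every non-overlapping match, newlines included.
every = pattern.findall(text)

print(first)      # 1:00
print(every[:7])  # ['1:00', '34', '12', '9', '0', '8', '\n']
```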

For example, this expression can be used if you want to extract every part. The StringSearcher assigns the captured parts (the parenthesized groups) to the elements of the "_matched_parts" list.

-----

(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)

-----

Input: <abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>

Result:

`_matched_characters' has value `1:00<def>34<def>12<def>9<def>0<def>8'
`_matched_parts{0}' has value `1:00'
`_matched_parts{1}' has value `34'
`_matched_parts{2}' has value `12'
`_matched_parts{3}' has value `9'
`_matched_parts{4}' has value `0'
`_matched_parts{5}' has value `8'
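The same capture-group idea in Python, as a rough sketch: `group(0)` holds the whole match and `groups()` holds the six parenthesized captures, which line up with the `_matched_characters` and `_matched_parts{0..5}` values shown above. The pattern assumes one time followed by at least five numbers; a line with fewer would not match at all, and any extra numbers would be ignored.

```python
import re

line = "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>"
pattern = re.compile(r"(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)")

m = pattern.search(line)
whole = m.group(0)        # comparable to _matched_characters
parts = list(m.groups())  # comparable to _matched_parts{0..5}

print(whole)  # 1:00<def>34<def>12<def>9<def>0<def>8
print(parts)  # ['1:00', '34', '12', '9', '0', '8']
```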

Takashi

Takashi,

Thanks very much for your reply.

Yes, I was aware of the alternation symbol [pipe (|) = OR]. The main issue in my original question was that three completely different test beds (Rubular, RegExr and RegexBuddy) all show matched characters one way (by showing all matches), while so far it seems that FME alone shows them another way, by showing only the first match (in the _matched_characters attribute). It would be really good if the test beds and FME were consistent; that way it would be a little easier to move from a test bed to FME. If Safe could change this, I think it would be really useful.

.............

Your tip about using _matched_parts is really helpful.

.............

Thanks once again for your reply.

Howard L'

The TclCaller with this Tcl expression will return all the matched parts, space-delimited (a similar result to the other tools).

Tcl Expression Example:

-----

return [regexp -all -inline -- {\d+:\d+|\d+} [FME_GetAttribute "text_line_data"]]

-----
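For comparison outside FME, here is a Python sketch of the same idea: `re.findall` plays the role of `regexp -all -inline`, and the matches are joined with single spaces, which is how a plain Tcl list of these tokens prints. The function name is hypothetical, and reading the `text_line_data` attribute from a feature is deliberately left out because it depends on FME's API.

```python
import re

TIME_OR_NUMBER = re.compile(r"\d+:\d+|\d+")


def all_matches_space_delimited(text_line_data: str) -> str:
    """Return every time-or-number match, joined by single spaces."""
    return " ".join(TIME_OR_NUMBER.findall(text_line_data))


result = all_matches_space_delimited(
    "<abc>10:30<def>0<def>0<def>44<def>0<def>33<xyz>"
)
print(result)  # 10:30 0 0 44 0 33
```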

FYI

There are many regexp flavors.

There is a list of which flavor has which facility.

Rubular can show 3 versions at the moment.

By default, Rubular shows results as if the "-all" switch were on. I find this very handy.

Switches can be used in AttributeCreators, though FME 2015 has an issue with expression outputs when they contain non-numerals: you cannot remove the @Evaluate() icon. In 2014 you could just remove it because it was not a fixed icon.

If you use the same regexp but with the switches "-all", "-indices", and "-inline", you can extract all the hits. There are posts on this in this forum; I found a couple I made in 2010. It's a shame the evaluator gets in the way like this... :(
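The `-indices` variant can be approximated in Python with `re.finditer`, shown here as a hypothetical helper rather than anything FME provides. One detail to watch when porting: Tcl's `regexp -indices` reports the index of the last matched character, while Python's `Match.end()` points one past it, so the sketch subtracts 1 to produce Tcl-style pairs.

```python
from __future__ import annotations

import re


def match_indices(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (first, last) character indices of every match.

    Python's end() is exclusive; subtracting 1 gives the inclusive
    last index, as Tcl's -indices reports it.
    """
    return [(m.start(), m.end() - 1) for m in re.finditer(pattern, text)]


line = "<abc>7:45<def>21<def>56<xyz>"
pairs = match_indices(line, r"\d+:\d+|\d+")
print(pairs)  # [(5, 8), (14, 15), (21, 22)]
```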

At www.tcl.tk/ you can find everything you wish to know about it, in all its flavors.