Solved

RegEx in FME v Rubular v RegExr

10 years ago
March 20, 2015
4 replies
13 views

howard_l
44 replies

Hi,

From the following sample text string........

......I'm trying to extract times and data, i.e. end up with the times (e.g. 7:45) and the numeric data (e.g. 21, 56, etc.)

.............

As a test bed I sometimes use Rubular or RegExr for testing regular expressions; both are pretty good.

The following expression does exactly what I want:

\d+:\d+|\d+|\n

Both test beds - Rubular and RegExr return the required data and do exactly what I want.

..........

However, putting this expression in the FME StringSearcher transformer only returns the first match on each line, i.e. 1:00, then 7:45, etc.

.......

My question is: why is there a difference between what RegEx in FME returns and what Rubular and RegExr return?

........

See attached screen dumps.

Any help much appreciated.

.........

The above expression also works fine in RegexBuddy (same as Rubular)

.........

Cheers

Howard L'


takashi
7724 replies
Best Answer
10 years ago
March 21, 2015

Hi Howard,

The pipe (|) means "OR", so your regex matches one of "\d+:\d+", "\d+" or "\n".

Rubular shows all the matched parts, but the StringSearcher assigns only the first matched part to the "_matched_characters" attribute.

I think that is the reason for the difference between them.

If you want to extract every part, you can use an expression with capture groups, such as the one below.

The StringSearcher assigns each captured group to an element of the "_matched_parts" list.
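The difference can be sketched outside FME. The snippet below is only an illustration in Python's `re` module, which is not the engine FME or Rubular uses; it assumes the alternation pattern from the question (without the `\n` alternative) and reuses a one-line sample string. `search` stops at the first match, much like the value placed in `_matched_characters`, while `findall` collects every match, as the online testers display.

```python
import re

# Alternation pattern from the question; the "\n" alternative is left out
# because the sample below is a single line.
PATTERN = re.compile(r"\d+:\d+|\d+")

# Illustrative sample string (an assumption; the original sample text is not shown).
text = "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>"

# First match only: comparable to what ends up in _matched_characters.
first = PATTERN.search(text).group(0)

# Every non-overlapping match: comparable to what Rubular and RegExr display.
all_matches = PATTERN.findall(text)

print(first)        # 1:00
print(all_matches)  # ['1:00', '34', '12', '9', '0', '8']
```

The pattern is identical in both calls; only the question asked of the engine ("first match" versus "all matches") changes.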

-----

(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)

-----

Input: <abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>

Result:

`_matched_characters' has value `1:00<def>34<def>12<def>9<def>0<def>8'

`_matched_parts{0}' has value `1:00'

`_matched_parts{1}' has value `34'

`_matched_parts{2}' has value `12'

`_matched_parts{3}' has value `9'

`_matched_parts{4}' has value `0'

`_matched_parts{5}' has value `8'
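The same result can be reproduced as a rough sketch in Python's `re` module (an assumption for illustration only; the StringSearcher itself is not involved). The whole match plays the role of `_matched_characters` and the capture groups play the role of the `_matched_parts` list.

```python
import re

# The grouped expression from above: one time followed by five numbers,
# separated by runs of non-digit characters.
GROUPED = re.compile(r"(\d+:\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)")

text = "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>"

match = GROUPED.search(text)

# Whole match: analogous to _matched_characters.
whole = match.group(0)

# Capture groups in order: analogous to _matched_parts{0} .. _matched_parts{5}.
parts = list(match.groups())

print(whole)  # 1:00<def>34<def>12<def>9<def>0<def>8
print(parts)  # ['1:00', '34', '12', '9', '0', '8']
```

Note that this approach needs one capture group per expected value, so it only fits lines with a fixed number of fields.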

Takashi

howard_l
Author
44 replies
10 years ago
March 21, 2015

Takashi,

Thanks very much for your reply.

Yes, I was aware of the alternation symbol [pipe (|) = OR]. The main issue in the original question was that three completely different test beds (Rubular, RegExr and RegexBuddy) all show matched characters one way (by showing all possible matches), while so far it seems that FME alone shows them another way, by only showing the first match (in the _matched_characters attribute). It would be really good if the test beds and FME were consistent; that would make it easier to move from test bed to FME. If Safe could change this, I think it would be really useful.

.............

Your tip about using the _matched_parts is really helpful

.............

Thanks once again for your reply

Howard L'

takashi
7724 replies
10 years ago
March 21, 2015

The TclCaller with this Tcl expression will return all the matched parts as a space-delimited list (a result similar to the other tools).

Tcl Expression Example:

-----

return [regexp -all -inline -- {\d+:\d+|\d+} [FME_GetAttribute "text_line_data"]]

-----

FYI


gio
Contributor
2252 replies
10 years ago
March 23, 2015

There are many regexp flavors.

There is a list of which flavor has which facility.

Rubular can show 3 versions at the moment.

By default, Rubular shows results as if the "-all" switch were on.

I find this very handy.

Switches can be used in AttributeCreators, though FME 2015 has an issue with expression outputs when they contain non-numerals: you cannot remove the @Evaluate() icon. In 2014 you could simply remove it because it was not a fixed icon.

If you use the same regexp with the switches "-all", "-indices" and "-inline", you can extract all the hits. There are posts on this in this forum; I found a couple I made in 2010. Shame to bug the evaluator like this... :(

At www.tcl.tk/ you can find everything you wish to know about it, in all flavors.
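
As a rough analogue of combining "-all" with "-indices", the sketch below uses Python's `re.finditer` (an illustration only, not Tcl and not FME) to list every hit together with its first and last character index. The sample string is an assumption, and the end index is made inclusive by hand because Python's `end()` is exclusive.

```python
import re

PATTERN = re.compile(r"\d+:\d+|\d+")

# Illustrative sample string (an assumption, not data from the thread's attachments).
text = "<abc>1:00<def>34<def>12<def>9<def>0<def>8<xyz>"

# One (matched text, first index, last index) tuple per hit.
# m.end() is exclusive in Python, so subtract 1 to get an inclusive last index.
hits = [(m.group(0), m.start(), m.end() - 1) for m in PATTERN.finditer(text)]

for hit in hits:
    print(hit)
# First line printed: ('1:00', 5, 8)
```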
