Question

Regular Expression evaluation

12 years ago
June 19, 2013
8 replies
38 views

We need to use FME to split up character strings like this:

“F29920110716104845F37920120929203346TC20080508184800”

into separate substrings:

"F29920110716104845", "F37920120929203346", "TC20080508184800".

The substrings are <personnel-ids>, each made up of two sub-elements <operatorid><datetime>, where the <operatorid> element always starts with [A-Z] and is two to five characters in length, and the <datetime> element is always 14 decimal digits long (YYYYMMDDhhmmss).

The following JavaScript regular expression will parse this string and extract each substring:

( [A-Z].{1,4}[0-9]{14,17}((?=[A-Z])|$))

However, in the FME StringSearcher Transformer, it only ever extracts the first substring into _matched_parts{0}.

Q1. Does FME StringSearcher support a “global match” parameter, so that it finds all matches in the string, rather than just the first, and if so how do you set it?

Q2. Failing that, can anybody please recommend an alternate FME method to split out our <personnel-id> substrings?

takashi
7715 replies
12 years ago
June 19, 2013

Hi,

Unfortunately, the StringSearcher does not seem to support a parameter like "global match".

If the string always has 3 parts and each of them consists of one or more alphabetical characters followed by one or more digits, the following expression would be effective: ^([A-Z]+[0-9]+)([A-Z]+[0-9]+)([A-Z]+[0-9]+)$ This is a simplified example; you can use a stricter expression if necessary.

Hope this helps.

Takashi

david_r
8355 replies
12 years ago
June 19, 2013

Hi,

I think it would be quite easy to implement this using a PythonCaller and the re.findall() method. Let us know if you need more details.
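A minimal sketch of that approach, written as standalone Python rather than a complete PythonCaller script. The function name split_personnel_ids and the pattern itself are assumptions for illustration; in particular, the pattern assumes that the rest of the operator ID consists of uppercase letters and digits, which the original question does not state.

```python
import re

# Assumed pattern: operator ID = one uppercase letter plus 1-4 further
# uppercase letters or digits, followed by a 14-digit datetime. The lookahead
# requires each match to end where the next ID starts (or at end of string).
PERSONNEL_ID = re.compile(r"[A-Z][A-Z0-9]{1,4}[0-9]{14}(?=[A-Z]|$)")


def split_personnel_ids(value):
    """Return every personnel ID found in value, in order of appearance."""
    # The pattern has no capture groups, so findall returns whole matches.
    return PERSONNEL_ID.findall(value)


sample = "F29920110716104845F37920120929203346TC20080508184800"
ids = split_personnel_ids(sample)
print(ids)
```

Inside a PythonCaller the same call would be applied to the attribute value, with the resulting list written back to the feature as a list attribute.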

David

ebygomm
Influencer
3313 replies
12 years ago
June 19, 2013

You can use a regular expression in a StringReplacer to search for the alpha characters and replace them with a comma followed by the matched characters, then use an AttributeSplitter to split at the inserted comma.
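The same insert-a-delimiter-then-split idea can be sketched in Python to see what the two steps produce. The function name split_on_letters is hypothetical, and the sketch assumes letters only ever occur at the start of an operator ID, as in the sample string; an ID such as "A1B2" would be split in the wrong place.

```python
import re


def split_on_letters(value, delimiter=","):
    """Insert a delimiter before each run of letters, then split on it."""
    # Put the delimiter in front of every run of uppercase letters,
    # keeping the matched letters themselves (\1).
    marked = re.sub(r"([A-Z]+)", delimiter + r"\1", value)
    # The string starts with a letter, so the first piece is empty;
    # drop empty pieces after splitting.
    return [part for part in marked.split(delimiter) if part]


sample = "F29920110716104845F37920120929203346TC20080508184800"
parts = split_on_letters(sample)
print(parts)
```

Because the delimiter is inserted before the first ID as well, the split produces a leading empty element, which is something to watch for after the AttributeSplitter too.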

deanrother
21 replies
12 years ago
June 19, 2013

Use the RegularExpressionMatcher from the FME Store. Connect it to a ListExploder set to read List Attribute: REM_matched_parts{}. I used the following regular expression: (\w{1,4}\d{13,17})

marks
7 replies
12 years ago
June 19, 2013

If the string pattern is consistent, then try an AttributeSplitter with the format string 18s18s16s.
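For comparison, a fixed-width split with the same widths (18, 18, 16) looks like this in Python. The function name split_fixed_width is hypothetical, and the approach only works while every operator ID keeps the same length as in the sample; the length check is there to make a differently shaped string fail loudly.

```python
def split_fixed_width(value, widths=(18, 18, 16)):
    """Cut value into consecutive pieces of the given widths."""
    if len(value) != sum(widths):
        raise ValueError(
            "expected %d characters, got %d" % (sum(widths), len(value))
        )
    parts = []
    start = 0
    for width in widths:
        # Take the next `width` characters and advance the cursor.
        parts.append(value[start:start + width])
        start += width
    return parts


sample = "F29920110716104845F37920120929203346TC20080508184800"
parts = split_fixed_width(sample)
print(parts)
```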

gio
Contributor
2252 replies
12 years ago
June 24, 2013

Hi,

Use a Tcl regexp with the -inline switch, and probably the -all switch as well.

This will give you all matches.

Like this, in an AttributeCreator:

@Evaluate([regexp -all -inline {your expression} "@Value(Object)" ])

If you know the number of hits, e.g. 3, you can also do:

@Evaluate([regexp -all {your expression matchedparts Match1 Match2 Match3} "@Value(Object)" ])

It will then write them to variables named Match1 etc.

You can then assign those variables to attributes.

Check www.tcl.tk/man/tcl8.4/TclCmd, it is very powerful.

gio
Contributor
2252 replies
12 years ago
June 24, 2013

...sorry

@Evaluate([regexp -all {your expression matchedparts Match1 Match2 Match3} "@Value(Object)" ])

should be

@Evaluate([regexp -all {your expression} "@Value(Object)" matchedparts Match1 Match2 Match3])

Too much copy 'n' pasting :)

gio
Contributor
2252 replies
12 years ago
June 24, 2013

Hi, me again,

I changed your regexp to ([A-Z]{1,4}[0-9]{14,17})(?=[A-Z]|$) so I get just 3 matches. Yours also gives a space match, 3 times.

In an AttributeCreator, using the arithmetic editor:

[regexp -all -inline {([A-Z]{1,4}[0-9]{14,17})(?=[A-Z]|$)} "@Value(Object)" ]

then

AttributeSplitter, delimiter: space.

ListDuplicateRemover.

ListExploder.

You should now have 3 attributes with the separate <personnel-ids>.
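Why the ListDuplicateRemover is needed: with -all -inline, Tcl returns a flat list holding, for every match, the whole match followed by each capture group. A rough Python imitation of that behaviour (the helper names all_inline and drop_duplicates are hypothetical, and this is only a sketch of the Tcl output, not FME code):

```python
import re

PATTERN = re.compile(r"([A-Z]{1,4}[0-9]{14,17})(?=[A-Z]|$)")


def all_inline(pattern, value):
    """Imitate the flat list returned by Tcl's regexp -all -inline."""
    flat = []
    for m in pattern.finditer(value):
        flat.append(m.group(0))  # the whole match
        # ...followed by every capture group (empty string if unmatched)
        flat.extend(g if g is not None else "" for g in m.groups())
    return flat


def drop_duplicates(items):
    """Keep the first occurrence of each item, preserving order."""
    return list(dict.fromkeys(items))


sample = "F29920110716104845F37920120929203346TC20080508184800"
flat = all_inline(PATTERN, sample)
unique = drop_duplicates(flat)
print(len(flat), unique)
```

Each ID appears twice in the flat list, once as the whole match and once as the capture group, so removing duplicates leaves the 3 IDs. Note that this step would also collapse a personnel ID that genuinely occurs twice in the same string.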

greetings
