Question

Regular Expression evaluation


We need to use FME to split up character strings like this:

 

“F29920110716104845F37920120929203346TC20080508184800”

 

into separate substrings:

 

"F29920110716104845", "F37920120929203346", "TC20080508184800".

 

 

The substrings are <personnel-ids>, each made up of two sub-elements, <operatorid><datetime>, where the <operatorid> element always starts with [A-Z] and is two to five characters long, and the <datetime> element is always 14 decimal digits long (YYYYMMDDhhmmss).

 

 

The following JavaScript regular expression will parse this string and extract each substring: 

 

     ( [A-Z].{1,4}[0-9]{14,17}((?=[A-Z])|$))   

 

 

However, the FME StringSearcher transformer only ever extracts the first substring, into _matched_parts{0}.

 

 

Q1.   Does FME StringSearcher support a “global match” parameter, so that it finds all matches in the string, rather than just the first, and if so how do you set it?   

 

 

Q2.  Failing that, can anybody please recommend an alternative FME method to split out our <personnel-id> substrings?

 

 

  

 

 

 

 

8 replies

takashi
Evangelist
  • June 19, 2013
Hi,

 

 

Unfortunately, the StringSearcher does not seem to support a parameter like "global match".

 

If the string always has 3 parts and each of them consists of one or more alphabetical characters followed by one or more digits, the following expression would be effective: ^([A-Z]+[0-9]+)([A-Z]+[0-9]+)([A-Z]+[0-9]+)$   This is a simplified example; you can use a stricter expression if necessary.
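As a rough illustration of this fixed three-part pattern outside FME, here is a minimal Python sketch. The function name `split_three_parts` is hypothetical, and the sketch assumes the input always contains exactly three parts, with letters occurring only at the start of each part.

```python
import re

# The simplified pattern from this reply: exactly three groups,
# each made of capital letters followed by digits.
THREE_PARTS = re.compile(r"^([A-Z]+[0-9]+)([A-Z]+[0-9]+)([A-Z]+[0-9]+)$")


def split_three_parts(text):
    """Return the three parts as a list, or None if the pattern does not match."""
    match = THREE_PARTS.match(text)
    return list(match.groups()) if match else None


sample = "F29920110716104845F37920120929203346TC20080508184800"
print(split_three_parts(sample))
```

A string with more or fewer than three parts does not match at all, which is the main limitation of this approach.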

 

Hope this helps.

 

 

Takashi

david_r
Celebrity
  • June 19, 2013
Hi,

 

 

I think it would be quite easy to implement this using a PythonCaller and the re.findall() method. Let us know if you need more details.
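A minimal sketch of the re.findall() idea in plain Python, independent of any FME wiring. The pattern is an assumption derived from the question (one capital letter, one to four further letters or digits, then exactly 14 digits), and the function name `find_personnel_ids` is hypothetical.

```python
import re

# Assumed pattern: a capital letter plus 1-4 further letters or digits
# (an operator id of 2-5 characters), then exactly 14 digits (YYYYMMDDhhmmss).
# The lookahead requires the next id, or the end of the string, to follow.
PERSONNEL_ID = re.compile(r"[A-Z][A-Z0-9]{1,4}[0-9]{14}(?=[A-Z]|$)")


def find_personnel_ids(text):
    """Return every non-overlapping personnel id found in text, left to right."""
    return PERSONNEL_ID.findall(text)


sample = "F29920110716104845F37920120929203346TC20080508184800"
for personnel_id in find_personnel_ids(sample):
    print(personnel_id)
```

Because findall() returns all non-overlapping matches, no separate "global match" flag is needed, and the number of ids in the string does not have to be known in advance.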

 

 

David

ebygomm
Influencer
  • June 19, 2013
You can use a regular expression in a StringReplacer to search for the alpha characters and replace them with a comma followed by the matched characters, then use an AttributeSplitter to split at the inserted commas.
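The replace-then-split idea can be sketched in Python as follows. This is an illustration only, not FME configuration; the function name `split_on_letter_runs` is hypothetical, and it assumes capital letters appear only at the start of each id and that the delimiter never occurs in the data.

```python
import re


def split_on_letter_runs(text, delimiter=","):
    """Split text at every run of capital letters, keeping the letters."""
    # Prefix every run of capital letters with the delimiter...
    marked = re.sub(r"([A-Z]+)", delimiter + r"\1", text)
    # ...then drop the leading delimiter and split on the remaining ones.
    return marked.lstrip(delimiter).split(delimiter)


sample = "F29920110716104845F37920120929203346TC20080508184800"
print(split_on_letter_runs(sample))
```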

 

 

 


Use the RegularExpressionMatcher from the FME Store.  Connect it to a ListExploder set to read List Attribute: REM_matched_parts{}.  I used the following regular expression: (\\w{1,4}\\d{13,17})

  • June 19, 2013
If the string pattern is consistent then try AttributeSplitter with a format string 18s18s16s
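For comparison, a fixed-width split like 18s18s16s amounts to slicing at known offsets. The Python sketch below assumes every input has exactly the 18/18/16 layout of the sample string; the function name `split_fixed_width` is hypothetical.

```python
def split_fixed_width(text, widths=(18, 18, 16)):
    """Cut text into consecutive pieces of the given widths."""
    parts = []
    start = 0
    for width in widths:
        parts.append(text[start:start + width])
        start += width
    return parts


sample = "F29920110716104845F37920120929203346TC20080508184800"
print(split_fixed_width(sample))
```

This breaks as soon as an operator id has a different length, so it only suits data where the layout never varies.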

gio
Contributor
  • June 24, 2013
Hi,

 

 

Use a Tcl regexp with the -inline switch, and possibly the -all switch.

 

This will give you all matches.

 

Like this in an AttributeCreator:

 

@Evaluate([regexp -all -inline {your expression} "@Value(Object)" ])

 

 

If you know the number of hits, e.g. 3, you can also do

 

@Evaluate([regexp -all  {your expression matchedparts Match1 Match2 Match3} "@Value(Object)" ])

 

It will then write them to variables named Match1 etc.

 

You can then assign those variables to attributes.

 

 

Check www.tcl.tk/man/tcl8.4/TclCmd; it is very powerful.

gio
Contributor
  • June 24, 2013
...sorry

 

 

@Evaluate([regexp -all {your expression matchedparts Match1 Match2 Match3} "@Value(Object)" ])

 

 

should be

 

@Evaluate([regexp -all {your expression} "@Value(Object)" matchedparts Match1 Match2 Match3])

 

 

 

 

Too much copy 'n' pasting :)

gio
Contributor
  • June 24, 2013
Hi, me again,

 

 

I changed your regexp to ([A-Z]{1,4}[0-9]{14,17})(?=[A-Z]|$) so I get just 3 matches. Yours also gives a space match, 3 times.

 

 

In an AttributeCreator, arithmetic editor:

 

[regexp -all -inline {([A-Z]{1,4}[0-9]{14,17})(?=[A-Z]|$)} "@Value(Object)" ]

 

 

then

 

AttributeSplitter, delimiter: space.

 

ListDuplicateRemover.

 

ListExploder.

 

 

You should now have 3 attributes with separate <personnel-ids>.

 

 

greetings

 

 

 

