Question

RegEx for finding more than one letter in a string


Hi all,

I have a string 'Apples Bananas Cherries' in a complete attribute field and I want to extract the capital letters and put them in another new attribute field, the result being 'ABC'.

How can I do this? Perhaps Rubular.com (very useful) whetted my appetite, because when I used [A-Z] it highlighted each capital, but StringSearcher in FME would only return the 'A'.

I've tried numerous other ways, including [A-Z]+ and different variations. Can anyone help?

Regards,

Ian
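For anyone comparing the two results, here is a small Python sketch (outside FME) of the difference between taking the first match and collecting every match. It only illustrates general regex behaviour and makes no claim about how StringSearcher works internally; the variable names are made up for the example.

```python
import re

text = "Apples Bananas Cherries"

# A single search returns only the first match.
first_match = re.search(r"[A-Z]", text).group(0)

# [A-Z]+ does not help here: no two capitals are adjacent,
# so the first match is still a single letter.
first_run = re.search(r"[A-Z]+", text).group(0)

# Collecting every match and joining them gives the wanted string.
all_capitals = "".join(re.findall(r"[A-Z]", text))

print(first_match, first_run, all_capitals)
```

So the pattern itself is fine; what matters is whether the tool returns one match or all of them.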

10 replies

fmelizard
Contributor
  • October 4, 2013
Hi,

Try including a space ' ' in your regex (I think \s will do).

takashi
Contributor
  • October 5, 2013
Hi Ian,

Many problems have two or more possible approaches. In this case, you can either extract the upper case characters or remove the non-upper case characters.

Consider removing (replacing with an empty string) all non-upper case characters. The StringReplacer with these settings would work:

Text To Find: [^A-Z]+
Replacement Text: <not set, i.e. replace matched characters with an empty string>
Use Regular Expressions: yes
Case Sensitive: yes

I started learning Tcl just a few days ago, because I found that Tcl is widely usable in FME, especially for string processing. A Tcl command can also be used in this case. If you are familiar with Tcl (I think so from your previous post "TCL files"), please evaluate these examples for future reference. Thanks.

Example "Tcl Expression" for the TclCaller:
FME_SetAttribute new_attr_name [regsub -all {[^A-Z]+} [FME_GetAttribute old_attr_name] {}]

Example "Value" setting for the AttributeCreator:
@Evaluate([regsub -all {[^A-Z]+} {@Value(old_attr_name)} {}])

Takashi
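The same "remove everything that is not A-Z" idea can be tried outside FME. A minimal Python sketch, assuming plain strings as input; `keep_upper` is a made-up helper name, and `re.sub` replaces every match by default, which corresponds to the `-all` switch of `regsub`:

```python
import re

def keep_upper(text):
    """Remove every run of characters that is not in A-Z."""
    return re.sub(r"[^A-Z]+", "", text)

print(keep_upper("Apples Bananas Cherries"))
```

Note that a string without any capitals comes back empty with this approach.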

takashi
Contributor
  • October 5, 2013
The StringReplacer with these settings might also work. This approach extracts the upper case characters.

Text To Find: [^A-Z]*([A-Z]+)[^A-Z]*
Replacement Text: \1   (this refers to the first bracketed part of the regex)
Use Regular Expressions: yes
Case Sensitive: yes

To retain the original string, create a new attribute, copy the original string to it, and then run the replacement on the new attribute.
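To check how the capture-group version behaves, here is a rough Python equivalent (a sketch only; `extract_upper` is a hypothetical name and Python's `re` is not the engine FME uses). Each match swallows the surrounding non-capitals and is replaced by the captured capitals alone:

```python
import re

PATTERN = r"[^A-Z]*([A-Z]+)[^A-Z]*"

def extract_upper(text):
    # \1 stands for the first bracketed group, i.e. the run of capitals.
    return re.sub(PATTERN, r"\1", text)

print(extract_upper("Apples Bananas Cherries"))
print(extract_upper("apples"))
```

One thing to be aware of: the pattern needs at least one capital to match, so in this sketch a string with no capitals is returned unchanged instead of being emptied.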

david_r
Evangelist
  • October 7, 2013
I couldn't get the suggestion with the StringReplacer to work, so here is a solution using a PythonCaller. It assumes an input attribute "my_string" and creates a new attribute "my_result" containing all the upper case letters at the beginning of a word:

-----

import fmeobjects
import re

def FeatureProcessor(feature):
    # Match runs of upper case letters that start a word
    criteria = r"\b[A-Z]+"
    to_search = feature.getAttribute("my_string")
    found = re.findall(criteria, to_search)
    # Concatenate all matches into one string
    result = ''.join(found)
    feature.setAttribute("my_result", result)

-----

If "my_string" = "Apple Bananas Cherries raDishes", then "my_result" = "ABC"

Tested with FME2013sp3.

David
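The matching logic can be tested without FME by pulling it into a plain function. This is only a sketch: `leading_capitals` and `any_capitals` are made-up names, and the `None` check assumes that a missing attribute is handed over as `None`.

```python
import re

def leading_capitals(text):
    """Join the capitals that start a word; a missing value gives ''."""
    if text is None:  # assumption: a missing attribute arrives as None
        return ""
    return "".join(re.findall(r"\b[A-Z]+", text))

def any_capitals(text):
    """Join every capital, wherever it sits in a word."""
    return "".join(re.findall(r"[A-Z]", text))

sample = "Apple Bananas Cherries raDishes"
print(leading_capitals(sample), any_capitals(sample))
```

The `\b` is what keeps the "D" of "raDishes" out of the result; without it the plain `[A-Z]` pattern picks it up as well.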

ebygomm
Influencer
  • October 7, 2013
A StringReplacer with the following settings works fine for me:

StringReplacer [StringReplacer:3]
Attributes: name
Text to Find: [a-z ]
Replacement Text: <not set>
Use Regular Expressions: Yes
Case Sensitive: Yes
Version: 3

This replaces any lowercase letters and spaces with nothing, leaving just the uppercase letters.
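For input that holds only letters and spaces, [a-z ] and [^A-Z]+ give the same result. They differ once digits or punctuation show up, as this small Python sketch suggests (function names and the sample string are invented for the comparison):

```python
import re

def strip_lower_and_spaces(text):
    # Removes only lowercase letters and spaces.
    return re.sub(r"[a-z ]", "", text)

def strip_non_upper(text):
    # Removes everything except A-Z.
    return re.sub(r"[^A-Z]+", "", text)

sample = "Apples, Bananas & 3 Cherries"
print(strip_lower_and_spaces(sample))
print(strip_non_upper(sample))
```

So which pattern is preferable probably depends on how clean the source attribute is.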

 


gio
Contributor
  • October 7, 2013
Aah, a job for Tcl!

You can put a regexp in an AttributeCreator or whatever:

=[regexp -inline -all {[A-Z]} "@Value(LettersCap)"]

The -all switch makes it find all occurrences of the expression.

The -inline switch writes attributes to a parameter, just like what you see in Ruby.

Your input "Apples Bananas Cherries" will end up as "A B C".

Now you can use an AttributeSplitter with a space as the delimiter if you want the letters in separate attributes.

Tcl has many switches and controls... pretty neat.

For example:

[string map {A 1stletterofAlphabet B 2ndletterofAlphabet C 3thletterofAlphabet} [regexp -inline -all {[A-Z]} "@Value(LettersCap)"]]

The value " Apples Bananas CherriesAnnanas"

would result in "1stletterofAlphabet 2ndletterofAlphabet 3thletterofAlphabet 1stletterofAlphabet"

etc.
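For comparison, roughly the same two steps in Python: collect all matches as a list, then map each letter to a label. This is a sketch rather than a translation of the Tcl; `LABELS`, `capitals` and `labelled` are made-up names, and the labels simply copy the ones from the string map example.

```python
import re

# Labels copied from the string map example above.
LABELS = {
    "A": "1stletterofAlphabet",
    "B": "2ndletterofAlphabet",
    "C": "3thletterofAlphabet",
}

def capitals(text):
    """Return every capital as a list, like a find-all with inline results."""
    return re.findall(r"[A-Z]", text)

def labelled(text):
    # Map each matched letter once; letters without a label stay as they are.
    return " ".join(LABELS.get(ch, ch) for ch in capitals(text))

print(" ".join(capitals("Apples Bananas Cherries")))
print(labelled(" Apples Bananas CherriesAnnanas"))
```

Mapping per match, instead of chaining several replace calls, avoids re-replacing the "A" inside an already inserted "...Alphabet" label.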

 


gio
Contributor
  • October 7, 2013
...where I wrote "parameter", it should be "variable".

  • Author
  • October 7, 2013
Thanks to all for the suggestions - and the wide variety of options!

 

 

I did think that I'd be able to use StringSearcher as that directs matched output to attributes. StringReplacer sounds like the best option for me.

david_r
Evangelist
  • October 7, 2013
That's one of the strengths of FME: there are almost always several different paths to the same solution :-)

takashi
Contributor
  • October 8, 2013
Glad to know you got a solution.

Gio, thank you for the Tcl examples.

Although I think the StringReplacer is the preferable solution in this case, I understand that Tcl commands can be used effectively for more complicated string processing.

The AttributeCreator with this expression, for example, replaces "Apples Bananas Cherries" with "A,B,C" (comma-separated upper case characters).

@Evaluate([join [regexp -all -inline {[A-Z]} {@Value(attr_name)}] {,}])

Interesting. I've taken to Tcl.
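A Python sketch of the same join idea, with the pattern and separator as parameters (`join_matches` is a hypothetical helper, not an FME function). It also shows that [A-Z] yields one letter per match, while [A-Z]+ keeps adjacent capitals together:

```python
import re

def join_matches(text, pattern=r"[A-Z]", sep=","):
    """Join every regex match in text with the given separator."""
    return sep.join(re.findall(pattern, text))

print(join_matches("Apples Bananas Cherries"))
print(join_matches("FME Desktop"))
print(join_matches("FME Desktop", r"[A-Z]+"))
```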
