Question

Splitting Strings


Badge
Hi there,

 

 

is there a simple way (like a regular expression) to remove a substring and to split the string on every capital letter by inserting a whitespace?

 

 

Example: Attribute value of "Hello/DearFmeCommunity" should become "Dear Fme Community"

 

 

--> "Hello/" should be removed from the string, and the rest should be split at every uppercase letter.

 

 

Many thanks.

 

 

Kind regards

 

Thomas

11 replies

Badge
I'm not sure about eliminating part of the string (I'm just not sure what your exact conditions are - is it everything up to a / character?) but the other part is relatively simple.

 

 

Use a StringReplacer. The parameters are:

 

 

Text to Match: [A-Z]

 

Replacement Text:  &   (that's a space followed by an ampersand character)

 

Use Regular Expressions: Yes

 

Case Sensitive: Yes

 

 

That will add a space before each upper case letter. If you really want to split the data then follow up with an AttributeSplitter.
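As a rough illustration of what those StringReplacer settings do, here is a minimal Python sketch. It is not FME itself, and the function name `space_before_caps` is made up for this example. In Python's `re.sub`, `\g<0>` plays the role that `&` (the whole match) plays in the Replacement Text.

```python
import re

def space_before_caps(text):
    # Put a space in front of every uppercase ASCII letter, like
    # Text to Match [A-Z] with Replacement Text " &".
    return re.sub(r"[A-Z]", r" \g<0>", text)

print(space_before_caps("DearFmeCommunity"))  # " Dear Fme Community" (note the leading space)
```

Note the leading space before the first capital, which would still need trimming.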

 

 

Hope this helps

 

 

Mark

 

 

Mark Ireland

 

Product Evangelist

 

Safe Software Inc
Badge +3
I don't think it can be done with a single regexp, certainly not with the transformers.

 

 

Some kind of iteration is needed for this, whether you do it in Tcl or Python.

 

Using transformers would involve Tcl, as you can use it in creators (single-line and nested Tcl functions), plus a custom transformer to create the required loop.

 

 

Here is a simple Python script.

 

You can find these all over the Python forums.

 

 

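import re

def split_on_caps(text):
    # Each match starts at a capital letter and runs up to the next capital.
    words = re.findall('[A-Z][^A-Z]*', text)
    # Join the pieces with single spaces.
    return " ".join(words)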

 

 

Use it in a PythonCaller.

 

 

You can add the "/" to the pattern if you need it, or have it searched for and removed if you don't. (Don't forget to escape it.)
Userlevel 2
Badge +17
Hi,

 

 

You can use the AttributeSplitter (Delimiter: /) and the ListIndexer to extract the substring that appears after the last slash in the string. The ListIndexer (List Index: -1) extracts the last element (i.e. the substring after the last slash) from the list.

 

You can then apply the StringReplacer with Mark's suggestion.

 

Finally, the AttributeTrimmer can be used to remove the extra whitespace added at the start of the string.
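To make the chain of steps concrete, here is a small Python sketch that mimics it outside of FME. The function name `last_part_spaced` is hypothetical; each line is annotated with the transformer it stands in for.

```python
import re

def last_part_spaced(text):
    # AttributeSplitter (Delimiter: /) + ListIndexer (List Index: -1):
    # keep only what follows the last slash.
    last = text.split("/")[-1]
    # StringReplacer: a space before every capital letter.
    spaced = re.sub(r"[A-Z]", r" \g<0>", last)
    # AttributeTrimmer: drop the space added before the first capital.
    return spaced.strip()

print(last_part_spaced("Hello/DearFmeCommunity"))  # Dear Fme Community
```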

 

 

Takashi
Badge +3
Sure you can do those things.

 

 

But that's not the solution to the posed problem.

 

 

Mark's suggestion just replaces every capital with space+capital.

It does not target [a-z][A-Z] to replace it with [a-z]\s[A-Z].

So any word written in all capitals would get messed up.

 

 

If you use the StringReplacer with

 

Text to Match: [a-z][A-Z]  or even [a-z]([A-Z])

 

Replacement Text:  &

 

It doesn't yield the answer either. Try it out; it doesn't yield an error.

 

 

You can use a creator with a Tcl regexp to get the capitals, @Evaluate([regexp -all -inline {[A-Z\/]} "@Value(tststr)"]), and the Tcl split function to get the "rest": @Evaluate([split @Value(tststr) "@Value(tststr2)"])

Now you have to reassemble the words from these two lists:

 

{} ello {} ear me ommunity and

 

H / D F C
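For what it's worth, here is a rough Python sketch of that reassembly step, just to show how much bookkeeping it takes. It assumes the separators are the capitals plus "/", as in the two lists above; the function name `reassemble` is made up.

```python
import re

def reassemble(text):
    # The same two lists as above: the separators (capitals and "/")
    # and the pieces left over when splitting on them.
    seps = re.findall(r"[A-Z/]", text)   # H / D F C
    rests = re.split(r"[A-Z/]", text)    # '' ello '' ear me ommunity
    words = []
    # rests[0] is whatever precedes the first separator; after that,
    # rests[i + 1] is the text that follows seps[i].
    for sep, rest in zip(seps, rests[1:]):
        # A slash is dropped; a capital is glued back onto its remainder.
        piece = rest if sep == "/" else sep + rest
        if piece:
            words.append(piece)
    return " ".join(words)

print(reassemble("Hello/DearFmeCommunity"))  # Hello Dear Fme Community
```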

 

 

...I'd rather go for a simple Python script (or Tcl) using a caller.

The slash is no problem in this matter.
Userlevel 2
Badge +17
Gio, I've understood your point.

 

I think the Tcl regsub command is suitable for resolving the issue.

 

@Evaluate([regsub -all {([^A-Z\s/])([A-Z])} "@Value(src)" {\1 \2}])
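For readers who want to try the pattern outside FME, here is a minimal Python sketch of the same substitution. It is only an approximation of the Tcl call (the function name `split_camel` is hypothetical), using `re.sub` in place of regsub -all.

```python
import re

def split_camel(text):
    # Group 1: any character that is not a capital, whitespace or "/".
    # Group 2: the capital that follows it.
    # The replacement re-emits both groups with a space in between.
    return re.sub(r"([^A-Z\s/])([A-Z])", r"\1 \2", text)

print(split_camel("Hello/DearFmeCommunity"))  # Hello/Dear Fme Community
print(split_camel("FMEServer"))               # FMEServer (runs of capitals stay intact)
```

In this sketch the "Hello/" prefix is left in place, so removing it remains a separate step.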
Badge +3
@Takashi,

 

 

Yes,

 

 

A regsub like that will answer ThomasK's question.

 

 

The FME string transformers that use regexp are too limited; that's why I use single-line (nested) Tcl functions in, for instance, creators. I've been doing that since I found out I could go quite far with Tcl functions in creators, testers and such (that'll be about 3 years, as I've been fiddling with FME for about 3 years now).

Before conditional options existed in creators I used Tcl for that (which often led to immensely complex Tcl lines... lol). I often still do.

 

 

You should see some of my constructions... they give me a headache when I look at them. ;)

 

 

 

Badge +3
btw this one does the same

 

@Evaluate([regsub -all {([a-z])([A-Z])} "@Value(tststr)" {\1 \2}])

 

 

and this one removes the slash

 

@Evaluate([regsub -all {([a-z])/*([A-Z])} "@Value(tststr)" {\1 \2}])

 

 

(as the slash is not captured)

 

 

and maybe to explain it to Thomas:

the substitution {\1 \2} is actually matched subpart 1 + space + matched subpart 2.
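To see what the two captured subparts actually contain, here is a small Python sketch of the slash-removing variant. It is an illustration only; the names `show_groups` and `split_and_drop_slash` are made up for this example.

```python
import re

pattern = re.compile(r"([a-z])/*([A-Z])")

def show_groups(text):
    # List (subpart 1, subpart 2) for every match in the string.
    return [m.groups() for m in pattern.finditer(text)]

def split_and_drop_slash(text):
    # "/*" matches any slashes between the two groups but is not captured,
    # so it does not appear in the replacement "\1 \2".
    return pattern.sub(r"\1 \2", text)

print(show_groups("Hello/DearFmeCommunity"))
# [('o', 'D'), ('r', 'F'), ('e', 'C')]
print(split_and_drop_slash("Hello/DearFmeCommunity"))
# Hello Dear Fme Community
```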
Badge
Many thanks to you guys for all the answers.

 

 

I'm not very experienced with string replacement. Can somebody perhaps explain where I can enter the @Evaluate() functions with the regsub? The Creator doesn't have an input port, so I guess it's one of the string transformers...?

 

 

Kind regards

 

Thomas

 

 

Badge +3
AttributeCreator
Badge
Many thanks. Works like a charm ,-)
Badge +10

How about splitting only on the first word separated by a space between the first two words of a string: "word1 word2 word3"? So new_attribute1 = word1 and new_attribute2 = word2 word3.
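One possible way to express that, sketched in plain Python rather than with transformers (the function name `split_first_word` is hypothetical; the attribute names are taken from the question):

```python
def split_first_word(text):
    # partition() splits at the first occurrence of the separator only,
    # so everything after the first space stays together.
    first, _, rest = text.partition(" ")
    return first, rest

new_attribute1, new_attribute2 = split_first_word("word1 word2 word3")
print(new_attribute1)  # word1
print(new_attribute2)  # word2 word3
```

If the string contains no space at all, the second value comes back empty.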
