Question

Splitting Strings

July 22, 2014
11 replies
1130 views

thomask
71 replies

Hi there,

Is there a simple way (like a regular expression) to remove a substring and split the string at every capital letter by inserting a space?

Example: an attribute value of "Hello/DearFmeCommunity" should become "Dear Fme Community".

--> "Hello/" should be removed from the string, and the rest should be split at every upper-case letter.

Many thanks.

Kind regards

Thomas

mark
43 replies
July 22, 2014

I'm not sure about eliminating part of the string (I'm just not sure what your exact conditions are: is it everything up to a / character?), but the other part is relatively simple.

Use a StringReplacer. The parameters are:

Text to Match: [A-Z]

Replacement Text: " &" (without the quotes: that's a space followed by an ampersand character)

Use Regular Expressions: Yes

Case Sensitive: Yes

That will add a space before each upper-case letter. If you really want to split the data, follow up with an AttributeSplitter.
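For readers who want to check what these settings do, here is a minimal Python sketch of the same substitution, assuming Python's `re` treats the simple pattern `[A-Z]` the way the StringReplacer does. `\g<0>` stands for the whole match, like the `&` in the replacement text; the function name `space_before_caps` is made up for illustration.

```python
import re

def space_before_caps(text):
    # Same idea as Text to Match "[A-Z]" with Replacement Text " &":
    # each upper-case letter is replaced by a space followed by itself
    return re.sub(r"[A-Z]", r" \g<0>", text)

print(repr(space_before_caps("DearFmeCommunity")))
```

The result keeps a leading space (" Dear Fme Community"), which is why a trim step is useful afterwards.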

Hope this helps

Mark

Mark Ireland

Product Evangelist

Safe Software Inc


gio
Contributor
2252 replies
July 22, 2014

I don't think it can be done with one regexp, certainly not with the transformers.

Some kind of iteration is needed for this, whether you do it in Tcl or Python.

Using transformers would involve Tcl, as you can use Tcl functions in creators (single-line and nested Tcl functions) and a custom transformer to create the required loop.

Here is a simple Python script.

You can find scripts like this all over the Python forums.

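import re

def split_on_caps(s):
    # Each match starts with a capital and runs up to (not including) the next capital
    words = re.findall('[A-Z][^A-Z]*', s)
    # Join the words with single spaces; unlike building " " + word in a loop,
    # this leaves no leading space
    return " ".join(words)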

Use it in a PythonCaller.

You can add the "/" to the pattern if you need it, or search for it and remove it if you don't (don't forget to escape it).
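A sketch of the variant gio describes, where the prefix up to the slash is searched for and removed before splitting. It assumes the part to drop is everything up to and including the last `/`; `split_after_slash` is a hypothetical helper name. In Python's `re` syntax `/` is not a special character, so it does not strictly need escaping there.

```python
import re

def split_on_caps(s):
    # Each word starts with a capital and runs up to the next capital
    return " ".join(re.findall(r"[A-Z][^A-Z]*", s))

def split_after_slash(s):
    # Drop everything up to and including the last "/" (greedy .*),
    # then insert spaces between the capitalised words
    return split_on_caps(re.sub(r"^.*/", "", s))

print(split_after_slash("Hello/DearFmeCommunity"))
```

Because the pattern only collects runs that start with a capital, any lower-case text before the first capital (e.g. "helloWorld") is silently dropped.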

takashi
7714 replies
July 23, 2014

Hi,

You can use the AttributeSplitter (Delimiter: /) and the ListIndexer to extract the substring that appears after the last slash in the string. The ListIndexer (List Index: -1) extracts the last element (i.e. the substring after the last slash) from the list.

You can then apply the StringReplacer with Mark's suggestion.

Finally, the AttributeTrimmer can be used to remove the extra white space added at the start of the string.
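Takashi's chain of transformers can be approximated in plain Python to see the intermediate steps. This is a sketch, not FME code: `str.split` plus index `-1` stands in for the AttributeSplitter and ListIndexer, `re.sub` for the StringReplacer, and `str.strip` for the AttributeTrimmer; the function name `takashi_pipeline` is hypothetical.

```python
import re

def takashi_pipeline(text):
    # AttributeSplitter (Delimiter: /) + ListIndexer (List Index: -1):
    # keep only the part after the last slash
    last_part = text.split("/")[-1]
    # StringReplacer with Mark's settings: a space before every capital
    spaced = re.sub(r"[A-Z]", r" \g<0>", last_part)
    # AttributeTrimmer: trim surrounding white space (here the leading space)
    return spaced.strip()

print(takashi_pipeline("Hello/DearFmeCommunity"))
```

If the value contains no slash, `split("/")` returns the whole string as a single element, so the last element is simply the original value.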

Takashi


gio
Contributor
2252 replies
July 23, 2014

Sure, you can do those things.

But that's not the solution to the posed problem.

Mark's suggestion just replaces every capital with space+capital.

It does not target [a-z][A-Z] to turn it into [a-z]\s[A-Z].

So any word written entirely in capitals would get messed up.
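To illustrate gio's objection, a small Python check using a made-up value with the acronym written in capitals ("DearFMECommunity"), assuming the same `[A-Z]` rule:

```python
import re

# Mark's rule applied to a value that contains an all-capitals word
result = re.sub(r"[A-Z]", r" \g<0>", "DearFMECommunity")
print(repr(result))
```

Each letter of "FME" ends up separated, giving " Dear F M E Community".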

If you use the StringReplacer with

Text to Match: [a-z][A-Z] or even [a-z]([A-Z])

Replacement Text: " &" (a space followed by an ampersand)

It doesn't yield the answer either. Try it out: it doesn't raise an error, but the result isn't what you want.
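The reason this attempt fails: `&` stands for the whole match, so the space lands in front of the lower-case letter instead of between the two letters. A Python sketch of the same substitution, again using `\g<0>` for `&` and assuming the replacement is a space plus `&`:

```python
import re

# Text to Match "[a-z][A-Z]", Replacement Text " &": the whole
# two-letter match is written back with a space in front of it
result = re.sub(r"[a-z][A-Z]", r" \g<0>", "DearFmeCommunity")
print(result)
```

This prints "Dea rFm eCommunity".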

You can use a creator with a Tcl regexp to get the capitals, @Evaluate([regexp -all -inline {[A-Z\/]} "@Value(tststr)"]), and the Tcl split function to get the "rest", @Evaluate([split @Value(tststr) "@Value(tststr2)"]), where tststr2 holds the capitals found by the regexp.

Now you have to reassemble the words from these two lists:

{} ello {} ear me ommunity and

H / D F C

...I'd rather go for a simple Python (or Tcl) script using a caller.

The slash is no problem in this matter.
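The two lists gio mentions can be reproduced in Python to show the reassembly step. This is a sketch under the assumption that the slash is treated like a capital, as in the Tcl pattern `[A-Z/]`; `re.findall` plays the role of `regexp -all -inline` and `re.split` the role of Tcl `split` on those characters.

```python
import re

text = "Hello/DearFmeCommunity"

# Like regexp -all -inline {[A-Z/]}: the capitals (and the slash)
capitals = re.findall(r"[A-Z/]", text)
# Like Tcl split on those characters: the pieces between them
rest = re.split(r"[A-Z/]", text)

# rest has one more element than capitals: rest[0] is whatever precedes
# the first capital, and each later piece belongs to the capital before it
words = [rest[0]] + [c + r for c, r in zip(capitals, rest[1:])]
joined = " ".join(w for w in words if w)

print(capitals)
print(rest)
print(joined)
```

The reassembly works because splitting on N separator characters yields N + 1 pieces. Keeping or dropping "Hello" and the slash is then a separate decision.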

takashi
7714 replies
July 23, 2014

Gio, I see your point.

I think the Tcl regsub command is suitable for resolving the issue.

@Evaluate([regsub -all {([^A-Z\s/])([A-Z])} "@Value(src)" {\1 \2}])
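To try the expression outside FME, here is the same substitution in Python. This is a sketch: Python's `re` is a different engine from Tcl's, though for this pattern the two should agree. `\1` and `\2` refer to the two captured characters; `takashi_regsub` is a hypothetical name.

```python
import re

def takashi_regsub(text):
    # A character that is not a capital, white space or slash, followed by
    # a capital: keep both captured characters with a space between them
    return re.sub(r"([^A-Z\s/])([A-Z])", r"\1 \2", text)

print(takashi_regsub("Hello/DearFmeCommunity"))
print(takashi_regsub("DearFMECommunity"))
```

This substitution alone keeps the "Hello/" prefix ("Hello/Dear Fme Community"), and a run of capitals such as "FME" is not split internally ("Dear FMECommunity").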


gio
Contributor
2252 replies
July 23, 2014

@Takashi,

Yes,

a regsub like that will answer ThomasK's question.

FME string transformers using regular expressions are too limited; that's why I use single-line (nested) Tcl functions in, for instance, creators. I've been doing that since I found out I could go quite far using Tcl functions in creators, testers and such (that'll be about 3 years; I've been fiddling with FME for about 3 years now).

Before creators had conditional options, I used Tcl to do that (which often led to immensely complex Tcl lines, lol). I often still do that...

You should see some of my constructions... they give me a headache when I look at them ;)


gio
Contributor
2252 replies
July 23, 2014

BTW, this one does the same:

@Evaluate([regsub -all {([a-z])([A-Z])} "@Value(tststr)" {\1 \2}])

and this one removes the slash:

@Evaluate([regsub -all {([a-z])/*([A-Z])} "@Value(tststr)" {\1 \2}])

(as the slash is not captured).

And maybe, to explain it to Thomas: the substitution {\1 \2} is actually matched subpart 1 + space + matched subpart 2.
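gio's two variants, written as Python `re.sub` calls for comparison (a sketch; the sample value is the one from the question):

```python
import re

text = "Hello/DearFmeCommunity"

# Lower-case letter followed by a capital: put a space between them
one = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)

# Same, but any slashes between the two letters are matched and, since
# they are not inside a group, left out of the replacement
two = re.sub(r"([a-z])/*([A-Z])", r"\1 \2", text)

print(one)
print(two)
```

The second form drops the slash ("Hello Dear Fme Community"), but "Hello" itself is still there; removing it needs a separate step such as the one Takashi described.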

thomask
Author
71 replies
July 23, 2014

Many thanks to you guys for all the answers.

I'm not very familiar with string replacement. Can somebody explain where I can enter the @Evaluate() functions with the regsub? The creator doesn't have an input port, so I guess it's one of the string transformers...?

Kind regards

Thomas


gio
Contributor
2252 replies
July 23, 2014

AttributeCreator

thomask
Author
71 replies
July 24, 2014

Many thanks. Works like a charm ;-)


salvaleonrp
Enthusiast
126 replies
June 30, 2017

How about splitting only at the first space, i.e. between the first two words of a string such as "word1 word2 word3"? The result should be new_attribute1 = word1 and new_attribute2 = word2 word3.
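A minimal Python sketch of the requested split, assuming the separator is a single space. `str.partition` splits at the first occurrence only, so everything after the first space stays together; `split_first_word` is a hypothetical helper, and the attribute names are taken from the question.

```python
def split_first_word(text):
    # partition() splits at the first space only and returns
    # (before, separator, after); the separator is discarded here
    first, _, rest = text.partition(" ")
    return first, rest

new_attribute1, new_attribute2 = split_first_word("word1 word2 word3")
print(new_attribute1)
print(new_attribute2)
```

If the value has no space, the second part comes back as an empty string.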
