Question

Regex in StringReplacer for special characters

11 years ago
June 4, 2014
11 replies
465 views

thomask
71 replies

Dear Community,

we have a dataset with attribute values that contain German special characters (such as Ä, ä, ö).

I want to replace them with international characters, so I think the StringReplacer is the right transformer for this job. Since I do not want to copy as many transformers as I have special characters, I need to do it with a regular expression.

Can somebody give me a hint as to what this expression has to look like?

We want to replace these 7 characters in a word (Ä/ä, Ö/ö, Ü/ü, ß) with their corresponding values: A/a, O/o, U/u and ss.

Kind regards

Thomas

david_r
8354 replies
11 years ago
June 4, 2014

Hi,

have a look at this previous post (http://fmepedia.safe.com/AnswersQuestionDetail?id=906a0000000ckT7AAI).

David

thomask
Author
71 replies
11 years ago
June 4, 2014

Thank you David. I have not tried the Python code, but the StringPairReplacer works fine. However, I need to create a new transformer for every attribute. Is there a way to do it in just one transformer? Probably the PythonCaller, but my knowledge of that is still very limited, so I would rather avoid using it ,-))

Kind regards

Thomas

david_r
8354 replies
11 years ago
June 4, 2014

Hi,

yes, the PythonCaller code I posted there supports multiple attributes. Just modify the line

attribute_list = ("name", "type", "state")

so that it lists the attributes you want to modify. Note that the attribute names are case sensitive.

David
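As a small illustration of how that attribute_list line behaves, here is a sketch that runs outside FME. The helper name selected is hypothetical and only mimics the "if attrib in attribute_list" check; it is not part of the PythonCaller code. It also shows a Python detail that matters when only one attribute is listed: a trailing comma is needed to get a tuple.

```python
# Hypothetical helper, not part of the PythonCaller code: it only mimics
# the "if attrib in attribute_list" membership check.
def selected(attribute_names, attribute_list):
    """Return the names that would be modified, in their original order."""
    return [name for name in attribute_names if name in attribute_list]

names_on_feature = ["name", "Name", "type", "state", "id"]

# Matching is case sensitive: "Name" is not picked up by "name".
print(selected(names_on_feature, ("name", "type", "state")))

# A single attribute needs a trailing comma to form a tuple.
one_item_tuple = ("name",)
just_a_string = ("name")  # parentheses alone do not make a tuple
print(type(one_item_tuple).__name__, type(just_a_string).__name__)

# With a plain string, "in" becomes a substring test, so "am" matches too.
print("am" in just_a_string, "am" in one_item_tuple)
```

With a one-element list, writing ("name",) keeps the check an exact, case-sensitive name comparison.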


gio
Contributor
2252 replies
11 years ago
June 4, 2014

Hi,

You can use an AttributeCreator and set conditional values.

Make one condition for each character; in your example, 7 conditions.

Use Tcl like the following.

conditions:

@Evaluate([regexp -all {Ä} "@Value(string)"])=1

and its output value:

@Evaluate(@Evaluate([regsub -all {Ä} "@Value(string)" A string])>0?"$string":"@Value(string)")

@Evaluate([regexp -all {ß} "@Value(string)"])=1

and its output value:

@Evaluate(@Evaluate([regsub -all {ß} "@Value(string)" ss string])>0?"$string":"@Value(string)")

etc.etc.

Now you have one AttributeCreator to do the mapping.
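For comparison, where a regular expression with a replacement callback is available, the seven characters can be handled by one pattern in a single pass. The following is only a sketch in plain Python, not tested inside FME; the names REPLACEMENTS, PATTERN and replace_special are made up for illustration.

```python
import re

# Replacement table taken from the question: umlauts map to their base
# letter, and ß becomes ss.
REPLACEMENTS = {
    "Ä": "A", "ä": "a",
    "Ö": "O", "ö": "o",
    "Ü": "U", "ü": "u",
    "ß": "ss",
}

# Character class built from the table keys, i.e. [ÄäÖöÜüß].
PATTERN = re.compile("[" + "".join(REPLACEMENTS) + "]")


def replace_special(text):
    """Replace every listed character in one pass over the string."""
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)


print(replace_special("Böschung, Straße, Übergang"))
```

The pattern only matches the precomposed characters; text stored as a base letter plus a combining diaeresis would need to be normalized first.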

takashi
7706 replies
11 years ago
June 4, 2014

Hi,

Just some additional info. If you are interested in Tcl scripting, I think the "string map" command is worth a try. An AttributeCreator with this value setting works like the StringPairReplacer.

 @Evaluate([string map {Ä A ä a Ö O ö o Ü U ü u ß ss} {@Value(source)}])

Takashi
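For readers more at home in Python, a rough counterpart of the Tcl "string map" call above is str.translate with a table from str.maketrans. This is a sketch under the assumption that all keys are single characters, which holds for the seven pairs here; the names PAIRS, TABLE and map_chars are hypothetical.

```python
# Same pairs as in the Tcl "string map" list.
PAIRS = {"Ä": "A", "ä": "a", "Ö": "O", "ö": "o", "Ü": "U", "ü": "u", "ß": "ss"}

# str.maketrans accepts a dict whose keys are single characters;
# the values may be longer strings such as "ss".
TABLE = str.maketrans(PAIRS)


def map_chars(text):
    """Apply all character replacements in one call."""
    return text.translate(TABLE)


print(map_chars("Größe"))
```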

thomask
Author
71 replies
11 years ago
June 5, 2014

Thank you guys for your help!

The Tcl script works very well. Are there some explanations on safe.com where I can learn more about this "string map" thing, or is this pure Tcl?

If I try it with the Python script, I get the following error:

PythonFactory failed to load python symbol `FeatureProcessor'

Factory proxy not initialized

PythonFactory failed to process feature

A fatal error has occurred. Check the logfile above for details

 import fmeobjects
 import unicodedata as ud

 def rmdiacritics(char):
     '''
     Return the base character of char, by "removing" any
     diacritics like accents or curls and strokes and the like.
     '''
     desc = ud.name(unicode(char))
     cutoff = desc.find(' WITH ')
     if cutoff != -1:
         desc = desc[:cutoff]
     return ud.lookup(desc)

 def removeAccents(feature):
     # The trailing comma makes this a one-element tuple, not a string
     attribute_list = ("Trasse",)
     for attrib in feature.getAllAttributeNames():
         if attrib in attribute_list:
             value = feature.getAttribute(attrib)
             if value:
                 value = unicode(value)
                 new_value = ''.join([rmdiacritics(char) for char in value])
                 feature.setAttribute(attrib, new_value)


gio
Contributor
2252 replies
11 years ago
June 5, 2014

Hi,

Tcl functions etc.:

http://www.tcl.tk/man/tcl8.4/Keywords

http://www.tcl.tk/man

david_r
8354 replies
11 years ago
June 5, 2014

Hi,

for the PythonCaller, you have to specify the name of the function that processes your features. In the code I posted this is "removeAccents", so you need to enter that name in the PythonCaller settings in place of "FeatureProcessor", the symbol your error message says it failed to load.

David

thomask
Author
71 replies
11 years ago
June 5, 2014

With this, the Python function works. But as you wrote, the function is meant for accents, curls and so on. I guess I would have to adapt it to the German characters, which would take more Python knowledge than I have ,-)

Here, it turns Böschung into Boschung, but the right way would be Boeschung (I don't know if the forum displays the special characters).

But there is no need for that, since the solution with the AttributeCreator works very well.

Many thanks again to all.

Kind regards

Thomas
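The Böschung example can be traced in a few lines. The sketch below first prints the Unicode name of "ö", which shows why cutting the name at " WITH " leaves a plain "o", and then applies the German convention of base vowel plus "e" through a translation table. The names GERMAN and transliterate_german are hypothetical, the code is written for Python 3, and it is a simplification (an all-caps word would get "Ae" rather than "AE").

```python
import unicodedata

# German convention: an umlaut becomes the base vowel plus "e".
GERMAN = str.maketrans({
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ß": "ss",
})


def transliterate_german(text):
    """Replace umlauts and ß with their usual ASCII spellings."""
    return text.translate(GERMAN)


# The part of the name before " WITH " is the plain letter "o".
print(unicodedata.name("ö"))

print(transliterate_german("Böschung"))
```

Since the Unicode name of "ß" contains no " WITH " part, a name-based approach leaves that character as it is, whereas the table handles it explicitly.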

david_r
8354 replies
11 years ago
June 5, 2014

Yeah, the approaches are rather different; in the end you pick what works for you :-)

The solution provided by Takashi was pretty neat given your constraints.

David

takashi
7706 replies
11 years ago
June 5, 2014

There are two character types for representing numbers in Japanese. One is Arabic numerals {1, 2, 3, 4, ...}, the same as ASCII; the other is Kanji numerals {一, 二, 三, 四, ...}, the same as Chinese Hanzi.

Sometimes I need to convert between them, and Tcl "string map" is convenient in such a case.

Strictly speaking, there is one more type: 2-byte (full-width) Arabic numerals {１, ２, ３, ４, ...}.

... idle talk.
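The same kind of digit mapping can be sketched in Python with translation tables. This is only an illustration under a simplifying assumption: it converts digit by digit and does not interpret Kanji place-value characters such as 十 or 百. All names in it are hypothetical.

```python
KANJI_DIGITS = "〇一二三四五六七八九"
FULLWIDTH_DIGITS = "０１２３４５６７８９"
ASCII_DIGITS = "0123456789"

# Both Kanji and full-width digits map onto the ten ASCII digits.
TO_ASCII = str.maketrans(KANJI_DIGITS + FULLWIDTH_DIGITS, ASCII_DIGITS * 2)
TO_KANJI = str.maketrans(ASCII_DIGITS, KANJI_DIGITS)


def digits_to_ascii(text):
    """Convert Kanji and full-width digits to ASCII digits."""
    return text.translate(TO_ASCII)


def digits_to_kanji(text):
    """Convert ASCII digits to Kanji digits, one character at a time."""
    return text.translate(TO_KANJI)


print(digits_to_ascii("二〇一四年"))
print(digits_to_kanji("2014"))
```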
