Dear Community,

 

 

We have a dataset with attribute values that contain German special characters (such as Ä, ä, ö).

 

 

I want to replace them with international characters, so I think the StringReplacer is the right transformer for this job. Since I do not want one copy of the transformer per special character, I need to do it with a regular expression.

 

 

Can somebody give me a hint as to what this expression should look like?

 

 

We want to replace these 7 characters in a word (Ä/ä, Ö/ö, Ü/ü, ß) with their corresponding values: A/a, O/o, U/u and ss.

 

 

Kind regards

 

Thomas
Hi,

 

 

Have a look at this previous post (http://fmepedia.safe.com/AnswersQuestionDetail?id=906a0000000ckT7AAI).

 

 

David
Thank you David. I have not tried the Python code, but the StringPairReplacer works fine. However, I need to create a new transformer for every attribute. Is there a way to do it in just one transformer? Probably the PythonCaller, but my knowledge of that is still very limited, so I would rather avoid using it ,-))

 

 

Kind regards

 

Thomas
Hi,

 

 

Yes, the PythonCaller code I posted there supports multiple attributes. Just modify the line

 

 

attribute_list = ("name", "type", "state")

 

 

so that it lists the attributes you want to modify. Note that the attribute names are case sensitive.

 

 

David
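As a rough illustration of how the attribute list drives such a function, here is a minimal sketch in Python 3. StubFeature is a hypothetical stand-in for an FME feature so that the snippet runs outside FME; it only mimics the getAllAttributeNames, getAttribute and setAttribute methods used in the posted code. The function name replace_umlauts and the replacement table are assumptions made for illustration, not the code from the linked post.

```python
# Hypothetical stand-in for an FME feature, only so the sketch runs outside FME.
class StubFeature:
    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAllAttributeNames(self):
        return list(self._attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


# Attribute names are case sensitive.
# Note: a one-element tuple needs a trailing comma, e.g. ("name",).
attribute_list = ("name", "type", "state")

# Illustrative replacement table (assumption: plain base letters are wanted).
REPLACEMENTS = str.maketrans({
    "Ä": "A", "ä": "a",
    "Ö": "O", "ö": "o",
    "Ü": "U", "ü": "u",
    "ß": "ss",
})


def replace_umlauts(feature):
    """Rewrite only the attributes named in attribute_list."""
    for attrib in feature.getAllAttributeNames():
        if attrib in attribute_list:
            value = feature.getAttribute(attrib)
            if value:
                feature.setAttribute(attrib, str(value).translate(REPLACEMENTS))


feature = StubFeature({"name": "Müller", "type": "Straße", "Name": "Müller"})
replace_umlauts(feature)
```

In this sketch the attribute "Name" is left untouched because only the lower-case "name" is listed.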
Hi,

 

 

You can use an AttributeCreator with conditional values.

 

Create one condition for each character, so 7 conditions in your example.

 

Use Tcl as follows.

 

 

conditions:

 

 

@Evaluate([regexp -all {Ä} "@Value(string)"])=1

 

and its output value:

 

@Evaluate(@Evaluate([regsub -all {Ä} "@Value(string)" A string])>0?"$string":"@Value(string)")

 

 

@Evaluate( 

and its output value:

 

@Evaluate(@Evaluate([regsub -all {ß} "@Value(string)" ss string])>0?"$string":"@Value(string)")

 

 

And so on for the remaining characters.

 

 

Now you have a single AttributeCreator that does the mapping.

 

 

 


Hi,

 

 

Just some additional info. If you are interested in Tcl scripting, I think the "string map" command is worth a try. An AttributeCreator with this value setting works like the StringPairReplacer.

 

 @Evaluate([string map {Ä A ä a Ö O ö o Ü U ü u ß ss} {@Value(source)}])
 Takashi
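For readers more at home in Python, the idea behind "string map" can be sketched roughly as follows. This is a Python 3 approximation, not an FME or Tcl API: the helper name string_map is made up here, and it takes a flat list of key/value pairs like the Tcl mapping. It replaces all keys in one left-to-right pass, so already replaced text is not scanned again, and keys listed earlier take priority.

```python
import re


def string_map(pairs, text):
    """Replace each key in `pairs` with its value in a single pass.

    `pairs` is a flat list: [key1, value1, key2, value2, ...].
    """
    mapping = {}
    for key, value in zip(pairs[0::2], pairs[1::2]):
        if key:  # skip empty keys, they cannot be matched meaningfully
            mapping.setdefault(key, value)  # first occurrence of a key wins
    if not mapping:
        return text
    # Alternation is tried in listed order at each position.
    pattern = re.compile("|".join(re.escape(key) for key in mapping))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


GERMAN = ["Ä", "A", "ä", "a", "Ö", "O", "ö", "o", "Ü", "U", "ü", "u", "ß", "ss"]
result = string_map(GERMAN, "Größe und Übermaß")
```

Because everything happens in one pass, a single expression covers all seven characters, which is what makes this approach compact compared with one condition per character.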
Thank you guys for your help!

 

 

The Tcl script works very well. Is there an explanation on safe.com where I can learn more about this "string map" thing, or is this pure Tcl?

 

 

If I try it with the Python script, I get the following error:

 

 

PythonFactory failed to load python symbol `FeatureProcessor'

 

Factory proxy not initialized

 

PythonFactory failed to process feature

 

PythonFactory failed to process feature

 

A fatal error has occurred. Check the logfile above for details

 

 

import fmeobjects
import unicodedata as ud

# Python 2 code: relies on the built-in unicode() function.

def rmdiacritics(char):
    '''
    Return the base character of char, by "removing" any
    diacritics like accents or curls and strokes and the like.
    '''
    desc = ud.name(unicode(char))
    cutoff = desc.find(' WITH ')
    if cutoff != -1:
        desc = desc[:cutoff]
    return ud.lookup(desc)

def removeAccents(feature):
    # The trailing comma makes this a one-element tuple rather than a plain string.
    attribute_list = ("Trasse",)
    for attrib in feature.getAllAttributeNames():
        if attrib in attribute_list:
            value = feature.getAttribute(attrib)
            if value:
                value = unicode(value)
                new_value = ''.join([rmdiacritics(char) for char in value])
                feature.setAttribute(attrib, new_value)
 

 


Hi,

 

 

Tcl commands, keywords etc.:

 

 

http://www.tcl.tk/man/tcl8.4/Keywords

 

 

http://www.tcl.tk/man
Hi,

 

 

For the PythonCaller, you have to specify the name of the function that processes your features. In the code I posted this is "removeAccents", so you need to set up the PythonCaller to call "removeAccents" rather than the default "FeatureProcessor" named in your error message.

 

 

 

 

David
With this, the Python function works. But as you wrote, the function is for accents, curls and so on. I guess I would have to adapt it to the German characters, if I had more Python knowledge ,-)

 

 

Here, it turns Böschung into Boschung, but the correct result would be Boeschung. (I don't know if the forum displays the special characters.)

 

 

But there is no need for that, since the solution with the AttributeCreator works very well.

 

 

Many thanks again to all.

 

 

Kind regards

 

Thomas
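To make the difference between the two results concrete, here is a small Python 3 sketch. It is not the code from the linked post: transliterate_german uses an explicit table (an assumption about the desired ae/oe/ue spelling), while strip_diacritics is a generic helper based on Unicode normalization, swapped in here for the character-name lookup used in the posted function.

```python
import unicodedata

# Explicit German transliteration table (illustrative assumption).
GERMAN_TRANSLIT = str.maketrans({
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ß": "ss",
})


def transliterate_german(text):
    """Spell out umlauts as base letter plus 'e', and ß as 'ss'."""
    return text.translate(GERMAN_TRANSLIT)


def strip_diacritics(text):
    """Generic approach: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


german = transliterate_german("Böschung")
generic = strip_diacritics("Böschung")
```

The table-based version yields "Boeschung", the generic one "Boschung". One known limitation of the simple table: a word written entirely in capitals would get mixed case, for example "Ae" followed by capital letters.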

 

 


Yeah, the approaches are rather different. In the end you pick what works for you :-)

 

 

The solution provided by Takashi was pretty neat given your constraints.

 

 

David
There are two character types for representing numbers in Japanese. One is Arabic numerals {1, 2, 3, 4, ...}, the same as ASCII; the other is Kanji numerals {一, 二, 三, 四, ...}, the same as Chinese Hanzi.

 

Sometimes I need to convert between them, and Tcl "string map" is convenient in such cases.

 

 

Strictly speaking, there is one more type: 2-byte (full-width) Arabic numerals {１, ２, ３, ４, ...}.

 

... idle talk.
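The same kind of character mapping can be sketched in Python 3 with translation tables. This is only an illustration under a simplifying assumption: digits are mapped one by one, so positional Kanji characters such as 十 or 百 are not handled. The table names are made up for this example.

```python
ASCII_DIGITS = "0123456789"
KANJI_DIGITS = "〇一二三四五六七八九"
FULLWIDTH_DIGITS = "０１２３４５６７８９"

# One translation table per direction; each maps digit to digit.
KANJI_TO_ASCII = str.maketrans(KANJI_DIGITS, ASCII_DIGITS)
ASCII_TO_KANJI = str.maketrans(ASCII_DIGITS, KANJI_DIGITS)
FULLWIDTH_TO_ASCII = str.maketrans(FULLWIDTH_DIGITS, ASCII_DIGITS)

year_ascii = "二〇一四".translate(KANJI_TO_ASCII)
year_kanji = "2014".translate(ASCII_TO_KANJI)
year_from_fullwidth = "２０１４".translate(FULLWIDTH_TO_ASCII)
```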
