Accents in attributes

11 years ago
April 4, 2013
24 replies
332 views

jryan
15 replies

Is there an easy way to remove accents from characters that are contained in an attribute?

fmelizard
Contributor
3725 replies
11 years ago
April 4, 2013

Hi,

The StringSearcher or the StringPairReplacer are a couple of options to consider.

sigtill
Contributor
956 replies
11 years ago
April 4, 2013

All transformers in the Strings category are worth a look.

The real SigTill

david_r
8313 replies
11 years ago
April 4, 2013

Hi,

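here is a more dynamic solution using a PythonCaller. Modify "attribute_list" (the line marked "Modify as needed") to include the names of the attributes you want checked for accents; attribute names are case sensitive:

import fmeobjects
import unicodedata as ud

def rmdiacritics(char):
    '''
    Return the base character of char, by "removing" any
    diacritics like accents or curls and strokes and the like.
    Characters without a Unicode name (e.g. tabs or newlines) and
    names that cannot be shortened to an existing character are
    returned unchanged.
    '''
    desc = ud.name(unicode(char), None)
    if desc is None:
        return char
    cutoff = desc.find(' WITH ')
    if cutoff != -1:
        desc = desc[:cutoff]
    try:
        return ud.lookup(desc)
    except KeyError:
        return char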
    
def removeAccents(feature):
    attribute_list = ("name", "type", "state") # Modify as needed
    for attrib in feature.getAllAttributeNames():
        if attrib in attribute_list:
            value = feature.getAttribute(attrib)
            if value:
                value = unicode(value)
                new_value = ''.join([rmdiacritics(char) for char in value])
                feature.setAttribute(attrib, new_value)

Example values before the PythonCaller:

Attribute(encoded: utf-8): `name' has value `François'
Attribute(encoded: utf-8): `state' has value `Tørst'
Attribute(encoded: utf-8): `type' has value `Salé'

After the PythonCaller:

Attribute(encoded: utf-16): `name' has value `Francois'
Attribute(encoded: utf-16): `state' has value `Torst'
Attribute(encoded: utf-16): `type' has value `Sale'

jryan
Author
15 replies
11 years ago
April 4, 2013

Thanks David.

This is exactly what I am looking for. However, I am new to Python scripts and still stuck on FME 2011, and it looks like this code is written for 2012. Any suggestions on how to implement it in 2011?

Thanks!

david_r
8313 replies
11 years ago
April 4, 2013

Hi,

seems like a good opportunity to learn some Python, then :-)

Basically, replace

"fmeobjects" with "pyfme"
"getAttribute" with "getUnicodeString"
"setAttribute" with "setUnicodeString"

Untested, but I hope that should be enough to get it running.

Also, take a look at the pyfme API doc in <fmedir>\fmeobjects\python\apidocs\index.html, in particular the methods under FMEFeature.

David

jryan
Author
15 replies
11 years ago
April 4, 2013

Thanks again. Yes, agreed, I need to learn this part of FME.

One last (and probably dumb) question. What symbol is used in the caller?

david_r
8313 replies
11 years ago
April 4, 2013

Hi,

not a dumb question when you're not familiar with the PythonCaller :-)

You should use "removeAccents" for the PythonCaller.

Hint: it's almost always the function ("def ...") that takes a parameter called "feature", which represents each feature object passed into the function. E.g.

def <name of function>(feature):

David
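To see how the PythonCaller drives such a function without opening FME, here is a minimal sketch. `StandInFeature` is a hypothetical stand-in that only mimics the three FMEFeature methods the script uses (`getAllAttributeNames`, `getAttribute`, `setAttribute`); in a real workspace FME supplies the feature objects and calls the named function once per feature. Python 3 is assumed, and for brevity the accent stripping here is a swapped-in technique (Unicode NFD decomposition followed by dropping combining marks), not the name-lookup approach in the script above.

```python
import unicodedata


class StandInFeature:
    """Hypothetical stand-in for fmeobjects.FMEFeature, for testing only."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAllAttributeNames(self):
        return list(self._attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


def removeAccents(feature):
    # The PythonCaller would call this once for every incoming feature.
    attribute_list = ("name", "type", "state")  # Modify as needed
    for attrib in feature.getAllAttributeNames():
        if attrib in attribute_list:
            value = feature.getAttribute(attrib)
            if value:
                # Simplified stripping: decompose, then drop combining marks.
                decomposed = unicodedata.normalize("NFD", str(value))
                feature.setAttribute(attrib, "".join(
                    c for c in decomposed if not unicodedata.combining(c)))


feature = StandInFeature({"name": "François", "type": "Salé", "other": "é"})
removeAccents(feature)
print(feature.getAttribute("name"), feature.getAttribute("type"),
      feature.getAttribute("other"))
```

Attributes not listed in `attribute_list` (here `other`) are left untouched, which is why the name list matters.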

steph
1 reply
11 years ago
April 4, 2013

I use the StringPairReplacer like this:

Source Attribute: Name

Replacement Pairs: É E È E Ë E Ê E Ô O Ç C Â A ï I î I

Result Attribute: Name2
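As a rough illustration of what a pair list like this does (an approximation, not FME's actual implementation), the sketch below parses a space-separated "from to from to ..." string into a lookup table and applies it with Python's `str.translate`. The helper name `replace_pairs` is hypothetical and only handles single-character pairs. Any character missing from the list, such as "Û" or a lowercase "é", passes through unchanged.

```python
def replace_pairs(value, pairs):
    """Apply a space-separated 'from to from to ...' replacement list.

    Hypothetical helper that only approximates the StringPairReplacer
    for single-character pairs.
    """
    tokens = pairs.split()
    if len(tokens) % 2:
        raise ValueError("replacement pairs must come in twos")
    # Map each source character's code point to its replacement string.
    table = {ord(src): dst for src, dst in zip(tokens[0::2], tokens[1::2])}
    return value.translate(table)


STEPH_PAIRS = "É E È E Ë E Ê E Ô O Ç C Â A ï I î I"

print(replace_pairs("ÉCOLE ÇA", STEPH_PAIRS))    # ECOLE CA
print(replace_pairs("FLÛTE café", STEPH_PAIRS))  # FLÛTE café (not in the list)
```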

david_r
8313 replies
11 years ago
April 4, 2013

Hi Steph,

I agree, that is the most "native" FME solution, but it assumes that you're able to populate it with all the possible variants. If something isn't caught (like a sudden "Û" in your example), it will simply pass through and result in an error further down the line, where it might not be obvious what happened.

The beauty of the PythonCaller script is that it is a lot more future-proof, although it adds quite a bit of complexity, I must admit...

David
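One way to soften the "passes through silently" problem, whichever method you use, is to check the result for anything outside 7-bit ASCII and send those features somewhere visible instead of letting them fail further down the line. A minimal Python sketch, assuming plain ASCII is the target; the function name `find_non_ascii` is hypothetical.

```python
import re

# Anything outside the 7-bit ASCII range (code points 0-127).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(value):
    """Return the sorted set of characters in value that are not 7-bit ASCII."""
    return sorted(set(NON_ASCII.findall(value)))


print(find_non_ascii("ECOLE CA"))    # []
print(find_non_ascii("FLÛTE cœur"))  # ['Û', 'œ']
```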

jryan
Author
15 replies
11 years ago
April 5, 2013

Thanks all for the responses. Agreed that the native solution is probably the more proper one; however, my issue is that I don't know all the possible input scenarios I may run into. The Python script worked great. That said, I had never used the StringPairReplacer, so it is good to see how it works as well.

jeff_konnen
9 replies
11 years ago
April 5, 2013

This function could be used in a PythonCaller:

import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', unicode(input_str))
    only_ascii = nkfd_form.encode('ASCII', 'ignore')
    return only_ascii
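The function above is written for Python 2 (`unicode`). A Python 3 sketch of the same NFKD idea might look like this; it decodes the result back to a `str` rather than returning bytes. One difference from the name-based `rmdiacritics` script is worth knowing: characters that have no Unicode decomposition, such as "ø", are dropped rather than mapped, so "Tørst" becomes "Trst" instead of "Torst".

```python
import unicodedata


def remove_accents(input_str):
    # NFKD splits "é" into "e" plus a combining accent, and also applies
    # compatibility mappings (e.g. the "ﬁ" ligature becomes "fi").
    nfkd_form = unicodedata.normalize("NFKD", str(input_str))
    # Encoding to ASCII with 'ignore' drops every non-ASCII code point,
    # including the combining accents; decode back to a Python 3 str.
    return nfkd_form.encode("ascii", "ignore").decode("ascii")


print(remove_accents("François"))  # Francois
print(remove_accents("Tørst"))     # Trst
```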

fme4ever
10 replies
9 years ago
January 27, 2016

david_r wrote:

Hi,

here is a more dynamic solution using a PythonCaller. Modify "attribute_list" (line 16, case sensitive) to include the names of the attributes you want to be checked for accents:

Thanks David, that solution is just awesome! I used it via a PythonCaller and it worked perfectly fine!!

Cheers mate!

jeroenstiers
178 replies
8 years ago
October 5, 2016

I noticed that I regularly kept coming back to this question because of the code provided by @david_r.

I figured there probably are more people using this code so I converted it to a custom transformer:

https://hub.safe.com/transformers/stringcleaner

david_r
8313 replies
8 years ago
October 5, 2016

jeroenstiers wrote:

I noticed that I regularly kept coming back to this question because of the code provided by @david_r.

I figured there probably are more people using this code so I converted it to a custom transformer:

https://hub.safe.com/transformers/stringcleaner

Cool! Thanks for making it available to us all.

blucas
4 replies
6 years ago
October 25, 2018

jeroenstiers wrote:

I noticed that I regularly kept coming back to this question because of the code provided by @david_r.

I figured there probably are more people using this code so I converted it to a custom transformer:

https://hub.safe.com/transformers/stringcleaner

Does jeroenstiers have any plans to upgrade the "stringcleaner" custom transformer to Python 3.4? I'm using "stringcleaner" to clean non-HTML characters (&, <, >, ") from the JSON data I read before writing it out to an AGOL feature layer. It works very well, so thanks for the "stringcleaner" custom transformer; otherwise, my data would fail to write to AGOL.

david_r
8313 replies
6 years ago
October 26, 2018

Here's the same code updated for Python 3.6, @jeroenstiers

import fmeobjects
import unicodedata as ud
 
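def rmdiacritics(char):
    '''
    Return the base character of char, by "removing" any
    diacritics like accents or curls and strokes and the like.
    Unnamed characters (e.g. tabs, newlines) and names that cannot
    be shortened to an existing character are returned unchanged.
    '''
    desc = ud.name(char, None)
    if desc is None:
        return char
    cutoff = desc.find(' WITH ')
    if cutoff != -1:
        desc = desc[:cutoff]
    try:
        return ud.lookup(desc)
    except KeyError:
        return char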
    
def removeAccents(feature):
    attribute_list = ("name", "type", "state") # Modify as needed
    for attrib in feature.getAllAttributeNames():
        if attrib in attribute_list:
            value = feature.getAttribute(attrib)
            if value:
                value = str(value)
                new_value = ''.join([rmdiacritics(char) for char in value])
                feature.setAttribute(attrib, new_value)
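To try the character logic outside FME, the `rmdiacritics` part can be exercised on its own in plain Python 3 using the example values from earlier in the thread. This sketch includes a guard: `unicodedata.name()` raises `ValueError` for characters without a name (tabs, newlines and other control characters) unless a default is given, and `unicodedata.lookup()` raises `KeyError` if the shortened name does not exist, so both cases fall back to returning the character unchanged.

```python
import unicodedata as ud


def rmdiacritics(char):
    """Return the base character of char, e.g. 'é' -> 'e', 'ø' -> 'o'."""
    desc = ud.name(char, None)  # None for unnamed characters such as '\n'
    if desc is None:
        return char
    cutoff = desc.find(" WITH ")
    if cutoff != -1:
        desc = desc[:cutoff]  # 'LATIN SMALL LETTER O WITH STROKE' -> '... O'
    try:
        return ud.lookup(desc)
    except KeyError:
        return char  # no character carries the shortened name


for word in ("François", "Tørst", "Salé", "line1\nline2"):
    print(repr("".join(rmdiacritics(c) for c in word)))
```

Ligatures such as "œ" have no " WITH " in their Unicode name, so this approach leaves them as they are.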

blucas
4 replies
6 years ago
October 26, 2018

david_r wrote:

Here's the same code updated for Python 3.6, @jeroenstiers

Thanks for updating the code to Python 3.6. I'll put it to good use. @blucas

philippeb
Enthusiast
289 replies
6 years ago
October 26, 2018

Use the StringPairReplacer and paste this string below into the Replacement Pairs parameter.

Create a custom transformer and use it easily in any workbench. I've created the AccentRemover just like that :D

It's only good for French, though...

à a À A â a Â A ç c Ç C é e É E è e È E ê e Ê E ë e Ë E î i Î I ï i Ï I ô o Ô O ù u Ù U û u Û U ü u Ü U

FME Lover

david_r
8313 replies
6 years ago
October 26, 2018

philippeb wrote:

Use the StringPairReplacer and paste this string below into the Replacement Pairs parameter.

Create a custom transformer and use it easily in any workbench. I've created the AccentRemover just like that :D

It's only good for French, though...

à a À A â a Â A ç c Ç C é e É E è e È E ê e Ê E ë e Ë E î i Î I ï i Ï I ô o Ô O ù u Ù U û u Û U ü u Ü U

I agree, it's a very nice solution if you know beforehand all the possible accents that you want to get rid of.

david_r
8313 replies
6 years ago
October 26, 2018

philippeb wrote:

Use the StringPairReplacer and paste this string below into the Replacement Pairs parameter.

Create a custom transformer and use it easily in any workbench. I've created the AccentRemover just like that :D

It's only good for French, though...

à a À A â a Â A ç c Ç C é e É E è e È E ê e Ê E ë e Ë E î i Î I ï i Ï I ô o Ô O ù u Ù U û u Û U ü u Ü U

Oh, and by the way, if you wanted to keep it 7-bit safe, then I think you forgot the oe-ligature ;-)

jackyd
Contributor
45 replies
2 years ago
June 28, 2022

david_r wrote:

Hi,

here is a more dynamic solution using a PythonCaller. Modify "attribute_list" (line 16, case sensitive) to include the names of the attributes you want to be checked for accents:

You can also download the code here, in case the forum mangles the indents.

Hi David,

The download file no longer appears to be available.

I tried the code above but I can't get it to work, and my Python isn't strong enough to see where I have gone wrong. It runs but doesn't remove the accents :(

Here is a screenshot of my PythonCaller:

[Screenshot: accent remover PythonCaller]

david_r
8313 replies
2 years ago
June 29, 2022

jackyd wrote:

Hi David,

The download file no longer appears to be available.

I tried the code above but I can't get it to work, and my Python isn't strong enough to see where I have gone wrong. It runs but doesn't remove the accents :(

Try using the StringDiacriticRemover from the FME Hub instead: https://hub.safe.com/publishers/safe-lab/transformers/stringdiacriticremover

jackyd
Contributor
45 replies
2 years ago
June 29, 2022

Thanks, will do :)
