Question

Is it possible to extract the character encoding of an attribute?

6 years ago
10 August 2017
6 replies
35 views

geosander
327 replies

Text attributes can have a character encoding in FME, as we all know. The encoding that is used is shown in the Data Inspector, for instance:

I would like to fetch that "iso-8859-1", "utf-8" or "windows-1252" value. My guess is that the answer is no, but the question is: is it possible to extract the encoding somehow? I know that the FME Objects Python API allows me to detect whether the attribute is an encoded string (FMEFeature.getAttributeType() ==> FME_ATTR_ENCODED_STRING), but it doesn't tell me what the encoding is. It seems to be stored as a (hidden) attribute property though, otherwise the Data Inspector could not show it.

Depending on the answer(s) I will get here, I'm thinking of posting an idea for an EncodingExtractor transformer.

redgeographics
Influencer
3339 replies
6 years ago
10 August 2017

I thought the Schema reader would be able to do that, but no (so that could be an idea too).

david_r

I'm curious, why do you need to know the encoding?

Would using an AttributeEncoder set to honor the input encoding to convert the strings to e.g. "Unicode (utf-8)" work? Having a known encoding, it should be fairly easy to take it from there.

geosander

@david_r: I am reading an attribute with a PythonCaller. In the PythonCaller, I do some manipulations and concatenations and then I write out a new attribute. I would like that output attribute to have the same encoding as the input attribute.

However, in order to preserve the encoding, I should also be able to specify the encoding when calling .setAttribute() on the feature, else it will be lost anyway:

Using Python 2.*, the result attribute is written as a system-encoded string (provided that the input is first converted from Unicode to str using the .encode('utf8') method on the Unicode object, although I would prefer to call .encode(<detected encoding>) instead).
Using Python 3.*, where .encode() returns a bytes object rather than a str, the result attribute is always written as a UTF-8 encoded string, even if the input was encoded as something else.

So I guess that even when it's possible to extract the encoding, it will not be possible to write it with a PythonCaller using that same encoding, unless Safe changes the API. However, if I knew the input encoding, I could properly set the encoding after the PythonCaller using an AttributeEncoder like you said (but set to "Use Bytes" for the Python 2 case).
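To make the round trip above concrete, here is a minimal sketch in plain Python 3 of "decode, manipulate, re-encode with the same encoding". It assumes the input encoding is already known, which is exactly the piece FME does not expose here. It uses no fmeobjects calls at all, and `reencode_like_input` is a hypothetical helper name chosen for illustration.

```python
def reencode_like_input(raw: bytes, encoding: str, suffix: str) -> bytes:
    """Append `suffix` to an encoded value and return it in the same encoding.

    `encoding` must be supplied by the caller: a bytes object carries no
    label saying which encoding produced it.
    """
    text = raw.decode(encoding)      # bytes -> str (Unicode)
    result = text + suffix           # stand-in for "manipulations and concatenations"
    return result.encode(encoding)   # str -> bytes, same encoding as the input


# Example input: a value that was stored as windows-1252 bytes.
original = "café".encode("windows-1252")
out = reencode_like_input(original, "windows-1252", " münster")

# The same text encoded as UTF-8 yields different bytes.
as_utf8 = "café münster".encode("utf-8")
```

Here `out` and `as_utf8` hold the same text but differ byte for byte, so the encoding has to travel alongside the value if it is to be preserved.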

david_r

Thanks for the explanation, I see your point. Maybe post it as an idea? I'd vote for it.

You may want to consider sending this to Safe support as well.

geosander

Done! :)

Split it into 2 ideas actually:

https://knowledge.safe.com/idea/50224/python-api-new-setattributetype-method-for-fmefeat.html

https://knowledge.safe.com/idea/50225/transformer-to-extract-attribute-character-encodin.html

david_r

Upvoted x2 :-)
