Question

Is it possible to extract the character encoding of an attribute?

  • 10 August 2017
  • 6 replies
  • 35 views

Badge

Text attributes can have a character encoding in FME, as we all know. The encoding that is used is shown in the Data Inspector, for instance:

 

I would like to fetch that "iso-8859-1", "utf-8" or "windows-1252" value. My guess is that the answer is no, but the question is: is it possible to extract the encoding somehow? I know that the FME Objects Python API lets me detect whether an attribute is an encoded string (FMEFeature.getAttributeType() returns FME_ATTR_ENCODED_STRING), but it doesn't tell me what the encoding is. It seems to be stored as a (hidden) attribute property though; otherwise the Data Inspector could not show it.
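
For reference, the detection step mentioned above could look roughly like this. It is only a sketch: the helper name encoded_string_attributes is made up for illustration, and the type constant is passed in as a parameter so the function does not need to import fmeobjects itself. It only reports which attributes are encoded strings; it cannot tell which encoding they use.

```python
def encoded_string_attributes(feature, encoded_type):
    """Return the names of all attributes whose type equals encoded_type.

    feature      -- any object offering getAllAttributeNames() and
                    getAttributeType(name), as FMEFeature does
    encoded_type -- the type constant to look for; inside FME this would
                    be fmeobjects.FME_ATTR_ENCODED_STRING
    """
    return [
        name
        for name in feature.getAllAttributeNames()
        if feature.getAttributeType(name) == encoded_type
    ]


# Inside a PythonCaller this would be called roughly as:
#   import fmeobjects
#   names = encoded_string_attributes(feature, fmeobjects.FME_ATTR_ENCODED_STRING)
```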

Depending on the answer(s) I will get here, I'm thinking of posting an idea for an EncodingExtractor transformer.


6 replies

Userlevel 5
Badge +25

I thought the Schema reader would be able to do that but no (so that could be an idea too)

Userlevel 4

I'm curious, why do you need to know the encoding?

Would using an AttributeEncoder set to honor the input encoding to convert the strings to e.g. "Unicode (utf-8)" work? Having a known encoding, it should be fairly easy to take it from there.

Badge

@david_r: I am reading an attribute with a PythonCaller. In the PythonCaller, I do some manipulations and concatenations and then I write out a new attribute. I would like that output attribute to have the same encoding as the input attribute.

However, to preserve the encoding, I would also need to be able to specify it when calling .setAttribute() on the feature; otherwise it will be lost anyway:

  • Using Python 2.*, the result attribute is written as a system-encoded string (provided that the input is first converted from unicode to str with the .encode('utf8') method, although I would prefer to call .encode(<detected encoding>) instead).
  • Using Python 3.*, where .encode() returns a bytes object rather than a str, the result attribute is always written as a UTF-8 encoded string, even if the input was encoded as something else.

So I guess that even if it were possible to extract the encoding, it would not be possible to write the attribute with that same encoding from a PythonCaller, unless Safe changes the API. However, if I knew the input encoding, I could set the encoding properly after the PythonCaller using an AttributeEncoder, like you said (but set to "Use Bytes" for the Python 2 case).
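
To illustrate why the encoding name matters, here is a minimal sketch in plain Python 3 with no FME API involved. The helper reencode_like_input is hypothetical, and it assumes the source encoding name is already known from somewhere, which is exactly the piece that cannot currently be extracted.

```python
def reencode_like_input(raw_in, suffix, source_encoding):
    """Decode bytes, manipulate them as text, re-encode with the same encoding."""
    text = raw_in.decode(source_encoding)
    return (text + suffix).encode(source_encoding)


if __name__ == "__main__":
    sample = "caf\u00e9"

    # The same text gives different bytes depending on the encoding used,
    # so the bytes alone do not reveal which encoding produced them.
    for name in ("iso-8859-1", "windows-1252", "utf-8"):
        print(name, sample.encode(name))

    # Guessing the wrong encoding either fails or silently garbles the text.
    try:
        sample.encode("iso-8859-1").decode("utf-8")
    except UnicodeDecodeError as exc:
        print("wrong guess:", exc)
```

With the encoding name available, the manipulation inside the PythonCaller could be done on decoded text and the original encoding restored afterwards, for example with an AttributeEncoder.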

 

Userlevel 4
Thanks for the explanation, I see your point. Maybe post it as an idea? I'd vote for it.

You may want to consider sending this to Safe support as well.

Badge
Done! :)

I actually split it into two ideas:

https://knowledge.safe.com/idea/50224/python-api-new-setattributetype-method-for-fmefeat.html

https://knowledge.safe.com/idea/50225/transformer-to-extract-attribute-character-encodin.html

Userlevel 4
Upvoted x2 :-)
