I thought the Schema reader would be able to do that, but no (so that could be an idea too).
I'm curious, why do you need to know the encoding?
Would using an AttributeEncoder set to honor the input encoding to convert the strings to e.g. "Unicode (utf-8)" work? Having a known encoding, it should be fairly easy to take it from there.
@david_r: I am reading an attribute with a PythonCaller. In the PythonCaller, I do some manipulations and concatenations and then I write out a new attribute. I would like that output attribute to have the same encoding as the input attribute.
However, in order to preserve the encoding, I should also be able to specify the encoding when calling .setAttribute() on the feature, else it will be lost anyway:
- Using Python 2.*, the result attribute is written as a system encoded string (provided that the input is converted from Unicode to str first using the .encode('utf8') method on the Unicode object - although I would prefer to call .encode(<detected encoding>) instead).
- Using Python 3.*, where .encode() returns a bytes object instead of a Unicode object, the result attribute is always written as a UTF-8 encoded string, even if the input was encoded as something else.
So I guess that even when it's possible to extract the encoding, it will not be possible to write it with a PythonCaller using that same encoding, unless Safe changes the API. However, if I knew the input encoding, I could properly set the encoding after the PythonCaller using an AttributeEncoder like you said (but set to "Use Bytes" for the Python 2 case).
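To illustrate what I mean by preserving the encoding, here is a minimal sketch in plain Python 3, outside FME. It assumes the input encoding is already known (which is exactly the part that is missing), and the function name, the sample value and the "latin-1" encoding are made up for the example. It does not show how the PythonCaller hands the value to .setAttribute(), only the decode / manipulate / re-encode round trip.

```python
def transform_preserving_encoding(raw: bytes, encoding: str, suffix: str) -> bytes:
    """Decode bytes to text, manipulate the text, re-encode with the same encoding."""
    # Work on Unicode text internally, whatever the source encoding is.
    text = raw.decode(encoding)
    # Stand-in for the "manipulations and concatenations" step.
    result = text.strip() + suffix
    # Go back to the original encoding instead of defaulting to UTF-8.
    return result.encode(encoding)


# Hypothetical input: "Zürich" stored as Latin-1 bytes.
raw_in = "Z\u00fcrich".encode("latin-1")
raw_out = transform_preserving_encoding(raw_in, "latin-1", " (CH)")
```

The bytes in raw_out differ from what a UTF-8 encode of the same text would give, which is the difference that gets lost if the attribute is always written as UTF-8.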
Thanks for the explanation, I see your point. Maybe post it as an idea? I'd vote for it.
You may want to consider sending this to Safe support as well.
Done! :)
Split it into 2 ideas actually:
https://knowledge.safe.com/idea/50224/python-api-new-setattributetype-method-for-fmefeat.html
https://knowledge.safe.com/idea/50225/transformer-to-extract-attribute-character-encodin.html