Question

Attribute encoding check?

  • November 12, 2015
  • 15 replies
  • 106 views

Hey,

I need to check that a string is encoded as UTF-8, but I couldn't find a transformer for that.

Do you have a tip for testing the encoding?

Thanks for your help.

Alexy


15 replies

pratap
Contributor
  • November 12, 2015
Hi,

Could you explain in a bit more detail?

Pratap

  • Author
  • November 12, 2015
Of course,

I need to make sure that the attribute type is "string" and check that its encoding is UTF-8.

I cannot find transformers to perform these actions.

Thanks in advance

 


takashi
Celebrity
  • November 12, 2015
Hi,

If you want to convert the encoding of the attribute(s) to UTF-8 unconditionally, the AttributeEncoder might help you.

However, there isn't a transformer to check whether the type is "string". If you have to check that, consider using a Python script (PythonCaller). The "FMEFeature.getAttributeType" method returns an integer identifier indicating the internal attribute type of the specified attribute.

Takashi

david_r
Celebrity
  • November 12, 2015
Hi,

If you need to know whether a particular attribute is unicode or not, you can do the following in a PythonCaller:

import fme
import fmeobjects

def FeatureProcessor(feature):
    # Python 2 syntax: print statement, separate str and unicode types
    s = feature.getAttribute('MyString') # Change attribute name as needed
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

David
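
For readers on Python 3, where `unicode` no longer exists and `print` is a function, the same idea might be sketched as follows. In Python 3 a `str` is always Unicode text and encoded data is `bytes`, so the distinction shifts to those two types. The helper name `describe_value` is made up for illustration and is not part of the FME API.

```python
def describe_value(value):
    """Classify a value using Python 3 types (hypothetical helper)."""
    if isinstance(value, bytes):
        return "byte string"      # encoded data; the encoding itself is unknown
    if isinstance(value, str):
        return "unicode string"   # Python 3 str is always Unicode text
    return "not a string"


print(describe_value("Zürich"))
print(describe_value(b"Z\xfcrich"))
print(describe_value(42))
```

Inside a PythonCaller, the value passed in would be whatever `feature.getAttribute(...)` returns.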

takashi
Celebrity
  • November 12, 2015
@david_r, as an aside, how did you paste the script into the code block? I tried a few times in other Q&As, but the script would not stay inside the code block correctly. See here. I gave up and reported it to Safe...

https://knowledge.safe.com/questions/19701/point-to-line-to-point-retain-original-attributes.html#answer-19789


david_r
Celebrity
  • November 12, 2015

Yeah, that code block thingy isn't very good... Basically, you can't have any blank lines, or it'll mess it up.


david_r
Celebrity
  • November 12, 2015

I found a workaround for the blank lines. You can toggle into HTML tag mode when editing your post and replace the blank lines with a <br> tag. Then you get two blank lines for the price of one ;-)


takashi
Celebrity
  • November 12, 2015

Great, thanks. I'll try it at the next chance. Hope the editor will be fixed as soon as possible. @mitahajirakar, @dewetvannieker


  • Author
  • November 12, 2015
Thank you.

Takashi, do you know whether a table for interpreting the result of "feature.getAttributeType" is available somewhere?

takashi
Celebrity
  • November 12, 2015

You can find the required information in the API reference. Go to:

Knowledge Center home > FME Documentation > Python FME Objects API Reference


david_r
Celebrity
  • November 12, 2015

Link for the lazy: getAttributeType

But as you can tell, it won't help you tell the difference between a unicode string and a regular string. You will have to use the Python code from my other reply for that.


takashi
Celebrity
  • November 13, 2015

Thanks for the link for the lazy :-)

I thought that FME_ATTR_STRING (=11) returned by getAttributeType indicates "string". Was I wrong?


david_r
Celebrity
  • November 13, 2015

@takashi, you're right about FME_ATTR_STRING, of course. My point was that it won't tell you if the attribute is a string encoded to the current locale (e.g. cp1252 here in western Europe) or if it is in unicode (utf-8 or utf-16).
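
A minimal sketch of the type check discussed here, for illustration only. It assumes the value 11 for FME_ATTR_STRING as quoted in this thread; inside FME it would be safer to use the `fmeobjects.FME_ATTR_STRING` constant instead of a hard-coded number. The names `is_string_attribute` and `FakeFeature` are hypothetical; the fake class merely stands in for an FMEFeature so the sketch runs outside FME.

```python
FME_ATTR_STRING = 11  # value as quoted in this thread; an assumption outside FME


def is_string_attribute(feature, name, string_type=FME_ATTR_STRING):
    """Return True if the feature reports `name` as a string attribute."""
    return feature.getAttributeType(name) == string_type


class FakeFeature:
    """Hypothetical stand-in for an FMEFeature, only for running outside FME."""

    def __init__(self, types):
        self._types = types

    def getAttributeType(self, name):
        # 0 is used here purely as a placeholder for "unknown"
        return self._types.get(name, 0)


feature = FakeFeature({"MyString": 11, "MyNumber": 6})
print(is_string_attribute(feature, "MyString"))
print(is_string_attribute(feature, "MyNumber"))
```

As noted above, a positive result only says the attribute is a string; it says nothing about which encoding the string uses.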


takashi
Celebrity
  • November 14, 2015

@david_r, thanks for the clarification. Yes, of course it's necessary to add some code to check the encoding.
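
One possible form of that extra code, as a sketch in plain Python 3: test whether raw bytes are valid UTF-8 by attempting a strict decode. The function name `is_valid_utf8` is invented for illustration, and the sketch assumes you have access to the raw bytes, which may not hold for every attribute inside FME.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # error handling is strict by default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("é".encode("utf-8")))
print(is_valid_utf8("é".encode("cp1252")))
```

Keep in mind that pure ASCII bytes pass this test whatever their nominal encoding, because ASCII is a subset of UTF-8.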


takashi
Celebrity
  • November 14, 2015

@mitahajirakar, thanks for your efforts. Regards.