Question

write utf-8 attribute PythonCaller

  • January 8, 2016
  • 4 replies
  • 53 views

jorge_vidinha
Contributor

Hi,

I'm reading a txt file inside a PythonCaller, and the file contains some special characters.

In the close method, when I print each line, I can see in the log file that all characters are fine.

When I call pyoutput(feature), the encoding gets lost. See my code below.

I'm wondering if someone has an elegant solution for outputting attributes through pyoutput in the correct encoding (UTF-8) without using the Python 3 interpreter or importing external libraries.

Thanks all

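import codecs

import fmeobjects


class FeatureCreator(object):
    def __init__(self):
        self.delimiter = '|'

    def input(self, feature):
        self.fileout = 'file.txt'

    def close(self):
        # Open with an explicit encoding so every line is decoded to unicode
        reader = codecs.open(self.fileout, 'r', encoding='utf-8')
        try:
            for line in reader:
                # Print the UTF-8 bytes, as the original byte-string line did
                print(line.encode('utf-8'))
                # One new feature per line; drop the trailing newline
                feature = fmeobjects.FMEFeature()
                feature.setAttribute('output', line.rstrip(u'\r\n'))
                self.pyoutput(feature)
        finally:
            reader.close()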
            

4 replies

david_r
Celebrity
  • January 8, 2016

Hi

Is "file.txt" saved as UTF-8? If so, you could try casting the "line" object to unicode before you set the attribute:

feature.setAttribute('output', unicode(line))

When sending text to the FME log window (using print or the FMELogFile object), just remember that it will be output in cp1252 (or whatever your local code page is) and not UTF-8!

David
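
To illustrate the code page point above, here is a small sketch of what happens when UTF-8 bytes are interpreted as cp1252. The sample word is hypothetical and not taken from the original file; the code only uses standard string methods and runs on both Python 2 and 3.

```python
# Hypothetical sample text ("Descricao" with c-cedilla and a-tilde),
# written with escapes so the source file itself stays pure ASCII.
text = u'Descri\xe7\xe3o'

# The same text needs different bytes in each encoding.
utf8_bytes = text.encode('utf-8')      # two bytes per accented character
cp1252_bytes = text.encode('cp1252')   # one byte per accented character

# Reading UTF-8 bytes as if they were cp1252 does not fail,
# it silently produces two wrong characters per accented character.
misread = utf8_bytes.decode('cp1252')
```

This is why text can look fine in one place and garbled in another: the bytes are unchanged, but each consumer assumes a different encoding.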


jorge_vidinha
Contributor
  • Author
  • January 8, 2016

Thanks for the support, David.

Yes, file.txt is UTF-8 without a BOM.

Using unicode() I get the error below:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
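
A likely explanation for this error: in Python 2, unicode(line) with no encoding argument falls back to the default ASCII codec, and 0xc3 is the first byte of a two-byte UTF-8 sequence. The sketch below reproduces the same failure on a hypothetical sample value (not the actual file content) and shows that naming the encoding explicitly avoids it. It uses bytes.decode so that it behaves the same on Python 2 and 3.

```python
# Hypothetical sample: UTF-8 bytes with the first non-ASCII byte at index 6.
raw = b'Descri\xc3\xa7\xc3\xa3o'

# Decoding as ASCII fails on the first byte above 0x7f,
# which is what unicode(line) does implicitly in Python 2.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the real encoding of the source decodes cleanly.
text = raw.decode('utf-8')
```

In Python 2 the equivalent explicit form of the earlier suggestion would be unicode(line, 'utf-8').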

  • January 11, 2016

Hi,

When you import text in code, you should always decode it using the encoding of the source (when you know it), which here is 'utf8':

line=line.decode('utf8')

And when you export text, you should always encode it in the encoding of the target:

line=line.encode('the encoding of the table')

Hope it helps
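
The decode-on-input, encode-on-output advice above can be sketched as follows, using only the standard library. The helper name read_lines, the temporary file, and the cp1252 target are assumptions made for illustration; in the PythonCaller, the decoded strings are what you would pass to feature.setAttribute.

```python
import io
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    """Return the lines of a text file as decoded (unicode) strings."""
    # io.open decodes while reading, on both Python 2 and Python 3.
    with io.open(path, 'r', encoding=encoding) as handle:
        return [line.rstrip(u'\r\n') for line in handle]


# Create a small UTF-8 file (hypothetical content) to stand in for file.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'wb') as handle:
        handle.write(b'1|Descri\xc3\xa7\xc3\xa3o\n2|Lisboa\n')
    lines = read_lines(path)   # decoded once, at the input boundary
finally:
    os.remove(path)

# Encode only at the output boundary, in whatever the target expects.
# cp1252 is an assumed target here.
encoded = [line.encode('cp1252') for line in lines]
```

Keeping everything as unicode between those two boundaries means the encoding has to be named only twice, once on the way in and once on the way out.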

