Question

write utf-8 attribute PythonCaller

8 years ago
8 January 2016
4 replies
8 views

jorge_vidinha
Contributor
30 replies

Hi,

I'm reading a txt file inside a PythonCaller, and the file contains some special characters.

In the close method, when I print each line, all characters look fine in the log file.

When I call pyoutput(feature), the encoding gets lost. See my code below.

Does anyone have an elegant solution for outputting attributes through pyoutput in the correct encoding (UTF-8), without using the Python 3 interpreter or importing external libraries?

Thanks all

import fmeobjects
import sys, os, csv, codecs


class FeatureCreator(object):
    def __init__(self):
        self.delimiter = '|'
        
    def input(self, feature):
        self.fileout = 'file.txt'
  
    def close(self):
        feature = fmeobjects.FMEFeature()
        reader= codecs.open(self.fileout)
       
        for line in reader:
            print line
            feature.setAttribute('output', line)
            self.pyoutput(feature)

4 replies

Hi

Is "file.txt" saved as UTF-8? If so, you could try to cast the "line" object as unicode before you set the attribute:

feature.setAttribute('output', unicode(line))

When sending text to the FME log window (using print or the FMELogFile object), just remember that it will be output in cp1252 (or whatever your local codepage is) and not UTF-8!

David
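
A minimal sketch of the codepage point above, using plain Python rather than any FME call. The sample string is made up, and cp1252 is only an assumption about the local codepage. It shows that text which is fine as Unicode can still lose characters once it is encoded for a codepage-limited log window:

```python
# Hypothetical sample text: 'ã' exists in cp1252, the arrow (U+2192) does not.
text = u'Guimar\xe3es \u2192 ok'

# Encode for a cp1252 log; characters the codepage cannot represent
# are replaced with '?' instead of raising UnicodeEncodeError.
log_bytes = text.encode('cp1252', 'replace')

# The same text as UTF-8 uses two bytes for 'ã' and three for the arrow.
utf8_bytes = text.encode('utf-8')
```

So a line that looks right (or wrong) in the log does not by itself tell you what ends up in the attribute.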

jorge_vidinha
Author
Contributor
30 replies
8 years ago
8 January 2016

Thanks for the support, David.

Yes, file.txt is UTF-8 without BOM.

Using unicode() I get the error below:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
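
For context, this error is what you get when UTF-8 bytes are decoded with the ASCII codec, which is what Python 2's unicode() does when no encoding is given. A small sketch that reproduces it without FME (the sample bytes are made up and chosen so that 0xc3 sits at position 6):

```python
# UTF-8 bytes for "Guimarães"; 'ã' is encoded as 0xc3 0xa3 starting at index 6.
raw = b'Guimar\xc3\xa3es'

# Decoding as ASCII fails on the first byte above 0x7f.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Naming the real encoding works and yields 9 characters from 10 bytes.
text = raw.decode('utf-8')
```

In Python 2 the equivalent explicit form would be unicode(line, 'utf-8') or line.decode('utf-8').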

Y

Hi,

When you import text into your code, you should always decode it (provided you know the encoding of the source); here it is 'utf8':

line=line.decode('utf8')

and when you export text, you should always encode it in the encoding of the target:

line=line.encode('the encoding of the table')

Hope it helps
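
A possible way to apply the decode step when reading the file, sketched with only the standard library. The helper name read_utf8_lines and the demo file contents are hypothetical, and the sketch assumes the file really is UTF-8. io.open with an explicit encoding decodes each line as it is read, on both Python 2 and Python 3:

```python
import io
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 text file as decoded text, newlines stripped."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return [line.rstrip(u'\r\n') for line in handle]


# Demo with a temporary file standing in for file.txt (made-up contents).
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(demo_path, 'wb') as handle:
    handle.write(b'Guimar\xc3\xa3es|1\nS\xc3\xa3o Jo\xc3\xa3o|2\n')

lines = read_utf8_lines(demo_path)
os.remove(demo_path)
```

Inside close(), each decoded line could then be passed to feature.setAttribute('output', line) as in the original code. Whether further encoding is needed depends on the writer's target encoding.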
