Skip to main content

HI All,

First of all a Happy and productive new year to all!

I initially used SQL to filter and map these characters.

Using this set:

regexp_instr (np.GESLACHTS_NAAM,'[????Š???Ž?????š???ž??ÁA?Ä??Ç?É?Ë?ÍÎ????ÓÔ?Ö×??Ú?ÜÝ?ß?áâ?ä??ç?é?ë?íî????óô?ö??ú?üý?]')

But FME, wich is supposed to send the SQL as is to the SQL-engine makes it into this: (i have to post pic, because upon insertion of text in this box the arrows disappears. Tabs I presume)

So I went tcl and made a stringmapper in attributecreator:

astring map {

 

? A ? L ? L ? S Š S ? S ? T ? Z Ž Z ? Z ? a ? l ? l ? s š s ? s ? t ? z ž z ? z ? R Á A A A ? A Ä A ? L ? C Ç C ? C É E ? E Ë E ? E Í I Î I ? D ? D ? N ? N Ó O Ô O ? O Ö O × x ? R ? U Ú U ? U Ü U Ý Y ? T ß s ? r á a â a ? a ä a ? l ? c ç c ? c é e ? e ë e ? e í i î i ? d ? d ? n ? n ó o ô o ? o ö o ? r ? u ú u ? u ü u ý y ? t

 

} ) }]

But maps to :arrows like in pic.

Whereas a TCL creator using:

proc GSN {} {

 

set MapString {? A ? L ? L ? S Š S ? S ? T ? Z Ž Z ? Z ? a ? l ? l ? s š s ? s ? t ? z ž z ? z ? R Á A A A ? A Ä A ? L ? C Ç C ? C É E ? E Ë E ? E M' M Í I Î I ? D ? D ? N ? N Ó O Ô O ? O Ö O × x ? R ? U Ú U ? U Ü U Ý Y ? T ß s ? r á a â a ? a ä a ? l ? c ç c ? c ? g é e è e ? e ë e ? e í i î i ï i ? d ? d ? n ? n ó o ô o ? o ö o ? r ? u ú u ? u ü u ý y ? t}

 

FME_SetAttribute GESNAM string map $MapString FME_GetAttribute GESNAM]]

 

}

works.

I somehow think it should work in SQL, QAttributeCreator as it does in the TCL.

Anyone tips on this?

ps.

Normalisation (::unicode::normalizeS)works trough a TCL caller but not trough a AttributeCreator (due to Unicode require is assume). Tough i know not (yet) how to extract the base characters.

Based on your findings, I suspect that the SQLCaller (or maybe even your database client drivers) transmits the SQL queries in the local codepage and not using Unicode. This means that if you enter characters in the SQLCaller that does not exist in your local codepage, they will be "truncated" when transmitted to the database.

Since it looks like you're using Oracle, you can work around this issue using the UNISTR()-function, e.g.

select unistr('n\0153uds') from dual;

will return 

nœuds

You can use the overview at unicode-table.com to find the necessary values. This technique should be safe for all client character sets, as the Unicode value isn't interpreted before it has arrived at the server.

David


Based on your findings, I suspect that the SQLCaller (or maybe even your database client drivers) transmits the SQL queries in the local codepage and not using Unicode. This means that if you enter characters in the SQLCaller that does not exist in your local codepage, they will be "truncated" when transmitted to the database.

Since it looks like you're using Oracle, you can work around this issue using the UNISTR()-function, e.g.

select unistr('n\0153uds') from dual;

will return 

nœuds

You can use the overview at unicode-table.com to find the necessary values. This technique should be safe for all client character sets, as the Unicode value isn't interpreted before it has arrived at the server.

David

If it fits your scenario, there is also a pretty neat way of removing all accents using Python, see here for some sample code to be used in a PythonCaller.


Hi David,

When i execute it using SQL DEveloper there is no problem.

Only when i se the SQL CAller or SQL Executor.

Here is a pic with the above line uncommented:

I will try the python diacritic remover (who needs a critic anyway?..)

And what is your view on the TCL function?

WHy does it fail in the AttributeCreator but works in a tcl caller? Is it not a fme internal translation?

tx.


Hi David,

When i execute it using SQL DEveloper there is no problem.

Only when i se the SQL CAller or SQL Executor.

Here is a pic with the above line uncommented:

I will try the python diacritic remover (who needs a critic anyway?..)

And what is your view on the TCL function?

WHy does it fail in the AttributeCreator but works in a tcl caller? Is it not a fme internal translation?

tx.

I couldn't tell for sure without having access to FME source code, but I suspect it is a mix of unicode and local codepage strings in the code that is messing up your special characters before they get passed on to Oracle. This is, unfortunately, not uncommon in legacy systems where a lot of the code was written before unicode was as wide spread as it is today.

Since sqldeveloper is written in Java, I would expect this to not be a problem there (all strings are unicode by default in Java). If I'm not mistaken that is also the case in TCL, which might explain why the TCLCaller works and the AttributeCreator doesn't.

Regarding the TCL function you'll have to get somebody elses (@takashi?) opinion , I never bothered to learn it myself ;-)


Hi David,

When i execute it using SQL DEveloper there is no problem.

Only when i se the SQL CAller or SQL Executor.

Here is a pic with the above line uncommented:

I will try the python diacritic remover (who needs a critic anyway?..)

And what is your view on the TCL function?

WHy does it fail in the AttributeCreator but works in a tcl caller? Is it not a fme internal translation?

tx.

Hi @gio, Happy New Year!

The @Evaluate function has the ability to perform a Tcl expression that is written as a parameter for the AttributeCreator and some transformers. Although this fact is not documented anywhere, the ability can be used conveniently in some cases. However, I think that the @Evaluate cannot treat encoding of strings correctly.That is, Tcl uses UTF-8 but the @Evaluate seems to assume the resulting value as an ASCII string. I guess it's the reason why the AttributeCreator doesn't work in this case.

Since the @Evaluate has been designed in order to perform math operations, I think the ability to perform a Tcl string expression should be considered as just a "fluke".


Hi @gio

you could read the values as they are and then use StringPairReplacer transformer to replace the diacritics. The Replacement Pairs in StringPairReplacer can be turned into a user parameter and then used again and again if you have multiple attributes to process.


I couldn't tell for sure without having access to FME source code, but I suspect it is a mix of unicode and local codepage strings in the code that is messing up your special characters before they get passed on to Oracle. This is, unfortunately, not uncommon in legacy systems where a lot of the code was written before unicode was as wide spread as it is today.

Since sqldeveloper is written in Java, I would expect this to not be a problem there (all strings are unicode by default in Java). If I'm not mistaken that is also the case in TCL, which might explain why the TCLCaller works and the AttributeCreator doesn't.

Regarding the TCL function you'll have to get somebody elses (@takashi?) opinion , I never bothered to learn it myself ;-)

@david_r is correct -- FME itself internally is a mix of code that knows unicode and code that doesn't. In general, all the attribute handling is completely unicode smart *within* functions, but when we assign values back to attributes (as in the AttributeCreator) we sometimes go through "system" encoding. So if you are on a system that didn't know those characters, you get problems.

Doing the attribute assignment all in Tcl will work just fine, because that world stays in Unicode.

For FME 2016.1/FME 2017 the internal plumbing is getting widened out so that your AttributeCreator will do the right thing, but until then, you're better doing such attribute creation in Tcl if you need to assign characters that are not part of your Windows computer's encoding.

Note -- if you use Mac or Linux, you don't have this issue at all. Why? Because the "system encoding" *is* unicode -- utf-8. So everything just works. If only Windows would have a unicode system encoding, none of these issues would happen...


Reply