Hi All,
I have a string that ends with a couple of characters that cause PostgreSQL to fatally terminate the process when it receives them. Unfortunately I'm not certain that the string /always/ ends with these characters or I'd use the SubstringExtractor.
I get this error:
invalid byte sequence for encoding "UTF8": 0xc0 0x80
So what I'd like to do is trim them. The problem is - How to do that without resorting to Python?
I can't just copy/paste them into an attributeTrimmer because FME represents these with the special "replacement character" ? - https://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character - meaning what the trimmer is searching and looking to replace is that special unicode character (?), not the actual ones that I want to remove.
If I go into the source database, select the character in its original encoding and paste that, it just pastes a space.
There's nothing in the AttributeTrimmer docs about this sort of trimming.
A StringReplacer with a regexp using a specific code point (https://www.regular-expressions.info/unicode.html#codepoint) doesn't work because FME explicitly doesn't allow the \u modifier, or the \p modifier.
If I'm really desperate I can probably write some Python to do it, but I need to keep it fairly lightweight because this is going to be run for millions of features.
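For reference, this is roughly the Python fallback I have in mind: a minimal sketch, not tested inside FME. The function and constant names are made up for illustration. It assumes the value arrives either as raw bytes (where the offending pair is literally 0xC0 0x80, which is how "modified UTF-8" encodes a NUL) or as a Unicode string in which the pair has already been turned into U+FFFD or U+0000. Which of the two FME 2016.0 actually hands to Python is something I'd still have to check.

```python
# Hypothetical helpers; the names are illustrative and not part of FME.

# What the bad tail may look like once the value is a Unicode string:
# U+FFFD (replacement character) or U+0000 (NUL).
BAD_TRAILING_CHARS = u"\ufffd\u0000"

# What the bad tail looks like as raw bytes (from the PostgreSQL error).
BAD_TRAILING_BYTES = b"\xc0\x80"


def strip_bad_tail_text(value):
    """Remove any run of U+FFFD / U+0000 from the end of a Unicode string."""
    return value.rstrip(BAD_TRAILING_CHARS)


def strip_bad_tail_bytes(raw):
    """Remove every trailing 0xC0 0x80 pair from a byte string."""
    while raw.endswith(BAD_TRAILING_BYTES):
        raw = raw[:-len(BAD_TRAILING_BYTES)]
    return raw
```

Both helpers leave the value untouched when the tail is clean, so it shouldn't matter that not every string ends with these characters. Each one is a single pass over the end of the string, which I'd hope is cheap enough for millions of features, but I haven't measured it.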
Does anyone have any suggestions for how to handle this?
Note: FME 2016.0
Thanks,
Jonathan