We have some text fields that shouldn’t have any UNICODE characters. Turns out at least one has snuck into the data. Is there a way to test for UNICODE characters and then remove them?
FME 2022.2.3
You could try the AttributeEncoder with “Replace invalid characters”=Yes to transform the string to e.g. Latin-1 (or whatever you require), then compare the string before and after to see if any extended Unicode characters were replaced/removed.
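To illustrate the encode-then-compare idea outside FME, here is a minimal Python sketch. It is not the AttributeEncoder itself; the function name `changed_by_encoding` and the default target encoding are assumptions made for the example. It encodes a string to Latin-1 with replacement of anything that does not fit, decodes it again, and reports whether the text changed.

```python
def changed_by_encoding(text: str, encoding: str = "latin-1") -> bool:
    """Return True if encoding `text` to `encoding` had to replace characters.

    Hypothetical helper for illustration only; it mimics the idea of
    "encode with replacement, then compare before and after".
    """
    # errors="replace" substitutes "?" for every character the target
    # encoding cannot represent.
    round_tripped = text.encode(encoding, errors="replace").decode(encoding)
    return round_tripped != text
```

A string such as "café" fits in Latin-1 and comes back unchanged, while a string containing "€" does not, so only the second one is flagged.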
Since FME uses a Perl implementation of RegEx, you could use the StringSearcher with a RegEx to find strings that contain a non-ASCII character, along with the character positions.
The RegEx pattern to look for is [^[:ascii:]]
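For anyone who wants to try the same search outside FME, the sketch below does roughly what the StringSearcher step does: it lists each non-ASCII character with its position. Note the assumptions: Python's `re` module does not support the POSIX class `[:ascii:]`, so the equivalent range `[^\x00-\x7F]` is used instead, and the function name `find_non_ascii` is made up for this example.

```python
import re

# Equivalent of the Perl-style [^[:ascii:]] pattern; Python's `re` has no
# POSIX classes, so the ASCII range is written out explicitly.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for every non-ASCII character in `text`."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]
```

An empty list means the value is pure ASCII; anything else tells you which character slipped in and where.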
Thank you
First use the StringSearcher to find a Match for the UNICODE characters.
Then use the AttributeEncoder to remove them.
Then use the StringSearcher (during development) to confirm the UNICODE characters are gone.
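The three steps above (find, remove, confirm) can be sketched in plain Python as follows. This is only an illustration of the logic, not the FME workspace: `clean_field` is a hypothetical name, and stripping the characters outright (rather than substituting a replacement character) is an assumption about the desired result.

```python
import re

# One or more consecutive non-ASCII characters.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7F]+")


def clean_field(value: str) -> tuple[str, bool]:
    """Return (cleaned value, whether any non-ASCII characters were found)."""
    # Step 1: test for a match.
    found = NON_ASCII_RUN.search(value) is not None
    # Step 2: remove every non-ASCII character.
    cleaned = NON_ASCII_RUN.sub("", value)
    # Step 3: confirm nothing non-ASCII is left (development-time check).
    assert cleaned.isascii()
    return cleaned, found
```

The boolean makes it easy to count or log how many values were affected before trusting the cleaned output.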
Thank you both :)