We have some text fields that shouldn’t contain any Unicode (non-ASCII) characters. It turns out at least one has snuck into the data. Is there a way to test for these characters and then remove them?

FME 2022.2.3

You could try the AttributeEncoder with “Replace invalid characters” = Yes to transform the string to e.g. Latin-1 (or whatever encoding you require), then compare the string before and after to see whether any extended Unicode characters were replaced or removed.
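To illustrate the before/after comparison outside of FME, here is a minimal Python sketch of the same idea. It is not how the AttributeEncoder is implemented; the function names are made up for this example, and the "?" substitute is Python's behaviour with errors="replace", which may differ from the character FME substitutes.

```python
def latin1_round_trip(text: str) -> str:
    """Encode to Latin-1, substituting '?' for anything Latin-1 cannot represent."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def changed_by_encoding(text: str) -> bool:
    """True if at least one character was replaced during the round trip."""
    return latin1_round_trip(text) != text


# Hypothetical sample values; \u2013 is an en dash, \u2192 is a right arrow.
samples = ["Main Street", "Café Noir", "Zürich \u2013 West", "Road \u2192 North"]
for value in samples:
    print(repr(value), "->", repr(latin1_round_trip(value)), changed_by_encoding(value))
```

Note that "Café Noir" passes unchanged because é exists in Latin-1. If the fields must be strictly ASCII, the target encoding would need to be ASCII instead.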


Since FME uses a Perl implementation of RegEx, you could use the StringSearcher with a RegEx to find strings that contain a non-ASCII character, along with the character positions.

The RegEx pattern to look for is [^[:ascii:]]
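For anyone who wants to try the pattern outside of FME, here is a rough Python equivalent. Python's re module does not support POSIX classes such as [:ascii:], so this sketch assumes the explicit range [^\x00-\x7F] as a stand-in. The function name is made up, and the positions are 0-based.

```python
import re

# Stand-in for [^[:ascii:]]: any character outside code points 0-127.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (position, character, code point) for each non-ASCII character."""
    return [
        (m.start(), m.group(), f"U+{ord(m.group()):04X}")
        for m in NON_ASCII.finditer(text)
    ]


# \u00e9 is e-acute, \u2013 is an en dash.
print(find_non_ascii("Caf\u00e9 \u2013 Bar"))
# [(3, 'é', 'U+00E9'), (5, '–', 'U+2013')]
```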

 



Thank you @bwn, this helps us see the special characters that have snuck into our data.
@david_ Thank you for also taking the time to submit an idea. Your idea works in conjunction with @bwn's.

First use the StringSearcher to find a match for the Unicode (non-ASCII) characters.
Then use the AttributeEncoder to remove them.
Then use the StringSearcher again (during development) to confirm the characters are gone.
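The three steps above can be sketched in plain Python as follows. This only illustrates the find / remove / confirm logic and is not an FME workspace. It assumes the offending characters should be dropped outright (errors="ignore"); the clean_field name is hypothetical.

```python
import re

# Same stand-in for [^[:ascii:]] as a Python range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def clean_field(text: str):
    """Find, remove, then confirm. Returns (cleaned_text, removed_characters)."""
    # Step 1: find the non-ASCII characters (the StringSearcher step).
    removed = NON_ASCII.findall(text)
    # Step 2: remove them by encoding to ASCII and dropping what does not fit.
    cleaned = text.encode("ascii", errors="ignore").decode("ascii")
    # Step 3: confirm nothing non-ASCII is left (the development-time check).
    assert NON_ASCII.search(cleaned) is None
    return cleaned, removed


# \u00a0 is a non-breaking space, \u2013 is an en dash.
cleaned, removed = clean_field("Oak\u00a0Street \u2013 Unit 4")
print(repr(cleaned), removed)
```

In this example the non-breaking space is dropped, so "Oak" and "Street" run together. If that matters for your data, substituting a plain space rather than removing may be the better choice.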

 

Thank you both :)

