Solved

How to Find Unicode Characters in Text String

  • August 16, 2024
  • 3 replies
  • 412 views

juliarozema
Contributor

We have some text fields that shouldn’t have any Unicode characters. It turns out at least one has snuck into the data. Is there a way to test for Unicode characters and then remove them?

FME 2022.2.3


3 replies

david_r
Evangelist
  • August 19, 2024

You could try the AttributeEncoder with “Replace invalid characters”=Yes to transform the string to e.g. Latin-1 (or whatever you require), then compare the string before and after to see if any extended Unicode characters were replaced or removed.
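
To illustrate the before-and-after comparison outside FME, here is a minimal Python sketch. It is not the AttributeEncoder itself: it assumes Python's built-in `str.encode` with `errors="replace"` is a close enough stand-in to show the idea, and the helper name `changed_by_encoding` is made up for this example.

```python
def changed_by_encoding(text, encoding="latin-1"):
    """Return True if transcoding to `encoding` alters the string.

    Characters the target encoding cannot represent are replaced with "?",
    so any difference before/after means at least one was present.
    """
    round_tripped = text.encode(encoding, errors="replace").decode(encoding)
    return round_tripped != text


print(changed_by_encoding("Caf\u00e9"))       # e-acute exists in Latin-1
print(changed_by_encoding("Price: \u20ac5"))  # the euro sign does not
```

Note that Latin-1 still accepts accented Western European letters. If the field should be strictly ASCII, passing `"ascii"` as the encoding in this sketch would flag those as well.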


bwn
Evangelist
  • Best Answer
  • August 20, 2024

Since FME uses a Perl implementation of RegEx, you could use a StringSearcher with a RegEx to find strings that contain a non-ASCII character, along with the character positions.

The RegEx pattern to look for is [^[:ascii:]]
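
For anyone who wants to try the pattern outside FME, here is a rough Python sketch of the same search. Python's `re` module does not support POSIX bracket classes such as `[:ascii:]`, so the sketch uses the range `[^\x00-\x7F]` instead, which matches the same set of characters. The function name `find_non_ascii` is made up for this example, and the positions it reports are 0-based, which may not match how FME numbers them.

```python
import re

# Stand-in for [^[:ascii:]]: any character outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (position, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


# Two non-ASCII characters: e-acute at index 3, euro sign at index 5.
print(find_non_ascii("Caf\u00e9 \u20ac5"))
```

An empty list means the string is pure ASCII.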

 





juliarozema
Contributor
  • Author
  • August 22, 2024

Thank you @bwn, this helps us see the special characters that have snuck into our data.
@david_ Thank you for also taking the time to submit an idea. Your idea works in conjunction with @bwn's.

First, use the StringSearcher to find a match for the Unicode characters.
Then use the AttributeEncoder to remove them.
Then use the StringSearcher (during development) to confirm the Unicode characters are gone.
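
The three steps above can be sketched in plain Python for anyone who wants to prototype the logic before building the workspace. This is only an approximation of the FME transformers: the pattern `[^\x00-\x7F]` is assumed as an equivalent of `[^[:ascii:]]`, the name `clean_field` is hypothetical, and the sketch simply deletes the offending characters rather than transcoding them.

```python
import re

# Assumed stand-in for the [^[:ascii:]] pattern used in the StringSearcher.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def clean_field(text):
    """Find non-ASCII characters, strip them, then confirm none remain."""
    found = NON_ASCII.findall(text)            # step 1: find
    cleaned = NON_ASCII.sub("", text)          # step 2: remove
    assert NON_ASCII.search(cleaned) is None   # step 3: confirm
    return cleaned, found


# A no-break space (U+00A0) hidden between two words.
cleaned, found = clean_field("Oak\u00a0Street")
print(repr(cleaned), found)
```

Deleting a character outright can join words together, as with the no-break space here, so replacing it with a regular space may be the better choice for some fields.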

 

Thank you both :)

