Question

StringReplacer doesn't catch Unicode characters

  • 29 March 2018
  • 3 replies
  • 4 views

Hello all,

I have a problem with catching Turkish Unicode characters in the StringReplacer and StringSearcher transformers. The "\\w" metacharacter doesn't catch the letters "Ç ç Ğ ğ ı İ Ö ö Ş ş Ü ü". It works when I put these letters in the search bar directly. I also checked the same letters on www.regexpal.com, and it didn't work there either.

Is there any way to catch these letters with "\\w"? Otherwise, I need to swap these letters for other special characters that don't exist in the Turkish alphabet at the beginning of the workspace and swap them back at the end.

Thanks

Deniz


3 replies

Try \\X

From a quick read of the Perl Unicode support documentation.

If you can resort to Python, it gets a bunch easier with the re.U (re.UNICODE) flag.
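
For what it's worth, here is a minimal Python 3 sketch of that idea. In Python 3, \w on str patterns is already Unicode-aware by default, so re.UNICODE is redundant and only spelled out for clarity; re.ASCII is used to reproduce the ASCII-only behaviour described in the question. The sample string is made up for illustration.

```python
import re

sample = "Çağrı Öztürk İstanbul şişe 2018"  # hypothetical test string

# Python 3 str patterns: \w is Unicode-aware (re.UNICODE is the default).
unicode_words = re.findall(r"\w+", sample, flags=re.UNICODE)

# re.ASCII restricts \w to [a-zA-Z0-9_], so words break at Turkish letters.
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

With the Unicode-aware match each word comes back whole; with re.ASCII the words are cut wherever a Turkish letter appears.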

Hello @bruceharold,

Thank you for your answer. \\X selects everything, including whitespace. My strings include letters, whitespace, dots, and numbers. I am going to select everything and exclude numbers, whitespace, and dots. Also, thank you for the Python advice. All doors lead to Python; I should start learning it.

Thank you

Deniz
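
One way to express "letters only, no numbers, whitespace, or dots" without listing every Turkish letter is the class [^\W\d_]: anything that counts as a word character but is not a digit or underscore. Below is a hedged Python 3 sketch; the input string is invented, and whether the same class behaves identically inside StringReplacer depends on its regex engine, which is not verified here.

```python
import re

text = "Şişli 34. Sokak No 12 Üsküdar"  # hypothetical input

# [^\W\d_] = word characters minus digits and underscore, i.e. letters only.
LETTERS = re.compile(r"[^\W\d_]+")

letter_runs = LETTERS.findall(text)
print(letter_runs)
```

Digits, dots, and spaces act as separators here, so only the runs of letters are returned.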

The "\w" metacharacter (usually) only matches the set [a-zA-Z0-9_]; that's why your special characters aren't included.

Your best bet is to specify your own set, e.g.

[\wç?ü...etc...]
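
As a sketch of what a fully spelled-out set could look like, here it is in Python 3. re.ASCII is used only to mimic an engine whose \w is ASCII-only, which is an assumption about how the transformer behaves. The TURKISH constant and the sample string are illustrative and not taken from the thread.

```python
import re

# The twelve Turkish letters outside ASCII (upper and lower case).
TURKISH = "ÇçĞğİıÖöŞşÜü"

# re.ASCII keeps \w at [a-zA-Z0-9_]; the extra letters are added explicitly.
WORD = re.compile(rf"[\w{TURKISH}]+", flags=re.ASCII)

sample = "Işık Gümüş, İzmir 35"  # hypothetical test string
words = WORD.findall(sample)
print(words)
```

The same character class, minus the Python wrapping, is what would go into the transformer's search field.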
