Question

String Replacer doesn't catch Unicode Character

7 years ago
March 29, 2018
3 replies
34 views

denizturan1985
Participant
11 replies

Hello all,

I have a problem matching Turkish Unicode characters in the StringReplacer and StringSearcher transformers. The "\w" metacharacter doesn't match the letters "Ç ç Ğ ğ İ ı Ö ö Ş ş Ü ü". Matching works when I type these letters literally into the search field. I also tested the same letters on www.regexpal.com, and "\w" didn't match them there either.

Is there any way to match these letters with "\w"? Otherwise, I need to replace these letters with other special characters that don't exist in the Turkish alphabet at the beginning of the workspace, and restore them at the end.

Thanks

Deniz


bruceharold
Contributor
338 replies
7 years ago
March 29, 2018

Try \X, based on a quick read of Perl's Unicode support.

If you can resort to Python it gets a bunch easier with the -U flag.
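
For what it's worth, here is a minimal sketch of the Python route. It assumes Python 3, where `\w` on a `str` pattern is already Unicode-aware without any extra flag. The sample string and variable names are made up for illustration, and the NFC normalization step is an added precaution rather than anything from this thread.

```python
import re
import unicodedata

# Hypothetical sample text, normalized to NFC so each accented letter
# is a single code point rather than a base letter plus combining mark.
text = unicodedata.normalize("NFC", "Çağrı Öztürk 42. Şişli")

# Python 3: \w on a str pattern matches Unicode letters, digits and "_".
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], which mimics the behaviour
# described in the question.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

Comparing the two printed lists shows how the ASCII-only interpretation splits words at every Turkish letter.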

denizturan1985
Author
Participant
11 replies
7 years ago
March 30, 2018

Hello @bruceharold

Thank you for your answer. \X selects everything, including whitespace. My strings include letters, whitespace, dots, and numbers. I am going to select everything and then exclude numbers, whitespace, and dots. Also, thank you for the Python advice. All doors lead to Python; I should start learning it.
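
A rough sketch of the "letters only" idea in Python 3, assuming the goal is to keep runs of letters and skip digits, whitespace, and dots. The class `[^\W\d_]` is a common idiom meaning "a word character that is neither a digit nor an underscore"; the sample string and names are hypothetical.

```python
import re

# [^\W\d_] = word character, minus decimal digits, minus underscore.
# On a Python 3 str this covers letters such as Ç, ğ, ı, İ, Ş and ü.
LETTERS = re.compile(r"[^\W\d_]+")

sample = "Ümit Şahin 3. kat İğne"  # hypothetical input
letters_only = LETTERS.findall(sample)

print(letters_only)
```

Here the "3." token is skipped entirely because neither the digit nor the dot belongs to the class.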

Thank you

Deniz

david_r
8355 replies
7 years ago
April 3, 2018

The "\w" metacharacter (usually) only matches the set [a-zA-Z0-9_], which is why your special characters aren't included.

Your best bet is to specify your own set, e.g.

[\wçğü...etc...]
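
To illustrate the custom-set approach, here is a small Python sketch. It is only an approximation of what the transformer's regex engine does: `re.ASCII` is used to force `\w` down to `[a-zA-Z0-9_]`, and the list of extra letters in `TURKISH_EXTRA` is my own assumption of what the full set would look like, not something taken from the post above.

```python
import re

# Assumed full list of Turkish letters outside a-z/A-Z (hypothetical name).
TURKISH_EXTRA = "çÇğĞıİöÖşŞüÜ"

# re.ASCII makes \w mean [a-zA-Z0-9_], imitating an engine whose \w is
# not Unicode-aware; the extra letters are then added explicitly.
WORD = re.compile(rf"[\w{TURKISH_EXTRA}]+", flags=re.ASCII)

sample = "İstanbul'da 5 güneş"  # hypothetical input
tokens = WORD.findall(sample)

print(tokens)
```

The same bracket expression, with the letters typed out literally, is what would go into the search field of a regex-based tool.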
