Question

Replace Unicode character in KML

7 years ago
August 1, 2017
14 replies
159 views

Anonymous

I have source data with a special character in it, a right arrow, that is:

When outputting to KML, Google Earth complains about this character. Notepad++ represents it as 'SUB'. It may be the Unicode Rightwards Arrow 8594. Whatever it is, I need to get rid of it. I've tried both the CharacterCodeReplacer and the StringReplacer without success. Is it possible?

Thanks

rwhittington
Contributor
14 replies
7 years ago
August 1, 2017

Have you tried the TextEncoder (as XML) on the strings you are passing to the KML writer?

Anonymous
0 replies
7 years ago
August 1, 2017

rwhittington wrote:

Have you tried the TextEncoder (as XML) on the strings you are passing to the KML writer?

Yes, it doesn't change the character. I need something that can recognize that character and either remove it or change it to a hyphen.


ebygomm
Influencer
3313 replies
7 years ago
August 1, 2017

If you copy and paste the character into the text editor of the string replacer it should work.

Anonymous
0 replies
7 years ago
August 1, 2017

ebygomm wrote:

If you copy and paste the character into the text editor of the string replacer it should work.

Yes, that was one of the transformers that I tried. Although it shows the character in the info balloon, it doesn't change it in the output. Perhaps it is because I am using the FeatureReader and the input and output are dynamically chosen. The output attribute is just 'fme_feature_type'.

david_r
8355 replies
7 years ago
August 2, 2017

Try the following regex in a StringReplacer, it will replace any character in a certain unicode range (in the below example it's the range 0800-FFFF) with the replacement text of your choice:
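The regex itself did not survive in this copy of the thread. As a rough illustration of the idea only (not necessarily the expression that was posted), a character class covering U+0800 through U+FFFF could look like this in Python. The name `replace_range` and the hyphen default are made up for the sketch.

```python
import re

# Character class for the range mentioned in the post: U+0800..U+FFFF.
# Assumption: illustrative only, not the regex originally posted.
HIGH_RANGE = re.compile("[\u0800-\uffff]")

def replace_range(text: str, replacement: str = "-") -> str:
    """Replace every character in U+0800..U+FFFF with `replacement`."""
    return HIGH_RANGE.sub(replacement, text)

# U+2192 RIGHTWARDS ARROW (decimal 8594) falls inside the range.
print(replace_range("A \u2192 B"))
# A control character such as U+001A lies below the range and is left alone.
print(replace_range("A \x1a B") == "A \x1a B")
```

A range like this only helps if the offending character really is at or above U+0800; anything lower passes through unchanged.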

david_r
8355 replies
7 years ago
August 2, 2017

ebygomm wrote:

If you copy and paste the character into the text editor of the string replacer it should work.

Interestingly, I couldn't get this to work in FME 2017.


ebygomm
Influencer
3313 replies
7 years ago
August 2, 2017

david_r wrote:

Try the following regex in a StringReplacer, it will replace any character in a certain unicode range (in the below example it's the range 0800-FFFF) with the replacement text of your choice:

I couldn't get any regular expression (including the above) to find or replace the character that was posted in the message, but the StringReplacer works fine in both 2016 and 2017 for me. I created the test data in a text file, though; if I try to feed it in via an attribute I get a fatal error.

david_r
8355 replies
7 years ago
August 2, 2017

ebygomm wrote:

The little square that was posted in the message above isn't Unicode character 8594, it's a control character (ascii 26), it must've been mangled when posted.
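One way to settle which character is actually in the data is to print its code point. The following is a minimal sketch in plain Python, outside FME; the helper name `describe_chars` is hypothetical.

```python
import unicodedata

def describe_chars(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in `text`."""
    entries = []
    for ch in text:
        # Control characters have no Unicode name, so supply a fallback label.
        name = unicodedata.name(ch, "<unnamed control character>")
        entries.append(f"U+{ord(ch):04X} {name}")
    return entries

# Compare the two candidates from this thread: the arrow (8594) and ASCII 26.
for entry in describe_chars("\u2192\x1a"):
    print(entry)
```

Pasting a sample of the real attribute value in place of the test string would show whether it holds U+2192 or U+001A.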


ebygomm
Influencer
3313 replies
7 years ago
August 2, 2017

david_r wrote:

The little square that was posted in the message above isn't Unicode character 8594, it's a control character (ascii 26), it must've been mangled when posted.

I see an arrow, no little square

david_r
8355 replies
7 years ago
August 2, 2017

ebygomm wrote:

I see an arrow, no little square

Weird, I either get the little square or nothing at all. Tried with 3 different browsers :-)

Anonymous
0 replies
7 years ago
August 2, 2017

The simple StringReplacer solution of pasting the right arrow into the Text To Match works when I read the data from the ESRI geodb reader and write with a KML writer. I think the other solutions proposed by @david_r, @rwhittington, and @ebygomm would work too if I knew what character the right arrow is (I don't think it is 8594).

I think the problem is that the script uses a dynamic reader and writer. Thus the schema is not known by the transformers. All the StringReplacer knows is "fme_feature_type". I think it is too much for the StringReplacer or other transformers to find and replace characters in this case.

I will go back to the users and tell them, "no special characters in your input, please!" I think this is referred to as a PEBKAC problem (Problem Exists Between Keyboard And Chair).

Thanks for everyone's help.
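On the dynamic-schema point, one general approach when attribute names are not known in advance is to loop over every attribute and clean any value that is a string. The sketch below shows the idea in plain Python only. It assumes the character is ASCII 26 as suggested earlier in the thread, and it models a feature as an ordinary dict, which is an illustration and not FME's API. `clean_attributes` is a hypothetical name.

```python
SUB = "\x1a"  # ASCII 26, assumed to be the offending character

def clean_attributes(attributes: dict, replacement: str = "-") -> dict:
    """Return a copy with SUB replaced in every string value.

    Non-string values (numbers, None, ...) are passed through untouched.
    """
    return {
        name: value.replace(SUB, replacement) if isinstance(value, str) else value
        for name, value in attributes.items()
    }

# Hypothetical feature whose attribute names are only known at run time.
feature = {"fme_feature_type": "roads", "description": "A \x1a B", "lanes": 2}
print(clean_attributes(feature))
```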

tim_wood
Contributor
311 replies
6 years ago
September 25, 2018

david_r wrote:

Try the following regex in a StringReplacer, it will replace any character in a certain unicode range (in the below example it's the range 0800-FFFF) with the replacement text of your choice:

Note that to get this to work, you may need to set the encoding on your Reader. For example, when reading a CSV, set the Character Encoding to "Unicode 8-bit (utf-8)" in the Reader parameters; otherwise the regex in the StringReplacer won't work.
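A small Python sketch of why the encoding matters: the same UTF-8 bytes decoded with the wrong encoding no longer contain a character in the 0800-FFFF range, so a range-based regex has nothing to match. Latin-1 is used here purely as an example of a wrong encoding; what a given Reader defaults to is not something this sketch knows.

```python
import re

# Illustrative range class for U+0800..U+FFFF (an assumption, not the posted regex).
HIGH_RANGE = re.compile("[\u0800-\uffff]")

# The arrow as it would be stored in a UTF-8 file: bytes E2 86 92.
raw = "A \u2192 B".encode("utf-8")

decoded_ok = raw.decode("utf-8")       # arrow comes back as one character, U+2192
decoded_wrong = raw.decode("latin-1")  # arrow becomes U+00E2, U+0086, U+0092

print(bool(HIGH_RANGE.search(decoded_ok)))     # match found
print(bool(HIGH_RANGE.search(decoded_wrong)))  # no match: all code points < U+0800
```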

tim_wood
Contributor
311 replies
6 years ago
September 25, 2018

jimo wrote:

I will go back to the users and tell them, "no special characters in your input, please!" I think this is referred to as a PEBKAC problem (Problem Exists Between Keyboard And Chair).

Thanks for everyone's help.

I've heard it described as PICNIC: Problem In Chair, Not In Computer.

lanthar
Contributor
9 replies
1 year ago
October 2, 2023

Having just dealt with this, I pasted a string with the same character into Notepad++ and saw "SUB", then checked in a hex editor and saw it was hex character 1A: the SUBSTITUTE character, ASCII 26, Ctrl-Z. To replace it in my AttributeManager I did this, swapping the weird arrow for a greater-than symbol to simulate the arrow with a printable character:

@ReplaceRegularExpression(@Value(description),"\\x{001A}",">")

Hope that helps others.

It would be nice to have a ToAscii(int) and ToNumber(char) function that would convert single characters so we could do @Replace(mystring, ToAscii(26), ">") instead of having to use regex with unicode or other conversions, but I guess Regex works.
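For comparison, outside FME's expression language, Python's built-in `chr()` and `ord()` do the kind of conversion wished for here. This is a minimal sketch of the same replacement in plain Python; `replace_sub` is a hypothetical helper name.

```python
SUB = chr(26)  # U+001A SUBSTITUTE, the character Notepad++ shows as 'SUB'

def replace_sub(text: str, replacement: str = ">") -> str:
    """Swap every SUB control character for a printable replacement."""
    return text.replace(SUB, replacement)

print(ord(SUB))                             # numeric code of the character
print(replace_sub("Main St \x1a Oak Ave"))  # SUB swapped for '>'
```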
