I have source data with a special character in it, a right arrow, that is:

When outputting to KML, Google Earth complains about this character. Notepad++ represents it as 'SUB'. It may be the Unicode Rightwards Arrow 8594. Whatever it is, I need to get rid of it. I've tried both the CharacterCodeReplacer and the StringReplacer without success. Is it possible?

Thanks

 

Have you tried the TextEncoder (as XML) on the strings you are passing to the KML writer?


Yes, it doesn't change the character. I need something that can recognize that character and either remove it or change it to a hyphen.
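For anyone wondering why XML encoding would leave the character alone: if it is a control character such as ASCII 26 (as suggested later in this thread), XML 1.0 does not allow it at all, and standard escaping only rewrites `&`, `<` and `>`. A minimal Python sketch of that behaviour, outside FME, assuming the character is `\x1a`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SUB = "\x1a"  # ASCII 26; an assumption about what the problem character is


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


name = f"A {SUB} B"
escaped = escape(name)  # only rewrites &, < and >; SUB passes through untouched
doc_raw = f"<name>{escaped}</name>"
doc_clean = f"<name>{escape(name.replace(SUB, '-'))}</name>"

print(is_well_formed(doc_raw))
print(is_well_formed(doc_clean))
```

Under that assumption, the character has to be removed or replaced before writing; encoding alone cannot make it valid.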

 

 


If you copy and paste the character into the text editor of the StringReplacer, it should work.


Yes, that was one of the transformers that I tried. Although it shows the character in the info balloon, it doesn't change it in the output. Perhaps it is because I am using the FeatureReader and the input and output are dynamically chosen. The output attribute is just 'fme_feature_type'.

 

 


Try the following regex in a StringReplacer, it will replace any character in a certain unicode range (in the below example it's the range 0800-FFFF) with the replacement text of your choice:
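The sketch below is not the FME expression from this post, only a Python illustration of the same idea, assuming the U+0800 to U+FFFF range mentioned above. The function name `replace_high_range` is made up for the example.

```python
import re

# Character class covering U+0800 through U+FFFF (the range named in the post).
HIGH_RANGE = re.compile(r"[\u0800-\uffff]")


def replace_high_range(text: str, replacement: str = "-") -> str:
    """Replace every character in U+0800..U+FFFF with the replacement text."""
    return HIGH_RANGE.sub(replacement, text)


arrow_text = "A \u2192 B"  # U+2192 RIGHTWARDS ARROW (decimal 8594), inside the range
sub_text = "A \x1a B"      # U+001A SUBSTITUTE (decimal 26), below the range

print(replace_high_range(arrow_text))
print(replace_high_range(sub_text) == sub_text)
```

Note that a range starting at 0800 catches a real rightwards arrow but would leave a control character such as ASCII 26 untouched, so it only helps if the character really is in that range.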


Interestingly, I couldn't get the copy-and-paste approach in the StringReplacer to work in FME 2017.

I couldn't get any regular expression (including the above) to find or replace the character that was posted in the message, but the StringReplacer works fine in both 2016 and 2017 for me. I created the test data in a text file, though; if I try to feed it in via an attribute, I get a fatal error.

The little square that was posted in the message above isn't Unicode character 8594; it's a control character (ASCII 26). It must have been mangled when posted.

I see an arrow, no little square.

Weird, I either get the little square or nothing at all. Tried with 3 different browsers 🙂
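Since browsers and editors render this character differently, one way to settle what it actually is would be to print its code point. A small Python sketch (outside FME; the helper name `describe` is made up for the example):

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one line per character: code point, decimal value, Unicode name."""
    lines = []
    for ch in text:
        # Control characters have no Unicode name, so supply a fallback label.
        name = unicodedata.name(ch, "<no name, likely a control character>")
        lines.append(f"U+{ord(ch):04X} (decimal {ord(ch)}) {name}")
    return lines


for line in describe("\u2192\x1a"):
    print(line)
```

A true rightwards arrow reports decimal 8594, while the SUB control character reports decimal 26.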

The simple StringReplacer solution of pasting the right arrow into the Text To Match works when I read the data with the ESRI geodb reader and write with a KML writer. I think the other solutions proposed by @david_r, @rwhittington, and @egomm would work too if I knew what character the right arrow is (I don't think it is 8594).

I think the problem is that the script uses a dynamic reader and writer. Thus the schema is not known by the transformers. All the StringReplacer knows is "fme_feature_type". I think it is too much for the StringReplacer or other transformers to find and replace characters in this case.

I will go back to the users and tell them, "no special characters in your input, please!" I think this is referred to as a PEBKAC problem (Problem Exists Between Keyboard And Chair).

Thanks for everyone's help.


Note that to get the Unicode-range regex in the StringReplacer to work, you may need to set the encoding on your Reader. For example, when reading a CSV, set the Character Encoding to "Unicode 8-bit (utf-8)" in the Reader parameters; otherwise the regex in the StringReplacer won't work.
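To see why the encoding matters, here is a rough Python illustration (not FME itself) of what happens when UTF-8 bytes are decoded with the wrong encoding. Latin-1 is used only as an example of a mismatched encoding.

```python
import re

HIGH_RANGE = re.compile(r"[\u0800-\uffff]")

raw = "A \u2192 B".encode("utf-8")  # the arrow becomes three bytes: E2 86 92

as_utf8 = raw.decode("utf-8")       # correct: one arrow character
as_latin1 = raw.decode("latin-1")   # wrong: three separate low-valued characters

print(len(as_utf8), bool(HIGH_RANGE.search(as_utf8)))
print(len(as_latin1), bool(HIGH_RANGE.search(as_latin1)))
```

With the wrong encoding, the arrow is split into characters below U+0800, so a range-based regex no longer finds anything to replace.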

 


I've heard the PEBKAC problem described as PICNIC: Problem In Chair, Not In Computer.

 


Having just dealt with this, I pasted a string with the same character into Notepad++, saw "SUB", then checked in a hex editor and saw it was hex 1A: the SUBSTITUTE character, ASCII 26, Ctrl-Z. To replace it in my AttributeManager, I did this to swap the weird arrow for a greater-than symbol, simulating the arrow with a printable character:

@ReplaceRegularExpression(@Value(description),"\\x{001A}",">")
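For comparison, the same replacement sketched in Python (for example, for checking the data outside FME). The function name `replace_sub` is made up, and the `>` replacement mirrors the expression above.

```python
import re


def replace_sub(text: str, replacement: str = ">") -> str:
    """Replace every SUBSTITUTE character (hex 1A, ASCII 26) in text."""
    return re.sub(r"\x1a", replacement, text)


print(replace_sub("Start \x1a End"))
```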

 

Hope that helps others.

 

It would be nice to have a ToAscii(int) and ToNumber(char) function that would convert single characters so we could do @Replace(mystring, ToAscii(26), ">") instead of having to use regex with unicode or other conversions, but I guess Regex works.
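As a point of comparison only (this is Python, not an FME function), Python's built-in `chr` and `ord` play the roles described here, which allows a plain replacement without any regex. The helper name `replace_code` is made up for the example.

```python
def replace_code(text: str, code: int, replacement: str) -> str:
    """Replace every occurrence of the character with the given code point."""
    return text.replace(chr(code), replacement)


print(ord("\x1a"))  # number for a single character
print(replace_code("Start \x1a End", 26, ">"))
```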

