Hi, I am reading text from a PDF using the PDF reader, and in the text I see xEF, xBF, xBE. What are these and how do I remove them?
Thank you
Hi @olivermorris
Could you share a sample of this information?
Thanks,
Danilo
It could be either non-printable characters or Unicode characters that can't be encoded in the active encoding. For example, xEF could be an ï (letter i with diaeresis): http://www.fileformat.info/info/unicode/char/ef/index.htm
But without knowing the context it's hard to be certain.
You could try using an AttributeEncoder to see if that gives you the expected result.
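To illustrate the two possibilities above outside of FME, here is a minimal Python sketch. It assumes the three values are consecutive bytes in the extracted text, which the thread does not confirm. Read as UTF-8, the sequence EF BF BE is a single code point, U+FFFE, which Unicode reserves as a noncharacter. Read as Latin-1, it is three separate printable characters, starting with the ï mentioned above. The helper name describe is made up for this example.

```python
import unicodedata

# The three values from the question, assumed here to be consecutive bytes.
raw = bytes([0xEF, 0xBF, 0xBE])

# Interpretation 1: the bytes are UTF-8 (a single 3-byte sequence).
as_utf8 = raw.decode("utf-8")

# Interpretation 2: the bytes are Latin-1 (one character per byte).
as_latin1 = raw.decode("latin-1")


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for label, text in (("utf-8", as_utf8), ("latin-1", as_latin1)):
    print(label, describe(text))
```

A general category starting with "C" (here "Cn", unassigned) means the character has no printable glyph, which would fit the non-printable explanation.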
Thanks for the help, an example below.
I used a StringReplacer, copying the black sections into the text to replace, and they were then removed. As @david_r suggests, I think they are just non-printable characters.
All sorted, thanks
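For anyone who needs the same clean-up in a script rather than with a StringReplacer, a rough Python sketch follows. It is not what was used in this thread; the function name strip_nonprintable and the choice to keep tabs and line breaks are assumptions for the example. It drops every character whose Unicode general category starts with "C" (control, format, surrogate, private use, unassigned), which covers U+FFFE.

```python
import unicodedata

# Whitespace control characters that are usually worth keeping (an assumption).
KEEP = {"\n", "\r", "\t"}


def strip_nonprintable(text):
    """Remove characters in the Unicode 'C' (Other) categories, except KEEP."""
    return "".join(
        ch
        for ch in text
        if ch in KEEP or not unicodedata.category(ch).startswith("C")
    )


# Example: U+FFFE (bytes EF BF BE in UTF-8) embedded in extracted text.
sample = "Invoice\ufffe 42\n"
cleaned = strip_nonprintable(sample)
print(repr(cleaned))
```

Note that the "C" categories also include format characters such as the soft hyphen and zero-width joiner, so a narrower filter may be preferable for text in scripts that rely on them.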