Question

PDF reader characters

  • March 6, 2019
  • 3 replies
  • 27 views

oliver.morris
Contributor

Hi, I am reading text from a PDF using the PDF reader, and in the text I see xEF, xBF, xBE. What are these, and how do I remove them?

Thank you


3 replies

danilo_fme
Celebrity
  • 2077 replies
  • March 6, 2019

Hi @olivermorris

Could you share a sample of this data?

Thanks,

Danilo


david_r
Celebrity
  • 8394 replies
  • March 7, 2019

It could be either non-printable characters or Unicode characters that can't be encoded in the active encoding. For example, xEF could be an ï (letter i with diaeresis): http://www.fileformat.info/info/unicode/char/ef/index.htm

But without knowing the context it's hard to be certain.

You could try using an AttributeEncoder to see if that gives you the expected result.
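
One way to check which of these cases applies is to decode the bytes directly. The Python sketch below is only an illustration: it assumes the xEF, xBF, xBE from the question are the raw bytes 0xEF 0xBF 0xBE and that the text is UTF-8, neither of which is confirmed in this thread. Under those assumptions, the three bytes form a single code point rather than three separate characters, while a single-byte encoding such as Latin-1 would show them as three characters starting with ï.

```python
import unicodedata

# Assumption: the three values from the question are consecutive raw bytes.
raw = bytes([0xEF, 0xBF, 0xBE])

# Read as UTF-8, the three bytes form one code point.
decoded = raw.decode("utf-8")
code_point = f"U+{ord(decoded):04X}"

# The Unicode general category tells us whether it is a printable character.
category = unicodedata.category(decoded)

# Read as Latin-1, each byte becomes its own character instead.
as_latin1 = raw.decode("latin-1")

print(code_point, category, repr(as_latin1))
```

A category beginning with "C" (control, format, unassigned and similar) would be consistent with the non-printable explanation.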

 


oliver.morris
Contributor
  • Author
  • 176 replies
  • March 7, 2019

 

Thanks for the help, an example below. I searched around some more, and after adding a StringReplacer and copying the black sections into it as the text to replace, they were removed. As @david_r suggests, I think they are just non-printable characters.
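
For anyone who would rather script the same clean-up than copy the characters by hand, here is a minimal Python sketch. It is not the StringReplacer configuration used above; `strip_nonprintable` and `KEEP` are hypothetical names, and the rule it applies (drop every character whose Unicode general category starts with "C", except newline, carriage return and tab) is an assumption that may be too aggressive for some text, since it also drops format characters such as soft hyphens.

```python
import unicodedata

# Whitespace control characters that are usually worth keeping.
KEEP = {"\n", "\r", "\t"}


def strip_nonprintable(text: str) -> str:
    """Drop characters in the Unicode 'C' (other) categories, except KEEP."""
    return "".join(
        ch for ch in text
        if ch in KEEP or not unicodedata.category(ch).startswith("C")
    )


# Made-up sample containing U+FFFE, the code point that the bytes
# EF BF BE decode to in UTF-8.
sample = "Invoice\ufffe total:\t42\n"
cleaned = strip_nonprintable(sample)
print(repr(cleaned))
```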

All sorted, thanks