Question

PDF reader characters



Hi, I am reading text from a PDF using the PDF reader, and in the text I see xEF, xBF, xBE. What are these, and how do I remove them?

Thank you


3 replies


Hi @olivermorris

Could you share a sample of this text?

Thanks,

Danilo


It could be either non-printable characters or Unicode characters that can't be encoded in the active encoding. For example, xEF could be an ï (letter i with diaeresis): http://www.fileformat.info/info/unicode/char/ef/index.htm

But without knowing the context it's hard to be certain.
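One quick way to narrow it down is to decode the three values both ways and compare. This is only a sketch in Python, and it assumes xEF, xBF, xBE are three consecutive bytes in the extracted text, which the post doesn't confirm. Read as UTF-8 they form a single code point, U+FFFE, which is a Unicode noncharacter; read as Latin-1 they are three separate characters, the first being ï.

```python
# Assumption: the three values from the question are consecutive raw bytes.
raw = bytes([0xEF, 0xBF, 0xBE])

# Interpretation 1: the bytes are one UTF-8 encoded code point.
as_utf8 = raw.decode("utf-8")

# Interpretation 2: each byte is its own Latin-1 (ISO-8859-1) character.
as_latin1 = raw.decode("latin-1")

print([f"U+{ord(c):04X}" for c in as_utf8])    # ['U+FFFE']
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+00EF', 'U+00BF', 'U+00BE']
```

If the values always show up together in that order, the UTF-8 reading (one non-printable code point) seems the more likely one.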

You could try using an AttributeEncoder to see if that gives you the expected result.

 


Thanks for the help, an example is below. I had some more of a search around, and after adding a StringReplacer and copying the black sections into it as the text to replace, they were removed. As @david_r suggests, I think they are just non-printable characters.
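For anyone who would rather do the same cleanup in a script than by copying the characters into a StringReplacer, here is a minimal Python sketch. It is not what was used above; `strip_non_printable` is a hypothetical helper name, and the sample string is made up. It drops every code point whose Unicode general category starts with "C" (control, format, surrogate, private use, unassigned), which includes U+FFFE, while keeping newlines and tabs.

```python
import unicodedata


def strip_non_printable(text: str) -> str:
    """Hypothetical helper: drop code points in the Unicode 'C' categories."""
    keep_whitespace = {"\n", "\r", "\t"}  # control characters worth keeping
    return "".join(
        ch
        for ch in text
        if ch in keep_whitespace or not unicodedata.category(ch).startswith("C")
    )


# Made-up sample: U+FFFE and a NUL byte embedded in otherwise normal text.
sample = "Total\ufffe amount:\u0000 42\n"
print(repr(strip_non_printable(sample)))  # 'Total amount: 42\n'
```

Note that the "C" categories also cover format characters such as the soft hyphen and zero-width joiner, so the filter may be too aggressive for text where those matter.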

All sorted, thanks
