
Hi, I am reading text from a PDF using the PDF reader, and in the text I see xEF, xBF, xBE. What are these, and how do I remove them?

Thank you

Hi @olivermorris

 

Could you share a sample of this data?

 

Thanks,

Danilo

 


These could be either non-printable characters or Unicode characters that can't be encoded in the active encoding. For example, xEF could be an ï (letter i with diaeresis): http://www.fileformat.info/info/unicode/char/ef/index.htm

But without knowing the context it's hard to be certain.

You could try using an AttributeEncoder to see if that gives you the expected result.
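
One way to check which of the two cases applies is to look at the bytes outside the workspace. The sketch below is plain Python rather than anything from the PDF reader or the AttributeEncoder, and `describe_bytes` is a hypothetical helper name. It assumes the three markers are the raw bytes EF BF BE. Read as UTF-8 they form the single code point U+FFFE, a Unicode noncharacter with no glyph; read as Latin-1 they become three separate printable characters, starting with the ï mentioned above.

```python
import unicodedata


def describe_bytes(raw: bytes, encoding: str) -> list[tuple[str, str]]:
    """Decode raw bytes and return (code point, Unicode category) per character.

    Hypothetical helper for illustration only.
    """
    text = raw.decode(encoding)
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Assumption: the markers seen in the text are these three raw bytes.
raw = b"\xef\xbf\xbe"

# As UTF-8 the three bytes are one code point in category "Cn" (unassigned).
print(describe_bytes(raw, "utf-8"))

# As Latin-1 each byte is its own character: a letter, punctuation, a fraction.
print(describe_bytes(raw, "latin-1"))
```

If the UTF-8 reading matches what you see, the characters carry no text and are safe to strip.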

 


 

Thanks for the help, an example is below. I searched around some more, and after adding a string replacer and copying the black sections into it as the text to replace, they were removed. As @david_r suggests, I think they are just non-printable characters.
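
For anyone who would rather not copy the black sections by hand, the same clean-up can be sketched in plain Python. This is not the string replacer used above. `strip_nonprintable` is a hypothetical helper, and it assumes the unwanted markers are real characters in a Unicode "Other" category (control, format, surrogate, private use, unassigned) rather than the literal text "xEF".

```python
import unicodedata

# Whitespace controls that are usually worth keeping in extracted text.
KEEP = {"\n", "\r", "\t"}


def strip_nonprintable(text: str) -> str:
    """Drop characters whose Unicode category starts with "C".

    Hypothetical helper: a minimal sketch, not the thread author's method.
    """
    return "".join(
        ch for ch in text
        if ch in KEEP or not unicodedata.category(ch).startswith("C")
    )


# U+FFFE is what the bytes EF BF BE decode to in UTF-8.
sample = "Invoice\ufffe total:\t42\n"
print(repr(strip_nonprintable(sample)))
```

Accented letters such as ï are in a letter category, so they pass through untouched.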

All sorted, thanks

