
Morning, I've had some feedback that an XML file produced using FME (the file can be downloaded here) has caused a problem while being parsed by lxml. The error being returned is "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration." I've been told that this refers to a non-breaking space character.

The XML file is a list of recent planning applications, and for planning case 15/0779 the <casetext> tag seems to contain the offending character after the full stop at the end of the description.

I've tried the AttributeTrimmer to remove this, but with no luck. Just wondering if anyone has come across this before and has worked out how to remove these characters.

Thanks

Actually, I worked out my own answer to this one.

Using a StringReplacer, I did a regex search for the Unicode value \u00A0, which is the code for a no-break space character. I left the "replace with" field blank, so the character is simply removed and nothing is put in its place.
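For anyone who needs the same clean-up outside FME, here is a minimal Python sketch of the equivalent step: a regex search for U+00A0 with an empty replacement. The function name `strip_nbsp` and the sample description are made up for illustration and are not taken from the actual file.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the character searched for above


def strip_nbsp(text: str) -> str:
    """Remove every no-break space (search for U+00A0, replace with nothing)."""
    return re.sub(NBSP, "", text)


# Hypothetical casetext value with a trailing no-break space after the full stop.
sample = "Erection of a single storey extension.\u00a0"
cleaned = strip_nbsp(sample)
print(repr(cleaned))
```

If the no-break space should become an ordinary space rather than disappear, passing `" "` as the replacement instead of `""` would do that.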


Hi, I think this link is helpful: Python unicode strings
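On the Python side, my understanding is that lxml raises the quoted message when it is handed a Python `str` that begins with an XML declaration naming an encoding. The message itself suggests passing bytes instead. Below is a rough sketch of that idea using only the standard library so it runs anywhere. The helper name `as_parser_bytes` and the `<planning>` wrapper element are assumptions for illustration; only `<casetext>` comes from the question.

```python
import xml.etree.ElementTree as ET


def as_parser_bytes(xml_text: str, encoding: str = "utf-8") -> bytes:
    """Encode decoded XML text back to bytes so the parser reads the declaration itself."""
    return xml_text.encode(encoding)


# Hypothetical document: a str that carries an encoding declaration.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<planning><casetext>Example.\u00a0</casetext></planning>"
)

data = as_parser_bytes(doc)

# The standard-library parser accepts the bytes directly; the same bytes
# could be passed to lxml's parser instead if lxml is installed.
root = ET.fromstring(data)
print(repr(root.findtext("casetext")))
```

Reading the file in binary mode (`open(path, "rb")`) gives bytes from the start and avoids the encode step. Note that the no-break space survives parsing here, which suggests the character on its own is not necessarily what the parser is objecting to.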

