Our company has FME Server installed for internal use only, so I have it set up to poll an Office365 e-mail inbox for new e-mails. The imap_publisher_content list only has one index of text/html, and the text in the body of the e-mail comes in with encoding issues. Example: Met with inspector and surveyed the two 2â€? electric conduit lines. I’️ll upload the photos to SharePoint.

 

I have attached the incoming JSON.

 

I am having trouble getting the text decoded correctly and am not sure how to proceed.

 

This is also related to a recent post of mine: I still have not been able to find out how to receive the imap_publisher_content list as text/plain through our company's online Office365 Exchange account.

https://knowledge.safe.com/questions/82234/fme-server-polling-imap.html

Hi @madwarren

Could you please share what "...the two 2â€? electric conduit lines. I’️ll upload..." should look like? The garbled characters were originally non-English characters, am I correct?

It looks like the string was tagged with the wrong encoding at some point. So far I am not sure when or why that might have happened.


Sure @LenaAtSafe. The original text from the e-mail that was sent should be: Met with inspector and surveyed the two 2" electric conduit lines. I'll upload the photos to SharePoint.

From all of the googling I have done, it seems that the encoding might be getting misinterpreted as windows-1252 instead of utf-8? I really don't know, and I have been having issues getting it to display correctly.

Thank you for any help with this! It has been a bit frustrating to figure out! lol.


Yes, you are right, it looks like the string is assumed to be in Win-1252. However, there is a RIGHT SINGLE QUOTATION MARK (U+2019) and a RIGHT DOUBLE QUOTATION MARK (U+201D) in the string. In UTF-8 each of these two characters is encoded as three bytes, and when those bytes are read as Win-1252, each byte is interpreted as a separate Win-1252 character, so every quotation mark turns into three characters. The third byte of U+201D (0x9D) has no Win-1252 character assigned to it, which is most likely why that one ends in "?".
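The effect described above can be reproduced in a few lines of Python. This is only an illustration of the mechanism, not a statement about where in the FME/Office365 chain the wrong decoding happens. The variable names are made up for the example, and `errors="replace"` is an assumption chosen to mimic whatever step turned the unmapped byte into a placeholder.

```python
# The sentence as it was typed, with the two "smart" quotation marks.
original = "the two 2\u201d electric conduit lines. I\u2019ll upload"

# Correct on-the-wire form: each smart quote becomes three UTF-8 bytes.
utf8_bytes = original.encode("utf-8")
double_quote_bytes = "\u201d".encode("utf-8")  # b'\xe2\x80\x9d'
single_quote_bytes = "\u2019".encode("utf-8")  # b'\xe2\x80\x99'

# Wrong step: the same bytes read as Windows-1252, one character per byte.
# Byte 0x9D has no Windows-1252 mapping, so it becomes U+FFFD here
# (assumption: some other systems show "?" instead).
garbled = utf8_bytes.decode("cp1252", errors="replace")

print(garbled)
```

Each quotation mark comes out as three characters, and the double quote loses its last byte, which matches the pattern in the e-mail body.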

This is an interesting discussion that sheds a lot of light on the problem: https://stackoverflow.com/questions/2477452/%C3%A2%E2%82%AC-showing-on-page-instead-of

And these are the two problem-causing characters in the string: http://www.fileformat.info/info/unicode/char/2019/index.htm and https://www.fileformat.info/info/unicode/char/201d/index.htm

I am still not sure whether it is an FME problem or an Outlook problem. If it is an Outlook problem, you will need to use the Python fix suggested in the discussion linked above. If the problem is caused by FME, we will get it fixed. I am going to ping our FME Server experts and ask them to investigate the problem.


Thank you for the useful info @LenaAtSafe.

I've actually gone through the Stack Overflow link before. The issue is that we have 60+ users sending e-mail to this address, and having them force their encoding when they send e-mail isn't a possibility. I have also been using Python to try to fix the encoding, but I haven't gotten it to 100%, as it will occasionally show some invalid characters.

 

I'll keep at it and see what else I can find. I was just curious as to why the encoding was misinterpreted as it comes into FME Server from polling the inbox.
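For anyone attempting the same Python cleanup, here is a minimal sketch of one way to do it, assuming the damage is always "UTF-8 bytes read as Windows-1252". The function name `fix_mojibake`, the helper names, and the `LOSSY_GUESSES` table are all hypothetical and not part of FME. The occasional invalid characters are probably the sequences where a byte was lost (such as the 0x9D in the double quote), since those cannot be round-tripped; the sketch handles them with an explicit guess table.

```python
import re


def _cp1252_chars(lo, hi):
    """Characters that Windows-1252 assigns to the bytes lo..hi (unmapped bytes skipped)."""
    chars = []
    for b in range(lo, hi + 1):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            pass
    return "".join(chars)


# A UTF-8 lead byte (0xC2-0xF4) followed by 1-3 continuation bytes (0x80-0xBF),
# as they look after being misread as Windows-1252.
_RUN = re.compile(
    "[" + re.escape(_cp1252_chars(0xC2, 0xF4)) + "]"
    "[" + re.escape(_cp1252_chars(0x80, 0xBF)) + "]{1,3}"
)

# Sequences whose last byte was lost. Mapping them to U+201D is a guess
# (assumption): other characters share the same first two bytes.
LOSSY_GUESSES = {
    "\u00e2\u20ac?": "\u201d",
    "\u00e2\u20ac\ufffd": "\u201d",
}


def _repair(match):
    run = match.group(0)
    # Try the longest prefix first; leave the run alone if nothing decodes.
    for n in range(len(run), 1, -1):
        try:
            return run[:n].encode("cp1252").decode("utf-8") + run[n:]
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return run


def fix_mojibake(text):
    """Undo UTF-8-read-as-cp1252 damage run by run; clean text passes through."""
    text = _RUN.sub(_repair, text)
    for bad, guess in LOSSY_GUESSES.items():
        text = text.replace(bad, guess)
    return text


sample = "the two 2\u00e2\u20ac? electric conduit lines. I\u00e2\u20ac\u2122ll upload"
print(fix_mojibake(sample))
```

Repairing run by run, rather than re-encoding the whole body at once, means one unrecoverable sequence does not block the rest of the message. The third-party ftfy package targets the same class of problem more thoroughly, if installing a library on the FME Server machine is an option.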


I hope we will find the root of the problem and be able to fix it (unless it is something done by Outlook). Users should not need to take extra steps or change their setup, and fixing the problem with a script is a patch, which I hope won't be needed in the long run.

Our Server experts are looking into this issue. I apologize for the inconvenience it is causing at the moment.


Thank you for looking into this for me! In the meantime I will use Python to clean it up as best I can.

