Solved

JSONFragmenter UTF-8 Encoding Issue


lynn_bryant

I'm using a Text File reader to read in JSON so that I can set the encoding to UTF-8, but the JSONFragmenter connected directly to it still complains that there are invalid characters that are not UTF-8 compliant. These characters (&-@$!~()) are UTF-8 compliant. Does anyone have a workaround for this issue?

Best answer by hollyatsafe


10 replies

jdh
Contributor
  • March 26, 2019

Can you provide a sample? I can't reproduce the problem.


david_r
Evangelist
  • March 27, 2019

Are you 100% certain that the file is in fact encoded as UTF-8?

If you can open the file in e.g. Notepad++, you can check the Encoding menu to verify the encoding; it should indicate UTF-8.

Without changing anything in the Encoding menu, verify that the special characters are displayed properly.

If either of those two checks fails, there's an encoding problem in the input file.
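The same check can be scripted. Below is a minimal Python sketch, assuming you have the raw file on disk; the file name `outage.json` and the helper `check_utf8` are hypothetical. It attempts a strict UTF-8 decode and, on failure, reports the byte offset of the first invalid sequence.

```python
from pathlib import Path
from typing import Optional


def check_utf8(data: bytes) -> Optional[str]:
    """Return None if `data` decodes as strict UTF-8, else a short description
    of the first invalid byte sequence."""
    try:
        data.decode("utf-8")  # strict by default: raises on any invalid sequence
    except UnicodeDecodeError as err:
        bad = data[err.start:err.end]
        return f"invalid UTF-8 at byte offset {err.start}: {bad!r}"
    return None


if __name__ == "__main__":
    # Hypothetical file name; point it at the JSON file you are reading into FME.
    path = Path("outage.json")
    if path.exists():
        print(check_utf8(path.read_bytes()) or "valid UTF-8")
    else:
        print(f"{path} not found")
```

A clean result only means the bytes are valid UTF-8. A windows-1252 file that happens to contain only ASCII characters will also pass, because ASCII bytes are identical in both encodings.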


lynn_bryant

JDH - Thanks for your response. This is the public link being used: https://poweroutage.us/api/getcountyoutageinfo/ATT928hoi71!789$k2F


ebygomm
Influencer
  • March 27, 2019
lynn_bryant wrote:

JDH - Thanks for your response. This is the public link being used: https://poweroutage.us/api/getcountyoutageinfo/ATT928hoi71!789$k2F

What are your settings in the JSONFragmenter?


lynn_bryant

Here are the settings being used


ebygomm
Influencer
  • March 27, 2019
lynn_bryant wrote:

Here are the settings being used

I cannot reproduce the error.


lynn_bryant

These are the errors reported by FME Server:

Text File Reader: Renamed attribute 'ATT928hoi71!789$k2F' to 'ATT928hoi71_789_k2F' to remove the following invalid characters '&-@$!~()'
JSONFragmenter(JSONQueryFactory): A sequence of bytes was found which is invalid in the utf-8 encoding.

hollyatsafe
  • Best Answer
  • March 27, 2019

Hi @lynn_bryant,

For the Text File Reader, this is a warning rather than an error, so with the link provided you should be able to continue the translation without any problems. I can reproduce the warning in FME Desktop, and it is complaining about the file name rather than the data itself, so I don't believe it will have any impact on your workflow. You can confirm this by first saving the data to a text file whose name doesn't contain any of those characters: the warning no longer appears in the log. That said, I do believe this is a bug, because those characters are valid in UTF-8, so I have filed FMEENGINE-59626 to get it corrected.
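The rename shown in the log can be mimicked with a short Python sketch. This is an assumption reconstructed only from the warning text (each of `&-@$!~()` in the name is replaced with `_`), not FME's actual implementation; `sanitize_attribute_name` and `is_ascii_utf8` are hypothetical helpers. The second function confirms the point about validity: these characters are plain ASCII, so each one is a single valid byte in UTF-8.

```python
import re

# Characters listed as "invalid" in the Text File Reader warning.
FLAGGED = "&-@$!~()"

# Hypothetical approximation of the rename seen in the log: each flagged
# character in an attribute name is replaced with an underscore.
_FLAGGED_RE = re.compile("[" + re.escape(FLAGGED) + "]")


def sanitize_attribute_name(name: str) -> str:
    """Replace every flagged character with '_'."""
    return _FLAGGED_RE.sub("_", name)


def is_ascii_utf8(chars: str) -> bool:
    """True if every character encodes to a single byte below 0x80 in UTF-8."""
    encoded = chars.encode("utf-8")
    return len(encoded) == len(chars) and all(b < 0x80 for b in encoded)


if __name__ == "__main__":
    print(sanitize_attribute_name("ATT928hoi71!789$k2F"))  # ATT928hoi71_789_k2F
    print(is_ascii_utf8(FLAGGED))  # True
```

Because the flagged characters are ASCII, the warning reflects a naming rule for attributes derived from the file name rather than an encoding problem in the data.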

With regard to the JSONFragmenter warning, this is because the HTTP response is not actually UTF-8 but windows-1252, so although most characters will be the same, some might be decoded incorrectly. The JSONFragmenter should be able to handle this, so I have filed FMEENGINE-59627. In the meantime, the workaround to remove this warning from the log is to use the HTTPCaller to download the file to an attribute, since it should handle the encoding correctly (this also removes the need for the Text File Reader that was causing the other warning).
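The windows-1252 versus UTF-8 mismatch can be illustrated in Python. ASCII bytes are identical in both encodings, but characters such as `é` or the curly apostrophe `’` are encoded differently, so a strict UTF-8 decode of windows-1252 bytes fails. The sketch below decodes a body using the charset declared in a Content-Type header and falls back to windows-1252 if that fails. The header values, the fallback choice and the helper names are assumptions for illustration; this is not how HTTPCaller works internally.

```python
import json
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Extract the (lower-cased) charset parameter from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode with the declared charset; if the bytes are invalid in it,
    fall back to windows-1252 (an assumption made for this sketch)."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset)
    except UnicodeDecodeError:
        # A few byte values are undefined in cp1252, so replace rather than raise.
        return body.decode("cp1252", errors="replace")


if __name__ == "__main__":
    # Bytes as a server sending windows-1252 would produce them.
    raw = '{"name": "café", "note": "it’s"}'.encode("cp1252")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        print("strict UTF-8 decode fails:", err.reason)
    data = json.loads(decode_body(raw, "application/json; charset=windows-1252"))
    print(data["name"], data["note"])  # café it’s
```

Honouring the declared charset (or detecting the real one) before parsing is what removes the "invalid in the utf-8 encoding" complaint; the JSON parser itself never sees the raw bytes.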


EDIT: Both of these issues have been fixed in 2019.1.


lynn_bryant
Thanks Holly!


lynn_bryant
Thanks David. Looks like I found a bug!

