Solved

JSONFragmenter UTF-8 Encoding Issue



I'm using a Text File reader to read in JSON so that the encoding can be set to UTF-8, but the JSONFragmenter connected directly to it still complains that there are invalid characters that are not UTF-8 compliant. These characters (&-@$!~()) are UTF-8 compliant. Does anyone have a workaround for this issue?


Best answer by hollyatsafe 28 March 2019, 00:56


Can you provide a sample? I can't reproduce the problem.


Are you 100% certain that the file is in fact encoded as utf-8?

If you can open the file in e.g. Notepad++, check the Encoding menu to verify the encoding: it should show UTF-8 as the selected encoding.

Without changing anything in the Encoding menu, verify that the special characters display properly.

If either of those two checks fails, the input file has an encoding problem.
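If Notepad++ isn't available, the same check can be scripted. The sketch below is one way to do it, not an FME feature: it reads the file as raw bytes, tries a strict UTF-8 decode, and reports the byte offset of the first invalid sequence. The function name and the file name `outages.json` are hypothetical.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first sequence that is not valid UTF-8,
    or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")  # strict mode raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start
    return None

# Usage (hypothetical file name): open in binary mode so nothing is decoded
# before the check runs.
# with open("outages.json", "rb") as f:
#     print(first_invalid_utf8_offset(f.read()))

# "é" is two bytes in UTF-8 but a single byte (0xE9) in windows-1252.
print(first_invalid_utf8_offset('{"name": "caf\u00e9"}'.encode("utf-8")))   # None
print(first_invalid_utf8_offset('{"name": "caf\u00e9"}'.encode("cp1252")))  # 13
```

A `None` result means the bytes are valid UTF-8. An offset points at the byte to inspect in a hex view.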


JDH - Thanks for your response. This is the public link being used: https://poweroutage.us/api/getcountyoutageinfo/ATT928hoi71!789$k2F

What are your settings in the JSONFragmenter?


Here are the settings being used

I cannot recreate the error.


These are the errors reported out from FME Server:

Text File Reader: Renamed attribute 'ATT928hoi71!789$k2F' to 'ATT928hoi71_789_k2F' to remove the following invalid characters '&-@$!~()'
JSONFragmenter(JSONQueryFactory): A sequence of bytes was found which is invalid in the utf-8 encoding.

Hi @lynn_bryant,

For the Text File Reader, this is a warning rather than an error, so with the link provided you should be able to continue the translation without any problems. I can reproduce the warning in FME Desktop, and it is complaining about the file name rather than the data itself, so I do not believe it should have any impact on your workflow. You can confirm this by first saving the data to a text file whose name contains none of those characters; the warning then no longer appears in the log. That said, I do believe this is a bug, because those characters are valid in UTF-8, so I have filed FMEENGINE-59626 to get it corrected.
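To illustrate why the flagged characters are not a UTF-8 problem, the sketch below checks that each of `&-@$!~()` is plain ASCII, so it encodes to a single byte in UTF-8 and round-trips unchanged. The `rename_like_log` helper is hypothetical: it only reproduces the rename shown in the log (each flagged character replaced by `_`) and is inferred from that one log line, not taken from FME's actual code.

```python
FLAGGED = "&-@$!~()"  # characters listed in the Text File Reader warning

# Every flagged character is ASCII: its UTF-8 encoding is one byte below 0x80,
# and decoding that byte gives the same character back.
for ch in FLAGGED:
    encoded = ch.encode("utf-8")
    assert len(encoded) == 1 and encoded[0] < 0x80
    assert encoded.decode("utf-8") == ch

def rename_like_log(name: str) -> str:
    """Hypothetical helper: mimic the rename seen in the log by replacing
    each flagged character with an underscore."""
    return "".join("_" if ch in FLAGGED else ch for ch in name)

print(rename_like_log("ATT928hoi71!789$k2F"))  # ATT928hoi71_789_k2F
```

Because the rename targets the attribute derived from the URL's last path segment, this is consistent with the warning being about the file name rather than the JSON content.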

Regarding the JSONFragmenter warning: the HTTP response is not actually UTF-8 but windows-1252, so although most characters are identical in both encodings, some may be decoded incorrectly. The JSONFragmenter should be able to handle this, so I have filed FMEENGINE-59627. In the meantime, the workaround to remove this warning from the log is to use the HTTPCaller to download the response to an attribute, since it should handle the encoding correctly (this also removes the need for the Text File Reader that was causing the other warning).
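Outside FME, the same idea (decode with the encoding the server really uses, then parse) can be sketched in a few lines of Python. This is an illustration of the fallback approach, not how the HTTPCaller works internally. The function name and sample record are made up, and falling back to windows-1252 (`cp1252`) is an assumption based on the diagnosis in this thread.

```python
import json

def parse_json_bytes(raw: bytes, declared_charset: str = "utf-8"):
    """Decode raw response bytes, falling back to windows-1252 if they are not
    valid in the declared charset, then parse the text as JSON."""
    try:
        text = raw.decode(declared_charset)
    except UnicodeDecodeError:
        # Assumption: the server actually sends windows-1252 (cp1252).
        # Note: Python's cp1252 codec still raises on the five undefined
        # bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D.
        text = raw.decode("cp1252")
    return json.loads(text)

# Made-up sample: ASCII bytes are identical in both encodings, but "é" is the
# single byte 0xE9 in windows-1252, which is invalid on its own in UTF-8.
sample = '{"name": "caf\u00e9", "count": 12}'.encode("cp1252")
print(parse_json_bytes(sample))  # {'name': 'café', 'count': 12}
```

Trying UTF-8 first keeps correctly encoded responses untouched and only reinterprets the bytes when strict decoding fails.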


EDIT: Both these issues have been fixed for 2019.1.


Thanks Holly!


Thanks David. Looks like I found a bug!
