
ListKeyValuePairExtractor - encoding error


Badge

Hello,

I'm facing a new issue with the ListKeyValuePairExtractor component: for some data, it encounters an encoding error:

Python Exception <UnicodeDecodeError>: 'ascii' codec can't decode byte 0xa0 in position 12: ordinal not in range(128)
Traceback (most recent call last):
  File "<string>", line 30, in input
  File "<string>", line 118, in get_list_attribute_names
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 12: ordinal not in range(128)
Error encountered while calling method `input'
f_60(PythonFactory): PythonFactory failed to process feature
A fatal error has occurred. Check the logfile above for details
Error encountered while calling method `input'
f_42(PythonFactory): PythonFactory failed to process feature
A fatal error has occurred. Check the logfile above for details

I've tried a lot of things: encoding all attributes as UTF-8 or windows-1252, and using a StringReplacer with this regex: ([^\w\S"]{0,2}$). But I still get the same issue: the job stops at the ListKeyValuePairExtractor.

Any idea how to fix this, or how to ignore these errors and keep processing everything else?

Kind regards,

Nicolas



Badge +3
I think the solution lies in the characters that could not be decoded. Is it possible to find the specific characters?
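One way to find them: a small standalone Python 3 sketch that scans raw bytes (for example, an attribute name or value copied out to a file) and reports every byte the 'ascii' codec would reject. The function name `find_non_ascii` and the sample bytes are made up for illustration; this is not part of FME.

```python
from typing import List, Tuple


def find_non_ascii(data: bytes) -> List[Tuple[int, str]]:
    """Return (position, hex value) for every byte outside the ASCII range."""
    # Iterating over a bytes object yields ints in Python 3
    return [(pos, "0x%02x" % byte) for pos, byte in enumerate(data) if byte > 0x7F]


# Hypothetical sample whose byte at position 12 is 0xa0, as in the FME log
sample = b"hello world \xa0x"
print(find_non_ascii(sample))  # [(12, '0xa0')]
```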

Userlevel 4

Hopefully @sander can help us out; he seems to be the developer :-)

Userlevel 4

A small thing to test: modify line 113 of the PythonCaller to read:

prefix, suffix = pattern.decode('cp1252')

You may have to play around with the code page number.

Badge

Thanks for your reply, @david_r, but I'm not an expert on FME (I'm self-trained), so how can I modify the Python code of that transformer?

Badge
Because of the error code "0xa0", it seems to be a whitespace character: in cp1252 (and Latin-1), the byte 0xA0 is the non-breaking space.
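A quick Python 3 check of that guess: the sketch below reproduces the error message from the log and shows the same byte decoding cleanly as U+00A0 NO-BREAK SPACE once the cp1252 code page is used. The sample bytes are hypothetical.

```python
import unicodedata

raw = b"hello world \xa0x"  # hypothetical name containing byte 0xa0

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # Same wording as in the FME log
    print(exc)  # 'ascii' codec can't decode byte 0xa0 in position 12: ordinal not in range(128)

text = raw.decode("cp1252")
print(unicodedata.name(text[12]))  # NO-BREAK SPACE
```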

Userlevel 4

You need to right-click on the custom transformer and select Edit.

The custom transformer will open in a new instance of FME Workbench, where you can edit the PythonCaller. When saving the custom transformer, it's easiest not to change the transformer version number (you will be asked before saving).

Badge

OK, it's done, and now here are the errors:

Python Exception <AttributeError>: 'tuple' object has no attribute 'decode'
Traceback (most recent call last):
  File "<string>", line 30, in input
  File "<string>", line 113, in get_list_attribute_names
AttributeError: 'tuple' object has no attribute 'decode'
Error encountered while calling method `input'
f_42(PythonFactory): PythonFactory failed to process feature
A fatal error has occurred. Check the logfile above for details
A fatal error has occurred. Check the logfile above for details

Userlevel 4

Remote debugging is hard :-)

Try the following instead:

prefix, suffix = pattern
prefix = prefix.decode('cp1252')
suffix = suffix.decode('cp1252')
Badge

Remote debugging is hard :-)

Try the following instead:

prefix, suffix = pattern
prefix = prefix.decode('cp1252')
suffix = suffix.decode('cp1252')

 

Yes, and I'm very sorry about that. How could I help?

There's a new error now:

Traceback (most recent call last):
  File "<string>", line 30, in input
  File "<string>", line 120, in get_list_attribute_names
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 12: ordinal not in range(128)
Error encountered while calling method `input'
f_56(PythonFactory): PythonFactory failed to process feature
A fatal error has occurred. Check the logfile above for details
Error encountered while calling method `input'

I tried to decode key_attr, but it didn't work:

key_attr = feature.getAttribute('__key_attr'.decode('cp1252'))
Badge +4

If you are the author of the code, I would recommend switching the Python interpreter to 3.6+. It uses Unicode strings by default.

Badge +4

If for some reason you prefer to use Python 2.7, take a look at http://python-future.org/imports.html

You can add:

from __future__ import unicode_literals

And if you do that, you could go a step further with:

from __future__ import absolute_import, division, print_function, unicode_literals

which makes the code more compatible with Python 3.
Badge

OK, I found a solution: add this at line 120: attr = attr.decode('cp1252')

It seems to work. I'll run a complete test before closing this post.

Userlevel 4
Great to hear that you found a solution.

For reference, you could probably have done this as well:

key_attr = feature.getAttribute('__key_attr').decode('cp1252')
Notice how it's the result from getAttribute() that you need to convert to unicode, not the attribute name.
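The same idea as a Python 3 sketch: decode the value that comes back, and leave the attribute name alone. `FakeFeature` is a hypothetical stand-in for an FME feature that hands back raw cp1252 bytes the way Python 2 did, and `to_text` is a made-up helper; with a real feature under Python 3, `getAttribute()` is expected to return `str` already, in which case the helper just passes it through.

```python
from typing import Dict, Union


class FakeFeature:
    """Hypothetical stand-in for an FME feature that stores raw bytes."""

    def __init__(self, attributes: Dict[str, bytes]) -> None:
        self._attributes = attributes

    def getAttribute(self, name: str) -> bytes:
        return self._attributes[name]


def to_text(value: Union[bytes, str], encoding: str = "cp1252") -> str:
    """Decode bytes with the given code page; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


feature = FakeFeature({"__key_attr": b"caf\xe9\xa0name"})
# Decode the *result* of getAttribute(), not the attribute name
key_attr = to_text(feature.getAttribute("__key_attr"))
print(ascii(key_attr))  # 'caf\xe9\xa0name'
```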
Userlevel 4

If you are the author of the code, I would recommend switching the Python interpreter to 3.6+. It uses Unicode strings by default.

I suspect that the custom transformer in question hasn't been tested with Python 3.x yet, and since the Python code is rather substantial, I'm not sure how much work that would be.

Hopefully @sander can help us out with a more permanent solution.

Otherwise I agree completely.
Badge +4
Yes, I can see it's already written to be compatible with both. Then he should just make sure his Python environment is set to 3.x.

Userlevel 4
Yes, I can see it's already written to be compatible with both. Then he should just make sure his Python environment is set to 3.x.

Good catch!
Badge
OK, it seems the job doesn't keep my changes when I run it on a lot of lines; it uses the original transformer instead. How can I make it use the new transformer with the modifications?

Userlevel 4
If possible, try simply skipping the modifications to the custom transformer and instead configure your main workspace to use Python 3.x (Workspace Parameters > Scripting > Python Compatibility).

Badge

(Thanks for tagging me, @david_r!)

Hi @nmeriotdev/Nicolas,

Sorry to hear about the trouble you are experiencing with "my" ListKeyValuePairExtractor!

I did test it on some special characters/encodings, but with this stuff, it's hard to catch all cases...

Would you mind sending me (a part of) your workspace and some test data, so I can hopefully reproduce the issue and improve/fix the ListKeyValuePairExtractor? As I'm quite busy at the moment, I might need 1-2 weeks or so...

You could try David's suggestions, but I doubt they will work. The decode() function converts a (byte) string into a unicode object, but the prefix and suffix objects are probably of type unicode already. Instead, I expect that the attr object needs to be decoded into a unicode object, so that startswith() and endswith() are called on objects of the same type.

The first thing we could try is to switch FME from Python 2.7 to 3.x (Workspace Parameters > Scripting > Python Compatibility), if you have 3.x installed and have the option to use it, that is. The advantage of Python 3.x is that all text strings (str) are Unicode, so there is no separate unicode type and no implicit ASCII decoding when strings are compared.
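A quick Python 3 illustration of that point: a `str` containing a non-ASCII character such as U+00A0 can be checked with `startswith()`/`endswith()` without any codec being involved, while accidentally mixing `bytes` and `str` fails loudly with a `TypeError` instead of a hidden ASCII decode. The attribute name is hypothetical.

```python
name = "_list{0}.key\u00a0"  # hypothetical attribute name with a trailing NBSP

# str vs str: no implicit decoding, so no UnicodeDecodeError
print(name.startswith("_list{"))  # True

# bytes vs str: Python 3 refuses to guess an encoding
try:
    name.encode("cp1252").startswith("_list{")
except TypeError as exc:
    print("TypeError:", exc)
```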

If 3.x doesn't work for you, you could change the code as follows:

def get_list_attribute_names(self, feature, pattern):
    prefix, suffix = pattern
    attributes = feature.getAllAttributeNames()     # get all attribute names
    result_list = []
    for attr in attributes:
        # filter out the attributes we need
        attr = attr.decode('cp1252')
        if attr.startswith(prefix) and attr.endswith(suffix):
            index = self.regex1.findall(attr)[-1]   # gets the last number (= list index)
            result_list.append((index, attr))
    return sorted(result_list)

As David already mentioned, you might need to play with the exact encoding here. 

What does FME say in the log file around line 40? For instance, on my machine, it says:

Operating System: Microsoft Windows 10 64-bit  (Build 16299)
FME Platform: WIN32
Locale: en_GB
Code Page: 1252  (ANSI - Latin I)

Alternatively, you can also change this line (from the code example above):

attr = attr.decode('cp1252')

into:

attr = attr.decode('cp1252', errors='ignore')

This simply tells Python to ignore any bytes it cannot decode. It's actually a bad solution (those characters are silently dropped), but at least your workspace will no longer crash here.
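The effect of the `errors` argument can be seen in a short Python 3 sketch with invented input bytes. Note that with cp1252 the byte 0xA0 itself decodes fine, so the sketch uses the ascii codec to make the difference visible: `'ignore'` drops the byte without a trace, whereas `'replace'` at least leaves a U+FFFD marker.

```python
raw = b"abc\xa0def"  # hypothetical input with one non-ASCII byte

print(ascii(raw.decode("ascii", errors="ignore")))   # 'abcdef' (byte silently dropped)
print(ascii(raw.decode("ascii", errors="replace")))  # 'abc\ufffddef' (marker kept)
print(ascii(raw.decode("cp1252")))                   # 'abc\xa0def' (decoded correctly)
```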

If this also doesn't work (i.e. the workspace still crashes), you could replace the get_list_attribute_names() function with this piece of code:

def get_list_attribute_names(self, feature, pattern):
    prefix, suffix = pattern
    attributes = feature.getAllAttributeNames()     # get all attribute names
    result_list = []
    for attr in attributes:
        # filter out the attributes we need
        try:
            if attr.startswith(prefix) and attr.endswith(suffix):
                index = self.regex1.findall(attr)[-1]   # gets the last number (= list index)
                result_list.append((index, attr))
        except UnicodeError:
            # skip attribute names that cannot be compared (e.g. undecodable bytes)
            pass
    return sorted(result_list)

This is just a temporary solution though. I would love to properly fix the ListKeyValuePairExtractor...
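For anyone running the workspace under Python 3, the filtering logic of the function above can be sketched as a plain function over a list of attribute names. This is not the transformer's actual code: the `\{(\d+)\}` pattern is a hypothetical stand-in for `self.regex1` (its real definition isn't shown in this thread), and the sample names are made up. Because every name is already a `str`, no decoding and no `try/except UnicodeError` is needed.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical stand-in for self.regex1: captures the number inside each {...}
LIST_INDEX = re.compile(r"\{(\d+)\}")


def get_list_attribute_names(names: Iterable[str], prefix: str,
                             suffix: str) -> List[Tuple[str, str]]:
    result_list = []
    for attr in names:
        # Keep only attributes that belong to the requested list
        if attr.startswith(prefix) and attr.endswith(suffix):
            indices = LIST_INDEX.findall(attr)
            if indices:  # guard against names without a {number}
                # The last number is the list index (relevant for nested lists)
                result_list.append((indices[-1], attr))
    # Note: indices are strings here, as in the original, so '10' sorts before '2'
    return sorted(result_list)


names = ["_list{1}.key", "_list{0}.key", "caf\u00e9\u00a0name", "_list{0}.value"]
print(get_list_attribute_names(names, "_list{", ".key"))
# [('0', '_list{0}.key'), ('1', '_list{1}.key')]
```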

I hope you are able to share your (partial) workspace + data here.

Badge +4

I'm not sure if this gives the same result, but I gave it a try to make the whole thing a little bit shorter.

It will be easier to investigate if the result is close.

listkeyvalueextractor.fmx

Badge
Good catch!
That is exactly what I said in my response, but the answer I wrote (as always) was quite lengthy ;)

Badge

This version might work for you, @nmeriotdev. However, it only works for basic lists (nice, consecutive items), and it also doesn't work correctly for nested lists (lists inside lists). Hence the lengthy code...

Please note that the transformer's input fields might not work (anymore) as intended. To make them work, you actually need to hack the parameters section in the FMX file.
Badge

Has anyone had any luck fixing this? We've had a similar issue and are hoping @sander might be able to help resolve it.

The transformer does what we want and gets some features through, but it causes the workspace to fail.

Here's an extract of the log file with the pertinent errors: 

2019-12-31 13:57:24|   2.6|  0.1|WARN  |Python Exception <UnboundLocalError>: local variable 'error' referenced before assignment
2019-12-31 13:57:24|   2.6|  0.0|WARN  |Traceback (most recent call last):
  File "<string>", line 63, in input
UnboundLocalError: local variable 'error' referenced before assignment

2019-12-31 13:57:24|   2.6|  0.0|ERROR |Error encountered while calling method `input'
2019-12-31 13:57:24|   2.6|  0.0|FATAL |ListKeyValuePairExtractor_PythonCaller (PythonFactory): PythonFactory failed to process feature
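Line 63 of the PythonCaller isn't shown here, so the exact cause can only be guessed. This message usually means a variable (here `error`) is assigned on only some code paths, typically inside an `except` block, and is then read on a path where that assignment never ran. A generic Python 3 sketch of the pattern and the usual fix (bind the variable first); both function names are hypothetical:

```python
def parse_buggy(value: str):
    try:
        int(value)
    except ValueError as exc:
        error = str(exc)  # 'error' is only bound if this branch runs
    return error          # raises UnboundLocalError when int() succeeded


def parse_fixed(value: str):
    error = None          # bind 'error' on every code path first
    try:
        int(value)
    except ValueError as exc:
        error = str(exc)
    return error


print(parse_fixed("42"))  # None
```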
