Question

How to detect missing values dynamically using PythonCaller?


Badge

I am trying to create a custom tool to help me do QA better and faster.

I have no problem detecting columns with null or empty values, but I really struggle to do the same with <missing> values in PythonCaller.

Thanks for your help in advance.


11 replies

Userlevel 1
Badge +21

Do you have a list of attribute names that you are looking for?

Badge

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()

Badge +22

You would need a reference schema, i.e. a list of all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file, or from a list attribute), you can then use

list(set(schema_list).difference(feature.getAllAttributeNames()))
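To make the comparison concrete, here is a minimal sketch in plain Python. The function name find_missing_attributes and the sample column names are hypothetical; inside a PythonCaller, present_names would come from feature.getAllAttributeNames() and schema_list from wherever you keep your reference schema.

```python
def find_missing_attributes(present_names, schema_list):
    """Return the names in schema_list that are not in present_names, in schema order."""
    present = set(present_names)
    return [name for name in schema_list if name not in present]


# Hypothetical reference schema and feature contents, for illustration only.
schema_list = ["Column 1", "Column 2", "Column 3"]
present_names = ["Column 1", "Column 2", "fme_feature_type"]

missing = find_missing_attributes(present_names, schema_list)
print(missing)  # ['Column 3']
```

A list comprehension is used instead of a bare set difference so the result keeps the order of the schema, which tends to be easier to read in a QA report.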

 

 

Badge

Thanks, I will give that a try!

Userlevel 1
Badge +21

This will only give you the names of the attributes on the feature. You'll need a list of the attributes you are expecting to have to work out which ones are missing from the feature.

Badge

What is schema_list? I keep getting this error - Python Exception <NameError>: global name 'schema_list' is not defined.

Badge +22

schema_list is a placeholder variable.

In order to know which attributes are missing, you need to know which attributes should be present.

That list can be hardcoded into the workspace or, preferably, come from an external file.
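
As a rough sketch of the external-file option, assuming a plain text file with one expected attribute name per line (the helper name load_schema_list and the file name in the comment are hypothetical, not part of FME):

```python
from pathlib import Path


def load_schema_list(path):
    """Read expected attribute names from a text file, one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Hypothetical usage inside the PythonCaller script:
# schema_list = load_schema_list("expected_attributes.txt")
```

Loading the list once, rather than once per feature, should be enough, since the reference schema does not change while the workspace runs.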
Userlevel 4

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.
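
A small sketch of how the returned flags might be interpreted for a single attribute. The helper name attribute_state is hypothetical; feature is assumed to be the FMEFeature passed to the PythonCaller.

```python
def attribute_state(feature, attr_name):
    """Classify one attribute as 'missing', 'null', or 'present'."""
    is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(attr_name)
    if is_missing:
        return "missing"
    if is_null:
        return "null"
    # Any other value, including an empty string, counts as present here.
    return "present"
```

Empty strings are deliberately not treated specially in this sketch; they would need their own check on the attribute value.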

 

Badge

Hi David, the function works great when I do one attribute at a time:

[Screenshot: 0684Q00000ArMXgQAN.png]

But it doesn't seem to work properly when I do it through a for loop:

[Screenshot: 0684Q00000ArMG1QAN.png]

Do you see anything wrong with my for loop?

Badge +22

getAllAttributeNames() only returns the attributes present on the feature. If 'Column 3' isn't present, it's never going to be checked in your loop.

That is why you need a list of all the attributes you want to validate.
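
For example, the loop could iterate over the expected names rather than over getAllAttributeNames(). This is only a sketch: check_expected_attributes and expected_names are hypothetical names, and feature is assumed to be the FMEFeature received by the PythonCaller.

```python
def check_expected_attributes(feature, expected_names):
    """Return (missing, null) lists of attribute names for one feature."""
    missing, null = [], []
    # Loop over the names we expect, not the names the feature happens to carry.
    for name in expected_names:
        is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(name)
        if is_missing:
            missing.append(name)
        elif is_null:
            null.append(name)
    return missing, null
```

Because every expected name is checked, a column like 'Column 3' that is absent from the feature still ends up in the missing list.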
Badge

Oh I see, is there another solution to get all attribute names to use for the function then?
