Question

How to detect missing values dynamically using PythonCaller?


dataninja

I am trying to create a custom tool to help me do QA better and faster.

I have no problem detecting columns with null or empty values, but I really struggle to do the same with <missing> values in PythonCaller.

Thanks for your help in advance.

11 replies

ebygomm
Influencer
  • June 27, 2019

Do you have a list of attribute names that you are looking for?


dataninja
  • Author
  • June 27, 2019
ebygomm wrote:

Do you have a list of attribute names that you are looking for?

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()


jdh
Contributor
  • June 27, 2019

You would need to have a reference schema, i.e. all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file or from a list attribute), you can then get the names that are missing from the feature with

list(set(schema_list).difference(feature.getAllAttributeNames()))
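
As a rough sketch of this comparison: the attributes missing from a feature are the expected names minus the names actually present on it. The names `find_missing_attributes`, `expected_names`, and the `StubFeature` stand-in are hypothetical and only there so the example runs outside FME; inside a PythonCaller the real feature object would be passed in instead of the stub.

```python
def find_missing_attributes(feature, expected_names):
    """Return the expected attribute names that are absent from the feature, sorted."""
    present = set(feature.getAllAttributeNames())
    return sorted(set(expected_names) - present)


class StubFeature:
    """Hypothetical stand-in for an FMEFeature so the sketch runs outside FME."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAllAttributeNames(self):
        return list(self._attributes)


# Assumed reference schema; in practice this comes from your own data model.
expected_names = ["Column 1", "Column 2", "Column 3"]
feature = StubFeature({"Column 1": "a", "Column 2": None})
missing = find_missing_attributes(feature, expected_names)
print(missing)
```

Swapping the two operands would instead list attributes that are on the feature but not in the schema, which can be a useful second check.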

 

 


dataninja
  • Author
  • June 27, 2019
jdh wrote:

You would need to have a reference schema, i.e. all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file or from a list attribute), you can then get the names that are missing from the feature with

list(set(schema_list).difference(feature.getAllAttributeNames()))

Thanks, I will give that a try!


ebygomm
Influencer
  • June 27, 2019
dataninja wrote:

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()

This will only give you the names of the attributes on the feature. You'll need a list of the attributes you are expecting to have to work out which ones are missing from the feature.


dataninja
  • Author
  • June 27, 2019
jdh wrote:

You would need to have a reference schema, i.e. all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file or from a list attribute), you can then get the names that are missing from the feature with

list(set(schema_list).difference(feature.getAllAttributeNames()))

What is schema_list? I keep getting this error - Python Exception <NameError>: global name 'schema_list' is not defined.


jdh
Contributor
  • June 27, 2019
dataninja wrote:

What is schema_list? I keep getting this error - Python Exception <NameError>: global name 'schema_list' is not defined.

schema_list is a placeholder variable.

In order to know what attributes are missing, you need to know what attributes should be present.

That can be hardcoded into the workspace, or preferably come from an external file.
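
One possible way to fill `schema_list` from an external file is sketched below. It assumes a plain text file with one attribute name per line and a Python 3 interpreter; the helper name `load_schema_list`, the file name `schema.txt`, and the `#` comment convention are illustrative choices, not something FME prescribes.

```python
import os
import tempfile


def load_schema_list(path):
    """Read one attribute name per line, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        names = [line.strip() for line in handle]
    return [name for name in names if name and not name.startswith("#")]


# Demonstration with a temporary file standing in for the real schema file.
with tempfile.TemporaryDirectory() as folder:
    schema_path = os.path.join(folder, "schema.txt")
    with open(schema_path, "w", encoding="utf-8") as handle:
        handle.write("# expected attributes\nColumn 1\nColumn 2\n\nColumn 3\n")
    schema_list = load_schema_list(schema_path)

print(schema_list)
```

Defining `schema_list` this way before it is used also avoids the NameError, which simply means the name was never assigned.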

david_r
Evangelist
  • June 27, 2019

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.
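
A minimal sketch of how the returned tuple might be turned into a readable status follows. The helper `describe_attribute` and the `StubFeature` class are hypothetical; the stub only imitates the two feature methods used here so the example can run without FME, and its use of None to represent a null is a simplification.

```python
def describe_attribute(feature, name):
    """Classify one attribute as 'missing', 'null', 'empty', or 'value'."""
    is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(name)
    if is_missing:
        return "missing"
    if is_null:
        return "null"
    if feature.getAttribute(name) == "":
        return "empty"
    return "value"


class StubFeature:
    """Hypothetical stand-in mimicking the two FMEFeature calls used above."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)  # None represents a null in this stub

    def getAttribute(self, name):
        return self._attributes.get(name)

    def getAttributeNullMissingAndType(self, name):
        is_missing = name not in self._attributes
        is_null = not is_missing and self._attributes[name] is None
        return is_null, is_missing, 0  # 0 is a placeholder type code


feature = StubFeature({"a": "x", "b": None, "c": ""})
statuses = {name: describe_attribute(feature, name) for name in ["a", "b", "c", "d"]}
print(statuses)
```

Checking the missing flag first keeps the three cases separate, since an attribute that is not on the feature has no value to compare against.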

 


dataninja
  • Author
  • June 27, 2019
david_r wrote:

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.

 

Hi David, the function works great when I do one attribute at a time:

[image: 0684Q00000ArMXgQAN.png]

But it doesn't seem to work properly when I do it through a for loop:

[image: 0684Q00000ArMG1QAN.png]

Do you see anything wrong with my for loop?


jdh
Contributor
  • June 27, 2019
dataninja wrote:

Hi David, the function works great when I do one attribute at a time:

 

But it doesn't seem to work properly when I do it through a for loop:

Do you see anything wrong with my for loop?

getAllAttributeNames() only returns the attributes present on the feature. If 'Column 3' isn't present, it will never be checked in your loop.

That is why you need a list of all the attributes you want to validate.
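
The loop could be reworked along these lines, iterating over the expected names instead of the names found on the feature. This is a sketch under assumptions: `check_expected_attributes`, `expected_names`, and `StubFeature` are hypothetical names, and the stub only imitates getAttributeNullMissingAndType() so the example runs outside FME.

```python
def check_expected_attributes(feature, expected_names):
    """Return (missing, null) name lists for the attributes we expect to see."""
    missing, null = [], []
    for name in expected_names:  # iterate the schema, not getAllAttributeNames()
        is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(name)
        if is_missing:
            missing.append(name)
        elif is_null:
            null.append(name)
    return missing, null


class StubFeature:
    """Hypothetical stand-in for an FMEFeature; None represents a null here."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAttributeNullMissingAndType(self, name):
        is_missing = name not in self._attributes
        is_null = not is_missing and self._attributes[name] is None
        return is_null, is_missing, 0  # 0 is a placeholder type code


expected_names = ["Column 1", "Column 2", "Column 3"]
feature = StubFeature({"Column 1": "a", "Column 2": None})
missing_names, null_names = check_expected_attributes(feature, expected_names)
print(missing_names, null_names)
```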

dataninja
  • Author
  • June 27, 2019
david_r wrote:

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.

 

Oh I see, is there another solution to get all attribute names to use for the function then?

