Question

How to detect missing values dynamically using PythonCaller?

  • June 27, 2019
  • 11 replies
  • 128 views

dataninja

I am trying to create a custom tool to help me do QA better and faster.

I have no problem detecting columns with null or empty values, but I really struggle to do the same for <missing> values in PythonCaller.

Thanks for your help in advance.


11 replies

ebygomm
Influencer
  • June 27, 2019

Do you have a list of attribute names that you are looking for?


dataninja
  • Author
  • June 27, 2019

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()


jdh
Contributor
  • June 27, 2019

You would need a reference schema, i.e. a list of all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file or from a list attribute), you can then use

list(set(schema_list).difference(feature.getAllAttributeNames()))

to get the expected attributes that are missing from the feature.
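
A minimal sketch of that comparison, written as a plain function so it can be tried outside FME. The function name find_missing_attributes and the sample column names are illustrative assumptions; inside a PythonCaller you would pass feature.getAllAttributeNames() as the first argument.

```python
def find_missing_attributes(present_names, expected_names):
    """Return the expected attribute names that are absent, in schema order."""
    present = set(present_names)
    return [name for name in expected_names if name not in present]


# Hypothetical sample data; in a PythonCaller, the first argument would be
# feature.getAllAttributeNames() and schema_list your reference schema.
schema_list = ["Column 1", "Column 2", "Column 3"]
print(find_missing_attributes(["Column 1", "Column 2"], schema_list))
```

A list comprehension is used instead of a bare set difference only so that the result keeps the order of the schema, which tends to be easier to read in a QA report.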

 

 


dataninja
  • Author
  • June 27, 2019

Thanks, I will give that a try!


ebygomm
Influencer
  • June 27, 2019

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()

This will only give you the names of the attributes on the feature. You'll need a list of the attributes you are expecting to have to work out which ones are missing from the feature.


dataninja
  • Author
  • June 27, 2019

What is schema_list? I keep getting this error - Python Exception <NameError>: global name 'schema_list' is not defined.


jdh
Contributor
  • June 27, 2019

schema_list is a placeholder variable; you have to define it yourself.

In order to know which attributes are missing, you need to know which attributes should be present.

That list can be hardcoded into the workspace or, preferably, come from an external file.
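
As a rough sketch of the external-file option, the function below reads the expected attribute names from a text file. The file layout (UTF-8, one attribute name per line) and the name load_schema are assumptions made for illustration, not something the thread specifies.

```python
import io


def load_schema(path):
    """Read expected attribute names from a text file, one name per line.

    Blank lines are skipped and surrounding whitespace is trimmed.
    """
    with io.open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

The result can then be assigned to schema_list, for example once in the PythonCaller's __init__ so that the file is not re-read for every feature.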

david_r
Celebrity
  • June 27, 2019

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.
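
A hedged sketch of how that method might be applied to a whole list of expected attributes. The helper name classify_attributes and the status labels are made up for this example; the only FME calls used are getAttributeNullMissingAndType() and getAttribute() on the feature.

```python
def classify_attributes(feature, expected_names):
    """Map each expected attribute name to 'missing', 'null', 'empty' or 'ok'."""
    status = {}
    for name in expected_names:
        # The method returns a (null, missing, type) tuple for the attribute.
        is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(name)
        if is_missing:
            status[name] = "missing"
        elif is_null:
            status[name] = "null"
        elif feature.getAttribute(name) == "":
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status
```

Note that expected_names has to be supplied from outside the feature; the function never asks the feature which attributes it has.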

 


dataninja
  • Author
  • June 27, 2019

Hi David, the function works great when I do one attribute at a time:

[screenshot]

But it doesn't seem to work properly when I do it through a for loop:

[screenshot]

Do you see anything wrong with my for loop?


jdh
Contributor
  • June 27, 2019

getAllAttributeNames() only returns attributes that are present on the feature. If 'Column 3' isn't present, it will never be checked in your loop.

That is why you need a list of all the attributes you want to validate.
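
To illustrate the point, here is a sketch of a PythonCaller class that loops over a fixed list of expected attributes instead of getAllAttributeNames(). The EXPECTED_ATTRIBUTES list and the output attribute name _missing_attributes are hypothetical; self.pyoutput() is assumed to be provided by FME at runtime, as in the standard PythonCaller template.

```python
# Hypothetical reference schema; replace with your own list or load it from a file.
EXPECTED_ATTRIBUTES = ["Column 1", "Column 2", "Column 3"]


class FeatureProcessor(object):
    def __init__(self):
        pass

    def input(self, feature):
        # Loop over the names we expect, not the names the feature happens to have.
        missing = [
            name
            for name in EXPECTED_ATTRIBUTES
            if feature.getAttributeNullMissingAndType(name)[1]  # second item: missing flag
        ]
        # Store the result as a comma-separated string on the feature.
        feature.setAttribute("_missing_attributes", ",".join(missing))
        self.pyoutput(feature)

    def close(self):
        pass
```

For the new attribute to be usable downstream, it would likely also need to be listed under the PythonCaller's "Attributes to Expose" parameter.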

dataninja
  • Author
  • June 27, 2019

Oh I see, is there another solution to get all attribute names to use for the function then?