Question

How to detect missing values dynamically using PythonCaller?

Forum|Forum|6 years ago
June 27, 2019
11 replies
149 views

dataninja

I am trying to create a custom tool to help me do QA better and faster.

I have no problem detecting columns with null or empty values, but I really struggle to do the same with <missing> values in PythonCaller.

Thanks for your help in advance.


ebygomm
Influencer

Do you have a list of attribute names that you are looking for?


dataninja
Author

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()


jdh
Contributor

You would need a reference schema, i.e. a list of all the attributes that should be present.

If you bring that into the PythonCaller as a list (either by reading an external file or from a list attribute), you can then use

list(set(schema_list).difference(feature.getAllAttributeNames()))

to get the expected attributes that are not on the feature.
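As a rough sketch of the comparison against a reference schema: the direction of the set difference matters. Expected minus present gives the missing attributes, while present minus expected gives attributes you did not expect. The helper name compare_to_schema and the sample column names are made up for illustration; inside a PythonCaller you would pass feature.getAllAttributeNames() as the first argument.

```python
def compare_to_schema(present_names, schema_list):
    """Compare the attribute names on a feature with an expected schema.

    Returns (missing, unexpected), both as sorted lists.
    """
    present = set(present_names)
    expected = set(schema_list)
    missing = sorted(expected - present)      # expected but not on the feature
    unexpected = sorted(present - expected)   # on the feature but not expected
    return missing, unexpected


# Stand-in for feature.getAllAttributeNames(); hypothetical sample data.
present_names = ["Column 1", "Column 2", "Extra"]
schema_list = ["Column 1", "Column 2", "Column 3"]

missing, unexpected = compare_to_schema(present_names, schema_list)
print("missing:", missing)
print("unexpected:", unexpected)
```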


dataninja
Author

Thanks, I will give that a try!


ebygomm
Influencer

Yes, I use the following function to do so.

Attribute_Names = feature.getAllAttributeNames()

This will only give you the names of the attributes on the feature. You'll need a list of the attributes you are expecting to have to work out which ones are missing from the feature.


dataninja
Author

What is schema_list? I keep getting this error - Python Exception <NameError>: global name 'schema_list' is not defined.


jdh
Contributor

schema_list is a placeholder variable.

In order to know what attributes are missing, you need to know what attributes should be present.

That can be hardcoded into the workspace, or preferably come from an external file.
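For the external-file route, one possible sketch is below. It assumes a plain text file with one expected attribute name per line; that file layout and the function name load_schema are assumptions for illustration, not anything FME prescribes.

```python
from pathlib import Path


def load_schema(path):
    """Read expected attribute names from a text file, one name per line.

    Blank lines are skipped and surrounding whitespace is stripped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

In a PythonCaller you could call this once (for example in the class constructor) and keep the result as schema_list, rather than re-reading the file for every feature.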


david_r

You can also use the FMEFeature.getAttributeNullMissingAndType() method:

isnull, ismissing, attr_type = feature.getAttributeNullMissingAndType('my_attr')

This will let you know the distinction between a missing attribute and a null value.
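A small sketch of how the returned tuple could be turned into a label. StubFeature is a hypothetical stand-in so the snippet runs outside FME; it only imitates the (isNull, isMissing, type) return shape, and the type code 0 is a placeholder. In a PythonCaller you would pass the real feature instead.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature, for running outside FME."""

    def __init__(self, attributes):
        # In this stub, a value of None marks a null attribute.
        self._attributes = dict(attributes)

    def getAttributeNullMissingAndType(self, name):
        if name not in self._attributes:
            return (False, True, 0)   # missing; 0 is a placeholder type code
        return (self._attributes[name] is None, False, 0)


def describe_attribute(feature, name):
    """Return 'missing', 'null', or 'present' for one attribute name."""
    is_null, is_missing, _attr_type = feature.getAttributeNullMissingAndType(name)
    if is_missing:
        return "missing"
    if is_null:
        return "null"
    return "present"


feature = StubFeature({"Column 1": "a", "Column 2": None})
for name in ("Column 1", "Column 2", "Column 3"):
    print(name, "->", describe_attribute(feature, name))
```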


dataninja
Author

Hi David, the function works great when I do one attribute at a time:

But it doesn't seem to work properly when I do it through a for loop:

Do you see anything wrong with my for loop?


jdh
Contributor

getAllAttributeNames() only returns attributes that are present on the feature. If 'Column 3' isn't present, it is never going to be checked in your loop.

That is why you need a list of all the attributes you want to validate.
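To illustrate the point, here is a sketch of a loop that iterates over the expected names rather than over the names found on the feature. The function name audit_feature, the report layout, and the fake_lookup stand-in are all assumptions for illustration; in a PythonCaller you would pass feature.getAttributeNullMissingAndType as the second argument.

```python
def audit_feature(expected_names, null_missing_and_type):
    """Classify every expected attribute as missing, null, or present.

    null_missing_and_type is any callable that takes an attribute name and
    returns an (is_null, is_missing, type) tuple.
    """
    report = {"missing": [], "null": [], "present": []}
    for name in expected_names:
        is_null, is_missing, _attr_type = null_missing_and_type(name)
        if is_missing:
            report["missing"].append(name)
        elif is_null:
            report["null"].append(name)
        else:
            report["present"].append(name)
    return report


# Hypothetical sample data standing in for a feature; None marks a null.
sample = {"Column 1": "a", "Column 2": None}


def fake_lookup(name):
    """Imitates the tuple shape only; 0 is a placeholder type code."""
    if name not in sample:
        return (False, True, 0)
    return (sample[name] is None, False, 0)


report = audit_feature(["Column 1", "Column 2", "Column 3"], fake_lookup)
print(report)
```

Because the loop is driven by the expected list, 'Column 3' gets checked even though it is not on the feature.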


dataninja
Author

Oh I see, is there another solution to get all attribute names to use for the function then?
