Question

I have a dataset that contains null and non-null values in a particular column. I need to route it so that if any null value exists, all features (null and non-null) go to port 1; otherwise, the non-null features go to port 2.

August 18, 2022

bhavyagandhi

I have a dataset that contains null and non-null values in a particular column. I need to route it as follows: if any null value exists in that column, all features (null and non-null) should go to port 1; otherwise, the non-null features should go to port 2. How can I do this?

david_r
August 19, 2022

I would say it depends on the location and size of your input data. If your data is in a queryable format (e.g. a SQL database), the fastest approach is normally to query the source table to check whether the selected column contains any null values. Something like:

1. Run a SQLExecutor with a count query, e.g. "select count(*) as null_counts from my_table where my_column is null". You'll have to manually expose the result attribute as "null_counts" or "NULL_COUNTS", depending on your DB.
2. Use a Tester to check whether "null_counts" > 0.
3. Attach two different FeatureReaders to the Tester outputs, one per case, and adapt their WHERE clauses accordingly.

If the SQLExecutor is slow, make sure "my_column" is indexed in the database.
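To make the logic of the three steps above concrete, here is a minimal sketch outside FME, using Python's standard-library sqlite3 module. It is not FME workspace code: the table "my_table", the column "my_column", the index name and the function route_by_null_check are placeholders, and the two returned lists simply stand in for the two output ports.

```python
import sqlite3


def route_by_null_check(conn):
    """Return (port_1, port_2) for the rows of my_table.

    If my_column holds at least one NULL, every row goes to port_1.
    Otherwise the (all non-null) rows go to port_2.
    """
    # Step 1: the count query the SQLExecutor would run.
    null_counts = conn.execute(
        "SELECT COUNT(*) AS null_counts FROM my_table WHERE my_column IS NULL"
    ).fetchone()[0]

    # Step 2: the Tester equivalent.
    if null_counts > 0:
        # Step 3a: first reader, no WHERE clause, so nulls are included.
        rows = conn.execute(
            "SELECT id, my_column FROM my_table ORDER BY id"
        ).fetchall()
        return rows, []

    # Step 3b: second reader, restricted to non-null values.
    rows = conn.execute(
        "SELECT id, my_column FROM my_table "
        "WHERE my_column IS NOT NULL ORDER BY id"
    ).fetchall()
    return [], rows


# Small in-memory database standing in for the real source (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_column TEXT)")
# Index on the tested column, as suggested for a slow count query.
conn.execute("CREATE INDEX idx_my_column ON my_table (my_column)")
conn.executemany(
    "INSERT INTO my_table (my_column) VALUES (?)",
    [("a",), (None,), ("b",)],
)

port_1, port_2 = route_by_null_check(conn)
```

With the sample rows, one value is null, so all three rows end up in port_1 and port_2 stays empty.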

If the source data isn't in a queryable format, you could use the InlineQuerier to build a temporary database and use the same query there. Or you could use e.g. the StatisticsCalculator to find all unique values for "my_column" and take it from there.
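For data that is not in a database, the same decision can be sketched in plain Python. This is only an illustration of the logic, not of how the InlineQuerier or StatisticsCalculator work internally: the records are assumed to be dictionaries, and split_on_any_null is a hypothetical helper name. Note that all records have to be read before any of them can be routed, since a null may appear at the very end.

```python
def split_on_any_null(records, column):
    """Return (port_1, port_2) for an iterable of dict records.

    port_1 receives every record if any value in `column` is None;
    otherwise port_2 receives every record (all of them non-null).
    """
    # Materialise the input: the decision needs the complete dataset.
    records = list(records)

    # A missing key is treated like a null value here (an assumption).
    has_null = any(rec.get(column) is None for rec in records)

    if has_null:
        return records, []
    return [], records


# Hypothetical sample data.
with_null = [{"my_column": "a"}, {"my_column": None}, {"my_column": "b"}]
without_null = [{"my_column": "a"}, {"my_column": "b"}]

mixed_port_1, mixed_port_2 = split_on_any_null(with_null, "my_column")
clean_port_1, clean_port_2 = split_on_any_null(without_null, "my_column")
```

Here the first dataset lands entirely in port 1, the second entirely in port 2.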

bhavyagandhi
August 29, 2022

Thanks for that, it helped.