Question

I have a dataset with null and non-null values in a particular column. How do I route all features to one port if any null exists, and otherwise route the non-null features to a second port?


bhavyagandhi
Contributor
I have a dataset in which a particular column contains both null and non-null values. I need to segregate it as follows: if any null value exists in that column, all features (null and non-null) should go to port 1; otherwise, the features (which are then all non-null) should go to port 2. Can anyone help?

2 replies

david_r
Celebrity
  • August 19, 2022

I would say it depends on the location and size of your input data: if your data is in a queryable format (e.g. SQL), the fastest approach is normally to query the source table to see whether there are any null values in the selected column. Something like:

  • SQLExecutor (e.g. "select count(*) as null_counts from my_table where my_column is null", you'll have to manually expose "null_counts" or "NULL_COUNTS", depending on your DB)
  • Use a Tester to see if "null_counts" > 0
  • Attach two different FeatureReaders to the Tester outputs. Make sure to adapt the WHERE-clauses accordingly

If the SQLExecutor is slow, make sure "my_column" is indexed in the database.
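To make the logic of the steps above concrete, here is a minimal sketch outside FME, assuming an in-memory SQLite database as a stand-in for the real source. The names `my_table`, `my_column` and `null_counts` come from the example query; `route_rows`, `make_demo`, `port_1` and `port_2` are hypothetical names used only for illustration. It does not reproduce the SQLExecutor, Tester or FeatureReader themselves, only the decision they implement.

```python
import sqlite3


def make_demo(values):
    """Build a throwaway in-memory table (assumption: stand-in for the real source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_column TEXT)")
    conn.executemany(
        "INSERT INTO my_table (my_column) VALUES (?)", [(v,) for v in values]
    )
    return conn


def route_rows(conn, table="my_table", column="my_column"):
    """Return (port_name, rows) depending on whether the column holds any null.

    Table and column names cannot be bound as SQL parameters, so they are
    interpolated here; this sketch assumes they are trusted identifiers.
    """
    # Step 1: the counting query (the SQLExecutor step in the reply).
    null_counts = conn.execute(
        f"SELECT COUNT(*) AS null_counts FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]

    # Step 2: the test on null_counts > 0 (the Tester step).
    if null_counts > 0:
        # Step 3a: at least one null exists, so read everything for port 1.
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        return "port_1", rows

    # Step 3b: no nulls exist, so read the non-null rows for port 2.
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {column} IS NOT NULL"
    ).fetchall()
    return "port_2", rows
```

For example, `route_rows(make_demo(["a", None, "b"]))` selects `port_1` with all three rows, while `route_rows(make_demo(["a", "b"]))` selects `port_2`.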

 

If the source data isn't in a queryable format, you could use the InlineQuerier to build a temporary database and use the same query there. Or you could use e.g. the StatisticsCalculator to find all unique values for "my_column" and take it from there.
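For data that is not in a database, the same decision can be sketched directly on in-memory records. This is only an illustration in the spirit of the unique-values idea, not how the InlineQuerier or StatisticsCalculator work internally; `split_by_null_presence`, `port_1` and `port_2` are hypothetical names, and features are assumed to be plain dictionaries with hashable values in which a null is represented as `None`.

```python
def split_by_null_presence(features, column="my_column"):
    """Send all features to port_1 if any value in `column` is null, else to port_2.

    Assumption: each feature is a dict, and a missing key counts as null.
    """
    features = list(features)  # allow generators; we need two passes

    # Collect the unique values of the column, then check whether null is one of them.
    unique_values = {feature.get(column) for feature in features}

    if None in unique_values:
        return {"port_1": features, "port_2": []}
    return {"port_1": [], "port_2": features}
```

Because the routing depends on the whole dataset, every feature has to be seen before any of them can be sent on, which is why the sketch materializes the input as a list first.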


bhavyagandhi
Contributor
  • Author
  • August 29, 2022
Thanks for that, it helped.

