I have a dataset that contains null and non-null values in a particular column. I need to route it so that if any null value exists, all values (null and non-null) go to one output port; otherwise, only the non-null values go to a second port. Any help?

I would say it depends on the location and size of your input data. If your data is in a queryable format (e.g. an SQL database), the fastest approach is normally to query the source table to check whether the selected column contains any null values. Something like:

  • SQLExecutor (e.g. "select count(*) as null_counts from my_table where my_column is null"). You'll have to manually expose "null_counts" or "NULL_COUNTS", depending on your DB
  • Use a Tester to see if "null_counts" > 0
  • Attach two different FeatureReaders to the Tester outputs. Make sure to adapt the WHERE-clauses accordingly

If the SQLExecutor is slow, make sure "my_column" is indexed in the database.
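
For illustration only, the logic of the steps above can be sketched outside FME in Python. This is a minimal sketch, not the actual workspace: `sqlite3` stands in for your real database, the function name `route_by_nulls` is hypothetical, and the table and column names are taken from the example query.

```python
import sqlite3

def route_by_nulls(conn, table, column):
    """Return (port, rows): port 1 with every row if any NULL exists,
    otherwise port 2 with the non-null rows."""
    # Identifiers cannot be bound as parameters; this sketch assumes
    # `table` and `column` come from a trusted source.
    null_count = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    ).fetchone()[0]
    if null_count > 0:
        # At least one null: read everything, no WHERE clause.
        return 1, conn.execute(f'SELECT * FROM "{table}"').fetchall()
    # No nulls: read only the non-null rows.
    return 2, conn.execute(
        f'SELECT * FROM "{table}" WHERE "{column}" IS NOT NULL'
    ).fetchall()

# Hypothetical sample data to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, my_column TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")]
)
port, rows = route_by_nulls(conn, "my_table", "my_column")
print(port, len(rows))  # prints "1 3"
```

The count query corresponds to the SQLExecutor, the `if` to the Tester, and the two SELECT statements to the two FeatureReaders with their different WHERE clauses.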

 

If the source data isn't in a queryable format, you could use the InlineQuerier to build a temporary database and use the same query there. Or you could use e.g. the StatisticsCalculator to find all unique values for "my_column" and take it from there.
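
As a rough sketch of the unique-values idea, again in Python rather than FME: collect the distinct values of the column first, then decide where the whole set goes. The function name `split_by_nulls` and the dict-based features are assumptions for illustration, and `None` is used to represent null (a missing key is treated as null too, which may not match how your data distinguishes null from missing).

```python
def split_by_nulls(features, column):
    """Send all features to port_1 if the column has any null,
    otherwise send them all to port_2."""
    # Distinct values of the column; assumes the values are hashable.
    distinct = {f.get(column) for f in features}
    if None in distinct:
        return {"port_1": list(features), "port_2": []}
    return {"port_1": [], "port_2": list(features)}

# Hypothetical sample features.
with_null = [{"my_column": "a"}, {"my_column": None}]
without_null = [{"my_column": "a"}, {"my_column": "b"}]

print(len(split_by_nulls(with_null, "my_column")["port_1"]))
print(len(split_by_nulls(without_null, "my_column")["port_2"]))
```

Note that this needs all features in memory before any can be routed, which is the same reason the decision has to be made on the whole dataset rather than feature by feature.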



Thanks for that, it helped.

