I would say it depends on the location and size of your input data: if your data is in a queryable format (e.g. a SQL database), the fastest approach is normally to query the source table to see if there are any null values in the selected column. Something like:
- SQLExecutor, e.g. "select count(*) as null_counts from my_table where my_column is null". You'll have to manually expose the result attribute as "null_counts" or "NULL_COUNTS", depending on how your DB returns column names
- Use a Tester to see if "null_counts" > 0
- Attach two different FeatureReaders to the Tester outputs. Make sure to adapt the WHERE-clauses accordingly
If the SQLExecutor is slow, make sure "my_column" is indexed in the database.
If the source data isn't in a queryable format, you could use the InlineQuerier to build a temporary database and run the same query there. Alternatively, you could use e.g. the StatisticsCalculator to find all unique values for "my_column" and take it from there.
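If you want to try the null-count query outside FME first, here is a minimal sketch in plain Python using the standard sqlite3 module. It is not an FME workspace: the in-memory database, the sample rows, the index name and the count_nulls helper are all made up for illustration, and only the table and column names ("my_table", "my_column") come from the query above. The "Passed"/"Failed" labels just mirror the Tester's two output ports.

```python
import sqlite3


def count_nulls(conn, table, column):
    """Return the number of rows where `column` is NULL (hypothetical helper)."""
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # Only pass trusted table/column names here.
    query = (
        f'SELECT COUNT(*) AS null_counts FROM "{table}" '
        f'WHERE "{column}" IS NULL'
    )
    return conn.execute(query).fetchone()[0]


# Illustrative in-memory table; the rows are sample data, not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_column TEXT)")
conn.executemany(
    "INSERT INTO my_table (my_column) VALUES (?)",
    [("a",), (None,), ("b",), (None,)],
)

# An index on the tested column, as suggested above for slow queries.
conn.execute("CREATE INDEX idx_my_column ON my_table (my_column)")

null_counts = count_nulls(conn, "my_table", "my_column")

# Same test as the Tester step: null_counts > 0.
branch = "Passed" if null_counts > 0 else "Failed"
print(null_counts, branch)
```

In the workspace itself, the same decision is made by the Tester, and each branch would feed its own FeatureReader.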
Thanks for that, it helped.