Question

Searching complete dataset for null values


Userlevel 3
Badge +26

I'm trying to calculate the percentage of null values per feature class in a dataset. Null values are defined by '-99999' or 'noinformation' or 'No Information'.

I have hundreds of feature classes, each with hundreds of attributes named differently. So building a workbench calling out each attribute specifically would be very time consuming. This needs to be a dynamic solution.

So far I've tried an AttributeExploder followed by ListSearcher, but I'm having trouble calculating totals based on feature class.


2 replies

Userlevel 4

Use the Group By in the StatisticsCalculator to calculate totals by group, in this case using the feature class name (fme_feature_type).

If your data is stored in a SQL database, a much faster solution would be to use a SQLExecutor with something like

select sum(
    case when your_attribute in ('-99999', 'noinformation', 'No Information') then 1
    else 0 end) * 100 / count(*) as percentage_empty
from your_feature_class

You can e.g. use the "Schema (any format)" reader to extract a list of all the feature classes and all the attribute names to feed into the SQLExecutor.

Badge +3

If you use an attribute exploder you can create list grouped by attr_value.

Add all the list element counts for total. (better to count them prior to list building though, saves some effort)

Then test the attr_value for the criteria you mentioned. Add their counts and divide by total.

Reply