Question

Data profile - count, missing, unique, null

  • 9 October 2019
  • 1 reply
  • 8 views


Hello,

I'm trying to figure out how to do a data profile, similar to skim in R.

Something like this:

But ideally without having to use the PythonCaller or RCaller...

The StatisticsCalculator gives count, min, max, etc., which helps, but I also need the unique, missing and null counts.

Is there a simple way to do this in FME?

 


1 reply


I would actually use Python for this, but in pure FME I think I would go with an Aggregator (with a generated list and a count attribute) followed by a ListHistogrammer.

You can use a ListElementCounter on the list and compare that to the count attribute from the Aggregator to get the number of features where the attribute is missing.

You can use a ListElementCounter on the histogram to get the number of unique values (subtract 1 if null values are present and you don't want null counted as a value).

 

 

You can use a ListSearcher on the histogram to get the null count.

 

 

You can sort the histogram by value; the first and last values (via a ListIndexer) are then your min and max (again excluding null, if present).
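For comparison, here is a rough sketch of the same logic in plain Python. It is only an illustration, not FME API code: features are modelled as dicts, a missing attribute is a key that is absent, null is None, and the function name profile and the sample records are made up for the example. Inside FME this would have to be adapted to read attributes from features in a PythonCaller.

```python
from collections import Counter


def profile(records, attributes):
    """Return count, missing, null, unique, min and max per attribute.

    Assumption: each record is a dict; an absent key means "missing"
    and a value of None means "null".
    """
    total = len(records)
    result = {}
    for name in attributes:
        # Values from records that carry the attribute at all
        # (plays the role of the list built by the Aggregator).
        present = [rec[name] for rec in records if name in rec]
        # Value -> number of occurrences (plays the role of the histogram).
        histogram = Counter(present)
        # Take null out of the histogram so it is not counted as a value.
        nulls = histogram.pop(None, 0)
        # Assumes the remaining values of one attribute can be compared.
        values = sorted(histogram)
        result[name] = {
            "count": total,
            "missing": total - len(present),
            "null": nulls,
            "unique": len(histogram),
            "min": values[0] if values else None,
            "max": values[-1] if values else None,
        }
    return result


# Made-up sample data: "city" is null once and missing once.
records = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": None},
    {"id": 3},
    {"id": 4, "city": "Bergen"},
    {"id": 5, "city": "Oslo"},
]
stats = profile(records, ["id", "city"])
```

With this sample, stats["city"] reports 1 missing, 1 null and 2 unique values, which matches what the ListElementCounter and ListSearcher steps above would be expected to give.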

 

 
