Question

Data profile - count, missing, unique, null

  • October 9, 2019
  • 1 reply


Hello,

I'm trying to figure out how to build a data profile, similar to skim in R.

Something like this:

But ideally without having to use the PythonCaller or RCaller...

The StatisticsCalculator gives count, min, max, etc., which helps. But I also need the number of unique, missing and null values.

Is there a simple way to do this in FME?

 


1 reply

jdh
Contributor
  • October 9, 2019

I would actually use Python for this, but in pure FME I think I would go with an Aggregator (with Generate List enabled and a Count Attribute set) followed by a ListHistogrammer.

You can use a ListElementCounter on the list and compare the result to the count attribute from the Aggregator to get the number of features where the attribute is missing.

You can use a ListElementCounter on the histogram to get the number of unique values (subtract 1 if null values are present and you don't want null counted as a value).

You can use a ListSearcher on the histogram to get the null attribute count.

You can sort the histogram by value (for example with a ListSorter); the first and last elements, retrieved with a ListIndexer, are then your min and max (again excluding the null entry if present).
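
For comparison, the same profile logic can be sketched in plain Python. This is only an illustration of the steps above, not FME code: it does not use the PythonCaller or the fmeobjects API, the `profile` function and the sample `records` are hypothetical, a missing attribute is modelled as an absent dictionary key, and null is modelled as `None`. It also assumes the non-null values of an attribute can be sorted against each other.

```python
from collections import Counter


def profile(records, attribute):
    """Profile one attribute across a list of dict records (hypothetical helper)."""
    total = len(records)  # plays the role of the Aggregator count attribute

    # Like the Aggregator list: one element per record that has the attribute.
    values = [r[attribute] for r in records if attribute in r]

    # Like the ListHistogrammer: distinct value -> number of occurrences.
    histogram = Counter(values)

    # Like the ListSearcher on the histogram: look up the null entry.
    nulls = histogram.get(None, 0)

    # Sort the distinct non-null values; first and last are min and max.
    non_null = sorted(v for v in histogram if v is not None)

    return {
        "count": total,
        "missing": total - len(values),  # records minus list elements
        "null": nulls,
        "unique": len(non_null),  # null is not counted as a value here
        "min": non_null[0] if non_null else None,
        "max": non_null[-1] if non_null else None,
    }


# Made-up sample data: one record lacks "height", one has it set to null.
records = [
    {"id": 1, "height": 10},
    {"id": 2, "height": None},
    {"id": 3},
    {"id": 4, "height": 10},
    {"id": 5, "height": 7},
]

result = profile(records, "height")
print(result)
```

For the sample data this reports 5 records, 1 missing, 1 null and 2 unique values, with 7 and 10 as min and max. Note that FME also distinguishes empty strings from null and missing, which this sketch does not model.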