Question

Statistical Calculator

  • October 11, 2019
  • 7 replies
  • 218 views

freddy17
Contributor

Dear FME Forum,

I am relatively new to the software and wondered whether it is possible to count "Null" or "Missing" values with the StatisticsCalculator.

With the StatisticsCalculator it is possible to count the total number of values for an attribute, but not how many are empty. The NullAttributeMapper or Tester is great, but I am looking for a transformer that covers the whole range of attributes.

Should I use a PythonCaller for this, or is there an existing transformer?

 

My Question stems from PowerQuery

In Microsoft's PowerQuery it is possible to get an overview of how complete your data is. I am looking for the same result (please see screenshot below).

 

Here is a link to the PowerQuery Table Profile:

https://docs.microsoft.com/en-us/powerquery-m/table-profile

 

Thanks in advance

 

Fred


philippeb
Enthusiast
  • October 11, 2019

ebygomm
Influencer
  • October 11, 2019

Are you looking to find out how many features contain null or missing data in any attribute or a breakdown for each attribute?


takashi
Influencer
  • October 12, 2019

Hi @freddy17, if only numeric values are valid, you can use the StatisticsCalculator to count the number of numeric values and compute the number of invalid values by subtracting the number of numeric values from the total number of features. See the Numeric Count Attribute parameter.
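The subtraction described above can be sketched in plain Python. This is only an illustration of the arithmetic, not of the StatisticsCalculator itself: the helper names `is_numeric` and `count_invalid` and the sample values are hypothetical, and using `float()` to decide what counts as numeric is an assumption.

```python
def is_numeric(value):
    """Return True if the value parses as a number (assumption: float() is a fair stand-in)."""
    if value is None or isinstance(value, bool):
        return False
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def count_invalid(values):
    """Invalid count = total number of features minus the numeric count."""
    total = len(values)
    numeric = sum(1 for v in values if is_numeric(v))
    return total - numeric


# Hypothetical attribute values; None stands in for null or missing.
sample = ["3", "4.5", None, "", "apple", 7]
print(count_invalid(sample))  # 3
```

As the sample shows, any non-numeric string such as "apple" is counted as invalid too, so this only works when numeric values are the only valid ones.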


freddy17
Contributor
  • Author
  • October 15, 2019

I have attached an FME file and an Excel file showing what I am trying to achieve. @takashi, the values I am looking at are mostly strings, so the numeric count won't work.

 

@ebygomm, I am trying to calculate a percentage of values per attribute, so yes, the breakdown for each attribute.

 

I have attached a simple example, but it seems very convoluted. In my real dataset I have about 300 attributes.

Fruit Null Tester.fmw

Fruit Test.xlsx

 

I will read up on the Histogram a bit.

Thanks


ebygomm
Influencer
  • October 15, 2019

My cheat's way, if I want to avoid Python and just need to count null/empty/missing values (as opposed to identifying them), is to use an arithmetic editor to calculate the string length, turn any length of zero into 1 and any length greater than zero into 0, and then use a StatisticsCalculator to sum all the 1's.

null_fruit.fmwt
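For readers who want to see the logic of this trick outside a workspace, here is a minimal Python sketch of the same flag-and-sum idea. It is not the content of the attached template: the names `empty_flag` and `count_empty` and the sample values are hypothetical, and treating `None` as null/missing with a string length of zero is an assumption.

```python
def empty_flag(value):
    """Return 1 when the string length is zero, otherwise 0."""
    # Assumption: None represents null or missing and has length 0.
    length = 0 if value is None else len(str(value))
    return 1 if length == 0 else 0


def count_empty(values):
    """Sum the flags, as a sum statistic over the flag attribute would."""
    return sum(empty_flag(v) for v in values)


# Hypothetical values of a single attribute.
fruit = ["apple", "", None, "pear", ""]
print(count_empty(fruit))  # 3
```

Inverting the length into a 0/1 flag is what lets a plain sum act as a counter of empty values.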


freddy17
Contributor
  • Author
  • October 15, 2019

Thanks, that has simplified things a bit. I think I will try with Python as another approach, as I have a few hundred Attributes.
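A rough idea of what the Python approach could look like for many attributes is sketched below. It is plain Python over a list of dictionaries rather than code for a PythonCaller, so it makes no claims about the FME API; the function names, the output keys, and the sample records are all hypothetical. It assumes that an absent key means "missing", `None` means "null", and `""` means "empty".

```python
def is_empty(value):
    """Treat None (null or missing) and the empty string as empty."""
    return value is None or value == ""


def profile_completeness(records):
    """Return per-attribute counts of empty values and percent complete."""
    # Collect every attribute name seen in any record, in first-seen order.
    names = dict.fromkeys(name for record in records for name in record)
    total = len(records)
    profile = {}
    for name in names:
        # record.get(name) yields None when the attribute is missing.
        empty = sum(1 for record in records if is_empty(record.get(name)))
        filled = total - empty
        profile[name] = {
            "count": total,
            "empty": empty,
            "percent_complete": round(100.0 * filled / total, 1) if total else 0.0,
        }
    return profile


# Hypothetical records; the third one has no "colour" attribute at all.
records = [
    {"fruit": "apple", "colour": "red"},
    {"fruit": "", "colour": None},
    {"fruit": "pear"},
    {"fruit": "plum", "colour": "purple"},
]
for name, stats in profile_completeness(records).items():
    print(name, stats)
# fruit {'count': 4, 'empty': 1, 'percent_complete': 75.0}
# colour {'count': 4, 'empty': 2, 'percent_complete': 50.0}
```

Because the attribute names are discovered from the data, the same loop covers 3 or 300 attributes without listing them by hand. Inside a workspace the counts would presumably be accumulated feature by feature and written out once all features have been read.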


ebygomm
Influencer
  • October 15, 2019

An alternative approach using the AttributeValidator. It could be slow if you have lots of attributes and lots of missing/empty/null values, because of exploding the list.

