Skip to main content
Solved

Count Values in Excell fields


Hi, I want to count values for individual fields in excel. There are 110 fields. I want total counts column by column for all values that are not missing, empty, or null.

Best answer by ebygomm

There is a custom transformer on the hub - NullAttributeCounter that will count all the nulls and output a count of Null and Not Null.

 

Depending on how you are reading your excel file you may need to use a NullAttributeMapper first (or choose to Read Blank cells as Null in the excel reader parameters)

View original
Did this help you find an answer to your question?

12 replies

warrendev
Enthusiast
Forum|alt.badge.img+23
  • Enthusiast
  • November 14, 2020

Hi @jencinas​ ,

 

I think the statistics calculator transformer will work for what you want. Set the statistics calculator mode to Sum.

 

excelstatistics_calculatorsummary


  • Author
  • November 16, 2020
warrendev wrote:

Hi @jencinas​ ,

 

I think the statistics calculator transformer will work for what you want. Set the statistics calculator mode to Sum.

 

excelstatistics_calculatorsummary

@Chris Warren​ , Hey this tool works pretty good for the numeric features in my excel spreadsheet. I was able to create a workbench to calculate the count and numeric count. However, for the fields with strings, I can only calculate the total count of features. This doesn't work for me as I want to determine the number of values available in those fields as well. In some cases, I might have 828 records in excel. But a given field may only have 642 values entered in total. Any ideas how I can get this "true" value for these fields as well as the numeric ones? Thanks. Joe

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • Best Answer
  • November 16, 2020

There is a custom transformer on the hub - NullAttributeCounter that will count all the nulls and output a count of Null and Not Null.

 

Depending on how you are reading your excel file you may need to use a NullAttributeMapper first (or choose to Read Blank cells as Null in the excel reader parameters)


  • Author
  • November 16, 2020

Thanks @ebygomm​ .

I got this to work. It accomplished the task I was after. I was able to create an output with the null versus not null values for each field. I did this for the original and processed spreadsheets so I can compare them and find data that didn't successfully move through one of my ETLs. I have numerous spreadsheets to migrate from one format to another. I needed a method to assess and compare the original to the processed data so I could quickly find the problems and address them. Thanks. Joe


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • November 16, 2020
jencinas wrote:

Thanks @ebygomm​ .

I got this to work. It accomplished the task I was after. I was able to create an output with the null versus not null values for each field. I did this for the original and processed spreadsheets so I can compare them and find data that didn't successfully move through one of my ETLs. I have numerous spreadsheets to migrate from one format to another. I needed a method to assess and compare the original to the processed data so I could quickly find the problems and address them. Thanks. Joe

If you're looking at doing a before and after comparison, the changedetector might be worth a look


  • Author
  • November 17, 2020
ebygomm wrote:

If you're looking at doing a before and after comparison, the changedetector might be worth a look

@ebygomm​ , Thanks for the suggestion. I am trying to compare two datasets, (Before & After). I looked into the ChangeDetector transformer. The issue I determined was that I can't readily apply that transformer to my situation. My ETL changes the headers in the original spreadsheet to headers that will import into a Personal Geodatabase. The ChangeDectector seems to rely on those headers being the same. What I want to compare is the ratio of Null to NotNull numbers so I can determine which fields in the original spreadsheet aren't getting through my ETL. It is a QAQC approach. Instead, I used a writer to place both the before and after spreadsheets into an MS Access db. Then, I set up a crosswalk table that connects the original field headers to the new ones. Then I was able to write a query that displays only the records where the null value fields don't match. It works. But it is more work than I want to go through. Since I have numerous spreadsheets to process, I don't want this workflow to be more arduous than doing a simple visual inspection of the two spreadsheets which would be faster. Any other ideas? Joe


  • Author
  • November 19, 2020

In short, I developed a workbench which calculates "empty" and "missing" values in the spreadsheet cells. I do this for for the original and final spreadsheets. Then the workbench applies the NullAttributeCounter transformer to generate an output which distinguishes the number of cells with Null values versus Non-Null values for the range of records in each spreasdsheet. The writer publishes the resultant tables (before and after) to an MS Access. I use the same naming convention for the tables in MS Access so they match the spreadsheets. Be sure to specify all three values in the "Update/Delete Key columns" parameter. I specified "insert" for the writer mode. This publishes the attributes for the AttrName, Null,NotNull fields that are generated from the transformer. Then, I have a canned query that pieces the fields of both tables together and finds the records where the "Null" fields between the two tables don't match. This is a QAQC query that pin points any fields that are not being processed in an earlier ETL I use to migrate the data from one format to another. A little work is involved as the headers in the before and after spreadsheets don't match. So, I had to create a crosswalk table and use it to join the other two in the Relationship feature of Access. Fortunately, I only have to do this once. I will be able to reuse this MS Access db for other groups of datasets. Thank you to @ebygomm​  and @Chris Warren​ for suggestions! Joe


  • Author
  • November 19, 2020
jencinas wrote:

In short, I developed a workbench which calculates "empty" and "missing" values in the spreadsheet cells. I do this for for the original and final spreadsheets. Then the workbench applies the NullAttributeCounter transformer to generate an output which distinguishes the number of cells with Null values versus Non-Null values for the range of records in each spreasdsheet. The writer publishes the resultant tables (before and after) to an MS Access. I use the same naming convention for the tables in MS Access so they match the spreadsheets. Be sure to specify all three values in the "Update/Delete Key columns" parameter. I specified "insert" for the writer mode. This publishes the attributes for the AttrName, Null,NotNull fields that are generated from the transformer. Then, I have a canned query that pieces the fields of both tables together and finds the records where the "Null" fields between the two tables don't match. This is a QAQC query that pin points any fields that are not being processed in an earlier ETL I use to migrate the data from one format to another. A little work is involved as the headers in the before and after spreadsheets don't match. So, I had to create a crosswalk table and use it to join the other two in the Relationship feature of Access. Fortunately, I only have to do this once. I will be able to reuse this MS Access db for other groups of datasets. Thank you to @ebygomm​  and @Chris Warren​ for suggestions! Joe

I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.


chrisatsafe
Contributor
Forum|alt.badge.img+2
  • Contributor
  • November 19, 2020
jencinas wrote:

@Chris Warren​ , Hey this tool works pretty good for the numeric features in my excel spreadsheet. I was able to create a workbench to calculate the count and numeric count. However, for the fields with strings, I can only calculate the total count of features. This doesn't work for me as I want to determine the number of values available in those fields as well. In some cases, I might have 828 records in excel. But a given field may only have 642 values entered in total. Any ideas how I can get this "true" value for these fields as well as the numeric ones? Thanks. Joe

 

Hi @jencinas​ ,

 

One way you could do this would be to create a temporary attribute and set the value of that attribute conditionally. Since you want to determine how many features have an attribute value, you can create a temporary attribute that sets the value to 1 if the attribute has a value and 0 if the attribute does not have a value. Then calculate the sum of the temp attribute in the StatisticsCalculator.

2020-11-19_15-04-51 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • November 20, 2020
jencinas wrote:

I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.

Are you sure that is the issue? The null counter will count a mix of string and numerical values for me


  • Author
  • November 20, 2020
jencinas wrote:

I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.

@ebygomm​ , I am not sure now. I did run it successfully against another spreadsheet with similar design. I will try again to make it run and troubleshoot the problem. The NullAttributeCounter does exactly what I want. So, when it failed this time and I got the Python error, I researched it on this knowledgebase. The closes example to the problem was what I already posted. Yet, that doesn't preclude that something else is causing the tool to fail. I will give it another try. Joe


  • Author
  • November 20, 2020
jencinas wrote:

I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.

@ebygomm​ , Here is the error I am getting. I will troubleshoot this further. Joe

 

Python Exception <IndexError>: list index out of range

Traceback (most recent call last):

 File "<string>", line 44, in close

IndexError: list index out of range

NullAttributeCounter_2_PythonCaller(PythonFactory): PythonFactory failed to close properly

NullAttributeCounter_2_PythonCaller(PythonFactory): A fatal error has occurred. Check the logfile above for details

A fatal error has occurred. Check the logfile above for details

 

 


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings