Hi, I want to count values for individual fields in excel. There are 110 fields. I want total counts column by column for all values that are not missing, empty, or null.
Hi @jencinas ,
I think the statistics calculator transformer will work for what you want. Set the statistics calculator mode to Sum.
Hi @jencinas ,
I think the statistics calculator transformer will work for what you want. Set the statistics calculator mode to Sum.
@Chris Warren , Hey this tool works pretty good for the numeric features in my excel spreadsheet. I was able to create a workbench to calculate the count and numeric count. However, for the fields with strings, I can only calculate the total count of features. This doesn't work for me as I want to determine the number of values available in those fields as well. In some cases, I might have 828 records in excel. But a given field may only have 642 values entered in total. Any ideas how I can get this "true" value for these fields as well as the numeric ones? Thanks. Joe
There is a custom transformer on the hub - NullAttributeCounter that will count all the nulls and output a count of Null and Not Null.
Depending on how you are reading your excel file you may need to use a NullAttributeMapper first (or choose to Read Blank cells as Null in the excel reader parameters)
Thanks @ebygomm .
I got this to work. It accomplished the task I was after. I was able to create an output with the null versus not null values for each field. I did this for the original and processed spreadsheets so I can compare them and find data that didn't successfully move through one of my ETLs. I have numerous spreadsheets to migrate from one format to another. I needed a method to assess and compare the original to the processed data so I could quickly find the problems and address them. Thanks. Joe
Thanks @ebygomm .
I got this to work. It accomplished the task I was after. I was able to create an output with the null versus not null values for each field. I did this for the original and processed spreadsheets so I can compare them and find data that didn't successfully move through one of my ETLs. I have numerous spreadsheets to migrate from one format to another. I needed a method to assess and compare the original to the processed data so I could quickly find the problems and address them. Thanks. Joe
If you're looking at doing a before and after comparison, the changedetector might be worth a look
If you're looking at doing a before and after comparison, the changedetector might be worth a look
@ebygomm , Thanks for the suggestion. I am trying to compare two datasets, (Before & After). I looked into the ChangeDetector transformer. The issue I determined was that I can't readily apply that transformer to my situation. My ETL changes the headers in the original spreadsheet to headers that will import into a Personal Geodatabase. The ChangeDectector seems to rely on those headers being the same. What I want to compare is the ratio of Null to NotNull numbers so I can determine which fields in the original spreadsheet aren't getting through my ETL. It is a QAQC approach. Instead, I used a writer to place both the before and after spreadsheets into an MS Access db. Then, I set up a crosswalk table that connects the original field headers to the new ones. Then I was able to write a query that displays only the records where the null value fields don't match. It works. But it is more work than I want to go through. Since I have numerous spreadsheets to process, I don't want this workflow to be more arduous than doing a simple visual inspection of the two spreadsheets which would be faster. Any other ideas? Joe
In short, I developed a workbench which calculates "empty" and "missing" values in the spreadsheet cells. I do this for for the original and final spreadsheets. Then the workbench applies the NullAttributeCounter transformer to generate an output which distinguishes the number of cells with Null values versus Non-Null values for the range of records in each spreasdsheet. The writer publishes the resultant tables (before and after) to an MS Access. I use the same naming convention for the tables in MS Access so they match the spreadsheets. Be sure to specify all three values in the "Update/Delete Key columns" parameter. I specified "insert" for the writer mode. This publishes the attributes for the AttrName, Null,NotNull fields that are generated from the transformer. Then, I have a canned query that pieces the fields of both tables together and finds the records where the "Null" fields between the two tables don't match. This is a QAQC query that pin points any fields that are not being processed in an earlier ETL I use to migrate the data from one format to another. A little work is involved as the headers in the before and after spreadsheets don't match. So, I had to create a crosswalk table and use it to join the other two in the Relationship feature of Access. Fortunately, I only have to do this once. I will be able to reuse this MS Access db for other groups of datasets. Thank you to @ebygomm and @Chris Warren for suggestions! Joe
In short, I developed a workbench which calculates "empty" and "missing" values in the spreadsheet cells. I do this for for the original and final spreadsheets. Then the workbench applies the NullAttributeCounter transformer to generate an output which distinguishes the number of cells with Null values versus Non-Null values for the range of records in each spreasdsheet. The writer publishes the resultant tables (before and after) to an MS Access. I use the same naming convention for the tables in MS Access so they match the spreadsheets. Be sure to specify all three values in the "Update/Delete Key columns" parameter. I specified "insert" for the writer mode. This publishes the attributes for the AttrName, Null,NotNull fields that are generated from the transformer. Then, I have a canned query that pieces the fields of both tables together and finds the records where the "Null" fields between the two tables don't match. This is a QAQC query that pin points any fields that are not being processed in an earlier ETL I use to migrate the data from one format to another. A little work is involved as the headers in the before and after spreadsheets don't match. So, I had to create a crosswalk table and use it to join the other two in the Relationship feature of Access. Fortunately, I only have to do this once. I will be able to reuse this MS Access db for other groups of datasets. Thank you to @ebygomm and @Chris Warren for suggestions! Joe
I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.
@Chris Warren , Hey this tool works pretty good for the numeric features in my excel spreadsheet. I was able to create a workbench to calculate the count and numeric count. However, for the fields with strings, I can only calculate the total count of features. This doesn't work for me as I want to determine the number of values available in those fields as well. In some cases, I might have 828 records in excel. But a given field may only have 642 values entered in total. Any ideas how I can get this "true" value for these fields as well as the numeric ones? Thanks. Joe
Hi @jencinas ,
One way you could do this would be to create a temporary attribute and set the value of that attribute conditionally. Since you want to determine how many features have an attribute value, you can create a temporary attribute that sets the value to 1 if the attribute has a value and 0 if the attribute does not have a value. Then calculate the sum of the temp attribute in the StatisticsCalculator.
I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.
Are you sure that is the issue? The null counter will count a mix of string and numerical values for me
I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.
@ebygomm , I am not sure now. I did run it successfully against another spreadsheet with similar design. I will try again to make it run and troubleshoot the problem. The NullAttributeCounter does exactly what I want. So, when it failed this time and I got the Python error, I researched it on this knowledgebase. The closes example to the problem was what I already posted. Yet, that doesn't preclude that something else is causing the tool to fail. I will give it another try. Joe
I spoke too soon. On a different dataset, I ran into a python error with the NullAttributeCounter transformer. Apparently, the pythoncode in the transformer doesn't handle fields that have both string and numerical values, which my dataset does. Now, I am rethinking how I can modify the data so I can still account for the nulls versus non-nulls. When this transformer works, it makes a really useful output. I have FME 2018 on a Citrix server.
@ebygomm , Here is the error I am getting. I will troubleshoot this further. Joe
Python Exception <IndexError>: list index out of range
Traceback (most recent call last):
File "<string>", line 44, in close
IndexError: list index out of range
NullAttributeCounter_2_PythonCaller(PythonFactory): PythonFactory failed to close properly
NullAttributeCounter_2_PythonCaller(PythonFactory): A fatal error has occurred. Check the logfile above for details
A fatal error has occurred. Check the logfile above for details