Question

Change values according to their count

  • 12 November 2018
  • 8 replies
  • 14 views

I am reading a CSV file and need to change the values of one attribute according to how often each value occurs. The values are typically whole numbers from 1 to 6 (there might be more), each occurring a different number of times. I need to change these values so that the most frequent value becomes 1, the second most frequent becomes 2, the third becomes 3, and the rest are deleted.

In the example below: 6 is changed to 1, 4 is changed to 2, 1 to 3 and 2 is deleted.

If the counts of any values are equal I should get an error.

 

Attribute name
6
6
6
6
4
4
4
1
1
2
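
For reference, the requested recoding can be sketched in plain Python outside of FME. This is only an illustration of the logic, not an FME workflow: the function name `rank_by_frequency` and the `keep` parameter are made up for this sketch, and it assumes that a tie between any two counts (not only among the top three) should raise the error.

```python
from collections import Counter


def rank_by_frequency(values, keep=3):
    """Map the `keep` most frequent values to ranks 1..keep."""
    counts = Counter(values)
    # Assumption: any tie between counts is an error, even among
    # values that would be dropped anyway.
    if len(set(counts.values())) != len(counts):
        raise ValueError(f"tied counts: {dict(counts)}")
    ranked = [value for value, _ in counts.most_common(keep)]
    return {value: rank for rank, value in enumerate(ranked, start=1)}


values = [6, 6, 6, 6, 4, 4, 4, 1, 1, 2]
mapping = rank_by_frequency(values)
# Values missing from the mapping (here 2) are deleted.
recoded = [mapping[v] for v in values if v in mapping]
print(mapping)   # {6: 1, 4: 2, 1: 3}
print(recoded)   # [1, 1, 1, 1, 2, 2, 2, 3, 3]
```

With the example data this gives the result described above: 6 becomes 1, 4 becomes 2, 1 becomes 3, and 2 is dropped.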

 

 

Thank you for your help.

 


8 replies

@bigboocha

1. ListBuilder, grouped by Attribute name.

2. ListHistogrammer on the list.

3. Sort descending on histogram{}.count.

4. Counter, grouped by histogram{}.count.

5. ListElementCounter, then test for element count = 1. The others are your "errors".

6. Rename.
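
The idea behind steps 2 to 5 (build a histogram, then look for counts that occur more than once) can be sketched in plain Python. This is an illustration under assumptions, not the transformers themselves; `tied_values` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def tied_values(values):
    """Return {count: [values...]} for every count shared by 2+ values."""
    histogram = Counter(values)          # value -> count
    by_count = defaultdict(list)         # count -> values with that count
    for value, count in histogram.most_common():  # descending by count
        by_count[count].append(value)
    return {count: vals for count, vals in by_count.items() if len(vals) > 1}


print(tied_values([6, 6, 6, 6, 4, 4, 4, 1, 1, 2]))  # no ties in the example data
print(tied_values([6, 6, 4, 4, 1]))                 # 6 and 4 both occur twice
```

An empty result means every count is unique, so the ranking is unambiguous; anything else lists the values that would trigger the error.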

Hi @bigboocha,

One option you have is to use the StatisticsCalculator to count and group your features based on common attribute values. Then, you can use a DuplicateFilter to verify that there are no equal counts. To report a problem, connect a MessageLogger to the Duplicate port to log an error, or create a new attribute (e.g. _error) with an error message.

Next, to make this a little more dynamic, use a Sorter with the order set to numeric descending, then use a Counter to number the features (count starting at 1). You can let the Counter create a new attribute or replace the existing values by setting the Count Output Attribute to the attribute name. Lastly, use a Tester to filter out features where the count is 1 and remove any unwanted attributes with an AttributeRemover.

You can accomplish a similar result using conditional values inside the AttributeManager.

Count_and_Change.fmwt
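
As a rough illustration of the equal-count check, here is a plain Python sketch. It assumes the duplicate check passes the first feature seen for each key to one output and every later feature with the same key to the other; `split_duplicates` and the `value` and `count` names are hypothetical, and `_error` follows the attribute name suggested above.

```python
def split_duplicates(features, key):
    """First feature per key value goes to `unique`, later ones to `duplicate`."""
    seen = set()
    unique, duplicate = [], []
    for feature in features:
        k = feature[key]
        if k in seen:
            # Attach an error message, echoing the `_error` attribute idea.
            duplicate.append({**feature, "_error": f"count {k} is not unique"})
        else:
            seen.add(k)
            unique.append(feature)
    return unique, duplicate


# One summary feature per distinct value (illustrative data with a tie).
summary = [
    {"value": 6, "count": 2},
    {"value": 4, "count": 2},
    {"value": 1, "count": 1},
]
unique, duplicate = split_duplicates(summary, key="count")
print(duplicate)
```

Any feature that lands in `duplicate` means two values share a count, which is the case the question wants reported as an error.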

Hi @bigboocha,

Please find attached a suggested method of how you could achieve this...

1) Use the Aggregator to group by Attribute name and add the total count value

2) Use a DuplicateFilter to test whether any counts are equal - if so, a Terminator stops the translation with an error

3) Use a Sorter to sort the total count numerically, descending, and then a Sampler to keep the first 3 counts (i.e. the three highest)

4) Use a Counter to replace the Attribute name values with 1, 2, 3

5) Use the Deaggregator to remove the grouping

xlsxr2none.fmwt

@ChrisAtSafe great minds think alike... Looks like we have suggested very similar methods!

Thank you very much. This worked like a charm.

@hollyatsafe Thank you for the answer. I found one glitch eventually. When I put the data through the Deaggregator to remove the grouping, what I get is the original number of data rows (before grouping and sorting) but all of the data is the same. It just copies all the attribute values from the three grouped rows. Is there any solution for this so I get all the original data back and only one attribute is altered? Thank you!

Hi @bigboocha,

Ah yes, in the Aggregator under Attribute Accumulation check the 'Generate List' box, give the list a name and for the Add to List parameter select 'All Attributes'.

 

Now go to the Deaggregator transformer and under Attribute Accumulation select this list for the List Attribute to Explode parameter.

 

I believe this should now retain all other existing attribute information.
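
To make the goal concrete (every original row comes back with its own attributes and only one attribute recoded), here is a minimal plain Python sketch. It is not the FME workspace: the CSV content, the `id` and `class` column names, and `recode_rows` are all made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV: `id` and `class` are made-up column names.
CSV_TEXT = """id,class
a,6
b,6
c,6
d,4
e,4
f,1
"""


def recode_rows(rows, attr, keep=3):
    """Recode rows[attr] by frequency rank; drop rows outside the top `keep`."""
    counts = Counter(row[attr] for row in rows)
    if len(set(counts.values())) != len(counts):
        raise ValueError("tied counts")
    rank = {v: str(i) for i, (v, _) in enumerate(counts.most_common(keep), 1)}
    # Copy each row so every other attribute keeps its own original value.
    return [{**row, attr: rank[row[attr]]} for row in rows if row[attr] in rank]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in recode_rows(rows, "class"):
    print(row)
```

Each output row keeps its own `id`; only `class` changes, which is the behaviour the list-based Aggregator and Deaggregator setup is meant to restore.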

@hollyatsafe Thank you. That solved it.
