Solved

Keeping unique values from multiple Excel columns

  • 7 November 2018
  • 7 replies
  • 11 views

Badge

Hello,

I have an Excel file that consists of 171 columns. I would like to keep the unique values for every single column.

I tried the DuplicateFilter but it returns me the unique values of only the 1st column. This means I would have to repeat the process for 171 times.

Do you reckon that Python is a one-way road? I would appreciate your thoughts on this.

Thank you for your time.

Kind regards,

George

icon

Best answer by jdh 7 November 2018, 21:21

View original

7 replies

Userlevel 4
Badge +30

Hi @georgefloros

 

Please, check the transformer Matcher.

This transformer has 3 output ports and is possible to select more than 1 Attribute to analyse:

 

Thanks,

Danilo

Badge

Hello @danilo_fme,

I also came across to this, but it looks like it does not do what I really want. Please find attached a screenshot of the outcome. Again, it works fine for one column, but when I select 3 it also brings values from other columns that do now allow me to identify unique values for every column. AO1 location should have only 3 unique values but now I cannot identify them.

Hopefully the screenshot would be more enlighting.

Userlevel 4
Badge +25

So, to be sure I understand, do you want to keep every record, as long as one of its values is unique for a column?

eg if the columns were ABC and you had...

Record 1

 

1

 

2

 

3

 

Record 2

 

2

 

3

 

5Record 3

 

3

 

4

 

5

 

Record 4

 

3

 

3

 

3

 

Record 5

 

3

 

5

 

5

 

 

Then you could keep record 1 and record 2 (because their values for Col A are unique) and keep record 3 (because it's value for Col B is unique), you would not keep record 4 (because it has no unique values) but you would keep record 5 (because it's value for Col B is unique).

Or do you just want to know that column A has unique values (1,2) and column B has unique values (2,4,5), but column C has no unique values.

Are either of those right? If not, can you explain with an example?

Badge +22

If I understand your needs you want a list of unique values for each column, not the unique combination of values.

 

 

I would do this in python, but it should be possible using an AttributeExploder, DuplicateFilter on both _attr_name and _attr_value, and a ListBuilder (Group By _attr_name, selected attribute _attr_value).

 

 

That should give you a feature for each column with a list of unique values for that column.

 

 

Depending on what exactly you want to do with the information, there are other options in the recombining.
Badge

Hello @Mark2AtSafe,

Thank you for your reply.

So based on your example I want this:

from column A I want to keep Record 1-2-3-4-5.

from column B I want to keep values 1,2,3

from column C I want to keep 2,3,4,5

from column D I want to keep 3,5.

I want a functionality similar when you filter in Excel and you can see all the available values that exist in a column, but not by count, just the distinct values that exist there. Hopefully this provides an answer to @jdh as well.

Thank you both very much.

George

Badge

If I understand your needs you want a list of unique values for each column, not the unique combination of values.

 

 

I would do this in python, but it should be possible using an AttributeExploder, DuplicateFilter on both _attr_name and _attr_value, and a ListBuilder (Group By _attr_name, selected attribute _attr_value).

 

 

That should give you a feature for each column with a list of unique values for that column.

 

 

Depending on what exactly you want to do with the information, there are other options in the recombining.

Hello @jdh

Thank you very much! This has worked for me. Now, is it feasible to create a table from this list, with the attribute names as columns and the unique values under each column?

Thank you for your time.

Kind regards,

George

 

P.S (Edited): I understand that the purpose of this workflow is to create a list. However what I want as a final output is a CSV/Excel file having as Column Names the attt_names and as Values the Unique attr_values.

Badge +22

Hello @jdh

Thank you very much! This has worked for me. Now, is it feasible to create a table from this list, with the attribute names as columns and the unique values under each column?

Thank you for your time.

Kind regards,

George

 

P.S (Edited): I understand that the purpose of this workflow is to create a list. However what I want as a final output is a CSV/Excel file having as Column Names the attt_names and as Values the Unique attr_values.

Hmm, how about dropping the ListBuilder, putting a Counter after the DuplicateFilter with the counter name set to _attr_name. This will give you a value to group the features by later. The first unique value of each column will be assigned 0, the second 1 and so on.

An attributeCreator set to @Value(_attr_name) : @Value(_attr_value). This will restore the original attribute names so you can use a dynamic writer.

Aggregator group by _count, merge attributes.

Excel Writer, either in dynamic mode or import from reader. Automatic mode will not work.

 

 

You may wish to sort your features before the counter on _attr_value.

Reply