Skip to main content
Solved

Keeping unique values from multiple Excel columns

  • November 7, 2018
  • 7 replies
  • 144 views

Forum|alt.badge.img

Hello,

I have an Excel file that consists of 171 columns. I would like to keep the unique values for every single column.

I tried the DuplicateFilter but it returns me the unique values of only the 1st column. This means I would have to repeat the process for 171 times.

Do you reckon that Python is a one-way road? I would appreciate your thoughts on this.

Thank you for your time.

Kind regards,

George

Best answer by jdh

If I understand your needs you want a list of unique values for each column, not the unique combination of values.

 

 

I would do this in python, but it should be possible using an AttributeExploder, DuplicateFilter on both _attr_name and _attr_value, and a ListBuilder (Group By _attr_name, selected attribute _attr_value).

 

 

That should give you a feature for each column with a list of unique values for that column.

 

 

Depending on what exactly you want to do with the information, there are other options in the recombining.
View original
Did this help you find an answer to your question?

7 replies

danilo_fme
Evangelist
Forum|alt.badge.img+44
  • Evangelist
  • November 7, 2018

Hi @georgefloros

 

Please, check the transformer Matcher.

This transformer has 3 output ports and is possible to select more than 1 Attribute to analyse:

 

Thanks,

Danilo


Forum|alt.badge.img

Hello @danilo_fme,

I also came across to this, but it looks like it does not do what I really want. Please find attached a screenshot of the outcome. Again, it works fine for one column, but when I select 3 it also brings values from other columns that do now allow me to identify unique values for every column. AO1 location should have only 3 unique values but now I cannot identify them.

Hopefully the screenshot would be more enlighting.


mark2atsafe
Safer
Forum|alt.badge.img+44
  • Safer
  • November 7, 2018

So, to be sure I understand, do you want to keep every record, as long as one of its values is unique for a column?

eg if the columns were ABC and you had...

Record 1

 

1

 

2

 

3

 

Record 2

 

2

 

3

 

5Record 3

 

3

 

4

 

5

 

Record 4

 

3

 

3

 

3

 

Record 5

 

3

 

5

 

5

 

 

Then you could keep record 1 and record 2 (because their values for Col A are unique) and keep record 3 (because it's value for Col B is unique), you would not keep record 4 (because it has no unique values) but you would keep record 5 (because it's value for Col B is unique).

Or do you just want to know that column A has unique values (1,2) and column B has unique values (2,4,5), but column C has no unique values.

Are either of those right? If not, can you explain with an example?


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • Best Answer
  • November 7, 2018

If I understand your needs you want a list of unique values for each column, not the unique combination of values.

 

 

I would do this in python, but it should be possible using an AttributeExploder, DuplicateFilter on both _attr_name and _attr_value, and a ListBuilder (Group By _attr_name, selected attribute _attr_value).

 

 

That should give you a feature for each column with a list of unique values for that column.

 

 

Depending on what exactly you want to do with the information, there are other options in the recombining.

Forum|alt.badge.img

Hello @Mark2AtSafe,

Thank you for your reply.

So based on your example I want this:

from column A I want to keep Record 1-2-3-4-5.

from column B I want to keep values 1,2,3

from column C I want to keep 2,3,4,5

from column D I want to keep 3,5.

I want a functionality similar when you filter in Excel and you can see all the available values that exist in a column, but not by count, just the distinct values that exist there. Hopefully this provides an answer to @jdh as well.

Thank you both very much.

George


Forum|alt.badge.img
jdh wrote:

If I understand your needs you want a list of unique values for each column, not the unique combination of values.

 

 

I would do this in python, but it should be possible using an AttributeExploder, DuplicateFilter on both _attr_name and _attr_value, and a ListBuilder (Group By _attr_name, selected attribute _attr_value).

 

 

That should give you a feature for each column with a list of unique values for that column.

 

 

Depending on what exactly you want to do with the information, there are other options in the recombining.

Hello @jdh

Thank you very much! This has worked for me. Now, is it feasible to create a table from this list, with the attribute names as columns and the unique values under each column?

Thank you for your time.

Kind regards,

George

 

P.S (Edited): I understand that the purpose of this workflow is to create a list. However what I want as a final output is a CSV/Excel file having as Column Names the attt_names and as Values the Unique attr_values.


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • November 8, 2018
georgefloros wrote:

Hello @jdh

Thank you very much! This has worked for me. Now, is it feasible to create a table from this list, with the attribute names as columns and the unique values under each column?

Thank you for your time.

Kind regards,

George

 

P.S (Edited): I understand that the purpose of this workflow is to create a list. However what I want as a final output is a CSV/Excel file having as Column Names the attt_names and as Values the Unique attr_values.

Hmm, how about dropping the ListBuilder, putting a Counter after the DuplicateFilter with the counter name set to _attr_name. This will give you a value to group the features by later. The first unique value of each column will be assigned 0, the second 1 and so on.

An attributeCreator set to @Value(_attr_name) : @Value(_attr_value). This will restore the original attribute names so you can use a dynamic writer.

Aggregator group by _count, merge attributes.

Excel Writer, either in dynamic mode or import from reader. Automatic mode will not work.

 

 

You may wish to sort your features before the counter on _attr_value.

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings