Solved

Matrix of all unique values per attribute field of a dataset

  • 31 January 2018
  • 5 replies
  • 10 views


Hi,

I have a new source with poor metadata. I would like to create a quick overview of the dataset by producing a matrix that lists each attribute field together with all the unique values found in that field. With an Aggregator or a DuplicateFilter I can easily do this for one field. But I have close to 100 fields, so there must be a smarter way...

Thanks for any suggestions!


Best answer by nielsgerrits 31 January 2018, 10:57


5 replies


If you use a Sampler, set it to pass only the first feature, and group by all the fields you want to check, it should output all unique attribute combinations for those fields. Not quite what you're looking for, but getting there.
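
To illustrate why this gets close but not all the way, here is a minimal Python sketch of the same logic (keep the first row of each group, grouping on every field). It is not FME workspace code; the sample rows and field names are made up for the example.

```python
# Hypothetical sample data standing in for the dataset's features.
rows = [
    {"type": "road", "status": "open"},
    {"type": "road", "status": "closed"},
    {"type": "path", "status": "open"},
    {"type": "road", "status": "open"},  # repeats the first combination
]
fields = ["type", "status"]

# Keep only the first row seen for each distinct combination of values,
# which mirrors "pass the first feature per group".
first_per_group = {}
for row in rows:
    key = tuple(row[f] for f in fields)
    first_per_group.setdefault(key, row)

unique_combinations = list(first_per_group)
```

The result is one entry per combination of values across the fields, not one list of unique values per field, so with many fields and millions of rows the output can stay close to the size of the input.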


Thanks! I was thinking of something similar with the Aggregator, but indeed that is not quite the answer yet. I need to remove the duplicates at field level, as there are a few million rows...

I am currently thinking along the lines of "Compute histograms" within the StatisticsCalculator transformer, but I have not figured out how yet...

Use an AttributeExploder to create a feature for each cell, then aggregate by _attribute_name.
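
As a rough illustration of the idea outside FME, the same explode-then-aggregate logic can be sketched in plain Python. The function names and sample rows below are hypothetical, and the sketch assumes all attribute values are hashable.

```python
from collections import defaultdict


def explode(rows):
    """Yield one (attribute name, value) pair per cell of every row."""
    for row in rows:
        for name, value in row.items():
            yield name, value


def unique_values_per_field(rows):
    """Group the exploded cells by attribute name, keeping each value once."""
    seen = defaultdict(dict)  # inner dict keeps first-seen order of values
    for name, value in explode(rows):
        seen[name][value] = None
    return {name: list(values) for name, values in seen.items()}


# Hypothetical sample data standing in for the dataset's features.
rows = [
    {"type": "road", "status": "open"},
    {"type": "road", "status": "closed"},
    {"type": "path", "status": "open"},
]
overview = unique_values_per_field(rows)
```

Each field ends up with its own list of distinct values, independent of the other fields, which is the per-field overview asked for in the question.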


Yes! This is exactly what I wanted. Thanks!