Question

Summing similar attributes without transposing

  • January 19, 2015
  • 6 replies
  • 27 views

Hi,

 

I have a CSV containing >250,000 features and around 50 attributes, which I want to sum based on a common property (e.g. a suffix/prefix in the attribute name). I've done this before on smaller datasets by using AttributeExploder to transpose the attributes, adding the common property I want to base the sum on using AttributeValueMapper, aggregating them, and pivoting the output.

 

However, I think that with this many features it will be quite memory hungry. Is there an alternative approach I could take, possibly using SchemaMapper or something similar?

 

Thanks,

 

Ed

6 replies

gio
  • Contributor
  • January 19, 2015
You could explode the attributes and use a regexp to select the similar _attr_name values, then create a cumulative attribute and push _attr_value to a variable using a VariableSetter.

 

(You could create a list of the _attr_name values and use a fuzzy comparer to try to match similar names automatically.)

 

A VariableRetriever is then called whenever the regexp finds a subsequent similar attribute.

 

 

You would be progressively summing attributes.

  • Author
  • January 19, 2015
Hi Gio,

 

Thanks for your response.

 

I was hoping to avoid using AttributeExploder, as with >250,000 features it'll create >10,000,000 features. I was hoping to be able to map a prefix to each attribute name using SchemaMapper and then aggregate those with the same prefix, without having to transpose the attributes (exploding).

 

I'm sure there must be a way that doesn't require generating unnecessary features.

 

Ed
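
For illustration only, the non-transposing idea described above can be sketched in plain Python rather than with FME transformers. This is a minimal sketch under stated assumptions: the group is taken to be the text before the first underscore in the attribute name, and the column names (`id`, `a_2010`, and so on) and sample values are made up. Each row is read and summed on its own, so no extra features are generated.

```python
import csv
import io
from collections import defaultdict


def sum_by_prefix(row, key_fields=("id",), sep="_"):
    """Sum the numeric columns of one row, grouped by attribute-name prefix.

    Assumption: the prefix is the text before the first `sep`.
    """
    totals = defaultdict(float)
    for name, value in row.items():
        if name in key_fields or value in ("", None):
            continue  # skip identifier columns and empty cells
        prefix = name.split(sep, 1)[0]
        totals[prefix] += float(value)
    return dict(totals)


# Hypothetical sample data standing in for the real CSV.
sample = io.StringIO(
    "id,a_2010,a_2011,b_2010,b_2011\n"
    "1,1,2,10,20\n"
    "2,3,4,30,\n"
)

results = []
for row in csv.DictReader(sample):  # rows are streamed one at a time
    out = {"id": row["id"]}
    out.update(sum_by_prefix(row))
    results.append(out)

print(results)
```

Memory use here depends on the width of a single row rather than on the number of rows, which is the property being asked for.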

gio
  • Contributor
  • January 19, 2015
I see.

 

 

Well, it seems to me you can make a schema using a SchemaMapper, then sum the attributes with an Aggregator, grouping by schema name.

 

Have you tried it yet?

 

50 attributes is not that many, but you could create the schema map using a regular expression search.

 

 

Alternatively, maybe you can use a BulkAttributeRenamer with Regular Expression Replace. But then you would have to make as many BulkAttributeRenamers as there are groups.
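
As a rough sketch of the "schema map from a regular expression search" idea, here is how the mapping could look in plain Python. It is not the SchemaMapper or BulkAttributeRenamer themselves; the group names, patterns, and attribute names below are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical patterns: one regular expression per output group.
GROUP_PATTERNS = {
    "pop_total": re.compile(r"^pop_"),
    "area_total": re.compile(r"_area$"),
}


def build_schema_map(attribute_names, patterns=GROUP_PATTERNS):
    """Return {group: [attribute names]} for names matching each pattern."""
    schema_map = defaultdict(list)
    for name in attribute_names:
        for group, pattern in patterns.items():
            if pattern.search(name):
                schema_map[group].append(name)
                break  # first matching group wins
    return dict(schema_map)


def sum_groups(record, schema_map):
    """Add one summed value per group to a copy of the record."""
    out = dict(record)
    for group, names in schema_map.items():
        out[group] = sum(record[n] for n in names)
    return out


record = {"pop_male": 10, "pop_female": 12, "urban_area": 2.5, "rural_area": 7.5}
schema_map = build_schema_map(record.keys())
summed = sum_groups(record, schema_map)
print(schema_map)
print(summed["pop_total"], summed["area_total"])
```

The map is built once from the attribute names and then reused for every record, so only one pattern table is needed however many groups there are.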

  • Author
  • January 20, 2015
Yeah, I've got it working with smaller datasets (as explained in the question). The problem with that, though, is that you can't aggregate the attributes without transposing them (as far as I can tell, anyway).

 

 

gio
  • Contributor
  • January 20, 2015
Hi,

 

 

Another possibility might be to use a ListPopulator followed by a ListIndexer and sum the items at the same index (the same record, basically).

 

 

You create a common prefix, and the populator creates a list based on this prefix.

 

 

Of course, you would then have to create a list, which you also might not want?
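
The list idea can be sketched in plain Python as follows. This is only an analogy for the transformers mentioned, under the assumption that the attributes to combine are named with a common prefix followed by a number; the names `rain_1`, `temp_1`, and the values are invented.

```python
import re


def populate_list(record, prefix):
    """Collect values of attributes named <prefix><index>, ordered by index."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    indexed = []
    for name, value in record.items():
        match = pattern.match(name)
        if match:
            indexed.append((int(match.group(1)), value))
    return [value for _, value in sorted(indexed)]


# Hypothetical record with two attribute families.
record = {"id": 7, "rain_1": 4.0, "rain_2": 6.0, "rain_10": 1.5, "temp_1": 20.0}
rain = populate_list(record, "rain_")
print(rain, sum(rain))
```

The list lives on the record itself, so the sum is still computed per feature without creating extra features.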

 

 

  • Author
  • January 20, 2015
That sounds like a good idea, cheers Gio