Question

Summing similar attributes without transposing

  • 19 January 2015
  • 6 replies
  • 1 view

Hi,

 

I have a CSV containing >250,000 features and around 50 attributes, which I want to sum based on a common property (e.g. a suffix/prefix in the attribute name). I've done this before on smaller datasets by using AttributeExploder to transpose the attributes, adding the common property I want to base the sum on using AttributeValueMapper, aggregating them, and pivoting the output.

 

However, I think with this many attributes it will be quite memory hungry. Is there an alternative approach I could take, possibly using SchemaMapper or something?

 

Thanks,

 

Ed

6 replies

Badge +3
You could use an AttributeExploder, use a regexp to select the similar _attr_name values, and then create an accumulating attribute by pushing _attr_value to a variable with a VariableSetter.

 

(You could create a list of the _attr_name values and use a fuzzy comparer to try to match similar names automatically.)

 

A VariableRetriever is then called whenever the regexp finds subsequent similar attributes.

 

 

You would be progressively summing attributes.
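
The progressive summing described above can be sketched outside FME in plain Python. This is only an illustration, not FME's API: the (name, value) pairs stand in for the _attr_name / _attr_value records coming out of an AttributeExploder, the dictionary of running totals plays the role of the variables, and the `GROUP_PATTERN` regex and the sample names are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical pattern: group attributes by the letters before the first underscore.
GROUP_PATTERN = re.compile(r"^([A-Za-z]+)_")

def accumulate(records):
    """Progressively sum values whose names share a regex-matched prefix."""
    totals = defaultdict(float)
    for name, value in records:
        match = GROUP_PATTERN.match(name)
        if match is None:
            continue  # this name does not belong to any group
        totals[match.group(1)] += float(value)
    return dict(totals)

# Made-up exploded records for a single feature.
exploded = [("pop_2010", "10"), ("pop_2011", "12"), ("area_a", "1.5"), ("id", "7")]
print(accumulate(exploded))
```

Each record only updates a running total, so nothing has to be pivoted back afterwards, but the records themselves still have to be generated first.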
Hi Gio,

 

Thanks for your response.

 

I was hoping to avoid using AttributeExploder, as with >250,000 features it'll create >10,000,000 features. I was hoping to be able to map a prefix to each attribute name using SchemaMapper and then aggregate those with the same prefix, without having to transpose (explode) the attributes.

 

I'm sure there must be a way that doesn't require generating unnecessary features.

 

Ed
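
To make the goal concrete, here is a rough sketch in plain Python (not an FME workflow) of summing by prefix one record at a time, so no extra features are created. The column names, the `id` key column, and the rule "prefix = text before the first underscore" are all assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

def prefix_of(column):
    """Hypothetical grouping rule: the text before the first underscore."""
    return column.split("_", 1)[0]

def sum_by_prefix(rows, key_column):
    """Yield one output row per input row, holding one total per prefix."""
    for row in rows:
        totals = defaultdict(float)
        for column, value in row.items():
            if column == key_column or value in ("", None):
                continue  # skip the key and empty cells
            totals[prefix_of(column)] += float(value)
        yield {key_column: row[key_column], **totals}

# Made-up sample data; a real run would pass an open file to csv.DictReader.
sample = io.StringIO(
    "id,pop_2010,pop_2011,area_a,area_b\n"
    "1,10,12,1.5,2.5\n"
    "2,3,4,0.5,\n"
)
result = list(sum_by_prefix(csv.DictReader(sample), key_column="id"))
print(result)
```

Because the generator handles one row at a time, memory use depends on the number of attributes per record rather than on the number of records.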
Badge +3
I see.

 

 

Well, it seems to me you can make a schema using a SchemaMapper, then sum the attributes by grouping on the schema names with an Aggregator.

 

Did you not try it yet?

 

50 attributes is not that many, but you could create the schema map using a regular expression search.

 

 

Alternatively, maybe you can use a BulkAttributeRenamer with Regular Expression Replace. But then you would have to add as many BulkAttributeRenamers as there are groups.
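
Building the schema map with a regular expression search could look something like the sketch below. It is plain Python rather than anything FME-specific, and the rules, group names, and attribute names are made up for the example.

```python
import re

# Hypothetical rules: (regular expression, group name). The first match wins.
RULES = [
    (re.compile(r"^pop_"), "population"),  # prefix rule
    (re.compile(r"_area$"), "area"),       # suffix rule
]

def build_schema_map(attribute_names):
    """Map each attribute name to a group name; unmatched names are left out."""
    schema_map = {}
    for name in attribute_names:
        for pattern, group in RULES:
            if pattern.search(name):
                schema_map[name] = group
                break
    return schema_map

names = ["pop_2010", "pop_2011", "north_area", "south_area", "id"]
schema_map = build_schema_map(names)
print(schema_map)
```

One table of rules covers every group, so there is no need for a separate renaming step per group.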
Yeah, I've got it working with smaller datasets (as explained in the question). The problem with that, though, is that you can't aggregate the attributes without transposing them (as far as I can tell, anyway).

 

 
Badge +3
Hi,

 

 

Another possibility might be to use a ListPopulator followed by a ListIndexer and sum the items at the same index (the same record, basically).

 

 

You create a common prefix, and the populator creates a list based on this prefix.

 

 

Of course, you would then have to create a list, which you might not want either.
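
As a rough picture of the list idea, here is a plain Python sketch, not the ListPopulator itself. The prefix rule (text before the first underscore) and the sample record are assumptions.

```python
from collections import defaultdict

def lists_by_prefix(record):
    """Collect one record's values into a list per prefix.

    Hypothetical rule: the prefix is the text before the first underscore.
    """
    lists = defaultdict(list)
    for name, value in record.items():
        prefix, separator, _ = name.partition("_")
        if separator:  # only names that actually contain a prefix
            lists[prefix].append(float(value))
    return dict(lists)

# Made-up record standing in for a single feature.
record = {"id": "1", "pop_2010": "10", "pop_2011": "12", "area_a": "1.5"}
lists = lists_by_prefix(record)
sums = {prefix: sum(values) for prefix, values in lists.items()}
print(sums)
```

In this sketch the lists only exist for the record being processed, so the extra memory is limited to that record's values.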

 

 
That sounds like a good idea, cheers Gio
