
I am trying to create a workflow to normalize a table with potentially thousands of columns while maintaining the id column. This would be an easy solution if the AttributeExploder allowed you to specify a column (or columns) to keep unexploded.

Since the transformer does not allow this, the workaround I have come up with is to use the AttributeExploder with its "keep all attributes" option and then an AttributeKeeper to keep only the "id", "_attr_name", and "_attr_value" columns. This generally solves the problem, but because there are thousands of columns, the "keep all attributes" option makes the workbench run dramatically slower (it goes from taking 2 hours to complete to 40 hours). Is there a workaround that solves this problem within a more reasonable processing time?

 

Can you use the AttributeKeeper before the exploder?


Hi @adriano_n90, if you need to explode all the thousands of columns into individual features but keep only the attribute "id", this procedure could perform better.

  1. AttributeExploder (Exploding Type: List, Keep Attributes: Yes)
  2. AttributeKeeper (Attributes to Keep: id, Lists to Keep: _attr_list{}._attr_name, _attr_list{}._attr_value)
  3. ListExploder (List Attribute: _attr_list{})

If you are familiar with Python scripting, the PythonCaller could be a better alternative.
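As a rough illustration of the Python route, here is a minimal sketch of the explode step in plain Python, outside of FME. The function name `explode_row` and the sample record are hypothetical; the output keys `_attr_name` and `_attr_value` simply mirror the attribute names used in this thread.

```python
from typing import Any, Dict, Iterator


def explode_row(row: Dict[str, Any], id_key: str = "id") -> Iterator[Dict[str, Any]]:
    """Yield one {id, _attr_name, _attr_value} record per non-id column."""
    row_id = row[id_key]
    for name, value in row.items():
        if name == id_key:
            # The id is carried onto every output record, never exploded itself.
            continue
        yield {id_key: row_id, "_attr_name": name, "_attr_value": value}


# Hypothetical wide record with an id and two data columns.
wide = {"id": 7, "col_a": "x", "col_b": 3}
long_rows = list(explode_row(wide))
```

Inside a PythonCaller, the same loop would presumably read the names from `feature.getAllAttributeNames()` and the values from `feature.getAttribute(name)`, and output one new feature per pair. Because each output carries only three attributes, the cost of copying thousands of attributes onto every exploded feature is avoided. Format attributes such as those prefixed with `fme_` would likely need to be skipped as well.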


In any case, "exploder" transformers generally do not perform well. If performance is critical, it might be better to redesign the entire workflow so that it uses neither the AttributeExploder nor the ListExploder.

 


Thank you so much takashi! This solution worked flawlessly and the performance was 10x better than the workflow I posted above.

 

 


Can you use the AttributeKeeper before the exploder?

Hi Matt, no, because I need to maintain the relationship between each column and its id, so I would need to keep all the attributes going into the exploder.

 

 

