Solved

Passing a table to the attributeExploder but keeping specific attributes unexploded


adriano

I am trying to create a workflow to normalize a table with potentially thousands of columns while maintaining the id column. This would be an easy solution if the AttributeExploder allowed you to specify a column (or columns) to keep unexploded.

Since this transformer does not allow this, the workaround I have come up with is to use the AttributeExploder with the "keep all attributes" option enabled and then use an AttributeKeeper to keep only the "id", "_attr_name", and "_attr_value" columns. This generally solves the problem, but because there are thousands of columns, enabling "keep all attributes" makes the workbench take roughly 20 times longer to complete (it goes from 2 hours to 40 hours). Is there a workaround that solves this problem within a more reasonable processing time?

 

Best answer by takashi

Hi @adriano_n90, if you need to explode all of the thousands of columns into individual features while keeping only the "id" attribute, this procedure may perform better.

  1. AttributeExploder (Exploding Type: List, Keep Attributes: Yes)
  2. AttributeKeeper (Attributes to Keep: id, Lists to Keep: _attr_list{}._attr_name, _attr_list{}._attr_value)
  3. ListExploder (List Attribute: _attr_list{})

If you are familiar with Python scripting, the PythonCaller could be a better alternative.
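
The thread does not spell out the PythonCaller route, so here is only a minimal sketch of the reshaping logic in plain Python, not a tested FME script. It assumes each input row can be treated as a dict of attribute names to values; the function name `explode_row` and the `keep` parameter are hypothetical. Inside a PythonCaller the same loop would read attributes from the incoming feature and output one new feature per record, which is not shown here.

```python
from typing import Any, Dict, Iterator, Tuple


def explode_row(row: Dict[str, Any], keep: Tuple[str, ...] = ("id",)) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-kept attribute, carrying only the kept ones along.

    `row` models one wide table row as a dict (an assumption for illustration).
    """
    # Copy only the attributes that must survive on every output record.
    kept = {name: row[name] for name in keep if name in row}
    for name, value in row.items():
        if name in keep:
            continue  # the id column itself is not exploded
        # Each output record is narrow: id + attribute name + attribute value.
        yield {**kept, "_attr_name": name, "_attr_value": value}


if __name__ == "__main__":
    wide_row = {"id": 7, "col_a": 1, "col_b": "x"}
    for record in explode_row(wide_row):
        print(record)
```

The point of the design is that each output record carries three attributes instead of thousands, which is the same saving the AttributeKeeper step gives in the list-based procedure above.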


5 replies

fmelizard
Contributor
  • Contributor
  • March 7, 2018

Can you use the AttributeKeeper before the exploder?


takashi
Contributor
  • Contributor
  • March 7, 2018

Anyway, "exploder" transformers generally do not perform well. If performance is critical, it might be better to redesign the entire workflow so that it uses neither the AttributeExploder nor the ListExploder.

 


adriano
  • Author
  • March 8, 2018

Thank you so much takashi! This solution worked flawlessly and the performance was 10x better than the workflow I posted above.

 

 


adriano
  • Author
  • March 8, 2018
fmelizard wrote:

Can you use the AttributeKeeper before the exploder?

Hi Matt, no, because I need to maintain the relationship between every column and the id so that I know which values belong to which id. That means I would need to keep all the attributes.

 

 

