Archived

FeatureMerger - Joining synchronously

Related products: Transformers
  • February 27, 2017


Similar to this idea (https://knowledge.safe.com/content/idea/19290/two-input-port-ordered-by-group-asynchronously.html), which I still really need in order to optimise many workspaces, it would be good if the FeatureMerger could handle Join On fields that come in synchronously, that is, with matching Requester and Supplier features arriving together rather than grouped as all Suppliers, then all Requesters. (Note: Join On, not Group By. Assume Group By is empty for this example.)

Currently, suppose I have this input:

Requester_id = 1

....Supplier_id = 1

Requester_id = 2

....Supplier_id = 2

With this input, FeatureMerger will cache both the Requester and Supplier features entirely before doing the merges. That is a massive problem for anything larger than a medium-sized dataset, and it makes quite a few workspaces effectively impossible.

I guess a cardinality parameter like the one DatabaseJoiner has could be used, but that's optional.

This would make a lot of workspaces that handle large datasets actually workable. Currently I spend a lot of time splitting datasets up in overly complex ways so that I can process them in chunks and work around memory limits.
