Solved

Sorter vs Statistical Calculator

  • November 9, 2022


I need to remove duplicates from datasets of varying size. Would it be more efficient to use a Sorter with a Sampler, or a StatisticsCalculator with a Tester?

4 replies

dustin
Influencer
  • Best Answer
  • November 9, 2022

I would use the Sampler with Group Processing enabled and the Group By set to the attribute(s) that define a duplicate, keeping the first feature of each group. There is no need to put a Sorter before it. Alternatively, you can use the DuplicateFilter.
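
To make the group-by Sampler idea concrete outside FME, here is a minimal plain-Python model. It assumes each feature is a dict and that the duplicate key is the tuple of values of the chosen attributes; `sample_first_n_per_group`, `key_attrs`, and the attribute names are hypothetical and not part of FME's API. It keeps the first `n` features seen per key (with `n=1` meaning "one feature per duplicate value") and emits each kept feature as soon as it arrives, so no sort is needed beforehand.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Feature = Dict[str, object]


def sample_first_n_per_group(
    features: Iterable[Feature], key_attrs: List[str], n: int = 1
) -> Iterator[Feature]:
    """Yield at most n features per distinct key, in arrival order.

    Hypothetical model of a group-by Sampler; not FME's implementation.
    """
    # Count of features already emitted for each key.
    seen: Dict[Tuple[object, ...], int] = {}
    for feature in features:
        # Build the duplicate key from the chosen attributes.
        key = tuple(feature.get(a) for a in key_attrs)
        count = seen.get(key, 0)
        if count < n:
            seen[key] = count + 1
            # Emitted immediately: nothing waits for the end of the input.
            yield feature


rows = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 1, "name": "A-dup"},
]
print([r["name"] for r in sample_first_n_per_group(rows, ["id"])])
```

With `n=1` the repeated `id` 1 is dropped; raising `n` would keep up to that many features per group instead.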


  • Author
  • November 9, 2022

Didn't even think of that! That really helps keep things simple!


david_r
Celebrity
  • November 10, 2022

I would avoid both the Sorter and the StatisticsCalculator. Both are blocking transformers, meaning they hold on to the incoming features until the input ends, so they can consume a lot of memory on larger datasets.

As @dustin mentions, either the Sampler with a Group By or the DuplicateFilter (my personal recommendation) will work.
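
A DuplicateFilter-style pass can be modelled as a set of keys already seen, with each feature routed to a "unique" or "duplicate" stream as it arrives. This is a sketch, assuming dict features and hypothetical names (`duplicate_filter`, `key_attrs`); it is not FME's code. What it illustrates is that, in this model, only the keys are retained, so memory grows with the number of distinct keys rather than with the whole dataset, and output starts before the input ends.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Feature = Dict[str, object]


def duplicate_filter(
    features: Iterable[Feature], key_attrs: List[str]
) -> Iterator[Tuple[str, Feature]]:
    """Tag each feature 'unique' (first time its key is seen) or 'duplicate'.

    Hypothetical model of a streaming duplicate filter, not FME's code.
    Only the keys are stored, never the features themselves.
    """
    seen: Set[Tuple[object, ...]] = set()
    for feature in features:
        key = tuple(feature.get(a) for a in key_attrs)
        if key in seen:
            yield "duplicate", feature
        else:
            seen.add(key)
            yield "unique", feature


rows = [{"x": 1, "y": 2}, {"x": 1, "y": 2}, {"x": 3, "y": 4}]
for port, f in duplicate_filter(rows, ["x", "y"]):
    print(port, f)
```

This prints the first and third rows as `unique` and the second as `duplicate`. Set membership is an expected O(1) lookup, so the whole pass is roughly linear in the number of features.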


  • November 10, 2022

@bibold I would agree with @david_r: try the DuplicateFilter first. It should give better performance than sorting.
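
For contrast, a sort-based approach (roughly what a Sorter followed by a step that keeps the first feature of each run would do) has to collect every feature before it can emit anything, and sorting costs O(n log n) comparisons, against roughly O(n) expected time for a hash-set filter. The sketch below is a hypothetical illustration in plain Python (`dedupe_by_sorting` and `key_attrs` are made-up names), assuming dict features whose key values are mutually comparable.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Feature = Dict[str, object]


def dedupe_by_sorting(features: List[Feature], key_attrs: List[str]) -> List[Feature]:
    """Keep one feature per key by sorting, then taking the first of each run.

    Hypothetical illustration of the Sorter-based approach. sorted() must
    materialise the whole input before returning, which is what makes this
    blocking, and it costs O(n log n) comparisons.
    """
    def key(f: Feature) -> Tuple[object, ...]:
        return tuple(f.get(a) for a in key_attrs)

    # sorted() is stable, so features with equal keys keep their input order.
    ordered = sorted(features, key=key)
    # groupby() splits the sorted list into runs of equal keys; keep the first of each.
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b2"}]
print([r["v"] for r in dedupe_by_sorting(rows, ["id"])])
```

Note two side effects of this design: the output comes back in key order rather than arrival order, and the key values must be orderable (mixing, say, `None` and integers in one attribute would raise a `TypeError`), whereas a set-based filter only needs hashable keys.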