Sorter vs Statistical Calculator

I need to remove duplicates from datasets of varying sizes. Would it be more efficient to use a Sorter with a Sampler, or a StatisticsCalculator with a Tester?


I would use the Sampler with Group Processing set to whichever attribute identifies your duplicates. There is no need to use the Sorter before it. Alternatively, you can use the DuplicateFilter.

Didn't even think of that! That really helps keep things simple!

I would avoid both the Sorter and the StatisticsCalculator, as they are both blocking and can consume a lot of memory for larger datasets.

As @dustin mentions, either the Sampler with a Group-By, or the DuplicateFilter (my personal recommendation) will work.

@bibold I would agree with @david_r - try the DuplicateFilter first. It should give better performance than sorting.
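
For anyone wondering why the blocking behaviour matters, here is a rough Python sketch of the two approaches. It is not how FME implements these transformers internally; the function names and the sample `rows` are made up purely for illustration. The streaming version only has to remember the key values it has already seen and can pass each unique record on immediately, while the sort-based version has to hold every record before it can output anything.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def dedupe_streaming(records: Iterable[Record], key: str) -> Iterator[Record]:
    """Yield the first record for each key value as soon as it arrives.

    Hypothetical stand-in for a streaming duplicate filter: only the
    key values are kept in memory, not the whole records.
    """
    seen = set()
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            yield record


def dedupe_by_sorting(records: Iterable[Record], key: str) -> List[Record]:
    """Sort everything first, then keep the first record of each group.

    Hypothetical stand-in for a sort-then-sample approach: sorted()
    must read every record before producing output (blocking).
    """
    ordered = sorted(records, key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Made-up sample data; "id" plays the role of the duplicate attribute.
rows = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 3, "name": "c-dup"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a-dup"},
]

streamed = list(dedupe_streaming(rows, "id"))
sorted_unique = dedupe_by_sorting(rows, "id")

print([r["id"] for r in streamed])
print([r["id"] for r in sorted_unique])
```

Both versions keep the first record seen for each `id`. The streaming one preserves the input order, and the sort-based one returns the records in key order.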
