Skip to main content
Solved

Remove duplicate entries

  • October 27, 2016
  • 4 replies
  • 35 views

Forum|alt.badge.img

Dear FME community,

I have a dataset which has for example the following two attributes:

- grundbescheid_nr

- bescheid_nr

If there are two identical rows with the same value in grundbescheid_nr (e.g. grundbescheid_nr= 183101), then only the entry with the higher number in bescheid_nr (in the example bescheid_nr = 186754) should be preserved. The entry marked with the blue background (bescheid_nr = 185582) and it's geometry should be deleted. The lower one(s) should therefore be eliminated.

I suppose I could do it with a ListBuilder somehow, but I'm not yet sure how to do it exactly. Is there an easy way to solve this?

Kind regards

Thomas

Best answer by david_r

You can use a Sorter and sort descending on bescheid_nr. Then use a DuplicateFilter (was called DuplicateRemover before FME 2016) on grundbescheid_nr and only take the unique record.

This works since the unique output port will always output the first unique value entering.

This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

4 replies

david_r
Celebrity
  • Best Answer
  • October 27, 2016

You can use a Sorter and sort descending on bescheid_nr. Then use a DuplicateFilter (was called DuplicateRemover before FME 2016) on grundbescheid_nr and only take the unique record.

This works since the unique output port will always output the first unique value entering.


Forum|alt.badge.img
  • Author
  • October 27, 2016

You can use a Sorter and sort descending on bescheid_nr. Then use a DuplicateFilter (was called DuplicateRemover before FME 2016) on grundbescheid_nr and only take the unique record.

This works since the unique output port will always output the first unique value entering.

Wonderful, that works perfectly! Many thanks.

 


takashi
Celebrity
  • October 28, 2016

Hi @thomask, I also use the combination of Sorter and DuplicateFilter usually in a similar case, but it's not the only way. For example, the InlineQuerier with this SQL statement could also be effective. FYI.

select a.* from features as a
inner join (select grundbescheid_nr, max(bescheid_nr) as maxBescheidNr
    from features group by grundbescheid_nr) as b
on a.grundbescheid_nr = b.grundbescheid_nr and a.bescheid_nr = b.maxBescheidNr

Forum|alt.badge.img
  • Author
  • October 28, 2016

Thank you very much @takashi! This look good, but I'm not so experienced in SQL. It's been a while since we had it in our lectures :-)