Solved

finding duplicates then selecting one based on a value in another attribute

  • September 7, 2017
  • 6 replies
  • 417 views

tnarladni
Enthusiast

I have 3 attributes that could potentially have duplicates: A, B, and C. After finding the duplicates, I have to decide which ones to keep based on attribute D. The scenario is as follows: if A, B, and C are the same, take the record where D is not null. If D is null for both records, take either one. If both records have a value in D, take both, unless the values in D are the same, in which case take just one. Help!
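
For readers who want the rules spelled out, here is a minimal Python sketch of the selection logic described above. It is not an FME workspace; it assumes each record is a plain dict with keys A, B, C, and D, that a null D is represented as None, and the function name `select_records` is made up for illustration.

```python
from collections import defaultdict

def select_records(records):
    """Apply the keep/drop rules to dicts with keys A, B, C, D (hypothetical layout)."""
    # Group records that share the same A, B, C values.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["A"], rec["B"], rec["C"])].append(rec)

    kept = []
    for group in groups.values():
        with_d = [r for r in group if r["D"] is not None]
        if not with_d:
            # D is null for every record in the group: any one will do.
            kept.append(group[0])
            continue
        # Otherwise keep one record per distinct non-null D value.
        seen = set()
        for r in with_d:
            if r["D"] not in seen:
                seen.add(r["D"])
                kept.append(r)
    return kept
```

A record that has no duplicates forms a group of one and passes through unchanged, whether or not its D is null.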

Best answer by erik_jan

If the features do not have a geometry, this could work:

Use an Aggregator, group by A, B, C, creating a list for all other attributes.

Use the ListSorter on D (to move the object with a D value to the top of the list).

Then use the ListIndexer (index 0) and ListRemover to keep the first object in the list.

If you do have geometries, I would first use the GeometryExtractor to save the geometry in an attribute, then reverse that at the end using the GeometryReplacer.
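
The group, sort, and keep-first idea can be illustrated outside FME with a short Python sketch. This only mirrors the logic of the steps above, not the transformers themselves; the dict-based record layout, the use of None for a null D, and the name `keep_first_per_group` are assumptions for illustration.

```python
from collections import defaultdict

def keep_first_per_group(records):
    """Group by A, B, C; move records with a D value to the front; keep the first."""
    # Rough equivalent of grouping by A, B, C and collecting the rest into a list.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["A"], rec["B"], rec["C"])].append(rec)

    result = []
    for group in groups.values():
        # Stable sort: records with a non-null D come first (False sorts before True).
        ordered = sorted(group, key=lambda r: r["D"] is None)
        # Keep only the element at index 0 of the sorted list.
        result.append(ordered[0])
    return result
```

As written, this sketch returns exactly one record per A, B, C group, so a group whose records carry different D values would need extra handling if both are to be kept.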


6 replies

erik_jan
Contributor
  • Contributor
  • 2179 replies
  • Best Answer
  • September 7, 2017



tnarladni
Enthusiast
  • Author
  • Enthusiast
  • 88 replies
  • September 7, 2017

Thanks @erik_jan! That worked beautifully. I was messing with the DuplicateFilter and Matcher, all without success. Here's my final workspace in case anyone happens upon this question.


takashi
Celebrity
  • 7843 replies
  • September 7, 2017

Hi @tnarladni, if I understood the requirement correctly, the DuplicateFilter can be used like this.

Note: This assumes D always stores either a non-empty value or null. If D could hold an empty string or be missing, a minor change would be necessary, depending on how empty and missing values should be treated.


Edited the workflow (screenshot).
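
The workflow screenshot is not reproduced here, but the point in the note about empty and missing values can be shown with a small Python sketch. It assumes records are dicts and that one chooses to treat an empty string or a missing D the same as null; that choice, and the helper name `normalize_d`, are illustrative assumptions rather than part of the original workflow.

```python
def normalize_d(record):
    """Return D, mapping a missing key or an empty string to None (an assumed policy)."""
    value = record.get("D")  # a missing attribute yields None
    if value is None or value == "":
        return None
    return value

# Usage: normalize before comparing, so all three cases are handled alike.
rows = [{"A": 1}, {"A": 1, "D": ""}, {"A": 1, "D": None}, {"A": 1, "D": "x"}]
normalized = [normalize_d(r) for r in rows]
```

If empty strings should instead count as real values, the `value == ""` check would simply be dropped.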

 

 


tnarladni
Enthusiast
  • Author
  • Enthusiast
  • 88 replies
  • September 11, 2017

Ah...I forgot to sort D first, that was my problem.

 

 


kimo
Contributor
  • Contributor
  • 96 replies
  • September 12, 2017

I have an identical problem, but the second value is a timestamp, so I want to keep the latest.
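
For the timestamp variant, a minimal Python sketch of "keep the latest record per A, B, C" might look like the following. It assumes the timestamp lives in an attribute called `ts` (a hypothetical name) and is stored as an ISO 8601 string; neither detail comes from the thread.

```python
from datetime import datetime

def keep_latest(records, ts_field="ts"):
    """Keep, for each A, B, C combination, the record with the newest timestamp."""
    latest = {}
    for rec in records:
        key = (rec["A"], rec["B"], rec["C"])
        stamp = datetime.fromisoformat(rec[ts_field])
        # Replace the stored record only when this one is strictly newer.
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, rec)
    return [rec for _, rec in latest.values()]
```

The same effect should be achievable by sorting on the timestamp in descending order before removing duplicates on A, B, C, in line with the "sort first" remark earlier in the thread.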