Solved

finding duplicates then selecting one based on a value in another attribute

  • September 7, 2017
  • 6 replies
  • 356 views

tnarladni
Enthusiast

I have three attributes, A, B, and C, that could potentially contain duplicates. After finding the duplicates, I have to decide which records to keep based on attribute D. The scenario is as follows: if A, B, and C are the same, take the record where D is not null. If D is null for both records, take either one. If D has a value in both records, take both, unless the values in D are the same, in which case take just one. Help!


6 replies

erik_jan
Contributor
  • Best Answer
  • September 7, 2017

If the features do not have a geometry, this could work:

Use an Aggregator, group by A, B, C, creating a list for all other attributes.

Use the ListSorter on D (to move the object with a D value to the top of the list).

Then use the ListIndexer (index 0) and ListRemover to keep the first object in the list.

If you do have geometries, I would first use the GeometryExtractor to save the geometry in an attribute, then restore it at the end using the GeometryReplacer.
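
For readers who want to see the logic outside of a workspace, the group, sort, and keep-first idea above can be sketched in plain Python. This is only an illustration, not FME code: the function name `keep_one_per_group` and the dict-based records are assumptions, and a null D is represented as `None`.

```python
def keep_one_per_group(records):
    """Keep one record per (A, B, C), preferring a record whose D is not None."""
    best = {}
    for rec in records:
        key = (rec["A"], rec["B"], rec["C"])
        current = best.get(key)
        # Take the first record seen for the group, and replace it only
        # when it has no D value and the new record does.
        if current is None or (current["D"] is None and rec["D"] is not None):
            best[key] = rec
    return list(best.values())
```

Note that this sketch keeps exactly one record per group, so two records with different D values in the same group would still collapse to one.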


tnarladni
Enthusiast
  • Author
  • September 7, 2017

Thanks @erik_jan! That worked beautifully. I was messing with the DuplicateFilter and Matcher, all without success. Here's my final workspace in case anyone happens upon this question.


takashi
Influencer
  • September 7, 2017

Hi @tnarladni, if I understood the requirement correctly, the DuplicateFilter can be used like this.

Note: This assumes D always stores either a non-empty value or null. If D could store an empty string or could be missing, a minor change would be necessary, depending on how empty and missing values should be treated.
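
The workflow screenshot is not reproduced here, so as a rough stand-in, the selection rule from the question can be sketched in plain Python. This is not the DuplicateFilter workflow itself: the function name `select_records`, the `treat_empty_as_null` flag, and the dict-based records are all assumptions made for illustration. The flag shows one possible way to handle the empty-string case mentioned in the note, and a missing D is read as null.

```python
def select_records(records, treat_empty_as_null=True):
    """Apply the rule from the question to a list of dict records."""
    groups = {}
    for rec in records:
        d = rec.get("D")  # a missing D is read as None
        if treat_empty_as_null and d == "":
            d = None
        key = (rec["A"], rec["B"], rec["C"])
        # Within a group, the first record seen for each distinct D wins.
        groups.setdefault(key, {}).setdefault(d, rec)

    kept = []
    for by_d in groups.values():
        with_value = [r for d, r in by_d.items() if d is not None]
        # Keep one record per distinct non-null D; if the group has
        # only null D values, fall back to the single null record.
        kept.extend(with_value if with_value else [by_d[None]])
    return kept
```

With `treat_empty_as_null=False`, an empty string counts as an ordinary value, which is the other reading the note allows for.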


takashi
Influencer
  • September 7, 2017
Edited the workflow (screenshot).


tnarladni
Enthusiast
  • Author
  • September 11, 2017
Ah...I forgot to sort D first, that was my problem.


kimo
Contributor
  • September 12, 2017
I have an identical problem, but the second value is a timestamp, so I want to keep the latest.
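
For the timestamp variant, one possible sketch in plain Python keeps the record with the latest timestamp per group. The function name `keep_latest` is hypothetical, and it assumes the timestamp sits in D as an ISO 8601 string that `datetime.fromisoformat` can parse; other formats would need their own parsing step.

```python
from datetime import datetime


def keep_latest(records):
    """Keep the record with the latest D timestamp per (A, B, C)."""
    latest = {}
    for rec in records:
        key = (rec["A"], rec["B"], rec["C"])
        ts = datetime.fromisoformat(rec["D"])  # assumes ISO 8601 strings
        # Replace the stored record only when this one is strictly newer.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, rec)
    return [rec for _, rec in latest.values()]
```

In a workspace, the equivalent idea would presumably be to sort on the timestamp in descending order before removing duplicates on A, B, and C, though that has not been tested here.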

