Question

How to remove duplicates from a join column.


Badge
  • Contributor
  • 8 replies

I have a dataset which came with duplicates in what I need to be a join column (PID) therefore unique.  How can I filter and remove the duplicate rows based on some rules. In this case I want to keep the consolidated value (NE,NW) in the qtr column and remove the other two rows with the common PID of 176182. (Shown above and below the row I wish to keep.)

 

FME


5 replies

Badge

duplicate filter transformer based on PID

Badge

That has no way to select which of the three gets passed. Thanks anyway.

Badge

That has no way to select which of the three gets passed. Thanks anyway.

hi mdk - if you know what the result is in column 2 you could create a rule for that in test filter. However, if it is random I don't see how you could select that field specifically.

If it is NE,NW then create a rule for that. Otherwise, there is no solution.

Userlevel 4
Badge +36

It looks like there are several ways to distinguish the consolidated row from the other rows with the same pid: the qtr column of the consolidated row has the longest string, also it contains a comma.

Depending on the rest of your data, one of these may be enough to set up your filter.

Userlevel 3
Badge +17

If you want the consolidated value above the unconsolidated value you could for example sort the data so the consolidated go first. Or test on values that have a comma and merge on those first then merge the non-comma-containing values with a second featureMerger to the unmerged requestors.

 

I think an overview of the total workbench could help us lead you to the right direction. Now we don't understand how you get consolidated values in the first place and why you only want to merge the consolidated value.

Reply