Hi,
The requirement and actual data condition is vague, but the Matcher might help you.
If you explain more concretely the condition using simplified example, we can think of more concrete solution.
Takashi
Sorry for any vagueness, here is a crude example of the data with only the relevant attributes filled in;
As you can see, some of the ParcelIDs are unique, that is fine, but some of them exist more than once and, for those that occur more than once, sometimes the Category code is the same across instances and sometimes it is different (ID 112 is the same but you can see 113's are not all the same - i am concerned where they are all the same).
I have put in a column saying 'duplicate' which is what im aiming to automate across many millions of features.
Actually I have come up with a very crude working using Listbuilder to extract Category grouped by ParcelID then AttributeCreator and a long series of testers and composite testers to see if categories match each other or not. There has to be an easier way as i've used about 12 transformers and still havent got it right!
I thought you are trying to find not-unique pair of Parcel ID and Category code. That is, expected result should be:
Unique:
111 CA
113 GP
114 F
Duplicated:
112 CP
113 GT ??
115 CP
But in your example, "113 GT" is not classified as "Duplicated".
Could you please define the condition where the pair of Parcel ID and Category should be decided as "Duplicated"?
Hi Takashi, no not duplicated pairs, I am looking for duplicates to be identified when all the parcels of a given ID have the same category. If one instance of a parcel ID has a different category to the others, then it is not duplicate.
OK. I think this workflow would be one of ways to do that.
1) Aggregator_1 counts number of records for every identical ParcelID.
-----
Group By: ParcelID
Mode: Attributes Only
Count Attributes: _num1
2) Sampler selects records having unique combination of ParcelID and Category.
-----
Group By: ParcelID, Category
Sampling Type: First N Features
Sampling Amount: 1
3) Aggregator_2 counts number of the combinations for every identical ParcelID. If all the records of an identical PracelID has the same Category code, _num2 will be 1.
-----
Group By: ParcelID
Mode: Attributes Only
Count Attributes: _num2
4) FeatureMerger merges _num1 and _num2 to original records.
-----
Join On: ParcelID
Feature Merge Type: Attribute Only
Process Duplicate Suppliers: Yes *this is important
5) AttributeCreator creates "Duplicate" attribute. If _num1 is greater than 1 and also _num2 is equal to 1, set "yes". Otherwise set "no".
-----
Attribute Name: Duplicate
Value: 2 Possible Values *use Conditional Value setting
If 1 < @Value(_num1) AND @Value(_num2) = 1 Then yes
Else no
The workflow can be replaces with an InlineQuerier.
SQL Query Example:
-----
select *,
case
when 1 < _num1 and _num2 = 1 then 'yes'
else 'no'
end as Duplicate
from Parcel as a
inner join (select ParcelID, count(*) as _num1
from Parcel group by ParcelID) as b
on b.ParcelID = a.ParcelID
inner join (select ParcelID, count(*) as _num2
from (select distinct ParcelID, Category from Parcel) group by ParcelID) as c
on c.ParcelID = a.ParcelID
-----
Thanks very much Takashi, i'll give it a go today
At first inspection the workflow works correctly, thank you very much indeed. S.