Solved

Finding matching text - grouped by attribute

11 years ago
March 7, 2014
8 replies
44 views

shemnel
25 replies

Hi there, I thought this seemed simple but i've become a bit stuck:

I have x features (many millions but im running a testset of 20k) and they all have an Parcel number. Each feature has an associated Category (one of eight) that is made up of 1 to 3 letters.

Now, each parcel number does not necessarily occur just once; there could be up to 9 instances of each parcel number and each instance may, or may not, have the same Category code.

I wanted to test the Category attribute - grouped by Parcel - to see if there was a complete set of Category duplicates ('Yes') or if the Category differed at all ('No') for a given parcel.

(Bonus qu: In some cases where there are 6 instances of a Parcel and there are 3 different categories, it is clear to see that the 3 different categories have been duplicated by mistake - because there are associated identical areas - does anyone know how to clean those out??)

thanks all

Best answer by takashi

OK. I think this workflow would be one of ways to do that.

1) Aggregator_1 counts number of records for every identical ParcelID.

-----

Group By: ParcelID

Mode: Attributes Only

Count Attributes: _num1

2) Sampler selects records having unique combination of ParcelID and Category.

-----

Group By: ParcelID, Category

Sampling Type: First N Features

Sampling Amount: 1

3) Aggregator_2 counts number of the combinations for every identical ParcelID. If all the records of an identical PracelID has the same Category code, _num2 will be 1.

-----

Group By: ParcelID

Mode: Attributes Only

Count Attributes: _num2

4) FeatureMerger merges _num1 and _num2 to original records.

-----

Join On: ParcelID

Feature Merge Type: Attribute Only

Process Duplicate Suppliers: Yes *this is important

5) AttributeCreator creates "Duplicate" attribute. If _num1 is greater than 1 and also _num2 is equal to 1, set "yes". Otherwise set "no".

-----

Attribute Name: Duplicate

Value: 2 Possible Values *use Conditional Value setting

If 1 < @Value(_num1) AND @Value(_num2) = 1 Then yes

Else no

View original

Did this help you find an answer to your question?

This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

takashi
7683 replies
11 years ago
March 9, 2014

Hi,

The requirement and actual data condition is vague, but the Matcher might help you.

If you explain more concretely the condition using simplified example, we can think of more concrete solution.

Takashi

shemnel
Author
25 replies
11 years ago
March 9, 2014

Sorry for any vagueness, here is a crude example of the data with only the relevant attributes filled in;

As you can see, some of the ParcelIDs are unique, that is fine, but some of them exist more than once and, for those that occur more than once, sometimes the Category code is the same across instances and sometimes it is different (ID 112 is the same but you can see 113's are not all the same - i am concerned where they are all the same).

I have put in a column saying 'duplicate' which is what im aiming to automate across many millions of features.

Actually I have come up with a very crude working using Listbuilder to extract Category grouped by ParcelID then AttributeCreator and a long series of testers and composite testers to see if categories match each other or not. There has to be an easier way as i've used about 12 transformers and still havent got it right!

takashi
7683 replies
11 years ago
March 9, 2014

I thought you are trying to find not-unique pair of Parcel ID and Category code. That is, expected result should be:

Unique:

111 CA

113 GP

114 F

Duplicated:

112 CP

113 GT ??

115 CP

But in your example, "113 GT" is not classified as "Duplicated".

Could you please define the condition where the pair of Parcel ID and Category should be decided as "Duplicated"?

shemnel
Author
25 replies
11 years ago
March 9, 2014

Hi Takashi, no not duplicated pairs, I am looking for duplicates to be identified when all the parcels of a given ID have the same category. If one instance of a parcel ID has a different category to the others, then it is not duplicate.

takashi
7683 replies
Best Answer
11 years ago
March 10, 2014

OK. I think this workflow would be one of ways to do that.

1) Aggregator_1 counts number of records for every identical ParcelID.

-----

Group By: ParcelID

Mode: Attributes Only

Count Attributes: _num1

2) Sampler selects records having unique combination of ParcelID and Category.

-----

Group By: ParcelID, Category

Sampling Type: First N Features

Sampling Amount: 1

3) Aggregator_2 counts number of the combinations for every identical ParcelID. If all the records of an identical PracelID has the same Category code, _num2 will be 1.

-----

Group By: ParcelID

Mode: Attributes Only

Count Attributes: _num2

4) FeatureMerger merges _num1 and _num2 to original records.

-----

Join On: ParcelID

Feature Merge Type: Attribute Only

Process Duplicate Suppliers: Yes *this is important

5) AttributeCreator creates "Duplicate" attribute. If _num1 is greater than 1 and also _num2 is equal to 1, set "yes". Otherwise set "no".

-----

Attribute Name: Duplicate

Value: 2 Possible Values *use Conditional Value setting

If 1 < @Value(_num1) AND @Value(_num2) = 1 Then yes

Else no

takashi
7683 replies
11 years ago
March 10, 2014

The workflow can be replaces with an InlineQuerier.

SQL Query Example:

-----

select *,

case

when 1 < _num1 and _num2 = 1 then 'yes'

else 'no'

end as Duplicate

from Parcel as a

inner join (select ParcelID, count(*) as _num1

from Parcel group by ParcelID) as b

on b.ParcelID = a.ParcelID

inner join (select ParcelID, count(*) as _num2

from (select distinct ParcelID, Category from Parcel) group by ParcelID) as c

on c.ParcelID = a.ParcelID

-----

shemnel
Author
25 replies
11 years ago
March 10, 2014

Thanks very much Takashi, i'll give it a go today

shemnel
Author
25 replies
11 years ago
March 10, 2014

At first inspection the workflow works correctly, thank you very much indeed. S.

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Finding matching text - grouped by attribute

8 replies

Helpful Members This Week

Recently Solved Questions

Running Mammoth .docx to HTML with SystemCaller

Group processing by objects count and parallelize treatment

Using a Flow app as the trigger for a Flow automation

Rows to Columns

SQL Executor Read Only on SQLite

Community Stats

Latest FME

Cookie policy

Cookie settings

Related Topics

FME 2023.1 - Feature Highlights

Finding related points.icon

Sorting text data inside a "group by" from AttributeCombiner, Emailer formating optionicon

LISTHISTOGRAMMER AttributeCreator erroricon

Group By result not as expected with PointOnLine and NeighborFindericon

Helpful Members This Week

Recently Solved Questions

Running Mammoth .docx to HTML with SystemCaller

Group processing by objects count and parallelize treatment

Using a Flow app as the trigger for a Flow automation

Rows to Columns

SQL Executor Read Only on SQLite

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings