Question

Compare Two Lists with FuzzyStringComparer and Grouping



Hello kind people

I have a workspace that reads two datasets, each holding a property reference number and a name. I need to compare the similarity of the names held in each dataset for each property. Both datasets have multiple (but different numbers of) names per property, e.g.

DATASET 1:

1, BOB

1, ANNA

2, TED

DATASET 2:

1, BOBBY

1, TINA

1, IAN

2, TIM

2, BOB

What I want to do is use FuzzyStringComparer to evaluate each record in dataset 1 against every record in dataset 2 that shares the same property reference number.

So in the example above, the highest score FuzzyStringComparer would return is for property 1 (BOB v BOBBY).

Can anyone offer any suggestions on how best to approach this?

Thanks in advance,

Riley

2 replies

takashi
Influencer
  • August 27, 2016

Hi @rileym, you can use the FeatureMerger and ListExploder to create every combination of names with the same reference number.

  • FeatureMerger: Send dataset 1 features to the Requestor port and dataset 2 features to the Supplier port. Set the reference number attribute as the 'Join On' parameter, check 'Generate List', and enter a list name in the 'List Name' parameter.
  • ListExploder: Connect a ListExploder to the 'Merged' port and set its 'List Attribute' parameter to that list name.

Each output feature from the ListExploder will have a combination of names from the two datasets. You can then compare them with the FuzzyStringComparer.
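To make the data flow concrete, here is a rough Python sketch of what the merge-then-explode approach produces, using the sample data from the question. It is not FME code: the function name `pair_and_score` is made up for illustration, and `difflib.SequenceMatcher` is only a stand-in for the FuzzyStringComparer, so the scores it gives will not necessarily match what FME reports.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Sample data from the question: (property reference number, name).
dataset1 = [(1, "BOB"), (1, "ANNA"), (2, "TED")]
dataset2 = [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")]


def pair_and_score(requestors, suppliers):
    """Pair names that share a reference number and score each pair.

    Hypothetical helper, for illustration only.
    """
    # Collect supplier names per reference number
    # (roughly the role of 'Generate List' on the FeatureMerger).
    by_ref = defaultdict(list)
    for ref, name in suppliers:
        by_ref[ref].append(name)

    # Emit one row per requestor/supplier pair
    # (roughly the role of the ListExploder).
    rows = []
    for ref, name1 in requestors:
        for name2 in by_ref.get(ref, []):
            # Stand-in similarity measure, NOT FME's algorithm.
            score = SequenceMatcher(None, name1, name2).ratio()
            rows.append((ref, name1, name2, score))
    return rows


rows = pair_and_score(dataset1, dataset2)
best = max(rows, key=lambda row: row[3])
print(best)
```

With this sample data there are eight pairs in total (six for property 1, two for property 2), and BOB v BOBBY comes out on top, as expected in the question.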

Alternatively, the InlineQuerier can also be used.
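The InlineQuerier route boils down to an SQL join on the reference number. The sketch below shows the kind of query involved, run through Python's `sqlite3` module purely so it can be tried outside FME. The table names `dataset1` and `dataset2` and the column names `ref` and `name` are assumptions; in the transformer they would be whatever you define for the input ports.

```python
import sqlite3

# In-memory database standing in for the InlineQuerier's input tables.
# Table and column names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset1 (ref INTEGER, name TEXT);
    CREATE TABLE dataset2 (ref INTEGER, name TEXT);
    """
)
conn.executemany(
    "INSERT INTO dataset1 VALUES (?, ?)",
    [(1, "BOB"), (1, "ANNA"), (2, "TED")],
)
conn.executemany(
    "INSERT INTO dataset2 VALUES (?, ?)",
    [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")],
)

# Every combination of names that share a reference number.
pairs = conn.execute(
    """
    SELECT a.ref, a.name AS name1, b.name AS name2
    FROM dataset1 AS a
    JOIN dataset2 AS b ON a.ref = b.ref
    ORDER BY a.ref, a.name, b.name
    """
).fetchall()
conn.close()

for pair in pairs:
    print(pair)
```

Each result row holds one name from each dataset, so it can go to the FuzzyStringComparer in the same way as the ListExploder output.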


  • Author
  • August 31, 2016
Thanks @takashi, and sorry for the slow reply. For some reason I wasn't notified of your reply, but I'd eventually got to the same solution.

