Question

Compare Two Lists with FuzzyStringComparer and Grouping

  • August 26, 2016


Hello kind people

I have a workspace that reads two datasets, each holding a property reference number and a name. I need to compare the similarity of the names held in each dataset for each property. Both datasets have multiple names per property, and the number of names differs between them, for example:

DATASET 1:

1, BOB
1, ANNA
2, TED

DATASET 2:

1, BOBBY
1, TINA
1, IAN
2, TIM
2, BOB

What I want to do is use FuzzyStringComparer to evaluate each record in dataset 1 against every record in dataset 2 that shares the same property reference number.

So in the example above, the highest value FuzzyStringComparer returns would be for property 1 (BOB vs. BOBBY).

Can anyone offer any suggestions on how best to approach this?

Thanks in advance,

Riley


2 replies

takashi
Celebrity
  • August 27, 2016

Hi @rileym, you can use the FeatureMerger and ListExploder to create every combination of names with the same reference number.

  • FeatureMerger: Send dataset 1 features to the Requestor port and dataset 2 features to the Supplier port. Set the 'Join On' parameter to the reference number attribute, check 'Generate List', and enter a list name in the 'List Name' parameter.
  • ListExploder: Connect a ListExploder to the 'Merged' port and set its 'List Attribute' parameter to that list name.

Each output feature from the ListExploder will have a combination of names from the two datasets. You can then compare them with the FuzzyStringComparer.
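For readers who want to see the logic outside of FME, here is a minimal Python sketch of the same idea: group dataset 2 names by reference number, pair each dataset 1 name with every dataset 2 name for that property, then score each pair. The function name `pair_and_score` is hypothetical, and `difflib.SequenceMatcher` is only a stand-in for the FuzzyStringComparer, so the scores will not necessarily match what FME returns.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Sample data from the question: (property reference number, name).
dataset1 = [(1, "BOB"), (1, "ANNA"), (2, "TED")]
dataset2 = [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")]


def pair_and_score(requestors, suppliers):
    """Pair names that share a reference number and score each pair."""
    # Collect supplier names per reference number
    # (roughly what FeatureMerger does with 'Generate List').
    by_ref = defaultdict(list)
    for ref, name in suppliers:
        by_ref[ref].append(name)

    # Emit one row per requestor/supplier combination
    # (roughly what ListExploder does), then score it.
    rows = []
    for ref, name1 in requestors:
        for name2 in by_ref.get(ref, []):
            # Assumption: SequenceMatcher stands in for FuzzyStringComparer.
            score = SequenceMatcher(None, name1, name2).ratio()
            rows.append((ref, name1, name2, score))
    return rows


rows = pair_and_score(dataset1, dataset2)
best = max(rows, key=lambda row: row[3])
print(best)
```

With the sample data this produces eight pairs, and the highest-scoring one is BOB vs. BOBBY for property 1, which matches the expectation in the question.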

Alternatively, the InlineQuerier can also be used.
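The InlineQuerier approach comes down to an SQL join on the reference number. The sketch below runs that kind of query with Python's built-in `sqlite3` module so it can be tried outside of FME. The table names (`dataset1`, `dataset2`) and column names (`ref`, `name`) are hypothetical; in the InlineQuerier they would be whatever you define for its input ports.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset1 (ref INTEGER, name TEXT)")
conn.execute("CREATE TABLE dataset2 (ref INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dataset1 VALUES (?, ?)",
    [(1, "BOB"), (1, "ANNA"), (2, "TED")],
)
conn.executemany(
    "INSERT INTO dataset2 VALUES (?, ?)",
    [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")],
)

# Join on the reference number: one output row per name combination
# within the same property.
query = """
SELECT a.ref AS ref, a.name AS name1, b.name AS name2
FROM dataset1 AS a
JOIN dataset2 AS b ON a.ref = b.ref
ORDER BY a.ref, a.name, b.name
"""
pairs = conn.execute(query).fetchall()
conn.close()

for pair in pairs:
    print(pair)
```

Each resulting row carries one name from each dataset, so it can be passed on to the FuzzyStringComparer in the same way as the ListExploder output.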


  • Author
  • August 31, 2016

Thanks @takashi, and sorry for the slow reply. For some reason I wasn't notified of your reply, but I got to the same solution eventually.