Question

Compare Two Lists with FuzzyStringComparer and Grouping

  • 26 August 2016
  • 2 replies
  • 1 view

Hello kind people

I have a workspace that brings in two datasets, each holding a property reference number and a name. I need to compare the similarity of the names held in each dataset for each property. Both datasets have multiple names per property, though not the same number of names, e.g.

DATASET 1:

1, BOB

1, ANNA

2, TED

DATASET 2:

1, BOBBY

1, TINA

1, IAN

2, TIM

2, BOB

What I want to do is use the FuzzyStringComparer to evaluate each record in dataset 1 against every record in dataset 2 that shares the same property reference number.

So in the example above, the highest value the FuzzyStringComparer returns would be for property 1 (BOB v BOBBY).

Can anyone offer any suggestions on how best to approach this?

Thanks in advance,

Riley


2 replies


Hi @rileym, you can use the FeatureMerger and ListExploder to create every combination of names with the same reference number.

  • FeatureMerger: Send the dataset 1 features to the Requestor port and the dataset 2 features to the Supplier port. Set the 'Join On' parameter to the reference number attribute, check 'Generate List', and enter a list name in the 'List Name' parameter.
  • ListExploder: Connect a ListExploder to the 'Merged' port and set its 'List Attribute' parameter to that list name.

Each output feature from the ListExploder will have a combination of names from the two datasets. You can then compare them with the FuzzyStringComparer.
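For anyone who wants to see the logic outside the workspace, here is a rough Python sketch of the same merge, explode and compare flow using the sample data from the question. It is only an illustration: `compare_by_reference` is a made-up helper name, and `difflib.SequenceMatcher` is a stand-in similarity measure, so the scores will not necessarily match what the FuzzyStringComparer reports.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Sample data from the question: (property reference number, name)
dataset1 = [(1, "BOB"), (1, "ANNA"), (2, "TED")]
dataset2 = [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")]


def compare_by_reference(requestors, suppliers):
    """Pair every requestor name with every supplier name sharing its reference."""
    # Collect supplier names per reference (roughly what 'Generate List' does).
    names_by_ref = defaultdict(list)
    for ref, name in suppliers:
        names_by_ref[ref].append(name)

    # One row per name pair (roughly what the ListExploder produces),
    # scored with a stand-in similarity ratio between 0 and 1.
    rows = []
    for ref, name1 in requestors:
        for name2 in names_by_ref.get(ref, []):
            score = SequenceMatcher(None, name1, name2).ratio()
            rows.append((ref, name1, name2, score))
    return rows


rows = compare_by_reference(dataset1, dataset2)
best = max(rows, key=lambda row: row[3])  # highest-scoring pair overall

for ref, name1, name2, score in rows:
    print(ref, name1, name2, round(score, 3))
```

With the sample data this gives eight pairs (six for property 1, two for property 2), and the best-scoring pair under this stand-in measure is BOB v BOBBY.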

Alternatively, the InlineQuerier can also be used.
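The InlineQuerier route boils down to a SQL join on the reference number. The sketch below shows that kind of join using Python's built-in `sqlite3` module, on the assumption that the InlineQuerier accepts SQLite-style SQL. The table names `dataset1` and `dataset2` and the column names `ref` and `name` are hypothetical; in a workspace they would be whatever you define in the transformer.

```python
import sqlite3

# In-memory database standing in for the InlineQuerier's input tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset1 (ref INTEGER, name TEXT)")
conn.execute("CREATE TABLE dataset2 (ref INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dataset1 VALUES (?, ?)",
    [(1, "BOB"), (1, "ANNA"), (2, "TED")],
)
conn.executemany(
    "INSERT INTO dataset2 VALUES (?, ?)",
    [(1, "BOBBY"), (1, "TINA"), (1, "IAN"), (2, "TIM"), (2, "BOB")],
)

# Join on the reference number so each dataset 1 name is paired with
# every dataset 2 name for the same property.
pairs = conn.execute(
    """
    SELECT a.ref, a.name AS name1, b.name AS name2
    FROM dataset1 AS a
    JOIN dataset2 AS b ON a.ref = b.ref
    ORDER BY a.ref, a.name, b.name
    """
).fetchall()
conn.close()

for row in pairs:
    print(row)
```

Each resulting row carries one name from each dataset, so it can go straight into the FuzzyStringComparer.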

Thanks @takashi, and sorry for the slow reply. For some reason I wasn't notified of your reply, but I got to the same solution eventually.
