Question

FuzzyStringCompareFrom2Datasets not working properly

  • February 5, 2019
  • 5 replies
  • 22 views

boubcher
Contributor

@DaveAtSafe

Hello Dave, we are using the FuzzyStringCompare transformer to compare Arabic strings from two different datasets, but it is not giving the expected result, even when the ratio is high.

Since you have been involved in this custom transformer, do you have any idea how we could fix this?

Thanks

 


5 replies

daveatsafe
Safer
  • Safer
  • 1637 replies
  • February 5, 2019

Hi @boubcher,

The transformer uses the Python difflib module to calculate the similarity ratio, after converting both strings to lower case. According to the Python documentation the ratio is calculated as follows:

"Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common."

If you supply the transformer with an Output Comparison Attribute, it will give you a more detailed view of how the two attribute values differ.
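As a rough illustration of that formula, here is a minimal sketch using difflib directly. It is not the transformer's actual source, and the function name `similarity` is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return the difflib ratio (2.0*M / T) of two strings, ignoring case."""
    # Lower-casing mirrors the step described above; it has no effect on
    # scripts without letter case, such as Arabic.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# "abcd" and "bcde" share the block "bcd": M = 3, T = 4 + 4 = 8
print(similarity("abcd", "bcde"))    # 2.0 * 3 / 8 = 0.75
print(similarity("Hello", "hello"))  # identical after lower-casing: 1.0
print(similarity("abc", "xyz"))      # nothing in common: 0.0
```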


boubcher
Contributor
  • Author
  • Contributor
  • 212 replies
  • February 6, 2019


@DaveAtSafe

Thanks, Dave

Do you mean I have to feed the non-matching results back into the transformation process? I did that, but it didn't work and gave exactly the same result. Or do you mean something else?

 


daveatsafe
Safer
  • Safer
  • 1637 replies
  • February 6, 2019


 

Hi @boubcher,

I'm sorry, I was looking at the wrong transformer. I didn't write the FuzzyStringCompareFrom2Datasets, but I do see what it is doing.

For each value in the first dataset, it finds the best match in the second dataset, then adds that matched value and its ratio to the data from the first dataset. From the results you posted, it seems to be working correctly.

The best match is not necessarily a good match, so you may want to use a Tester on the match ratio to remove the low-quality matches.
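The idea of "best match, then filter on the ratio" can be sketched in plain Python. This is only an assumption about the general approach, not the transformer's code: `best_match`, `keep_good_matches`, and the 0.7 cutoff are hypothetical names and values picked for illustration.

```python
from difflib import SequenceMatcher


def best_match(value, candidates):
    """Return (candidate, ratio) for the candidate most similar to value."""
    best, best_ratio = None, 0.0
    for candidate in candidates:
        ratio = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best, best_ratio


def keep_good_matches(values, candidates, threshold=0.7):
    """Keep only values whose best match reaches the threshold (a Tester-like step)."""
    results = {}
    for value in values:
        match, ratio = best_match(value, candidates)
        if match is not None and ratio >= threshold:
            results[value] = (match, ratio)
    return results


# "apple" finds "apply" with a high ratio; "zzz" has no usable match and is dropped.
print(keep_good_matches(["apple", "zzz"], ["apply", "banana"]))
```

Every value gets a best match as long as it shares at least one character with some candidate, which is why the threshold step matters.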


boubcher
Contributor
  • Author
  • Contributor
  • 212 replies
  • February 7, 2019


 

@DaveAtSafe

Thanks for your response

The transformer is working fine. I used a Tester to keep all ratios above 0.7, but I am wondering why it gives a ratio of 0.57, for example, when both words are completely different in spelling. Is it comparing letter by letter, or word by word?


daveatsafe
Safer
  • Safer
  • 1637 replies
  • February 7, 2019


Hi @boubcher,

I believe it is comparing letter by letter, but for more complete information please see the Python difflib documentation.
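To see the difference between the two kinds of comparison, here is a small sketch, assuming the comparison is done with difflib's `SequenceMatcher` on plain strings. The helper names `char_ratio`, `word_ratio`, and `char_blocks` are made up for this example. A string is compared character by character; a word-by-word comparison only happens if lists of words are passed in instead.

```python
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Compare character by character (what happens when plain strings are passed)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def word_ratio(a: str, b: str) -> float:
    """Compare word by word by passing lists of words instead of strings."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def char_blocks(a: str, b: str):
    """Return the matching character blocks as (start_in_a, start_in_b, size) tuples."""
    matcher = SequenceMatcher(None, a.lower(), b.lower())
    return [tuple(block) for block in matcher.get_matching_blocks()]


# "kitten" and "sitting" share "itt" and "n": M = 4, T = 6 + 7 = 13
print(char_blocks("kitten", "sitting"))  # the last block is a size-0 terminator
print(char_ratio("kitten", "sitting"))   # 2.0 * 4 / 13, about 0.62

# Word level: "the" and "fox" match out of 3 + 3 words, so 2.0 * 2 / 6
print(word_ratio("the quick fox", "the slow fox"))
```

Because the string comparison works on individual characters, two short words that share a few letters in the same order can score in the middle of the range even when they look unrelated.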