Question

FuzzyStringCompareFrom2Datasets not working properly


boubcher
Contributor

@DaveAtSafe

Hello Dave, we are using the FuzzyStringCompare transformer to compare Arabic strings from two different datasets, but it is not giving the expected results, even when the ratio is high.

Since you have been involved with this custom transformer, do you have any idea how we could fix this?

Thanks

 

5 replies

daveatsafe
Safer
  • February 5, 2019

Hi @boubcher,

The transformer uses the Python difflib module to calculate the similarity ratio, after converting both strings to lower case. According to the Python documentation the ratio is calculated as follows:

"Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common."

If you supply the transformer with an Output Comparison Attribute, it will give you a more detailed view of how the two attribute values differ.
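
The ratio formula quoted above can be reproduced directly with difflib. This is only a minimal sketch, not the transformer's actual source: the function names `similarity_ratio` and `manual_ratio` are made up for illustration, and the lower-casing step simply follows the description in this post.

```python
from difflib import SequenceMatcher


def similarity_ratio(a: str, b: str) -> float:
    """Ratio as described above: lower-case both strings, then use difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def manual_ratio(a: str, b: str) -> float:
    """Recompute 2.0 * M / T by hand from the matching blocks."""
    a, b = a.lower(), b.lower()
    matcher = SequenceMatcher(None, a, b)
    # M = total number of matched characters across all matching blocks
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T = total number of characters in both strings
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0


# "bcd" is common to both strings: M = 3, T = 8, so 2 * 3 / 8 = 0.75
print(similarity_ratio("abcd", "bcde"))  # 0.75
print(manual_ratio("abcd", "bcde"))      # 0.75
```

Because the ratio only counts matched characters, two strings do not need to look alike as whole words to score well above 0.0.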


boubcher
Contributor
  • Author
  • Contributor
  • February 6, 2019

@DaveAtSafe

Thanks, Dave

Do you mean I should feed the non-matching results back into the transformer? I did that, but it didn't work; it gave exactly the same result. Or do you mean something else?

 


daveatsafe
Safer
  • February 6, 2019

Hi @boubcher,

I'm sorry, I was looking at the wrong transformer. I didn't write the FuzzyStringCompareFrom2Datasets, but I do see what it is doing.

It is finding the best match for the first dataset in the second dataset, and adding that value and the ratio to the data from the first dataset. From the results you posted, it seems to be working correctly.

The best match is not necessarily a good match, so you may want to use a Tester to test the match ratio to remove the low quality matches.
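
The behaviour described above (take the best match from the second dataset, then filter on the ratio) could be sketched roughly like this in Python. This is an assumption about the general approach, not the transformer's real code: `best_match`, `keep_good_matches` and the 0.7 cut-off are hypothetical, and the filter only imitates what a Tester would do in the workspace.

```python
from difflib import SequenceMatcher


def best_match(value, candidates):
    """Return the candidate with the highest ratio, plus that ratio."""
    best, best_ratio = None, 0.0
    for candidate in candidates:
        ratio = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best, best_ratio


def keep_good_matches(values, candidates, min_ratio=0.7):
    """Keep only rows whose best match reaches min_ratio (illustrative cut-off)."""
    kept = []
    for value in values:
        match, ratio = best_match(value, candidates)
        if ratio >= min_ratio:
            kept.append((value, match, ratio))
    return kept


first_dataset = ["apple", "zzz"]
second_dataset = ["apply", "orange"]

# Every value gets a "best" match, even when that match is poor.
for value in first_dataset:
    print(value, best_match(value, second_dataset))

# Only the good matches survive the threshold.
print(keep_good_matches(first_dataset, second_dataset))
```

The point is that the best match is always reported, so the ratio test is what separates useful matches from poor ones.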


boubcher
Contributor
  • Author
  • Contributor
  • February 7, 2019

@DaveAtSafe

Thanks for your response

The transformer is working fine. I used a Tester to keep all ratios above 0.7, but I am wondering why it gives a ratio of 0.57, for example, when the two words are spelled completely differently. Is it comparing letter by letter or word by word?


daveatsafe
Safer
  • February 7, 2019

Hi @boubcher,

I believe it is comparing letter by letter, but for more complete information, please see the Python difflib documentation.
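
To see which letters difflib actually matched between two strings, something like the sketch below may help. It assumes the strings are compared as plain character sequences, as difflib does when given two `str` values; the helper name `shared_blocks` and the sample words are only for illustration.

```python
from difflib import SequenceMatcher


def shared_blocks(a: str, b: str):
    """List (position in a, position in b, matched text) for each matching block."""
    matcher = SequenceMatcher(None, a, b)
    return [
        (block.a, block.b, a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size > 0  # skip the zero-length terminator block
    ]


a, b = "night", "nacht"
print(shared_blocks(a, b))                  # [(0, 0, 'n'), (3, 3, 'ht')]
print(SequenceMatcher(None, a, b).ratio())  # 0.6, i.e. 2 * 3 / 10
```

Here two different words still score 0.6 because three of their five letters line up, which is the same kind of effect that can produce a ratio like 0.57 for words that read as unrelated.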

