Question

FuzzyStringCompareFrom2Datasets not working properly

6 years ago
February 5, 2019
5 replies
5 views

+11

boubcher
Contributor
212 replies

@DaveAtSafe

Hello Dave, we are using the FuzzyStringCompareFrom2Datasets transformer to compare Arabic strings from two different datasets, but it is not giving the expected result even when the ratio is high.

Since you have been involved with this custom transformer, do you have any idea how we could fix this?

Thanks

+19

daveatsafe
Safer
1623 replies
6 years ago
February 5, 2019

Hi @boubcher,

The transformer uses the Python difflib module to calculate the similarity ratio, after converting both strings to lower case. According to the Python documentation the ratio is calculated as follows:

"Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common."

If you supply the transformer with an Output Comparison Attribute, it will give you a more detailed view of how the two attribute values differ.
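
The ratio formula quoted above can be reproduced outside FME with a few lines of Python. This is only a minimal sketch of the calculation as described here, not the transformer's actual source; the function names `similarity_ratio` and `ratio_by_hand` are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_ratio(a: str, b: str) -> float:
    """Ratio from difflib after lower-casing both strings, as described above."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ratio_by_hand(a: str, b: str) -> float:
    """Recompute 2.0*M / T from the matching blocks to show where the number comes from."""
    a, b = a.lower(), b.lower()
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    matches = sum(block.size for block in blocks)  # M: number of matched characters
    total = len(a) + len(b)                        # T: characters in both strings
    # difflib itself returns 1.0 when both sequences are empty
    return 2.0 * matches / total if total else 1.0


print(similarity_ratio("abcd", "bcde"))  # 0.75, because "bcd" matches: 2.0*3 / 8
print(ratio_by_hand("abcd", "bcde"))     # 0.75
```

Note that lower-casing has no effect on Arabic text, since Arabic script has no letter case, so for that data the ratio depends only on the characters themselves.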

+11

boubcher
Author
Contributor
212 replies
6 years ago
February 6, 2019

daveatsafe wrote:

Hi @boubcher,

If you supply the transformer with an Output Comparison Attribute, it will give you a more detailed view of how the two attribute values differ.

@DaveAtSafe

Thanks, Dave

Do you mean I should feed the non-matching results back into the transformer? I did that, but it did not work; it gave exactly the same result. Or do you mean something else?

+19

daveatsafe
Safer
1623 replies
6 years ago
February 6, 2019

boubcher wrote:

@DaveAtSafe

Thanks, Dave

Do you mean I should feed the non-matching results back into the transformer? I did that, but it did not work; it gave exactly the same result. Or do you mean something else?

Hi @boubcher,

I'm sorry, I was looking at the wrong transformer. I didn't write the FuzzyStringCompareFrom2Datasets, but I do see what it is doing.

For each value in the first dataset, it finds the best match in the second dataset and adds that matched value and its ratio to the feature from the first dataset. From the results you posted, it seems to be working correctly.

The best match is not necessarily a good match, so you may want to use a Tester on the match ratio to remove the low-quality matches.
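
To illustrate the point that a best match can still be a poor match, here is a rough Python sketch of the same logic: pick the most similar candidate, then split the results on a minimum ratio. It assumes the comparison is done with difflib on lower-cased strings; `best_match`, `split_by_ratio`, the sample values, and the 0.7 cutoff are all hypothetical, and in a workspace the filtering step would be the Tester.

```python
from difflib import SequenceMatcher


def best_match(value, candidates):
    """Return (candidate, ratio) for the candidate most similar to value."""
    best, best_ratio = None, -1.0
    for candidate in candidates:
        ratio = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if ratio > best_ratio:  # on a tie, the first candidate seen is kept
            best, best_ratio = candidate, ratio
    return best, best_ratio


def split_by_ratio(first, second, min_ratio=0.7):
    """Pair every value in `first` with its best match, then split on min_ratio."""
    passed, failed = [], []
    for value in first:
        match, ratio = best_match(value, second)
        (passed if ratio >= min_ratio else failed).append((value, match, ratio))
    return passed, failed


first = ["main street", "qqq"]
second = ["Main Street", "Oak Avenue"]
passed, failed = split_by_ratio(first, second)
print(passed)  # [('main street', 'Main Street', 1.0)]
print(failed)  # [('qqq', 'Main Street', 0.0)]
```

Every input value still receives a "best" match, even "qqq", which shares no characters with either candidate. Only the ratio test separates the usable matches from the rest.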

+11

boubcher
Author
Contributor
212 replies
6 years ago
February 7, 2019

@DaveAtSafe

Thanks for your response

The transformer is working fine. I used a Tester to keep all ratios above 0.7, but I am wondering why it gives a ratio of 0.57, for example, when the two words are spelled completely differently. Is it comparing letter by letter or word by word?

+19

daveatsafe
Safer
1623 replies
6 years ago
February 7, 2019

boubcher wrote:

@DaveAtSafe

Thanks for your response

Hi @boubcher,

I believe it is comparing letter by letter, but for more complete information please see the Python difflib documentation.
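
A small sketch of why two different words can still score well above zero, assuming the strings are handed to difflib as whole strings, so that each character is one element. The helper `matched_pieces` and the sample words are invented for illustration.

```python
from difflib import SequenceMatcher


def matched_pieces(a: str, b: str) -> list:
    """Return the substrings of `a` that difflib counts as matches against `b`."""
    matcher = SequenceMatcher(None, a, b)
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]


a, b = "night", "nacht"

# Character level: the elements compared are single letters.
char_ratio = SequenceMatcher(None, a, b).ratio()

# Word level, for contrast: the elements compared are whole words.
word_ratio = SequenceMatcher(None, a.split(), b.split()).ratio()

print(matched_pieces(a, b))  # ['n', 'ht']
print(char_ratio)            # 0.6 = 2.0*3 / 10
print(word_ratio)            # 0.0, the two words are not equal
```

Compared letter by letter, the two words share three characters in the same order, which is enough for a ratio of 0.6. A word-by-word comparison of the same pair gives 0.0. A ratio such as 0.57 for words that look different is therefore consistent with a character-level comparison.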
