@DaveAtSafe Hello Dave, we are using the FuzzyStringCompare custom transformer to compare Arabic strings from two different datasets, but it is not giving the expected results, even when the ratio is high. Since you have been involved with this custom transformer, do you have any idea how we could fix this? Thanks

FuzzyStringCompareFrom2Datasets not working properly

Hi @boubcher,

The transformer uses the Python difflib module to calculate the similarity ratio, after converting both strings to lower case. According to the Python documentation the ratio is calculated as follows:

"Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common."

If you supply the transformer with an Output Comparison Attribute, it will give you a more detailed view of how the two attribute values differ.
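To make the formula concrete, here is a minimal sketch of the same calculation in plain Python. It assumes the transformer does nothing beyond lower-casing both values and calling difflib; the function name `similarity_ratio` is made up for this example and is not part of the transformer.

```python
from difflib import SequenceMatcher


def similarity_ratio(a, b):
    """Return 2.0 * M / T for the lower-cased strings, as difflib defines it."""
    # Lower-casing first mirrors what the transformer is described as doing.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# "bcd" is the only matching run: M = 3, T = 4 + 4 = 8, so 2.0 * 3 / 8 = 0.75
print(similarity_ratio("abcd", "bcde"))
# Identical after lower-casing, so the ratio is 1.0
print(similarity_ratio("Road", "ROAD"))
# No characters in common, so the ratio is 0.0
print(similarity_ratio("abc", "xyz"))
```

Because T counts the characters of both strings, a short shared run inside two short strings can already produce a fairly high ratio.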

@DaveAtSafe

Thanks, Dave

Do you mean I should feed the non-matching results back into the transformer? I tried that, but it did not work; it gave exactly the same results. Or do you mean something else?

Hi @boubcher,

I'm sorry, I was looking at the wrong transformer. I didn't write the FuzzyStringCompareFrom2Datasets, but I do see what it is doing.

It is finding the best match for the first dataset in the second dataset, and adding that value and the ratio to the data from the first dataset. From the results you posted, it seems to be working correctly.

The best match is not necessarily a good match, so you may want to use a Tester to test the match ratio to remove the low quality matches.
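Roughly, the best-match-then-test idea looks like this in Python. This is only a sketch of the approach, not the transformer's actual code: `best_match`, `keep_good_matches`, the sample values, and the 0.7 cut-off are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def best_match(value, candidates):
    """Return (candidate, ratio) for the candidate most similar to value."""
    def score(candidate):
        return SequenceMatcher(None, value.lower(), candidate.lower()).ratio()

    winner = max(candidates, key=score)
    return winner, score(winner)


def keep_good_matches(values, candidates, min_ratio=0.7):
    """Pair each value with its best match, dropping pairs below min_ratio."""
    results = []
    for value in values:
        match, ratio = best_match(value, candidates)
        if ratio >= min_ratio:  # plays the role of the Tester
            results.append((value, match, ratio))
    return results


second_dataset = ["Main St", "Park Avenue", "Elm Road"]
print(best_match("Main Street", second_dataset))
# Every value gets a "best" match, even one with a ratio of 0.0
print(best_match("qqq", second_dataset))
print(keep_good_matches(["Main Street", "qqq"], second_dataset))
```

The point is that the search always returns something for every value, so the ratio test is what separates real matches from the rest.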

@DaveAtSafe

Thanks for your response

The transformer is working fine. I used a Tester to keep all ratios above 0.7, but I am wondering why it gives a ratio of 0.57, for example, when the two words are spelled completely differently. Is it comparing letter by letter or word by word?

Hi @boubcher,

I believe it is comparing letter by letter, but for more complete information please see the Python difflib documentation.
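If you want to see which characters difflib counted as matches for a given pair, something like the following may help. It is a sketch that assumes the transformer passes the whole lower-cased strings to `SequenceMatcher`; `explain_ratio` is a hypothetical helper name and the two English words are placeholder values.

```python
from difflib import SequenceMatcher


def explain_ratio(a, b):
    """Return the ratio plus the character runs difflib counted as matches."""
    a, b = a.lower(), b.lower()
    matcher = SequenceMatcher(None, a, b)
    # Each block is a run of characters that appears in both strings, in order.
    # The last block is a zero-length sentinel, so empty runs are skipped.
    runs = [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]
    return matcher.ratio(), runs


ratio, runs = explain_ratio("silent", "listen")
print(ratio, runs)
```

Since Python strings are sequences of Unicode characters, each Arabic letter or diacritic is one element in the comparison. Two different words that share several letters in the same order can therefore get a mid-range ratio even though they read as unrelated.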
