I'm trying to compare two datasets on the address attribute string that both of them contain and find the fuzzy matching ratios. Reading was fast: the 350,000 records in dataset 1 and the 14,000 records in dataset 2 loaded in less than a minute. I then sort both lists separately and pass them to the FuzzyStringCompareFrom2Datasets transformer. The workspace has been running all day (about 5 hours so far) and has output only 288 records. Is there a way to speed this up?
Solved
FuzzyStringCompareFrom2Datasets Slow
Best answer by paalped
FuzzyStringCompareFrom2Datasets does not appear to support datasets that large. For each of your 350,000 features, it attaches a list of all 14,000 features from the other dataset, then goes through that list and compares each string to the string attribute you chose to compare (approximately 4,900,000,000 comparisons in total). It then sorts every list by ratio (so a list of length 14,000 gets sorted 350,000 times) and picks the match with the highest ratio. With the sizes you are working with, this will of course be very time consuming.
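To make the cost concrete, here is a rough Python sketch of the pattern described above: score every string in one list against every string in the other, sort the scores, and keep the top one. It is not the transformer's actual code. The function name `best_match_bruteforce`, the sample addresses, and the use of `difflib.SequenceMatcher` as the ratio are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_match_bruteforce(addresses_1, addresses_2):
    """Hypothetical helper: all-pairs fuzzy comparison, then sort and keep the best.

    Returns (results, comparisons), where results holds one
    (address, best_candidate, ratio) tuple per entry in addresses_1.
    """
    results = []
    comparisons = 0
    for a in addresses_1:
        scored = []
        for b in addresses_2:
            # One fuzzy comparison per (a, b) pair.
            scored.append((SequenceMatcher(None, a, b).ratio(), b))
            comparisons += 1
        if not scored:
            continue  # nothing to match against
        # Sorting the whole candidate list just to read off the top entry.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        best_ratio, best_candidate = scored[0]
        results.append((a, best_candidate, best_ratio))
    return results, comparisons


# Made-up sample data, only to show the shape of the work.
dataset_1 = ["12 Main Street", "45 Oak Avenue", "7 Elm Road"]
dataset_2 = ["12 Main St", "7 Elm Rd"]

matches, n = best_match_bruteforce(dataset_1, dataset_2)
print(n)  # number of comparisons for the sample: len(dataset_1) * len(dataset_2)

# The same product at the sizes from the question.
full_size_comparisons = 350_000 * 14_000
print(full_size_comparisons)
```

The comparison count is always the product of the two list lengths, so the only ways to bring the run time down are to make each comparison cheaper or to compare each address against fewer candidates (for example, only those sharing a postcode or street name, if your data has such an attribute).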