Question

I have two address-based datasets that need matching: one with UPRNs and one containing all subscriber addresses. How can I format my data to improve my fuzzy match output?

3 years ago
July 19, 2021
1 reply
79 views

jr06

The two datasets share a common field: subscriber address in one and address in the other. However, I believe there is a formatting issue: when I fuzzy matched the subscriber address against the UPRN dataset, the match ratio was poor, yet when I manually checked the fuzzy match output the matches were mostly correct. Is there an extra step, using transformers, to reduce the effect of factors such as spacing and extra information such as "Nottinghamshire"? I hope the blurry images below help explain the formats.

Thank you for any help or advice!!

+29

jkr_wrk
381 replies
3 years ago
July 20, 2021

I guess the UPRN dataset originally contained line breaks, but those got lost along the way?

Maybe try to recover those line breaks, because losing them makes matching the data more complex.

Or you could try to split out the city names using a reference dataset of city names, and match on those first.
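As a rough illustration of that idea outside FME, here is a minimal Python sketch that cleans up spacing and punctuation, drops known place names, and then compares the cleaned strings. The `PLACE_NAMES` list, the function names, and the sample addresses are all made up for the example; a real list would come from a reference dataset, and this simple token filter would not handle multi-word place names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup list; in practice this would come from a reference dataset.
PLACE_NAMES = {"nottinghamshire", "nottingham"}

def normalise(address, drop=PLACE_NAMES):
    """Lowercase, turn punctuation and line breaks into single spaces,
    and drop tokens that appear in the place-name list."""
    text = re.sub(r"[^a-z0-9]+", " ", address.lower())
    tokens = [t for t in text.split() if t not in drop]
    return " ".join(tokens)

def similarity(a, b):
    """Similarity ratio between 0 and 1 on the normalised strings,
    using the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

# Made-up sample addresses, not taken from the attached data.
raw_a = "12  High Street,\nBeeston, Nottinghamshire"
raw_b = "12 high street beeston"

print(normalise(raw_a))
print(similarity(raw_a, raw_b))
```

Comparing the normalised strings rather than the raw ones should keep extra spaces, commas and a trailing county name from dragging the ratio down.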

In the Netherlands, postal codes have a very strict and recognizable format: 4 digits followed by 2 letters. They narrow the search down almost to the exact street within a town. So finding the postal codes first and matching on those would help a lot. Is the postal code always the last 7 characters of your data?
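Since UPRNs suggest UK addresses, and UK postcodes vary in length, a pattern match is probably safer than slicing a fixed number of characters. Below is a minimal Python sketch that pulls out a postcode-like token and groups addresses by it, so fuzzy matching only has to run within each group. The regular expression is a simplified assumption that does not cover every special-case postcode, and the function names and sample addresses are made up for illustration.

```python
import re

# Simplified UK postcode pattern (an assumption, not the full official rule):
# outward part = 1-2 letters, a digit, optional letter/digit; inward part = digit + 2 letters.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")

def extract_postcode(address):
    """Return the last postcode-like token as 'OUTWARD INWARD', or None if nothing matches."""
    matches = POSTCODE_RE.findall(address.upper())
    if not matches:
        return None
    outward, inward = matches[-1]
    return f"{outward} {inward}"

def group_by_postcode(addresses):
    """Group addresses by extracted postcode; addresses without one end up under None."""
    groups = {}
    for addr in addresses:
        groups.setdefault(extract_postcode(addr), []).append(addr)
    return groups

# Made-up sample addresses, not taken from the attached data.
samples = [
    "12 High Street, Beeston, Nottinghamshire NG9 2JQ",
    "12 high st beeston ng92jq",
    "Unknown address without a code",
]

for postcode, members in group_by_postcode(samples).items():
    print(postcode, members)
```

Addresses that land in the same group can then be compared on the remaining text only, which keeps the number of candidate pairs small.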

💡 Did you know... The FeatureWriter is more intuitive. 😏



2 Attachments