
Fuzzy search for duplicates in one column

  • July 3, 2019
  • 1 reply
  • 56 views

jayqueue

Hello,

I have a dataset like the following example:

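Name     | Firstname | Address       | ConcattedString
---------|-----------|---------------|----------------------------
Michaels | John      | Firststreet 5 | Michaels_John_Firststreet 5
Michaels | Jon       | Firststreet 5 | Michaels_Jon_Firststreet 5
Mychael  | John      | Firtstreet 5  | Mychael_John_Firtstreet 5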


"ConcattedString" is a field I generated with the AttributeCreator, because I thought a single combined string would make duplicates easier to find.

I don't want to remove the duplicates; I just want to flag possible candidates as a "group".
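As a rough illustration of what "grouping" fuzzy candidates could mean, here is a minimal plain-Python sketch, not an FME workflow. It assumes a swapped-in similarity measure, the ratio from Python's standard `difflib.SequenceMatcher`, applied to the concatenated string, and an arbitrary threshold of 0.85. The function names `similarity` and `group_fuzzy_duplicates` are hypothetical. Values that are similar enough, directly or through a chain of similar values, end up with the same group ID.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (difflib's measure, an assumption here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_fuzzy_duplicates(values, threshold=0.85):
    """Return a group ID (1, 2, ...) for each value.

    Values whose similarity meets the (hypothetical) threshold are merged
    into one group; merging is transitive via a union-find structure.
    """
    parent = list(range(len(values)))

    def find(i):
        # Walk up to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Pairwise comparison is O(n^2): only suitable for small datasets
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                union(i, j)

    # Renumber the union-find roots to consecutive group IDs starting at 1
    ids = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(values))]


if __name__ == "__main__":
    rows = [
        "Michaels_John_Firststreet 5",
        "Michaels_Jon_Firststreet 5",
        "Mychael_John_Firtstreet 5",
        "Smith_Anna_Mainroad 12",  # hypothetical unrelated record
    ]
    for row, group in zip(rows, group_fuzzy_duplicates(rows)):
        print(group, row)
```

With the three example rows plus one made-up unrelated record, the three Michaels/Mychael variants share group 1 and the unrelated record gets group 2. Because grouping is transitive, a loose threshold can chain dissimilar records together through intermediate ones, so the threshold would need tuning on real data.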

I experimented with the FuzzyDuplicateRemover, the Matcher, and others, with no luck.

I read the article https://knowledge.safe.com/articles/53183/data-qa-identifying-duplicate-attribute-values.html, but I can't figure it out.

Is it even possible? If yes, can someone give me a little push in the right direction? :-)

I'm using FME 2019 (build 19253).


TIA


-Jonathan


1 reply

jayqueue
  • Author
  • 24 replies
  • July 3, 2019