Question

Comparing fuzzy strings with multiple features, to identify the closest match

  • 18 May 2018
  • 1 reply
  • 12 views

Hi Guys,

I'm here with yet another brick wall I'm facing. This time a pretty complex one, at least in my mind.

What I have as my inputs:

Two line layers with name attributes:

Let's call the Blue lines my base layer, and the Red lines the match layer. The red lines overlap the Blue lines everywhere, so in other words, where there is Red, there is Blue as well underneath, although the geometry will not always match 100%. Both layers have name attributes.

What I need:

I need to look at the name of each Red feature and see whether any nearby Blue feature has a matching name attribute. If there are multiple matches, I need to find which one is the closest, percentage-wise. In the end, the idea is to identify Red lines for which no Blue line within a 20 meter radius has a name that is more than 70% similar to the Red line's name.

What I have so far:

I've used a NeighborFinder to get a maximum of 5 neighbors within a 20 m range. I then have a ListConcatenator that adds the names of all neighboring Blue lines to each Red line in my output as a single comma-separated attribute. I also have a FuzzyStringComparer, which gives me a match percentage between the Red and Blue line name values, though I'm assuming this is just for the first neighbor found.

So I can see what all the neighbors are, I can see what each of their names are, and I can see a match between Red name and (one of?) Blue name values. I'm just not sure how to bring this all together and ensure that the Red name gets compared against all neighbor blue lines, and that I can also identify the match of the lot with the highest percentage, so that I can then determine which are above and below 70.
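To make the goal concrete, here is a rough sketch in plain Python of the comparison described above: score the Red name against every neighboring Blue name, keep the highest score, and flag the Red lines whose best score is not above 70. This is only an illustration under assumptions. It uses Python's difflib as a stand-in similarity measure, which will not necessarily give the same percentages as the FuzzyStringComparer, and the function names and the name-keyed dictionary are hypothetical (a real workflow would key on a feature ID).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two names (case-insensitive).

    difflib is a stand-in here, not the FuzzyStringComparer algorithm.
    """
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(red_name, blue_names):
    """Compare red_name against every neighbor name; return (name, score) of the best."""
    best_name, best_score = None, 0.0
    for blue in blue_names:
        score = similarity(red_name, blue)
        if score > best_score:
            best_name, best_score = blue, score
    return best_name, best_score


def unmatched_red(red_to_neighbors, threshold=70.0):
    """Return the Red names whose best neighbor score is not above the threshold."""
    return [
        red
        for red, blues in red_to_neighbors.items()
        if best_match(red, blues)[1] <= threshold
    ]
```

The important design point is that each Red name is scored against the whole neighbor list rather than only the first neighbor, and the maximum is taken before the 70 threshold is applied. A Red line with no neighbors at all ends up with a best score of 0 and is flagged as well.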

I know this is quite convoluted so please feel free to ask if any of my explanation is unclear. :)

Thanks a million in advance,


1 reply

I don't know if this will help you, but I would split up the red line into points using an ArcStroker + Chopper. Then I would use the NeighborFinder to find the closest point on the blue line. In the NeighborFinder output you will then have the distance from each feature to the blue line. Then you can use a Tester to remove each part that does not satisfy the criteria.
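As a rough illustration of the idea (not the transformers themselves), the distance filtering could be sketched in plain Python like this. It assumes planar coordinates in meters, that the red line has already been split into points, and that the criterion is the 20 m distance from the question; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Shortest distance from point p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def points_within(red_points, blue_line, max_dist=20.0):
    """Keep only the red points that lie within max_dist of the blue line."""
    return [p for p in red_points if point_line_distance(p, blue_line) <= max_dist]
```

For example, with a blue line from (0, 0) to (100, 0), a red point at (10, 5) is kept and one at (50, 25) is dropped.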

Another suggestion would be to use the LineOnLineOverlayer.

I hope some of this helps.

/Fredrik
