Solved

Using NeighborFinder to find the best fit match for clusters of data


Badge

I am trying to get NeighborFinder to match two different datasets effectively, in such a way that each candidate is matched to only one base, but not all bases have candidates. (It should be possible to find those bases using the Maximum Distance field.) However, when the data gets chaotic and there are 10 measurements on top of each other, the matched candidates (which are labels) are often not the closest ones. I am wondering if anyone has an idea about how to match a cluster of measurements with their candidates in such a way that it produces the lowest average distance between bases and candidates and makes sure that each candidate finds a reasonably close base. The post https://knowledge.safe.com/questions/59605/neighborfinder-and-excluding-candidates-after-matc.html gives me an idea, but I do not think it provides a complete answer to my problem.

This image illustrates the problem. I bet that using a waterfall technique would get about 20% of this wrong.


Best answer by jdh 15 May 2018, 20:27


4 replies

Badge +22

So what you are looking for is an implementation of an optimization algorithm for bipartite graphs (the assignment problem). The best-known algorithm is the Hungarian algorithm (Kuhn–Munkres algorithm).

There are R packages and Python implementations available online. Those can be integrated into FME.
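As a sketch of what that could look like in Python (outside FME, assuming SciPy is available; the coordinates below are made up for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical coordinates: rows = bases (labels), cols = candidates (hatches)
bases = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[0.1, 0.1], [1.1, 0.0], [0.0, 0.9]])

# Pairwise Euclidean distance matrix, shape (n_bases, n_candidates)
dist = np.linalg.norm(bases[:, None, :] - candidates[None, :, :], axis=2)

# Hungarian algorithm: one-to-one matching minimising the total distance
row_ind, col_ind = linear_sum_assignment(dist)
for b, c in zip(row_ind, col_ind):
    print(f"base {b} -> candidate {c} (distance {dist[b, c]:.2f})")
```

`linear_sum_assignment` minimises the sum of distances, which also gives the lowest average distance for a fixed number of pairs; the coordinates would need to be exported from the FME features first.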

 

 

That said, I think if you replace your hatch marks with a point at the intersection, and have the labels be your base and the hatch points your candidates, a "waterfall approach" could be effective.
Badge +22
I would create a close-candidate list and then drop into Python to:

 

A) pair all the labels/hatches that have only one candidate

 

B) remove the hatches that were paired from the remaining labels' candidate lists

 

C) pair all the labels/hatches that now have only one candidate

 

Repeat B/C until all you are left with is labels with multiple candidates

 

D) pair the overall closest candidate (not the closest for each label)

 

E) remove that hatch from any other close candidate list.

 

Repeat B-E until all candidates are paired.
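The steps above can be sketched in plain Python (a sketch with made-up data; `cands` stands in for NeighborFinder's close-candidate list, mapping each label to its candidate hatches and distances):

```python
# Greedy elimination matching, following steps A-E above.
# Assumes every label keeps at least one candidate and no two labels
# are forced onto the same single hatch (illustrative data).
cands = {
    "L1": {"H1": 0.5},               # only one candidate -> paired in step A
    "L2": {"H2": 0.3, "H3": 0.9},
    "L3": {"H2": 0.7, "H3": 0.4},
}

pairs = {}
while cands:
    # A/C) pair every label that has exactly one candidate left
    singles = [lab for lab, c in cands.items() if len(c) == 1]
    if singles:
        for lab in singles:
            hatch = next(iter(cands.pop(lab)))
            pairs[lab] = hatch
            # B/E) remove the paired hatch from all other candidate lists
            for c in cands.values():
                c.pop(hatch, None)
        continue
    # D) otherwise pair the overall closest (label, hatch) combination
    lab, hatch = min(
        ((lab, h) for lab, c in cands.items() for h in c),
        key=lambda lh: cands[lh[0]][lh[1]],
    )
    cands.pop(lab)
    pairs[lab] = hatch
    for c in cands.values():
        c.pop(hatch, None)

print(pairs)
```

With the sample data, L1 is paired in step A, L2 gets H2 in step D (overall closest at 0.3), and removing H2 leaves L3 with only H3.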

 

 

Badge
Could you briefly elaborate on what you mean with step D? What logic test might you use?
Badge +22

 

Once you've eliminated all the "only option" pairs, you will have a set of label features with a '_distance' attribute to the nearest hatch (as well as the list of all hatches within the tolerance). If you sort the features so that the least _distance is first, that first feature should be your next match, and you should remove the corresponding hatch from the remaining labels' candidate lists.

 

If removing that hatch causes any other labels to have only one candidate, then start again from A; if they all still have two or more, take the next smallest distance.
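The "logic test" in step D is just a sort on `_distance` followed by taking the first record, e.g. (illustrative data and names, not the actual FME feature API):

```python
# Remaining labels after steps A/C, each with its nearest-hatch _distance
# and its list of hatches within tolerance (assumed ordered nearest-first).
remaining = [
    ("Label7", 0.8, ["H4", "H5"]),
    ("Label2", 0.2, ["H4", "H9"]),
    ("Label5", 0.5, ["H5", "H9"]),
]

# Sort so the least _distance comes first; that first feature is the next match.
remaining.sort(key=lambda rec: rec[1])
label, dist, hatches = remaining[0]
matched_hatch = hatches[0]

# Remove that hatch from every other label's candidate list
rest = [(lab, d, [h for h in hs if h != matched_hatch])
        for lab, d, hs in remaining[1:]]
print(label, matched_hatch)
```

Here Label2 wins with the smallest _distance (0.2) and claims H4, which is then dropped from Label7's list.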

 

 

Reply