Question

When points are too close, all but one should be deleted


A shapefile has a lot of points in it, and many of these are too close to each other to be shown on a map.

I would like to

- identify all points that are closer than 20 meters to each other

- delete all but one of them

- and add the text "multi" to a field of each remaining point that had another point too close to it


10 replies

Badge +22

If there are no criteria as to which point in a cluster to keep, an easy way to reduce the points is a Snapper with the Snapping Tolerance set to 20 m. The first feature in a cluster goes out the Untouched port; every other point in the cluster goes out the Snapped port.

 

 

To get the count, you could then use a SpatialRelator with the Requestor being the Untouched points and the Supplier being the Snapped points, your multi field would be the _related_candidates attribute.

 

 

For a more sophisticated method of point reduction, for example keeping the point with the highest value for a given attribute, you could use a NeighborFinder with a Close Candidate List Name, then a ListRangeExtractor to get the maximum value of the attribute, and then a Tester to see whether the feature's value is >= the max value of the list. If the feature has a greater value than all its neighbours, it's the one to keep. If it's equal, you have two or more points at the max value and you need to sort against a second criterion or randomly sample. If it's less, it's a feature to discard.

 

 

The multi field would be the length of the CloseCandidate list of the feature that is kept.
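
A minimal sketch of the keep / tie / discard test described above, written in plain Python rather than as FME transformers. The function name `classify`, the `(x, y, value)` tuple layout and the strict `<` distance check are assumptions made for illustration; the list of nearby values only stands in for the close candidate list and is not the NeighborFinder's actual output.

```python
from math import hypot

def classify(points, max_dist=20.0):
    """points: list of (x, y, value). Returns one (decision, neighbour_count) per point."""
    results = []
    for i, (xi, yi, vi) in enumerate(points):
        # Values of all other points closer than max_dist (stand-in for the candidate list).
        near = [v for j, (x, y, v) in enumerate(points)
                if j != i and hypot(x - xi, y - yi) < max_dist]
        if not near:
            decision = "isolated"   # nothing nearby, nothing to resolve
        elif vi > max(near):
            decision = "keep"       # strictly greater than every neighbour
        elif vi == max(near):
            decision = "tie"        # needs a second criterion or random sampling
        else:
            decision = "discard"
        results.append((decision, len(near)))
    return results

sample = [(0, 0, 5), (10, 0, 3), (100, 0, 1), (200, 0, 7), (205, 0, 7)]
decisions = classify(sample)
```

The second element of each result is the neighbour count, which plays the role of the list length used for the multi field. To keep the lowest value instead of the highest, the comparisons would be made against `min(near)` with the inequality reversed.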

@jdh thank you, the Snapper solution I understand, but as you seem to have guessed :), the better solution would be the NeighborFinder: where points are closer than 20 m to each other, I would need to keep the point with the lowest value for a given attribute "LUOKKA" (there are only three possible values, so maybe I do not need the ListRangeExtractor?). I have now tried to understand how to do this from your explanation, but I am afraid I am too much of an FME newbie. This is what I have tried (but as you see, I have not understood how to use the ListRangeExtractor and possibly the Tester correctly):

 

 

1. I add a reader with the shapefiles that have the points in them (all same projection) as Single Merged Feature Type.

 

 

2. I add a writer, a new shapefile

3. I use the transformer NeighborFinder with Input "Candidates Only", Maximum Distance "20" and Number of Neighbors to Find for instance "10" and connect the reader to this.

 

 

4. (????) I put the ListRangeExtractor and Tester between the Reader and the NeighborFinder (?), but I cannot figure out what to connect to what. I have three possible values in "LUOKKA": 38511, 38512 and 38513. If two or more points are closer than 20 meters to each other, then ONE random 38511 should be kept if present and the other 38511, 38512 and 38513 points deleted; if no 38511 is present, then ONE 38512 and nothing else should be kept; if no 38512 is present, then ONE 38513 should be kept... But I have no clue how to achieve this.

 

5. I connect the NeighborFinder "UnmatchedCandidate" with the writer (?) and start.

Am I at all on the right track here :)? Very thankful for any help!

Badge +22


You could try sorting your points by LUOKKA prior to the snapper to see if that meets your needs.
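
A rough sketch of the sort-then-thin idea in plain Python, assuming points are dicts with hypothetical `x`, `y` and `LUOKKA` keys. It is a greedy stand-in for what the thread expects from Sorter + Snapper (first feature in a cluster survives), not a description of how the Snapper works internally.

```python
from math import hypot

def thin_points(points, min_dist=20.0):
    """Keep the first point (lowest LUOKKA first) of every crowd; flag survivors with 'multi'."""
    kept = []
    # sorted() is stable, so points with equal LUOKKA keep their input order.
    for p in sorted(points, key=lambda p: p["LUOKKA"]):
        close = [k for k in kept
                 if hypot(p["x"] - k["x"], p["y"] - k["y"]) < min_dist]
        if close:
            # p is dropped; every survivor it was too close to gets the flag.
            for k in close:
                k["multi"] = "multi"
        else:
            kept.append({**p, "multi": ""})
    return kept

pts = [
    {"x": 0.0, "y": 0.0, "LUOKKA": 38513},
    {"x": 5.0, "y": 0.0, "LUOKKA": 38511},
    {"x": 100.0, "y": 0.0, "LUOKKA": 38512},
]
result = thin_points(pts)
```

Here the 38511 point survives and is flagged "multi" because the 38513 point 5 m away was dropped, while the isolated 38512 point survives with an empty flag.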

 

 

If no one else chimes in, I can try to explain the NeighborFinder a bit later when I have some time.
Badge

Hi @benjamin

I would like to suggest a slightly different approach in two steps. In the first step, use the NeighborFinder to find all points that have no other points close by (red in the image). In the second step, create buffers around the remaining points and dissolve them, so that you can select one output point per group of points based on an attribute value (blue in the image).

A screenshot from the input (grey) and the output (blue and red):

And a screenshot from the Workspace used to create the result:

The first step is rather easy: just take the points that cannot be linked to a neighbor within a certain radius. (By the way, you don't have to fill in 'Number of Neighbors to Find' when you fill in a distance.)

The second step (the two bookmarks on the right) is a bit more elaborate. First we store the original point geometry of every point still in the workspace (GeometryExtractor). Then we buffer those points by a fixed amount (10 000 in my example, 20 for your data) and dissolve the result. Since each of those points has a neighbor within this radius (otherwise it wouldn't reach this step), the buffers overlap and are dissolved into one big polygon per group.

The trick is to create a list of all points that are dissolved. By creating a group identifier (Counter) and recreating a feature for every point (ListExploder), we know for every point which group it belongs to. Afterwards I used a Sorter to make sure our first pick to stay in the result is on top, and a Sampler to retain only one feature per group (Group By the group identifier). Once this selection is made, I only had to restore the original point geometry.
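
The grouping that buffer + dissolve produces can be approximated outside FME as connected components: two points land in the same group when their buffers overlap. Below is a small Python sketch under that assumption; `group_and_pick`, the `(x, y, luokka)` tuples and the `link_dist` parameter are illustrative names, and "lowest LUOKKA wins" is taken from the question. Note that two circular buffers of radius r overlap when their centres are closer than 2r, so `link_dist` would need to be chosen accordingly.

```python
from math import hypot

def group_and_pick(points, link_dist=20.0):
    """points: list of (x, y, luokka). Returns one point per linked group (lowest luokka)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that is closer than link_dist.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) < link_dist:
                parent[find(i)] = find(j)

    # Per group, remember the point with the lowest luokka (the "sort + sample 1" step).
    best = {}
    for i, p in enumerate(points):
        g = find(i)
        if g not in best or p[2] < best[g][2]:
            best[g] = p
    return sorted(best.values())

pts = [(0, 0, 38513), (10, 0, 38512), (25, 0, 38511), (100, 0, 38512)]
picked = group_and_pick(pts)
```

In this sample the first and third points are 25 m apart, yet they share a group because both are linked through the point in between. Isolated points simply form single-member groups here instead of being split off in a separate first step.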

The used Workspace:

filteringpoints.fmw

Good luck!

Badge +22


The issue with the dissolved buffer approach is that you can run into a linear points situation. Consider a road with lights spaced at or just under the minimum distance: if you buffer and dissolve, you end up with a single feature for the entire road and remove lights that are kilometers away from the point that is kept.
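
The chaining effect is easy to reproduce with made-up numbers. The Python sketch below assumes 101 hypothetical lights spaced 19 m apart along a straight road and counts how many groups remain when every pair closer than 20 m is linked, which is roughly what overlapping dissolved buffers do. `count_groups` is an illustrative helper, not an FME function.

```python
from math import hypot

def count_groups(points, link_dist):
    """Count groups of points chained together by gaps smaller than link_dist."""
    unvisited = set(range(len(points)))
    groups = 0
    while unvisited:
        groups += 1
        stack = [unvisited.pop()]
        while stack:
            i = stack.pop()
            # Collect every not-yet-visited point close to point i, then visit those too.
            near = [j for j in unvisited
                    if hypot(points[i][0] - points[j][0],
                             points[i][1] - points[j][1]) < link_dist]
            unvisited.difference_update(near)
            stack.extend(near)
    return groups

# Hypothetical road: one light every 19 m, 101 lights in total.
lights = [(19.0 * n, 0.0) for n in range(101)]
groups = count_groups(lights, 20.0)
road_length = lights[-1][0] - lights[0][0]
```

All 101 lights fall into one group although the road is 1900 m long, so keeping one point per group would drop lights far more than 20 m from the survivor.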

 

Badge

Yes, that is true indeed, although in a lot of situations this method works nicely. Another possibility is to divide the extent into squares and keep only one point per square. Depending on the spread of the data, one implementation works better than the other.
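
A minimal Python sketch of the one-point-per-square idea, assuming `(x, y, luokka)` tuples, a hypothetical 20 m cell size, and "lowest LUOKKA wins" inside each square as in the question. `one_per_cell` is an illustrative name.

```python
from math import floor

def one_per_cell(points, cell=20.0):
    """Keep the lowest-luokka point in every cell x cell square of a regular grid."""
    best = {}
    for x, y, luokka in points:
        key = (floor(x / cell), floor(y / cell))  # index of the square containing the point
        if key not in best or luokka < best[key][2]:
            best[key] = (x, y, luokka)
    return list(best.values())

pts = [(1, 1, 38513), (5, 5, 38511), (19, 1, 38512), (21, 1, 38512)]
kept = one_per_cell(pts)
```

The trade-off shows up in the sample: the points at x = 19 and x = 21 are only 2 m apart but sit in different squares, so a grid alone does not guarantee a minimum distance between survivors.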

 

 

Badge
Hi @benjamin

 

Could you close your question if you've got an answer to it?

 

Userlevel 2
Badge +17

Hi @benjamin, I think you can achieve the goal with only the combination of a Sorter (sort the features by LUOKKA ascending) and a Snapper (Snapping Tolerance: 20), as @jdh suggested already.

Gray: removed points (Snapped), Red: outputs (Untouched), Purple: radius = 20, center = red point

Am I missing something?

Userlevel 4

Agreed, the NeighborFinder approach is a pretty good way of doing it. But rather than using the ListRangeExtractor to find the max value, I usually just sort the list (ListSorter) in descending order and take the first item (ListIndexer at item 0).
Badge +3

As stated with the road pole example, clusters can intersect.

Some screenshots don't do justice to reality: they show clusters conveniently out of range of each other, like galaxies.

I simply create a cluster for each point.

I do this by unconditionally merging the points with each other (bulk-renaming the supplier or requestor attributes first) and calculating the distance for every pair (extract the geometry beforehand, unless you intend to use a VertexCreator afterwards). Then select all pairs with a distance < 20 and aggregate them. Remove the geometry, de-aggregate, replace the geometry with the supplier geometry, and aggregate again.

You end up with clusters centered around each input object.

100 objects end up in 100 clusters. Overlaps prevent you from actually seeing them all.

(The test points were created randomly with a 100 m maximum, using @rand()*100. The picture is radically different with a 1000 m or 10000 m maximum.)

At this point some will overlap and one wonders what criteria to use to select one from each cluster. You'll have to prevent selecting objects twice or more.
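
As a rough illustration of the "one cluster per point" idea in plain Python: pair every point with every other point (the unconditional merge), keep the pairs closer than 20 m, and collect them per requestor. `clusters_per_point` and the `(x, y)` tuples are assumptions for the sketch and do not mirror any specific transformer.

```python
from itertools import product
from math import hypot

def clusters_per_point(points, max_dist=20.0):
    """Return {point index: [indices of other points closer than max_dist]}."""
    clusters = {i: [] for i in range(len(points))}
    # Cross join of all points with all points, skipping self-pairs.
    for (i, a), (j, b) in product(enumerate(points), repeat=2):
        if i != j and hypot(a[0] - b[0], a[1] - b[1]) < max_dist:
            clusters[i].append(j)
    return clusters

pts = [(0, 0), (15, 0), (30, 0), (100, 0)]
clusters = clusters_per_point(pts)
```

Point 1 shows up in the clusters of both point 0 and point 2, which is exactly the overlap that makes it hard to pick one survivor per cluster without selecting or discarding the same object twice.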
