
Hi,

I have 100,000+ points that I would like to split into groups with a fixed number of members each, keeping each group as spatially compact as possible.

The data is delivered to me as either text or CSV and contains a variety of attributes, including postcode. I have geo-referenced each row to the nominal centre of its postcode using OS Code-Point, which gives me an easting and northing for each row. I have tried converting this to XYZ so that I could follow the suggestions in this post: https://knowledge.safe.com/questions/24064/cluster-points-based-on-location-k-means-method.html. However, I suspect that, due to my limited working knowledge of FME, I'm struggling to create the correct workflow.

I have also tried another approach in FME: creating Voronoi polygons from the points and then using PointOnAreaOverlayer to add the overlap count of points to each polygon. I thought I could then dissolve neighbouring polygons together until the desired point count is reached.

Can anyone provide any advice, tips or instructions on how to best come up with a solution?

Thanks for taking the time to look

B

Have you checked this clustering example? https://knowledge.safe.com/articles/1258/cluster-or-density-modelling.html

https://hub.safe.com/transformers/clustermodeller

Good luck @bieahart


Ouch! That's a tough one. Really I think that needs a form of spatial index, where you sort features into an order that is related to their closeness to each other.

For what it's worth, attached is a workspace that I was working on. It's my attempt at a 'Hilbert Curve'. The idea - which I never did get time to complete - is that you'd overlay this curve onto your data and snap your points onto it. You then sort your points into order depending on their distance along the line.

Once your features are in order, you can create the fixed groups as you need (e.g. the first 100, the next 100, etc.) and they will be spatially optimized groups.
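For anyone who wants to try the sort-then-chunk idea outside the workspace, here is a minimal Python sketch of it. It is not taken from hilbertcurve.fmw: the function names `hilbert_index` and `group_points`, the `order` parameter and the sample coordinates are all made up for illustration. Instead of snapping points to a curve geometry, it assumes each point can be snapped to a cell of a 2^order by 2^order grid covering the data extent, and uses the standard bitwise conversion from a grid cell to its distance along the Hilbert curve.

```python
def hilbert_index(x, y, order):
    """Distance along a Hilbert curve of the grid cell (x, y).

    The grid is 2**order cells wide; x and y must be integers in
    the range 0 .. 2**order - 1.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def group_points(points, group_size, order=16):
    """Return one group id per point, in the input order.

    points: list of (easting, northing) tuples (hypothetical input shape).
    Every group gets exactly group_size members, except possibly the last.
    """
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_x = max(p[0] for p in points)
    max_y = max(p[1] for p in points)
    # One square extent for both axes so the grid cells stay square.
    span = max(max_x - min_x, max_y - min_y) or 1.0
    cells = (1 << order) - 1

    def curve_position(p):
        gx = int((p[0] - min_x) / span * cells)
        gy = int((p[1] - min_y) / span * cells)
        return hilbert_index(gx, gy, order)

    # Sort point indices by their position along the curve, then chunk.
    ranked = sorted(range(len(points)), key=lambda i: curve_position(points[i]))
    groups = [0] * len(points)
    for rank, i in enumerate(ranked):
        groups[i] = rank // group_size
    return groups


# Made-up sample: two tight clusters far apart, four points per group.
sample = [(0, 0), (1, 0), (0, 1), (1, 1),
          (1000, 1000), (1001, 1000), (1000, 1001), (1001, 1001)]
sample_groups = group_points(sample, group_size=4)
```

In FME something like this could probably sit in a PythonCaller, or the curve position could be written out as an attribute and sorted on with a Sorter, but I have not tested either route. Note that points which are neighbours along the curve are always close in space, while points close in space are not always neighbours along the curve, so the groups should be compact but are not guaranteed to be optimal.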

The workspace here will create the curve - but that's as far as I got. It may or may not help, but it's the best I've got right now.

hilbertcurve.fmw


I would try creating a PointCloud from the input data and use the PointCloudThinner transformer.

Hope this helps.

