Skip to main content

Hi everyone,

I am trying to build a workflow that does the following greedy clustering using either FME transformers or Python Caller. Does anyone have tips for me to approach this challenging problem? Thanks in advance!

@takashi @david_r

You can do this with a neighbourfinder and a listcombiner (custom transformer)


You can do this with a neighbourfinder and a listcombiner (custom transformer)

Hi, So I guess I can use Neighborfinder to get all points within x distance of each point but to make the workflow efficient, I need to choose a new point that is not part of a cluster (Step 4). What do you think is the best way to do that? Thanks.


Hi, So I guess I can use Neighborfinder to get all points within x distance of each point but to make the workflow efficient, I need to choose a new point that is not part of a cluster (Step 4). What do you think is the best way to do that? Thanks.

I think i slightly misunderstood your requirement, as you want fixed size clusters, not all features within a certain distance of each other.

 

You say pick any point to start and then choose any point afterwards, but your example picks a very logical start point. I'm not sure you'd get the best results without considering where to start. I guess it depends on what your end requirement is, do you care about having the least number of clusters to cover all points?


I think i slightly misunderstood your requirement, as you want fixed size clusters, not all features within a certain distance of each other.

 

You say pick any point to start and then choose any point afterwards, but your example picks a very logical start point. I'm not sure you'd get the best results without considering where to start. I guess it depends on what your end requirement is, do you care about having the least number of clusters to cover all points?

You are 100% right that it's impossible to get the best results without considering where to start but unfortunately, I do not have any attributes to use to add some logic to it. My objective for this workflow is to find the cluster that covers the highest number of points.


Hi @trungn1993,

After finding the neighboring points with the NeighborFinder, use the ListElementCounter to count the number of neighbors within the radius, then use a Sorter to sort by that number, descending. This will put the points with the largest number of neighbors first, which is a good starting point.

Use a ListExploder on a copy of the first point, then use a VariableSetter to create variables with the name being the id of the matching point and the value being 1. Also do the ID of the first point. This will serve as the list of used points.

As you check subsequent points, you can use a VariableRetriever to retrieve the variable named for their ID. If you get 1, then that ID is already used in a grouping, and should be skipped. You will also need to explode the matched list to see if those IDs have already been used, and keep only the unused ones. Use an Aggregator to rebuild the list after testing.


Hi @trungn1993,

After finding the neighboring points with the NeighborFinder, use the ListElementCounter to count the number of neighbors within the radius, then use a Sorter to sort by that number, descending. This will put the points with the largest number of neighbors first, which is a good starting point.

Use a ListExploder on a copy of the first point, then use a VariableSetter to create variables with the name being the id of the matching point and the value being 1. Also do the ID of the first point. This will serve as the list of used points.

As you check subsequent points, you can use a VariableRetriever to retrieve the variable named for their ID. If you get 1, then that ID is already used in a grouping, and should be skipped. You will also need to explode the matched list to see if those IDs have already been used, and keep only the unused ones. Use an Aggregator to rebuild the list after testing.

Thanks @DaveAtSafe, I will give your methodology a try and let you know how it goes!


Thanks @DaveAtSafe, I will give your methodology a try and let you know how it goes!

Greedy Clustering.fmw

Hi @DaveAtSafe, can you take a look at what I have so far? I am not sure how to keep going with subsequent points. Do I need to do like a for loop?


Greedy Clustering.fmw

Hi @DaveAtSafe, can you take a look at what I have so far? I am not sure how to keep going with subsequent points. Do I need to do like a for loop?

Hi @trungn1993,

The workspace seems to be missing the SWCircleReplacer transformer. Can you embed it within the workspace, then re-send.


Hi @trungn1993,

The workspace seems to be missing the SWCircleReplacer transformer. Can you embed it within the workspace, then re-send.

Greedy Clustering.fmwt

Thanks Dave!


Hi @trungn1993,

I am attaching two workspaces that use the approach I outlined below:

m1-greedy-clustering.fmw - this uses the NeighborFinder to find close points, then builds the clusters by keeping track of points used in previous clusters.

m2-greedy-clustering.fmw - this refines the process by generating voronoi cells around the cluster anchors, then uses the cells to cluster the points. This results in a more compact grouping of the points.


Reply