Question

Clustering Points in FME (Cluster Size)


Hi All,

I have n points spatially distributed over a given area. I want to group them into x clusters with y members per cluster, and the grouping has to be as spatially optimal as possible.

So I would like to set the following parameters in the transformer:

- The number of clusters

- The size of the clusters, or the minimum and maximum cluster size

Does anyone have any ideas? I have already tried the cluster modeller, K-means and RClusterCalculator, with no success.

Thanks in advance!


10 replies


Hi @bishoymf

I believe this article may be helpful: https://knowledge.safe.com/idea/59941/k-means-point-clustering-using-fme.html

Thanks,

Danilo

Hi Danilo,

Thanks for your answer. I walked through the article, but the challenge I'm currently facing is how to define the size of each cluster, not how to build the clusters.

Regards,

Bishoy

Did you download the template transformer and change the Number of Cluster option?

Thanks,

Danilo

The PointClusterer custom transformer on the hub requires the number of clusters to be defined.

You generally can't constrain both the number of clusters and the maximum cluster size, because the constraints cannot be satisfied if the number of points exceeds the product of the two.
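
To make the counting argument concrete, here is a minimal Python sketch. The function name and the example numbers are illustrative assumptions, not part of any FME transformer. It lists the cluster counts for which a minimum and a maximum cluster size can both be met, based purely on point counts; it says nothing about whether the result is spatially sensible.

```python
import math


def feasible_cluster_counts(total_points, min_size, max_size):
    """Return every cluster count k for which total_points can be split
    into k clusters of between min_size and max_size members each."""
    if min_size < 1 or max_size < min_size:
        raise ValueError("need 1 <= min_size <= max_size")
    lowest = math.ceil(total_points / max_size)  # fewer clusters would overflow max_size
    highest = total_points // min_size           # more clusters would fall below min_size
    return list(range(lowest, highest + 1))


print(feasible_cluster_counts(100, 10, 12))  # [9, 10]
print(feasible_cluster_counts(100, 30, 32))  # [] (no cluster count works)
```

An empty list means the size range and the point count are incompatible, whatever clustering algorithm is used.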

 

 

That said, for univariate k-means clustering, we've had reasonable success using both R (Ckmeans.1d.dp) and Python to create an 'optimized clustering': we define a minimum and maximum number of clusters and/or a cluster size, loop through the possibilities, reject those that violate the constraints, and then select the best remaining clustering based on additional criteria.

It should be fairly straightforward to replace Ckmeans.1d.dp with a 2D k-means classifier.
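
The loop-and-reject idea could be sketched roughly as follows in Python. This is not the custom transformer mentioned above: it swaps in a plain 2D k-means (Lloyd's algorithm with a deterministic farthest-point start), and using the lowest within-cluster sum of squares as the "additional criterion" is an assumption. All function names are hypothetical.

```python
def _d2(a, b):
    """Squared Euclidean distance between two (x, y) tuples."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_2d(points, k, iterations=50):
    """Plain Lloyd's k-means on (x, y) tuples; returns (labels, centers)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_d2(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: _d2(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers


def best_constrained_clustering(points, k_min, k_max, min_size, max_size):
    """Try each k in [k_min, k_max], reject runs whose cluster sizes fall
    outside [min_size, max_size], and return (cost, k, labels) for the
    accepted run with the lowest cost, or None if every run is rejected."""
    best = None
    for k in range(max(1, k_min), min(k_max, len(points)) + 1):
        labels, centers = kmeans_2d(points, k)
        sizes = [labels.count(c) for c in range(k)]
        if min(sizes) < min_size or max(sizes) > max_size:
            continue  # violates the size constraints
        # Assumed selection criterion: within-cluster sum of squares.
        cost = sum(_d2(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, k, labels)
    return best


# Three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
result = best_constrained_clustering(points, k_min=2, k_max=4, min_size=3, max_size=5)
if result is not None:
    cost, k, labels = result
    print(k, [labels.count(c) for c in range(k)])  # 3 [4, 4, 4]
```

Here k=2 is rejected because one cluster gets 8 points, and k=4 is rejected because one cluster gets a single point, so only k=3 survives.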

 

 

 

Yes, I tried the PointClusterer transformer. I can set the number of output clusters, but I cannot define the size of each cluster.

Here's an option if you don't want to implement a different clustering algorithm that lets you specify the cluster membership rather than the number of clusters.

Assuming you want the fewest clusters without exceeding a maximum number of members per cluster (MaxNo), the minimum number of clusters is ceiling(TotalPoints/MaxNo).

Set that value in the PointClusterer transformer and check the number of members in each cluster. If any cluster exceeds MaxNo, increase the number of clusters by one and rerun the PointClusterer with the new number of clusters.

Repeat this loop until no cluster exceeds MaxNo.
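
The loop could be sketched like this in Python. The PointClusterer itself is not called here: `cluster_fn` is a hypothetical stand-in for whatever performs the clustering, and `strip_clusterer` is a deliberately crude placeholder (equal-width strips along x) used only so the example runs.

```python
import math
from collections import Counter


def fewest_clusters_within_max(points, max_no, cluster_fn):
    """Start at ceil(n / max_no) clusters and add one at a time until
    no cluster has more than max_no members. Returns (k, labels)."""
    if max_no < 1:
        raise ValueError("max_no must be at least 1")
    if not points:
        return 0, []
    k = math.ceil(len(points) / max_no)
    while k <= len(points):
        labels = cluster_fn(points, k)
        if max(Counter(labels).values()) <= max_no:
            return k, labels
        k += 1  # some cluster is too big: try one more cluster
    raise RuntimeError("no cluster count satisfied max_no")


def strip_clusterer(points, k):
    """Placeholder clusterer: k equal-width vertical strips by x."""
    xs = [p[0] for p in points]
    lo = min(xs)
    width = (max(xs) - lo) / k or 1.0  # avoid zero width if all x are equal
    return [min(int((x - lo) / width), k - 1) for x in xs]


points = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0), (2.5, 0), (9, 0), (10, 0)]
k, labels = fewest_clusters_within_max(points, max_no=4, cluster_fn=strip_clusterer)
print(k, labels)  # 5 [0, 0, 0, 0, 1, 1, 4, 4]
```

The theoretical minimum here is 2 clusters, but the uneven point distribution pushes the loop to 5. Note that the loop only enforces the upper limit; nothing stops a cluster from ending up very small or empty.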

Dear jdh,

Thanks a lot for your support. I tried the PointClusterer as you suggested, but I cannot control MinNo as well. For example, I sometimes get clusters with 10 points or fewer.

What I was trying to do is define the size of the clusters, or at least a range (MinNo and MaxNo) for the cluster size.

BR,

Bishoy

 

That sounds good! But to make sure I fully understand your kind answer:

- You wrote an algorithm in Python and R using this package (Ckmeans.1d.dp) to build "Optimal and Fast Univariate Clustering"

- And you were able to define the minimum and maximum number of clusters and/or the cluster size

If that's right, is there an FME transformer implementing this that you could share, or do I need to develop it from scratch?

 

Over the years we have developed a couple of different custom transformers using either Python or R (not both in the same custom transformer). Unfortunately, I am not allowed to share them here.

What I suggest is that you find a clustering algorithm that can accommodate your constraints (minimum number of members, maximum number of members, number of clusters) and then look for an existing R or Python implementation.

I am not aware of one off the top of my head that can constrain all three, but clustering algorithms are not my area of expertise.

Assuming you find one, it's then just a matter of a bit of scripting to push the feature data into either the R frame or the PythonCaller.
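
As a rough idea of what that scripting might look like on the Python side, here is a sketch of a PythonCaller class that buffers the features, clusters them once all have arrived, and writes a cluster ID back to each one. It assumes each point carries `_x` and `_y` attributes (for example from a CoordinateExtractor upstream); `cluster_id` and `assign_clusters` are hypothetical names, and `assign_clusters` is only a placeholder for the algorithm you choose. Check the class interface against the PythonCaller documentation for your FME version.

```python
def assign_clusters(points):
    """Placeholder: puts every point in cluster 0.
    Replace with the clustering algorithm of your choice."""
    return [0] * len(points)


class FeatureProcessor(object):
    """PythonCaller sketch: collect all point features, cluster them in
    close(), then output each feature with a cluster_id attribute."""

    def __init__(self):
        self.features = []

    def input(self, feature):
        # Buffer every feature; clustering needs the whole set at once.
        self.features.append(feature)

    def close(self):
        # Assumption: _x and _y attributes were added upstream.
        points = [(float(f.getAttribute("_x")), float(f.getAttribute("_y")))
                  for f in self.features]
        labels = assign_clusters(points)
        for feature, label in zip(self.features, labels):
            feature.setAttribute("cluster_id", label)
            self.pyoutput(feature)  # pyoutput is supplied by FME at run time
```

Buffering everything in memory keeps the sketch simple; for very large point sets a different design may be needed.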

 

 

Hi, how can I extract the lat/long of the points in each cluster after using the PointClusterer?
