Open

K-Means point clustering using FME

Clustering Points With K-Means in FME

This article has been generated using FME 2017.1

For a project to assess the quality of suppliers, five suppliers were approached to perform comparisons on aerial imagery. They were asked to select areas from aerial photographs where potentially dangerous situations might arise as a result of human activity. After the first delivery of reports from the suppliers within a test area, one of the obstacles for the project appeared to be the identification of clusters of reports. A cluster of such reports represent an area where a potentially dangerous situation has come about. A criterion for selecting the best supplier to detect the situations from the imagery, could be as follows: the best supplier is the one that “touches” the maximum number of clusters with the reports generated, with as few notifications as possible. Because FME played an important role in this project, but no transformer seemed available within FME, a study was made to implement a so-called K-Means clustering algorithm in an FME transformer.

Background article

The result of a K-Means clustering is a grouping of points in clusters, where the sum of the distances to the centroid of the cluster for all points within it is minimal, and smaller than the distance to other centroids. As a starting point for the clustering analysis is, apart from a list of points, the number of clusters to be generated. A nice article on this subject can be found here:

https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/

The article also contains a detailed Python script that might be implemented in FME. In the following, therefore, a description of that implementation is given. For further information on the Python algorithm or on the subject of K-means clustering, please refer to the article above.

Outline

The solution in FME is explained in the following steps:

1. Import and control of source dataset

2. Collect input coordinates and call the Python-script

3. Collect the resulting clusters

A so-called FME Custom Transformer, the PointClusterer, was built for the solution. Below is an explanation of the steps taken.

Import and control of source dataset

The custom transformer has only one parameter, the number of clusters to generate. During experiments, the algorithm appeared to stumble when e.g. entering a negative number for the number of clusters to be generated, or entering a value greater than the total number of points in the source data. So, introducing a user parameter for the number of clusters to generate (type number, positive, and with 0 decimals), the input parameter is forced to an integer. In that way the user is not allowed to enter invalid values. A separate check (using a Tester) is built in, to ensure the number of clusters to be generated does not exceed the number of points entered.

Collect input coordinates and call the Python-script

After the points are entered, they are checked for geometry: only point geometry is allowed. From the point information, the coordinates are extracted and placed in a list of X, Y using an Aggregator. The number of points in the input is counted and passed on using an attribute value to a PythonCaller, in which the actual call to the clustering algorithm takes place. The same way, the PythonCaller reads the number of clusters to be generated also, originating from the user parameter. After that, a numpy array is filled with the coordinates that are in the list X, Y (using feature.getAttribute).

Collect the resulting clusters

The call to the find_centers function in the python script provides the convergence criterion and the list of clusters that were generated (numpy array format). To get the clusternumber along with the coordinates, the list is iterated and all coordinates with their assigned cluster number are placed in a new list attribute on the FME feature (feature.setAttribute). Once outside of the PythonCaller, the list created is split into their originating parts, getting back the original coordinates with the assigned cluster number. The coordinate values are used to build the point geometries again (using a VertexCreator) and the coordinates are put together in an aggregate by means of their cluster number (using an Aggregator).

Results

The custom transformer which has been created, can be downloaded from the FME Hub, https://hub.safe.com/transformers/pointclusterer. It has been used on two small datasets, yielding the following results:

- A regular grid of 10x10 points and 7 clusters [fig. 1], total duration for the translations was 1 second.

- A series of approx. 300 Dutch municipalities grouped into 7 clusters [fig. 2] took 4 seconds processing time (which included about 3 seconds to read the data from a PostGIS reader and replacing the polygon features by an inside point).

For the sake of clarity, the points here are buffered and provided with a color.

When larger numbers (about 3000 points, 60 clusters) are used, the algorithm takes about 30 seconds to arrive at a result.

Fig. 1: 7 clusters from a regular 10x10 grid Fig. 2: 7 clusters from approx..300 Dutch municipalities

The last example might be an idea for a regional reclassification in the Netherlands?

Conclusions / Recommendations

It turns out to be quite easy to incorporate a piece of Python functionality in FME. The created transformer can easily be shared with other interested parties via the FME Hub platform. The transformer that was created could be extended with other clustering methods, or adapted in such a way that other objects than just points can be clustered together. The transformer was not used in the project mentioned in the introduction, it was merely a source of inspiration. When used in the project, the transformer could be adapted by choosing an algorithm that derives the optimal number of clusters from the data itself. See for example https://beta.vu.nl/nl/Images/werkstuk-hendrikson_tcm235-91358.pdf

B

bishoymf
6 replies
6 years ago
July 5, 2018

I wounder if I can the define the number of members per cluster. So I would like set the following parameters in the transformer: - The number of clusters - The size of the clusters or the min. and max. size of cluster

Is the applicable?

B

bishoymf
6 replies
6 years ago
July 5, 2018

I wounder if I can the define the number of members per cluster. So I would like set the following parameters in the transformer: - The number of clusters - The size of the clusters or the min. and max. size of cluster

Is the applicable?

+15

LizAtSafe
Safer
1507 replies
1 year ago
February 12, 2024

The following idea has been merged into this idea:

Develop more statistical transformers or functions Archived
0 Votes

All the votes have been transferred into this idea.

+15

LizAtSafe
Safer
1507 replies
1 year ago
February 12, 2024

The following idea has been merged into this idea:

M

DBSCAN Density point clustering Archived
0 Votes

All the votes have been transferred into this idea.

Reply

Rich Text Editor, editor1

K-Means point clustering using FME

Clustering Points With K-Means in FME

Background article