Cluster points

Clusters objects (based on centroid locations) using K-Means++ and/or DBSCAN algorithms.

Description

Clusters objects (based on centroid locations) using K-Means++ and/or DBSCAN algorithms. In K-Means++ [1], an optimisation of the standard K-Means algorithm, the points are assigned to a pre-determined number of clusters such that each point is assigned to its closest cluster mean position (this process is repeated until the cluster assignments stabilise or a maximum number of iterations is reached). For DBSCAN [2], points are clustered based on a minimum number of neighbours within a specified spatial range. As such, this algorithm doesn't require prior knowledge of the number of clusters. Both algorithms use their respective Apache Commons Math implementations.

References:
[1] Arthur, D.; Vassilvitskii, S. (2007). "k-means++: the advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035
[2] Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise." Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231

Parameters

Parameter Description
Input image
Output objects
Binary logic
Clustering algorithm The clustering algorithm to use:
  • "DBSCAN" Points are clustered based on a minimum number of neighbours ("Minimum number of points per cluster") within a specified distance ("Neighbourhood for clustering (epsilon)"). All proximal points which satisfy these criteria are added to a common cluster. This uses the Apache Commons Math3 implementation of DBSCAN, which describes the algorithm as: "A point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.".
  • "KMeans++" Points are assigned into a pre-determined number of clusters (defined by "Number of clusters"), with each point assigned to the cluster with the closest centroid. Since the cluster centroids will vary with each added point, this process is optimised iteratively. The algorithm continues until either no points switch clusters or the maximum number of allowed iterations ("Maximum number of iterations") is reached.
Number of clusters If "Clustering algorithm" is set to "KMeans++", this is the number of clusters the points will be assigned to.
Maximum number of iterations If "Clustering algorithm" is set to "KMeans++", this is the maximum number of optimisation iterations that will be performed. If cluster assignment has stabilised prior to reaching this number of iterations the algorithm will terminate early.
Neighbourhood for clustering (epsilon) If "Clustering algorithm" is set to "DBSCAN", this is the minimum distance to neighbour points that must be satisfied for a point to be added to a cluster. This distance is specified in pixel units.
Minimum number of points per cluster If "Clustering algorithm" is set to "DBSCAN", this is the minimum number of neighbour points which must be within a specified distance ("Neighbourhood for clustering (epsilon)") of a point for that point to be included in the cluster.