Single class cluster

Clusters objects (based on centroid locations) using K-Means++ and/or DBSCAN algorithms.

Description

Clusters objects (based on centroid locations) using K-Means++ and/or DBSCAN algorithms. In K-Means++ [1], an optimisation of the standard K-Means algorithm, the points are assigned to a pre-determined number of clusters such that each point is assigned to its closest cluster mean position (this process is repeated until the cluster assignments stabilise or a maximum number of iterations is reached). For DBSCAN [2], points are clustered based on a minimum number of neighbours within a specified spatial range. As such, this algorithm doesn't require prior knowledge of the number of clusters. Both algorithms use their respective Apache Commons Math implementations.

References:
[1] Arthur, D.; Vassilvitskii, S. (2007). "k-means++: the advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035
[2] Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise." Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231

Parameters

Parameter Description
Input objects Objects from the workspace to be grouped into clusters. Clusters are determined based on the centroid postions of the input objects. Input objects will be children of their assigned clusters. Each input object will be assigned to a single cluster.
Cluster (parent) objects Output cluster objects to be added to the workspace. Each cluster will be a parent of the associated input objects.
Apply volume When selected, the output cluster object will gain all coordinates of any child objects.
Clustering algorithm The clustering algorithm to use:
  • "DBSCAN" Points are clustered based on a minimum number of neighbours ("Minimum number of points per cluster") within a specified distance ("Neighbourhood for clustering (epsilon)"). All proximal points which satisfy these criteria are added to a common cluster. This uses the Apache Commons Math3 implementation of DBSCAN, which describes the algorithm as: "A point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.".
  • "KMeans++" Points are assigned into a pre-determined number of clusters (defined by "Number of clusters"), with each point assigned to the cluster with the closest centroid. Since the cluster centroids will vary with each added point, this process is optimised iteratively. The algorithm continues until either no points switch clusters or the maximum number of allowed iterations ("Maximum number of iterations") is reached.
Number of clusters If "Clustering algorithm" is set to "KMeans++", this is the number of clusters the points will be assigned to.
Maximum number of iterations If "Clustering algorithm" is set to "KMeans++", this is the maximum number of optimisation iterations that will be performed. If cluster assignment has stabilised prior to reaching this number of iterations the algorithm will terminate early.
Neighbourhood for clustering (epsilon) If "Clustering algorithm" is set to "DBSCAN", this is the minimum distance to neighbour points that must be satisfied for a point to be added to a cluster. This distance is specified in pixel units.
Minimum number of points per cluster If "Clustering algorithm" is set to "DBSCAN", this is the minimum number of neighbour points which must be within a specified distance ("Neighbourhood for clustering (epsilon)") of a point for that point to be included in the cluster.
Only link objects in same frame When selected, objects must be in the same time frame for them to be assigned to a common cluster.