Ward clustering
How fast is Ward clustering? Ward is an agglomerative/hierarchical clustering algorithm: it can be implemented by repeatedly joining the pair of data points/clusters that minimizes a pairwise distance (like single linkage), and it can be interpreted as minimizing the within-cluster sum of squares (like binary segmentation).
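As a quick demonstration of what Ward computes (my own example, using SciPy's implementation; the data and parameters are arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated blobs with 2 features each.
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(5, 1, size=(20, 2))])

# method="ward" performs N-1 joins, each chosen to minimize the
# increase in within-cluster sum of squares.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the two blobs should get two distinct labels
```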
In theory
The hierarchical clustering Wikipedia page says it is cubic, O(N^3), in the general case.
- Have to compute the O(N^2) pairwise distance matrix.
- Then need to perform N-1 joins.
- So the overall speed depends on how fast each join can be computed.
- Each join using the single linkage (min) criterion is O(N), so overall O(N^2).
- The Ward algorithm is a special case of the Lance-Williams algorithms, for which each new distance, after a cluster join, can be computed using a simple O(1) update rule (see the sketch below). So each join event requires O(N) time to re-compute distances, and computing all N-1 join events therefore takes O(N^2) time.
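Here is a minimal sketch of that O(1) update rule (my own code, not from any particular library); the coefficients are the standard Lance-Williams values for Ward, applied to squared Euclidean distances:

```python
# Lance-Williams update for Ward linkage, on SQUARED Euclidean distances.
# After merging clusters i and j, the distance from the merged cluster
# to any other cluster k is an O(1) function of three old distances
# and the three cluster sizes.
def ward_update(d2_ik, d2_jk, d2_ij, n_i, n_j, n_k):
    n = n_i + n_j + n_k
    return ((n_i + n_k) * d2_ik + (n_j + n_k) * d2_jk - n_k * d2_ij) / n
```

Applying this rule to each of the O(N) remaining clusters after every one of the N-1 joins gives the O(N^2) total above.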
With constraints
In an application to HiC data, the N samples we want to cluster occur on a 2D grid.
Assuming 4-connectivity, we can only join with 4 neighbors (not N-1), so the algorithm should be faster.
But we want to avoid computing the O(N^2) pairwise distance matrix, so we can no longer use a Lance-Williams update rule.
Instead we would have to update cluster means, which we use to compute new distances.
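For concreteness, here is a sketch of how a new distance can be computed from cluster means and sizes (my own code; I am assuming the usual Ward criterion, i.e. the increase in within-cluster sum of squares caused by a merge):

```python
import numpy as np

def ward_cost(mean_a, mean_b, n_a, n_b):
    # Increase in within-cluster sum of squares if clusters a and b merge.
    diff = np.asarray(mean_a) - np.asarray(mean_b)
    return (n_a * n_b) / (n_a + n_b) * np.dot(diff, diff)

# After a join, the merged cluster's statistics take O(n_features) to update:
# n_ab = n_a + n_b, mean_ab = (n_a * mean_a + n_b * mean_b) / n_ab.
```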
The time complexity of the algorithm depends on how many distances we have to compute at each iteration.
Best case is a constant O(1) number of distances to compute in each iteration; with candidate joins kept in a heap (log-time insertion), the N-1 joins then take O(N log N) total time.
Worst case is a linear O(N) number of distances to compute in each iteration, for total O(N^2) time.
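In practice this constrained variant is available in scikit-learn: AgglomerativeClustering accepts a connectivity matrix, and grid_to_graph builds the 4-connectivity graph of a 2D grid. A minimal usage example (grid size, feature count, and n_clusters are arbitrary):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

n_x, n_y = 20, 20
connectivity = grid_to_graph(n_x, n_y)  # sparse 4-connectivity of the grid
X = np.random.rand(n_x * n_y, 1)        # one feature per grid cell

model = AgglomerativeClustering(
    n_clusters=4, linkage="ward", connectivity=connectivity)
labels = model.fit_predict(X)
print(labels.reshape(n_x, n_y))  # contiguous regions on the grid
```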
In code
_agglomerative.py has ward_tree(), which contains the main merge loop.
- It uses a heap to keep distances sorted (log-time insertion, constant-time min).
- It uses moments matrices: moments_1 is the number of samples (clusters x 1), and moments_2 is the matrix of sums (clusters x features).
- It sets n_nodes to the number of output nodes in the tree, and initially allocates various arrays of this size (moments, used indicator, parent, etc). The used_node indicator marks joins which are no longer possible, because the corresponding nodes have already been involved in other joins. There is a while loop which keeps going until it finds a join which involves only unused nodes (see the schematic after this list). This is slightly sub-optimal (in C++ this could be reduced from log to constant time using pointers, but this does not matter much).
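A schematic of that merge loop (my own plain-Python sketch, not the actual scikit-learn code):

```python
import heapq

# The heap holds candidate joins as (distance, i, j) tuples; used_node
# marks nodes that can still participate in a join. Stale heap entries
# are skipped by the inner while loop.
def merge_loop(heap, used_node, parent, n_samples):
    for k in range(n_samples, 2 * n_samples - 1):  # one new node per join
        dist, i, j = heapq.heappop(heap)
        while not (used_node[i] and used_node[j]):  # skip stale candidates
            dist, i, j = heapq.heappop(heap)
        used_node[i] = used_node[j] = False  # i and j are merged into node k
        parent[i] = parent[j] = k
        # ...then: update moments for node k, compute distances to k's
        # (constrained) neighbors, and heapq.heappush() the new candidates.
```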
_hierarchical_fast.pyx has compute_ward_dist(), which has two for loops: over size_max (I guess clusters) and over n_features.
Using the cumulative sum trick would be about as fast (same asymptotic complexity class) as this Cython code, which seems to compute the new distances in constant time given the means, based on the formula involving the difference of two means from Hierarchical Clustering, Cluster Linkage.
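A sketch of the cumulative sum trick in 1D (my own code): after one O(N) pass, the sum, and hence the mean, of any contiguous segment is an O(1) query.

```python
import numpy as np

x = np.random.rand(100)
csum = np.concatenate([[0.0], np.cumsum(x)])  # one O(N) pass

def segment_mean(start, end):
    # Mean of x[start:end] in O(1), no loop over the segment.
    return (csum[end] - csum[start]) / (end - start)
```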
Exercises for the reader
- Compute the empirical asymptotic time complexity of ward_tree() (a timing sketch follows this list).
- Show that the three segmentation algos give the same result in 1D (Ward via Lance-Williams, Ward via moments as in the Python code, cumsum trick).
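A possible starting point for the first exercise (my own sketch; the sizes and random data are arbitrary, and a connectivity graph could be passed to exercise the structured code path):

```python
import time
import numpy as np
from sklearn.cluster import ward_tree

sizes = [100, 200, 400, 800, 1600]
times = []
for n in sizes:
    X = np.random.rand(n, 2)
    start = time.perf_counter()
    ward_tree(X)
    times.append(time.perf_counter() - start)

# Empirical exponent: slope of log(time) vs log(N) on these sizes.
slope = np.polyfit(np.log(sizes), np.log(times), 1)[0]
print(f"empirically roughly O(N^{slope:.2f})")
```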
 
Other software and reading
- Table 6 in "A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets" has a nice review of the computational complexity (big-O notation) of different 2D image segmentation algorithms.
- skimage.segmentation has slic, which is k-means with 2D constraints.
- OpenCV segmentation.