K-means Algorithm
Steps:
- Given a dataset of size N, choose the number of clusters K you expect the dataset to contain and randomly initialize K centroids
\(C = \Set{c_1, \dots, c_K}\) | Set of K centroids
\(S = \Set{s_1, \dots, s_K}\) | Set of K clusters
Intra-cluster variance is the objective function to be minimized in the k-means algorithm:
\(J = \sum_{k=1}^{K} \sum_{x \in s_k} \lVert x - c_k \rVert^2\)
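- Class assignment: assign each datapoint \(x_i\) to the cluster of its nearest centroid, \(\arg\min_{k} \lVert x_i - c_k \rVert^2\)
- Update centroids: move each centroid to the mean of the datapoints assigned to it, \(c_k = \frac{1}{|s_k|} \sum_{x_i \in s_k} x_i\)
- Repeat the process until the cluster centroids no longer change, i.e. repeat until convergence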
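The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `kmeans`, `squared_distance`, `max_iter` and `seed` are made up for this example, initial centroids are drawn from the data points themselves, and an empty cluster simply keeps its previous centroid (both are assumptions, not part of the notes above).

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization (assumption): use k distinct data points as centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Class assignment: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update centroids: mean of the points assigned to each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On data this well separated, the first three points should end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.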
Complexity of one iteration: \(O(K \cdot n \cdot t)\)
K = number of clusters | n = number of samples | t = time taken to compute one datapoint-to-centroid distance
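The result is sensitive to the randomly chosen initial centroids: different initializations can converge to different local minima of the objective.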
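A small hand-made example can make this sensitivity concrete. The sketch below runs the assign/update loop from two explicitly chosen starting positions on four points forming a 4 x 1 rectangle; the function name `run_kmeans`, the data, and the starting centroids are all invented for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_kmeans(points, start, max_iter=100):
    """k-means from explicitly given centroids; returns (centroids, objective)."""
    centroids = [tuple(c) for c in start]
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                updated.append(old)  # assumption: keep empty clusters in place
        if updated == centroids:
            break
        centroids = updated
    # Intra-cluster variance: squared distance to the nearest centroid.
    objective = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, objective


# Four corners of a rectangle that is 4 wide and 1 tall.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start between the left pair and the right pair.
_, good_objective = run_kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
# Start on the bottom and top edges: the split becomes bottom vs top.
_, bad_objective = run_kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_objective, bad_objective)  # prints: 1.0 16.0
```

Both runs are already converged after the first update, yet the second one is stuck with an objective sixteen times larger. A common remedy is to run the algorithm from several initializations and keep the result with the lowest objective.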
Clustering performance
Silhouette Coefficient
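a : the mean distance between a datapoint and the other points in the same cluster
b : the mean distance between a datapoint and all the points in the next nearest cluster
The coefficient for a datapoint is \(s = \frac{b - a}{\max(a, b)}\). It ranges from -1 to 1; values near 1 indicate that the datapoint sits well inside its own cluster.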
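As a rough sketch, the per-datapoint coefficient \(s = (b - a) / \max(a, b)\) can be computed from a and b as defined above. The function name `silhouette_samples` is chosen here for illustration, Euclidean distance is assumed, and a score of 0 is assigned to points in singleton clusters or when only one cluster exists (a convention, not something stated in these notes).

```python
import math


def silhouette_samples(points, labels):
    """Silhouette coefficient of every datapoint (Euclidean distance assumed)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Group the distances from p to every other point by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            # Assumed convention: singleton cluster or single cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        # Mean distance to the next nearest cluster.
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two tight, well-separated 1-D clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_samples(points, labels)
mean_score = sum(scores) / len(scores)
```

For the first point, a = 1 and b = 10.5, so its coefficient is 9.5 / 10.5, roughly 0.90. Averaging the per-point values gives a single score for the whole clustering.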