Cluster Analysis

The K-means clustering algorithm is a simple and elegant (yet powerful) algorithm for partitioning a data set into K distinct, nonoverlapping clusters. To perform K-means clustering, we must first specify the desired number of clusters K; the algorithm then assigns each observation to exactly one of the K clusters, namely the cluster whose centroid is nearest to that observation.

The algorithm starts by randomly choosing K initial centroids and assigning each observation to the cluster with the nearest centroid. Next, the algorithm resets each centroid to the mean of the observations assigned to its cluster. Then, the algorithm reassigns each observation based on the new centroids. This process repeats until a set number of iterations is reached or until updating the centroids results in no observations changing clusters.
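
The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`kmeans`, `nearest`, `mean`), the `max_iter` and `seed` parameters, and the choice to draw the initial centroids from the observations themselves are all assumptions made for the example. It also assumes observations are tuples of numbers and that distance is Euclidean.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) for a list of points.

    Assumption: the initial centroids are k observations picked at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every observation to the cluster with the nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Stop as soon as no observation changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Reset each centroid to the mean of its assigned observations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns one cluster index per observation together with the final centroids. Because the starting centroids are random, different seeds can produce different partitions on data that is not well separated.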

The approach K-means follows can be viewed as a form of Expectation-Maximization (EM) with hard assignments. The E-step assigns each observation to the cluster with the closest centroid. The M-step recomputes the centroid of each cluster as the mean of the observations assigned to it.
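
To make the two steps concrete, the sketch below writes them as separate functions and runs one E-step and one M-step on a tiny one-dimensional data set. The names `e_step`, `m_step`, and `within_cluster_ss`, the sample points, and the starting centroids are hypothetical choices for illustration. The within-cluster sum of squares is included only as a convenient way to see that the M-step does not make the fit worse.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def e_step(points, centroids):
    """E-step: label each observation with the index of its closest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def m_step(points, labels, centroids):
    """M-step: recompute each centroid as the mean of its observations.

    A cluster with no observations keeps its previous centroid.
    """
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)
    return updated


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each observation to its centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Illustrative data: four 1-D observations and two arbitrary starting centroids.
points = [(1.0,), (2.0,), (9.0,), (10.0,)]
centroids = [(0.0,), (3.0,)]

labels = e_step(points, centroids)
before = within_cluster_ss(points, labels, centroids)
centroids = m_step(points, labels, centroids)
after = within_cluster_ss(points, labels, centroids)
print(labels, centroids, before, after)
```

Alternating `e_step` and `m_step` until the labels stop changing reproduces the full algorithm. Keeping the steps separate makes it easy to inspect the assignments and centroids after each half of an iteration.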