Introduction to K-Means and Hierarchical Clustering

This article is a summary of the StatQuest video made by Josh Starmer.

K-Means Clustering
K-means is a popular unsupervised machine learning algorithm. Its objective is simple: group similar data points together and discover underlying patterns.
Step 1 is to identify the number of clusters, K.
Suppose K = 3. Step 2 is to randomly select 3 data points; these are our Initial Cluster Points.
Step 3 is to calculate the Euclidean distance between each observation and the Initial Cluster Points.

Step 4 is to assign each observation to the nearest cluster.
Step 5 is to calculate the mean of each cluster and repeat the assignment until the clusters stop changing. We then assess the quality of the clustering by adding up the Variation within each cluster. Since we don't know the best clustering in advance, we can only select new random Initial Cluster Points, run the process again, and compare the Variation with the previous attempt. We do this over and over with different starting points; the steps are sketched in code below.
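A minimal NumPy sketch of these steps, assuming made-up toy data and K = 3 (a full implementation would also handle empty clusters):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(300, 2))  # toy data: 300 observations, 2 features
K = 3

# Step 2: randomly pick K observations as the Initial Cluster Points
centers = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):
    # Steps 3-4: Euclidean distance to each center, assign to the nearest
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

    # Step 5: recompute each cluster mean
    new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centers, centers):
        break  # clusters stopped changing
    centers = new_centers

# total within-cluster Variation, used to compare different random starts
variation = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(K))
print(variation)
```

Running this several times with different seeds and keeping the run with the lowest variation is exactly the "try again with different starting points" idea described above.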

With this approach, we can find the Initial Cluster Points that give the lowest variation, but we still don't know whether that variation is the lowest possible overall. To address this, it is useful to have a method for identifying the best number of clusters K. Essentially, each time we add a new cluster, the Total Variation within each cluster is smaller than before.

The goal is to stop increasing K when the variation stops decreasing noticeably; the sharp bend this produces in the plot is why this method is called the Elbow Method. The Elbow Plot helps us see where the variation stabilizes as a function of K.
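A quick sketch of an Elbow Plot with scikit-learn, whose inertia_ attribute is the total within-cluster variation (the data here is made up):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(300, 2))  # toy data

ks = range(1, 10)
variations = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

plt.plot(ks, variations, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Total within-cluster variation")
plt.title("Elbow Plot")
plt.show()
```

The elbow is the value of K after which adding more clusters barely reduces the variation.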

Hierarchical Clustering
This approach is often associated with Heatmaps.
Heatmaps often come with dendrograms, which indicate how similar the observations are, which cluster each one belongs to, and the order in which the clusters were formed.

Clusters with more similarity have shorter dendrogram branches.
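For example, seaborn's clustermap draws a heatmap together with the row and column dendrograms (a minimal sketch, assuming seaborn is installed and using made-up data):

```python
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=1)
data = rng.normal(size=(10, 6))  # toy matrix: 10 samples, 6 features

# rows and columns are hierarchically clustered;
# the dendrograms appear on the margins of the heatmap
sns.clustermap(data, metric="euclidean", method="average", cmap="viridis")
```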
The default method for determining similarity is the Euclidean distance between observations; we can also use the Manhattan distance, which is the sum of the absolute values of the differences, as shown below.
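For two observations a and b (the values here are made up for illustration), the two distances are computed like this:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.5])

# Euclidean: square root of the sum of squared differences
euclidean = np.sqrt(((a - b) ** 2).sum())

# Manhattan: sum of the absolute values of the differences
manhattan = np.abs(a - b).sum()
```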

Now, to compare an observation with a whole cluster, we have to choose which point in the cluster to measure the distance from. The possible methods are:
1 - Centroid: the average of each cluster.
2 - Single Linkage: the closest point in each cluster.
3 - Complete Linkage: the furthest point in each cluster.
The ultimate goal is to find the method that gives the most insight into our data; the three options are compared in the sketch below.
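A minimal sketch comparing the three methods with SciPy's hierarchical clustering, on made-up data (note that SciPy's "centroid" method is only well defined with the Euclidean metric):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(seed=2)
X = rng.normal(size=(20, 2))  # toy data: 20 observations, 2 features

for method in ["centroid", "single", "complete"]:
    Z = linkage(X, method=method)                    # build the hierarchy
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut it into 3 clusters
    print(method, labels)

# draw the dendrogram for one of the methods
dendrogram(linkage(X, method="complete"))
plt.show()
```

Comparing the cluster labels (and dendrograms) produced by each method is one practical way to see which one reveals the most structure in a given dataset.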

How is K-Means clustering different from Hierarchical clustering?
K-means clustering specifically tries to partition the data into the number of clusters we tell it to use.
Hierarchical clustering just tells us, pairwise, which two things are most similar.