This article is a summary of the **StatQuest** video made by **Josh Starmer**.

Click **here** to see the video explained by Josh Starmer.

**K-Means Clustering**

This is a popular unsupervised machine learning algorithms. The objective of K-means is simple: group similar data points together and discover underlying patterns.

The **Step - 1** is to identify the number of clusters.

Suppose we have K=3, the **Step - 2** is to select randomly 3 data points and these are our **Initial Cluster Points**.

The **Step - 3** consists to calculate the **Euclidian Distance** between the observations and the **Initial Cluster Points**.

The **Step - 4** is assign the observations to the nearest cluster.

The **Step - 5** calculate the **Mean** of each cluster. Now we have to asses the right cluster by adding-up the **Variation** within each cluster. We don’t know the best clustering, and so, we can only try again to select randomly the Initial Cluster Points and compare the Variation with the previous attempt. And we do this over again with different starting points.

With this approach, we can find the **Best Initial Cluster Points** that have the lowest variation, but we still we don’t know if that variation is the lowest overall. To solve this question, could be very useful to find a method to identify the **Best Numebr of K**. Essentially, each time we add a new cluster, the **Total Variation** within each cluster is smaller than before.

The goal is to stop the number of K when the variatioin stops to decrease, that is why this method is called **Elbow**. The **Elbow Plot** helps us to find when the variation is stabilized based on the number of K.

**Hierarchical Clustering**

This approach is often associated with **Heatmaps**.

Heatmaps often come with **Dendrograms** that indicates the similarity and the cluster of membership, and the order the clusters are formed.

Cluster with more similarity have shorter dendrogram branch.

The defined method for determining similarity is the **Euclidean Distance** between Observations, we can also use the **Manhattan Distance** that is the absolute value of the differences, as shown below.

Now, to assign new observations to a cluster based on the **Center Point**, we have to explore all the possible methods that are:

**1 - Centroid**: the average of each cluster.

**2 - Single Linkage**: the closest point in each cluster.

**3 - Complete Linkage**: the furthest point in each cluster.

The ultimate goal is to find the best method able to gives more insight into our data.

**How is K-Means clustering different from Hierarchical clustering?**

The **K-Means clustering** specifically tries to put the data into the number of clusters we tell it to.

The **Hierarchical clustering** just tells us, pairwise, what two things are most similar.