Unsupervised Learning | January 01, 2025

Hierarchical Clustering in Machine Learning

Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is an unsupervised machine learning algorithm used to group unlabeled datasets into clusters.

In this algorithm, we build the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as a dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two methods differ in how they work. In particular, hierarchical clustering has no requirement to predetermine the number of clusters, as the K-means algorithm does.

Types of Hierarchical Clustering

There are two main types of hierarchical clustering:

  1. Agglomerative (Bottom-Up Approach):
    • Initially, each data point is considered as its own cluster.
    • Pairs of clusters are merged based on their similarity, gradually forming a hierarchy.
    • This is the most commonly used approach in hierarchical clustering.
  2. Divisive (Top-Down Approach):
    • Starts with all data points in a single cluster.
    • The algorithm recursively splits the cluster into smaller clusters based on dissimilarity.
    • This approach is less commonly used due to its complexity.

Why Hierarchical Clustering?

Hierarchical clustering is preferred when:

  • Hierarchy is meaningful: The data has a natural hierarchy or nested structure.
  • Number of clusters is unknown: There’s no prior knowledge of the number of clusters.
  • Visualization is key: Dendrograms offer a clear visualization of how clusters are formed.

Agglomerative Hierarchical Clustering

Agglomerative clustering is a bottom-up approach where each data point starts as its own cluster. Clusters are then iteratively merged based on similarity until a single cluster remains or a stopping criterion is met.

How Agglomerative Hierarchical Clustering Works

  1. Initialization: Start with each data point as an individual cluster.
  2. Compute distances: Calculate the distance between all pairs of clusters.
  3. Merge clusters: Combine the two clusters that are closest based on a linkage criterion.
  4. Update distances: Recalculate distances between the new cluster and all other clusters.
  5. Repeat: Continue merging until only one cluster remains or the desired number of clusters is reached (a code sketch of these steps follows below).
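The snippet below is a minimal sketch of these steps using SciPy's hierarchical clustering routines; the toy data points are assumptions chosen for illustration, not values from this post.

```python
# A rough sketch of steps 1-5 with SciPy; the toy data points below are
# illustrative assumptions, not values from this post.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Step 1: each of these points starts as its own cluster.
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])

# Step 2: pairwise distances between all points (condensed distance matrix).
D = pdist(X, metric="euclidean")

# Steps 3-5: linkage() repeatedly merges the two closest clusters and updates
# distances. Each row of Z is one merge: [cluster_i, cluster_j, distance, size].
Z = linkage(D, method="ward")
print(Z)
```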

Measures for the Distance Between Two Clusters

Linkage methods are crucial in hierarchical clustering as they define how the distance between two clusters is calculated. The common linkage methods are:

  • Single Linkage: The distance between two clusters is defined as the minimum distance between any two points, where one point belongs to each cluster. This can lead to "chaining," where clusters are merged based on a few very close points, even if the majority of the points are far apart.
  • Complete Linkage: The distance between two clusters is defined as the maximum distance between any pair of points, where one point belongs to each cluster. This method tends to create more compact and spherical clusters.
  • Average Linkage: The distance between two clusters is the average of the distances between all pairs of points, one from each cluster. It balances the effect of single and complete linkage.
  • Ward’s Method: This minimizes the total within-cluster variance (sum of squared differences from the centroid of the cluster). It is known for producing compact, spherical clusters and is widely used in practice (see the comparison sketch below).
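As a rough comparison, the sketch below runs SciPy's linkage function with each of these methods on the same assumed toy data and prints the distance of the final merge, which typically differs by method.

```python
# Comparing linkage criteria on the same assumed toy data with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])

for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)      # raw observations are accepted directly
    # Z[-1, 2] is the distance at which the last two clusters were merged.
    print(f"{method:>8}: final merge distance = {Z[-1, 2]:.2f}")
```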

Working of Dendrogram in Hierarchical Clustering

A dendrogram is a tree-like diagram that shows the arrangement of clusters:

  1. Leaves: Represent individual data points.
  2. Branches: Indicate clusters formed by merging.
  3. Height: Represents the distance or dissimilarity at which clusters are merged.

How to Interpret a Dendrogram:

  • Cutting the dendrogram: Choose a horizontal threshold to determine the number of clusters.
  • Branch lengths: Longer branches indicate greater dissimilarity.
  • Structure: The branching pattern reflects the hierarchy of clusters (see the plotting sketch below).
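A small sketch of plotting and cutting a dendrogram with SciPy and matplotlib is shown below; the data and the cut threshold of 3.0 are assumptions chosen for illustration.

```python
# Drawing a dendrogram and cutting it at an assumed threshold with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])
Z = linkage(X, method="ward")

# Leaves are the individual points; the height of each branch is the merge distance.
dendrogram(Z)
plt.axhline(y=3.0, color="red", linestyle="--")  # the horizontal "cut" (assumed value)
plt.show()

# Cutting at distance 3.0 turns the hierarchy into flat cluster labels.
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)
```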

Steps Involved in Agglomerative Hierarchical Clustering

The agglomerative hierarchical clustering algorithm works through the following steps:

  1. Initialize: Treat each data point as a single cluster.
  2. Compute Pairwise Distance: Calculate the distance between all pairs of data points.
  3. Merge Clusters: Find the two clusters that are closest to each other (using a linkage method) and merge them into a new cluster.
  4. Repeat: Repeat the process of calculating distances and merging clusters until all data points are merged into a single cluster, or a desired number of clusters is reached.
  5. Generate Dendrogram: Visualize the process of clustering through a dendrogram, which shows the merging steps along with the corresponding distance between clusters (a scikit-learn sketch follows below).
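For a quick end-to-end run of these steps, here is a minimal sketch with scikit-learn's AgglomerativeClustering; the dataset and the choice of n_clusters=2 are illustrative assumptions.

```python
# Agglomerative clustering end to end with scikit-learn (assumed toy data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])

# Merging stops once the requested number of clusters is reached.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # cluster label assigned to each point
```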

Divisive Hierarchical Clustering

What is Divisive Hierarchical Clustering?

Divisive hierarchical clustering is a top-down approach, where the entire dataset is initially treated as a single cluster. The algorithm then recursively splits the cluster into smaller clusters until each data point is in its own cluster.

In contrast to the agglomerative approach (which starts with individual data points and merges them into clusters), the divisive approach begins with one large cluster and works its way down.

How Divisive Clustering Works:

  1. Start with a Single Cluster: Initially, the entire dataset is considered as one single cluster.
  2. Split the Cluster: The dataset is split into two smaller clusters, usually by identifying the two most dissimilar points or by applying a flat clustering algorithm such as 2-means to divide the data.
  3. Repeat the Process: This splitting process continues recursively for each cluster, dividing it further until each cluster contains only one data point or meets a stopping criterion.
  4. Stop When a Condition Is Met: The splitting stops when each cluster contains a single data point or when a predefined stopping condition is reached.

Example:

For instance, consider a dataset of points in a 2D space.

  • First, all the points are in one cluster.
  • The algorithm will split the cluster into two subclusters based on their similarity.
  • Each subcluster will continue splitting until there is no further meaningful division possible, typically when only one point remains in each cluster (see the sketch below).
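As a sketch of this top-down idea, the code below recursively splits each cluster in two with K-means until every cluster holds a single point; the data, the stopping rule, and the helper name are assumptions made for illustration, not a standard library routine.

```python
# A divisive (top-down) sketch: recursively split clusters with 2-means.
# The data, the max_size stopping rule, and the helper name are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(points, max_size=1):
    """Recursively split `points` into smaller clusters; return a list of arrays."""
    if len(points) <= max_size:
        return [points]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    left, right = points[km.labels_ == 0], points[km.labels_ == 1]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return [points]
    return divisive_split(left, max_size) + divisive_split(right, max_size)

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])
for i, cluster in enumerate(divisive_split(X)):
    print(f"cluster {i}: {cluster.tolist()}")
```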

Key Features of Divisive Hierarchical Clustering:

  • Top-Down Approach: Starts with one large cluster and recursively divides it.
  • Tree Structure: Like the agglomerative method, divisive clustering results in a tree-like structure called a dendrogram, where each split forms a new branch.
  • Scalability: Divisive methods can be computationally expensive, especially for large datasets, since the number of possible ways to split a cluster grows rapidly with its size.

Pros of Divisive Hierarchical Clustering:

  • More Accurate for Some Datasets: Because splits are made with a global view of the data, the divisive method may be more effective when the dataset has a clear top-level structure that should be divided first.
  • All Clusters Are Considered: Since you start with one cluster and split, all data points are involved in the process.

Cons of Divisive Hierarchical Clustering:

  • Computationally Intensive: It can be computationally expensive due to the need to evaluate possible splits at each step.
  • Less Common: It’s less commonly used than the agglomerative approach because it can be slower and harder to implement.

Divisive vs. Agglomerative:

  • Divisive: Starts with one cluster and recursively splits it. It’s a top-down approach.
  • Agglomerative: Starts with individual points and recursively merges them into larger clusters. It’s a bottom-up approach.

 

Advantages of Hierarchical Clustering

  1. No Need to Specify the Number of Clusters: Unlike K-Means clustering, hierarchical clustering does not require specifying the number of clusters upfront. The tree-like structure allows us to explore the data at different levels of granularity and decide the number of clusters after examining the dendrogram.
  2. Hierarchical Structure: The hierarchical structure of the clusters makes it easy to understand how data points are grouped together and how the clusters relate to each other.
  3. Works Well with Any Shape of Clusters: Hierarchical clustering does not assume spherical or compact clusters, unlike K-Means, which works best with spherical clusters.
  4. Flexibility: Hierarchical clustering can work with any distance metric, such as Euclidean, Manhattan, or others (see the sketch below).
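As a brief sketch of this last point, recent scikit-learn versions let you pass a different distance metric (the parameter is named metric in current releases; it was affinity in older ones). Note that Ward linkage only works with Euclidean distance, so another linkage is paired with Manhattan here; the data values are assumptions.

```python
# Using Manhattan distance with average linkage (assumed toy data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.2, 4.8], [9.0, 9.1]])

# Ward linkage requires Euclidean distance, so average linkage is paired
# with the Manhattan metric here.
model = AgglomerativeClustering(n_clusters=2, metric="manhattan", linkage="average")
print(model.fit_predict(X))
```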

Disadvantages of Hierarchical Clustering

  1. Computationally Expensive: Hierarchical clustering, especially the agglomerative approach, is computationally expensive, with a time complexity of O(n²) for computing the distance matrix and O(n³) for the overall clustering in the standard implementation, making it impractical for very large datasets.
  2. Sensitivity to Noise: Hierarchical clustering is sensitive to noisy data and outliers, as these can affect the overall structure and produce misleading dendrograms.
  3. Irreversible Merges: Once clusters are merged in agglomerative clustering, they cannot be split. This means that the algorithm might make suboptimal decisions during the merging process that cannot be undone.

Applications of Hierarchical Clustering

  • Gene Expression Analysis: In bioinformatics, hierarchical clustering is widely used for clustering genes with similar expression patterns.
  • Document Clustering: Grouping documents that are similar to each other in terms of content, which is useful for information retrieval and recommendation systems.
  • Market Segmentation: Hierarchical clustering is used to segment customers based on purchasing behavior, demographics, and other factors.
  • Image Segmentation: Hierarchical clustering is used in computer vision to group similar pixels together based on color or texture.

Key Takeaways

  1. What is Hierarchical Clustering?
    • It's an unsupervised machine learning algorithm used to group data into clusters.
    • The output is a tree-like structure called a dendrogram.
  2. Types of Hierarchical Clustering:
    • Agglomerative (Bottom-Up): Starts with each data point as its own cluster and merges them based on similarity.
    • Divisive (Top-Down): Starts with all data points in a single cluster and recursively splits them.
  3. Agglomerative Clustering Process:
    • Initialize with each data point as a cluster.
    • Calculate distances between clusters and merge the closest ones.
    • Repeat until only one cluster remains or a stopping criterion is met.
  4. Linkage Methods for Distance Measurement:
    • Single Linkage: Minimum distance between clusters.
    • Complete Linkage: Maximum distance between clusters.
    • Average Linkage: Average distance between all pairs of points.
    • Ward’s Method: Minimizes within-cluster variance.
  5. Dendrogram:
    • Visual representation of clustering.
    • The height indicates the distance at which clusters are merged.
    • Cutting the dendrogram at a level determines the number of clusters.
  6. Divisive Clustering:
    • Starts with one large cluster and recursively splits it until each point is in its own cluster.
    • Less commonly used due to its complexity.
  7. Advantages of Hierarchical Clustering:
    • No need to predefine the number of clusters.
    • Works with any shape of clusters and distance metrics.
    • Provides a clear hierarchical structure.
  8. Disadvantages of Hierarchical Clustering:
    • Computationally expensive.
    • Sensitive to noisy data and outliers.
    • Irreversible merges (once clusters are merged, they can't be split).
  9. Applications: Gene expression analysis, document clustering, market segmentation, and image segmentation.

Next Blog: Python Implementation of Agglomerative Hierarchical Clustering
Purnima