Step-by-Step Python Implementation of Isomap
We'll use the well-known Digits dataset from sklearn for this demonstration. It consists of 8x8-pixel images of handwritten digits (0-9), which makes it a good candidate for applying Isomap for dimensionality reduction.
Step 1: Import Libraries
We will start by importing necessary libraries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA
- numpy: Used for numerical operations.
- matplotlib.pyplot: Used to visualize the results.
- sklearn.datasets: To load datasets.
- sklearn.manifold.Isomap: This is the core module for applying the Isomap algorithm.
- sklearn.decomposition.PCA: We will also use PCA to visualize the dimensionality reduction comparison.
Step 2: Load Dataset
We'll load the Digits dataset from sklearn. This dataset consists of 1797 samples of 8x8 images of handwritten digits (0-9).
# Load Digits dataset
digits = datasets.load_digits()
X = digits.data # Features
y = digits.target # Labels
# Display the shape of data
print(f"Data shape: {X.shape}")
Output:
Data shape: (1797, 64)
Here, the dataset has 1797 samples, and each sample has 64 features (each 8x8 pixel image flattened into a 64-dimensional vector).
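To make the flattening concrete, `load_digits` also exposes the same pixels in their original 8x8 layout via the `images` attribute, and the two views contain identical values (a quick sanity check, using only attributes documented in scikit-learn):

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
# digits.data holds flattened vectors; digits.images holds the same
# pixels in their original 8x8 layout
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
# Confirm the two views contain identical pixels
print(np.array_equal(digits.data, digits.images.reshape(1797, 64)))  # True
```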
Step 3: Apply Isomap for Dimensionality Reduction
Now, we'll apply the Isomap algorithm to reduce the dimensionality of the data from 64 to 2 (for easy visualization).
# Apply Isomap with n_neighbors=10 and n_components=2 (for 2D visualization)
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
# Display the shape of the transformed data
print(f"Transformed data shape: {X_isomap.shape}")
Output:
Transformed data shape: (1797, 2)
- The dimensionality has been reduced from 64 to 2, and now the data is in 2D space.
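Beyond the embedding itself, the fitted estimator exposes some of Isomap's internals: `dist_matrix_` holds the pairwise geodesic (graph shortest-path) distances, and `reconstruction_error()` gives a rough quality measure of the embedding (both are part of scikit-learn's Isomap API). A sketch of inspecting them, refitting with the same parameters as above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, _ = load_digits(return_X_y=True)
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)              # (1797, 2)
# Pairwise geodesic distances computed on the neighborhood graph
print(iso.dist_matrix_.shape)  # (1797, 1797)
# Lower values indicate the 2D embedding preserves distances better
print(iso.reconstruction_error())
```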
Step 4: Visualize the Isomap Results
Let's visualize the 2D representation of the data after applying Isomap.
# Plot the 2D Isomap output
plt.figure(figsize=(8, 6))
plt.scatter(X_isomap[:, 0], X_isomap[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('Isomap Dimensionality Reduction (2D)')
plt.xlabel('First Component')
plt.ylabel('Second Component')
plt.show()
In the plot, each point represents a digit, and the color indicates the digit label (0-9).
Explanation:
- X_isomap[:, 0] and X_isomap[:, 1] are the first and second dimensions (after reduction).
- c=y colors the points based on their digit labels.
- The 'tab10' colormap provides 10 distinct colors, one for each digit class.
Step 5: Compare with PCA
We'll also apply PCA (Principal Component Analysis) to compare how Isomap and PCA differ in dimensionality reduction.
# Apply PCA for comparison
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot the 2D PCA output
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('PCA Dimensionality Reduction (2D)')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
Explanation:
- PCA projects the data onto the directions of maximum variance. It is a linear method, so it may miss non-linear structure in the data that Isomap can capture.
- We use the same color mapping for consistency in comparison.
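One advantage of PCA is that it quantifies how much of the data's variance the projection retains, via the `explained_variance_ratio_` attribute. A quick check for our two components:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=2).fit(X)
# Fraction of total variance retained by each of the two components
print(pca.explained_variance_ratio_)
print(f"Total retained: {pca.explained_variance_ratio_.sum():.1%}")
```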
Step 6: Results and Analysis
- Isomap vs PCA: After visualizing both Isomap and PCA, we can see that Isomap tends to better preserve the non-linear structure of the data. PCA, being linear, might struggle with non-linear manifolds, which are common in high-dimensional data such as image data.
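Beyond eyeballing the two scatter plots, one rough way to compare the embeddings is to check how well a simple classifier separates the digit classes in each 2D space. Note the caveat: `fit_transform` sees all samples, so this gauges class separation in the embedding rather than generalization to new data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)

# 5-fold cross-validated kNN accuracy in each 2D embedding
knn = KNeighborsClassifier(n_neighbors=5)
acc_iso = cross_val_score(knn, X_iso, y, cv=5).mean()
acc_pca = cross_val_score(knn, X_pca, y, cv=5).mean()
print(f"kNN accuracy on Isomap 2D: {acc_iso:.3f}")
print(f"kNN accuracy on PCA 2D:    {acc_pca:.3f}")
```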
Complete Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA
# Load Digits dataset
digits = datasets.load_digits()
X = digits.data # Features
y = digits.target # Labels
# Apply Isomap with n_neighbors=10 and n_components=2 (for 2D visualization)
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
# Plot the 2D Isomap output
plt.figure(figsize=(8, 6))
plt.scatter(X_isomap[:, 0], X_isomap[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('Isomap Dimensionality Reduction (2D)')
plt.xlabel('First Component')
plt.ylabel('Second Component')
plt.show()
# Apply PCA for comparison
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot the 2D PCA output
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('PCA Dimensionality Reduction (2D)')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
Final Notes
- Isomap:
- Captures non-linear relationships in the data.
- Works well for data lying on a low-dimensional manifold embedded in a higher-dimensional space.
- Computationally more expensive than PCA due to graph construction and shortest-path calculations.
- PCA:
- Linear technique that works well for data with linear relationships.
- Faster and less computationally expensive than Isomap but may fail to capture complex structures.
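The three stages mentioned above (graph construction, shortest paths, then embedding) can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation: `isomap_sketch` is a hypothetical helper name, and it assumes the neighborhood graph is connected (disconnected components would yield infinite geodesic distances).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, n_neighbors=10, n_components=2):
    # 1. k-nearest-neighbor graph, edges weighted by Euclidean distance
    graph = kneighbors_graph(X, n_neighbors, mode='distance')
    # 2. Geodesic distances = shortest paths through the graph
    D = shortest_path(graph, directed=False)
    # 3. Classical MDS on the geodesic distance matrix
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * H @ (D ** 2) @ H          # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]  # top eigenpairs
    return vecs[:, idx] * np.sqrt(vals[idx])

# Try it on a synthetic non-linear manifold
X, _ = make_swiss_roll(n_samples=500, random_state=0)
embedding = isomap_sketch(X)
print(embedding.shape)  # (500, 2)
```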
This concludes the step-by-step implementation of Isomap for dimensionality reduction, along with a comparison to PCA.