Step-by-Step Python Implementation of Isomap
We'll use the well-known Digits dataset from sklearn for this demonstration. It consists of 8x8-pixel images of handwritten digits (0-9), which makes it a good candidate for applying Isomap for dimensionality reduction.
Step 1: Import Libraries
We will start by importing necessary libraries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA
- numpy: Used for numerical operations.
- matplotlib.pyplot: Used to visualize the results.
- sklearn.datasets: To load datasets.
- sklearn.manifold.Isomap: This is the core module for applying the Isomap algorithm.
- sklearn.decomposition.PCA: We will also use PCA to visualize the dimensionality reduction comparison.
Step 2: Load Dataset
We'll load the Digits dataset from sklearn. This dataset consists of 1797 samples of 8x8 images of handwritten digits (0-9).
# Load Digits dataset
digits = datasets.load_digits()
X = digits.data # Features
y = digits.target # Labels
# Display the shape of data
print(f"Data shape: {X.shape}")
Output:
Data shape: (1797, 64)
Here, the dataset has 1797 samples, and each sample has 64 features (each 8x8 pixel image flattened into a 64-dimensional vector).
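To make the flattening concrete, `load_digits` also exposes the same pixels in their original 8x8 layout via the `images` attribute, and the two views contain identical values (a quick sanity check, using only attributes documented in scikit-learn):

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
# digits.data holds flattened vectors; digits.images holds the same
# pixels in their original 8x8 layout
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
# Confirm the two views contain identical pixels
print(np.array_equal(digits.data, digits.images.reshape(1797, 64)))  # True
```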
Step 3: Apply Isomap for Dimensionality Reduction
Now, we'll apply the Isomap algorithm to reduce the dimensionality of the data from 64 to 2 (for easy visualization).
# Apply Isomap with n_neighbors=10 and n_components=2 (for 2D visualization)
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
# Display the shape of the transformed data
print(f"Transformed data shape: {X_isomap.shape}")
Output:
Transformed data shape: (1797, 2)
- The dimensionality has been reduced from 64 to 2, and now the data is in 2D space.
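Beyond the embedding itself, the fitted estimator exposes some of Isomap's internals: `dist_matrix_` holds the pairwise geodesic (graph shortest-path) distances, and `reconstruction_error()` gives a rough quality measure of the embedding (both are part of scikit-learn's Isomap API). A sketch of inspecting them, refitting with the same parameters as above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, _ = load_digits(return_X_y=True)
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)              # (1797, 2)
# Pairwise geodesic distances computed on the neighborhood graph
print(iso.dist_matrix_.shape)  # (1797, 1797)
# Lower values indicate the 2D embedding preserves distances better
print(iso.reconstruction_error())
```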
Step 4: Visualize the Isomap Results
Let's visualize the 2D representation of the data after applying Isomap.
# Plot the 2D Isomap output
plt.figure(figsize=(8, 6))
plt.scatter(X_isomap[:, 0], X_isomap[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('Isomap Dimensionality Reduction (2D)')
plt.xlabel('First Component')
plt.ylabel('Second Component')
plt.show()
In the plot, each point represents a digit, and the color indicates the digit label (0-9).
Explanation:
- X_isomap[:, 0] and X_isomap[:, 1] are the first and second dimensions (after reduction).
- c=y colors the points based on their digit labels.
- The 'tab10' colormap provides 10 distinct colors, one for each digit class.
Step 5: Compare with PCA
We'll also apply PCA (Principal Component Analysis) to compare how Isomap and PCA differ in dimensionality reduction.
# Apply PCA for comparison
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot the 2D PCA output
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('PCA Dimensionality Reduction (2D)')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
Explanation:
- PCA projects the data onto the directions of maximum variance. It is a linear method, so it may miss non-linear structure in the data that Isomap can capture.
- We use the same color mapping for consistency in comparison.
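One advantage of PCA is that it quantifies how much of the data's variance the projection retains, via the `explained_variance_ratio_` attribute. A quick check for our two components:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=2).fit(X)
# Fraction of total variance retained by each of the two components
print(pca.explained_variance_ratio_)
print(f"Total retained: {pca.explained_variance_ratio_.sum():.1%}")
```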
Step 6: Results and Analysis
- Isomap vs PCA: After visualizing both Isomap and PCA, we can see that Isomap tends to better preserve the non-linear structure of the data. PCA, being linear, might struggle with non-linear manifolds, which are common in high-dimensional data such as image data.
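Beyond eyeballing the two scatter plots, one rough way to compare the embeddings is to check how well a simple classifier separates the digit classes in each 2D space. Note the caveat: `fit_transform` sees all samples, so this gauges class separation in the embedding rather than generalization to new data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)

# 5-fold cross-validated kNN accuracy in each 2D embedding
knn = KNeighborsClassifier(n_neighbors=5)
acc_iso = cross_val_score(knn, X_iso, y, cv=5).mean()
acc_pca = cross_val_score(knn, X_pca, y, cv=5).mean()
print(f"kNN accuracy on Isomap 2D: {acc_iso:.3f}")
print(f"kNN accuracy on PCA 2D:    {acc_pca:.3f}")
```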
Complete Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA
# Load Digits dataset
digits = datasets.load_digits()
X = digits.data # Features
y = digits.target # Labels
# Apply Isomap with n_neighbors=10 and n_components=2 (for 2D visualization)
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
# Plot the 2D Isomap output
plt.figure(figsize=(8, 6))
plt.scatter(X_isomap[:, 0], X_isomap[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('Isomap Dimensionality Reduction (2D)')
plt.xlabel('First Component')
plt.ylabel('Second Component')
plt.show()
# Apply PCA for comparison
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot the 2D PCA output
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10')
plt.colorbar()
plt.title('PCA Dimensionality Reduction (2D)')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
Final Notes
- Isomap:
- Captures non-linear relationships in the data.
- Works well for data lying on a low-dimensional manifold embedded in a higher-dimensional space.
- Computationally more expensive than PCA due to graph construction and shortest-path calculations.
- PCA:
- Linear technique that works well for data with linear relationships.
- Faster and less computationally expensive than Isomap but may fail to capture complex structures.
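The three stages mentioned above (graph construction, shortest paths, then embedding) can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation: `isomap_sketch` is a hypothetical helper name, and it assumes the neighborhood graph is connected (disconnected components would yield infinite geodesic distances).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, n_neighbors=10, n_components=2):
    # 1. k-nearest-neighbor graph, edges weighted by Euclidean distance
    graph = kneighbors_graph(X, n_neighbors, mode='distance')
    # 2. Geodesic distances = shortest paths through the graph
    D = shortest_path(graph, directed=False)
    # 3. Classical MDS on the geodesic distance matrix
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * H @ (D ** 2) @ H          # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]  # top eigenpairs
    return vecs[:, idx] * np.sqrt(vals[idx])

# Try it on a synthetic non-linear manifold
X, _ = make_swiss_roll(n_samples=500, random_state=0)
embedding = isomap_sketch(X)
print(embedding.shape)  # (500, 2)
```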
This concludes the step-by-step implementation of Isomap for dimensionality reduction, along with a comparison to PCA.