Blogs

Unsupervised Learning January 01 ,2025

Step-by-Step Python Implementation of Spectral Clustering

Let's implement Spectral Clustering using Python with detailed steps, example data, and outputs.

Step 1: Install Required Libraries

We will use scikit-learn, numpy, and matplotlib for the Spectral Clustering algorithm, creating and visualizing data.

You can install these libraries using pip if you don't have them already:

pip install numpy matplotlib scikit-learn

Step 2: Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

Step 3: Generate Example Data

We'll create a toy dataset using make_moons from sklearn.datasets. This dataset is a simple two-class dataset, which is ideal for clustering.

# Create synthetic data - two interlocking crescent shapes (moons)
X, y = make_moons(n_samples=300, noise=0.1, random_state=42)

# Standardize the features for better results
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Plot the dataset to visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='viridis')
plt.title("Original Moon Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Step 4: Apply Spectral Clustering

Now, we’ll apply the Spectral Clustering algorithm from sklearn.cluster. The number of clusters kk will be set to 2, as our dataset consists of two clusters.

# Initialize SpectralClustering with 2 clusters
spectral = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', random_state=42)

# Fit the model and predict cluster labels
labels = spectral.fit_predict(X_scaled)

# Plot the clustering results
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels, cmap='viridis')
plt.title("Spectral Clustering Results")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Step 5: Understanding Parameters of Spectral Clustering

n_clusters: Specifies the number of clusters.
affinity: Defines how to compute the similarity matrix. The options are:
- nearest_neighbors: Uses a K-nearest neighbors graph to calculate similarities.
- rbf: Uses a Gaussian (RBF) kernel.
random_state: Controls the randomness of the clustering.

In this case, we use nearest_neighbors to compute the affinity matrix, which helps when dealing with non-convex clusters, like our moon-shaped dataset.

Step 6: Output and Interpretation

Visualizing Original Data and Clustering Output:

Original Data:
- The dataset consists of two crescent-shaped clusters (interlocking moons).
- We visualize the data with points colored according to their ground truth class labels (y).
Clustering Results:
- The points are colored according to the predicted cluster labels obtained from Spectral Clustering.
- The algorithm is able to correctly separate the two moon-shaped clusters.

Step 7: Performance Evaluation (Optional)

If you have the true labels (which we do, since we generated the synthetic dataset), you can evaluate the clustering performance using metrics such as accuracy, adjusted_rand_score, or silhouette_score.

from sklearn.metrics import accuracy_score, adjusted_rand_score

# Evaluating clustering performance using accuracy and ARI
accuracy = accuracy_score(y, labels)
ari = adjusted_rand_score(y, labels)

print(f"Clustering Accuracy: {accuracy:.2f}")
print(f"Adjusted Rand Index: {ari:.2f}")

Step 8: Modify Parameters and Experiment

You can experiment with different parameters of the SpectralClustering class to observe how the clustering results change:

Change affinity to rbf (Gaussian Kernel):

spectral_rbf = SpectralClustering(n_clusters=2, affinity='rbf', random_state=42)
labels_rbf = spectral_rbf.fit_predict(X_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels_rbf, cmap='viridis')
plt.title("Spectral Clustering with RBF Kernel")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Change the Number of Neighbors (for nearest_neighbors affinity):

spectral_k = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', n_neighbors=10, random_state=42)
labels_k = spectral_k.fit_predict(X_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels_k, cmap='viridis')
plt.title("Spectral Clustering with 10 Neighbors")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Step 9: Conclusion

This is a simple yet powerful example of applying Spectral Clustering to a dataset. Spectral Clustering is particularly useful for datasets that have non-convex, complex shapes. By using eigenvalues and eigenvectors, it transforms the problem of clustering into a more tractable problem in a lower-dimensional space.

Key Takeaways:

Spectral Clustering uses graph theory to cluster data points based on their similarities.
The affinity parameter determines how the similarity matrix is constructed. This allows flexibility in how the data is represented.
Spectral Clustering can handle clusters with arbitrary shapes, making it useful in many real-world scenarios where traditional algorithms like K-means fail.

Next Blog- Principal component analysis (PCA)

Purnima

You must logged in to post comments.