Unsupervised Learning January 01 ,2025

Step-by-Step Python Implementation of Spectral Clustering

Let's implement Spectral Clustering using Python with detailed steps, example data, and outputs.

Step 1: Install Required Libraries

We will use scikit-learn, numpy, and matplotlib for the Spectral Clustering algorithm, creating and visualizing data.

You can install these libraries using pip if you don't have them already:

pip install numpy matplotlib scikit-learn

Step 2: Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

Step 3: Generate Example Data

We'll create a toy dataset using make_moons from sklearn.datasets. This dataset is a simple two-class dataset, which is ideal for clustering.

# Create synthetic data - two interlocking crescent shapes (moons)
X, y = make_moons(n_samples=300, noise=0.1, random_state=42)

# Standardize the features for better results
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Plot the dataset to visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='viridis')
plt.title("Original Moon Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Step 4: Apply Spectral Clustering

Now, we’ll apply the Spectral Clustering algorithm from sklearn.cluster. The number of clusters kk will be set to 2, as our dataset consists of two clusters.

# Initialize SpectralClustering with 2 clusters
spectral = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', random_state=42)

# Fit the model and predict cluster labels
labels = spectral.fit_predict(X_scaled)

# Plot the clustering results
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels, cmap='viridis')
plt.title("Spectral Clustering Results")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Step 5: Understanding Parameters of Spectral Clustering

  • n_clusters: Specifies the number of clusters.
  • affinity: Defines how to compute the similarity matrix. The options are:
    • nearest_neighbors: Uses a K-nearest neighbors graph to calculate similarities.
    • rbf: Uses a Gaussian (RBF) kernel.
  • random_state: Controls the randomness of the clustering.

In this case, we use nearest_neighbors to compute the affinity matrix, which helps when dealing with non-convex clusters, like our moon-shaped dataset.

Step 6: Output and Interpretation

Visualizing Original Data and Clustering Output:

  1. Original Data:
    • The dataset consists of two crescent-shaped clusters (interlocking moons).
    • We visualize the data with points colored according to their ground truth class labels (y).
  2. Clustering Results:
    • The points are colored according to the predicted cluster labels obtained from Spectral Clustering.
    • The algorithm is able to correctly separate the two moon-shaped clusters.

Step 7: Performance Evaluation (Optional)

If you have the true labels (which we do, since we generated the synthetic dataset), you can evaluate the clustering performance using metrics such as accuracy, adjusted_rand_score, or silhouette_score.

from sklearn.metrics import accuracy_score, adjusted_rand_score

# Evaluating clustering performance using accuracy and ARI
accuracy = accuracy_score(y, labels)
ari = adjusted_rand_score(y, labels)

print(f"Clustering Accuracy: {accuracy:.2f}")
print(f"Adjusted Rand Index: {ari:.2f}")

Step 8: Modify Parameters and Experiment

You can experiment with different parameters of the SpectralClustering class to observe how the clustering results change:

  1. Change affinity to rbf (Gaussian Kernel):

    spectral_rbf = SpectralClustering(n_clusters=2, affinity='rbf', random_state=42)
    labels_rbf = spectral_rbf.fit_predict(X_scaled)
    
    plt.figure(figsize=(8, 6))
    plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels_rbf, cmap='viridis')
    plt.title("Spectral Clustering with RBF Kernel")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()
    
  2. Change the Number of Neighbors (for nearest_neighbors affinity):

    spectral_k = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', n_neighbors=10, random_state=42)
    labels_k = spectral_k.fit_predict(X_scaled)
    
    plt.figure(figsize=(8, 6))
    plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels_k, cmap='viridis')
    plt.title("Spectral Clustering with 10 Neighbors")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()
    

Step 9: Conclusion

This is a simple yet powerful example of applying Spectral Clustering to a dataset. Spectral Clustering is particularly useful for datasets that have non-convex, complex shapes. By using eigenvalues and eigenvectors, it transforms the problem of clustering into a more tractable problem in a lower-dimensional space.

Key Takeaways:

  • Spectral Clustering uses graph theory to cluster data points based on their similarities.
  • The affinity parameter determines how the similarity matrix is constructed. This allows flexibility in how the data is represented.
  • Spectral Clustering can handle clusters with arbitrary shapes, making it useful in many real-world scenarios where traditional algorithms like K-means fail.

Next Blog- Principal component analysis (PCA)

Purnima
0

You must logged in to post comments.

Related Blogs

Get In Touch

123 Street, New York, USA

+012 345 67890

techiefreak87@gmail.com

© Design & Developed by HW Infotech