Unsupervised Learning January 01, 2025

Python Implementation of Non-Negative Matrix Factorization (NMF)

1. Import Necessary Libraries

We’ll need numpy and pandas for matrix handling, scikit-learn for the NMF implementation, and matplotlib and seaborn for visualization.

import numpy as np
import pandas as pd
from sklearn.decomposition import NMF
import matplotlib.pyplot as plt
import seaborn as sns

2. Create a Sample Non-Negative Matrix

We define a small matrix V with non-negative values. V could represent ratings, frequencies, or other non-negative quantities.

# Sample non-negative matrix
V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4]
])

# Convert to a DataFrame for better readability
V_df = pd.DataFrame(V, columns=['Item1', 'Item2', 'Item3', 'Item4'])

print("Original Matrix (V):")
print(V_df)

Output:

Original Matrix (V):
   Item1  Item2  Item3  Item4
0      5      3      0      1
1      4      0      0      1
2      1      1      0      5
3      1      0      4      4
4      0      1      5      4
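
NMF requires every entry of the input matrix to be non-negative. A quick sanity check before fitting, using the V defined above:

# NMF cannot handle negative entries; verify the input before fitting.
assert (V >= 0).all(), "V contains negative values; NMF cannot be applied."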

3. Initialize and Fit the NMF Model

We use sklearn.decomposition.NMF to factorize V into W and H, specifying the number of components k.

# Define the number of components (latent features)
n_components = 2

# Initialize the NMF model
nmf_model = NMF(n_components=n_components, init='random', random_state=42, max_iter=500)

# Fit the model to the matrix
W = nmf_model.fit_transform(V)  # Basis matrix
H = nmf_model.components_       # Coefficients matrix

print("\nBasis Matrix (W):")
print(W)

print("\nCoefficients Matrix (H):")
print(H)

Output:

Basis Matrix (W):
[[1.36038191 2.69532853]
 [1.08605301 2.16387618]
 [2.64037086 0.09167619]
 [2.70262913 0.0194824 ]
 [3.03905155 0.01438578]]

Coefficients Matrix (H):
[[0.19136891 0.06765087 3.71630987 3.71736278]
 [1.79436483 1.03052247 0.04540973 0.51186499]]
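
The choice of n_components is problem-dependent. One simple heuristic, sketched below under the assumption that the V and imports from the previous steps are available, is to fit the model for several values of k and compare scikit-learn's built-in reconstruction_err_ attribute:

# Fit NMF for several candidate k values and print the Frobenius error
# reported by scikit-learn after fitting.
for k in range(1, 5):
    candidate = NMF(n_components=k, init='random', random_state=42, max_iter=500)
    candidate.fit(V)
    print(f"k={k}: reconstruction error = {candidate.reconstruction_err_:.4f}")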

4. Interpret the Results

  • Matrix W (basis matrix): each row shows how strongly every latent feature is expressed in the corresponding row of V.
  • Matrix H (coefficients matrix): each column shows how the latent features combine to form the corresponding column of V. A labeled view of both matrices is sketched below.
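
To make W and H easier to read, we can wrap them in labeled DataFrames. A minimal sketch, where the latent-feature names are purely illustrative:

# Label the factors: rows of W correspond to rows of V, columns of H to items.
feature_names = [f"Feature{i+1}" for i in range(n_components)]

W_df = pd.DataFrame(W, columns=feature_names)
H_df = pd.DataFrame(H, index=feature_names, columns=V_df.columns)

print("W (rows of V x latent features):")
print(W_df.round(2))

print("\nH (latent features x items):")
print(H_df.round(2))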

5. Reconstruct the Original Matrix

Recreate V as V ≈ W⋅H.

# Reconstruct the matrix
V_reconstructed = np.dot(W, H)

print("\nReconstructed Matrix (V):")
print(pd.DataFrame(V_reconstructed, columns=['Item1', 'Item2', 'Item3', 'Item4']))

Output:

Reconstructed Matrix (V):
      Item1     Item2     Item3     Item4
0  5.001077  2.999775  0.995917  1.013879
1  4.000394  2.139643  0.795460  0.893150
2  1.000563  1.000324  0.012579  5.006246
3  1.002611  0.454684  4.004014  4.008180
4  0.003429  1.000482  5.000868  4.000512
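
As an alternative to np.dot, the fitted estimator also exposes inverse_transform, which maps W back to the original space by multiplying it with H. A minimal sketch, assuming the nmf_model, W, and V_reconstructed defined above:

# inverse_transform(W) computes the same product as np.dot(W, H).
V_alt = nmf_model.inverse_transform(W)
print(np.allclose(V_alt, V_reconstructed))  # expected to print True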

6. Measure Reconstruction Error

The error is the Frobenius norm of the difference between the original matrix V and the reconstructed matrix W⋅H.

# Compute the reconstruction error
reconstruction_error = np.linalg.norm(V - V_reconstructed)

print(f"\nReconstruction Error: {reconstruction_error:.4f}")

Output:

Reconstruction Error: 0.6743
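
With the default Frobenius loss, the fitted model already stores this quantity in its reconstruction_err_ attribute, so the manual computation above mainly serves as a sanity check:

# Compare the manually computed norm with the value stored on the model.
print(f"Manual Frobenius norm:     {np.linalg.norm(V - V_reconstructed):.4f}")
print(f"Model reconstruction_err_: {nmf_model.reconstruction_err_:.4f}")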

7. Visualize Results

a) Heatmaps of the Original, Reconstructed, and Error Matrices

We can plot the original, reconstructed, and error matrices for better visualization.

# Plot heatmaps
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
sns.heatmap(V, annot=True, cmap="Blues", cbar=False)
plt.title("Original Matrix (V)")

plt.subplot(1, 3, 2)
sns.heatmap(V_reconstructed, annot=True, cmap="Blues", cbar=False)
plt.title("Reconstructed Matrix (V)")

plt.subplot(1, 3, 3)
sns.heatmap(V - V_reconstructed, annot=True, cmap="coolwarm", cbar=False)
plt.title("Difference (Error)")

plt.tight_layout()
plt.show()

b) Basis and Coefficients Matrices as Heatmaps

# Heatmap of W and H
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
sns.heatmap(W, annot=True, cmap="Greens", cbar=False)
plt.title("Basis Matrix (W)")

plt.subplot(1, 2, 2)
sns.heatmap(H, annot=True, cmap="Oranges", cbar=False)
plt.title("Coefficients Matrix (H)")

plt.tight_layout()
plt.show()
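
If V is read as a ratings-style matrix, as suggested in step 2, the reconstruction can also be used to rank items per row, including entries that were 0 in the original data. This is a minimal, illustrative sketch using V_reconstructed and V_df from the previous steps:

# Illustrative only: rank items for each row by their reconstructed score.
V_rec_df = pd.DataFrame(V_reconstructed, columns=V_df.columns)
for i, row in V_rec_df.iterrows():
    ranked = list(row.sort_values(ascending=False).index)
    print(f"Row {i}: items ranked by reconstructed score -> {ranked}")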

Key Takeaways

  1. Original Matrix V: A sample matrix with non-negative values.
  2. NMF Decomposition: Decomposed V into W (basis matrix) and H (coefficients matrix).
  3. Reconstruction: V was reconstructed using W⋅H, with a small reconstruction error.
  4. Visualization: Heatmaps of the original, reconstructed, and difference (error) matrices, together with W and H, provided insight into the performance of NMF.

Next Blog: Locally Linear Embedding (LLE)

Purnima