Unsupervised Learning January 01, 2025

Python implementation of Non-Negative Matrix Factorization (NMF)

1. Import Necessary Libraries

We’ll need NumPy and pandas for matrix operations, scikit-learn for the NMF implementation, and Matplotlib and seaborn for the visualizations.

import numpy as np
import pandas as pd
from sklearn.decomposition import NMF
import matplotlib.pyplot as plt
import seaborn as sns

2. Create a Sample Non-Negative Matrix

We define a small 5 × 4 matrix V with non-negative values. V could represent ratings, frequencies, or any other non-negative quantities.

# Sample non-negative matrix
V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4]
])

# Convert to a DataFrame for better readability
V_df = pd.DataFrame(V, columns=['Item1', 'Item2', 'Item3', 'Item4'])

print("Original Matrix (V):")
print(V_df)

Output:

Original Matrix (V):
   Item1  Item2  Item3  Item4
0      5      3      0      1
1      4      0      0      1
2      1      1      0      5
3      1      0      4      4
4      0      1      5      4
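NMF is only defined for matrices whose entries are all zero or positive, so a quick check before fitting can save confusion later. The sketch below reuses the sample matrix V and flips one entry negative in a hypothetical copy, V_bad, to show that scikit-learn's NMF rejects such input with a ValueError instead of fitting it.

```python
import numpy as np
from sklearn.decomposition import NMF

# Same sample matrix as above
V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
])

# NMF requires every entry to be >= 0
print("All entries non-negative:", bool((V >= 0).all()))

# Hypothetical copy with one negative entry
V_bad = V.astype(float)
V_bad[0, 0] = -1.0

rejected = False
try:
    NMF(n_components=2, init="random", random_state=42, max_iter=500).fit(V_bad)
except ValueError as err:
    # scikit-learn validates non-negativity before running the solver
    rejected = True
    print("Rejected:", err)
```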

3. Initialize and Fit the NMF Model

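We use sklearn.decomposition.NMF to factorize V into two non-negative matrices, W and H, specifying the number of components k (the number of latent features). With k = 2, the 5 × 4 matrix V is approximated by a 5 × 2 matrix W times a 2 × 4 matrix H.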

# Define the number of components (latent features)
n_components = 2

# Initialize the NMF model
nmf_model = NMF(n_components=n_components, init='random', random_state=42, max_iter=500)

# Fit the model to the matrix
W = nmf_model.fit_transform(V)  # Basis matrix
H = nmf_model.components_       # Coefficients matrix

print("\nBasis Matrix (W):")
print(W)

print("\nCoefficients Matrix (H):")
print(H)

Output (exact values depend on the scikit-learn version, so yours may differ):

Basis Matrix (W):
[[1.36038191 2.69532853]
 [1.08605301 2.16387618]
 [2.64037086 0.09167619]
 [2.70262913 0.0194824 ]
 [3.03905155 0.01438578]]

Coefficients Matrix (H):
[[0.19136891 0.06765087 3.71630987 3.71736278]
 [1.79436483 1.03052247 0.04540973 0.51186499]]
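The NMF call hides how W and H are actually found. As an illustration only, here is a from-scratch sketch of the classic Lee and Seung multiplicative update rules for the Frobenius loss, H ← H ⊙ (WᵀV) ⊘ (WᵀWH) and W ← W ⊙ (VHᵀ) ⊘ (WHHᵀ), where ⊙ and ⊘ are element-wise multiplication and division. This is not what the code above runs: scikit-learn's default solver is coordinate descent (solver='cd'), while multiplicative updates correspond to solver='mu'. The function name nmf_multiplicative and its eps and seed parameters are hypothetical choices for this sketch, so its numbers will not match the outputs above.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=500, eps=1e-10, seed=42):
    """Hypothetical helper: factorize V ~ W @ H with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    m, n = V.shape
    # Random non-negative starting point
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Each step multiplies by a non-negative ratio, so entries never turn negative;
        # eps only guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
])

W, H = nmf_multiplicative(V, k=2)
print("W shape:", W.shape)  # (5, 2)
print("H shape:", H.shape)  # (2, 4)
error = np.linalg.norm(V - W @ H)
print(f"Frobenius error: {error:.4f}")
```

Because every update rescales the current entries by a non-negative factor, the non-negativity constraint is maintained without any explicit projection step.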

4. Interpret the Results

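Matrix W: Represents the importance of each latent feature for each row of V (one row of W per row of V, one column per latent feature).
Matrix H: Represents the relationship of each latent feature to each column of V (one row of H per latent feature, one column per item).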
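One way to read the factors is to label them and ask which latent feature dominates each row of V and which item each feature loads on most strongly. The sketch below does this with pandas; the names Feature1 and Feature2 are hypothetical labels, and which feature ends up first depends on the random initialization. It also checks a caveat worth keeping in mind: the factorization is only unique up to scaling (and reordering) of the latent features, so the absolute size of a single entry in W or H means little on its own.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
])
items = ["Item1", "Item2", "Item3", "Item4"]

model = NMF(n_components=2, init="random", random_state=42, max_iter=500)
W = model.fit_transform(V)
H = model.components_

# Hypothetical labels for the latent features
features = [f"Feature{i + 1}" for i in range(H.shape[0])]
W_df = pd.DataFrame(W, columns=features)               # rows of V x latent features
H_df = pd.DataFrame(H, index=features, columns=items)  # latent features x columns of V

print("Dominant feature per row of V:")
print(W_df.idxmax(axis=1))

print("Strongest item per latent feature:")
print(H_df.idxmax(axis=1))

# Scaling feature j by s_j in W and by 1/s_j in H leaves W @ H unchanged
scale = np.array([2.0, 0.5])    # arbitrary positive factors, one per feature
W_scaled = W * scale            # multiplies column j of W by scale[j]
H_scaled = H / scale[:, None]   # divides row j of H by scale[j]
print("Same product after rescaling:", np.allclose(W @ H, W_scaled @ H_scaled))
```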

5. Reconstruct the Original Matrix

Reconstruct V from the two factors as V ≈ W⋅H.

# Reconstruct the matrix
V_reconstructed = np.dot(W, H)

print("\nReconstructed Matrix (W·H):")
print(pd.DataFrame(V_reconstructed, columns=['Item1', 'Item2', 'Item3', 'Item4']))

Output:

Reconstructed Matrix (W·H):
      Item1     Item2     Item3     Item4
0  5.001077  2.999775  0.995917  1.013879
1  4.000394  2.139643  0.795460  0.893150
2  1.000563  1.000324  0.012579  5.006246
3  1.002611  0.454684  4.004014  4.008180
4  0.003429  1.000482  5.000868  4.000512
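np.dot(W, H) is one of several equivalent ways to form the approximation. The sketch below refits the same model on V and compares it with the @ operator and with the fitted model's inverse_transform method, which multiplies its input by components_. It then projects a hypothetical new row, new_row, onto the learned features with transform (keeping H fixed) and reconstructs it.

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
])

model = NMF(n_components=2, init="random", random_state=42, max_iter=500)
W = model.fit_transform(V)
H = model.components_

# Equivalent ways to form the approximation V ~ W·H
V_hat_dot = np.dot(W, H)
V_hat_matmul = W @ H
V_hat_inverse = model.inverse_transform(W)  # computes W @ model.components_

print("dot == @:", np.allclose(V_hat_dot, V_hat_matmul))
print("dot == inverse_transform:", np.allclose(V_hat_dot, V_hat_inverse))

# Products of non-negative factors are themselves non-negative
print("Reconstruction non-negative:", bool((V_hat_matmul >= 0).all()))

# Hypothetical new row: encode it with the fixed H, then reconstruct it
new_row = np.array([[4, 2, 0, 1]])
w_new = model.transform(new_row)  # shape (1, 2)
print("Reconstructed new row:", model.inverse_transform(w_new).round(2))
```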

6. Measure Reconstruction Error

The reconstruction error is the Frobenius norm of the difference between the original matrix V and the reconstructed matrix W⋅H; lower values indicate a closer fit.

# Compute the reconstruction error
reconstruction_error = np.linalg.norm(V - V_reconstructed)

print(f"\nReconstruction Error: {reconstruction_error:.4f}")

Output:

Reconstruction Error: 0.6743
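Written out, the Frobenius norm is ‖V − WH‖_F = sqrt(Σᵢⱼ (Vᵢⱼ − (WH)ᵢⱼ)²). The sketch below computes it three ways (by hand, with np.linalg.norm, and from the fitted model's reconstruction_err_ attribute, which scikit-learn stores after fitting with the default Frobenius loss) and then reports a relative error, dividing by ‖V‖_F, which is easier to compare across datasets. The final loop is an illustrative way to see how the error changes with the number of components k; the range of k tried is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
])

model = NMF(n_components=2, init="random", random_state=42, max_iter=500)
W = model.fit_transform(V)
H = model.components_

residual = V - W @ H
manual_error = np.sqrt(np.sum(residual ** 2))  # definition of the Frobenius norm
numpy_error = np.linalg.norm(residual)         # 2-D input defaults to the Frobenius norm
sklearn_error = model.reconstruction_err_      # stored by scikit-learn after fitting
print(f"manual={manual_error:.4f} numpy={numpy_error:.4f} sklearn={sklearn_error:.4f}")

# Relative error: fraction of the matrix's own norm left unexplained
relative_error = numpy_error / np.linalg.norm(V)
print(f"Relative error: {relative_error:.2%}")

# Illustrative sweep over the number of components k
for k in range(1, 5):
    err_k = NMF(n_components=k, init="random", random_state=42, max_iter=500).fit(V).reconstruction_err_
    print(f"k={k}: error={err_k:.4f}")
```

The sweep is not guaranteed to decrease monotonically, since each fit starts from a random initialization and may settle in a different local minimum.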

7. Visualize Results

a) Heatmaps of the Original, Reconstructed, and Error Matrices

We can plot the original matrix, its reconstruction, and the element-wise difference between them side by side.

# Plot heatmaps
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
sns.heatmap(V, annot=True, cmap="Blues", cbar=False)
plt.title("Original Matrix (V)")

plt.subplot(1, 3, 2)
sns.heatmap(V_reconstructed, annot=True, cmap="Blues", cbar=False)
plt.title("Reconstructed Matrix (W·H)")

plt.subplot(1, 3, 3)
# center=0 maps zero error to the neutral middle colour of the diverging colormap
sns.heatmap(V - V_reconstructed, annot=True, cmap="coolwarm", center=0, cbar=False)
plt.title("Difference (Error)")

plt.tight_layout()
plt.show()

b) Basis and Coefficient Matrices as Heatmaps

# Heatmap of W and H
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
sns.heatmap(W, annot=True, cmap="Greens", cbar=False)
plt.title("Basis Matrix (W)")

plt.subplot(1, 2, 2)
sns.heatmap(H, annot=True, cmap="Oranges", cbar=False)
plt.title("Coefficients Matrix (H)")

plt.tight_layout()
plt.show()

Key Takeaways

Original Matrix V: A sample matrix with non-negative values.
NMF Decomposition: V was decomposed into W (basis matrix) and H (coefficients matrix), both non-negative.
Reconstruction: V was approximated by the product W⋅H, and the Frobenius norm of the difference measured the reconstruction error.
Visualization: Heatmaps of the original, reconstructed, and difference (error) matrices showed where NMF fits V closely and where it does not.


Purnima
