Implementing a Recommendation System
A recommendation system suggests items (products, movies, songs, and so on) to users based on factors such as past behavior, stated preferences, or the behavior of similar users. Companies like Netflix, Amazon, and Spotify use recommendation systems to personalize user experiences.
For this project, we will build a basic collaborative filtering recommendation system using matrix factorization. This method uses user-item interaction data (e.g., movie ratings or product purchases) to predict which items a user may like.
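To make the idea concrete, here is a minimal sketch of what a matrix factorization model predicts once it has its factors. The factor values are made up for illustration (they are not learned from MovieLens), and the names `user_factors`, `item_factors`, and `predict_rating` are hypothetical.

```python
import numpy as np

# Hypothetical latent factors: 2 users and 3 items, each described by 2 numbers.
# In a real model these values are learned from the observed ratings.
user_factors = np.array([
    [1.0, 0.5],   # user 0
    [0.2, 1.5],   # user 1
])
item_factors = np.array([
    [2.0, 1.0],   # item 0
    [0.5, 2.0],   # item 1
    [1.0, 1.0],   # item 2
])

def predict_rating(user, item):
    """Predicted rating = dot product of the user's and the item's factor vectors."""
    return float(user_factors[user] @ item_factors[item])

# The full matrix of predictions, including user-item pairs that were never observed.
predicted_matrix = user_factors @ item_factors.T

print(predict_rating(0, 0))
print(predicted_matrix)
```

Because every user and every item has a factor vector, the model can score any user-item pair, which is what makes recommendations for unseen items possible.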
Step-by-Step Guide to Building a Recommendation System
Step 1: Install Required Libraries
We will use Pandas for data manipulation and Surprise for building the recommendation system.
To install the required libraries, run:
pip install pandas scikit-surprise
Step 2: Load and Explore the Dataset
We will use the MovieLens 100K dataset, which contains user ratings for movies. Each row holds a user ID, a movie ID, a rating, and a timestamp.
import pandas as pd
# Load the MovieLens dataset
url = 'https://raw.githubusercontent.com/grouplens/datasets/master/movielens-100k-dataset/u.data'
columns = ['user_id', 'movie_id', 'rating', 'timestamp']
data = pd.read_csv(url, sep='\t', names=columns)
# Show the first few rows of the dataset
print(data.head())
Output:
- The dataset is displayed, containing columns for user IDs, movie IDs, ratings, and timestamps:
   user_id  movie_id  rating  timestamp
0        1        32       2  881250949
1        1        42       4  881250949
2        1       132       4  881250949
3        1        99       3  881250949
4        1       186       3  881250949
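Beyond `head()`, a few summary numbers help judge how sparse the interaction data is. The sketch below uses a tiny hand-made DataFrame (not real MovieLens rows) with the same column names, so it runs without a download; the same lines should work on the loaded `data` DataFrame at this point in the guide.

```python
import pandas as pd

# Tiny made-up sample with the same columns as the tutorial's DataFrame.
sample = pd.DataFrame({
    'user_id':  [1, 1, 2, 2, 3],
    'movie_id': [10, 20, 10, 30, 20],
    'rating':   [4, 5, 3, 2, 4],
})

n_users = sample['user_id'].nunique()
n_movies = sample['movie_id'].nunique()
n_ratings = len(sample)

# Share of the user x movie grid that actually holds a rating.
density = n_ratings / (n_users * n_movies)

# How often each rating value occurs, ordered by rating value.
rating_counts = sample['rating'].value_counts().sort_index()

print(f'{n_users} users, {n_movies} movies, {n_ratings} ratings')
print(f'Density: {density:.2%}')
print(rating_counts)
```

A low density is normal for rating data: most users rate only a small fraction of the catalog, and the model's job is to fill in the rest.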
Step 3: Prepare the Data
The dataset needs to be in a format suitable for the recommendation algorithm. Surprise provides a convenient interface to work with the data.
from surprise import Dataset
from surprise import Reader
# Define the format for the dataset
reader = Reader(rating_scale=(1, 5))
# Load the data into Surprise's dataset format
# (note: this rebinds `data` from the pandas DataFrame to a Surprise Dataset)
data = Dataset.load_from_df(data[['user_id', 'movie_id', 'rating']], reader)
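`Dataset.load_from_df` expects the columns in the order user, item, rating, and the `rating_scale` should cover every rating value. A quick sanity check before loading might look like the sketch below; the helper name `check_ratings_frame` and the sample rows are hypothetical.

```python
import pandas as pd

def check_ratings_frame(df, rating_scale=(1, 5)):
    """Return a list of problems found in a user/item/rating DataFrame."""
    problems = []
    expected = ['user_id', 'movie_id', 'rating']
    if list(df.columns[:3]) != expected:
        problems.append(f'expected first columns {expected}, got {list(df.columns[:3])}')
        return problems
    if df[expected].isna().any().any():
        problems.append('missing values present')
    low, high = rating_scale
    if not df['rating'].between(low, high).all():
        problems.append(f'ratings outside the scale {rating_scale}')
    return problems

# Made-up rows: one valid frame and one with out-of-scale ratings.
good = pd.DataFrame({'user_id': [1, 2], 'movie_id': [10, 20], 'rating': [4, 5]})
bad = pd.DataFrame({'user_id': [1, 2], 'movie_id': [10, 20], 'rating': [0, 7]})

print(check_ratings_frame(good))
print(check_ratings_frame(bad))
```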
Step 4: Split the Data
We’ll split the data into training and testing sets to evaluate the model’s performance.
from surprise.model_selection import train_test_split
# Split the data into training and test sets (80% training, 20% testing)
trainset, testset = train_test_split(data, test_size=0.2)
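Conceptually, a random split shuffles the ratings and cuts the list at the chosen fraction. The pure-Python sketch below illustrates that idea on made-up triples; it is not Surprise's implementation, and `split_ratings` is a hypothetical helper.

```python
import random

def split_ratings(ratings, test_size=0.2, seed=0):
    """Shuffle a copy of the ratings and split it into (train, test) lists."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up (user_id, movie_id, rating) triples: 5 users x 4 movies = 20 ratings.
ratings = [(u, m, (u + m) % 5 + 1) for u in range(1, 6) for m in range(1, 5)]

train, test = split_ratings(ratings, test_size=0.2, seed=42)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, which is useful when comparing models. Surprise's `train_test_split` accepts a `random_state` argument for the same purpose.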
Step 5: Build the Recommendation Model
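We will use Surprise's SVD algorithm, a matrix factorization method popularized by Simon Funk during the Netflix Prize and widely used for collaborative filtering. Despite the name, it does not compute a classical singular value decomposition; it learns user and item factors by stochastic gradient descent on the observed ratings.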
from surprise import SVD
# Initialize the SVD model
model = SVD()
# Train the model on the training set
model.fit(trainset)
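Models in this family typically predict a rating as a global mean plus a user bias, an item bias, and the dot product of two learned factor vectors, and fit those parameters by stochastic gradient descent. The NumPy sketch below illustrates such a training loop on made-up ratings. It is a simplified illustration, not Surprise's code; the function name `train_mf` and the hyperparameter values are assumptions chosen for the toy data.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, n_factors=2, n_epochs=200,
             lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors with SGD on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])            # global mean rating
    bu = np.zeros(n_users)                               # user biases
    bi = np.zeros(n_items)                               # item biases
    P = rng.normal(0, 0.1, size=(n_users, n_factors))    # user factors
    Q = rng.normal(0, 0.1, size=(n_items, n_factors))    # item factors

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update both factor vectors from their pre-update values.
            pu_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(params, u, i):
    """Estimated rating for user index u and item index i."""
    mu, bu, bi, P, Q = params
    return float(mu + bu[u] + bi[i] + P[u] @ Q[i])

def rmse(params, ratings):
    """Root mean squared error of the model on the given ratings."""
    errors = [r - predict(params, u, i) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Made-up ratings: (user index, item index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

untrained = train_mf(ratings, n_users=3, n_items=3, n_epochs=0)
trained = train_mf(ratings, n_users=3, n_items=3)
print(f'Training RMSE before: {rmse(untrained, ratings):.3f}, '
      f'after: {rmse(trained, ratings):.3f}')
```

The regularization term `reg` shrinks the parameters toward zero, which matters on real data where many users and items have only a handful of ratings.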
Step 6: Make Predictions
Now that the model is trained, we can use it to make predictions on the test set.
# Make predictions on the test set
predictions = model.test(testset)
# Display the first few predictions
for prediction in predictions[:5]:
    print(prediction)
Output:
- Each prediction contains the user and item IDs, the true rating (r_ui), and the estimated rating (est). For example:
user: 196 item: 242 r_ui = 3.00 est = 3.08 {'was_impossible': False}
user: 186 item: 302 r_ui = 3.00 est = 3.56 {'was_impossible': False}
user: 203 item: 79 r_ui = 4.00 est = 3.77 {'was_impossible': False}
...
Step 7: Evaluate the Model
We can evaluate the model’s performance by calculating metrics like RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error), which measure the difference between the predicted and true ratings.
from surprise import accuracy
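# Calculate RMSE and MAE on the test set
# (verbose=False stops Surprise from printing the values itself, so each appears once)
rmse = accuracy.rmse(predictions, verbose=False)
mae = accuracy.mae(predictions, verbose=False)
print(f'RMSE: {rmse:.3f}')
print(f'MAE: {mae:.3f}')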
Output:
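- The model’s RMSE and MAE values are printed; lower values mean predictions closer to the true ratings. Exact numbers vary from run to run because the split and the model initialization are random: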
RMSE: 0.931
MAE: 0.748
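Both metrics are simple to compute by hand, which can help when interpreting them. The sketch below uses made-up (true, predicted) pairs rather than the model's output; `rmse` and `mae` here are plain helper functions, not Surprise's `accuracy` module.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Made-up (true, predicted) ratings.
pairs = [(3.0, 3.5), (4.0, 3.0), (5.0, 4.5), (2.0, 4.0)]

print(f'RMSE: {rmse(pairs):.3f}')
print(f'MAE: {mae(pairs):.3f}')
```

RMSE squares each error before averaging, so the single two-star miss in this sample weighs more heavily in RMSE than in MAE.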
Step 8: Make a Personalized Recommendation
Now, let’s recommend movies to a specific user, user 1, by predicting ratings for every movie that user has not rated yet.
# `data` is now a Surprise Dataset, so read the ratings from data.raw_ratings,
# a list of (user_id, movie_id, rating, timestamp) tuples
# Get the set of all movie IDs
all_movie_ids = {movie_id for (_, movie_id, _, _) in data.raw_ratings}
# Get the movies already rated by user 1
rated_movies = {movie_id for (user_id, movie_id, _, _) in data.raw_ratings if user_id == 1}
# Get the movies that the user hasn't rated
unrated_movies = [movie_id for movie_id in all_movie_ids if movie_id not in rated_movies]
# Predict the ratings for unrated movies
user_predictions = [model.predict(1, movie_id) for movie_id in unrated_movies]
# Sort the predictions by estimated rating in descending order
user_predictions.sort(key=lambda x: x.est, reverse=True)
# Recommend top 5 movies
top_5_recommendations = user_predictions[:5]
for recommendation in top_5_recommendations:
    print(f"Movie ID: {recommendation.iid}, Predicted Rating: {recommendation.est:.1f}")
Output:
- The top 5 recommended movies for user 1, sorted by predicted rating, will be displayed:
Movie ID: 302, Predicted Rating: 4.5
Movie ID: 150, Predicted Rating: 4.3
...
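The same steps can be wrapped in a reusable helper. The sketch below is an illustration under stated assumptions: `top_n_for_user` is a hypothetical function, and `predict_fn` stands in for any callable that returns an estimated rating for a (user, item) pair, such as `lambda u, i: model.predict(u, i).est` with the trained Surprise model. A fake predictor with made-up scores is used here so the example runs on its own.

```python
import heapq

def top_n_for_user(user_id, all_item_ids, rated_item_ids, predict_fn, n=5):
    """Return the n unrated items with the highest estimated rating."""
    rated = set(rated_item_ids)  # set membership is O(1) per lookup
    candidates = (item for item in all_item_ids if item not in rated)
    scored = ((predict_fn(user_id, item), item) for item in candidates)
    # nlargest avoids sorting the whole candidate list when n is small.
    best = heapq.nlargest(n, scored)
    return [(item, est) for est, item in best]

# Stand-in predictor with made-up scores; replace with a real model's estimate.
fake_scores = {10: 3.2, 20: 4.8, 30: 4.1, 40: 2.5, 50: 4.5}

def fake_predict(user_id, item_id):
    return fake_scores[item_id]

recommendations = top_n_for_user(
    user_id=1,
    all_item_ids=fake_scores.keys(),
    rated_item_ids=[20],
    predict_fn=fake_predict,
    n=3,
)
print(recommendations)
```

Item 20 has the highest score but is excluded because the user has already rated it, which mirrors the filtering step in the guide.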
Conclusion
In this guide, you learned how to build a simple collaborative filtering recommendation system using matrix factorization (SVD). The steps included loading the data, preparing it, building the model, evaluating its performance, and making personalized recommendations.
Recommendation systems are powerful tools for personalizing user experiences and can be extended to more complex use cases like content-based filtering or hybrid systems that combine multiple recommendation techniques.