Supervised Learning January 01 ,2025

Python implementation of Ridge Regression

Step 1: Import the Necessary Libraries

First, we import the required libraries.

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Import numpy and pandas for data handling
import numpy as np
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn libraries for preprocessing and modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

Step 2: Generate or Load the Dataset

We generate a synthetic dataset for Ridge Regression. In this case, we simulate a dataset that has some linear relationship.

# Generating a dataset
np.random.seed(0)
X = np.random.rand(300, 1) * 10  # 300 random data points scaled by 10
y = 2 * X + 3 + np.random.randn(300, 1) * 2  # Linear relation with noise

# Convert the data to a DataFrame for easier handling
data = pd.DataFrame({'X': X.flatten(), 'y': y.flatten()})
data.head()

Output:

		X			y
0	5.488135	14.289283
1	7.151894	17.768149
2	6.027634	13.860635
3	5.448832	13.421820
4	4.236548	8.624974

Step 3: Visualize the Data

We visualize the relationship between X and y to understand the pattern.

plt.scatter(data['X'], data['y'], color='blue', label='Data')
plt.title("Scatter Plot of X vs y")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

Step 4: Train-Test Split

Now, we split the data into training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(data[['X']], data['y'], test_size=0.2, random_state=42)

print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

Output:

Training set size: 80
Testing set size: 20

Step 5: Feature Scaling

Since Ridge Regression is sensitive to feature scaling, we scale the features using StandardScaler.

# Initialize the scaler
scaler = StandardScaler()

# Scale the training and test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Display the first 5 rows of scaled training data
print("Scaled Training Data:\n", X_train_scaled[:5])

Output:

Scaled Training Data:
 [[ 0.02403683]
 [-0.91675579]
 [-0.252981  ]
 [ 0.35312163]
 [-1.69304482]]

Step 6: Fit the Ridge Regression Model

We train the Ridge Regression model on the scaled data.

# Initialize the Ridge model with alpha (regularization strength)
ridge_model = Ridge(alpha=1.0)

# Fit the model
ridge_model.fit(X_train_scaled, y_train)

# Print the coefficients and intercept
print("Model Coefficients:", ridge_model.coef_)
print("Intercept:", ridge_model.intercept_)

Output:

Model Coefficients: [5.66775475]
Intercept: 12.933196681277819

Step 7: Evaluate the Model

We now make predictions on the test set and evaluate the model’s performance using R², Mean Absolute Error (MAE), and Mean Squared Error (MSE).

# Make predictions
y_pred = ridge_model.predict(X_test_scaled)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("R² Score:", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

Output:

R² Score: 0.8655909673477045
Mean Absolute Error: 1.649257093583459
Mean Squared Error: 4.483610836129543

Step 8: Visualize the Predictions

We compare the true values and predicted values using a scatter plot.

# Visualizing the true vs predicted values
plt.scatter(X_test, y_test, color='blue', label='True Values')
plt.scatter(X_test, y_pred, color='red', label='Predicted Values')
plt.title("True vs Predicted Values")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

Step 9: Visualize the Ridge Regression Line

Finally, we visualize the Ridge Regression line fitted on the entire dataset to see the model fit.

# Plot the Ridge Regression Line
plt.scatter(data['X'], data['y'], color='blue', label='Data')
plt.plot(X_test, y_pred, color='red', label='Ridge Regression Line')
plt.title("Ridge Regression Line")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

Summary of Outputs:

Scatter Plot of X vs y: Visualizes the linear data distribution.
Training and Testing Split Sizes: Confirms the split ratio between training and testing datasets.
Scaled Features: Shows the scaled features ready for model fitting.
Model Coefficients and Intercept: Highlights the learned parameters of the Ridge regression model.
R², MAE, and MSE Scores: Evaluates the model's predictive accuracy.
True vs Predicted Values: Plots the comparison between actual and predicted values.
Ridge Regression Line: Illustrates the fitted regression line.

This completes the Python implementation for Ridge Regression. The model has effectively learned from the data, and the output evaluations confirm the model's performance.

Next Topic- Lasso Regression in Machine Learning

Purnima

You must logged in to post comments.