Gradient Boosting for Regression
Now, let's use Gradient Boosting for regression on the California Housing dataset.
Step 1: Import Required Libraries
First, we need to import essential Python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
Step 2: Load and Explore the Dataset
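# Load the data and wrap it in a DataFrame with named feature columns
housing = fetch_california_housing()
df_housing = pd.DataFrame(housing.data, columns=housing.feature_names)
# Target: median house value per block group, in units of $100,000
df_housing['target'] = housing.target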
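Step 3: Split the Dataset
X = df_housing.drop(columns=['target'])
y = df_housing['target']
# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)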
Step 4: Initialize Gradient Boosting Regressor
gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
Step 5: Train the Model
gb_regressor.fit(X_train, y_train)
Step 6: Predict on Test Data
y_pred_reg = gb_regressor.predict(X_test)
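Because the ensemble is built one tree at a time, predictions can also be inspected after each boosting stage. The sketch below is one way to do that with staged_predict. It uses synthetic data from make_regression as a stand-in (an assumption, so that it runs without downloading anything), and the names X_demo, demo_model and staged_mse are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: used only so the sketch is self-contained)
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
demo_model.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators trees
staged_mse = [mean_squared_error(y_te, y_stage) for y_stage in demo_model.staged_predict(X_te)]

best_stage = int(np.argmin(staged_mse)) + 1  # stages are counted from 1
print(f"Test MSE after 1 tree: {staged_mse[0]:.1f}")
print(f"Test MSE after {len(staged_mse)} trees: {staged_mse[-1]:.1f}")
print(f"Lowest test MSE at stage {best_stage}")
```

On the housing data, the same loop would run over gb_regressor.staged_predict(X_test). If the lowest error occurs well before the final stage, a smaller n_estimators may be enough.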
Step 7: Evaluate Performance
mse = mean_squared_error(y_test, y_pred_reg)
print(f'Mean Squared Error: {mse:.2f}')
Mean Squared Error: 0.29
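MSE is expressed in squared target units, which makes it hard to read directly. Since the California Housing target is in units of $100,000, an MSE of 0.29 corresponds to an RMSE of about 0.54, or roughly $54,000. The sketch below shows one way to report MSE, RMSE and R² together. The helper regression_report and the small arrays are hypothetical, included only to make the example runnable on its own.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MSE, RMSE and R^2 for a set of predictions (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),  # same units as the target
        "r2": float(r2_score(y_true, y_pred)),
    }

# Made-up values in units of $100,000, not taken from the dataset
y_true_demo = np.array([1.5, 2.0, 3.2, 0.9])
y_pred_demo = np.array([1.7, 1.8, 3.0, 1.3])

report = regression_report(y_true_demo, y_pred_demo)
print(report)
```

With the model from the steps above, the call would be regression_report(y_test, y_pred_reg).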
Key Takeaways
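- The Gradient Boosting Regressor is a powerful technique for regression tasks.
- We used the California Housing dataset to predict median house values.
- The model is built sequentially: each new tree is fit to the errors of the current ensemble, so performance improves over iterations.
- The model reached a test Mean Squared Error (MSE) of 0.29. Because the target is in units of $100,000, this is an RMSE of about 0.54, or roughly $54,000 per prediction.
- Hyperparameter tuning can further improve performance.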
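As a starting point for tuning, the sketch below runs a small grid search with GridSearchCV. The grid values are illustrative assumptions rather than recommended settings, and the data is a synthetic stand-in from make_regression so that the example runs quickly without a download.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: keeps the sketch small and self-contained)
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Illustrative grid; the values are assumptions, not tuned recommendations
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated MSE: {-search.best_score_:.1f}")
```

On the housing data, search.fit(X_train, y_train) would replace the synthetic fit, and search.best_estimator_ can then be evaluated on X_test exactly as in Step 7. Keeping the test set out of the search avoids an optimistic error estimate.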