Machine Learning | February 11, 2025

Step-wise Python Implementation of Gradient Boosting

Gradient Boosting is a powerful ensemble learning technique that builds models sequentially, correcting the errors of previous models. It is widely used for both classification and regression tasks.
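To make the idea concrete, below is a minimal from-scratch sketch assuming a squared-error loss: each new tree is fit to the residuals (the errors) of the ensemble built so far, and its predictions are added in, scaled by a learning rate. The synthetic data and names (X_demo, y_demo) are purely illustrative; scikit-learn's implementation, used in the steps below, is more general.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data
rng = np.random.RandomState(42)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 100)

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # start from the mean prediction
for _ in range(100):
    residuals = y_demo - prediction                     # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X_demo, residuals)
    prediction += learning_rate * tree.predict(X_demo)  # correct part of the error

print('Training MSE after boosting:', np.mean((y_demo - prediction) ** 2))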

Step 1: Import Required Libraries

First, we need to import essential Python libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, mean_squared_error

Step 2: Load and Explore the Dataset

We will use the Iris dataset for classification.

from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display first 5 rows
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0
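To explore the data a little further, we can check the class balance (Iris has 50 samples of each class):

# Check class balance
print(df['target'].value_counts())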

Step 3: Split Data into Training and Testing Sets

We split the dataset into training (80%) and testing (20%) sets.

# Define features and target
X = df.drop(columns=['target'])
y = df['target']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
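As an aside, a stratified split keeps the class proportions equal in both sets. The variant below is illustrative only (the results printed later in this post come from the unstratified split above), and the Xs_* / ys_* names are hypothetical:

# Optional variant: stratify=y preserves class proportions in train and test sets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)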

Step 4: Initialize and Train the Gradient Boosting Model

We use GradientBoostingClassifier for classification, with n_estimators=100 trees and learning_rate=0.1, which scales each tree's contribution to the ensemble.

# Create Gradient Boosting Classifier
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Train the model
gb_model.fit(X_train, y_train)

Step 5: Make Predictions

Now, we use the trained model to make predictions on test data.

# Predict on test data
y_pred = gb_model.predict(X_test)
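Because the ensemble is built one tree at a time, scikit-learn's staged_predict can replay the predictions after each boosting stage. A short optional sketch to watch test accuracy evolve as trees are added:

# Optional: track test accuracy as boosting stages (trees) accumulate
for i, stage_pred in enumerate(gb_model.staged_predict(X_test), start=1):
    if i % 20 == 0:
        print(f'Trees: {i:3d}, accuracy: {accuracy_score(y_test, stage_pred):.2f}')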

Step 6: Evaluate Model Performance

We evaluate the model using accuracy score, confusion matrix, and classification report.

# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

Accuracy: 1.00
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
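Since seaborn was imported in Step 1, we can optionally visualize the confusion matrix as a heatmap:

# Optional: visualize the confusion matrix as a heatmap
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()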

Step 7: Feature Importance

Gradient Boosting exposes feature importance scores, which indicate how much each feature contributes to the model's predictions.

# Plot feature importance
feature_importances = pd.Series(gb_model.feature_importances_, index=X.columns)
feature_importances.sort_values(ascending=True).plot(kind='barh', color='blue')
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title('Feature Importance in Gradient Boosting')
plt.show()
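For a numeric view of the same scores:

# Print feature importance scores, highest first
print(feature_importances.sort_values(ascending=False))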


Hyperparameter Tuning in Gradient Boosting

We can fine-tune Gradient Boosting with GridSearchCV, which evaluates every combination in a parameter grid using cross-validation.

from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(GradientBoostingClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Best parameters
print("Best Parameters:", grid_search.best_params_)

# Evaluate best model
best_gb = grid_search.best_estimator_
y_pred_best = best_gb.predict(X_test)
print("Best Model Accuracy:", accuracy_score(y_test, y_pred_best))

Key Takeaways

  • Gradient Boosting builds models sequentially, correcting previous errors.
  • Works well for classification (Iris dataset) and regression tasks (a minimal regression sketch follows this list).
  • Uses weak learners (decision trees) and boosts their performance.
  • Feature Importance helps identify key variables.
  • Hyperparameter Tuning improves accuracy with GridSearchCV.
  • Achieved 100% accuracy on the Iris dataset (small dataset, may overfit).
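Since GradientBoostingRegressor and mean_squared_error were imported in Step 1 but not used above, here is a minimal regression sketch; the diabetes dataset and the variable names are illustrative only, and the next blog covers regression in detail:

from sklearn.datasets import load_diabetes

# Minimal regression sketch (illustrative only)
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42)

gb_reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb_reg.fit(Xr_train, yr_train)
print('Test MSE:', mean_squared_error(yr_test, gb_reg.predict(Xr_test)))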


Next Blog: Python Implementation of Gradient Boosting for Regression

Purnima
