Machine Learning | February 02, 2025

Random Forest for Regression

We can also use Random Forest for regression tasks. Below is an example using the California Housing dataset, which ships with scikit-learn.

Step 1: Import Required Libraries

Before implementing Random Forest, import the necessary Python libraries.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error

Step 2: Load and Explore the Dataset

In this step we read the data from one of the open repositories such as Kaggle or the UCI Machine Learning Repository (or, as here, directly from scikit-learn) and explore it to understand the features and their importance.

We will use the California Housing dataset as an example for regression.


housing = fetch_california_housing()
df_housing = pd.DataFrame(housing.data, columns=housing.feature_names)
df_housing['target'] = housing.target

df_housing.head()  # print the top 5 rows of the dataset
OUTPUT:
	MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude	target
0	8.3252	41.0	6.984127	1.023810	322.0	2.555556	37.88	-122.23	4.526
1	8.3014	21.0	6.238137	0.971880	2401.0	2.109842	37.86	-122.22	3.585
2	7.2574	52.0	8.288136	1.073446	496.0	2.802260	37.85	-122.24	3.521
3	5.6431	52.0	5.817352	1.073059	558.0	2.547945	37.85	-122.25	3.413
4	3.8462	52.0	6.281853	1.081081	565.0	2.181467	37.85	-122.25	3.422
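
The head() call above only shows the first five rows. As an optional exploration step (a minimal sketch using standard pandas methods, not part of the original walkthrough), we can also inspect the shape, data types, summary statistics, and missing values before training.

# Optional: quick exploration of the dataset
print(df_housing.shape)           # number of rows and columns
df_housing.info()                 # column data types and non-null counts
print(df_housing.describe())      # summary statistics for each numeric column
print(df_housing.isnull().sum())  # missing values per column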

Step 3: Split the Data

In this step we split the dataset into training (80%) and testing (20%) sets. Tree-based models such as Random Forest are not sensitive to feature scale, so no additional scaling step is required here.

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    df_housing.drop(columns=['target']), df_housing['target'],
    test_size=0.2, random_state=42
)
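
As a quick sanity check (an optional sketch, not part of the original walkthrough), we can confirm the sizes of the resulting splits; the California Housing data has 20,640 rows, so an 80/20 split yields 16,512 training and 4,128 test samples.

# Confirm the 80/20 split
print(X_train.shape, X_test.shape)  # (16512, 8) (4128, 8)
print(y_train.shape, y_test.shape)  # (16512,) (4128,)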

Step 4: Initialize Random Forest Regressor

We create an instance of RandomForestRegressor with 100 trees (n_estimators=100) and a fixed random_state for reproducibility.

rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
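
Beyond n_estimators, RandomForestRegressor exposes several other hyperparameters that are often tuned. The sketch below is illustrative only; the values shown are arbitrary examples, not recommendations for this dataset.

# Illustrative example of additional hyperparameters (values are arbitrary)
rf_regressor_tuned = RandomForestRegressor(
    n_estimators=200,     # number of trees in the forest
    max_depth=10,         # limit tree depth to control overfitting
    min_samples_leaf=2,   # minimum samples required at each leaf node
    max_features='sqrt',  # number of features considered at each split
    n_jobs=-1,            # use all available CPU cores
    random_state=42
)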

Step 5: Train the Model

Train the model using the training data (X_train, y_train).

rf_regressor.fit(X_train, y_train)

OUTPUT:
RandomForestRegressor(random_state=42)

Step 6: Predict on Test Data

Now that the model is trained, we use it to predict on the test data (X_test, y_test).

y_pred_reg = rf_regressor.predict(X_test)
print(y_pred_reg)

OUTPUT:
[0.5095    0.74161   4.9232571 ... 4.7582187 0.71409   1.65083  ]
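
Before computing a metric, it can help to line up a few predicted values against the actual targets (an optional illustrative check):

# Compare the first five predictions with the actual target values
comparison = pd.DataFrame({'Actual': y_test.values[:5], 'Predicted': y_pred_reg[:5]})
print(comparison)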

Step 7: Evaluate Performance

We evaluate the regressor with Mean Squared Error (MSE), the average of the squared differences between predicted and actual values; lower values indicate better predictions.

mse = mean_squared_error(y_test, y_pred_reg)
print(f'Mean Squared Error: {mse:.2f}')
OUTPUT:
Mean Squared Error: 0.26
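
MSE is only one way to judge a regressor. As an optional extension (not part of the original walkthrough), the sketch below also reports the root mean squared error, mean absolute error, and R2 score, all available in sklearn.metrics.

from sklearn.metrics import mean_absolute_error, r2_score

rmse = mean_squared_error(y_test, y_pred_reg) ** 0.5  # RMSE, in the same units as the target
mae = mean_absolute_error(y_test, y_pred_reg)         # average absolute error
r2 = r2_score(y_test, y_pred_reg)                     # proportion of variance explained
print(f'RMSE: {rmse:.2f}, MAE: {mae:.2f}, R2: {r2:.2f}')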

Key Takeaways

  • Random Forest Regressor is an ensemble method using multiple decision trees.
  • Works well for continuous data prediction like house prices.
  • Reduces overfitting by averaging multiple trees.
  • Mean Squared Error (MSE) measures the average squared prediction error; lower values indicate better predictions.
  • Scalable & robust for real-world regression tasks.
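
Step 2 mentioned understanding the importance of the features; a trained Random Forest makes this easy through its feature_importances_ attribute. The sketch below (an optional addition) ranks the California Housing features by their contribution to the model and plots them with the libraries imported in Step 1.

# Rank features by the importance learned by the trained forest
importances = pd.Series(rf_regressor.feature_importances_, index=housing.feature_names)
print(importances.sort_values(ascending=False))

# Optional bar chart of the ranking
importances.sort_values().plot(kind='barh')
plt.title('Random Forest Feature Importances')
plt.show()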

Next Blog: Gradient Boosting in Machine Learning
Purnima