To implement Decision Tree Regression using Python and sklearn
for predictive modeling.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
# sample dataset (house features: Size in sq ft, Bedrooms, Age, and the target Price)
data = {
"Size": [1500, 1800, 2400, 3000, 3500, 4000, 4200, 5000, 5500, 6000],
"Bedrooms": [3, 3, 4, 3, 5, 4, 5, 6, 5, 7],
"Age": [10, 15, 20, 18, 30, 8, 12, 5, 7, 2],
"Price": [300000, 350000, 400000, 450000, 500000, 600000, 620000, 750000, 800000, 900000]
}
# build a DataFrame from the dictionary
df = pd.DataFrame(data)
# splitting independent (X) and dependent (y) variables
X = df[["Size", "Bedrooms", "Age"]]
y = df["Price"]
# split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# creating and training the Decision Tree model
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)
# predicting house prices
y_pred = model.predict(X_test)
# model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# results
print("Mean Squared Error:", mse)
print("R-squared Score:", r2)
# plotting actual vs predicted prices
plt.scatter(y_test, y_pred, color='blue', label="Predicted vs Actual")
plt.plot(y_test, y_test, color='red', linestyle="dashed", label="Perfect Fit Line")
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Decision Tree: Actual vs Predicted House Prices")
plt.legend()
plt.show()
Output:
Mean Squared Error: 21200000000.0
R-squared Score: 0.5812345679012345
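The two printed metrics are directly linked: the R-squared score measures how much of the variance in the actual test prices the model explains, so it can be recomputed from the MSE. A minimal check, assuming the mse, y_test, and np names from the program above:
# R^2 = 1 - SS_res / SS_tot, i.e. 1 - MSE divided by the variance of the actual test prices
r2_from_mse = 1 - mse / np.var(y_test.to_numpy())
print("R-squared from MSE:", r2_from_mse)  # agrees with r2_score(y_test, y_pred)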
The program trains a Decision Tree Regression model and predicts house prices based on Size, Bedrooms, and Age. The expected output is the Mean Squared Error and R-squared score shown above, followed by a scatter plot of actual versus predicted prices.
Explanation:
DecisionTreeRegressor() creates the regression tree model with default settings.
fit() grows the tree on the training data by repeatedly splitting on the Size, Bedrooms, or Age threshold that most reduces the squared error; each leaf stores the average price of the training houses that reach it.
predict() routes each test house down the tree and returns the value of the leaf it lands in.
mean_squared_error() and r2_score() compare these predictions with the actual test prices.
The scatter plot shows each prediction against the actual price, with the dashed red line marking a perfect fit; the learned splits themselves can be printed as shown in the sketch below.
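The splits the tree has learned can also be printed as readable rules. A minimal sketch, assuming the fitted model from the program above; export_text comes from sklearn.tree:
from sklearn.tree import export_text
# print one line per split and leaf, showing the threshold chosen on Size, Bedrooms, or Age
rules = export_text(model, feature_names=["Size", "Bedrooms", "Age"])
print(rules)
Because the dataset has only 10 rows, the unconstrained tree effectively memorizes the training prices; limiting its size, for example with DecisionTreeRegressor(max_depth=3, random_state=42), is a common way to reduce overfitting on small datasets.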