To implement Multiple Linear Regression using Python and scikit-learn (sklearn) to predict house prices from multiple features.
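Multiple linear regression models the target as a weighted sum of the input features plus an intercept. For this dataset the fitted model has the form

Price ≈ b0 + b1*Size + b2*Bedrooms + b3*Age

where b0 is the intercept reported as model.intercept_ and b1, b2, b3 are the per-feature coefficients reported as model.coef_ (scikit-learn's LinearRegression estimates them by ordinary least squares). The symbols b0 through b3 are used here only as notation for those values.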
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Sample dataset: house features (Size in sq ft, Bedrooms, Age in years) and the target Price
data = {
"Size": [1500, 1800, 2400, 3000, 3500, 4000, 4200, 5000, 5500, 6000],
"Bedrooms": [3, 3, 4, 3, 5, 4, 5, 6, 5, 7],
"Age": [10, 15, 20, 18, 30, 8, 12, 5, 7, 2],
"Price": [300000, 350000, 400000, 450000, 500000, 600000, 620000, 750000, 800000, 900000]
}
# Build a DataFrame from the dictionary
df = pd.DataFrame(data)
# splitting independent (X) and dependent (y) variables
X = df[["Size", "Bedrooms", "Age"]]
y = df["Price"]
# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# creating and training the model
model = LinearRegression()
model.fit(X_train, y_train)
# predicting house prices
y_pred = model.predict(X_test)
# model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# results
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mse)
print("R-squared Score:", r2)
# plotting actual vs predicted prices
plt.scatter(y_test, y_pred, color='blue')
plt.plot(y_test, y_test, color='red', linestyle='dashed')  # perfect-fit line (y = x)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()
Output:
Model Coefficients: [ 109.80609436 18135.41266371 -2774.2807285 ]
Model Intercept: 111253.60186975863
Mean Squared Error: 492420858.93240654
R-squared Score: 0.9902731682186191
The program trains a Multiple Linear Regression model and predicts house prices based on Size, Bedrooms, and Age. The output above lists the fitted coefficients (one per feature), the intercept, and the evaluation metrics, followed by a scatter plot of actual versus predicted prices.
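As a sanity check, the intercept and coefficients printed above can be combined by hand to reproduce a prediction. A minimal sketch, assuming the script above has already been run in the same session (so model, X_test, and y_pred exist):

# Reproduce the first test prediction from the learned parameters:
# prediction = intercept + sum(coefficient_i * feature_i)
first_row = X_test.iloc[0]
manual_pred = model.intercept_ + np.dot(model.coef_, first_row)
print("Manual prediction:", manual_pred)
print("model.predict result:", y_pred[0])  # the two values should match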
Explanation:
LinearRegression() creates an ordinary least squares regression model; fit() estimates one coefficient per feature (model.coef_) and an intercept (model.intercept_) from the training data.
predict() applies the learned linear equation to the test features to produce estimated prices.
mean_squared_error() measures the average squared gap between actual and predicted prices, and r2_score() reports the fraction of price variance explained by the model (values close to 1 indicate a good fit).
The scatter plot compares actual and predicted prices; points lying near the dashed line correspond to accurate predictions.
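Once trained, the same model can also score houses that were not in the dataset. A minimal sketch of this; the feature values for the new house below are made up purely for illustration:

# Hypothetical new house: 2800 sq ft, 4 bedrooms, 10 years old (illustrative values only)
new_house = pd.DataFrame({"Size": [2800], "Bedrooms": [4], "Age": [10]})
predicted_price = model.predict(new_house)[0]
print("Predicted price for the new house:", round(predicted_price, 2))

Passing the new sample as a DataFrame with the same column names as the training data keeps the feature order consistent with the fitted model.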