Linear Regression Python Implementation

Step 1: Reading the Data

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

data_frame = pd.read_csv('kc_house_data.csv') # Load the King County house sales dataset
X = data_frame[['bedrooms', 'sqft_living']] # Our two input features
y = data_frame['price'] # The target we want to predict
X = X.to_numpy()
y = y.to_numpy()
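As a quick sanity check, the column selection and conversion above can be tried on a small in-memory DataFrame (the numbers below are made up; the article uses the real kc_house_data.csv):

```python
import pandas as pd

# Hypothetical mini-dataset standing in for kc_house_data.csv
df = pd.DataFrame({
    'bedrooms':    [3, 2, 4],
    'sqft_living': [1180, 770, 1960],
    'price':       [221900.0, 180000.0, 604000.0],
})

X_demo = df[['bedrooms', 'sqft_living']].to_numpy()  # shape (3, 2): one row per house
y_demo = df['price'].to_numpy()                      # shape (3,): one price per house

print(X_demo.shape, y_demo.shape)
```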

Step 2: Scaling the Data

scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)
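Min-max scaling maps each column independently into [0, 1] via (x - min) / (max - min). A small sketch with illustrative values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns: bedrooms, sqft_living (made-up values)
X_demo = np.array([[2.0, 770.0],
                   [3.0, 1180.0],
                   [4.0, 1960.0]])

scaler_demo = MinMaxScaler()
X_scaled = scaler_demo.fit_transform(X_demo)  # same as fit() followed by transform()

# The bedrooms column (min 2, max 4) becomes [0.0, 0.5, 1.0]
print(X_scaled[:, 0])
```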

Step 3: Declaring Constants

learning_rate = 1 # This decides our step size in Gradient Descent
m = len(X) # This is the number of training examples

Step 4: Making the Cost Function

def cost_function(theta, X, y):
    total = 0 # Running sum of squared errors ('sum' would shadow the Python built-in)
    for index, x_val in enumerate(X):
        prediction = theta[0] + theta[1]*x_val[0] + theta[2]*x_val[1]
        difference = prediction - y[index]
        total += difference**2
    error = total/(2*m)
    return error
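For reference, the same cost can be computed without a Python loop by stacking a column of ones onto X (so theta[0] acts as the intercept) and using NumPy's matrix product. A vectorized sketch, equivalent to the loop above:

```python
import numpy as np

def cost_function_vec(theta, X, y):
    # Prepend a column of ones so theta[0] is the intercept term
    X_b = np.c_[np.ones(len(X)), X]
    residuals = X_b @ theta - y
    return (residuals @ residuals) / (2 * len(X))

# Quick check against a hand-computed value (made-up data)
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([1.0, 2.0])
theta_demo = np.array([0.0, 0.0, 0.0])
# All predictions are 0, so cost = (1^2 + 2^2) / (2*2) = 1.25
print(cost_function_vec(theta_demo, X_demo, y_demo))
```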

Step 5: Making the Derivative Functions

[Image: the Gradient Descent algorithm, by https://ahaanpandya.medium.com/linear-regression-explained-868914443188]
[Image: the Gradient Descent update rules after evaluating the derivative terms, from https://stackoverflow.com/questions/29583026/implementing-gradient-descent-algorithm-in-matlab]
def d_theta_0(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        answer += diff
    answer = answer/m
    return answer

def d_theta_1(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        diff_2 = diff*x_value[0]
        answer += diff_2
    answer = answer/m
    return answer

def d_theta_2(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        diff_2 = diff*x_value[1]
        answer += diff_2
    answer = answer/m
    return answer
# The parameter t in all of these functions is the vector theta, which contains the parameters of our hypothesis function
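The three derivative functions differ only in which column multiplies the residual, so with the same column-of-ones trick they collapse into one vectorized expression. A sketch equivalent to d_theta_0, d_theta_1 and d_theta_2 combined:

```python
import numpy as np

def gradient_vec(theta, X, y):
    X_b = np.c_[np.ones(len(X)), X]    # column of ones for the intercept term
    residuals = X_b @ theta - y        # prediction - y for every example
    return X_b.T @ residuals / len(X)  # one partial derivative per parameter

# Hand-checkable example (made-up data)
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([1.0, 2.0])
theta_demo = np.array([0.0, 0.0, 0.0])
print(gradient_vec(theta_demo, X_demo, y_demo))  # [-1.5, -3.5, -5.0]
```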

Step 6: Training the Model

epochs = 150
loss_history = [] # Tracks the loss at each epoch so we can plot it later
parameters = np.random.rand(3) # This creates a vector of 3 random parameters
for i in range(epochs):
    p = parameters.copy() # We copy the parameters so that all three are updated simultaneously
    parameters[0] -= learning_rate*d_theta_0(p)
    parameters[1] -= learning_rate*d_theta_1(p)
    parameters[2] -= learning_rate*d_theta_2(p)
    loss = cost_function(parameters, X, y)
    loss_history.append(loss)
plt.plot(range(1, epochs+1), loss_history)
plt.show()
[Image: my graph of the loss history over 150 epochs]
with open('parameters.txt', 'w') as f: # Save the learned parameters for later use
    f.write(str(parameters[0])+'\n'+str(parameters[1])+'\n'+str(parameters[2]))
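Putting Steps 2 through 6 together, here is a compact, self-contained run on synthetic data (the numbers are made up purely to show the loss falling; the article itself trains on kc_house_data.csv):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_demo = rng.uniform([1, 500], [6, 4000], size=(100, 2))  # bedrooms, sqft_living
y_demo = 50000 + 10000*X_demo[:, 0] + 150*X_demo[:, 1]    # synthetic prices

X_demo = MinMaxScaler().fit_transform(X_demo)
X_b = np.c_[np.ones(len(X_demo)), X_demo]  # column of ones for the intercept

theta = rng.random(3)
lr, m_demo = 1.0, len(X_demo)
losses = []
for _ in range(150):
    residuals = X_b @ theta - y_demo
    theta -= lr * (X_b.T @ residuals) / m_demo    # simultaneous parameter update
    losses.append((residuals @ residuals) / (2 * m_demo))

# The loss at the last epoch should be well below the loss at the first
print(f"loss at epoch 1: {losses[0]:.3e}, at epoch 150: {losses[-1]:.3e}")
```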

Step 7: Predicting using the model

Since the model was trained on scaled features, any new input must be passed through the same scaler (and in the same column order, [bedrooms, sqft_living]) before predicting:

def predict_price(sqft, no_of_bedrooms):
    scaled = scaler.transform([[no_of_bedrooms, sqft]])[0] # Apply the same min-max scaling used in training
    price = parameters[0] + parameters[1]*scaled[0] + parameters[2]*scaled[1]
    print(price)

Conclusion

Thanks for Reading!

Ahaan Pandya
