Linear Regression Python Implementation

Ahaan Pandya
Feb 24, 2021 · 6 min read

In my last article, I focused on the theory behind linear regression and how the algorithm works. In this article, I will focus on implementing it in Python, which is an excellent programming language for Data Science and Machine Learning. If you have not read my last article, I highly recommend reading it here: https://ahaanpandya.medium.com/linear-regression-explained-868914443188

Note: I will be using Pandas and NumPy, which are Python data science libraries. If you are not comfortable with NumPy and Pandas, you might want to look up a tutorial on those libraries first. Some knowledge of Matplotlib and Sklearn will also be useful for visualizing how our model performs and for processing our data.
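If you don’t already have these libraries installed, you can get all of them from your terminal with pip (scikit-learn is the package name for Sklearn):

pip install pandas numpy scikit-learn matplotlib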

Step 1: Reading the Data

In this example, I will be reading the data from a CSV file, which is usually the preferred file format for storing and reading data.

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
data_frame = pd.read_csv('kc_house_data.csv')
X = data_frame[['bedrooms', 'sqft_living']]
y = data_frame['price']
X = X.to_numpy()
y = y.to_numpy()

In this piece of code, we read the data from a CSV file about housing prices into a Pandas DataFrame and then separate the x and y values of the dataset: the x values hold two features, the number of bedrooms and the living-area square footage, and the y values hold the price. We then convert them into NumPy arrays because they are much easier to work with.

Step 2: Scaling the Data

Before performing Linear Regression, you should always scale all your x values to between 0 and 1, because it puts every feature on the same scale, eases the gradient descent calculations, and produces more reliable results.

scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)

Sklearn’s MinMaxScaler is used to scale our data between 0 and 1, so all the features, including the sqft values, now lie between 0 and 1. We didn’t scale the y values because that doesn’t ease the calculations and isn’t as beneficial.
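Under the hood, min-max scaling just shifts each column by its minimum and divides by its range. A rough NumPy equivalent, shown for illustration only (the MinMaxScaler call above is what we actually use), would be:

X_scaled_manually = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) # every column now lies between 0 and 1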

Step 3: Declaring Constants

In the last article, I discussed how the step size for Gradient Descent depends on the learning rate. We will declare the learning rate variable here, along with the number of training examples.

learning_rate = 1 # This decides our step size in Gradient Descent
m = len(X) # This is the number of training examples

I chose the learning rate to be 1 because I found it to be a pretty good choice for this dataset.

Step 4: Making the Cost Function

In my previous article, I also talked about the cost or the loss function which is used to determine the error in the model’s predictions. The preferred cost function for Linear Regression is Mean Squared Error.
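Written out, for our two-feature hypothesis hθ(x) = θ₀ + θ₁x₁ + θ₂x₂ the cost is

J(θ) = (1/2m) · Σ (hθ(x) − y)²

where the sum runs over all m training examples. This is exactly what the function below computes (the extra factor of 1/2 simply makes the derivatives cleaner).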

def cost_function(theta, X, y):
    total = 0
    for index, x_val in enumerate(X):
        prediction = theta[0] + theta[1]*x_val[0] + theta[2]*x_val[1]
        difference = prediction - y[index]
        difference_square = difference**2
        total += difference_square
    error = total / (2*m)
    return error

In this code segment, we define the cost function: it loops over all the training examples, sums their squared errors, and computes the mean. The cost function takes in 3 parameters: the x values, the actual y values, and the vector theta, which holds the parameters of our hypothesis function. If any of this sounds new to you, you might want to read my previous article again.

Step 5: Making the Derivative Functions

To minimize our cost function, we can use Gradient Descent, which is a good optimization algorithm for this purpose.

This is the Gradient Descent algorithm, Image by: https://ahaanpandya.medium.com/linear-regression-explained-868914443188
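In case the image doesn’t render, each step of the algorithm updates every parameter simultaneously using the partial derivative of the cost, where α is the learning rate:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ   (for j = 0, 1, 2, all updated at once)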

We can make this algorithm simpler if we solve for the partial derivative terms beforehand.

Gradient Descent Algorithm after evaluating the derivative terms, Image from : https://stackoverflow.com/questions/29583026/implementing-gradient-descent-algorithm-in-matlab
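Spelled out, the evaluated update rules that the code below implements are (with hθ(x) = θ₀ + θ₁x₁ + θ₂x₂, α the learning rate, and each sum running over all m training examples):

θ₀ := θ₀ − (α/m) · Σ (hθ(x) − y)
θ₁ := θ₁ − (α/m) · Σ (hθ(x) − y) · x₁
θ₂ := θ₂ − (α/m) · Σ (hθ(x) − y) · x₂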

This new form of the gradient descent algorithm is much easier to implement in code. We can make a function for each of our three parameters theta 0, theta 1, and theta 2 (three parameters because we have two features plus an intercept).

def d_theta_0(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        answer += diff
    answer = answer / m
    return answer

def d_theta_1(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        diff_2 = diff*x_value[0]
        answer += diff_2
    answer = answer / m
    return answer

def d_theta_2(t):
    answer = 0
    for index, x_value in enumerate(X):
        pred = t[0] + t[1]*x_value[0] + t[2]*x_value[1]
        diff = pred - y[index]
        diff_2 = diff*x_value[1]
        answer += diff_2
    answer = answer / m
    return answer
# The parameter t in all of these functions is the vector theta, which contains the parameters of our hypothesis function

In the above 3 functions, we converted the partial derivative terms into code which can now be used in gradient descent.
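As a side note, the three loops above can be collapsed into a few vectorized NumPy operations, which is much faster on larger datasets. Here is a minimal sketch, assuming the same global X, y, and m as above (the function name gradients is mine, not part of the code above):

def gradients(t):
    preds = t[0] + X @ t[1:] # predictions for every training example at once
    errors = preds - y
    # the three partial derivative terms, matching d_theta_0, d_theta_1, and d_theta_2
    return errors.mean(), (errors * X[:, 0]).mean(), (errors * X[:, 1]).mean()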

Step 6: Training the Model

Now it’s finally time to train our model and perform the linear regression. We will first initialize the parameter vector of our hypothesis function randomly and then improve the parameters using gradient descent. We can run as many gradient descent iterations as we want, but after a while the decrease in loss becomes very small and we might start overfitting the data. In machine learning, the number of iterations is called the number of epochs.

epochs = 150
loss_history = [] # This is for tracking the loss at each epoch so we can plot the loss later on
parameters = np.random.rand(3) # This creates a vector of 3 random parameters
for i in range(epochs):
    p = parameters.copy() # We make a copy of the parameters so that we can update each parameter simultaneously
    parameters[0] -= learning_rate * d_theta_0(p)
    parameters[1] -= learning_rate * d_theta_1(p)
    parameters[2] -= learning_rate * d_theta_2(p)
    loss = cost_function(parameters, X, y)
    loss_history.append(loss)

This will train the model for 150 epochs and it will improve our parameters. We can plot the loss to make sure the loss is going down.

plt.plot(range(1, 151), loss_history)
plt.show()

I have already run this code, so you should see a graph roughly like this:

My graph of the loss history over 150 epochs

From this graph, we can see that the loss drops rapidly over the 150 epochs. Our model can now predict housing prices with an error of about $100k, which is reasonable because we are not using a very big dataset and we are only using 2 of its features rather than all of them. The houses also cost around a million dollars, so a larger absolute error is expected. If we used the full dataset and all of its features, we could surely reduce the loss even more. We can now save the parameters to a file so that we don’t have to retrain the model every time, because training takes some time.

with open('parameters.txt', 'w') as f:
    f.write(str(parameters[0]) + '\n' + str(parameters[1]) + '\n' + str(parameters[2]))

Now our parameters are stored in a file, so whenever we need them we can simply read the file instead of retraining the model.
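For example, one simple way to load them back later (assuming the file was written by the snippet above) is:

with open('parameters.txt') as f:
    parameters = np.array([float(line) for line in f.read().splitlines()])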

Step 7: Predicting Using the Model

We can now finally predict prices using our trained parameters!

def predict_price(sqft, no_of_bedrooms):
    # The model was trained on scaled features, so scale the input the same way before predicting
    scaled = scaler.transform([[no_of_bedrooms, sqft]])[0]
    price = parameters[0] + parameters[1]*scaled[0] + parameters[2]*scaled[1]
    print(price)

We can now use this function to predict the price of any new house based on the number of bedrooms and the sqft.
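For example, pricing a hypothetical 2,000 sqft house with 3 bedrooms looks like this:

predict_price(2000, 3) # prints the model's estimated price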

Conclusion

Linear Regression is a very useful algorithm for predicting continuous values, and this Python implementation shows you exactly how it works. If you move on to larger datasets or more complex models, I would recommend running your code on a machine with a GPU or on Google Colab, which provides a free GPU, because machine learning can be very computationally expensive and GPUs are well suited to the kinds of operations it requires. Try this out, and try using another dataset to practice linear regression.

Thanks for Reading!
