Linear Regression is one of the most fundamental algorithms in Machine Learning. It involves fitting a linear function to the data, which can then be used to predict a continuous value such as a house price or a stock price. A continuous value is one that can take any real number. Regression involves predicting continuous real values, whereas classification involves predicting classes. This article covers only regression.
Note: This article uses university-level mathematics, so a strong grasp of multivariable calculus (especially partial derivatives) and linear algebra is recommended.
This article will teach you how single variable and multivariable Linear Regression work. Let’s first look at a classic Linear Regression problem.
Predicting House Prices from Their Features
This is a classic Linear Regression problem: predicting the price of a house from features such as its area and number of bedrooms. We start with single variable linear regression, so we will use only one feature to predict the price, which in our example is the number of bedrooms. Let’s go over some machine learning terminology.
The computer will try to find the linear function that best fits the data. This linear function is called the hypothesis.
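In the notation commonly used for this setup, the hypothesis can be written as below. The symbols are assumptions made for illustration: x stands for the number of bedrooms, h for the predicted price, and theta 0 and theta 1 for the intercept and the slope of the line.

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```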
The computer adjusts the two parameters of the hypothesis, theta 0 (the intercept) and theta 1 (the slope), so that the error is minimized. The question is, how do you measure the error, and how do you minimize it? This is where the cost function comes in.
The cost function, sometimes also called the loss function, measures how well the hypothesis fits the data. The most common cost function for linear regression is the mean squared error: the average of the squared differences between the predicted and the actual values.
The cost function makes the job much easier because the task is now reduced to minimizing a single function. One of the most commonly used optimization algorithms for this is Gradient Descent.
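The mean squared error for the single variable case could be sketched as follows. This is a minimal illustration, assuming the hypothesis has the form theta 0 + theta 1 * x and that the squared errors are averaged over the m training examples (some texts divide by 2m instead, which only rescales the cost). The function names and the data are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis: predicted price for a house with x bedrooms."""
    return theta0 + theta1 * x


def mean_squared_error(theta0, theta1, xs, ys):
    """Average of the squared differences between predictions and targets."""
    m = len(xs)
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2
                      for x, y in zip(xs, ys))
    return sum(squared_errors) / m


# Made-up data, purely illustrative: bedrooms and prices (in thousands).
bedrooms = [1, 2, 3, 4]
prices = [100, 150, 200, 250]

# A line that passes through every point has zero cost.
perfect_fit = mean_squared_error(50, 50, bedrooms, prices)
# A line that is off by 50 on every point has a cost of 50 squared.
poor_fit = mean_squared_error(0, 50, bedrooms, prices)
```

A lower cost means the line lies closer to the data, so comparing perfect_fit with poor_fit shows how the cost function ranks candidate parameters.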
Gradient Descent is like walking down a hill until you reach a local minimum. If you have a strong grasp of multivariable calculus, you will know that the gradient points in the direction of steepest ascent, so the negative of the gradient points in the direction of steepest descent. You first initialize theta 0 and theta 1 randomly, then repeatedly take steps in the direction of the negative gradient. The size of each step depends on the learning rate you choose. If the learning rate is too large, you may overshoot the minimum or even diverge; if it is too small, you will reach the minimum, but training will take a long time. You will need to experiment with different learning rates to find one that works well for your task. For the mean squared error of a linear hypothesis, the cost function is convex, so any local minimum that gradient descent finds is also the global minimum.
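The procedure described above could be sketched like this for the single variable case. It is a minimal illustration, not a definitive implementation: it assumes the mean squared error (averaged over m examples) as the cost, starts from zeros instead of random values so that the result is reproducible, and uses a hypothetical function name, learning rate, step count, and data set.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit theta0 + theta1 * x to the data by batch gradient descent."""
    m = len(xs)
    # Assumption: start from zeros rather than random values (reproducible).
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        # Prediction error on every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad0 = (2 / m) * sum(errors)
        grad1 = (2 / m) * sum(e * x for e, x in zip(errors, xs))
        # Step in the direction of the negative gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Made-up data generated from price = 50 + 50 * bedrooms (in thousands).
bedrooms = [1, 2, 3, 4]
prices = [100, 150, 200, 250]
theta0, theta1 = gradient_descent(bedrooms, prices)
```

On this data both parameters should approach 50. Raising the learning rate well above the value used here makes the updates overshoot and diverge, while a much smaller value needs many more steps to get close.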
Multivariable Linear Regression
Now we will look at multivariable linear regression. Here we have multiple features to consider, unlike the previous example where we considered only the number of bedrooms. For example, we might also consider the area of the lawn, the area of the house, and so on. Multivariable linear regression is very similar to its single variable counterpart: the main idea stays the same, with only a few changes. There are some changes in the terminology.
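In the notation commonly used for the multivariable case, the hypothesis and the mean squared error cost could be written as below. The symbols are assumptions made for illustration: n is the number of features, m is the number of training examples, x_0 is fixed to 1 so that theta 0 acts as the intercept, and the cost is averaged over m (some texts divide by 2m instead).

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x,
\qquad x_0 = 1

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2
```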
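Now we can apply gradient descent to the cost function. Gradient Descent stays nearly the same, with only a slight change: the update is applied to every parameter, one per feature plus the intercept, rather than to just two.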
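Gradient descent with several features could be sketched as follows. This is a minimal illustration under stated assumptions: the cost is the mean squared error averaged over m examples, the parameters start from zeros for reproducibility, and the function names, learning rate, step count, and data are hypothetical.

```python
def predict(theta, features):
    """theta[0] is the intercept; theta[1:] pair up with the features."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))


def gradient_descent_multi(rows, ys, learning_rate=0.05, steps=20000):
    """Fit one parameter per feature, plus an intercept, by gradient descent."""
    m = len(rows)      # number of training examples
    n = len(rows[0])   # number of features
    theta = [0.0] * (n + 1)
    for _ in range(steps):
        errors = [predict(theta, row) - y for row, y in zip(rows, ys)]
        # Partial derivative for the intercept, then one per feature.
        grads = [(2 / m) * sum(errors)]
        for j in range(n):
            grads.append((2 / m) * sum(e * row[j] for e, row in zip(errors, rows)))
        # Update all parameters simultaneously.
        theta = [t - learning_rate * g for t, g in zip(theta, grads)]
    return theta


# Made-up data: [bedrooms, house area in arbitrary units] per house, with
# prices generated from 10 + 20 * bedrooms + 30 * area.
houses = [[1, 1], [2, 1], [3, 2], [4, 3]]
prices = [60, 80, 130, 180]
theta = gradient_descent_multi(houses, prices)
```

On this data the parameters should approach 10, 20, and 30. With features on very different scales (for example, area in square feet next to a bedroom count), a single learning rate tends to work poorly, so rescaling the features first is a common choice.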
These are all the differences between single variable and multivariable linear regression.
Once the cost function is minimized, the model has found the best-fitting linear function for our data and can predict new values by simply evaluating that function. This algorithm works well when the data can be modelled by a line; when it cannot, we can use another regression algorithm such as locally weighted regression.