Understanding Linear Regression and The Need For Gradient Descent

Source: Deep Learning on Medium


Go to the profile of Judy T Raj

Here, I’ll be implementing Univariate Linear Regression from the scratch in python to better understand the algorithm and the need for gradient descent. Here is a quick overview of the linear regression algorithm, before we do that.

Linear Regression : Overview

Let’s say we have the height and weight of all our friends. Assuming that a person’s weight is just a function of their height and no other factor, how do we predict the weight of someone new if we had their height? This is where we use univariate linear regression. When the value to be predicted, or the target variable is continuous, we employ linear regression.Linear regression is a machine learning algorithm which tries to fit a given set of points onto a straight line, y = ax + b.

In our example, y, the dependent variable, would be the height and x, the independent variable, would be the weight and the job of the linear regression algorithm is to find the values for a and b such that ax + b would draw a straight line that fits all the (weight, height) coordinates i.e., give the corresponding weight. Now lets talk about how it does that.

The Algorithm

The linear regression algorithm finds the right values for a and b by minimising the mean squared error (MSE) of the distances between the predicted points and the actual points.

This is essentially an optimisation problem where the values of a and b needs to be decided and this is decided by finding the minimal value for some loss function. The algorithm starts off with some initial values for a and b and computes y. It then finds the distance between the actual point (y_actual) and the point given by the predicted y value (y_pred). The algorithm computes the sum of the squares of the distances and the mean of this sum. This mean value or the MSE is the loss function that needs to be minimised by the optimisation problem.

Image result for linear regression mse
Finding the distance between the predicted points and the actual points

Python Implementation

Now that we know how the algorithm works, lets try to implement this in python.

Lets import some libraries that we are gonna need, matplotlib to plot our line and numpy to work with our data.

import matplotlib.pyplot as plt
import numpy as np

Now lets make up some values for our x and y,

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

Both x and y has a shape of (10,) and looks like the figure below when plotted,

plt.scatter(x, y)

Lets initialise our a and b and define the equation for computing y_pred and mse.

a = b = 0
y_ = a * x + b
mse = np.mean((y — y_)**2)
print(mse)
>>54.1

The initial value of mse is 54.1 for the given values of x and y. Our objective now is to bring it down as close to zero as possible.

Lets pick a direction to traverse and update the values of a and b until we get a zero for our mse. I’m gonna increment both a and b by the random value, 0.01 in each step and see where it gets us.

while mse > 0:

a = a+0.01
b = b+0.01
y_ = a * x + b

if np.mean((y — y_)**2) < mse :
mse = np.mean((y — y_)**2)

else :
print (a,b,mse)
break
>>>1.1900000000000008 1.1900000000000008 0.5634000000000003

Our mse has come down to the value of 0.563 when our a and b are both ar 1.19.

Lets plot the resulting line and see how it fits with our actual points.

y_pred = a * x + b
plt.figure(figsize = (12, 10))
plt.plot(x, y_pred, label = ‘Predicted’)
plt.scatter(x, y, label = ‘Actual’)
plt.show()

Our line seems to fit the points but not all that well. Lets compare this result to the results produced by a full-fledged linear regression model, like the sklearn one.

x = x.reshape(-1,1)
y = y.reshape(-1,1)
from sklearn.linear_model import LinearRegression
reg_model = LinearRegression().fit(x, y)
y_pred = reg_model.predict(x)
plt.figure(figsize = (12, 10))
plt.plot(x, y_pred, label = ‘Predicted’)
plt.scatter(x, y, label = ‘Actual’)
plt.show()

The line generated by the sklearn regressor seems to fit the points perfectly. Lets see what their values of a and b are.

reg_model.intercept_
>> array([1.23636364])
reg_model.coef_
>> array([[1.16969697]])

Close but not the same. Clearly, they took a different route to minimize their mse and ended up with different values.

Lets try again for a different set of x and y.

x = np.linspace(-1, 1, 101)
y = 4 * x + np.random.randn(101) * 0.44
print (x.shape , y.shape)
>> (101,) (101,)

Our initial mse now is 5.7978788117428826. Lets run the same code again and see what values of a and b we end up with, in our quest to get this 5.7 to 0.

while mse > 0:

a = a+0.01
b = b+0.01
y_ = a * x + b

if np.mean((y — y_)**2) < mse :
mse = np.mean((y — y_)**2)

else :
print (a,b,mse)
break
>>>1.0300000000000007 1.0300000000000007 4.4175244648874

Clearly, the route we picked was a dead end. Our mse gets stuck at 4.4, a long way from 0 when we tried incrementing both a and b by 0.01. No matter, lets try a different route.

while mse > 0:

a=a+0.1
b=b-0.01
y_ = a * x + b

if np.mean((y — y_)**2) < mse :
mse = np.mean((y — y_)**2)

else :
print (a,b,mse)
break
>> 4.000000000000002 -0.4000000000000002 0.34416766289398254

That seems better. Lets try and plot this and see how well the line fits.

Again, close but could be better. It is obvious that what needs changing is the way we traverse up and down the number line looking for the right values of a and b. But how do we know whether to decrement or increment and if so, by what value? There are a million different paths we could take and there’s no one path for every dataset.

The answer to this question is Gradient Descent. We minimise the loss by travelling iteratively in the direction of the steepest descent of negative gradient of the loss.

A model hyperparameter called the learning rate determines the step size. learning rate x gradient gives the value by which we update a and b. We’ll look deeper into the gradient descent algorithm in a later post.