Simple Linear Regression Model



Artificial intelligence has been an ever-growing field in the world as well as in Africa. Linear regression is the "hello world" of the domain: a simple algorithm that can solve many commonly encountered problems, such as:

  • predicting whether a tumor is malignant or benign, estimating prices in real estate, forecasting the weather, and so on. Let's dive in :)

First, we need to know what regression is. Regression is simply modeling an output (the target) as a function of independent inputs called predictors. In the case of simple linear regression, we have only one independent variable (x) and one dependent variable (y).

We speak of linear regression when a straight line best fits the distribution of our data, as shown below.

[Figure: linear model]

The general equation of a simple linear model is:

Y = aX + b

The objective is to find the values of a and b that best fit our data; these variables are the parameters of our model. Before diving into the algorithm, let's look at some basic concepts:

Loss function:

Our linear model predicts a target known as y_pred. The loss function is the squared difference between the predicted value y_pred and the true value y_truth of a training example. Mathematically, we have:

L = (y_pred − y_truth)²

with y_pred = aX + b.
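As a quick sketch of these two definitions in Python (the names predict and squared_loss are my own, following the notation above):

```python
def predict(x, a, b):
    """Linear model: y_pred = a*x + b."""
    return a * x + b

def squared_loss(y_pred, y_truth):
    """Loss for a single training example."""
    return (y_pred - y_truth) ** 2

# With a=2, b=1 the model predicts 7 for x=3;
# if the true value is 8, the loss is (7 - 8)**2 = 1.
print(squared_loss(predict(3, a=2, b=1), y_truth=8))  # 1
```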

Cost function:

For the entire training set of N examples, we calculate the cost function as:

J(a, b) = (1/N) * Σ_i (y_pred_i − y_truth_i)²

also known as the mean squared error (MSE) function. In order to get the line that best fits our data, we turn the cost function into a minimization problem:
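And the corresponding cost over a whole training set, again as a minimal sketch using plain Python lists (NumPy would do the same job):

```python
def mse_cost(xs, ys, a, b):
    """Cost J(a, b): mean squared error over the whole training set."""
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# Points lying exactly on y = 2x + 1 give zero cost for a=2, b=1.
print(mse_cost([0, 1, 2], [1, 3, 5], a=2, b=1))  # 0.0
```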

determine the values of a and b for which J(a, b) is a minimum. Using the MSE, we are going to update the values of a and b. Let's look at how to update these parameters:

Gradient descent:

The main idea is to pick random initial values for the parameters a and b and then update them in such a way as to reduce the cost function toward its minimum. As an analogy for gradient descent, consider a man moving down a slope trying to reach the bottom. If the man takes small steps, it is obvious that he will get there; it will just take much more time. If, on the other hand, he takes very large steps, he will probably overshoot the bottom. So it is important to choose an appropriate step size. This step size is called the learning rate, and it defines how fast the algorithm converges to the minimum, as the small sketch below illustrates.

[Figure: gradient descent curve]
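To make the step-size trade-off concrete, here is a toy illustration of my own (minimizing f(x) = x², not part of the original example):

```python
def step(x, lr):
    """One gradient descent step on f(x) = x**2 (gradient is 2*x)."""
    return x - lr * 2 * x

for lr in (0.1, 1.1):  # careful step vs. step that is too large
    x = 5.0
    for _ in range(10):
        x = step(x, lr)
    print(f"lr={lr}: x after 10 steps = {x:.3f}")
# lr=0.1 walks steadily toward the minimum at 0;
# lr=1.1 overshoots further on each step and diverges.
```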

The question now is: how do we calculate these updates?

Short and simple 🙂 we simply subtract the derivative of the cost function with respect to each parameter from that parameter's current value. Mathematically, the gradients are:

∂J/∂a = (2/N) * Σ_i (y_pred_i − y_truth_i) * x_i

∂J/∂b = (2/N) * Σ_i (y_pred_i − y_truth_i)

We can therefore update the variables as follows:

a = a − α * ∂J/∂a

b = b − α * ∂J/∂b

The partial derivatives are the gradients, and they are used to update a and b. Alpha (α) is the learning rate, a hyper-parameter that you must specify. Alpha therefore has to be tuned so that the cost function can reach its minimum: not too small, not too large. That's it for the theory; let's go directly into practice.
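Putting the pieces together, here is a minimal sketch of what that practice might look like (the toy data, alpha, and epoch count are my own illustrative choices):

```python
def fit_linear(xs, ys, alpha=0.01, epochs=1000):
    """Fit y = a*x + b by gradient descent on the MSE cost."""
    a, b = 0.0, 0.0  # initial parameter values
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current a and b.
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        # Gradients of J(a, b) with respect to a and b.
        grad_a = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Update step: move against the gradient.
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Toy data generated from y = 3x + 2; the fit should recover a ≈ 3, b ≈ 2.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = fit_linear(xs, ys, alpha=0.05, epochs=2000)
print(f"a = {a:.3f}, b = {b:.3f}")
```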