Geometric Interpretation of Linear Regression

Understand how the cost function of linear regression is derived using geometric interpretation

Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). In geometric terms, the linear regression algorithm tries to find the line or plane that best fits the data points. Linear regression is a regression technique that predicts a real-valued output.

What does the phrase “finding the plane that best fits the data points” mean?

Image 1: Representation of a sample 2-dimension dataset

For the above sample 2-dimension dataset (Image 1), the general equation of the line that covers as many points as possible is y = m*x + c, where m is the slope of the line and c is the intercept term. The linear regression algorithm tries to find the line/plane for which the cost function is minimized. Later in this article, you will see how this cost function is derived.

We will represent the above equation of the line as y = w1*x + w0
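As a quick sketch of what “finding the best line” means, a line y = w1*x + w0 can be fitted to a handful of points with NumPy; the data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical sample data: x values and noisy y values lying roughly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit a line y = w1*x + w0; np.polyfit returns coefficients highest power first
w1, w0 = np.polyfit(x, y, deg=1)

# Predicted values on the fitted line
y_pred = w1 * x + w0
```

The fitted slope and intercept come out close to the values (2 and 1) used to generate the points.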

Image 2: Representation of sample 3-dimension dataset

Similarly, for the sample 3-dimension dataset (Image 2), the equation of the plane that fits as many points as possible is y = w1*x1 + w2*x2 + w0.

The same equation can be extended to a d-dimension dataset:

y = w1*x1 + w2*x2 + … + wd*xd + w0 = W^T*x + w0

So, we need to find the plane (W, w_0) of the above equation that fits most of the data points.
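In code, the d-dimensional plane equation is just a dot product between the weight vector W and a point x, plus the intercept w0. The weights and the point below are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical weights and one point from a d = 3 dimensional dataset
W = np.array([0.5, -1.0, 2.0])   # weight vector (w1, w2, w3)
w0 = 0.25                        # intercept term
x = np.array([2.0, 1.0, 3.0])   # one data point

# y = w1*x1 + w2*x2 + w3*x3 + w0, i.e. a dot product plus the intercept
y = np.dot(W, x) + w0
```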

Deep Dive into the Derivation of the Geometric Interpretation of the Algorithm:

Image 3

For any point P (in Image 3), y_iAct is the actual output value of the point, whereas y_iPred is the predicted value. Hence the error for that point can be calculated as:

error_i = y_iAct − y_iPred

Since the error can be positive or negative (y_iPred can lie above or below the plane/line), we square the error for each x_i so that positive and negative errors do not cancel out.

Source: Google Plots, Plot for y=x²

The squared-error function follows a parabolic curve, which means the squared error (Y-axis) will always be non-negative.

We need to minimize the error over all of the points, so the cost function of linear regression is:

minimize over (W, w_0): sum over i of (y_iAct − y_iPred)²

The cost function says that we need to find a plane (W, w_0) such that the total squared error over all of the points is minimized. Replacing y_iPred with the equation of the plane, the cost function becomes:

minimize over (W, w_0): sum over i of (y_iAct − (W^T*x_i + w_0))²

Use an optimizer to compute the optimal values of W and w_0 that minimize the above cost function. A gradient descent optimizer can be used to find the plane (W, w_0) for which the squared error is minimal.
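A minimal gradient descent sketch for this cost function might look as follows; the learning rate, iteration count, and test data are assumptions chosen for illustration, not tuned values:

```python
import numpy as np

def gradient_descent(X, y, lr=0.05, n_iters=5000):
    """Minimize the sum of squared errors over (W, w0) by gradient descent.

    X has shape (n, d) and y has shape (n,). Returns the learned (W, w0).
    """
    n, d = X.shape
    W = np.zeros(d)   # start from the zero weight vector
    w0 = 0.0          # and a zero intercept
    for _ in range(n_iters):
        y_pred = X @ W + w0
        err = y_pred - y                    # residual for each point
        W -= lr * (2.0 / n) * (X.T @ err)   # gradient of mean squared error w.r.t. W
        w0 -= lr * (2.0 / n) * err.sum()    # gradient w.r.t. the intercept w0
    return W, w0
```

On data generated exactly from a plane, this recovers the underlying weights to within a small tolerance.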

Prediction for a Query Point:

Image 5

For a query point ‘Qx’ (Image 5), the corresponding predicted value Ypred can be computed from the equation of the plane (W, w0) given above.
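Prediction for a query point is the same dot-product-plus-intercept computation as before, now applied to a new point. The learned weights below are hypothetical placeholders standing in for the output of the optimizer:

```python
import numpy as np

# Hypothetical learned plane (W, w0), e.g. as returned by gradient descent
W = np.array([1.5, -0.5])
w0 = 2.0

def predict(x_query, W, w0):
    """Predicted value for a query point using the learned plane (W, w0)."""
    return float(np.dot(W, x_query) + w0)

# Prediction for the query point (2, 4): 1.5*2 - 0.5*4 + 2
y_query = predict(np.array([2.0, 4.0]), W, w0)
```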

Thank You for Reading!