Original article was published on Artificial Intelligence on Medium
Geometric Interpretation of Linear Regression
Understand how the cost function of linear regression is derived using geometric interpretation
Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). In geometric terms, the linear regression algorithm tries to find the line or plane that fits the data points as closely as possible. Linear regression is a regression technique, so it predicts real-valued outputs.
What does the term “finding the plane that best fits the data points” mean?
For the above sample 2-dimensional dataset (Image 1), the general equation of a line that passes close to as many points as possible is y = m*x + c, where m is the slope of the line and c is the intercept term. The linear regression algorithm tries to find the line/plane for which a cost function is minimized. Later in this article, you will see how this cost function is derived.
We will represent the above equation of the line as y = w1*x + w0.
Similarly, for a sample 3-dimensional dataset (Image 3), the equation of the plane that best fits as many points as possible is y = w1*x1 + w2*x2 + w0.
The same equation can be extended to a d-dimensional dataset:

y = w1*x1 + w2*x2 + … + wd*xd + w0 = W^T*x + w0

where W = (w1, w2, …, wd) is the weight vector and w0 is the intercept term.
So, we need to find the plane (W, w0) in the above equation that best fits the data points.
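The d-dimensional plane equation above can be sketched as a small function. This is a minimal illustration; the names `predict`, `W`, and `w0` are chosen here, not taken from the article.

```python
# A minimal sketch of the d-dimensional hyperplane y = W^T*x + w0.
def predict(W, w0, x):
    """Predicted value for a point x, given weight vector W and intercept w0."""
    return sum(wj * xj for wj, xj in zip(W, x)) + w0

# Example: a 2-feature plane y = 2*x1 + 3*x2 + 1 evaluated at x = (1, 1)
print(predict([2, 3], 1, [1, 1]))  # → 6
```

For d = 1 this reduces to the familiar line y = w1*x + w0.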
Deep Dive into the Derivation of the Geometric Interpretation of the Algorithm:
For any point P (in Image 3), y_iAct is the actual output value of the point, whereas y_iPred is the predicted value. Hence the error for that point can be calculated as:

error_i = y_iAct − y_iPred
Since y_iPred can lie above or below the plane/line, the error can be positive or negative. To keep every error positive, we square the error for each x_i:

error_i² = (y_iAct − y_iPred)²
The squared-error function follows a parabolic curve, which means the error (Y-axis) is always non-negative.
We need to minimize the error over the whole set of points, so the cost function of linear regression is:

(W*, w0*) = argmin over (W, w0) of Σ (for i = 1 to n) (y_iAct − y_iPred)²
The cost function says that we need to find the plane (W, w0) for which the total error over all points is minimized. Replacing y_iPred with the equation of the plane, the cost function becomes:

(W*, w0*) = argmin over (W, w0) of Σ (for i = 1 to n) (y_iAct − (W^T*x_i + w0))²
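The cost function above can be computed directly by summing the squared residuals. A minimal sketch, assuming a list-of-lists dataset `X` and targets `y` (the function name `squared_error_cost` is illustrative):

```python
def squared_error_cost(W, w0, X, y):
    """Sum of squared errors Σ (y_iAct - (W^T*x_i + w0))^2 over all points."""
    total = 0.0
    for x_i, y_act in zip(X, y):
        y_pred = sum(wj * xj for wj, xj in zip(W, x_i)) + w0
        total += (y_act - y_pred) ** 2
    return total

# Points lying exactly on the line y = 2*x + 1 give zero cost:
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]
print(squared_error_cost([2.0], 1.0, X, y))  # → 0.0
```

Any other (W, w0) yields a strictly larger cost on this data, which is exactly what the argmin in the cost function picks out.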
Use an optimizer to compute the optimal values of (W, w0) that minimize the above cost function. A gradient descent optimizer can be used to find the plane (W, w0) for which the squared error is minimal.
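As a sketch of that optimization step, here is plain batch gradient descent on the squared-error cost. The learning rate, step count, and function name are illustrative choices, not prescribed by the article:

```python
def gradient_descent_fit(X, y, lr=0.01, steps=5000):
    """Fit (W, w0) by gradient descent on the mean squared-error cost."""
    d, n = len(X[0]), len(X)
    W, w0 = [0.0] * d, 0.0
    for _ in range(steps):
        grad_W, grad_w0 = [0.0] * d, 0.0
        for x_i, y_act in zip(X, y):
            y_pred = sum(wj * xj for wj, xj in zip(W, x_i)) + w0
            err = y_pred - y_act
            for j in range(d):
                grad_W[j] += 2 * err * x_i[j]  # ∂cost/∂w_j for this point
            grad_w0 += 2 * err                 # ∂cost/∂w0 for this point
        # Step downhill along the averaged gradient
        W = [wj - lr * g / n for wj, g in zip(W, grad_W)]
        w0 -= lr * grad_w0 / n
    return W, w0

# Data generated from y = 2*x + 1; the fit should recover it closely.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
W, w0 = gradient_descent_fit(X, y)
print(round(W[0], 2), round(w0, 2))  # ≈ 2.0 1.0
```

In practice a library implementation (e.g. scikit-learn's `LinearRegression`, or a closed-form least-squares solve) would be used instead of hand-rolled gradient descent.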
Prediction for a Query Point:
For a query point ‘Qx’ (Image 5), the corresponding predicted value y_pred can be computed by plugging Qx into the equation of the plane (W, w0) given above.
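Concretely, the prediction step is just one evaluation of the plane equation. The weights below are hypothetical values standing in for a previously fitted (W, w0):

```python
# Predicting a query point Qx with learned weights (W, w0).
# W and w0 are assumed to come from a prior fit; the values here are illustrative.
W, w0 = [2.0], 1.0   # hypothetical fitted line y = 2*x + 1
Qx = [4.0]

y_pred = sum(wj * xj for wj, xj in zip(W, Qx)) + w0
print(y_pred)  # → 9.0
```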
Thank You for Reading!