Cruising through the basics of Polynomial Regression


Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x). I assume the reader has a clear understanding of linear regression. I won't dive deep into the mathematics; instead, I will try to give you hands-on experience with polynomial regression in Python.

Let us take a basic data set to explain the concept.
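Here is a minimal sketch of the setup. The file name data.csv and the one-feature/one-target column layout are assumptions, so substitute your own dataset:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset: one feature column followed by one target column
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, :-1].values   # independent variable as a 2-D array
y = dataset.iloc[:, -1].values    # dependent variable
```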

Once the data has been imported along with the requisite libraries, we will have a look at the graph to understand the relationship better.
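Continuing with the variables defined above, a quick scatter plot might look like this:

```python
plt.scatter(X, y, color='red')   # each observation as a red dot
plt.title('Raw data')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```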

Woah! Let us now try to fit a linear regression model and then visualize it over the scatter plot. Care should be taken when entering the arguments to plt.plot(): the second argument is the vector containing the y coordinates of our prediction points.
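A sketch of the fit and the plot, using scikit-learn's LinearRegression and the X and y from above:

```python
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)

plt.scatter(X, y, color='red')                  # actual data points
plt.plot(X, lin_reg.predict(X), color='blue')   # second argument: predicted y values
plt.title('Linear Regression fit')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```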

In the above figure, the data points are represented by red dots and the predictions by the blue line. As can be seen, the predictions are poor: many red points lie far from the blue line.

Let us solve this problem by creating a non-linear model so that the red points lie closer to the predictions (the blue line). To build this model, we use a class that gives us the tools to add polynomial terms to our regression equation: PolynomialFeatures, found in scikit-learn's preprocessing module. Our next step is to create an object of this class, in this case poly_reg. It will let us transform our original set of features into a new set of features consisting of the polynomial terms. The degree of the polynomial is passed as an argument to the PolynomialFeatures class.
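A sketch, starting with degree 2 (this choice is ours to tune):

```python
from sklearn.preprocessing import PolynomialFeatures

# degree=2 adds a squared term for our single feature
poly_reg = PolynomialFeatures(degree=2)
```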

Once the object poly_reg is created, we use it to fit on the original data points (X) and transform them into a new set of features (X_poly) consisting of the additional polynomial terms (in this case, only one extra term). It can be seen that in addition to the polynomial term, a column of ones has been added automatically. That's the bias (intercept) term!
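Continuing from above:

```python
X_poly = poly_reg.fit_transform(X)
# For degree=2 and a single feature, each row of X_poly is [1, x, x**2]:
# the leading column of ones is the bias term
print(X_poly[:5])
```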

After the new matrix of features is created, we fit a second linear regression model on the transformed features.
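For example:

```python
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly, y)   # linear model over the polynomial features
```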

My main objective here is to make you practice the code, not copy-paste it.

So our regression model is almost ready. Let us write the code to visualize its predictions. We have to make some changes to the plotting code we wrote above for linear regression. I hope you got it right! Just changing lin_reg to lin_reg2 won't help, because lin_reg2 is still an object of the LinearRegression class; we need to pass the transformed set of features as its argument. Let us run the code to see the results!
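Something like this, continuing with the objects defined above:

```python
plt.scatter(X, y, color='red')
# Predict on the *transformed* features, not on X directly
plt.plot(X, lin_reg2.predict(poly_reg.fit_transform(X)), color='blue')
plt.title('Polynomial Regression fit')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```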

This shows that the blue curve approximates the data much better than the blue line from linear regression did. It can be made even better by increasing the degree of the polynomial. Try increasing it to 3 (a quick sketch follows)! You will see that the predictions get better in this case; try degree 4 as well. Beware, though: higher degrees may cause overfitting.
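A quick re-fit sketch, reusing the objects from above:

```python
poly_reg = PolynomialFeatures(degree=3)   # try degree=4 as well
X_poly = poly_reg.fit_transform(X)
lin_reg2.fit(X_poly, y)
```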

In case you want to make the curve look smoother and more continuous, we need to make slight changes to the code. We use NumPy's arange function to build a dense grid of x values spanning the lower and upper bounds of the data in small increments. This gives us a vector, but we need a matrix, so we use NumPy's reshape function to get the X_grid matrix. Okay, let us execute the code to get the smooth curve.
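A sketch, with 0.1 as an assumed step size:

```python
# Dense grid of x values covering the data range in steps of 0.1
X_grid = np.arange(X.min(), X.max() + 0.1, 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))   # arange gives a vector; reshape to a matrix

plt.scatter(X, y, color='red')
plt.plot(X_grid, lin_reg2.predict(poly_reg.fit_transform(X_grid)), color='blue')
plt.title('Polynomial Regression fit (smoothed)')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```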

Smoothed Curve

We can clearly see that the data points lie closer to the blue curve, and the model performs better than the previous one.

I will conclude by making predictions with both the linear and the polynomial regression models.
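A sketch; the query value 6.5 is just an illustrative choice, so substitute any value in your data range:

```python
print(lin_reg.predict([[6.5]]))                           # linear model
print(lin_reg2.predict(poly_reg.fit_transform([[6.5]])))  # polynomial model
```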

Predicted Values