Source: Deep Learning on Medium
Polynomial Regression in R — when to use and how to use
(Sometimes tweaking a linear model with a few extra parameters makes it more accurate)
Let’s say we want to perform regression analysis on a non-linear dataset. For example, we can take a look at the employee level vs. salary dataset available here. You are free to use your own dataset for the problem; the only requirement is that the relationship in the data be non-linear. If it is linear, simple linear regression will work just fine. If you are interested in linear regression as well, check out my linear regression for busy data scientists tutorial here. The data looks like the following:
Let’s first fit a straight line to our data using simple linear regression and plot it. The tutorial for performing simple linear regression is here.
It is now clear that a straight line does not fit our data well, so we need some kind of non-linear model.
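One way to see the poor fit without a plot is to look at the residuals of the straight-line model. The snippet below is a minimal sketch on made-up data: the `toy` data frame and its cubic `Salary` values are assumptions for illustration, not the dataset used in this article.

```r
# Synthetic stand-in for the Level/Salary data (made-up values, illustration only)
toy <- data.frame(Level = 1:10)
toy$Salary <- 1000 * toy$Level^3   # deliberately curved relationship

# Straight-line fit
linear_model <- lm(Salary ~ Level, data = toy)

# Residual = observed - fitted. For curved data the residuals are not random:
# they change sign in a systematic pattern along Level.
res <- residuals(linear_model)
print(round(res))
```

On curved data like this, the line underestimates both ends and overestimates the middle, which is the usual hint that a non-linear term is missing.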
So now let’s create additional features for polynomial regression. Note that Level1, Level2 and Level3 hold the second, third and fourth powers of Level.
> data$Level1 = data$Level^2
> data$Level2 = data$Level^3
> data$Level3 = data$Level^4
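As an aside, the same powers can be written directly in the model formula with `I()`, which avoids adding columns to the data frame. The sketch below checks that both approaches give the same fitted values; the `toy` data is made up for illustration and is not the article's dataset.

```r
# Synthetic stand-in data (made-up values, illustration only)
toy <- data.frame(Level = 1:10)
toy$Salary <- round(40000 * 1.4^toy$Level)

# Approach used in this article: explicit power columns
toy$Level1 <- toy$Level^2
toy$Level2 <- toy$Level^3
toy$Level3 <- toy$Level^4
manual_model <- lm(Salary ~ Level + Level1 + Level2 + Level3, data = toy)

# Alternative: powers written inside the formula, protected by I()
inline_model <- lm(Salary ~ Level + I(Level^2) + I(Level^3) + I(Level^4),
                   data = toy)

# Both models use the same design matrix, so the fitted values should agree
same_fit <- isTRUE(all.equal(unname(fitted(manual_model)),
                             unname(fitted(inline_model))))
print(same_fit)
```

With the formula version, new data passed to `predict()` only needs a `Level` column.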
Now let’s fit the model with the new features and plot it.
> my_model = lm(Salary ~ Level + Level1 + Level2 + Level3, data = data)
> library(ggplot2)
> ggplot() +
+   geom_point(aes(x = data$Level, y = data$Salary),
+              colour = 'red') +
+   geom_line(aes(x = data$Level, y = predict(my_model, newdata = data)),
+             colour = 'blue') +
+   ggtitle('Truth or Bluff (Polynomial Regression)') +
+   xlab('Level') +
+   ylab('Salary')
And it gives us the following plot.
You can see the fit is much better now. So let’s make some predictions.
> predict(my_model, data.frame(Level = 6.5,
+                              Level1 = 6.5^2,
+                              Level2 = 6.5^3,
+                              Level3 = 6.5^4))
Result: 158862.5
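It may help to see that this prediction is nothing more than the fitted polynomial evaluated at Level = 6.5. The sketch below reproduces `predict()` by hand from the coefficients. It uses made-up `toy` data rather than the article's dataset, so its numbers will not match the result above.

```r
# Synthetic stand-in data (made-up values, illustration only)
toy <- data.frame(Level = 1:10)
toy$Salary <- round(40000 * 1.4^toy$Level)
toy$Level1 <- toy$Level^2
toy$Level2 <- toy$Level^3
toy$Level3 <- toy$Level^4

toy_model <- lm(Salary ~ Level + Level1 + Level2 + Level3, data = toy)

new_level <- 6.5

# Prediction via predict(), supplying every power column as in the article
from_predict <- predict(toy_model,
                        data.frame(Level  = new_level,
                                   Level1 = new_level^2,
                                   Level2 = new_level^3,
                                   Level3 = new_level^4))

# Same value by hand: b0 + b1*x + b2*x^2 + b3*x^3 + b4*x^4
b <- coef(toy_model)              # order: intercept, Level, Level1, Level2, Level3
by_hand <- sum(b * new_level^(0:4))

print(c(predict = unname(from_predict), by_hand = by_hand))
```

The model is still linear in its coefficients, which is why `lm()` can fit it; only the features are non-linear in Level.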