Original article was published on Artificial Intelligence on Medium
Machine Learning : Regression
What is Regression? In statistical modelling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’, or ‘features’).
What is a Regression Model?
Regression models are of two types:
i) Linear Regression Model.
ii) Non-Linear Regression Model.
If the independent variable is time, then we are forecasting future values. Otherwise, we are predicting present but unknown values. Regression techniques range from Linear Regression to Random Forest Regression. Here, we discuss the Linear Regression model, which is used in case studies such as predicting salary or grades.
Types of Linear Regression Model
Simple Linear Regression: This technique applies when there is only one independent variable used to predict the dependent variable.
Here, the equation of Regression Line is:
Salary = b0 + b1*Experience, where b0 is the intercept (the predicted Salary when Experience is zero) and b1 is the slope.
The regression model places the line that best fits the data. The steeper the slope b1, the larger the change in Salary for each additional unit of Experience.
Finding The Best Fitting Line:
To find the best-fitting line, the regression model compares candidate lines. For each line it computes the sum, over all instances in the dataset, of the squared differences between the observed values and the values the line predicts. The line with the minimum sum is taken as the final trend line that best fits the data.
The method used in this technique is called the Least Squares Method.
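The least squares fit for a single predictor can be sketched in plain Python. This is only an illustration under stated assumptions: the experience and salary figures are made up, and the helper names `fit_simple_linear_regression` and `sum_squared_residuals` are hypothetical. For one predictor, the minimising line has a closed form, so no search over candidate lines is needed.

```python
# Hypothetical data: years of experience and salary (made up for illustration).
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [36000.0, 39000.0, 46000.0, 49000.0, 55000.0]


def fit_simple_linear_regression(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_simple_linear_regression(experience, salary)
print(f"Salary = {b0:.2f} + {b1:.2f} * Experience")
print("Sum of squared residuals:", sum_squared_residuals(experience, salary, b0, b1))
```

Changing either coefficient away from the fitted values and calling `sum_squared_residuals` again should give a larger sum, which is what "best fitting" means here.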
Multiple Linear Regression: This technique applies when there is more than one independent variable used to predict the dependent variable.
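With several predictors the line becomes y = b0 + b1*x1 + b2*x2 + ..., and least squares can still be used to find the coefficients. The sketch below is one possible way to do it in plain Python, solving the normal equations with Gaussian elimination. The second predictor (years of education), the data, and the function names are all assumptions made for illustration; in practice a numerical library would normally do this step.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, y):
    """Return [b0, b1, ..., bk] that minimise the sum of squared residuals."""
    # Prepend a constant 1 to each row so that b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (years of experience, years of education) -> salary.
# The salaries are generated from known coefficients so the fit can be checked.
features = [(1, 12), (2, 16), (3, 12), (4, 18), (5, 16), (6, 14)]
salary = [30000.0 + 5000.0 * exp + 1000.0 * edu for exp, edu in features]

coefficients = fit_multiple_linear_regression(features, salary)
print([round(c, 2) for c in coefficients])
```

Because the hypothetical salaries are built without noise, the fitted coefficients should come out very close to the generating values of 30000, 5000, and 1000.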
Before going further, we should know that linear regression relies on some assumptions, including:
- Multivariate Normality.
- Independence of errors.
- Lack of Multicollinearity.
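As a rough illustration of the multicollinearity assumption, one simple check is the pairwise correlation between predictors. A correlation close to 1 or -1 suggests that two columns carry nearly the same information. The sketch below assumes made-up predictor columns and a hypothetical helper `pearson_correlation`. It is not a complete diagnostic, since multicollinearity can also involve more than two predictors at once.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictors (made up for illustration).
experience_years = [1, 2, 3, 4, 5, 6]
experience_months = [12, 24, 36, 48, 60, 72]  # same information, different unit
education_years = [12, 16, 12, 18, 16, 14]

r_redundant = pearson_correlation(experience_years, experience_months)
r_distinct = pearson_correlation(experience_years, education_years)
print("experience (years) vs experience (months):", round(r_redundant, 3))
print("experience (years) vs education (years):", round(r_distinct, 3))
```

In this made-up data the first pair is perfectly correlated, so only one of those two columns should go into the model, while the second pair is only weakly related.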
Dummy Variables: When we work with multiple regression, we come across datasets whose columns hold different kinds of data. When a column contains string data, we treat it as categorical data and convert it into numerical data.
For example, a column named State contains strings, so we treat it as categorical data and convert it into numerical values, with one dummy variable (a column of 0s and 1s) per category.
It is recommended not to use all of the dummy variables. If we have 2 dummy variables, use only one; if we have 3, use only 2; and so on. In other words, use one fewer than the number of dummy variables. The omitted dummy variable can always be worked out from the others, so including it would introduce multicollinearity.
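The conversion of a categorical column into dummy variables, with one category dropped, can be sketched as follows. The state names and the helper `dummy_encode` are assumptions for illustration, not taken from the original dataset.

```python
def dummy_encode(values, drop_first=True):
    """Turn a list of category labels into 0/1 dummy columns.

    With drop_first=True, the first category in sorted order gets no column
    of its own and is represented by zeros in every remaining column.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical State column (values made up for illustration).
state = ["New York", "California", "Florida", "California", "New York"]

dummies = dummy_encode(state)
for name, column in dummies.items():
    print(name, column)

# Keeping every dummy column makes each row sum to 1, so any one column
# can be computed from the others. That redundancy is the multicollinearity
# that dropping one column avoids.
all_dummies = dummy_encode(state, drop_first=False)
row_sums = [sum(col[i] for col in all_dummies.values()) for i in range(len(state))]
print(row_sums)
```

Here three states produce two dummy columns, and a row with zeros in both columns stands for the dropped state.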
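Conclusion: The description above covers the main steps followed when building a linear regression model: choosing between simple and multiple regression, fitting the line with the Least Squares Method, checking the assumptions, and handling categorical data with dummy variables.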