Machine Learning Day 3

You and I barely agree, I know, but today you would not blame me if I conclude that we were taught lots of things in class that we still don’t understand, including those you did not know before you came to class and those you think you know now.

We will start today by defining some key terms and phrases used in class. I know today’s class made you feel that training a machine learning model was the only thing there is to AI. I will congratulate you when you figure out that you are wrong, and as far from the truth as Lagos is from China.

Definition of some Key terms and phrases

  • Linear Regression: a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).
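To make the definition concrete, here is a minimal sketch of simple (one-variable) linear regression fitted by ordinary least squares, using only the standard library. The data points are made up purely for illustration.

```python
# Simple linear regression: fit y = slope * x + intercept by
# ordinary least squares, using only built-in Python.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that is roughly y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # close to 2 and 0
```

The fitted slope comes out near 2 and the intercept near 0, which matches the line the data was drawn from.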

Standard linear regression models with standard estimation techniques make a number of assumptions about the predictor variables, the response variables and their relationship. Some of these assumptions you should know include:

  • Weak exogeneity
  • Linearity
  • Lack of perfect multicollinearity in the predictors
  • Constant Variance
  • Independence of errors

Important: Ask Google for their definitions.

Moving on, let’s check out regularization:

  • Regularization: the process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
  • Neural Networks: This one has a very short definition. It is a computer system modelled on the human brain and nervous system.
  • Null hypothesis: taken from the word “null”, it is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. The null hypothesis is generally assumed to be true until evidence indicates otherwise.
  • Root Mean Square Error (RMSE): the square root of the average of the squared differences between predicted and actual values. This guy right here used to be my worst enemy; telling me to find RMSE was like asking me to read a seaborn heatmap, it’s so frustrating.
  • I don’t blame you if, like me, you have never heard anything about “Lasso” before. In statistics and machine learning, Lasso stands for least absolute shrinkage and selection operator. It is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Although, from my point of view, Ridge is far better and preferable.
  • Ridge regression, better known as Tikhonov regularization, is the most commonly used regression algorithm to approximate an answer for an equation with no unique solution. This type of problem is very common in machine learning tasks, where the “best” solution must be chosen using limited data.
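To see RMSE and the ridge penalty in action, here is a toy sketch (the data and penalty values are made up). For one-variable regression on centred data with no intercept, the ridge solution has the closed form slope = Σxy / (Σx² + α), so a larger penalty α shrinks the coefficient toward zero:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ridge_slope(xs, ys, alpha):
    """One-variable ridge regression on centred data (no intercept).
    Closed form: slope = sum(x*y) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# Centred toy data, roughly y = 3x.
xs = [-2, -1, 0, 1, 2]
ys = [-6.1, -2.9, 0.0, 3.1, 5.9]

for alpha in (0.0, 1.0, 10.0):
    slope = ridge_slope(xs, ys, alpha)
    preds = [slope * x for x in xs]
    print(alpha, round(slope, 3), round(rmse(ys, preds), 3))
```

With α = 0 this is plain least squares (slope ≈ 3); as α grows, the slope shrinks and the training RMSE rises — the trade-off regularization makes to avoid overfitting.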


Linear and non-linear correlation: getting it right.

Correlation is said to be linear if the ratio of change is constant. When the amount of output in a factory is doubled by doubling the number of workers, this is an example of linear correlation.

In other words, when all the points on the scatter diagram tend to lie near a straight line, the correlation is said to be linear.

Correlation is said to be non-linear if the ratio of change is not constant. In other words, when all the points on the scatter diagram tend to lie near a smooth curve, the correlation is said to be non-linear (curvilinear). This is shown in the figure on the right below.
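You can see the difference numerically with Pearson’s correlation coefficient (the toy data below is made up, echoing the doubling example above): r is essentially 1 for a perfectly linear relationship, but drops below 1 for a curved one, even though the curve is perfectly deterministic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = list(range(1, 11))
linear = [2 * x for x in xs]   # constant ratio of change (doubling)
curved = [x * x for x in xs]   # ratio of change grows with x

print(round(pearson(xs, linear), 3))  # essentially 1.0: perfectly linear
print(round(pearson(xs, curved), 3))  # high, but below 1: curvilinear
```

Pearson’s r only measures linear association, which is exactly why the curved relationship scores lower despite being noise-free.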

Important: When you are using a linear regression model, knock out any categorical variables you have in your dataset (drop them, or encode them numerically first).
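A minimal sketch of that knocking-out step in plain Python — the column names and values below are made up, and in practice you would often one-hot encode categoricals instead of dropping them outright:

```python
# Toy dataset stored as a dict of columns (made-up names and values).
data = {
    "age": [25, 32, 47, 51],
    "salary": [40000, 52000, 61000, 75000],
    "city": ["Lagos", "Abuja", "Lagos", "Kano"],  # categorical column
}

# Keep only the columns whose values are all numeric.
numeric_only = {
    name: values
    for name, values in data.items()
    if all(isinstance(v, (int, float)) for v in values)
}

print(sorted(numeric_only))  # ['age', 'salary'] — 'city' is knocked out
```

With pandas you would get the same effect with `df.select_dtypes(include="number")`.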

A column’s mean deviation and standard deviation can be very high, but before you KNOCK IT OFF like a tennis ball, check whether it correlates well with the other columns.

You can also normalize a skewed distribution by applying a log transform.
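For example (toy right-skewed values, made up to span several orders of magnitude), a log transform compresses the long tail so the values sit on a comparable scale. `log1p` (log of 1 + x) is a common choice because it also handles zeros:

```python
import math

# Right-skewed toy values (think incomes): a few huge, most small.
values = [1, 10, 100, 1000, 10000]
logged = [math.log1p(v) for v in values]

# Raw values span 4 orders of magnitude; the logged values barely one.
print(max(values) / min(values))              # 10000.0
print(round(max(logged) / min(logged), 1))    # about 13
```

The transform is monotonic, so it preserves the ordering of the values while taming the spread.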

Finally, a simple tip on how to normalize your data.

Normalizing is simply a case of getting all your data on the same scale: if the scales for different features are wildly different, this can have a knock-on effect on your ability to learn (depending on what methods you’re using). Ensuring standardized feature values implicitly weights all features equally in their representation. (Well, copied from StackExchange.)

Go to bed, you made it. See you in class tomorrow.

Source: Deep Learning on Medium