How to evaluate Machine Learning models? — Part 1

Original article was published by RAVI SHEKHAR TIWARI on Becoming Human: Artificial Intelligence Magazine

Before getting deep into artificial intelligence, it is important to understand which model should be deployed and which metrics suit the dataset and the application of the model.

Like a teacher grading students, we use certain parameters, or benchmarks, that decide the quality of our model and whether the given model will be selected or rejected.

Below is a brief classification of the metrics that we will discuss in this part as well as the upcoming ones.

Fig. 1. Metrics

In this part, i.e. Part 1, we are going to cover various metrics. So, without any further delay, let's start.

Before boarding, let us assume that we have trained a regression model named Learned_Model. For each of the n samples it produces a predicted output "O", and for each sample we also have the actual output, denoted "Y" (the target variable).

1. MAE (Mean Absolute Error)

If we break MAE apart we get three separate words, i.e. Error, Absolute, and Mean (traversing right to left), where:

Error is the difference between the target and the predicted value, i.e. Y-O.

Absolute is the absolute value of the error, i.e. |Y-O|; let it be AE.

Mean is the average of the absolute error over the n samples,

i.e. MAE = ∑ AE / n(AE), where n(AE) is the number of samples, i.e. n.

Fig. 2 MAE

Link for code: jupyter notebook
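As a quick illustrative sketch (this is not the linked notebook; the helper name mae and the sample values are mine), the formula translates directly to NumPy:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |Y - O| over the n samples."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values: absolute errors are 0.5, 0.0, 1.5, so MAE = 2.0 / 3 ≈ 0.667
print(mae([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```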

2. MSE (Mean Square Error)

Similarly, if we break MSE apart we get three separate words, i.e. Error, Square, and Mean (traversing right to left), where:

Error is the difference between the target and the predicted value, i.e. Y-O.

Square is the square of the error, i.e. (Y-O)²; let it be SE.

Mean is the average of the squared error over the n samples,

i.e. MSE = ∑ SE / n(SE), where n(SE) is the number of samples, i.e. n.

Fig. 3 MSE

Note: MAE is more robust to outliers than MSE, because squaring the error gives outliers (samples with an unusually high error) more attention and dominance in the final error, and hence a larger impact on the model parameters.

Link for code: jupyter notebook
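A matching illustrative sketch for MSE (same hypothetical helper style as above), which also makes the outlier note concrete: squaring lets one large error dominate the average.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: mean of (Y - O)^2 over the n samples."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# One outlier with error 10 contributes 100 to the sum of squares,
# dominating four samples with error 1 (which contribute 1 each).
errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
print(np.mean(np.abs(errors)))   # MAE-style average: 2.8
print(np.mean(errors ** 2))      # MSE-style average: 20.8
```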

3. RMSE (Root Mean Square Error)

Similarly, if we break RMSE apart we get four separate words, i.e. Error, Square, Mean, and Root (traversing right to left), where:

Error is the difference between the target and the predicted value, i.e. Y-O.

Square is the square of the error, i.e. (Y-O)²; let it be SE.

Mean is the average of the squared error over the n samples,

i.e. ∑ SE / n(SE), where n(SE) is the number of samples, i.e. n.

Root is the square root of the MSE, i.e. RMSE = (∑ SE / n(SE))^(1/2).

RMSE is highly affected by outliers. Taking the root brings the error back to the scale of the target, while the squaring inside still emphasizes large deviations. Compared to MAE, RMSE gives higher weight to, and hence punishes, large errors.

Fig. 4 RMSE

RMSLE stands for Root Mean Square Logarithmic Error, a derivative of RMSE which scales down the error. Basically, it is used when we do not want to heavily penalize a huge difference between the predicted value (O) and the target value (Y), as long as the prediction is broadly correct, i.e. of the right order of magnitude.

If both O and Y values are small: RMSE ≈ RMSLE.

If either O or Y is big: RMSE > RMSLE.

If both O and Y values are big: RMSE > RMSLE, and RMSLE becomes almost negligible.

Fig. 5 RMSLE

Link for code: jupyter notebook
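A short sketch of both RMSE and RMSLE. I use NumPy's log1p (i.e. log(1 + x)), the common convention for RMSLE; the assumption of non-negative values is mine, not stated in the article.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt of the mean of (Y - O)^2."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error: RMSE computed on log(1 + value)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# Big values: the same prediction gives RMSE far larger than RMSLE.
print(rmse([1000.0], [900.0]))   # 100.0
print(rmsle([1000.0], [900.0]))  # ~0.105
```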

4. R² (R-squared)

R², also known as the coefficient of determination, is used to measure goodness of fit, i.e. it shows how well the model fits the data (how well the data fit the regression model). Basically, it determines the proportion of variance in the dependent variable (D.V.) that can be explained by the independent variables (I.V.). Generally, the higher the value of R², the better the model performs, but in some cases this rule of thumb fails to hold, so we need to consider other factors along with R².

In simple words, it tells us how good our model is compared to a baseline model that simply predicts the mean value of the target from the test set for every sample.

Fig. 6. R² Mathematical Intuition

Link for code: jupyter notebook
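A minimal sketch computing R² directly from its definition, i.e. one minus the ratio of the model's squared error to that of the mean-predicting baseline described above (the helper name is mine):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of our model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean-only baseline
    return 1.0 - ss_res / ss_tot

# 1.0 means a perfect fit; 0.0 means no better than predicting the mean.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))  # ~0.986
```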

5. Adjusted R²

As we discussed above, R² has a shortcoming; Adjusted R² is an improved version of R² that addresses it.

R² is biased: if we add new features (I.V.), it does not penalize the model, irrespective of how well those I.V. correlate with the target. R² always increases or remains the same, i.e. there is no penalization for uncorrelated I.V.

Fig. 7 Problem with R²

In order to counter this problem of the never-decreasing R² value, a penalizing factor has been introduced which penalizes R² as I.V. are added.

For the theoretical intuition of Adjusted R², refer to the image below:

Fig. 8 Adjusted R² theoretical intuition.

For the mathematical intuition of Adjusted R², refer to the image below:

Fig. 9 Adjusted R² Mathematical intuition.

Link for code: jupyter notebook
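Assuming the standard formula Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of samples and k the number of I.V., a small sketch shows the penalty at work:

```python
def adjusted_r_squared(r2, n, k):
    """Penalized R^2: n = number of samples, k = number of independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Same R^2, more features: Adjusted R^2 drops, penalizing the extra I.V.
print(adjusted_r_squared(0.90, n=100, k=2))   # ~0.8979
print(adjusted_r_squared(0.90, n=100, k=20))  # ~0.8747
```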

6. Variance

In decision trees, the homogeneity of a node is inversely related to its variance: the lower the variance, the higher the homogeneity, and hence the greater the purity of the node. Variance is used to decide the split of a node whose samples are continuous.

In terms of the normal distribution, the larger the spread of the tails, the larger the dispersion, and hence the larger the variance.

For the formula, refer to the image below.

Fig. 10 Variance Mathematical intuition

Link for code: jupyter notebook
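A small illustrative sketch of variance and of a weighted-variance split score for continuous targets in a decision tree (the split_score helper and the data are my own illustration, not the article's notebook):

```python
import numpy as np

def variance(x):
    """Population variance: mean squared deviation from the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)

def split_score(left, right):
    """Weighted average of child variances; lower means more homogeneous children."""
    n = len(left) + len(right)
    return len(left) / n * variance(left) + len(right) / n * variance(right)

# The split that groups similar values scores lower (purer nodes).
print(split_score([1.0, 1.1, 0.9], [9.0, 9.2, 8.8]))  # low: homogeneous children
print(split_score([1.0, 9.2, 0.9], [9.0, 1.1, 8.8]))  # high: mixed children
```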

7. AIC

AIC stands for Akaike Information Criterion, a method for scoring and selecting a model, developed by Hirotugu Akaike.

AIC is rooted in the relationship between the Kullback-Leibler divergence and maximum-likelihood estimation of the model. It penalizes complex models relatively lightly, i.e. it emphasizes fit on the training dataset while penalizing the model as the number of I.V. increases; as a result, AIC tends to select more complex models. The lower the value of AIC, the better the model.

AIC = -2/N * LL + 2*k/N, where

N = number of samples in the training dataset,

LL = log-likelihood of the model,

k = number of parameters in the model.

For the mathematical explanation, refer to the image below.

Fig. 11 AIC Mathematical Intuition and 2nd order of AIC

Link for code: jupyter notebook
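A minimal sketch of the formula quoted above. The Gaussian log-likelihood helper is my assumption (one common way to obtain LL for a least-squares model); it is not from the article.

```python
import numpy as np

def gaussian_log_likelihood(y_true, y_pred):
    """Assumed helper: maximized log-likelihood of a least-squares
    model under Gaussian errors, computed from the training MSE."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    n = len(y_true)
    mse = np.mean((y_true - y_pred) ** 2)
    return -n / 2.0 * (np.log(2.0 * np.pi * mse) + 1.0)

def aic(n, log_likelihood, k):
    """AIC = -2/N * LL + 2*k/N; lower is better."""
    return -2.0 / n * log_likelihood + 2.0 * k / n

y, y_hat = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
print(aic(n=4, log_likelihood=gaussian_log_likelihood(y, y_hat), k=2))
```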

8. BIC

BIC, the Bayesian Information Criterion, is a variant of AIC. Unlike AIC, it penalizes the model more heavily for its complexity and hence selects simpler models. As the complexity of the model increases, BIC increases, and the chance of the model being selected decreases.

BIC = -2 * LL + log(N) * k, where

k = number of parameters in the model,

LL = log-likelihood of the model,

N = number of samples in the training dataset.

For the mathematical explanation, refer to the image below.

Fig. 12 BIC Mathematical Intuition
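A matching sketch for the BIC formula. The shared log-likelihood value below is hypothetical, chosen only to show that, for equal fit, the simpler model gets the lower BIC and would be selected:

```python
import numpy as np

def bic(n, log_likelihood, k):
    """BIC = -2*LL + log(N)*k; lower is better."""
    return -2.0 * log_likelihood + np.log(n) * k

# Hypothetical log-likelihood shared by two candidate models of equal fit.
ll = -120.0
print(bic(n=100, log_likelihood=ll, k=3))   # simpler model: lower BIC
print(bic(n=100, log_likelihood=ll, k=10))  # complex model: higher BIC
```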

Special Thanks:

As we say, “a car is useless if it doesn’t have a good engine”; similarly, a student is lost without proper guidance and motivation. From the bottom of my heart, I would like to thank my Guru and my idol, “Dr P. Supraja”, who guided me throughout the journey. As a Guru, she has lighted the best available path for me and motivated me whenever I encountered failure or a roadblock; without her support and motivation this would have been an impossible task for me.

Contact me:

If you have any queries, feel free to contact me through any of the options mentioned below:

Website: www.rstiwari.com

Google Form: https://forms.gle/mhDYQKQJKtAKP78V7

Notebook for Reference:

Jovian Notebook: https://jovian.ml/tiwari12-rst/metrices-part1

References:

Data Generation: https://www.weforum.org

Environment Set-up: https://medium.com/@tiwari11.rst

Jovian: https://jovian.ml/docs/user-guide/install.html

Keras Metric: https://keras.io/api/metrics/

Youtube: https://www.youtube.com/channel/UCFG5x-VHtutn3zQzWBkXyFQ
