Original article was published by Chaitanyanarava on Deep Learning on Medium

# Chapter 3: Evaluation Metrics

## Step 3–1: Transforming Output Variables

Our model has five possible outcomes, i.e., [0, 1, 2, 3, 4]. We can transform these labels using either of the following schemes:

- One-hot encoding: the column containing the categorical labels is split into as many columns as there are categories. Each row then contains a “1” in the column of its category and “0” everywhere else.

For example, if the target label is 2, it is encoded as [0,0,1,0,0].
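A minimal sketch of this encoding in NumPy (the function name and 5-class default are mine, chosen to match the [0,1,2,3,4] labels above):

```python
import numpy as np

def one_hot(labels, num_classes=5):
    """Encode each integer label as a 0/1 vector with a single 1."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes), dtype=int)
    encoded[np.arange(len(labels)), labels] = 1  # set the label's column to 1
    return encoded

print(one_hot([2]))  # [[0 0 1 0 0]]
```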

- Ordinal regression: used to learn a ranking or ordering over instances; it has properties of both classification and metric regression. The task is to assign data points to a set of finite, ordered categories, e.g., a teacher grading students’ performance as A, B, C, D or E (A > B > C > D > E).

For example, if the target label is 2, it is encoded as [1,1,1,0,0], meaning a sample of class 2 also belongs to every class below it (0 and 1).
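The same idea as a sketch (note that some ordinal-regression formulations instead use K−1 columns encoding “y > k”; this follows the article’s [1,1,1,0,0] convention, and the function name is mine):

```python
import numpy as np

def ordinal_encode(labels, num_classes=5):
    """Encode label k as ones in columns 0..k and zeros after."""
    labels = np.asarray(labels).reshape(-1, 1)
    # column j is 1 exactly when j <= label
    return (np.arange(num_classes) <= labels).astype(int)

print(ordinal_encode([2]))  # [[1 1 1 0 0]]
```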

According to this paper, ordinal regression is especially helpful when the target variable is categorical and ordered, as is often the case in healthcare.

## Step 3–2: Cohen’s Kappa Metric

The competition uses the quadratic weighted kappa score as its evaluation metric. Let’s see what it measures.

- It measures the degree of agreement between raters on categorical data.
- It extends plain accuracy: accuracy is the simple percent agreement (true predictions / total predictions), while the kappa score is more robust because it also accounts for the agreement that would occur by chance.
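A from-scratch sketch of the plain (unweighted) kappa, where `p_o` is the observed percent agreement and `p_e` the chance agreement implied by each rater’s label frequencies (the example labels are made up for illustration):

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # observed agreement: same as plain accuracy
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # expected agreement if both raters labelled at random
    # with the same marginal frequencies
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (p_o - p_e) / (1 - p_e)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(cohen_kappa(y_true, y_pred), 3))
```

Here accuracy alone would be 5/6 ≈ 0.833, while kappa discounts the agreement expected by chance and comes out lower.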

**Quadratic Weighted Cohen’s kappa score:**

- The weighted kappa allows disagreements to be weighted differently, which is useful when the scores are ordered.
- It makes use of three matrices:
  - the matrix of observed scores,
  - the matrix of expected scores,
  - the weight matrix.
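Those three matrices can be sketched directly, assuming integer labels in 0..num_classes−1 (the function name is mine; the weight normalisation by (K−1)² cancels in the ratio but keeps the weights in [0, 1]):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """QWK built from the observed, expected, and weight matrices."""
    # observed matrix: the confusion matrix of the two raters
    O = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # expected matrix: outer product of the two label histograms,
    # i.e. agreement expected if the raters were independent
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / len(y_true)
    # weight matrix: penalty grows with the squared class distance
    i, j = np.indices((num_classes, num_classes))
    W = (i - j) ** 2 / (num_classes - 1) ** 2
    return 1 - (W * O).sum() / (W * E).sum()

print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```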

**Why use the weighted kappa metric?**

There are plenty of metrics to compare results: confusion matrix, accuracy, recall, precision, etc. So why use a new one?

- It is useful when the target labels are ordered, like [0,1,2,3,4].
- In case of disagreement, the penalty depends on the distance between the predicted and the true label. It does not only reward agreement; it also penalises disagreement in proportion to how far off the prediction is.
- For example, if the true label is 3, predicting 2 scores higher than predicting 0, because the penalty grows with the squared distance. This property is very helpful in medicine: predicting a higher severity than the true one is relatively acceptable, since a later diagnosis can correct it. But if a class-1 patient is predicted as class 0, that patient is ignored, which can adversely affect both the patient and the reputation of the hospital.
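The distance-based penalty is easy to verify; assuming scikit-learn is available, its `cohen_kappa_score` with `weights="quadratic"` shows that a near miss hurts far less than a distant one (the label arrays are made up for illustration):

```python
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
near   = [0, 1, 2, 3, 3, 0, 1, 2, 3, 4]  # one prediction off by 1
far    = [0, 1, 2, 3, 0, 0, 1, 2, 3, 4]  # same position, off by 4

qwk_near = cohen_kappa_score(y_true, near, weights="quadratic")
qwk_far = cohen_kappa_score(y_true, far, weights="quadratic")
assert qwk_near > qwk_far  # closer mistakes are penalised less
print(round(qwk_near, 3), round(qwk_far, 3))
```

Both prediction sets have the same plain accuracy (9/10), yet the far miss drags the quadratic weighted kappa down much harder (≈ 0.97 vs 0.60 here).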