PyTorch [Tabular] —Multiclass Classification

Original article can be found here (source): Deep Learning on Medium

This blog post takes you through an implementation of multi-class classification on tabular data using PyTorch.

We will use the wine dataset available on Kaggle. This dataset has 12 columns where the first 11 are the features and the last column is the target column. The data set has 1599 rows.

Classifier meme [Image [1]]

Import Libraries

We’re using tqdm to enable progress bars for training and testing loops.

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

Read Data

df = pd.read_csv("data/tabular/classification/winequality-red.csv")
df.head()
Input data [Image [2]]

EDA and Preprocessing

To make the data fit for a neural net, we need to make a few adjustments to it.

Class Distribution

First off, we plot the target column to observe the class distribution. There’s a lot of imbalance here: classes 3, 4, and 8 have very few samples.

sns.countplot(x = 'quality', data=df)
Class distribution bar plot [Image [3]]

Encode Output Class

Next, we see that the output labels run from 3 to 8. That needs to change because nn.CrossEntropyLoss expects class indices that start at 0, that is, integers in the range [0, n-1] for n classes. We need to remap our labels to start from 0.

To do that, let’s create a dictionary called class2idx and use the .replace() method from the Pandas library to change it. Let’s also create a reverse mapping called idx2class which converts the IDs back to their original classes.

To create the reverse mapping, we use a dictionary comprehension that swaps each key and value.

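class2idx = {
    3: 0,
    4: 1,
    5: 2,
    6: 3,
    7: 4,
    8: 5
}

idx2class = {v: k for k, v in class2idx.items()}

# Assign the result back instead of using inplace=True on a column selection,
# which may not modify df in recent pandas versions.
df['quality'] = df['quality'].replace(class2idx)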
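As a small illustration of how the two mappings work together, here is a sketch that encodes a few ratings and decodes them again. The label list is an assumption based on the ratings 3 to 8 mentioned above, and the mapping is built programmatically so the indices are guaranteed to be contiguous.

```python
# Assumed label set: the quality ratings 3..8 described in the text.
labels = [3, 4, 5, 6, 7, 8]

# Forward mapping: rating -> index starting at 0.
class2idx = {label: idx for idx, label in enumerate(sorted(labels))}
# Reverse mapping: index -> rating.
idx2class = {idx: label for label, idx in class2idx.items()}

encoded = [class2idx[q] for q in [5, 6, 3, 8]]   # ratings to indices
decoded = [idx2class[i] for i in encoded]        # indices back to ratings
```

The reverse mapping is handy later on, when predicted indices need to be reported as the original quality ratings.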

Create Input and Output Data

In order to split our data into train, validation, and test sets using train_test_split from Sklearn, we need to separate out our inputs and outputs.

Input X is all but the last column. Output y is the last column.

X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

Train — Validation — Test

To create the train-val-test split, we’ll use train_test_split() from Sklearn.

First we’ll split our data into train+val and test sets. Then, we’ll further split our train+val set to create our train and val sets.

Because there’s a class imbalance, we want the same proportion of each output class in our train, validation, and test sets. To do that, we use the stratify option of train_test_split().

# Split into train+val and test
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=69)

# Split train into train-val
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.1, stratify=y_trainval, random_state=21)
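To see what stratify does, here is a minimal sketch on toy data. The arrays X_toy and y_toy are made up for illustration; only train_test_split and its stratify argument come from the code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels (assumed for illustration): 80 of class 0, 20 of class 1.
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_a, X_b, y_a, y_b = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=0
)

# With stratify, the minority share (20%) is preserved in both parts.
share_a = y_a.mean()
share_b = y_b.mean()
```

Without stratify, a small split could easily end up with too few or even zero samples of a rare class.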

Normalize Input

Neural networks tend to train faster and more stably when all input features are on a similar scale, for example within the range (0,1). There’s a ton of material available online on why this helps.

To scale our values, we’ll use the MinMaxScaler() from Sklearn. The MinMaxScaler transforms features by scaling each feature to a given range which is (0,1) in our case.

x_scaled = (x - min(x)) / (max(x) - min(x))

Notice that we use .fit_transform() on X_train while we use .transform() on X_val and X_test.

We do this because we want to scale the validation and test sets with the parameters learned from the train set, which avoids data leakage. .fit_transform() calculates the scaling values and applies them, while .transform() only applies the values that were already calculated.

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
X_train, y_train = np.array(X_train), np.array(y_train)
X_val, y_val = np.array(X_val), np.array(y_val)
X_test, y_test = np.array(X_test), np.array(y_test)
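The fit-on-train, transform-everywhere idea can be sketched by hand with NumPy. This is only an illustration of the formula above on made-up arrays, not a replacement for MinMaxScaler.

```python
import numpy as np

# Toy data (assumed for illustration): 3 training rows, 1 test row, 2 features.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn the per-feature min and max from the training rows only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# "transform": apply those same statistics to any split.
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)
```

Note that the second test feature (40.0) is larger than anything seen during training, so its scaled value lands above 1. That is expected: validation and test values are only guaranteed to fall inside (0,1) if they lie within the training range.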

Visualize Class Distribution in Train, Val, and Test

Once we’ve split our data into train, validation, and test sets, let’s make sure the distribution of classes is similar in all three sets.

To do that, let’s create a function called get_class_distribution(). This function takes as input the object y, i.e. y_train, y_val, or y_test. Inside the function, we initialize a dictionary which contains the output classes as keys and their counts as values. The counts are all initialized to 0.

We then loop through our y object and update our dictionary.

def get_class_distribution(obj):
    count_dict = {
        "rating_3": 0,
        "rating_4": 0,
        "rating_5": 0,
        "rating_6": 0,
        "rating_7": 0,
        "rating_8": 0,
    }

    for i in obj:
        if i == 0:
            count_dict['rating_3'] += 1
        elif i == 1:
            count_dict['rating_4'] += 1
        elif i == 2:
            count_dict['rating_5'] += 1
        elif i == 3:
            count_dict['rating_6'] += 1
        elif i == 4:
            count_dict['rating_7'] += 1
        elif i == 5:
            count_dict['rating_8'] += 1
        else:
            print("Check classes.")

    return count_dict
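The same counting can be written more compactly with collections.Counter. The sketch below is one possible alternative, not the function used in the rest of this post; the helper name class_distribution and the idx_to_rating dictionary are hypothetical, with the index-to-rating pairs taken from the if/elif chain above.

```python
from collections import Counter

def class_distribution(labels, idx_to_rating):
    """Count how often each encoded label occurs, keyed by a readable name."""
    counts = Counter(int(label) for label in labels)
    unknown = set(counts) - set(idx_to_rating)
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(unknown)}")
    # Include classes with zero occurrences so every split has the same keys.
    return {
        f"rating_{idx_to_rating[i]}": counts.get(i, 0)
        for i in sorted(idx_to_rating)
    }

idx_to_rating = {0: 3, 1: 4, 2: 5, 3: 6, 4: 7, 5: 8}
dist = class_distribution([2, 3, 3, 5, 2, 2], idx_to_rating)
```

Raising an error on unknown labels, instead of printing a message, makes a broken encoding step harder to miss.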

Once we have the count dictionary, we use the Seaborn library to plot the bar charts. To make the plot, we first convert our dictionary to a dataframe using pd.DataFrame.from_dict([get_class_distribution(y_train)]). Subsequently, we .melt() the dataframe to convert it into the long format and finally use sns.barplot() to build the plots.

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(25,7))
# Train
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_train)]).melt(), x = "variable", y="value", hue="variable", ax=axes[0]).set_title('Class Distribution in Train Set')
# Validation
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_val)]).melt(), x = "variable", y="value", hue="variable", ax=axes[1]).set_title('Class Distribution in Val Set')
# Test
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_test)]).melt(), x = "variable", y="value", hue="variable", ax=axes[2]).set_title('Class Distribution in Test Set')
Class distribution in train, val, and test sets [Image [4]]
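If the wide-to-long conversion is unfamiliar, the following sketch shows what .melt() produces for a one-row dataframe of counts. The counts are toy values, and pd.DataFrame([counts]) is used here as a plain equivalent of building the frame from a list with a single dictionary.

```python
import pandas as pd

counts = {"rating_5": 3, "rating_6": 2, "rating_8": 1}  # toy counts

wide = pd.DataFrame([counts])   # one row, one column per class
long = wide.melt()              # one row per class, columns "variable" and "value"
```

The long format has exactly the two columns that sns.barplot() is given above: "variable" for the x-axis and "value" for the bar height.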

Neural Network

We’ve now reached what we all had been waiting for!

Custom Dataset

First up, let’s define a custom dataset. This dataset will be used by the dataloader to pass our data into our model.

We initialize our dataset by passing X and y as inputs. Make sure X is a float while y is long.

class ClassifierDataset(Dataset):

    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data

    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]

    def __len__ (self):
        return len(self.X_data)

train_dataset = ClassifierDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).long())
val_dataset = ClassifierDataset(torch.from_numpy(X_val).float(), torch.from_numpy(y_val).long())
test_dataset = ClassifierDataset(torch.from_numpy(X_test).float(), torch.from_numpy(y_test).long())

Weighted Sampling

Because there’s a class imbalance, we used a stratified split to create our train, validation, and test sets.

While it helps, it still does not ensure that each mini-batch our model sees contains all our classes. We need to over-sample the classes with fewer samples. To do that, we use the WeightedRandomSampler.

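First, we obtain a list called target_list which contains all our outputs. This list is then converted to a tensor. We keep it in the same order as the dataset, because WeightedRandomSampler matches the i-th weight to the i-th sample; shuffling the targets here would assign weights to the wrong samples.

target_list = []
for _, t in train_dataset:
    target_list.append(t)

target_list = torch.tensor(target_list)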

Then, we obtain the count of all classes in our training set. We use the reciprocal of each count to obtain its weight. Now that we’ve calculated the weights for each class, we can proceed.

class_count = [i for i in get_class_distribution(y_train).values()]
class_weights = 1./torch.tensor(class_count, dtype=torch.float)
print(class_weights)

###################### OUTPUT ######################
tensor([0.1429, 0.0263, 0.0020, 0.0022, 0.0070, 0.0714])

WeightedRandomSampler expects a weight for each sample. We obtain these by indexing class_weights with the target tensor, as follows.

class_weights_all = class_weights[target_list]
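To make the indexing step concrete, here is a NumPy sketch on toy labels. It shows that when every sample gets the reciprocal of its class count as weight, each class ends up with the same total sampling probability. The targets array is made up, and it is kept in dataset order so that sample_weights[i] belongs to sample i.

```python
import numpy as np

targets = np.array([0, 0, 0, 0, 1, 1, 2])   # toy encoded labels, in dataset order
class_count = np.bincount(targets)           # samples per class: 4, 2, 1
class_weights = 1.0 / class_count            # reciprocal of each count

# Fancy indexing: one weight per sample, in the same order as `targets`.
sample_weights = class_weights[targets]

# Normalised, these are the per-draw sampling probabilities of each sample.
probs = sample_weights / sample_weights.sum()
per_class_prob = np.array([probs[targets == c].sum() for c in range(3)])
```

Each entry of per_class_prob comes out as one third, so a draw is equally likely to land in any of the three classes regardless of how many samples each class has.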

Finally, let’s initialize our WeightedRandomSampler. We’ll call this in our dataloader below.

weighted_sampler = WeightedRandomSampler(
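As a rough, self-contained illustration of how such a sampler behaves, the sketch below builds a WeightedRandomSampler on toy labels. The toy data, num_samples=len(sample_weights), and replacement=True are assumptions made for this sketch and are not necessarily the settings used in this post.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

targets = torch.tensor([0] * 90 + [1] * 10)             # toy imbalanced labels
class_weights = 1.0 / torch.bincount(targets).float()   # 1/90 and 1/10
sample_weights = class_weights[targets]                 # one weight per sample

# Assumptions of this sketch: draw as many indices as there are samples,
# with replacement so that rare samples can be drawn more than once.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

drawn = targets[list(sampler)]                 # labels of the sampled indices
minority_share = drawn.float().mean().item()   # fraction of class-1 draws
```

Although class 1 makes up only 10% of the toy data, it should account for roughly half of the drawn indices, which is the over-sampling effect we are after.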

Model Parameters

Before we proceed any further, let’s define a few parameters that we’ll use down the line.

EPOCHS = 400

NUM_FEATURES = len(X.columns)
NUM_CLASSES = 6  # one class per quality rating, 3 to 8

Let’s now initialize our dataloaders.

For train_loader we’ll use batch_size = 64 and pass our sampler to it. Note that we’re not using shuffle=True in our train_loader because we’re already using a sampler. These two are mutually exclusive.

For test_loader and val_loader we’ll use batch_size = 1.

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=64,
                          sampler=weighted_sampler)
val_loader = DataLoader(dataset=val_dataset, batch_size=1)
test_loader = DataLoader(dataset=test_dataset, batch_size=1)

Define Neural Net Architecture

Let’s define a simple 3-layer feed-forward network with dropout and batch-norm.

class MulticlassClassification(nn.Module):
    def __init__(self, num_feature, num_class):
        super(MulticlassClassification, self).__init__()

        self.layer_1 = nn.Linear(num_feature, 512)
        self.layer_2 = nn.Linear(512, 128)
        self.layer_3 = nn.Linear(128, 64)
        self.layer_out = nn.Linear(64, num_class)

        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.2)
        self.batchnorm1 = nn.BatchNorm1d(512)
        self.batchnorm2 = nn.BatchNorm1d(128)
        self.batchnorm3 = nn.BatchNorm1d(64)

    def forward(self, x):
        x = self.layer_1(x)
        x = self.batchnorm1(x)
        x = self.relu(x)

        x = self.layer_2(x)
        x = self.batchnorm2(x)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.layer_3(x)
        x = self.batchnorm3(x)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.layer_out(x)

        return x

Check if GPU is active.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
###################### OUTPUT ######################

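Initialize the model, optimizer, and loss function. Transfer the model to the GPU. We’re using nn.CrossEntropyLoss because this is a multiclass classification problem. We don’t have to manually apply a log_softmax layer after our final layer because nn.CrossEntropyLoss does that for us. During validation and testing we apply log_softmax ourselves to turn the raw outputs into log-probabilities before picking the predicted class. Since log_softmax preserves the ordering of the scores, the predicted class is the same as the argmax of the raw outputs.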

Loss function meme [Image [5]]
model = MulticlassClassification(num_feature = NUM_FEATURES, num_class=NUM_CLASSES)
model.to(device)

criterion = nn.CrossEntropyLoss(
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
print(model)

###################### OUTPUT ######################
MulticlassClassification(
  (layer_1): Linear(in_features=11, out_features=512, bias=True)
  (layer_2): Linear(in_features=512, out_features=128, bias=True)
  (layer_3): Linear(in_features=128, out_features=64, bias=True)
  (layer_out): Linear(in_features=64, out_features=6, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.2, inplace=False)
  (batchnorm1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
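The claim that nn.CrossEntropyLoss already contains the log_softmax step can be checked on random data. The sketch below uses made-up logits and targets; it compares the built-in loss with log_softmax followed by the negative log-likelihood loss, and also confirms that log_softmax does not change which class has the highest score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 6)             # 4 samples, 6 classes (raw model outputs)
targets = torch.tensor([0, 2, 5, 3])   # class indices in [0, 5]

# Built-in loss applied directly to the raw outputs.
ce = nn.CrossEntropyLoss()(logits, targets)
# The same quantity computed in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

same_loss = torch.allclose(ce, manual)
same_argmax = torch.equal(
    logits.argmax(dim=1),
    F.log_softmax(logits, dim=1).argmax(dim=1),
)
```

This is why the model's forward() returns raw scores: adding a softmax layer before nn.CrossEntropyLoss would apply the normalisation twice.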

Train the model

Before we start our training, let’s define a function to calculate accuracy per epoch.

This function takes y_pred and y_test as input arguments. We then apply log_softmax to y_pred and extract the class which has the highest probability.

After that, we compare the predicted classes and the actual classes to calculate the accuracy.

def multi_acc(y_pred, y_test):
    y_pred_softmax = torch.log_softmax(y_pred, dim = 1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim = 1)

    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)

    # Scale to a percentage before rounding; rounding first would
    # collapse every batch accuracy to either 0 or 100.
    acc = torch.round(acc * 100)

    return acc
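Here is a small NumPy sketch of the same accuracy calculation on a toy batch. The logits and labels are made up. It also shows why the order of scaling and rounding matters when the accuracy is reported as a percentage.

```python
import numpy as np

# Toy batch (assumed for illustration): 4 samples, 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 3.0, 0.5],
                   [1.0, 0.9, 0.8]])
y_true = np.array([0, 2, 1, 1])

y_pred = logits.argmax(axis=1)                  # predicted class per row
fraction = float((y_pred == y_true).mean())     # 3 of 4 correct

percent = round(fraction * 100)      # scale first, then round
collapsed = round(fraction) * 100    # rounding first leaves only 0 or 100
```

For this batch, percent is 75 while collapsed is 100, so rounding before scaling would overstate the accuracy.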

We’ll also define 2 dictionaries which will store the accuracy/epoch and loss/epoch for both train and validation sets.

accuracy_stats = {
    'train': [],
    "val": []
}
loss_stats = {
    'train': [],
    "val": []
}

Let’s TRAAAAAIN our model!

Training meme [Image [6]]
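print("Begin training.")
for e in tqdm(range(1, EPOCHS+1)):

    # TRAINING
    train_epoch_loss = 0
    train_epoch_acc = 0
    model.train()
    for X_train_batch, y_train_batch in train_loader:
        X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
        optimizer.zero_grad()

        y_train_pred = model(X_train_batch)

        train_loss = criterion(y_train_pred, y_train_batch)
        train_acc = multi_acc(y_train_pred, y_train_batch)

        train_loss.backward()
        optimizer.step()

        train_epoch_loss += train_loss.item()
        train_epoch_acc += train_acc.item()

    # VALIDATION
    with torch.no_grad():

        val_epoch_loss = 0
        val_epoch_acc = 0

        model.eval()
        for X_val_batch, y_val_batch in val_loader:
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)

            y_val_pred = model(X_val_batch)

            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = multi_acc(y_val_pred, y_val_batch)

            # Accumulate the validation metrics, not the last training batch's
            val_epoch_loss += val_loss.item()
            val_epoch_acc += val_acc.item()

    # Store the per-epoch averages for plotting later
    loss_stats['train'].append(train_epoch_loss/len(train_loader))
    loss_stats['val'].append(val_epoch_loss/len(val_loader))
    accuracy_stats['train'].append(train_epoch_acc/len(train_loader))
    accuracy_stats['val'].append(val_epoch_acc/len(val_loader))

    print(f'Epoch {e+0:03}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f}| Val Acc: {val_epoch_acc/len(val_loader):.3f}')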

###################### OUTPUT ######################
Epoch 001: | Train Loss: 1.55731 | Val Loss: 1.48898 | Train Acc: 5.556| Val Acc: 0.000
Epoch 002: | Train Loss: 1.55930 | Val Loss: 1.27569 | Train Acc: 50.000| Val Acc: 100.000
.
Epoch 399: | Train Loss: 0.11390 | Val Loss: 0.10750 | Train Acc: 100.000| Val Acc: 100.000
Epoch 400: | Train Loss: 0.11665 | Val Loss: 0.07421 | Train Acc: 100.000| Val Acc: 100.000

You can see we’ve put a model.train() before the loop. model.train() tells PyTorch that you’re in training mode.

Well, why do we need to do that? If you’re using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is disabled during evaluation), you need to tell PyTorch to act accordingly.

Similarly, we’ll call model.eval() when we test our model. We’ll see that below.
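The difference between the two modes is easy to observe on a single Dropout layer. The sketch below is a toy illustration that is separate from our model; the input of ones and p=0.5 are chosen only to make the effect visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()          # training mode: randomly zero elements, rescale the rest
train_out = dropout(x)

dropout.eval()           # evaluation mode: dropout passes the input through
eval_out = dropout(x)

zeroed_in_train = int((train_out == 0).sum())
unchanged_in_eval = torch.equal(eval_out, x)
```

In training mode roughly half of the values are zeroed and the survivors are scaled by 1/(1-p), while in evaluation mode the output equals the input. Calling model.train() or model.eval() switches every such layer inside the model at once.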

Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we’ll print out the loss/accuracy and reset it back to 0.

Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader.

We call optimizer.zero_grad() before we make any predictions. Since the backward() function accumulates gradients, we need to reset them manually for each mini-batch.
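Gradient accumulation can be seen on a single scalar parameter. This is a toy sketch: w is a made-up parameter, and w.grad.zero_() stands in for the reset that optimizer.zero_grad() performs for every parameter it manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()          # gradient of 3*w with respect to w

(3 * w).backward()             # no reset: the new gradient is added on top
accumulated = w.grad.item()

w.grad.zero_()                 # manual reset, similar in effect to zero_grad()
(3 * w).backward()
after_reset = w.grad.item()
```

Without the reset, the second backward pass doubles the stored gradient, so each optimizer step would be based on a mix of the current and all previous mini-batches.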

From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, back-propagate with train_loss.backward(), and update the weights with optimizer.step().

Finally, we add up the losses (and accuracies) of all mini-batches and divide the sum by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss (and accuracy) for that epoch.

The procedure we follow for validation is exactly the same as for training, except that we wrap it in torch.no_grad() and do not perform any back-propagation. torch.no_grad() tells PyTorch not to track gradients, which reduces memory usage and speeds up computation.
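What torch.no_grad() changes can be checked on a single layer. The sketch below uses a throwaway nn.Linear and random input, both made up for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
x = torch.randn(5, 3)

tracked = layer(x)               # records a graph for back-propagation
with torch.no_grad():
    untracked = layer(x)         # same values, no graph recorded

values_match = torch.allclose(tracked, untracked)
```

The outputs are numerically identical, but only the first one carries the bookkeeping needed for backward(). During validation we never call backward(), so skipping that bookkeeping is pure savings.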

Visualize Loss and Accuracy

To plot the loss and accuracy line plots, we again create a dataframe from the accuracy_stats and loss_stats dictionaries.

# Create dataframes
train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

# Plot the dataframes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,7))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')