
# Getting started with deep learning with PyTorch

Here in this post, I will take you through basic model building, training and testing using PyTorch.

PyTorch is a deep learning framework built by Facebook.

I will take the very popular **MNIST** dataset and classify images of the digits (0–9). This is basically the “hello, world” program of deep learning.

Rather than just writing out the full code, I will take you through the important parts of the process, which are the fundamental building blocks of any neural network model:

- **Data loading**
- **Defining the model**
- **Setting the loss function and optimizer**
- **Training**

The interesting thing is that, for every one of these steps, the PyTorch library provides APIs to easily load the data and define the model, plus out-of-the-box functions for optimization and loss calculation.

Let’s quickly start with **Data Loading**

PyTorch provides the *torchvision* module, which contains popular datasets and model architectures for computer vision. We will download our MNIST dataset from torchvision. It’s as easy as writing a few lines:

```python
from torchvision.datasets import MNIST
from torchvision import transforms

# ToTensor converts the PIL images to tensors so the DataLoader can batch them later
transform = transforms.ToTensor()

trainset = MNIST(root='~/datasets/', train=True, download=True, transform=transform)
testset = MNIST(root='~/datasets/', train=False, download=True, transform=transform)
```

- `root`: where to download and store the data
- `train`: whether to load the training split or the test split
- `download`: download the data if it is not already available locally
- `transform`: applied to every image; `ToTensor()` is needed so the images come out as tensors rather than PIL images
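As a quick sanity check of what we just downloaded (the output shapes assume the `ToTensor` transform above):

```python
print(len(trainset), len(testset))  # 60000 10000
image, label = trainset[0]
print(image.shape, label)           # torch.Size([1, 28, 28]) 5
```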

**Defining model**

PyTorch’s fundamental library, *torch*, provides us with the **nn.Module** class, which we can use to define neural network layers as below.

```python
import torch.nn.functional as F
from torch import nn

class MNIST_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(3, stride=2)
        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(1, 16, (3, 3), padding=1)
        self.conv2 = nn.Conv2d(16, 32, (3, 3), padding=1)
        self.fc1 = nn.Linear(32 * 6 * 6, 240)
        self.fc2 = nn.Linear(240, 120)
        self.fc3 = nn.Linear(120, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.pool(x)                      # 28x28 -> 13x13
        x = self.pool(F.relu(self.conv2(x)))  # 13x13 -> 6x6
        x = x.view(-1, 32 * 6 * 6)            # flatten for the fully connected layers
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = MNIST_NN()  # initialize the neural model
```

*torch* has mostly self-explanatory classes and methods:

- **nn.Conv2d** is a 2D convolution layer; it takes the number of input channels, the number of output channels (filters), the kernel size, and the padding as arguments
- **nn.Linear** represents a fully connected layer; it takes the number of input nodes and output nodes as arguments
- **view** works like the **reshape** function in *numpy*
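For example, here is a minimal sketch of what `view` does to the pooled feature maps (the shapes match the model above; the random tensor is just a stand-in for real data):

```python
import torch

x = torch.randn(8, 32, 6, 6)   # a batch of 8 feature maps coming out of conv2 + pool
flat = x.view(-1, 32 * 6 * 6)  # flatten each sample into a single row
print(flat.shape)              # torch.Size([8, 1152])
```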

**Most important:** in PyTorch we just initialize the layers in the `__init__` method and define the network in the `forward` method. We don’t write any backpropagation or gradient-calculation code ourselves. Based on the network, PyTorch internally builds the backward graph and calculates the gradients, as we will shortly see while training.
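To see this autograd machinery in action on its own, here is a minimal sketch (the scalar `x` is just a toy example, not part of the MNIST model):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2    # forward pass: y = x^2
y.backward()  # PyTorch traverses the graph backwards and computes dy/dx
print(x.grad) # tensor(6.) because dy/dx = 2x = 6
```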

Before we move on to **training** the neural network model, we need a way to divide the whole dataset into mini-batches and feed them into the network. For this task, PyTorch provides the **DataLoader** class, which wraps the full dataset and yields mini-batches of the given batch size.

```python
from torch.utils.data import DataLoader

trainloader = DataLoader(trainset, batch_size=8, shuffle=True)
testloader = DataLoader(testset, batch_size=8, shuffle=True)
```

If you check the output of the loader, you will see it is an iterable that yields batches:

```python
# batch_index, (data, labels)
x_batch_idx, (x_data, x_labels) = next(enumerate(testloader))
x_batch_idx, x_data.shape, x_labels.shape
```

output:

```
(0, torch.Size([8, 1, 28, 28]), torch.Size([8]))
```

Each batch holds 8 grayscale images of size 28×28 (hence the shape [8, 1, 28, 28]) and the 8 corresponding labels.

Next is defining the **loss function** and **optimizer**. PyTorch provides the *torch.optim* library for defining the optimizer, and we can define the loss criterion with the **nn.CrossEntropyLoss** class, which suits multi-class classification problems like ours with 10 class labels.

I chose stochastic gradient descent with a learning rate of 0.01 (a smaller learning rate converges more slowly but more stably) and a momentum of 0.9.

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
```

**Training** is very intuitive, as you can see below:

```python
epochs = 2

for epoch in range(epochs):
    for batch_idx, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()              # reset gradients from the previous batch
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backpropagate to compute gradients
        optimizer.step()                   # update the parameters
```

We just define the number of epochs to train for and loop through the training process that many times.

Every time we iterate over a batch, these steps are followed in the same order to calculate gradients and update parameters:

- optimizer.zero_grad() resets all the gradients that were accumulated in the previous mini-batch run
- loss.backward() backpropagates the loss and calculates the gradient of the loss with respect to every parameter
- optimizer.step() updates the parameters using those gradients
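If you want to watch the loss go down during training, here is a minimal sketch of the same loop with running-loss tracking (the print interval of 1000 batches is my own choice):

```python
running_loss = 0.0
for epoch in range(epochs):
    for batch_idx, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 1000 == 999:  # print the average loss every 1000 batches
            print(f'epoch {epoch}, batch {batch_idx + 1}: loss {running_loss / 1000:.4f}')
            running_loss = 0.0
```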

**Saving and loading the model**

```python
import torch

torch.save(net.state_dict(), './models/mnist.pth')     # save the weights
net.load_state_dict(torch.load('./models/mnist.pth'))  # load the weights back into the model object
```
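Note that load_state_dict only fills in the weights, so you need a model object with the same architecture to load into. A minimal sketch of loading into a fresh instance (`net2` is my own name):

```python
net2 = MNIST_NN()                                      # a fresh model with random weights
net2.load_state_dict(torch.load('./models/mnist.pth'))
net2.eval()                                            # switch to evaluation mode for inference
```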

**Evaluating**

```python
correct = 0
total = 0

with torch.no_grad():  # we don't need gradient computation for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)                     # prediction
        total += labels.size(0)
        _, predicted = torch.max(outputs, dim=1)  # index of the highest score is the predicted digit
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {(correct/total):.5f}')
```

output:

```
Accuracy: 0.98650
```
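As a quick usage example, here is how you could predict a single test image (the variable names are my own):

```python
image, label = testset[0]             # one (tensor, label) pair from the test set
with torch.no_grad():
    output = net(image.unsqueeze(0))  # add a batch dimension: [1, 1, 28, 28]
    prediction = output.argmax(dim=1).item()
print(f'predicted: {prediction}, actual: {label}')
```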

Thanks for reading the article.

If you are experienced, please point out any mistakes. If you are a beginner, I hope this article is helpful to you.