Source: Deep Learning on Medium

This blog is a continuation of the earlier one, published as **Intro to Deep Learning with pytorch _ part1**.

#### Course outline:

This course comes with 8 lessons and one lab. The 8 lessons are:

1. **Introduction to neural networks**: you will learn the concepts behind deep learning and how deep neural networks are trained with backpropagation.
2. **Talking PyTorch with Soumith Chintala**: Soumith Chintala, the creator of PyTorch, talks about the past, present and future of PyTorch.
3. **Introduction to PyTorch**: you will learn how to build deep neural networks with PyTorch and build a state-of-the-art model, using pre-trained networks, that classifies dog and cat images.
4. **Convolutional neural networks**
5. **Style transfer**: you will use trained networks to transfer the style of one image to another and implement a style transfer model.
6. **Recurrent neural networks**: you will learn how recurrent neural networks learn from sequences of data such as time series, and build a recurrent neural network that learns from text and generates new text one character at a time.
7. **Sentiment prediction with RNNs**: you will build and train a recurrent network that can classify the sentiment of movie reviews.
8. **Deploying PyTorch models**: you will learn how to use PyTorch’s hybrid frontend to convert models from PyTorch to C++ for use in production.

In the earlier blog we went through **lesson 1: Introduction to neural networks**, where you were introduced to several concepts: linear boundaries, higher dimensions, perceptrons, neural networks, perceptrons as logical operators, the perceptron algorithm, error functions, discrete vs. continuous predictions, the softmax function, one-hot encoding, cross-entropy, multi-class cross-entropy, perceptron vs. gradient descent, neural network architecture, feedforward and backpropagation.

In this blog we will cover lesson 2, Talking PyTorch with Soumith Chintala, and lesson 3, Intro to PyTorch.

### Lesson-2 Talking PyTorch with Soumith Chintala

**Soumith Chintala** is the creator of **PyTorch**. You can follow him on **Twitter**.

In this lesson he talks about PyTorch: how it originated, what its story is, what its applications and implementations are, and where it is headed. The topics covered are:

- **Origins of PyTorch**
- **Debugging and Designing of PyTorch**
- **From Research to Production**
- **Cutting-edge Applications in PyTorch**
- **PyTorch and the Facebook Product**
- **The Future of PyTorch**

Now let’s look at lesson 3, Intro to PyTorch.

### Lesson-3 Intro to PyTorch

I’ll first give you a basic introduction to PyTorch, where we’ll cover **tensors** — the main data structure of PyTorch. I’ll show you how to create tensors, how to do simple operations, and how tensors interact with NumPy.

Then you’ll learn about a module called **autograd** that PyTorch uses to calculate gradients for training neural networks. Autograd, in my opinion, is amazing. It does all the work of backpropagation for you by calculating the gradients at each operation in the network which you can then use to update the network weights.

Next you’ll use PyTorch to build a network and run data forward through it. After that, you’ll define a loss and an optimization method to train the neural network on a dataset of handwritten digits. You’ll also learn how to test that your network is able to generalize through **validation**.

However, you’ll find that your network doesn’t work too well with more complex images. You’ll learn how to use pre-trained networks to improve the performance of your classifier, a technique known as **transfer learning**.

PyTorch is a framework for building and training neural networks. In a lot of ways, PyTorch tensors behave like the arrays you love from NumPy; NumPy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks.

### Neural Networks

Deep Learning is based on artificial neural networks which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply “neurons.” Each unit has some number of weighted inputs. These weighted inputs are summed together (a linear combination) then passed through an activation function to get the unit’s output.

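Mathematically, this looks like:

$$
y = f\left(\sum_i w_i x_i + b\right)
$$

where $f$ is the activation function, $x_i$ are the inputs, $w_i$ are the weights and $b$ is the bias. With vectors, the weighted sum is the dot/inner product of two vectors:

$$
h = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}
$$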

### Tensors

It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (RGB color images, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.
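To make the terminology a bit more concrete, here is a small sketch that builds a vector, a matrix and a 3-index tensor, and then passes data back and forth with NumPy. The variable names and shapes are made up for illustration only; the one thing worth remembering is that `torch.from_numpy` and `Tensor.numpy()` share memory with the source array on the CPU.

```python
import numpy as np
import torch

vector = torch.tensor([1.0, 2.0, 3.0])   # 1-dimensional tensor
matrix = torch.ones(2, 3)                # 2-dimensional tensor
image = torch.zeros(3, 28, 28)           # 3-dimensional tensor (channels, height, width)

print(vector.dim(), matrix.dim(), image.dim())

# NumPy bridge: from_numpy() and .numpy() share the underlying memory on CPU
array = np.arange(6, dtype=np.float32).reshape(2, 3)
tensor = torch.from_numpy(array)
tensor.mul_(2)                           # the in-place change is visible in `array` too
back = tensor.numpy()
```

Because the memory is shared, in-place operations on the tensor also change the NumPy array, which is convenient but easy to forget.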

import torch  # import the PyTorch library

def activation(x):  # sigmoid activation function, introduced in part 1

    return 1 / (1 + torch.exp(-x))

### Generating some data

torch.manual_seed(7)  # set the random seed so results are reproducible

features = torch.randn((1, 5))  # 5 features drawn from a standard normal distribution

weights = torch.randn_like(features)  # weights with the same shape as the features

bias = torch.randn((1, 1))  # bias term

Applying the activation function to the weighted sum of the features plus the bias gives the output:

y = activation(torch.sum(features * weights) + bias)

You can also do the multiplication and sum in the same operation using a matrix multiplication. However, `features` and `weights` both have the same shape, `(1, 5)`, and a matrix multiplication needs the number of columns in the first tensor to equal the number of rows in the second. This means we need to change the shape of `weights` to get the matrix multiplication to work.

Using the `.view()` method, we can reshape `weights` to have five rows and one column with `weights.view(5, 1)`.

y = activation(torch.mm(features, weights.view(5,1)) + bias)
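As a quick sanity check, the sketch below computes the same neuron output in three ways and confirms they agree. It uses the built-in `torch.sigmoid` in place of the hand-written `activation` function, and the transpose variant with `weights.t()` is an extra option that is not part of the walkthrough above.

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# Element-wise multiply, then sum; adding the (1, 1) bias gives a (1, 1) result
y_sum = torch.sigmoid(torch.sum(features * weights) + bias)

# Matrix multiplication: (1, 5) @ (5, 1) -> (1, 1)
y_mm = torch.sigmoid(torch.mm(features, weights.view(5, 1)) + bias)

# Transposing the (1, 5) weights also yields a (5, 1) tensor
y_t = torch.sigmoid(features @ weights.t() + bias)

print(torch.allclose(y_sum, y_mm), torch.allclose(y_mm, y_t))
```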

That’s how you can calculate the output for a single neuron. The real power of this algorithm happens when you start stacking these individual units into layers and stacks of layers, into a network of neurons.

torch.manual_seed(7) # Setting the random seed

# Features are 3 random normal variables

features = torch.randn((1, 3))

# Define the size of each layer in our network

n_input = features.shape[1] # input units

n_hidden = 2 # hidden units

n_output = 1 # output units

W1 = torch.randn(n_input, n_hidden) # Weights from inputs to hidden

W2 = torch.randn(n_hidden, n_output) # Weights from hidden to output

B1 = torch.randn((1, n_hidden)) # bias for the hidden layer

B2 = torch.randn((1, n_output)) # bias for the output layer

Applying the activation function at each layer of this multilayer network gives:

h = activation(torch.mm(features, W1) + B1)

output = activation(torch.mm(h, W2) + B2)

print(output)

I got the output `tensor([[ 0.3171]])`.

### Neural networks with PyTorch

Deep learning networks tend to be massive, with dozens or hundreds of layers; that’s where the term “deep” comes from. PyTorch has a module, `nn`, that provides a nice way to efficiently build large neural networks.

# Importing necessary packages

%matplotlib inline

%config InlineBackend.figure_format = 'retina'

import numpy as np

import torch

import helper

import matplotlib.pyplot as plt

Now we’re going to build a larger network that can solve a (formerly) difficult problem: identifying text in an image. Here we’ll use the MNIST dataset, which consists of greyscale handwritten digits. Each image is 28×28 pixels; a sample is plotted further below.

Our goal is to build a neural network that can take one of these images and predict the digit in the image.

We can get our data through the `torchvision` package.

from torchvision import datasets, transforms

# Defining a transform to normalize the data

transform = transforms.Compose([transforms.ToTensor(),

transforms.Normalize((0.5,), (0.5,)),

])

# Download and load the training data

trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

We have the training data loaded into `trainloader`, and we make that an iterator with `iter(trainloader)`. Later, we’ll use this to loop through the dataset for training, like `for image, label in trainloader:`.

dataiter = iter(trainloader)  # iterator over batches

images, labels = next(dataiter)  # the built-in next() works across PyTorch versions

print(type(images))

print(images.shape)

print(labels.shape)

*output*:

<class 'torch.Tensor'>

torch.Size([64, 1, 28, 28])

torch.Size([64])

So we have 64 images per batch, 1 color channel, and 28×28 pixels per image. This is what one image looks like:

plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r');

First, let’s try to build a simple network for this dataset using weight matrices and matrix multiplications. Then, we’ll see how to do it using PyTorch’s `nn` module.

Our images are 28×28 2D tensors, so we need to convert them into 1D vectors. Thinking about sizes, we need to convert the batch of images with shape `(64, 1, 28, 28)` to have a shape of `(64, 784)`, where 784 is 28 times 28. This is typically called *flattening*: we flatten the 2D images into 1D vectors.

def activation(x):

    return 1/(1+torch.exp(-x))

# Flatten the input images

inputs = images.view(images.shape[0], -1)

# Create parameters

w1 = torch.randn(784, 256)

b1 = torch.randn(256)

w2 = torch.randn(256, 10)

b2 = torch.randn(10)

h = activation(torch.mm(inputs, w1) + b1)

out = torch.mm(h, w2) + b2

Now we have 10 outputs for our network. We want to pass in an image to our network and get out a probability distribution over the classes that tells us the likely class(es) the image belongs to.

For an untrained network, the probability for each class is roughly the same. The network hasn’t seen any data yet, so it just returns something close to a uniform distribution with equal probabilities for each class.

To calculate this probability distribution, we often use the **softmax** function, which squishes each input $x_i$ between 0 and 1 and normalizes the values so that they sum to one.
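For reference, softmax is defined as $\sigma(x_i) = e^{x_i} / \sum_k e^{x_k}$. Below is a minimal sketch of a row-wise softmax, assuming a 2D tensor of shape `(batch, classes)`. The `scores` tensor is random stand-in data rather than the real network output, and subtracting the row maximum is a common numerical-stability trick that I added; it does not change the result.

```python
import torch

def softmax(x):
    """Row-wise softmax for a 2D tensor of shape (batch, classes)."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp()
    shifted = x - x.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)

torch.manual_seed(0)
scores = torch.randn(64, 10)          # stand-in for the raw network output `out`
probabilities = softmax(scores)

print(probabilities.shape)            # torch.Size([64, 10])
print(probabilities.sum(dim=1)[:3])   # each row sums to approximately 1
```

In practice you would normally reach for the built-in `torch.softmax(scores, dim=1)`, which should give the same values.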

### Building networks with PyTorch

PyTorch provides a module, `nn`, that makes building networks much simpler.

from torch import nn

class Network(nn.Module):

    def __init__(self):

        super().__init__()

        # Inputs to hidden layer linear transformation

        self.hidden = nn.Linear(784, 256)

        # Output layer, 10 units - one for each digit

        self.output = nn.Linear(256, 10)

        self.sigmoid = nn.Sigmoid() # sigmoid activation

        self.softmax = nn.Softmax(dim=1) # softmax output

    def forward(self, x):

        # Pass the input tensor through each of our operations

        x = self.hidden(x)

        x = self.sigmoid(x)

        x = self.output(x)

        x = self.softmax(x)

        return x

# Create the network and look at its text representation

model = Network()

model

You can define the network somewhat more concisely and clearly using the `torch.nn.functional` module.

import torch.nn.functional as F

class Network(nn.Module):

    def __init__(self):

        super().__init__()

        # Inputs to hidden layer linear transformation

        self.hidden = nn.Linear(784, 256)

        # Output layer, 10 units - one for each digit

        self.output = nn.Linear(256, 10)

    def forward(self, x):

        # Hidden layer with sigmoid activation (torch.sigmoid replaces the deprecated F.sigmoid)

        x = torch.sigmoid(self.hidden(x))

        # Output layer with softmax activation

        x = F.softmax(self.output(x), dim=1)

        return x

### Activation functions

So far we’ve only been looking at the sigmoid and softmax activations, but in general any function can be used as an activation function. The only requirement is that for a network to approximate a non-linear function, the activation functions must be non-linear. Two more common activation functions are Tanh (hyperbolic tangent) and ReLU (rectified linear unit).

In practice, ReLU is the one used most often for hidden layers.
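To get a feel for how these functions differ, here is a tiny sketch that applies sigmoid, tanh and ReLU to the same handful of values. The input values are arbitrary and chosen only for illustration.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

sigmoid_out = torch.sigmoid(x)  # squashes values into (0, 1)
tanh_out = torch.tanh(x)        # squashes values into (-1, 1)
relu_out = torch.relu(x)        # negative inputs become 0, positive inputs pass through

print(sigmoid_out)
print(tanh_out)
print(relu_out)
```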

#### On your own:

Now let’s create a network with 784 input units, a hidden layer with 128 units and a ReLU activation, then a hidden layer with 64 units and a ReLU activation, and finally an output layer with a softmax activation as shown above. You can use a ReLU activation with the `nn.ReLU` module or the `F.relu` function.

class Network(nn.Module):

    def __init__(self):

        super().__init__()

        # Defining the layers, 128, 64, 10 units each

        self.fc1 = nn.Linear(784, 128)

        self.fc2 = nn.Linear(128, 64)

        # Output layer, 10 units - one for each digit

        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):

        x = self.fc1(x)

        x = F.relu(x)

        x = self.fc2(x)

        x = F.relu(x)

        x = self.fc3(x)

        x = F.softmax(x, dim=1)

        return x

model = Network()

model

The weights and biases are tensors attached to the layer you defined; you can get them with `model.fc1.weight`, for instance.

print(model.fc1.weight)

print(model.fc1.bias)

This gives:

Parameter containing:

tensor([[-2.3278e-02, -1.2170e-03, -1.1882e-02, ..., 3.3567e-02,

4.4827e-03, 1.4840e-02],

[ 4.8464e-03, 1.9844e-02, 3.9791e-03, ..., -2.6048e-02,

-3.5558e-02, -2.2386e-02],

[-1.9664e-02, 8.1722e-03, 2.6729e-02, ..., -1.5122e-02,

2.7632e-02, -1.9567e-02],

...,

[-3.3571e-02, -2.9686e-02, -2.1387e-02, ..., 3.0770e-02,

1.0800e-02, -6.5941e-03],

[ 2.9749e-02, 1.2849e-02, 2.7320e-02, ..., -1.9899e-02,

2.7131e-02, 2.2082e-02],

[ 1.3992e-02, -2.1520e-02, 3.1907e-02, ..., 2.2435e-02,

1.1370e-02, 2.1568e-02]])

Parameter containing:

tensor(1.00000e-02 *

[-1.3222, 2.4094, -2.1571, 3.2237, 2.5302, -1.1515, 2.6382,

-2.3426, -3.5689, -1.0724, -2.8842, -2.9667, -0.5022, 1.1381,

1.2849, 3.0731, -2.0207, -2.3282, 0.3168, -2.8098, -1.0740,

-1.8273, 1.8692, 2.9404, 0.1783, 0.9391, -0.7085, -1.2522,

-2.7769, 0.0916, -1.4283, -0.3267, -1.6876, -1.8580, -2.8724,

-3.5512, 3.2155, 1.5532, 0.8836, -1.2911, 1.5735, -3.0478,

-1.3089, -2.2117, 1.5162, -0.8055, -1.3307, -2.4267, -1.2665,

0.8666, -2.2325, -0.4797, -0.5448, -0.6612, -0.6022, 2.6399,

1.4673, -1.5417, -2.9492, -2.7507, 0.6157, -0.0681, -0.8171,

-0.3554, -0.8225, 3.3906, 3.3509, -1.4484, 3.5124, -2.6519,

0.9721, -2.5068, -3.4962, 3.4743, 1.1525, -2.7555, -3.1673,

2.2906, 2.5914, 1.5992, -1.2859, -0.5682, 2.1488, -2.0631,

2.6281, -2.4639, 2.2622, 2.3632, -0.1979, 0.7160, 1.7594,

0.0761, -2.8886, -3.5467, 2.7691, 0.8280, -2.2398, -1.4602,

-1.3475, -1.4738, 0.6338, 3.2811, -3.0628, 2.7044, 1.2775,

2.8856, -3.3938, 2.7056, 0.5826, -0.6286, 1.2381, 0.7316,

-2.4725, -1.2958, -3.1543, -0.8584, 0.5517, 2.8176, 0.0947,

-1.6849, -1.4968, 3.1039, 1.7680, 1.1803, -1.4402, 2.5710,

-3.3057, 1.9027])

These are actually `nn.Parameter` objects tracked by autograd (called *Variables* in older PyTorch versions), so to modify them in place we get the underlying tensors with `model.fc1.weight.data`. Once we have the tensors, we can fill them with zeros (for biases) or random normal values.

# Set biases to all zeros

model.fc1.bias.data.fill_(0)

# sample from random normal with standard dev = 0.01

model.fc1.weight.data.normal_(std=0.01)

### Forward pass

Now that we have a network, let’s see what happens when we pass in an image.

# Grab some data

dataiter = iter(trainloader)

images, labels = next(dataiter)

# Flatten each image into a 784-long vector; new shape is (batch size, color channels, image pixels)

images.resize_(64, 1, 784)

# or images.resize_(images.shape[0], 1, 784) to automatically get batch size

# Forward pass through the network

img_idx = 0

ps = model(images[img_idx, :])

img = images[img_idx]

helper.view_classify(img.view(1, 28, 28), ps)

The resulting plot shows that our network has basically no idea what this digit is. That’s because we haven’t trained it yet; all the weights are random!

### Training Neural Networks

The network we built in the previous part isn’t so smart, it doesn’t know anything about our handwritten digits. Neural networks with non-linear activations work like universal function approximators. There is some function that maps your input to the output. For example, images of handwritten digits to class probabilities. The power of neural networks is that we can train them to approximate this function, and basically any function given enough data and compute time.

We train the network by showing it examples of real data, then adjusting the network parameters so that it approximates this function. To find these parameters, we need to know how poorly the network is predicting the real outputs. For this we calculate a **loss function** (also called the cost), a measure of our prediction error. For example, the mean squared loss is often used in regression and binary classification problems.
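As a small illustration of a loss function, the sketch below computes the mean squared loss by hand and with PyTorch’s `nn.MSELoss`. The prediction and target values are invented for the example.

```python
import torch
from torch import nn

predictions = torch.tensor([2.5, 0.0, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

# Mean squared error by hand: average of the squared differences
manual_mse = ((predictions - targets) ** 2).mean()

# The same loss through PyTorch's nn module
mse_criterion = nn.MSELoss()
mse_loss = mse_criterion(predictions, targets)

print(manual_mse.item(), mse_loss.item())  # both 0.375
```

The squared differences here are 0.25, 0.25, 0 and 1, so the mean is 1.5 / 4 = 0.375.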

### Backpropagation

For single-layer networks, gradient descent is straightforward to implement. However, it’s more complicated for deeper, multilayer neural networks like the one we’ve built. Training multilayer networks is done through **backpropagation**.

Mathematically, this is really just calculating the gradient of the loss with respect to the weights using the chain rule.
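To see the chain rule at work, here is a rough sketch for a single sigmoid neuron with a squared-error loss. It works out the gradient by hand as dL/dw = dL/dy · dy/dz · dz/dw and compares it with what PyTorch’s autograd (covered later in this post) computes. The inputs, target and loss are assumptions chosen just for this example.

```python
import torch

torch.manual_seed(0)
x = torch.randn(3)                       # inputs
t = torch.tensor(1.0)                    # target value
w = torch.randn(3, requires_grad=True)   # weights we want gradients for
b = torch.zeros(1, requires_grad=True)   # bias

z = torch.dot(w, x) + b                  # linear combination
y = torch.sigmoid(z)                     # activation
loss = ((y - t) ** 2).sum()              # squared error

loss.backward()                          # autograd applies the chain rule for us

# Chain rule by hand: dL/dy = 2(y - t), dy/dz = y(1 - y), dz/dw = x
with torch.no_grad():
    manual_grad = 2 * (y - t) * y * (1 - y) * x

print(torch.allclose(w.grad, manual_grad))
```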

### Losses in PyTorch

Let’s start by seeing how we calculate the loss with PyTorch. Through the `nn` module, PyTorch provides losses such as the cross-entropy loss (`nn.CrossEntropyLoss`). You’ll usually see the loss assigned to `criterion`. As noted in the last part, with a classification problem such as MNIST, we’re using the softmax function to predict class probabilities. With a softmax output, you want to use cross-entropy as the loss. To actually calculate the loss, you first define the criterion, then pass in the output of your network and the correct labels.

There is something really important to note here. The documentation for `nn.CrossEntropyLoss` says:

> This criterion combines `nn.LogSoftmax()` and `nn.NLLLoss()` in one single class.
>
> The input is expected to contain scores for each class.

This means we need to pass the raw output of our network into the loss, not the output of the softmax function. This raw output is usually called the *logits* or *scores*.
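The following sketch checks that claim on random data: cross-entropy on the raw scores should match log-softmax followed by the negative log-likelihood loss. The `logits` and `labels` tensors here are made up and stand in for a real batch.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 examples and 10 classes
labels = torch.tensor([3, 0, 9, 1])    # made-up correct classes

# Option 1: cross-entropy directly on the raw scores
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# Option 2: log-softmax followed by negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll_loss = nn.NLLLoss()(log_probs, labels)

print(ce_loss.item(), nll_loss.item())
```

Both numbers should agree up to floating-point rounding, which is why a model ending in `nn.LogSoftmax` is paired with `nn.NLLLoss`, while a model returning raw scores is paired with `nn.CrossEntropyLoss`.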

import torch

from torch import nn

import torch.nn.functional as F

from torchvision import datasets, transforms

# Define a transform to normalize the data

# MNIST images have a single channel, so the mean and std each take one value

transform = transforms.Compose([transforms.ToTensor(),

transforms.Normalize((0.5,), (0.5,)),])

# Download and load the training data

trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Build a feed-forward network

model = nn.Sequential(nn.Linear(784, 128),

nn.ReLU(),

nn.Linear(128, 64),

nn.ReLU(),

nn.Linear(64, 10))

# Define the loss

criterion = nn.CrossEntropyLoss()

# Get our data

images, labels = next(iter(trainloader))

# Flatten images

images = images.view(images.shape[0], -1)

# Forward pass, get our logits

logits = model(images)

# Calculate the loss with the logits and the labels

loss = criterion(logits, labels)

print(loss)

*output*: tensor(2.2810)

### Autograd

Now that we know how to calculate a loss, how do we use it to perform backpropagation? Torch provides a module, `autograd`, for automatically calculating the gradients of tensors.

x = torch.randn(2,2, requires_grad=True)

print(x)

*output*: tensor([[ 0.7652, -1.4550], [-1.2232, 0.1810]])

y = x**2

print(y)

*output*: tensor([[ 0.5856, 2.1170], [ 1.4962, 0.0328]])
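To actually get gradients out of autograd, you reduce the result to a scalar and call `.backward()` on it. The sketch below continues the idea with a mean over `y`; the name `z` and the choice of a mean are my own additions for illustration. Since z = (1/4) Σ x², the gradient with respect to `x` should be `x / 2`.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = x ** 2
z = y.mean()          # z = (1/4) * sum of x_ij squared

print(x.grad)         # None: no backward pass has run yet
z.backward()          # computes dz/dx and stores it in x.grad
print(x.grad)

# Analytically, dz/dx = x / 2
expected = (x / 2).detach()
print(torch.allclose(x.grad, expected))
```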

### Training the network!

There’s one last piece we need to start training: an optimizer that we’ll use to update the weights with the gradients. We get these from PyTorch’s `optim` package. For example, we can use stochastic gradient descent with `optim.SGD`.

from torch import optim

# Optimizers require the parameters to optimize and a learning rate

optimizer = optim.SGD(model.parameters(), lr=0.01)

print('Initial weights - ', model[0].weight)

images, labels = next(iter(trainloader))

images.resize_(64, 784)

# Clear the gradients, do this because gradients are accumulated

optimizer.zero_grad()

# Forward pass, then backward pass, then update weights

output = model(images)

loss = criterion(output, labels)

loss.backward()

print('Gradient -', model[0].weight.grad)

*output*:

Initial weights - Parameter containing:

tensor([[ 3.5691e-02, 2.1438e-02, 2.2862e-02, ..., -1.3882e-02,

-2.3719e-02, -4.6573e-03],

[-3.2397e-03, 3.5117e-03, -1.5220e-03, ..., 1.4400e-02,

2.8463e-03, 2.5381e-03],

[ 5.6122e-03, 4.8693e-03, -3.4507e-02, ..., -2.8224e-02,

-1.2907e-02, -1.5818e-02],

...,

[-1.4372e-02, 2.3948e-02, 2.8374e-02, ..., -1.5817e-02,

3.2719e-02, 8.5537e-03],

[-1.1999e-02, 1.9462e-02, 1.3998e-02, ..., -2.0170e-03,

1.4254e-02, 2.2238e-02],

[ 3.9955e-04, 4.8263e-03, -2.1819e-02, ..., 1.2959e-02,

-4.4880e-03, 1.4609e-02]])

Gradient - tensor(1.00000e-02 *

[[-0.2609, -0.2609, -0.2609, ..., -0.2609, -0.2609, -0.2609],

[-0.0695, -0.0695, -0.0695, ..., -0.0695, -0.0695, -0.0695],

[ 0.0514, 0.0514, 0.0514, ..., 0.0514, 0.0514, 0.0514],

...,

[ 0.0967, 0.0967, 0.0967, ..., 0.0967, 0.0967, 0.0967],

[-0.1878, -0.1878, -0.1878, ..., -0.1878, -0.1878, -0.1878],

[ 0.0281, 0.0281, 0.0281, ..., 0.0281, 0.0281, 0.0281]])
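Once the gradients are available, calling `optimizer.step()` is what actually changes the weights. Here is a minimal sketch on random stand-in data (not MNIST) with a smaller, hypothetical model, showing that for plain SGD the update is simply `w - lr * grad`.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
lr = 0.01
optimizer = optim.SGD(model.parameters(), lr=lr)

# Random stand-ins for a batch of flattened images and their labels
images = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

optimizer.zero_grad()                      # clear gradients from any previous step
loss = criterion(model(images), labels)
loss.backward()                            # fill .grad on every parameter

before = model[0].weight.detach().clone()
grad = model[0].weight.grad.clone()
optimizer.step()                           # plain SGD: w <- w - lr * grad
after = model[0].weight.detach()

print(torch.allclose(after, before - lr * grad))
```

This only holds for SGD without momentum or weight decay; other optimizers apply more involved update rules.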

### Training for real

Now we’ll put this algorithm into a loop so we can go through all the images. Some nomenclature: one pass through the entire dataset is called an *epoch*. So here we’re going to loop through `trainloader` to get our training batches. For each batch, we’ll do a training pass where we calculate the loss, do a backward pass, and update the weights.

model = nn.Sequential(nn.Linear(784, 128),

nn.ReLU(),

nn.Linear(128, 64),

nn.ReLU(),

nn.Linear(64, 10),

nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

optimizer = optim.SGD(model.parameters(), lr=0.003)

epochs = 5

for e in range(epochs):

    running_loss = 0

    for images, labels in trainloader:

        # Flatten MNIST images into a 784 long vector

        images = images.view(images.shape[0], -1)

        # Training pass: clear gradients, forward pass, loss, backward pass, weight update

        optimizer.zero_grad()

        output = model(images)

        loss = criterion(output, labels)

        loss.backward()

        optimizer.step()

        running_loss += loss.item()

    # Average loss over all batches in this epoch

    print(f"Training loss: {running_loss/len(trainloader)}")

*output*:

Training loss: 1.8959971736234897

Training loss: 0.8684300759644397

Training loss: 0.537974218426864

Training loss: 0.43723612014990626

Training loss: 0.39094475933165945

Now let’s check out its predictions.

%matplotlib inline

import helper

images, labels = next(iter(trainloader))

img = images[0].view(1, 784)

# Turn off gradients to speed up this part

with torch.no_grad():

    logps = model(img)

# The network outputs log-probabilities; take the exponential to get probabilities

ps = torch.exp(logps)

helper.view_classify(img.view(1, 28, 28), ps)
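If you don’t have the `helper` module at hand, you can still read off the prediction from the probabilities. The sketch below uses made-up scores for a single image in place of a trained model’s output, converts log-probabilities back to probabilities, and picks the most likely class.

```python
import torch

# Made-up raw scores for one image over the 10 digit classes
scores = torch.tensor([[0.1, 2.0, -1.0, 0.3, 0.0, 4.0, 0.2, -0.5, 1.0, 0.7]])
logps = torch.log_softmax(scores, dim=1)   # what a LogSoftmax output layer would return

ps = torch.exp(logps)                      # back to probabilities
predicted_digit = ps.argmax(dim=1)         # index of the most likely class

print(ps.sum().item())                     # approximately 1.0
print(predicted_digit.item())              # 5, the index of the largest score
```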

I am a beginner in this myself, and your likes motivate me to write the next parts. This is the first time learning has felt this tough to me, especially deep learning. So please hit like and share this with your friends.

#### References:

The references are the ones mentioned in part 1.