Three Ways to build a Neural Network in PyTorch

TL;DR

This isn’t meant to be a PyTorch tutorial, nor an article that explains how a neural network works. Instead, I thought it would be a good idea to share some of what I’ve learned in the Udacity Bertelsmann Scholarship AI Program. With that said, the goal of this article is to illustrate a few different ways to create a neural network in PyTorch.

Prerequisites: I assume you know what a neural network is and how it works…so let’s dive in!

Introduction

Let’s say you want to define the following neural network: one input layer, two hidden layers, and one output layer, with ReLU activations in the hidden layers and a sigmoid activation for the output layer, like so:

Fully Connected (Feed Forward) Network

So this is a fully connected 16x12x10x1 neural network with ReLU activations in the hidden layers and a sigmoid activation in the output layer.

1. Manually building weights and biases

One way to approach this is by building all the blocks yourself. This is a low-level approach, but it may be well suited if you’re trying to reproduce the latest and greatest deep learning architecture from a paper you just read, or if you want to develop a customized layer. Either way, PyTorch has you covered. You’ll need to define your own weights and biases, but if you’re comfortable at that level, you’re good to go. The key thing is that you have to tell PyTorch which tensors in your network are optimizable, so that it knows how to perform gradient descent on them. Let’s look at how someone might approach this in low-level PyTorch:

Building a neural network in low level PyTorch
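
A minimal sketch of this approach, assuming a batch of 64 samples with 16 features and random initialization (the variable names here are illustrative):

import torch

# Hypothetical input: a batch of 64 samples with 16 features each
features = torch.randn(64, 16)

# Weights and biases for the 16x12x10x1 network, created with
# requires_grad=True so PyTorch knows they are optimizable
W1 = torch.randn(16, 12, requires_grad=True)
b1 = torch.randn(12, requires_grad=True)
W2 = torch.randn(12, 10, requires_grad=True)
b2 = torch.randn(10, requires_grad=True)
W3 = torch.randn(10, 1, requires_grad=True)
b3 = torch.randn(1, requires_grad=True)

# Forward pass: ReLU in the hidden layers, sigmoid at the output
h1 = torch.relu(torch.matmul(features, W1) + b1)
h2 = torch.relu(torch.matmul(h1, W2) + b2)
output = torch.sigmoid(torch.matmul(h2, W3) + b3)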

2. Extending the torch.nn.Module Class

In practice, most of us will likely use predefined layers and activation functions to train our networks. There are a couple of routes to go if you’re headed in this direction. A more elegant approach involves creating your own neural network Python class by extending the Module class from torch.nn. There are many advantages to defining a neural network this way; perhaps most notably, it allows you to inherit all of the functionality of the torch.nn module while retaining the flexibility to overwrite the default model construction and forward pass. In this approach, we will define two methods:

1. The class constructor, __init__

2. The forward method

The first is the initializer of the class and is where you’ll define the layers that compose the network. Typically we don’t need to define the activation functions here, since they can be applied in the forward pass (i.e. in the forward method), but this isn’t a rule and you can certainly do so if you want to (we’ll actually see an example at the end).

The second method is where you define the forward pass. It takes an input that represents the features the model will be trained on. Here you call the activation functions on the outputs of the layers you previously defined in the constructor. You pass the input as an argument to the first layer, and after applying the activation, that output is fed into the next layer, and so on.

Let’s take a look at how we could do this in practice:

A more elegant approach to defining a neural net in PyTorch.
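
A sketch of such a class, assuming the layer names that appear in the printed output below and activations taken from torch.nn.functional (as discussed further down):

import torch
from torch import nn
import torch.nn.functional as F

class MyNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # The layers of the 16x12x10x1 network
        self.fc1 = nn.Linear(16, 12)
        self.fc2 = nn.Linear(12, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        # Activations are applied here, in the forward pass
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

model = MyNetwork()
print(model)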

And this is the output from above:

MyNetwork(
  (fc1): Linear(in_features=16, out_features=12, bias=True)
  (fc2): Linear(in_features=12, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=1, bias=True)
)

In the example above, fc stands for fully connected layer, so fc1 represents fully connected layer 1, fc2 fully connected layer 2, and so on. Notice that when we print the model architecture, the activation functions do not appear. The reason is that we’ve used the activation functions from the torch.nn.functional module, which makes the code more compact and is well suited to this kind of approach.

3. Using torch.nn.Sequential

There is a still more compact way to define neural networks in PyTorch. This is a modular approach made possible by the torch.nn.Sequential module, and it is especially appealing if you come from a Keras background, where you define sequential layers, kind of like building something from Lego blocks. It leverages the pre-built layers and activation functions from torch.nn. Using this approach, our feed-forward network can be defined as follows:

A more compact approach.
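
A sketch of the Sequential version, matching the printed output below:

from torch import nn

model = nn.Sequential(
    nn.Linear(16, 12),
    nn.ReLU(),
    nn.Linear(12, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid()
)
print(model)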

Output:

Sequential(
  (0): Linear(in_features=16, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=10, bias=True)
  (3): ReLU()
  (4): Linear(in_features=10, out_features=1, bias=True)
  (5): Sigmoid()
)

Notice that the layers are indexed and include the activation functions. We can in fact inspect a single layer and debug the model weights by simply indexing the model object.
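
For example, a call along these lines (a sketch) prints the first Linear layer together with its weight matrix:

# Index the Sequential model to inspect a single layer and its weights
print(model[0], model[0].weight)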

Output:

Linear(in_features=16, out_features=12, bias=True) Parameter containing:
tensor([[-0.1890, -0.2464, 0.1255, 0.1371, -0.0860, 0.1147, 0.2162, 0.2164,
-0.0341, -0.2427, 0.1252, -0.1267, 0.0055, -0.2351, 0.1516, 0.0091],
[ 0.0167, 0.0783, 0.1194, 0.1763, 0.1607, 0.0041, -0.1883, -0.1407,
0.0213, 0.0674, 0.0864, 0.1367, 0.1203, -0.1143, -0.2471, -0.0009],
[-0.0500, -0.1950, -0.2270, 0.0407, 0.2090, 0.1739, 0.0055, -0.0888,
-0.1226, -0.1617, 0.1088, -0.0641, 0.0952, 0.0055, 0.1121, -0.1133],
[-0.2138, 0.1044, -0.1764, 0.1689, -0.1032, -0.0728, -0.1849, 0.1771,
0.0622, -0.0881, 0.1024, -0.0872, 0.0363, -0.2183, -0.2392, -0.0807],
[ 0.0876, -0.2130, 0.2191, -0.0753, -0.0198, 0.0565, 0.1932, -0.1412,
-0.1640, 0.0318, -0.1846, 0.0020, -0.1138, 0.2188, 0.1850, -0.2329],
[ 0.1501, 0.1809, 0.0378, -0.1194, -0.0991, -0.0848, -0.0085, 0.0384,
-0.1353, 0.0767, -0.2460, -0.1252, -0.0993, 0.1840, 0.0407, -0.1561],
[ 0.1464, -0.0153, -0.1369, 0.1616, -0.1700, -0.0877, 0.1000, 0.0953,
-0.0804, 0.1279, -0.1432, 0.1903, 0.1807, -0.0442, 0.0553, -0.0375],
[-0.1962, -0.1922, 0.1221, -0.0932, 0.0206, 0.0845, 0.1370, 0.1825,
0.1228, 0.1985, -0.2023, 0.1319, 0.0689, -0.1676, 0.0977, 0.2275],
[-0.1287, 0.2306, 0.1450, 0.2316, 0.0879, -0.0373, -0.2405, -0.0491,
-0.0185, -0.0385, 0.1891, -0.1952, -0.2433, -0.0572, 0.0555, -0.1912],
[-0.0958, 0.0692, -0.2458, -0.0730, 0.2082, 0.0005, -0.1477, 0.0229,
0.1032, 0.1871, 0.0302, 0.0664, -0.1704, 0.0197, 0.0262, -0.0398],
[-0.1210, -0.0301, -0.1284, -0.1590, -0.0594, 0.1115, 0.0256, 0.2206,
-0.2330, 0.1262, -0.0866, 0.2195, 0.1969, 0.0960, 0.0339, 0.0959],
[ 0.0263, 0.2152, -0.1841, -0.1301, -0.2202, -0.0430, 0.0739, 0.1239,
-0.1601, -0.1970, -0.1937, 0.0711, -0.0761, 0.1796, -0.1004, -0.0816]],
requires_grad=True)

This is interesting, but what if you have many different kinds of layers and activation functions? Referring to them by an index quickly becomes impractical. Luckily, you can name the layers by using the same structure and passing an OrderedDict (from the Python collections module) as an argument. That way you get the best of both worlds: you keep the order of your layers and you name them, allowing simpler, direct references to the layers.
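
A sketch of the named version, using an OrderedDict with the layer names that appear in the output below:

from collections import OrderedDict
from torch import nn

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(16, 12)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(12, 10)),
    ('relu2', nn.ReLU()),
    ('fc3', nn.Linear(10, 1)),
    ('sigmoid', nn.Sigmoid())
]))
print(model)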

Output:

Sequential(
  (fc1): Linear(in_features=16, out_features=12, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=12, out_features=10, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=10, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
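
With named layers, a single layer can now be referenced directly by its name, for example (a sketch):

# Access the second fully connected layer by name instead of by index
print(model.fc2, model.fc2.weight)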

Output:

Linear(in_features=12, out_features=10, bias=True) Parameter containing:
tensor([[ 0.0291, -0.0783, 0.1684, 0.2493, 0.1118, -0.2016, 0.0117, -0.1275,
-0.0657, -0.2506, 0.1129, 0.1639],
[ 0.1274, 0.2261, 0.1084, -0.2451, -0.1085, 0.1292, 0.0767, -0.2743,
0.1701, 0.1537, -0.0986, 0.2247],
[ 0.0317, 0.1218, 0.1436, -0.1260, 0.1407, 0.0319, -0.1934, -0.0202,
-0.1683, 0.2342, 0.0805, 0.0563],
[-0.2444, 0.2688, -0.1769, 0.2193, 0.0854, 0.1284, 0.1424, -0.2334,
0.2324, 0.1197, 0.1164, -0.1184],
[ 0.0108, 0.2051, 0.2150, 0.0853, 0.1356, 0.1136, -0.1111, 0.1389,
-0.0776, -0.0214, 0.0702, 0.1271],
[-0.0836, -0.1412, -0.0150, -0.1620, -0.0864, 0.1154, 0.0319, -0.1177,
0.1480, 0.0097, -0.2481, -0.1497],
[ 0.0131, 0.0566, 0.1700, -0.1530, -0.1209, -0.0394, 0.0070, -0.0984,
-0.0756, -0.2077, 0.1064, 0.2788],
[ 0.2825, -0.2362, 0.1566, -0.0829, -0.2318, -0.1871, 0.2284, -0.0793,
-0.2418, -0.0040, 0.2431, -0.2126],
[ 0.0218, 0.0583, 0.1573, 0.1060, 0.0322, -0.1109, 0.2153, 0.0909,
-0.0028, -0.1912, -0.0733, 0.0013],
[-0.2602, -0.2267, -0.1786, -0.2129, 0.1996, -0.2484, 0.1303, -0.0052,
-0.2715, 0.0128, -0.0752, -0.0428]], requires_grad=True)

Bonus: Mixed Approach

When creating neural networks in PyTorch, you usually choose one approach over the others, but there are times when you might prefer a mix. PyTorch is very flexible in this sense, and you can, for example, use a Sequential block inside a class-based model, like this:
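
A sketch of such a mixed model, matching the printed output below (applying the sigmoid in the forward pass is an assumption, since it does not appear inside the Sequential block):

import torch
from torch import nn

class MyNetwork2(nn.Module):
    def __init__(self):
        super().__init__()
        # A Sequential block defined inside a class-based model
        self.layers = nn.Sequential(
            nn.Linear(16, 12),
            nn.ReLU(),
            nn.Linear(12, 10),
            nn.ReLU(),
            nn.Linear(10, 1)
        )

    def forward(self, x):
        # The sigmoid is applied here rather than inside the Sequential block
        x = self.layers(x)
        return torch.sigmoid(x)

model = MyNetwork2()
print(model)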

Output:

MyNetwork2(
  (layers): Sequential(
    (0): Linear(in_features=16, out_features=12, bias=True)
    (1): ReLU()
    (2): Linear(in_features=12, out_features=10, bias=True)
    (3): ReLU()
    (4): Linear(in_features=10, out_features=1, bias=True)
  )
)

Conclusions

Although the feed-forward neural network used as the example throughout this text is simple and may not truly show the benefit of one approach over another, the main idea here was to show that there are many different ways to define a neural network in PyTorch, and hopefully you could see how powerful and, at the same time, flexible the PyTorch library is.