PyTorch layer dimensions: What sizes should they be and why?


Lesson 3: Fully connected (torch.nn.Linear) layers

Documentation for Linear layers.

"""
Class

torch.nn.Linear(in_features, out_features, bias=True)
Parameters
in_features – size of each input sample
out_features – size of each output sample
"""

Do not be confused: “in_features” and “in_channels” may look similar, but they are completely different arguments:

# Asks for in_channels, out_channels, kernel_size, etc
self.conv1 = nn.Conv2d(1, 20, 3)
# Asks for in_features, out_features
self.fc1 = nn.Linear(2048, 10)
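They also expect differently shaped inputs. Here's an illustrative sketch (the batch of 4 and the image size are just example numbers): Conv2d consumes [batch_size, channels, height, width], while Linear consumes [batch_size, num_features].

import torch

# Illustrative sketch: Conv2d reads [batch, channels, height, width],
# while Linear reads [batch, features]. Sizes here are arbitrary examples.
conv1 = torch.nn.Conv2d(1, 20, 3)   # in_channels=1, out_channels=20, kernel_size=3
fc1 = torch.nn.Linear(2048, 10)     # in_features=2048, out_features=10

x_conv = torch.randn(4, 1, 28, 28)  # a batch of 4 single-channel 28 x 28 images
print(conv1(x_conv).shape)          # torch.Size([4, 20, 26, 26])

x_fc = torch.randn(4, 2048)         # a batch of 4 feature vectors
print(fc1(x_fc).shape)              # torch.Size([4, 10])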

Calculate the dimensions.

There are two important arguments for any network of nn.Linear layers, no matter how many layers you have: the very first argument and the very last argument. It doesn’t matter how many fully connected layers you have in between; those dimensions are easy, as you’ll soon see.

If you want to pass your 28 x 28 image into a linear layer, you have to know two things:

  1. Your 28 x 28 pixel image can’t be input as a [28, 28] tensor. This is because nn.Linear will read it as 28 batches of 28-feature-length vectors, since it expects an input of [batch_size, num_features]. You have to flatten it with .view() (see below, and the sketch after this list).
  2. Your batch size passes unchanged through all your layers. No matter how your data changes as it passes through the network, your first dimension remains your batch_size, even if you never see that number explicitly written anywhere in the network.
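Here is a minimal sketch of both points (the layer sizes are chosen only to make the shapes line up):

import torch

# A minimal sketch of both points. nn.Linear treats the first dimension as the
# batch dimension, so a raw [28, 28] tensor is read as 28 separate samples.
fc = torch.nn.Linear(28, 10)
image = torch.randn(28, 28)
print(fc(image).shape)             # torch.Size([28, 10]) -- 28 "samples", not 1 image

# Flatten first so the layer sees one sample with 784 features; the batch
# dimension (1) passes through unchanged.
fc = torch.nn.Linear(784, 10)
print(fc(image.view(1, -1)).shape) # torch.Size([1, 10])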

Use .view() to change your tensor’s dimensions.

image = image.view(batch_size, -1)

You supply your batch_size as the first number, and “-1” basically tells PyTorch, “you figure this number out for me… please.” Your tensor will now feed properly into any linear layer. Now we’re talking!
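The same trick works for a whole batch. A quick sketch (the batch size of 64 here is just an example):

import torch

# PyTorch infers the -1 dimension: 1 * 28 * 28 = 784 features per image.
batch = torch.randn(64, 1, 28, 28)  # 64 single-channel 28 x 28 images
flat = batch.view(64, -1)
print(flat.shape)                   # torch.Size([64, 784])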

So then, to set the very first argument of your linear layer, pass in the number of input features. For the 28 x 28 example, the reshaped tensor has size [1, 784]:

batch_size = 1
# Simulate a 28 x 28 pixel "image".
input = torch.randn(28, 28)
# Use view to get [batch_size, num_features].
input = input.view(batch_size, -1) # torch.Size([1, 784])
# Initialize the linear layer.
fc = torch.nn.Linear(784, 10)
# Pass in the simulated image to the layer.
output = fc(input)
print(output.shape)
>>> torch.Size([1, 10])

The second argument of a linear layer, if you’re passing its output on to more layers, is called H, for hidden size. You just play positional ping-pong with H, making it the last argument of the previous layer and the first argument of the next, like this:

"""The in-between dimensions are the hidden layer dimensions, you just pass in the last of the previous as the first of the next."""fc1 = torch.nn.Linear(in_feature, 100) # 100 is last
fc2 = torch.nn.Linear(100, 50) # 100 is first. 50 is last
fc3 = torch.nn.Linear(50, 20) # 50 is first. 20 is last
fc4 = torch.nn.Linear(20, 10) # 20 is first.
"""This is the same pattern for convolutional layers as well, only it's channels, and not features that get passed along"""

The very last output, a.k.a. your output layer, depends on your model and your loss function. If you have 10 classes, as in MNIST, and you’re doing a classification problem, you want your network architecture to eventually consolidate into those final 10 units so that you can determine which of the 10 classes your input is predicting. The last layer is dependent on what you want to infer from your data. The operations you can do to get the answer you need are a topic for another article, because there is a lot to cover. But now you have the basics covered.
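For a taste of what that looks like, here is a minimal sketch (not from the original article, and skipping loss functions entirely): one common way to read a prediction out of those final 10 units is argmax over the class dimension.

import torch

# Minimal sketch: pick the predicted class from the final 10 outputs.
# fc and input reuse the shapes from the earlier 28 x 28 example.
fc = torch.nn.Linear(784, 10)
input = torch.randn(1, 784)
logits = fc(input)          # torch.Size([1, 10])
pred = logits.argmax(dim=1) # index of the highest-scoring class
print(pred.shape)           # torch.Size([1])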