Original article was published by /u/yasserius on Deep Learning
The maximum batch size is limited by the available GPU/TPU memory.
Experts also recommend keeping batch sizes at powers of 2, i.e. 8, 16, 32, etc.
There are useful threads with partial answers, e.g.

    max batch size = available GPU memory in bytes / 4 / (size of activation tensors + number of trainable parameters)

(the 4 presumably being bytes per float32 value), but I could not find a full, step-by-step calculation of how to get the best value for a given input image resolution.
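To make that rule of thumb concrete, here is how I read it as code. This is a rough sketch only: the function name is mine, and the activation and parameter counts in the example are placeholder guesses, not real ResNet50 figures.

```python
def max_batch_size(memory_bytes, activation_values_per_sample,
                   trainable_params, bytes_per_value=4):
    """Rule-of-thumb estimate: memory / bytes_per_value /
    (per-sample activation values + parameter count)."""
    return memory_bytes // (bytes_per_value *
                            (activation_values_per_sample + trainable_params))

# Illustrative numbers only: 8 GB of memory, ~25.6M parameters,
# and a guessed 50M activation values per sample.
print(max_batch_size(8 * 2**30, 50_000_000, 25_600_000))  # → 28
```

Is this roughly the right way to apply the formula, or am I misreading what "size of tensors" means?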
Say we have a Colab TPU with 8 GB of RAM.
And we are training a ResNet50 model on MNIST.
MNIST has 60,000 images, natively 28×28, which we resize up to 128×128 for ResNet50.
Then how to estimate the optimum batch size?
Please show every step of the calculation.
Also, let's break it down into sub-problems:
If we take the Keras ResNet50 model, how much memory do its weights occupy on the GPU/TPU? I guess this depends on the tensor data type, i.e. 16-, 32-, or 64-bit floats.
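Here is my back-of-envelope attempt at this sub-problem. I'm assuming roughly 25.6M trainable parameters, which is about what Keras' model.summary() reports for ResNet50 with the classification head, and float32 weights; the 4x factor for Adam (weights + gradients + two moment estimates) is my assumption, so please correct it if it's off:

```python
params = 25_600_000      # approx. trainable parameters in Keras ResNet50
bytes_fp32 = 4           # float32 = 4 bytes per weight
weights_mb = params * bytes_fp32 / 2**20

# Training also keeps gradients, and Adam keeps two moment estimates
# per weight, so persistent training state is roughly 4x the weights:
train_state_mb = 4 * weights_mb

print(f"weights: {weights_mb:.0f} MB, with Adam state: {train_state_mb:.0f} MB")
```

So the weights alone look small next to 8 GB, which makes me think the activations must be the real constraint.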
What other things in the program take up considerable memory on the TPU?
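My rough understanding is that the intermediate activations, which have to be kept around for backprop, are the biggest other consumer, along with the input batch itself and framework overhead. A sketch with illustrative numbers: the 7×7/stride-2, 64-filter first conv is the standard ResNet50 stem, but the rest of the network would add much more, and the batch size of 32 is just an example.

```python
batch, h, w, c = 32, 128, 128, 3  # MNIST replicated to 3 channels for ResNet50
bytes_fp32 = 4

# The input batch itself:
input_mb = batch * h * w * c * bytes_fp32 / 2**20

# Activations of just the first conv (64 filters, stride 2 halves H and W):
conv1_mb = batch * (h // 2) * (w // 2) * 64 * bytes_fp32 / 2**20

print(f"input batch: {input_mb:.0f} MB, first conv activations: {conv1_mb:.0f} MB")
```

Is summing terms like this over every layer the right way to account for activation memory?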
Thanks in advance!