Original article can be found here (source): Artificial Intelligence on Medium
Traditional normalization is the process of placing input feature values on a common scale. For image inputs that are fed forward through a network, for example, the pixel values are typically normalized to fall within the range of 0 to 1.
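As a small illustration of this kind of input normalization, the sketch below rescales raw pixel intensities to the range 0 to 1. It assumes 8-bit pixels (values from 0 to 255), and the function name `scale_pixels` is made up for this example.

```python
# Assumption: pixels are 8-bit intensities in the range 0..255.
def scale_pixels(pixels, max_value=255.0):
    """Rescale raw pixel intensities to the range [0, 1]."""
    return [p / max_value for p in pixels]


raw = [0, 51, 102, 255]
scaled = scale_pixels(raw)
print(scaled)
```

Dividing by the maximum possible intensity is only one common choice; other schemes (such as subtracting a dataset mean) serve the same purpose of putting inputs on a comparable scale.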
Batch normalization was introduced in 2015 by Sergey Ioffe and Christian Szegedy in this published paper. It was proposed as a way to speed up the training phase of deep neural networks by normalizing input values internally, within the layers of the network.
Batch normalization can be applied within a neural network layer either before or after the activation function. In either case, the values flowing through the layer are normalized over the current batch.
The batch normalization process occurs in two stages: standardization, then normalization through a scale and an offset. Here are the operations applied to the input values:
- Standardize the input values by subtracting the batch mean (which zero-centers them) and dividing by the batch standard deviation. This gives the current batch of inputs a mean of 0 and a standard deviation of 1.
- Scale the standardized values by a scale parameter.
- Offset the scaled values by an offset parameter.
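The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the names `gamma` (scale) and `beta` (offset) and the small constant `eps`, added to the variance to avoid division by zero, are assumptions of this sketch and do not come from the article.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Standardize each feature over the batch, then scale and offset it.

    batch: list of samples, each a list of feature values.
    gamma, beta: per-feature scale and offset (hypothetical names).
    eps: small constant for numerical stability (an assumption).
    """
    n = len(batch)
    num_features = len(batch[0])

    # Step 1a: per-feature mean over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    # Step 1b: per-feature variance over the batch.
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]

    out = []
    for row in batch:
        new_row = []
        for j in range(num_features):
            # Standardize: subtract the mean, divide by the standard deviation.
            x_hat = (row[j] - means[j]) / math.sqrt(variances[j] + eps)
            # Steps 2 and 3: scale, then offset.
            new_row.append(gamma[j] * x_hat + beta[j])
        out.append(new_row)
    return out


# Two features on very different scales.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(normalized)
```

With a scale of 1 and an offset of 0, both features end up with a mean of 0 and a standard deviation very close to 1, even though they started on very different scales.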
Through batch normalization, normalizing data is no longer restricted to the input layer of the network; it becomes an integral internal component of the neural network. The parameters that represent the scale and the offset in the batch normalization operation are also learned during training.
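To give a feel for what the scale and offset parameters can do, here is a small sketch on a single feature. The names `gamma`, `beta`, and `eps` are assumptions of this example, and the parameter values are picked by hand; in a real network they would be learned by the optimizer during training.

```python
import math


def normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature over a batch, then scale and offset it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
mean = sum(activations) / len(activations)
var = sum((a - mean) ** 2 for a in activations) / len(activations)

# Scale 1 and offset 0: plain standardization (mean 0, std close to 1).
standardized = normalize(activations)

# Scale set to the batch standard deviation and offset set to the batch mean:
# the standardization is undone and the original values come back.
restored = normalize(activations, gamma=math.sqrt(var + 1e-5), beta=mean)

print(standardized)
print(restored)
```

The second call shows that the layer is not forced to output values with a mean of 0 and a standard deviation of 1: depending on the scale and offset, it can land anywhere between full standardization and leaving the inputs as they were.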
Because these parameters are learnable, the network can adjust how each layer's input values are normalized, rather than being fixed to a mean of 0 and a standard deviation of 1.
It’s called ‘batch’ normalization because the mean and standard deviation used in the operation are computed over each batch of input values that is fed through the network.
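One consequence of computing the statistics per batch is that the same input value can be normalized differently depending on which batch it appears in. The sketch below shows this on a single feature; the function name `standardize` and the constant `eps` are assumptions of this example.

```python
import math


def standardize(batch, eps=1e-5):
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


batch_a = [1.0, 2.0, 3.0]
batch_b = [2.0, 4.0, 6.0]

# The value 2.0 appears in both batches.
in_a = standardize(batch_a)[1]  # 2.0 is the mean of batch_a
in_b = standardize(batch_b)[0]  # 2.0 is the smallest value in batch_b

print(in_a, in_b)
```

In `batch_a` the value 2.0 sits exactly at the batch mean and maps to 0, while in `batch_b` it lies below the mean and maps to a negative number.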