A Brief Look at Overfitting and Underfitting in Neural Networks

Source: Deep Learning on Medium

By Amir Khan

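Underfitting occurs when the model has not learned the training data well, or is not powerful enough to capture its patterns, so it makes poor predictions even on the training data and therefore also on unseen data. In terms of error, we describe this as high bias and low variance. Underfitting is typical early in training; if you train for too long, though, the model will start to overfit.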

Overfitting occurs when your model reaches good accuracy on the training data, say 90%, but performs much worse on unseen data: for example, only 70% accuracy on the validation set. In terms of error, this is a low-bias, high-variance problem.
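The two definitions above suggest a rough rule of thumb for reading a training run. The sketch below is only an illustration: the function name `diagnose_fit` and its thresholds (`min_train_acc=0.8`, `max_gap=0.1`) are hypothetical choices, not standard values, and sensible thresholds depend on the task.

```python
def diagnose_fit(train_acc, val_acc, min_train_acc=0.8, max_gap=0.1):
    """Roughly classify a training run from its accuracies (fractions in [0, 1]).

    min_train_acc and max_gap are illustrative thresholds, not standard values.
    """
    if train_acc < min_train_acc:
        # Poor accuracy even on the training data: high bias.
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # Large gap between training and validation accuracy: high variance.
        return "overfitting"
    return "good fit"


print(diagnose_fit(0.90, 0.70))  # the 90% / 70% example -> overfitting
```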

An overfit model does not generalize well to unseen data. If you train your model for a long time, it tends to overfit: it starts learning details of the training data, including noise, that do not carry over to unseen data.

Overfitting is a risk in almost every neural network you train, so knowing how to prevent it is an essential skill if you want your model to perform well on unseen data.

The best way to prevent overfitting is to use more training data: a model trained on more data will naturally generalize better to unseen data. In my view, however, this is often not feasible, because gathering and preprocessing more data takes a lot of time.

The next best solution is regularization. Two common regularization techniques are weight regularization and dropout. A common way to prevent overfitting is to penalize the complexity of the network by forcing its weights to take small values, which makes the distribution of weight values more regular. This is called "weight regularization", and it is done by adding to the network's loss function a cost associated with having large weights.
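As a minimal sketch of weight regularization, assuming an L2 penalty of the form `lam * sum(w**2)` (L1 is another common choice), the code below adds the penalty to a data loss and shows one gradient-descent step on the regularized loss. The names `l2_regularized_loss`, `l2_gradient_step`, `lam` and `lr` are hypothetical, chosen for illustration.

```python
def l2_regularized_loss(data_loss, weights, lam=0.01):
    """Add an L2 weight penalty, lam * sum(w^2), to the data loss."""
    penalty = lam * sum(w * w for w in weights)
    return data_loss + penalty


def l2_gradient_step(weights, data_grads, lr=0.1, lam=0.01):
    """One gradient-descent step on the L2-regularized loss.

    The derivative of lam * w^2 is 2 * lam * w, so every weight is also
    pulled toward zero ("weight decay") besides following the data gradient.
    """
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, data_grads)]


# Penalty: 0.01 * (1.0 + 4.0 + 0.25) = 0.0525, so the total is about 0.5525.
print(l2_regularized_loss(0.5, [1.0, -2.0, 0.5], lam=0.01))
```

Large weights make the penalty grow quadratically, so the optimizer trades a little training accuracy for smaller, smoother weights.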

Dropout is one of the most effective and commonly used regularization techniques for neural networks. It is applied to a layer during training: at each training step, a random fraction of the layer's output units is set to zero ("dropped out"). Because no neuron can rely on any particular other neuron being present, the network cannot co-adapt its weights to memorize the training data, which reduces overfitting. At test time, no units are dropped.
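A minimal sketch of dropout on a list of activations, assuming the "inverted dropout" variant commonly used in practice: surviving units are scaled by `1 / (1 - rate)` during training so that nothing needs to change at test time. The function name `dropout` and its parameters are hypothetical, for illustration only.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    During training each unit is zeroed with probability `rate`, and the
    surviving units are scaled by 1 / (1 - rate) so the expected output is
    unchanged. At inference time the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


# With rate=0.5, each unit becomes either 0.0 or 2.0 (that is, 1.0 / 0.5).
print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
print(dropout([1.0] * 8, rate=0.5, training=False))  # unchanged at test time
```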

You may have heard of a famous principle called Occam's Razor: given two explanations for something, the explanation most likely to be correct is the "simplest" one, the one that makes the fewest assumptions. This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weight values (multiple models) that could explain the data, and simpler models are less likely to overfit than complex ones.

So the simplest way to reduce overfitting is to reduce the size of the model, that is, the number of layers and the number of parameters in the neural network. A model with fewer parameters has less capacity to memorize the training data, so it is less likely to overfit.
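One concrete way to see model size is to count parameters. A minimal sketch for a fully connected network, assuming each layer has a weight matrix plus a bias vector; the function name and the layer sizes (784 inputs, as for 28×28 images, and 10 outputs) are hypothetical examples.

```python
def dense_param_count(layer_sizes):
    """Count weights + biases of a fully connected network.

    layer_sizes lists the number of units per layer, input first,
    e.g. [784, 64, 10]. Each layer contributes n_in * n_out weights
    plus n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(dense_param_count([784, 64, 10]))  # 50890
print(dense_param_count([784, 16, 10]))  # 12730
```

Shrinking the hidden layer from 64 to 16 units cuts the parameter count roughly fourfold, which reduces the network's capacity to memorize the training set.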

To find an appropriate model size, it’s best to start with relatively few layers and parameters, then begin increasing the size of the layers or adding new layers until you see diminishing returns on the validation loss.
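The search described above can be sketched as a loop that grows the model until validation loss stops improving meaningfully. This is an illustration under assumptions: `select_model_size` is a hypothetical helper, and `evaluate` stands for your own code that trains a model of a given size and returns its validation loss. The `min_improvement` threshold is an arbitrary example value.

```python
def select_model_size(sizes, evaluate, min_improvement=0.01):
    """Grow the model until validation loss shows diminishing returns.

    sizes: candidate sizes in increasing order (e.g. hidden units per layer).
    evaluate: user-supplied callable, size -> validation loss (hypothetical).
    Returns the last size whose improvement was at least min_improvement.
    """
    best_size = sizes[0]
    best_loss = evaluate(best_size)
    for size in sizes[1:]:
        loss = evaluate(size)
        if best_loss - loss < min_improvement:
            break  # diminishing returns: stop growing the model
        best_size, best_loss = size, loss
    return best_size, best_loss
```

In use, `evaluate` would wrap a full training run, so each call is expensive; starting small keeps the search cheap, which is the point of the advice above.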

Data augmentation is an effective way to make the most of image data during training. Showing the model each image in several variations, for example flipped or rotated, makes it harder for the model to memorize individual training images, so it overfits less and learns more general features. Common augmentations include flipping images horizontally or vertically and rotating them by suitable angles.
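A minimal sketch of flip and rotation augmentations, assuming a single-channel image stored as a list of rows. The function names are hypothetical; in practice you would use your framework's image-augmentation utilities, and only transformations that keep the label valid should be applied.

```python
def flip_horizontal(image):
    """Mirror each row left-to-right. `image` is a list of rows (H x W)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top-to-bottom by reversing the row order."""
    return [list(row) for row in image[::-1]]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus flipped and rotated variants."""
    return [image, flip_horizontal(image), flip_vertical(image), rotate_90(image)]


for variant in augment([[1, 2], [3, 4]]):
    print(variant)
```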

Early stopping is another technique we use to prevent overfitting: we stop training once the validation loss has not improved for a set number of epochs, say 10 or 20 (often called the "patience").
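A minimal sketch of early stopping with a patience counter. The class name `EarlyStopping` and its parameters are hypothetical here (deep learning frameworks ship their own versions); `min_delta` is an optional threshold below which a decrease does not count as an improvement.

```python
import math


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.69]):
    if stopper.step(epoch, val_loss):
        print(f"stopped at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
```

Here the loop stops at epoch 5, three epochs after the best loss at epoch 2. In practice you would also keep the weights from the best epoch.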

Batch normalization is also commonly used in deep neural networks. Its main purpose is to stabilize and speed up training, but it also has a mild regularizing effect that can help reduce overfitting. It normalizes the inputs of each layer so that, across each mini-batch, they have a mean of zero and a standard deviation of one. Because every batch contains different data values, the output of one layer is normalized, typically before the activation function is applied, and then fed into the following layer (sub-network).
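A minimal sketch of the batch normalization computation at training time, assuming a mini-batch stored as a list of samples, each a list of feature values. `gamma` and `beta` stand in for the learnable scale and shift parameters (fixed scalars here for simplicity), and `eps` avoids division by zero. At inference time, frameworks normalize with running averages of the mean and variance instead of the current batch's statistics; that part is not shown.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over a mini-batch, then scale and shift.

    `batch` is a list of samples, each a list of feature values.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        for i, x in enumerate(column):
            out[i][j] = gamma * (x - mean) / math.sqrt(var + eps) + beta
    return out


# Each feature column ends up with mean ~0 and standard deviation ~1.
print(batch_norm([[1.0, 10.0], [3.0, 30.0]]))
```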

Here are the most common ways to prevent overfitting in neural networks:

  • Get more training data
  • Reduce the capacity of the network
  • Add weight regularization
  • Add dropout
  • Data augmentation
  • Batch normalization
  • Early stopping