(This is only from my own knowledge, so feel free to correct me on any point; references come from Analytics Vidhya, other Medium posts, and some research papers.)
- Overfit! The first thing to do if your network isn’t learning is to overfit a single training point. Accuracy should be essentially 100% (or 99.99%), with an error as close to 0 as possible. If your neural network can’t overfit a single data point, something is seriously wrong with the architecture, though the problem may be subtle. If you can overfit one data point but training on a larger set still does not converge, try the following suggestions.
- Lower your learning rate. Your network will learn more slowly, but it may settle into a minimum it couldn’t reach before because its step size was too big. A smaller step also prevents overshooting.
- Decrease batch size. Reducing the batch size to 1 (stochastic gradient descent) gives you more granular feedback on the weight updates, which you should monitor with TensorBoard (or some other debugging/visualization tool).
- Increase batch size. A larger batch size — heck, the whole training set if you could — reduces variance in the gradient updates, making each iteration more accurate. In other words, weight updates are more likely to point in the right direction. But! There’s an effective upper bound on its usefulness, as well as physical memory limits.
- Check your reshaping. Drastic reshaping (like changing an image’s X,Y dimensions) can destroy spatial locality, making it harder for a network to learn since it must also learn the reshape. (Natural features become fragmented. The fact that natural features appear spatially local is why conv nets are so effective!) Be especially careful if reshaping with multiple images/channels; use numpy.stack() for proper alignment.
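As a concrete version of the “overfit one point” check above, here is a minimal sketch — my own toy example, not from the article — that overfits a single input/target pair with a tiny one-hidden-layer numpy network. All sizes and the learning rate are arbitrary choices; the point is only that the loss should drop to essentially zero.

```python
import numpy as np

# Toy sanity check: overfit a *single* (x, y) pair with a tiny
# one-hidden-layer network trained by plain gradient descent.
rng = np.random.default_rng(0)
x = rng.normal(size=4)             # one input vector
y = np.array([1.0])                # one target

W1 = rng.normal(scale=0.5, size=(8, 4)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(1, 8)); b2 = np.zeros(1)
lr = 0.05

for _ in range(2000):
    h = np.tanh(W1 @ x + b1)       # forward pass
    pred = W2 @ h + b2
    err = pred - y                 # dLoss/dpred for loss = 0.5 * err**2
    dW2 = np.outer(err, h); db2 = err            # backward pass
    dz = (W2.T @ err) * (1 - h**2)               # through tanh
    dW1 = np.outer(dz, x); db1 = dz
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (W2 @ np.tanh(W1 @ x + b1) + b2)[0]
loss = 0.5 * (pred - 1.0) ** 2
print(f"final loss: {loss:.2e}")   # essentially zero if the net can overfit
```

If the loss stalls well above zero here, suspect the architecture or the backward pass before blaming the data.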
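The “lower your learning rate” tip can be seen on a toy problem. This sketch (my own illustration, not from the article) minimizes f(w) = w² with gradient descent: for this curvature, any step size above 1.0 makes each step overshoot the minimum and diverge, while a smaller one converges.

```python
import numpy as np

def descend(lr, steps=50, w0=1.0):
    """Run plain gradient descent on f(w) = w**2 from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w            # gradient of w**2 is 2w
    return w

print(descend(0.05))   # shrinks toward 0: each step multiplies w by 0.9
print(descend(1.05))   # blows up: each step multiplies w by -1.1
```

Real loss surfaces are not quadratic, but the same overshooting mechanism is what a too-large learning rate triggers.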
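The batch-size trade-off above can also be checked numerically. This rough sketch (my own, with an arbitrary linear-regression setup) estimates the standard deviation of the mini-batch mean gradient for two batch sizes; the larger batch gives a visibly lower-variance estimate.

```python
import numpy as np

# Linear model y ~ w * x; per-sample gradient of (w*x - y)**2 wrt w.
rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=n)
y = 3.0 * X + rng.normal(scale=0.5, size=n)
w = 0.0                                     # evaluate all gradients at w = 0
per_sample_grad = 2 * (w * X - y) * X

def grad_std(batch_size, trials=500):
    """Std of the mean gradient over random mini-batches of a given size."""
    means = [per_sample_grad[rng.choice(n, batch_size, replace=False)].mean()
             for _ in range(trials)]
    return float(np.std(means))

print(grad_std(4), grad_std(256))  # larger batches -> lower-variance estimates
```

The variance shrinks roughly as 1/batch_size, which is why returns diminish: going from 4 to 256 helps a lot more than going from 256 to 16,384.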
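To illustrate the reshaping point with numpy.stack(): combining several single-channel images with stack keeps each image’s spatial layout intact, whereas naively reshaping a flat concatenation produces the same shape with a scrambled layout. The tiny 3×4 “images” below are made up for illustration.

```python
import numpy as np

r = np.arange(12).reshape(3, 4)        # one fake 3x4 grayscale image
g = r + 100                            # a second channel
b = r + 200                            # a third channel

stacked = np.stack([r, g, b], axis=-1) # shape (3, 4, 3): H x W x channels
print(stacked.shape)                   # (3, 4, 3)
print(stacked[0, 0])                   # [0 100 200] -> pixel (0,0) of each channel

# Reshaping the concatenated buffer yields the same shape but wrong layout:
wrong = np.concatenate([r, g, b]).reshape(3, 4, 3)
print(wrong[0, 0])                     # [0 1 2] -> three pixels of the *same* image
```

Same shape, completely different meaning — exactly the kind of silent bug that destroys spatial locality without raising any error.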
Debugging is all about trial and error; I will keep adding other techniques as I find them during my learning process. Please feel free to offer any suggestions.
Source: Deep Learning on Medium