So I plot losses against time and watch how they change
If you’ve ever trained a deep-learning model with the Adam optimizer and stared at the loss as it decreases over time, you might have wondered what’s actually going on. Here’s a timeless Distill article that gives some insight.
Or if you prefer visuals, here’s a 2-minute video I made:
Here’s some explanation of the video: in a reinforcement-learning setting, I was training a policy inside a simulated environment to minimize a loss at each time step. I proposed 6 different losses for the policy and fixed the initial state, so I could watch how the loss-versus-time-step curves change over an episode as training progresses. Each time I updated the policy, I made a new plot; the video is made up of 1,500 such plots.
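For concreteness, here’s a minimal sketch of that frame-per-update loop. It’s an illustration under assumptions, not the code behind the video: `rollout_losses` is a hypothetical stand-in that fakes an episode rollout with decaying noisy curves, since the real simulator and policy aren’t shown here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def rollout_losses(update_idx, n_losses=6, horizon=200):
    """Hypothetical stand-in: roll out the current policy for one episode
    from the fixed initial state and record all 6 losses per time step."""
    t = np.arange(horizon)
    improvement = np.exp(-update_idx / 500.0)  # pretend training helps over time
    base = np.exp(-t / 80.0)
    return improvement * base * (1.0 + 0.1 * rng.standard_normal((n_losses, horizon)))

n_updates = 1500                               # one frame per policy update
for i in range(n_updates):
    plt.figure(figsize=(6, 4))
    for k, curve in enumerate(rollout_losses(i)):
        plt.plot(curve, label=f"loss {k}")
    plt.xlabel("time step in episode")
    plt.ylabel("loss")
    plt.title(f"policy update {i}")
    plt.legend()
    plt.savefig(f"frame_{i:04d}.png")          # stitch the frames into a video
    plt.close()
```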
It turns out the optimal policy shows up at around 1:05 into the 2:00 video.
If you read the Distill article, or just know what’s going on, you might notice the tell-tale effects of gradient descent with momentum.
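Those effects come from the velocity term: past gradients keep pushing the parameters, so the loss overshoots and oscillates before settling. Here’s the classic heavy-ball momentum update on a toy ill-conditioned quadratic, purely as an illustration of that dynamic, not my training setup. Adam’s first-moment estimate is a rescaled version of the same velocity, which is why the signature shows up there too.

```python
import numpy as np

def momentum_gd(grad, w0, lr=0.02, beta=0.9, steps=100):
    """Gradient descent with heavy-ball momentum."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)
    history = []
    for _ in range(steps):
        v = beta * v + grad(w)   # decaying sum of past gradients
        w = w - lr * v           # step along the accumulated direction
        history.append(w.copy())
    return np.array(history)

# Toy ill-conditioned quadratic: f(w) = 0.5 * (w[0]**2 + 25 * w[1]**2).
grad = lambda w: np.array([w[0], 25.0 * w[1]])
trajectory = momentum_gd(grad, w0=[1.0, 1.0])
print(trajectory[-5:])           # the last steps spiral in toward the minimum at (0, 0)
```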
Still, I prefer staring at the smoothed, moving-average loss as it dips lower and lower…
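In case the phrasing is unfamiliar: that smoothed number is typically an exponentially weighted moving average of the raw per-step losses. A quick sketch, with the usual zero-initialization bias correction (the `beta` here is just a common choice, not anything from the video):

```python
def ema_smooth(losses, beta=0.98):
    """Exponentially weighted moving average of a loss stream,
    with bias correction for the zero initialization."""
    avg, smoothed = 0.0, []
    for t, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg / (1 - beta ** t))
    return smoothed

print(ema_smooth([1.0, 0.9, 1.1, 0.7, 0.8, 0.5]))  # a gently dipping curve
```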