So I plot losses against time and watch how they change

Source: Deep Learning on Medium

If you’ve ever trained a deep-learning model with the Adam optimizer and stared at the loss as it decreases over time, you might wonder what’s actually going on. Here’s a timeless Distill article that gives some insight.
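To make the "what's actually going on" concrete, here's a minimal sketch of the Adam update rule itself, applied to a toy one-dimensional loss. This is a textbook implementation for illustration, not the post's training code; the learning rate and toy objective are my own choices.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: m and v are running averages of the gradient and its square."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum term)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (per-parameter scale)
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy loss f(theta) = theta^2 and watch it dip lower and lower.
theta, m, v = 5.0, 0.0, 0.0
losses = []
for t in range(1, 2001):
    grad = 2 * theta                       # d/dtheta of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
    losses.append(theta ** 2)
```

Because Adam rescales each step by the gradient's running magnitude, the loss curve tends to descend steadily rather than in proportion to the raw gradient size.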

Or if you prefer something visual, here’s a 2-minute video I made:

Here’s some explanation of the video: in a reinforcement learning setting, I was training a policy inside a simulated environment to minimize loss at each time step. I proposed 6 different losses for the policy and fixed the initial state, so I could watch how the loss-versus-time-step curves changed over an episode. Each time I updated the policy, I made a new plot; the video is made up of 1,500 such plots.
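The plot-per-update pipeline can be sketched like this. The `simulate_losses` function below is a hypothetical stand-in for running an episode in the simulator (in the real setup each of the 6 curves would come from the policy's actual losses), and I use only 10 updates here instead of the 1,500 in the video:

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # render frames without a display
import matplotlib.pyplot as plt

def simulate_losses(update, n_steps=50, n_losses=6):
    """Hypothetical stand-in for an episode: 6 loss curves that improve as training goes on."""
    t = np.arange(n_steps)
    rng = np.random.default_rng(update)
    return [np.exp(-t / (5 + k)) / (1 + 0.01 * update) + 0.02 * rng.random(n_steps)
            for k in range(n_losses)]

outdir = tempfile.mkdtemp()
n_updates = 10                             # the video used ~1,500 policy updates
for update in range(n_updates):
    fig, ax = plt.subplots()
    for k, curve in enumerate(simulate_losses(update)):
        ax.plot(curve, label=f"loss {k + 1}")
    ax.set_xlabel("time step")
    ax.set_ylabel("loss")
    ax.legend()
    fig.savefig(os.path.join(outdir, f"frame_{update:04d}.png"))
    plt.close(fig)                         # free the figure between frames

frames = sorted(os.listdir(outdir))
```

The saved frames can then be stitched into a video with a tool like ffmpeg, e.g. `ffmpeg -framerate 12 -i frame_%04d.png video.mp4`.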

It turns out the optimal policy shows up at around 1:05 of the 2:00 video.

If you’ve read the Distill article, or just know what’s going on, you might notice the tell-tale effects of gradient descent with momentum.
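The most recognizable of those tell-tale effects is overshoot: momentum accumulates velocity, so the iterates sail past the minimum and oscillate across it before settling. A minimal demonstration on a quadratic, with a step size and momentum coefficient chosen for illustration:

```python
import numpy as np

def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=60):
    """Heavy-ball gradient descent with momentum; returns the iterate trajectory."""
    x, v, traj = x0, 0.0, [x0]
    for _ in range(steps):
        v = beta * v + grad(x)             # velocity accumulates past gradients
        x = x - lr * v
        traj.append(x)
    return np.array(traj)

grad = lambda x: 2 * x                     # gradient of f(x) = x^2, minimum at x = 0
traj = momentum_descent(grad, x0=5.0)

# The tell-tale sign: iterates cross the minimum and change sign repeatedly,
# whereas plain gradient descent at this step size would stay on one side.
overshoots = int(np.sum(np.sign(traj[:-1]) != np.sign(traj[1:])))
```

In a loss-versus-time plot this overshoot shows up as ripples that decay as the velocity damps out, which is exactly the kind of signature you can spot in the video.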

Still, I prefer staring at the running average of the loss as it dips lower and lower…