Autonomous Driving Research Papers: Main Series P2 — Waymo’s Multipath

Source: Deep Learning on Medium

The Proposed Solution: Using Anchors for Hierarchical Prediction

Anchors: Coarse trajectory predictions that are precomputed using unsupervised methods such as clustering or uniform sampling. They take much of the learning burden off the model, avoid the mode collapse issue (since the ‘diverse’ trajectories are now precomputed), and introduce hierarchy into the prediction process.
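
To make the clustering option concrete, here is a minimal sketch of how anchors could be extracted with plain k-means. The function name kmeans_anchors, the toy data, and the choice of squared Euclidean distance over flattened waypoints are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def kmeans_anchors(trajectories, num_anchors, num_iters=20, seed=0):
    """Cluster trajectories of shape (N, T, 2) into (num_anchors, T, 2) anchors."""
    rng = np.random.default_rng(seed)
    n, t, d = trajectories.shape
    flat = trajectories.reshape(n, t * d)
    # Initialise centroids from randomly chosen trajectories.
    centroids = flat[rng.choice(n, size=num_anchors, replace=False)].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every trajectory to every centroid.
        dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(num_anchors):
            members = flat[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids.reshape(num_anchors, t, d)

# Toy data (hypothetical): three noisy bundles that bend left, go straight, bend right.
rng = np.random.default_rng(0)
steps = np.linspace(0.0, 1.0, 10)

def bundle(curvature, count=50):
    x = 10.0 * steps[None, :] + rng.normal(0.0, 0.1, (count, 10))
    y = curvature * steps[None, :] ** 2 + rng.normal(0.0, 0.1, (count, 10))
    return np.stack([x, y], axis=-1)

trajectories = np.concatenate([bundle(5.0), bundle(0.0), bundle(-5.0)])
anchors = kmeans_anchors(trajectories, num_anchors=3)
print(anchors.shape)
```

Each row of anchors is one coarse trajectory that the model can later score and refine. Plain k-means can settle in a poor local minimum depending on initialisation, so a real pipeline would likely need something more careful.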

The hierarchy works as follows. The model first reasons about intent (U-turn, left turn, and so on) by assigning a likelihood to each of a fixed number of precomputed anchor trajectories. Say there are three anchor trajectories, corresponding to the intents LEFT turn, RIGHT turn, and STRAIGHT; one plausible likelihood assignment is then 0.4, 0.1, and 0.5 respectively.
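
The intent step above can be sketched as a softmax over one score per anchor. The logits here are made up so that the output reproduces the 0.4 / 0.1 / 0.5 example; in a real model they would come from the network, and the softmax itself is an assumption about how the likelihoods are normalised.

```python
import numpy as np

def softmax(logits):
    """Turn one raw score per anchor into likelihoods that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

intents = ["LEFT", "RIGHT", "STRAIGHT"]
# Hypothetical logits, chosen so the softmax gives back 0.4, 0.1, 0.5.
logits = np.log(np.array([0.4, 0.1, 0.5]))
probs = softmax(logits)
best = intents[int(np.argmax(probs))]
print(dict(zip(intents, probs.round(2))), "-> most likely intent:", best)
```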

Then, for each intent, it models control uncertainty at each time step by outputting a mean, which is the offset from the anchor state, and an associated covariance, which captures the aleatoric uncertainty in this offset.
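
Putting the two levels together, the prediction can be read as a Gaussian mixture: one component per anchor, with a per-time-step mean of anchor plus offset. The sketch below scores a trajectory under such a mixture. The function names are hypothetical, and the diagonal covariance and the independence across time steps are simplifying assumptions made here for brevity.

```python
import numpy as np

def anchor_log_likelihood(trajectory, anchor, offsets, variances):
    """Log-density of `trajectory` under independent per-step Gaussians.

    All arrays have shape (T, 2). The mean at each step is anchor + offset;
    `variances` holds the diagonal of each step's covariance (an assumption;
    a full 2x2 covariance per step would also fit the description).
    """
    means = anchor + offsets
    sq_err = (trajectory - means) ** 2
    per_dim = -0.5 * (np.log(2.0 * np.pi * variances) + sq_err / variances)
    return per_dim.sum()

def mixture_log_likelihood(trajectory, anchors, offsets, variances, probs):
    """Log-density under the mixture, with `probs` as the anchor likelihoods."""
    per_anchor = np.array([
        anchor_log_likelihood(trajectory, a, o, v)
        for a, o, v in zip(anchors, offsets, variances)
    ])
    weighted = np.log(probs) + per_anchor
    peak = weighted.max()  # log-sum-exp trick to avoid underflow
    return peak + np.log(np.exp(weighted - peak).sum())
```

A higher value means the observed trajectory is better explained by the predicted anchors, offsets, and uncertainties.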

A really bad picture depicting a crucial concept in the paper for a simplified case of 1-D motion. Let the blue circles denote the anchor waypoints, which are derived from observing many past trajectories. For the current agent, let’s say that the model has reason to believe that it is faster than average. Then it wants to produce waypoints that are offset to the right of the anchors. It is uncertain about the actual values of these offsets, so it produces Gaussian distributions over them (a modeling assumption of the paper). The means of these distributions are shown by the green circles, which together can be regarded as the most likely trajectory here. Also note that the variance of the predicted Gaussians should intuitively increase for waypoints further in the future, as the model is less sure about them.
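
The same 1-D picture can be written out with numbers. All values below are invented for illustration: four anchor waypoints, positive offsets for a faster-than-average agent, and standard deviations that grow with the prediction horizon.

```python
import numpy as np

# Hypothetical 1-D anchor waypoints (the blue circles), in metres.
anchor_waypoints = np.array([1.0, 2.0, 3.0, 4.0])
# Predicted mean offsets: positive because the agent is faster than average.
offset_means = np.array([0.2, 0.4, 0.6, 0.8])
# Predicted standard deviations: larger for waypoints further in the future.
offset_stds = np.array([0.1, 0.2, 0.3, 0.4])

# The green circles: anchor plus mean offset at each time step.
predicted_means = anchor_waypoints + offset_means

# Draw sample trajectories to see the uncertainty fan out over time.
rng = np.random.default_rng(0)
samples = rng.normal(predicted_means, offset_stds, size=(1000, 4))
empirical_spread = samples.std(axis=0)
print(predicted_means, empirical_spread.round(2))
```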

Note that this concept of anchors is similar to the one used by Faster R-CNN, where the model first predicts likelihoods over anchors and then continuous refinements on top, such as offsets to the anchor box coordinates.