Self-driving cars: How close are we to full autonomy?


The neural network driving Tesla cars

Over the past few years, the autopilot software has improved gradually. Some exciting news is that Tesla has been working on a major update to the neural network architecture. To my understanding, this new version is now being deployed along with the traffic lights and stop signs update. As a result, the rate of progress will likely accelerate over the coming months as the Tesla AI team further explores the potential of the system.

I highly recommend the video below, a presentation by Andrej Karpathy, the director of AI at Tesla, about the state of AI for Full Self-Driving. I will now describe some of the main highlights.

Andrej Karpathy — AI for Full Self-Driving.

The new neural network receives input from all sensors and combines the data in a shared feature space (the Fusion layer in the image below), creating a real-time 3-dimensional representation of the surrounding environment. This representation is then passed to a Bird’s Eye View network (BEV Net), on top of which an extensive set of tasks is predicted.

Tesla autopilot neural network. Image from Andrej Karpathy’s presentation at the ScaledML 2020 conference, available on YouTube.
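
To make the structure above a bit more concrete, here is a minimal PyTorch sketch of a multi-camera network with a fusion step, a BEV net, and several task heads. Everything here is my own assumption for illustration: the module names, the camera count, the task heads, and especially the naive concatenation used as the “fusion” step; the real image-to-BEV transform and backbones are far more sophisticated.

```python
import torch
import torch.nn as nn

# Sketch of a multi-camera -> fusion -> BEV -> multi-task network.
# Module names, camera count, and task heads are illustrative assumptions,
# not Tesla's actual architecture.

class PerCameraBackbone(nn.Module):
    def __init__(self, out_channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):           # x: (B, 3, H, W)
        return self.features(x)     # (B, C, H/4, W/4)

class BEVMultiTaskNet(nn.Module):
    def __init__(self, num_cameras=8, feat_channels=64):
        super().__init__()
        self.backbones = nn.ModuleList(
            [PerCameraBackbone(feat_channels) for _ in range(num_cameras)]
        )
        # "Fusion layer": merge per-camera features into one feature space.
        # Here it is a plain concatenation + 1x1 conv; the real system
        # projects image features into a bird's eye view instead.
        self.fusion = nn.Conv2d(num_cameras * feat_channels, 128, 1)
        # "BEV Net": further processing in the fused space.
        self.bev_net = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # One head per task (lanes, drivable space, objects, ...).
        self.heads = nn.ModuleDict({
            "lanes": nn.Conv2d(128, 2, 1),
            "drivable_space": nn.Conv2d(128, 2, 1),
            "objects": nn.Conv2d(128, 10, 1),
        })

    def forward(self, cameras):     # cameras: list of (B, 3, H, W) tensors
        feats = [bb(img) for bb, img in zip(self.backbones, cameras)]
        fused = self.fusion(torch.cat(feats, dim=1))
        bev = self.bev_net(fused)
        return {name: head(bev) for name, head in self.heads.items()}

# Usage: eight camera frames in, one prediction map per task out.
net = BEVMultiTaskNet()
frames = [torch.randn(1, 3, 128, 256) for _ in range(8)]
outputs = net(frames)
print({k: tuple(v.shape) for k, v in outputs.items()})
```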

Another very interesting topic covered in the presentation is how they deal with edge cases. The image below shows a good example: stop signs. One would think that stop signs are very easy for a neural network to detect and learn. But what if they are partially occluded? Or what if they carry a modifier, as in the example below where the plate under the stop sign reads “except right turns”? Autonomous vehicles are expected to work in all of those scenarios.

Examples of the real-world long-tail distribution of scenarios. Image from Andrej Karpathy’s presentation at the ScaledML 2020 conference, available on YouTube.
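
As a small illustration of how a single “stop sign” class could carry such modifiers in the training data, a label might look something like the hypothetical schema below. The field names and values are mine, not Tesla’s.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical label schema for a stop sign, with modifier attributes
# for the long-tail cases mentioned above. Purely illustrative.

@dataclass
class StopSignLabel:
    x: float                       # bounding-box centre, normalized image coords
    y: float
    width: float
    height: float
    occluded: bool = False         # e.g. partially hidden by foliage or a truck
    held_by_person: bool = False   # e.g. a construction worker holding the sign
    modifiers: List[str] = field(default_factory=list)  # e.g. ["except right turns"]

label = StopSignLabel(x=0.62, y=0.41, width=0.05, height=0.08,
                      occluded=True, modifiers=["except right turns"])
print(label)
```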

The process of training the huge neural network to perform well on edge cases like the examples above consists of running a small network in “shadow mode” that retrieves similar samples from the Tesla fleet. The samples obtained are then used to improve the training dataset, and the big neural network can then be retrained to achieve better accuracy. For the case of stop signs, the “stop sign” label needs to carry modifiers to cover the edge cases.
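
Schematically, the loop looks something like the sketch below: a small trigger network runs in shadow mode, flags fleet frames that resemble the target edge case, and the flagged frames are labeled and added to the dataset before retraining. All function names here are hypothetical placeholders, not Tesla’s actual pipeline.

```python
import random

def trigger_network(frame):
    """Small classifier deployed in shadow mode; returns a score for how
    likely the frame is to contain the edge case of interest
    (e.g. an occluded or modified stop sign)."""
    return random.random()  # stand-in for a real model's output

def collect_edge_cases(fleet_frames, threshold=0.9):
    """Runs in shadow mode: never affects driving, only selects frames."""
    return [f for f in fleet_frames if trigger_network(f) > threshold]

def data_engine_iteration(training_set, fleet_frames, label_fn, retrain_fn):
    # 1. Mine similar samples from the fleet.
    candidates = collect_edge_cases(fleet_frames)
    # 2. Label them (human review or auto-labeling) and grow the dataset.
    training_set.extend(label_fn(f) for f in candidates)
    # 3. Retrain the big multi-task network on the improved dataset.
    return retrain_fn(training_set)

# Example wiring with trivial stubs:
model = data_engine_iteration(
    training_set=[],
    fleet_frames=[f"frame_{i}" for i in range(1000)],
    label_fn=lambda f: (f, "stop_sign: except right turns"),
    retrain_fn=lambda data: f"model retrained on {len(data)} samples",
)
print(model)
```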

The validation of such a complex multi-task problem is also a very important topic covered in the presentation.

In a recent keynote at the CVPR 2020 conference, Karpathy gave a similar presentation, adding a few interesting examples, starting with a good set of edge cases. I guess no comment is needed for this one, other than that these are real images sampled from the Tesla fleet.

Extreme examples of the real-world long-tail distribution of scenarios. Image from Andrej Karpathy’s presentation at the CVPR 2020 conference, available on YouTube.

Another crazy example is the following image. Could you handle such a roundabout? Regarding this example, Karpathy makes an interesting point: they don’t need to handle every possible case. If the cases that can’t be handled are known, one possible option is to follow a different trajectory that avoids the specific situation. As a human, I would definitely avoid the roundabout in the image below.

Extremely difficult scenarios. Image from Andrej Karpathy’s presentation at the CVPR 2020 conference, available on YouTube.
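
One way to picture this “route around what you can’t handle yet” idea is as a weighted shortest-path problem in which known-difficult maneuvers get a large penalty, so the planner prefers a slightly longer but simpler route. The toy map, node names, and penalty value below are invented purely to illustrate the idea.

```python
import heapq

def dijkstra(graph, start, goal):
    """Standard Dijkstra shortest path over a dict-of-dicts graph."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

DIFFICULT_PENALTY = 1000.0  # applied to maneuvers the system knows it can't handle yet

graph = {
    "A": {"roundabout": 2.0, "detour_1": 3.0},
    "roundabout": {"B": 1.0 + DIFFICULT_PENALTY},  # known-difficult maneuver
    "detour_1": {"detour_2": 3.0},
    "detour_2": {"B": 2.0},
    "B": {},
}

cost, path = dijkstra(graph, "A", "B")
print(path)  # -> ['A', 'detour_1', 'detour_2', 'B'], avoiding the roundabout
```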

I believe the image above is also a hint that they are now working hard on solving roundabouts and intersections. That’s the logical next step after traffic lights and stop signs, and an important step towards a feature-complete autopilot.