NeurIPS 2019 Outstanding New Directions Paper Award: Uniform convergence may be unable to explain…

Source: Deep Learning on Medium

I have come across a very interesting paper on generalization in deep learning: Uniform convergence may be unable to explain generalization in deep learning.

This paper won the Outstanding New Directions Paper Award at NeurIPS 2019. What is so interesting about it?

Vaishnavh Nagarajan et al. have shown that uniform convergence provably cannot “explain generalization”!

Full paper here:

Vaishnavh Nagarajan, J. Zico Kolter

Carnegie Mellon University, Bosch Center for Artificial Intelligence

We cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well. While it is well-known that many existing bounds are numerically large, through a variety of experiments, we first bring to light another crucial and more concerning aspect of these bounds: in practice, these bounds can increase with the dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by stochastic gradient descent (SGD) where uniform convergence provably cannot “explain generalization,” even if we take into account implicit regularization to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by SGD that have test errors less than some small ϵ, applying (two-sided) uniform convergence on this set of classifiers yields a generalization guarantee that is larger than 1−ϵ and is therefore nearly vacuous.
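To make the abstract's claim concrete, here is a sketch in standard learning-theory notation (my own formalization, not copied verbatim from the paper) of what a two-sided uniform convergence bound over an algorithm-dependent hypothesis class looks like:

$$
\sup_{h \in \mathcal{H}_{\mathrm{SGD}}} \left| L_{\mathcal{D}}(h) - \hat{L}_{S}(h) \right| \le \epsilon_{\mathrm{unif}}(m, \delta)
$$

Here $L_{\mathcal{D}}(h)$ is the test error of classifier $h$, $\hat{L}_{S}(h)$ is its empirical error on a training sample $S$ of size $m$, and $\mathcal{H}_{\mathrm{SGD}}$ is restricted to only those classifiers SGD actually outputs with test error below some small $\epsilon$. The paper's negative result is that even under this most favorable restriction, the tightest such $\epsilon_{\mathrm{unif}}$ is larger than $1 - \epsilon$, so the resulting generalization guarantee is nearly vacuous.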

Highlight comments from NeurIPS:
The paper presents what are essentially negative results showing that many existing (norm based) bounds on the performance of deep learning algorithms don’t do what they claim. They go on to argue that they can’t do what they claim when they continue to lean on the machinery of two-sided uniform convergence. While the paper does not solve (nor pretend to solve) the question of generalisation in deep neural nets, it is an “instance of the fingerpost” (to use Francis Bacon’s phrase) pointing the community to look in a different place.