Highlights from NeurIPS 2019

Source: Deep Learning on Medium

As the holiday season concludes and a brand-new year starts, we look back on an incredible 2019, on all the amazing things that happened and on the ones we accomplished. Among others, last December we were back at NeurIPS, and we can’t wait to share all the cool things we saw there. This year’s edition was held in the impressive Vancouver Convention Center, which hosted over ten thousand researchers and practitioners gathered for the annual appointment of the leading conference on machine learning research. The scale of the conference has become astonishing, with over six thousand submissions, of which almost 1500 were accepted for presentation at the conference. Check the complete stats here:

Key takeaways

Several of Criteo’s researchers and engineers attended the conference in Vancouver to present our research work and learn about the trends in the field. The conference brought up several innovations in various areas in machine learning, including learning algorithms, deep learning, multitask and transfer learning, meta-learning and game theory.

One big theme of the conference was the comparison and interplay between machine learning models such as deep neural networks and the biological brain. Many of the invited talks argued that artificial neural networks should be able to learn more like their natural counterparts. Yoshua Bengio described this as an interaction between System 1 (intuitive, fast, unconscious) and System 2 (slow, logical, conscious), and discussed how deep learning should be able to go from one to the other. Blaise Aguera y Arcas focused on meta-learning and evolution strategies as a viable path towards truly general intelligence.

The keynote How to know, by Celeste Kidd, presented how people come to know what they know. Given the huge amount of information we face, she asked how people navigate it and how their decisions shape their knowledge and future beliefs. In particular, she showed evidence that people play an active role in their own learning process, from infancy to adult life. We learnt why we are curious about some subjects but not others, and how our past experiences shape our future interests. In this talk, she proposed five main aspects of human thinking that an ML practitioner should know:

  • humans continuously form beliefs
  • certainty diminishes interest
  • certainty is driven by feedback
  • less feedback may encourage overconfidence
  • humans form beliefs quickly.

Another interesting addition to this year’s repertoire was the extremely successful workshop on climate change. In this workshop and throughout the conference there was a strong call for reducing the carbon footprint of our trained models, which shows how the community is developing sensitivity towards issues of this kind, which go beyond the traditional scope and applications of machine learning.

5 moments at NeurIPS

Best papers

In terms of papers presented at the conference, the topics that caught the most attention this year were optimization, causality and reinforcement learning, three areas in which Criteo’s researchers are actively working. Here is an overview of the four papers we think were among the most impactful.

1. Fast and Accurate Least-Mean-Squares Solvers

One of the awarded papers, Fast and Accurate Least-Mean-Squares Solvers by Alaa Maalouf, Ibrahim Jubran and Dan Feldman, designs an efficient Least-Mean-Squares (LMS) solver for the case where the number of samples is much larger than the number of dimensions. It combines a few basic ideas about the structure of the covariance matrix with Carathéodory’s theorem to obtain a partition of the feature space that makes the LMS computation efficient. The empirical benchmark against the linear and ridge regression of scikit-learn (based on standard Cython LMS solvers) is very convincing. Simple, elegant, efficient.
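To make the role of Carathéodory’s theorem more concrete, here is a minimal sketch of the classical (slow) construction: any weighted mean of n points in d dimensions can be rewritten as a weighted mean of at most d + 1 of those points. This is not the authors’ accelerated algorithm, only the textbook building block; the function name `caratheodory_subset` and the toy data are our own assumptions. Applied to the flattened outer products of the samples, it yields a small weighted subset with exactly the same covariance matrix as the full dataset.

```python
import numpy as np

def caratheodory_subset(points, weights):
    """Return (indices, new_weights) of at most d + 1 points that have the
    same weighted sum as the input (classical, slow construction)."""
    P = np.asarray(points, dtype=float)
    u = np.asarray(weights, dtype=float).copy()
    idx = np.arange(P.shape[0])
    d = P.shape[1]
    while idx.size > d + 1:
        # More than d difference vectors in R^d are linearly dependent,
        # so some nonzero v_rest satisfies diffs.T @ v_rest = 0.
        diffs = P[idx[1:]] - P[idx[0]]
        _, _, vt = np.linalg.svd(diffs.T, full_matrices=True)
        v_rest = vt[-1]
        # v sums to zero and sum_i v_i * p_i = 0.
        v = np.concatenate(([-v_rest.sum()], v_rest))
        pos = np.flatnonzero(v > 0)
        ratios = u[pos] / v[pos]
        j = pos[np.argmin(ratios)]
        # Shift the weights along v until one of them reaches zero.
        w = u - ratios.min() * v
        w[j] = 0.0
        keep = w > 0
        idx, u = idx[keep], w[keep]
    return idx, u

# Toy data (assumption): n = 200 samples, d = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Each row is the flattened outer product x_i x_i^T, a point in R^(d*d).
outer = np.einsum("ni,nj->nij", X, X).reshape(len(X), -1)
idx, w = caratheodory_subset(outer, np.full(len(X), 1.0 / len(X)))
cov_small = (X[idx] * w[:, None]).T @ X[idx]   # sum_i w_i x_i x_i^T
print(len(idx), np.allclose(cov_small, X.T @ X / len(X)))
```

Since the least-squares solution depends on the data only through such covariance-type matrices, solving the problem on the small weighted subset gives the same answer as solving it on all samples. The loop above is far too slow for large n; making this reduction fast is what the paper is about.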

2. Uniform convergence may be unable to explain generalization in deep learning

Another promising awarded paper was Uniform convergence may be unable to explain generalization in deep learning, by Vaishnavh Nagarajan and J. Zico Kolter. The authors highlight inconsistencies in some earlier bounds on the generalization error of deep learning models. They argue that generalization bounds based on uniform convergence cannot capture the behaviour of deep neural networks: although over-parameterized, these models seem to escape the previously proposed bounds, which may even increase as the dataset grows. In addition, some dataset-dependent quantities are hidden in the constants of state-of-the-art error bounds.
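For readers less familiar with the term, a uniform convergence bound has roughly the following textbook shape. The notation is our own choice and not necessarily the paper’s: $\mathcal{H}$ is the hypothesis class, $S$ a training sample of size $m$ drawn from the distribution $\mathcal{D}$, $L_{\mathcal{D}}$ the expected error, $\hat{L}_S$ the training error and $\delta$ the allowed failure probability.

```latex
\Pr_{S \sim \mathcal{D}^m}\left[\,
  \sup_{h \in \mathcal{H}} \left| L_{\mathcal{D}}(h) - \hat{L}_S(h) \right|
  \le \epsilon(\mathcal{H}, m, \delta)
\,\right] \ge 1 - \delta
```

Because the supremum ranges over every hypothesis in $\mathcal{H}$, including ones the training algorithm would never return, such a bound can be much looser than the actual gap of the learned network.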

3. Exact sampling of determinantal point processes with sublinear time preprocessing

The question of efficient sampling from a Determinantal Point Process (DPP) is of crucial relevance for the use of DPPs in real-world scenarios. In their paper, Exact sampling of determinantal point processes with sublinear time preprocessing, Michal Derezinski, Daniele Calandriello and Michal Valko provide an efficient and exact strategy to draw k-samples from the set {1,…,n} that only requires an n•poly(k) pre-processing step and a poly(k) sampling cost. Their algorithm, DPP-VFX (for Very Fast and eXact DPP sampler), features a two-step sampling strategy based on 1) a downsampling regularized-DPP step and 2) a regular DPP sampling from the subsample obtained in step 1. With a clever use of the connections between DPPs, (regularized) ridge leverage scores and the Nyström approximation, they can theoretically assert the effectiveness of DPP-VFX, which is illustrated by compelling numerical simulations on large-scale problems (see the article; the code is available). This is an elegant and important advance towards seeing DPPs used in practical applications.
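To give an idea of what “regular DPP sampling” means in step 2, here is a minimal sketch of the standard spectral sampler for a DPP defined by a symmetric positive semi-definite kernel L. This is the classical baseline, not DPP-VFX: it needs a full eigendecomposition of L, which is precisely the cost that becomes prohibitive for large n. The function name `sample_dpp` and the toy kernel are our own assumptions.

```python
import numpy as np

def sample_dpp(L, rng):
    """Draw one exact sample from the DPP with L-ensemble kernel L."""
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    # Phase 1: keep eigenvector i independently with probability l_i / (1 + l_i).
    selected = rng.random(eigvals.size) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, selected]
    items = []
    # Phase 2: pick one item per remaining basis vector.
    while V.shape[1] > 0:
        # Item i is chosen with probability proportional to the squared norm of row i.
        probs = (V ** 2).sum(axis=1)
        probs /= probs.sum()
        i = int(rng.choice(probs.size, p=probs))
        items.append(i)
        # Restrict the basis to the subspace orthogonal to e_i:
        # drop one column with a nonzero i-th entry and zero out row i in the others.
        j = int(np.argmax(np.abs(V[i])))
        pivot = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V = V - np.outer(pivot, V[i] / pivot[i])
        # Re-orthonormalize the remaining columns.
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(items)

# Toy kernel (assumption): 30 items described by 5 random features.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 5))
L = features @ features.T
sample = sample_dpp(L, rng)
print(sample)
```

Items with similar feature vectors are unlikely to be selected together, which is the diversity property that makes DPPs attractive in practice.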

4. A Step Toward Quantifying Independently Reproducible Machine Learning Research

Finally, another very important theme that was highlighted but has not yet been thoroughly explored is the reproducibility of numerical experiments. Edward Raff, senior lead scientist at Booz Allen Hamilton, investigated this question in his paper entitled A Step Toward Quantifying Independently Reproducible Machine Learning Research. The author spent six months attempting to reproduce the results of 255 papers from the machine learning community, published between 1984 and 2017. The underlying assumption was that the results should be completely reproducible from the details contained in the paper. Hence, the goal was not to compile and execute the code provided by the authors, but rather to reimplement the approaches based on the description given in the paper. Overall, he succeeded in reproducing the results of 162 papers, that is, 63.5% of the selected papers. According to Raff, from the reproducibility perspective, the most important feature is the amount of information the authors manage to convey on the first read through the paper. The author summarizes it as “readability”.

Readability: “how many reads through the paper were necessary to get to a mostly complete implementation”.

“As expected, the fewer attempts to read through a paper, the more likely it was to be reproduced” -Raff.

A massive thanks to those speakers, authors and researchers for sharing their work with the community. On our side, with 5 papers accepted, we were very proud to have the occasion to present what we enjoyed working on last year. More to come in our next article. In the meantime, an overview can be found here ⬇

Stay tuned!