Don’t Learn Deep Learning

Source: Deep Learning on Medium


A bathysphere may be useful for literal deep learning. (Ralph White/Corbis)

Deep Learning is one of the biggest breakthroughs in machine learning in the last generation — but does that mean that a generalist data scientist should try to master it?

Deep learning was unequivocally a leap forward for neural networks. Computer vision problems that had previously been a stumbling block suddenly became tractable, and a new vista opened, reviving the entire field of Artificial Intelligence.

In fact, its importance was recently recognised when three of the most important figures in Deep Learning (Yoshua Bengio, Geoffrey Hinton and Yann LeCun) received the ACM's Turing Award for their contributions to its development.

With Deep Learning seen as the source of most current advances in machine learning, beginning data scientists are often told that learning how deep learning works is an essential step towards establishing a data science career.

There are three reasons why Deep Learning is properly seen as a niche skill set within the overall framework of data science.

First, the use cases for deep learning stand somewhat apart from the mainstream use cases of data science. Where data scientists are often trying to build models with broad applications in a business context, such as predicting customer churn, insurance events or similar, the use cases most associated with deep learning have tended to involve computer vision and other traditional Artificial Intelligence applications.

These different knowledge domains require a different mindset and different background knowledge. In fact, although some see data science and artificial intelligence as two forms of the same thing, or as fields where one is a subset of the other, it is closer to the truth that they are completely different fields with a small area of overlap, like medicine and pharmacy or law and accounting.

Because the use cases of Deep Learning are dominated by Artificial Intelligence applications, they are very different from the mainstream of data science applications. The applications where Deep Learning has proved most effective (for now, anyway), image recognition and speech recognition, are a long way off the beaten track for the majority of data scientists, and they have their own customs and jargon that must be learned to do anything more than play in the shallows.

Second, deep learning is especially burdensome in requiring specialised knowledge of particular neural network architectures. Moreover, in contrast to much of the terminology of more common data mining algorithms such as random forests, C&RT and even Gradient Boosted Machines, the terminology of Deep Learning generalises poorly to other methods.

A third issue is that Deep Learning is a true Big Data technique that often relies on many millions of examples to learn effectively. In his critique of deep learning, Gary Marcus puts it this way: ‘In problems where data are limited, deep learning often is not an ideal solution.’ The sad truth is that for most data scientists, most of the time, data is limited. This is another reason that deep learning often won’t suit the applications most data scientists work with most often.

Together, these three elements mean that Deep Learning requires a different mindset from statistics, and even from at least some of the other approaches to machine learning. Not only does this mindset need to be learned for a practitioner to be effective, it needs to be partially unlearned when the practitioner returns to another arena. In this way, deep learning and statistics are like microsurgery and orthopaedic surgery: one works with very small structures in the body, the other with the largest bones.

This isn’t to say that developing Deep Learning skills can’t be another feather in your cap, or another tool in your toolbox. However, it needs to be recognised that, for most people, Deep Learning will offer a poor return on the time invested in study, especially early in your career. Because it is one of the most difficult tool sets to learn and has among the most limited fields of application, other tools offer a far better return on the time invested.

The burden of studying extra material that is unlikely to be used is already distracting aspiring data scientists from their goals, and it is likely contributing to the burnout that many data scientists report. Limiting the required study to a smaller list of more essential topics makes it possible to say, ‘I’ve mastered what needs to be mastered, at least for now.’

In this context, the extra burden of the specialised and non-transferable skill set that goes with deep learning is a bridge too far for many people. Moreover, because deep learning algorithms create their own features automatically, practitioners are kept at a distance from the data they are trying to model. We owe it to the next generation of data scientists coming up the ranks to ensure that they study only what is truly necessary to begin their careers.

Robert de Graaf’s book, Managing Your Data Science Projects, is out now through Apress.