Machine Learning — Choosing Substance Over Style

If you want to make a foray into machine learning with the intent of solving real-world, mission-critical problems in your domain, and are caught in the dilemma of whether to start with dazzling deep learning models or with old-fashioned decision trees and logistic regression, here is a Hindi proverb that should help you resolve your predicament:

“haathi ke daant khane ke aur, dikhane ke aur.” (An elephant’s teeth for eating are different from the ones for show.)

Popular press releases on progress in AI and machine learning are like the tusks of an elephant: impressive to look at, while the hidden teeth do the hard work of keeping the elephant alive, chewing its food day after day. Even as the Googles and the Facebooks of the world show exciting demos of the futuristic possibilities of deep learning models, it is classical machine learning methods, combined with clever feature engineering, that most often serve the demands of the business. The stylish deep learning models do have their place in real-world applications; for some problems, especially in vision and speech, there is nothing better. However, it is the less glamorous classical techniques that, if applied well, are most ready today to form the substance of successful solutions to real-world problems. Of course, the operative phrase is “if applied well”. So, what does it take to succeed with classical machine learning techniques?

Domain-Aware Feature Engineering

Most practitioners will agree that it is clever feature engineering that brings out the best in classical machine learning methods. This leads to a natural question: if these techniques demand creative, hand-crafted features, why not just use deep learning, one of whose key goals is to alleviate the need for hand-crafting features? First, deep learning models require a lot of data. Second, they are hard to interpret, and interpretability is often a hard requirement in the real world. Hand-crafting features relies on your knowledge of the domain and of the task at hand, and incorporating that knowledge, in turn, reduces the amount of training data you need. Simpler models are more interpretable, too. You are likely already well versed in your own domain; encoding that knowledge appropriately yields a double benefit: less data and better results.
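To make the idea concrete, here is a minimal sketch of domain-aware feature engineering for a hypothetical retail-demand problem. The function name, the record format, and the specific features (weekend flag, month-end flag, log-scaled amount) are all illustrative assumptions, not a prescription; the point is that domain knowledge is encoded directly, so the model doesn't have to discover it from raw strings.

```python
import math
from datetime import datetime

def engineer_features(timestamp_str, amount):
    """Turn a raw transaction record into domain-informed features.

    Hypothetical domain knowledge encoded here: weekends and month-ends
    behave differently for retail demand, hour of day matters, and
    monetary amounts are heavily skewed.
    """
    ts = datetime.fromisoformat(timestamp_str)
    return {
        "is_weekend": int(ts.weekday() >= 5),   # Saturday=5, Sunday=6
        "day_of_month": ts.day,
        "is_month_end": int(ts.day >= 28),      # crude month-end flag
        "hour": ts.hour,
        "log_amount": math.log1p(amount),       # tame the skewed amounts
    }

# A raw record becomes a small, interpretable feature vector:
features = engineer_features("2021-06-05T14:30:00", 120.0)
```

Each feature here is something a domain expert could read and argue about, which is exactly what makes the resulting model easier to debug and explain.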

Having said that, it is not straightforward to translate raw data into the representation best suited for a machine learning model to work with. You will need to equip yourself with the right scientific knowledge and the practical tricks of the trade to achieve this.

Equipping for Real-World Problems

There is a set of tried-and-tested techniques for taming raw data and turning it into a meaningful input representation. On top of these sits a model-specific bag of tricks: regularisation, hyperparameter tuning, kernel methods, and so on (which, by the way, you cannot escape even in the deep learning world). Two kinds of knowledge will help you discern these techniques and apply them confidently to real-world problems: scientific knowledge of the models themselves, and awareness of the problem-solving strategies that yield excellent results.
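As a small illustration of two items from that bag of tricks, regularisation and hyperparameter tuning, here is a sketch using one-dimensional ridge regression, where the regularised fit has a simple closed form. The data and the grid of regularisation strengths are made up for illustration; the tuning pattern (pick the setting that does best on held-out data, never on the training data) is the part that generalises.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge regression for one feature, no intercept:
    minimising sum((w*x - y)^2) + lam * w^2 gives
    w = sum(x*y) / (sum(x^2) + lam); a larger lam shrinks w toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def val_error(xs, ys, w):
    """Mean squared error of the predictions w*x on a held-out set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up noisy data, roughly y = 2x; the last points are held out for tuning.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5, 6], [10.1, 11.8]

# A tiny grid search: choose the regularisation strength by held-out error.
grid = [0.0, 0.1, 1.0, 10.0]
best_lam = min(
    grid,
    key=lambda lam: val_error(val_x, val_y, fit_ridge_1d(train_x, train_y, lam)),
)
```

The same loop, with a cross-validation split instead of a single holdout, is the pattern behind the grid-search utilities in mainstream libraries.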

The Value of Scientific Knowledge (Bottom-Up Knowledge)

Many tricks of the trade for getting machine learning algorithms to perform well may seem ad hoc if you don't understand the underlying machinery of the models you have chosen to work with. However, if you open the black box and understand how the models mathematically achieve their learning objective, most of those tricks fall into place: you see why and how they work. Scientific knowledge of how the models work will enable you to engineer features meaningfully for the task at hand. More importantly, when a model doesn't perform as expected, such theoretically well-founded knowledge will help you diagnose the problem and correct your model or your data. Without it, you will be left shooting in the dark.
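One classic example of such theory-guided diagnosis is the bias/variance rule of thumb: comparing training and validation error tells you which kind of failure you are looking at, and therefore which corrections are worth trying. The sketch below is illustrative; the threshold and the suggested remedies are assumptions for the example, not universal constants.

```python
def diagnose(train_err, val_err, tol=0.05):
    """A crude bias/variance rule of thumb (the tolerance is illustrative):
    a large gap between training and validation error suggests overfitting
    (high variance); high error on both sets suggests underfitting
    (high bias)."""
    if val_err - train_err > tol:
        return "high variance"  # try more data, stronger regularisation, a simpler model
    if train_err > tol:
        return "high bias"      # try richer features or a more flexible model
    return "looks fine"

# A model that memorised its training set but fails on held-out data:
print(diagnose(0.01, 0.25))  # prints "high variance"
```

Without this kind of reasoning, a practitioner facing a bad validation score has no principled way to decide between gathering more data and changing the model, which is exactly the "shooting in the dark" scenario above.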

Understanding the Problem Solving Strategies (Top-Down Knowledge)

I have come across many engineers who have gathered a lot of theoretical knowledge from textbooks and online courses, yet find it hard to connect the dots between real-world scenarios and the machine learning formulations needed to address them. It helps greatly to study how a specific model has been applied successfully to a variety of problems involving different types of real-world data. Such knowledge of the design patterns of machine learning is invaluable for identifying and shortlisting a handful of strategies for a new problem, and it greatly improves your odds of success.


I began by making a case for mastering classical machine learning methods, as they are the ones most likely to be of use to you as a practitioner solving business problems. It is easy to be blinded by the supernova of deep learning hype, and important not to lose sight of the less trendy models that remain the workhorses of the industry, and are likely to remain so for years to come. Deep learning techniques are certainly natural additions to your arsenal, and for some problems they provide the best solutions today, but they require very large amounts of training data. If you can make do with less data by incorporating domain knowledge, why trade that for a more expensive technique that requires an order of magnitude more?

If you are convinced of the value of classical machine learning methods, I hope you will place the bottom-up (scientific) knowledge and the top-down (design-pattern) knowledge on an equal footing. As an experienced machine learning practitioner, mentor, and teacher, I can't stress this enough.

Source: Deep Learning on Medium