Source: Deep Learning on Medium

# An interview with Rahul Agrawal, Principal Machine Learning Manager at AI and Research, Microsoft

Our interviewee today is **Rahul Agrawal**. Rahul works as a Principal Machine Learning Manager at Microsoft, in the problem area of intent understanding and advertiser understanding. He completed his Master's from the Indian Institute of Science (IISc) with a specialization in machine learning. Rahul enjoys connecting theory with its intuition and is obsessed with coding and solving problems. He regularly shares his ideas and practical machine learning tips and tricks on LinkedIn. Outside of work, Rahul is an avid fan of Bollywood and Indian classical music. You can find out more about Rahul here.

I would like to wholeheartedly thank Rahul for taking the time to do this interview. I hope it serves the betterment of the data science and machine learning communities in general 🙂


**Sayak**: Hi Rahul! Thank you for doing this interview. It’s a pleasure to have you here today.

**Rahul**: Thank you so much, Sayak. I am really grateful for the opportunity to share my thoughts with the readers. This has been on my mind for a long time, and I am really happy that you took the initiative to put it into shape.

**Sayak**: Pleasure is all mine, Rahul. Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?

**Rahul**: I am currently working at Microsoft AI and Research as a Principal Research Manager, where my core responsibility is to develop algorithms for language and image understanding. Our algorithms power *ads understanding*, matching ads with user queries across multiple languages and markets. I lead a wonderful team of 30+ applied scientists and engineers who relentlessly work towards building a smarter product that tries to understand the nuances of language. A typical day involves sifting through the problems we have, reviewing the failures from our past experiments, figuring out how to tweak our models to address those failures and capture newly observed data, researching whether similar formulations already exist, and running the experiments again. A key aspect of the overall process is rapid turnaround time and the scale of the data.

**Sayak**: That’s so wonderful. I am sure you *really* enjoy what you do. How did you become interested in machine learning?

**Rahul**: I started coding at the age of 11, in 1991, and was madly in love with programming and development. At that time, the fascination was building simple games in *GW-BASIC* on *CGA graphics*. Soon I ventured into building system software using *Terminate and Stay Resident* programs in MS-DOS, for pranks and daily automation. Over the next two years, the games became pretty boring as the computer's moves were predictable, so I spent most of that time building lots of small system utilities and was looking for something new. At the age of 14, with the help of my uncle, I got access to the local library at NIT Bhopal, where I found *Artificial Intelligence* by Rich and Knight. At that time, I couldn't grasp the complete material, but the space-search algorithms were something I adapted into an intelligent tic-tac-toe game that humans could not defeat. This was my first tryst with something we could call intelligent. Soon I incorporated all this newly acquired knowledge into my little GW-BASIC games, to the extent that they became too difficult for humans to play against.

The second phase of the romance with artificial intelligence started in the third year of my B.Tech, when I actually read the book by Rich and Knight cover to cover and realized the potential of AI. This was the year 2001: the internet was the hot new technology, and the Yahoo! portal was a gateway to the world of information. I built a face recognition system for my thesis project, with all sorts of mathematics and algorithms coded up in C++. I also worked on a voice recognition system. Both projects worked only for a fixed set of users under strictly controlled conditions, but the ball had been set rolling. Over those ten years of coding, I had developed into someone with a deep understanding of computer systems and software development, but I had now decided to commit my life to studying AI. It was as if you had practiced hard for your favorite sport all through the winter, and then in spring you changed the sport.

Next, I went to complete my MTech at the **Indian Institute of Science**, and that is the place where I came to know machine learning in its full glory. It was a tough ride for someone who was madly in love with developing algorithms and systems to now look at the mathematical side of machine learning. At first, it didn't look like computer science at all and I thought I had made a mistake, but over the next two years, under the supervision of my guide Prof. Chiranjib Bhattacharyya and other eminent experts in the department, my passion for ML and AI was cast in stone.

**Sayak**: Thank you so much, Rahul, for your detailed story. I am sure it must have been a joyful ride so far. When you were starting out, what kind of challenges did you face? How did you overcome them?

**Rahul**: I ventured into the field of AI as a natural progression from developing deterministic algorithms to incorporating intelligence, since the algorithms that make games smarter have to be non-deterministic. During this transition, I was approaching the field purely from the algorithmic and development side. The one missing ingredient was a deep understanding of continuous mathematics. To elaborate a bit: computer science handles both discrete and continuous objects. When we talk about users and their connections, we are talking about discrete objects; when we talk about a series of Bombay Stock Exchange values, we are talking about continuous data.

Now, problems in the discrete space are frustratingly complex and lack the necessary symmetry and continuity. The success of machine learning lies in large part in the fact that real-world questions involve *symmetry* and *continuity*, which is why we can have reasonably intelligent algorithms. This meant my biggest challenge was a lack of understanding of continuous mathematics: probability theory, real analysis, linear algebra, and so on. To compound the problem, I had been neglecting mathematics during the initial part of my studies at IISc. Fortunately, my resistance to learning mathematics was purely psychological, as before I committed myself to computer science, mathematics had been my passion. So for the next three semesters, I undertook a lot of serious math study to gain an in-depth understanding of the mathematical aspects of machine learning.

The second challenge while learning machine learning was the lack of good-quality libraries. While R was already serious software with a rich set of algorithms, I didn't pick it up initially. That turned out to be a blessing in disguise, because I went ahead and implemented everything I needed in C++, which helped me understand the innards of machine learning algorithms and non-trivial deep issues such as the impact of floating-point errors when performing millions of operations. The downside was a lack of breadthwise experimentation, and many failed efforts because my implementations had bugs. I overcame this handicap when I entered the industry by making a habit of regularly learning stable machine learning libraries and languages.
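The floating-point issue Rahul mentions is easy to demonstrate: over many operations, tiny per-addition rounding errors accumulate visibly in a naive running sum, while a compensated summation such as Python's `math.fsum` stays essentially exact. This is a small illustrative sketch, not an example from the interview itself.

```python
import math

# Sum 0.1 one million times: each addition rounds to the nearest double,
# and the tiny errors accumulate in the naive running total.
n = 1_000_000
naive = 0.0
for _ in range(n):
    naive += 0.1

# math.fsum tracks exact partial sums, so only one final rounding occurs.
compensated = math.fsum(0.1 for _ in range(n))

print(abs(naive - 100_000.0))        # visible drift after a million adds
print(abs(compensated - 100_000.0))  # essentially zero
```

In a from-scratch C++ implementation, the same effect shows up in gradient accumulations and loss averages, which is exactly the kind of "deep issue" that a library would otherwise hide.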

**Sayak**: This is so inspiring! The challenges you faced and how you tackled them are lessons to be learned here. Thanks again for detailing them so thoroughly. What were some of the capstone projects you did during your formative years?

**Rahul**: I would like to mention two projects: a handwriting recognition system for the Simputer, an indigenously developed handheld device, and a recommendation system for YouTube. The first was my MTech thesis project, while the other was done at the first company I joined, Veveo.

To speak about the first project, it was an interesting problem because the Simputer was a low-resource device compared to the PCs on which I had done most of my machine learning until then. The problem was to recognize handwriting as the user writes on the Simputer's screen with a stylus, converting it into text for storage and search. The supported languages included nine Indian languages along with English. It was also special because the solution involved the study of reproducing kernel Hilbert spaces and a slim book, *Computational Functional Analysis*. The project had everything: pure mathematical modeling, solving a real-world problem, continuous implementation and integration, and a low-resource device, which meant my implementation always involved trade-offs. The key learning was seeing how business problems are abstracted and mathematical models are created for them. *How do we come up with a learning problem that addresses the mathematical model, and then develop an efficient implementation?* Another key lesson was the importance of *continuous experimentation* and *working with the real product in close cycles*.

The YouTube recommendation project was to suggest related videos given a video or your viewing history; it must be noted that this was in the year 2006. The most interesting aspect of the solution was that we took a formalism from graph theory, hubs and authorities, and coupled it with a combinatorial optimization formulation that was NP-complete. We then built upon a well-known approximation algorithm and scaled it to a 50-machine setup using MPI. In short, this was a major project that helped me develop an appreciation for breadthwise modeling, since interesting formulations can be scattered all around, and for how to transform and scale up algorithms.
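For readers unfamiliar with hubs and authorities, here is a minimal sketch of the HITS-style power iteration on a tiny hypothetical link graph. The graph, node names, and iteration count are purely illustrative; this is not the actual system or formulation Rahul describes.

```python
def hits(edges, n_iter=50):
    """Power iteration for hub/authority scores on a directed edge list."""
    nodes = sorted({u for e in edges for u in e})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(n_iter):
        # A node's authority is the total hub score of the nodes linking to it.
        auth = {v: sum(hub[u] for u, w in edges if w == v) for v in nodes}
        norm = sum(x * x for x in auth.values()) ** 0.5
        auth = {v: x / norm for v, x in auth.items()}
        # A node's hub score is the total authority of the nodes it links to.
        hub = {v: sum(auth[w] for u, w in edges if u == v) for v in nodes}
        norm = sum(x * x for x in hub.values()) ** 0.5
        hub = {v: x / norm for v, x in hub.items()}
    return hub, auth

# Toy "related items" graph: an edge (u, v) means u links to v.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # "d" collects links from the strongest hubs
```

The mutually reinforcing definition (good hubs point at good authorities, and vice versa) is what connects this to recommendation: highly authoritative items linked from a user's viewing history are natural candidates to suggest.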

**Sayak**: I think I now know what is really meant by “Deep Dives”! These fields like machine learning are rapidly evolving. How do you manage to keep track of the latest relevant happenings?

**Rahul**: That's a great question, and it took me some time to get it right. The saving grace for me was that the field was growing along with me. One habit I formed early in my career was to regularly follow abstract services: I kept a personal list of conferences and journals and made sure I read at least the abstracts, to stay aware of the latest happenings in the research world. Secondly, reading ACM's Computing Surveys provided in-depth, survey-style articles on one area at a time. Thirdly, I followed a list of blogs that gave me perspectives beyond technology and science. Finally, for every paper I read, I try to code up a bare-metal version so that I understand the paper fully, along with its not-so-apparent implementation challenges and assumptions. Later, I also started following SourceForge, and then GitHub, to keep on top of the latest open-source projects relevant to the field.

**Sayak**: That is pretty comprehensive! Being a practitioner, one thing I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?

**Rahul**: In my opinion, a new concept consists of a science aspect and an engineering aspect. Science provides a mathematical model to explain an observed process. For example, suppose we are building a model and realize that the loss function is not performing the way it should. In such a scenario, there is a physical phenomenon not captured by the mathematical machinery we have, and hence someone will create a new loss function. The second aspect of a concept is how it applies to solving an engineering problem.

I approach a new concept by first trying to get a sense of the mathematical question it addresses: differential calculus is about measuring rates of change, probability is about modeling uncertainty, and SVMs are about maximizing the minimum margin. To get a good sense of the mathematical model, I read the proposed formulations and commentaries on the concept at a cursory level, to build an overall picture. Once I have an overall sense of the mathematical problem being addressed, I look for a good book or survey paper to gain a proper understanding of the concept. Finally, I shift my focus to the engineering challenges and real-world issues; for example, SVMs are non-trivial to parallelize or to adapt for incremental learning.

**Sayak**: That is really "full-stack" of you. You not only care about the mathematical foundations but also look into the engineering challenges involved. Pretty neat! Any plans on putting together a book on practical machine learning? The reason I ask is that the tips, tricks, and thorough explanations you share on LinkedIn are so awesome. Wouldn't it be great to put them together as a handbook, or even a full-fledged book?

**Rahul**: In fact, I was working on a machine learning book last year, though I stopped the effort after writing four chapters. The primary reason was that it was turning into yet another "ML in Language X" book, which I realized was degenerating into a library/language manual. Secondly, I am a strong believer that learning material should be freely available, and such an arrangement couldn't be worked out for that book.

In the meantime, another development happened: I started engaging with a lot of young practitioners and new entrants to the field. One key awakening for me was that there is a clear dearth of good material on two fronts: approaching machine learning from the perspective of product development, and the linkage between the mathematical model and the engineering implementation. For example, suppose I am given the problem of detecting spelling variations in user-generated content. Should I solve it with sophisticated BERT/GPT-2-based models, or are edit-distance-based algorithms fine? How do I even detect that a BERT-style formulation applies? How do I ensure I am continuously being agile? Should I go for full-blown data collection, or should I use active learning? What choices do I have for metrics, and how do I ensure I have a working engineering implementation?
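As a concrete anchor for the edit-distance baseline Rahul contrasts with BERT/GPT-2, here is a minimal Levenshtein-distance sketch. The `max_dist` threshold and helper names are illustrative assumptions, not recommendations from the interview.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = cur
    return prev[-1]

def is_spell_variant(a: str, b: str, max_dist: int = 2) -> bool:
    """Flag two distinct strings as likely spelling variations of each other."""
    return a != b and levenshtein(a, b) <= max_dist

print(levenshtein("kitten", "sitting"))     # 3
print(is_spell_variant("color", "colour"))  # True
print(is_spell_variant("cat", "dog"))       # False
```

A baseline this small makes the product question concrete: if it already catches most variations in your data, a heavyweight BERT-style model has to justify its extra cost against the residual errors.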

The more I thought about it, the clearer it became that there is a need for a book elucidating both the practitioner's viewpoint and the computer science viewpoint of machine learning. I have started working on this book, and I hope to release it into the public domain by Feb 2020.

**Sayak**: That’s tremendous news, Rahul! I will be eagerly waiting to read the book. Any advice for budding researchers?

**Rahul**: My advice to budding researchers is to recognize and leverage your strengths. Machine learning is an interdisciplinary field that you can enter as an application-domain expert, computer science researcher, mathematician, statistician, computational linguist, and so on. The key is to utilize your strengths. Secondly, keep learning and implementing. It is very important to have both breadth and depth of understanding, since in the end it is the elegance and generality of the model that defines the quality of the solution. Finally, always try to implement (even rudimentarily) the papers you read, so that you get a clear understanding of the issues involved.

**Sayak**: Thank you so much, Rahul, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.

**Rahul**: Thank you again, Sayak. This has been a great opportunity for me to share my thoughts. I hope the interview helps dispel confusion from the minds of our readers, helps them tread the path of machine learning with more confidence, and kindles many more minds.