Learning Data Science: Be careful what you ask for.

Source: Deep Learning on Medium

About two years ago, I came home one evening after a hard day of brainstorming a data input pipeline for a TensorFlow project I was working on. Working with 5 million images is exciting, but it is no fun unless you can process them efficiently. After dinner, I sat down in front of YouTube and searched for anything that taught something as critical as an input pipeline. What I found left me feeling a mix of disgust and shame, and foolish for even turning to YouTube for such information.

There was a YouTube video titled “TensorFlow in 5 minutes (tutorial)”. Although I had no good reason to watch it, I was curious about what could be said about TensorFlow in 5 minutes. Utter horror and disgust awaited me. A guy with coloured hair, a ton of memes and a rapidly changing, high-contrast background made it feel like a bad dream. As an introduction to what TensorFlow is, it was not so bad. As a tutorial, all hell broke loose. As a researcher, I often work with new ideas and large-scale datasets, for which a good-quality implementation is critical. An implementation often takes hours of planning, and sometimes days. Above all, that work gives a perspective on what big data really is, and on the value of truly understanding something by giving it time rather than jumping, dancing and quickly skimming one’s way through it.

Such YouTube videos are a reality of our generation. As time passed, more and more videos of this kind were brought to my attention. Today there are perhaps tens of thousands of them. Over time they have become a source of professional jeopardy: we see more and more interns who claim to know tools such as Keras, TensorFlow and PyTorch, yet find themselves in a precarious position when the implementation of a real-life problem is entrusted to them. To me personally, as well as to many of my colleagues, it is plainly sad.

Young people with tons of talent end up in the wrong spot at the wrong time, with a responsibility they cannot bear. Their confidence dwindles before our eyes, and even the best of us cannot restore it. Discovering that their reliance on some YouTube star did not pay off, and that reality is much more challenging, simply butchers their confidence, and there is very little help that can be offered to them.

YouTube is certainly an asset for learners, but you get exactly what you ask for. If you are looking for a set of 5-minute tutorials to binge-watch so you can be a data scientist in 3 months, that is exactly what you will get. But you will never be able to work confidently on a real-life problem. There is a way to educate yourself in machine learning and data science on your own using YouTube, but you definitely need to commit to learning the material seriously, step by step, and give it the time it deserves. People like me, who have gone through a proper engineering and research course in machine learning, have taken years to get good at what we do.

I am aware of the financial difficulty thousands face in getting through a university course in machine learning. I am no exception, and without the full support of my parents I would not be here. And I stand by the claim that one can learn data science and machine learning by oneself. But choose the right resources.

Tetiana Ivanova’s 2016 talk about career planning for data science is something I and a lot of people I know respect immensely. It deserves and commands that respect for the systematic thought put into it. It does not paint a picture of data science that is full of rainbows and sunshine. Instead, it presents a way in and highlights the challenges clearly. It tells you the truth.

Learning data science on your own is challenging and is, above all, an exercise in career planning. Nobody can hand you a mind map or a flowchart of the steps you need to take. It is a very individual journey, and specialization is the key today. Specialize in a specific domain. Gain theoretical and practical knowledge of that domain. For implementations, even if you do not have access to a large machine, imagine hard scenarios and look for material on them online, e.g. processing terabytes of data efficiently, distributed computation, etc.

There is no shortcut to data science and machine learning. There is no 3-month or 6-month program you can or should follow that will make you a real data scientist. That is an illusion, and a dangerous one, which will cost you above all your trust and your beliefs. Fall for it and, like so many others, you will pay someone money to learn data science and get references to reputed companies, only to find yourself asking for a refund later on.

I understand that I have not described an approach to learning data science here. That is because there is no shortcut and no mind map. Starting with the next post, I will begin discussing several aspects of data science and possible approaches to handling them.