Source: Deep Learning on Medium
Learning Data Science — 4 untold truths
Did you flirt with the idea of learning data science? You are not alone. This has been a really hot topic in the last few years and it will be one in the upcoming few, for sure. Yet, very few people actually become data scientists.
Well, part of the problem is that many aspiring data scientists don’t know what to expect from this field. Or even worse, based on the many misleading (sometimes scammy) “how to become a data scientist” articles, they have false expectations. And when they hit the wall, they get demotivated and quit.
In this article, I want to show you four untold truths that you should know about learning data science — and I have never seen them written down anywhere else before.
Untold truth #1: Learning Data Science is Hard!
Learning data science is not easy.
It will take a lot of work, a lot of energy and a lot of time from you.
I have seen an ad recently in my Instagram feed that said:
“Take this course and master data science in 1 month!”
And I was like: what the fudge!?
I’ve been practicing data science for 6+ years now. I’ve held senior DS positions (in addition to teaching). But I wouldn’t say that I have mastered data science or analytics. I know for a fact that no one can master data science in 1 month. In fact, my personal estimate (based on students I worked with) is that going from zero to the junior level takes ~6–9 months.
(More about that in this course: Data Science Online Training)
Learning data science is hard!
A few online education platforms imply the opposite.
- “Just change one word in this query. Run it! And boom, you’ve learned SQL…”
- “Just watch this video course of the instructor running Python code, and you will know Python, too…”
- “Just play around with this interactive chart and you will understand regression analysis immediately…”
Two years ago, I interviewed a guy for a junior DS position. He didn’t have any hands-on experience, but he learned SQL on a popular “just-type-your-code-into-the-browser” kind of online learning platform. (I won’t name the exact platform here. :-))
I gave him a computer with an SQL manager open — and a simple real-life task. He had to JOIN two SQL tables, then do a simple segmentation. He couldn’t solve the task! He ran into syntax errors, he couldn’t debug his code, he didn’t get the context, he couldn’t discover the data…
And that’s when I realized that many of these online schools give people only the illusion of data science knowledge.
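To make that interview task concrete, here is a minimal sketch of what a "JOIN two tables, then do a simple segmentation" exercise could look like. The table names (`users`, `purchases`), the columns, the sample rows and the spending thresholds are all hypothetical and invented for illustration; they are not the data from the interview. The sketch uses Python's built-in `sqlite3` module, so it runs on your own machine and shows you real error messages when you mistype something.

```python
import sqlite3

# Hypothetical sample data: table names, columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE purchases (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE'), (4, 'DE');
    INSERT INTO purchases VALUES (1, 120.0), (1, 30.0), (2, 15.0), (3, 60.0);
""")

# Inner query: JOIN the two tables and total the spending per user.
# LEFT JOIN keeps users without purchases (their total becomes 0).
# Outer query: put each user into a spending segment and count the users.
# The thresholds (100 and 0) are arbitrary, chosen only for this example.
query = """
    SELECT
        CASE
            WHEN total >= 100 THEN 'high'
            WHEN total > 0 THEN 'low'
            ELSE 'none'
        END AS segment,
        COUNT(*) AS user_count
    FROM (
        SELECT u.user_id, COALESCE(SUM(p.amount), 0) AS total
        FROM users AS u
        LEFT JOIN purchases AS p ON p.user_id = u.user_id
        GROUP BY u.user_id
    )
    GROUP BY segment
    ORDER BY segment;
"""

segments = conn.execute(query).fetchall()
for segment, user_count in segments:
    print(segment, user_count)
conn.close()
```

Try breaking it on purpose: misspell a column name or swap the LEFT JOIN for an INNER JOIN, then work out why the result changes. That kind of debugging is exactly what the browser-based exercises tend to skip.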
You want to have real data science knowledge
You want to have real data science knowledge.
But what does it take?
Well, first and foremost: (1) a lot of practice (2) in true-to-life data environments.
Don’t try to skip forward: take the time and the energy and set up your own data server!
Yes, sometimes (well, quite often in the beginning) you will mistype a code-snippet, your computer will throw an error and it will be very annoying. But this is how it works! We make mistakes, we learn from them and next time we will do much better.
And also take the time to practice a lot!
When you practice, it’s okay to make stupid mistakes. For instance, it’s okay to accidentally mess up your previously built data pipelines and lose hours of work… (This happens from time to time with my students.) But again: we all do stupid things in real life data projects, too. At least, I did in my junior years — and it cost me a lot of extra work-hours. But I learned from that.
We make mistakes, we learn from them and we don’t make them again.
Note: How to practice? I shared a few ideas (and even more) in the above-mentioned Data Science Online Course.
Learning data science is not easy and it will take time. If you can’t accept this fact, then maybe this profession is not the best choice for you. But if you are okay with learning data science the hard way, this learning period of a few months will be one of your best long-term investments. (I’ll get back to this below.)
Untold truth #2: It’s not “Learning Data Science”, it’s “improving your Data Science skills”
The world changes really fast and it won’t get any slower.
And I seriously believe that if one wants to keep up with the pace, the only way to do it is by focusing on improving skills.
You might already have heard that according to researchers’ predictions, ~65% of today’s grade schoolers will hold jobs that don’t exist yet.
You might also have heard that the current estimated “half-life” of engineering-related information is ~4 years. So 50% of the things you learn today regarding IT will be outdated in ~4 years.
What does it mean for you?
That the skills you acquire and improve are way more important than the actual information you learn.
It also means that “learning data science” is not about learning data science.
Learning data science is about:
- improving your coding skills.
- improving your business skills.
- improving your mathematical/statistical skills.
- improving your data visualization, presentation, communication and other soft skills.
Learning data science is not about:
- Learning a certain package of Python.
- Learning the different industry benchmarks for this or that KPI.
- Learning certain statistical models.
- Learning how to use Google Data Studio or Tableau.
What seems important today might be irrelevant in 5 years
Because mastering, for instance, the Scikit-learn library or Google Data Studio might seem important today… but I bet that there will be a better machine learning package and better data visualization software in 5 years.
Don’t get me wrong, I still think that today, you should learn these things because they are part of the current data science and analytics ecosystem and also part of the learning curve itself.
I’m saying that you should keep in mind that when you learn these (or any other) tools, the important thing is not to cram in every little syntax detail or which button is where in the specific software — but to understand the big picture. Why does this tool work the way it works? What’s the underlying logic? How does this function work in other similar tools? Once you get these, changing between tools (even between programming languages) will be easy as pie.
And you will be much more prepared for the ever-changing future.
So to future-proof your data science career: focus on your skills and not on the information you learn!
Untold truth #3: Because it’s hard, Learning Data Science is a great investment
Let’s talk about career prospects, too!
Learning data science is a great short- and long-term investment.
I guess I don’t have to explain the short-term investment part.
“Demand for data scientists is off the charts … data science skills shortages are present in almost every large U.S. city. Nationally, we have a shortage of 151,717 people with data science skills, with particularly acute shortages in New York City, the San Francisco Bay Area, and Los Angeles.”
Also, based on Glassdoor’s research, Data Scientist was ranked as the best job three years in a row in the USA.
Note: the above numbers apply to the US only — I don’t have hard data for the EU or any other parts of the world. But in my experience, in the EU we have the same trends.
High demand and a persistent shortage put data scientists in a really good position. It means:
- Higher salary and better benefits
- Better job security
- Better work conditions (e.g. flexible hours, working from home, etc.)
Besides, data scientist is a well-respected job within the company (and in the outer world, too). You will be someone who your managers and colleagues want to listen to.
The point is: learning data science is a good short-term investment, for sure.
But is learning data science a good long-term investment, too?
My answer is yes and I have two reasons.
Just look at the data! In 2018 the shortage of data scientists in the US was 151,717 people. This number was ~140,000 in 2011. So in 7 years, the market couldn’t produce enough new data scientists to fill up the gap. (It even grew a bit.)
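As a quick check of "it even grew a bit", the arithmetic can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch: it uses the two figures quoted above and treats the approximate 2011 number (~140,000) as exact, which is an assumption.

```python
shortage_2011 = 140_000  # approximate figure quoted above, treated as exact here
shortage_2018 = 151_717  # figure quoted above

growth = shortage_2018 - shortage_2011
growth_pct = growth / shortage_2011 * 100

print(f"{growth:,} more unfilled positions ({growth_pct:.1f}% growth over 7 years)")
```

So under that assumption the gap widened by roughly 8% over the period instead of closing.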
This is something that I’ve already mentioned in the intro. Many people want to learn data science… yet, not too many of them become data scientists after all.
Why? Because learning data science is hard. It’s a combination of hard skills (like learning Python and SQL) and soft skills (like business skills or communication skills) and more.
This is an entry barrier that not many students can pass. They get fed up with statistics, or coding, or too many business decisions, and quit.
So the question is:
- Is data science for you?
- And if yes: are you willing to put in the effort and the hard work it requires?
If yes, it will be one of the best career investments of your life.
Untold truth #4: Learning Data Science is not about learning Machine Learning, Deep Learning (or any other data buzzwords)
If you had to guess, what would you say is the most time-consuming part of the data scientist job?
Or in other words, what do you think you’ll need to work on the most when practicing data science and analytics for real?
Hint: it’s not Machine Learning.
The answer is…
Data scientists often say: “80 percent of data science is data cleaning. And 20 percent is complaining about data cleaning.”
Okay, obviously, that’s a joke.
But when you get into your first data science role, you will see for yourself: it’s not about doing machine learning and predictive analytics 24/7.
Because to be able to run a proper ML algorithm you have to complete many other steps first:
- data collection
- data formatting
- data cleaning
- transforming your data to the right format
- discovering and understanding the data
- running other data analytics projects
- data visualization
- automating the above steps
- and so on…
And believe me when I say: when you are working with real data, these things are just as exciting as the machine learning and predictive analytics parts.
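To give a feel for the formatting and cleaning steps in the list above, here is a minimal sketch that uses only Python's standard library. The raw data, the column names and the two date formats are hypothetical, invented for this example; real data sets are usually messier and need more rules than this.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: stray spaces, inconsistent casing, mixed date
# formats, a missing value and a duplicate row.
RAW = """name,signup_date,revenue
 Alice ,2019-01-05,100
BOB,05/02/2019,
alice,2019-01-05,100
Carol,2019-03-10,42.5
"""

def parse_date(text):
    """Try the two date formats assumed for this sample; return an ISO date."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unknown format: flag it instead of guessing

def clean(raw_text):
    """Format each row, then drop rows with the same name and signup date."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        revenue = row["revenue"].strip()
        record = {
            "name": row["name"].strip().title(),
            "signup_date": parse_date(row["signup_date"].strip()),
            "revenue": float(revenue) if revenue else None,
        }
        key = (record["name"], record["signup_date"])
        if key in seen:
            continue  # duplicate of a row we already kept
        seen.add(key)
        cleaned.append(record)
    return cleaned

rows = clean(RAW)
for r in rows:
    print(r)
```

Even in this tiny example you have to make judgment calls: is "05/02/2019" the 5th of February or the 2nd of May, and should a missing revenue be kept as empty or treated as zero? Those decisions depend on the business context, and they are a big part of the real work.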
What’s important then?
When you are learning data science, you should not focus on polishing your ML skills. Instead you should focus on:
- being fluent with Python and SQL
- understanding the business logic behind simpler analytical methods
- being familiar with the basics of statistics
- practicing and experiencing the pain of working with a raw and uncleaned data set
- learning how to automate
- and so on…
These things will help you to become a better data scientist and eventually get your first job — not another deep learning or artificial intelligence course.
So to summarize:
- Learning Python and SQL → important
- Learning about Deep Learning → not important
- Learning the basics of statistics → important
- Learning about Artificial Intelligence → not important
- Practicing data cleaning, data formatting and automation → important
- Understanding “artificial neural networks” → not important
At least, at the junior level…
Later on (in 1 or 2 years), when your career moves forward, you will have to learn these above-mentioned, fancy machine learning methods on the job, anyway.
But for now: focus on the things that are important for your next step!
I know: being a data scientist, a machine learning guru, a master of deep learning… These all sound exciting. And you will get there eventually.
(I mean, if you want to. For instance, I get much, much more enjoyment from working on simpler analytics projects that have a bigger impact on the business.)