Original article was published by Ruben Winastwan on Deep Learning on Medium
My Data Science Learning Pathway
I think we can all agree that the hardest part of anything is the beginning. That was true for me when I wanted to get my hands dirty with data science. I kept asking myself one question: where do I start?
After some research, I put together my own online learning curriculum. Here is the list of courses and specializations I took on Coursera, in chronological order.
I decided to start learning data science at a very basic level because I didn’t want to miss any important concepts. That’s why I chose IBM Data Science as my very first specialization.
You don’t need any prior knowledge of data science, statistics, machine learning, or programming before taking this specialization. Its very first course is literally called ‘What is Data Science?’. It doesn’t get any more basic than that, right?
There are 9 courses in this specialization. It starts with the concepts and methodology of data science before moving on to programming with Python and SQL. It then introduces the core of data science: statistics, data analysis, data visualization, and machine learning.
You won’t be an expert in data science after completing this specialization, since it doesn’t cover each topic in great detail. However, it gave me a very good overview of data science and of what I should learn next.
Thanks to this specialization, I was able to draw up a roadmap for my online data science and machine learning journey:
- Data Visualization
- Machine Learning
- Deep Learning
That led me to the next specialization I took.
This specialization, offered by Cloudera, focuses on using SQL for Big Data analysis. It consists of 3 courses.
Today, data volumes are often too large to store in a traditional DBMS, so knowledge of and hands-on experience with data in distributed clusters are very important. This specialization teaches exactly that.
What I really liked about this specialization is how hands-on it is. Using Cloudera’s virtual machine, you get to write SQL queries that retrieve and store data with Apache Hive, Apache Impala, MySQL, or PostgreSQL. You can keep using the virtual machine after you finish the specialization, so you can always brush up on your SQL skills and play around with the data.
Don’t worry if you know nothing about SQL, as this specialization starts from the basics.
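The kind of hands-on SQL work described above, storing rows in a table and then querying them back, can be tried locally without a cluster. The sketch below assumes Python’s built-in `sqlite3` module as a stand-in; it is not Hive, Impala, MySQL, or PostgreSQL, and the `orders` table and its columns are hypothetical.

```python
import sqlite3


def run_demo():
    # In-memory SQLite database as a local stand-in for Hive/Impala/MySQL/PostgreSQL
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Store: create a hypothetical table and insert a few rows
    cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    rows = [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)]
    cur.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
    conn.commit()

    # Retrieve: total amount per customer, largest total first
    cur.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC"
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns `[('bob', 35.5), ('alice', 32.5)]`. The same `CREATE TABLE`, `INSERT`, and `GROUP BY` statements carry over, with small dialect differences, to the engines used in the specialization.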
I took this specialization to complement what I had learned in the Cloudera specialization. While Cloudera’s specialization focused on applying SQL in distributed clusters, this one let me apply SQL in the cloud.
It teaches you how to retrieve and store data in BigQuery on Google Cloud Platform (GCP). You get to explore Google’s public datasets, such as Google Analytics data, and write the SQL queries yourself.
Beyond that, I liked that you learn more than just SQL and BigQuery: you also learn how to use Google Data Studio to build interactive data visualization dashboards and how to create simple regression or classification models directly in BigQuery.
After this specialization, I moved on to one of the most important concepts, if not the most important, behind data science and machine learning: statistics.
Statistics is arguably the heart of data science. Since I already knew some statistics, I took this specialization expecting to refresh the fundamentals. In the end, I got more than I expected.
The specialization covers the core of statistics: probability theory, inferential statistics, and regression, from both frequentist and Bayesian perspectives.
There are two things that I like about this specialization:
- All of the final projects are portfolio-worthy, which means you do real statistical analysis work, so don’t expect to finish them in 1 or 2 hours. By the end of the specialization, you will have 3 or 4 projects you can put on your resume.
- You use R to complete the project in each course. This was good for me because I had never used R before. Learning a new programming language pays off in the long run, and R is a great data science and statistics toolbox to add to your skill set.
After finishing the specialization, I wanted to dig a little deeper into Bayesian statistics, in particular Markov chain Monte Carlo (MCMC). So I took one more statistics course, which was…
If you want a comprehensive introduction to Bayesian statistics, I think this is the right course for you. You’ll learn the concepts behind Markov chain Monte Carlo as well as how to solve regression problems with Bayesian methods.
What I really like about this course is the balance between the theory and practical aspects.
For each topic, the theory comes first, followed by a demonstration in which the lecturer shows you how to implement what you’ve just learned in code. The course uses R and JAGS to implement Bayesian models.
The final project for this course is also portfolio-worthy and quite similar to those in the Statistics with R specialization above: you are asked to carry out a statistical analysis using Bayesian methods in R.
After finishing the course, I decided to move on to the next topic: data visualization.
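To give a feel for what MCMC does, here is a minimal random-walk Metropolis sampler. It is written in Python rather than R or JAGS (JAGS relies mainly on Gibbs sampling), so it is an illustration of the general idea, not the course’s code. The model is an assumption chosen for simplicity: observations drawn from a normal distribution with known standard deviation 1, and a Normal(0, 10²) prior on the unknown mean. The function names and data are hypothetical.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, sigma=1.0):
    # Unnormalized log posterior: Normal likelihood (known sigma) + Normal(0, prior_sd^2) prior
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data) / sigma ** 2
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior


def metropolis(data, n_samples=20000, step=0.5, init=0.0, seed=42):
    rng = random.Random(seed)
    mu = init
    current_lp = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal around the current value
        proposal = mu + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(current)), in log space;
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            mu, current_lp = proposal, proposal_lp
        samples.append(mu)
    return samples
```

With `data = [4.8, 5.1, 5.3, 4.9, 5.2]`, discarding the first 2,000 draws as burn-in and averaging the rest should land close to the analytical posterior mean, 25.3 / 5.01 ≈ 5.05. Because this toy model is conjugate, the exact answer is available for checking; MCMC earns its keep on models where it is not.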
I normally use Python to visualize data, with Matplotlib, Seaborn, or Plotly. However, I wanted to learn something new: how to visualize data with Business Intelligence tools such as Power BI or Tableau. That’s when I found this specialization.
I would recommend this specialization if you are new to Tableau and want to learn to visualize data with it.
This specialization has 5 courses, including a capstone project. The first three courses give you a theoretical grounding in data visualization best practices and in telling a story with your data. The fourth course is where you get your hands dirty with Tableau, learning how to create interactive dashboards and stories.
What I really liked about this specialization is that enrolling gives you free access to Tableau Desktop for 6 months.
This means you can explore much of Tableau’s functionality on your local machine and create plenty of interesting visualizations with it. If the license expires after 6 months, you can extend it for another 6 months.
At this point, I had learned the fundamentals of data science, Big Data analysis with SQL, statistics, and data visualization best practices. Finally, it was time for me to learn machine learning.
As a total beginner in machine learning, I chose Andrew Ng’s Machine Learning course, knowing it is the best-known machine learning course on Coursera.
Its reputation is well deserved. I don’t think I could have found a better machine learning course for a beginner.
The course covers classical supervised and unsupervised learning algorithms such as linear regression, logistic regression, SVMs, and K-means clustering, as well as artificial neural networks. Andrew also shares practical tips for building machine learning systems that work in practice.
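To make one of these algorithms concrete, here is a minimal sketch of K-means clustering (Lloyd’s algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. This is not the course’s own code; the function name, the sample points, and the hand-picked initial centroids are illustrative assumptions (real implementations usually pick initial centroids randomly and run several restarts).

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2-D points; `centroids` is the initial guess."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignment
```

For example, `kmeans([(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)], [(0, 0), (10, 10)])` assigns the first three points to cluster 0 and the last three to cluster 1, with centroids at roughly (1.17, 1.0) and (8.33, 8.5).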
Basically, I liked everything about this course.
I liked how passionate Andrew Ng is about teaching the different types of machine learning algorithms. I liked how easily he explains and simplifies difficult machine learning concepts. I also liked the programming assignments, especially the chance to implement neural network algorithms from scratch.
If you’re new to machine learning, in my opinion this is the best course to get you started.
Finally, I was getting close to my initial goal: learning about Convolutional Neural Networks.
I still remember how excited I was when I found out that Andrew Ng teaches this Deep Learning specialization. It was an easy decision to take it right after I finished the Machine Learning course.
The specialization is very well structured. The first course builds on the classic neural networks from the Machine Learning course and introduces deep neural networks. Later courses cover the key concepts of Convolutional Neural Networks and Sequence Models.
As usual, Andrew Ng is excellent at teaching difficult deep learning concepts. The programming assignments are interesting and let you implement various deep learning algorithms with TensorFlow, one of the most widely used deep learning frameworks in industry.
However, most of the programming assignments in this specialization are still implemented in TensorFlow 1, which is largely outdated now.
I believe this specialization was called TensorFlow in Practice before DeepLearning.AI renamed it the TensorFlow Developer Professional Certificate.
The main reason I took this specialization right after the Deep Learning specialization is that I wanted to learn how to implement various deep learning algorithms in TensorFlow 2, and it delivered exactly that.
This specialization is purely hands-on. You won’t find deep learning theory in it, since its focus is implementing deep learning algorithms with TensorFlow. It’s therefore a good idea to know the deep learning concepts before taking it.
It gives you hands-on experience building deep learning models for image classification, sentiment analysis, poetry generation, and time series forecasting.
As a bonus, if you plan to take the TensorFlow Developer Certificate exam, this specialization is great preparation. I recently took the certification, and I found this specialization to be the best resource for preparing. If you’re interested in my experience with the certification, you can read about it at the link below.