Getting started with data science — the path I chose as a sophomore

In the summer of 2018, as I reached the end of my sophomore year, I decided to do an internship focused mainly on software development in Python. Data science as a field was not particularly new to me; however, it was definitely something I considered "out of my league" or "not my cup of tea". One of the biggest hurdles to getting started with data science is the mental block created by peers and seniors who describe how daunting the field, and the math associated with it, can be.

A major part of getting started with any field is the route you choose. Choose a sub-optimal route and you can easily burn out or be scared off by the sheer complexity of the concepts thrown at you. Stop learning at the wrong stage and you only know concepts at a surface level, not how they are actually implemented. What follows is how I tackled this issue through trial and error. It is my personal take, and your mileage may vary.

Seeing the initial hype, I immediately jumped on the hype train and started with the famous Machine Learning course taught by Andrew Ng. Four weeks in, I was stressed just by looking at the math and found myself struggling to grasp the concepts. (As you might have guessed, math isn't something I am extremely comfortable with, hence my bias toward a more application-oriented course. If you like math, you might enjoy the Machine Learning course by Andrew Ng.)

Mistake 1 — Do not start with something that involves a lot of math; there's a high chance you'll give up or fail to understand the underlying concepts.

Two months later, during my summer internship, I started following the Machine Learning A-Z course taught by Kirill Eremenko and realized that the concepts on their own aren't too tough to understand and that I could do it. I later proceeded to finish all the A-Z courses and felt pretty confident about my skills in the field, considering that I had completed 4 courses in a span of 5 weeks.

Mistake 2 — Do not overestimate your abilities.

With all the knowledge from the 4 courses I took, I went ahead and tried to solve a real-world problem: detecting malicious URLs. After some digging on the web, I found that random forests were a good approach to this problem, and I successfully implemented a basic model in four to five days. Here's where I hit my first roadblock: hyperparameter tuning.

I found myself tuning the random forest's hyperparameters like a monkey turning knobs at random. It involved a lot of trial and error, and when things did work, I had no idea why they worked.
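A more systematic alternative to turning knobs at random is a grid search: list a few candidate values for each hyperparameter, evaluate every combination on held-out data, and keep the best one. Below is a minimal, library-free sketch of that idea. The scoring function is a hypothetical stand-in for "train a random forest and measure validation accuracy"; the parameter names mirror scikit-learn's `RandomForestClassifier` (`n_estimators`, `max_depth`), but the candidate values and the toy score are assumptions for illustration only.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in param_grid and return the best one.

    `evaluate` is assumed to return a validation score where higher is better.
    """
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # product() walks the Cartesian product of all candidate values,
    # so each combination is evaluated exactly once.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for training a model and scoring it on held-out URLs.
# This toy score simply peaks at 200 trees and a max depth of 10.
def toy_validation_score(params: Dict[str, Any]) -> float:
    return (
        1.0
        - abs(params["n_estimators"] - 200) / 1000
        - abs(params["max_depth"] - 10) / 100
    )


grid = {"n_estimators": [50, 100, 200, 400], "max_depth": [5, 10, 20]}
best, score = grid_search(grid, toy_validation_score)
print(best, round(score, 3))  # {'n_estimators': 200, 'max_depth': 10} 1.0
```

In practice, scikit-learn's `GridSearchCV` performs the same exhaustive search with cross-validation built in. The search only tells you which combination scored best, though; understanding what each hyperparameter actually does is what tells you which ranges are worth searching in the first place.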

This is when I decided I needed to know the math behind these models to be truly effective. I immediately went back to the Machine Learning course and found myself understanding a lot of the math I had failed to grasp previously. I attribute much of this to the fact that on the second attempt, I knew where the math was actually being applied. This gave me enough motivation to take up the specialization and complete it too.

Once you are done with this, you can move on to more advanced courses in a domain of your choice. A great example would be Stanford's CS231n if you are interested in computer vision and want to learn more about CNNs.

Mistake 3 — Don't just keep taking courses!

I am guilty of this one myself. If you keep going through theory without implementing anything, much of what you learn won't stick for long. Start by building good side projects. One approach is to incorporate data science into the college mini projects that many universities require for every course you take in a semester. This way you earn brownie points, since your project will stand out from your peers', and it also helps you fill out your resume when applying for internships later on!

TL;DR: Don't start with math-heavy courses. The A-Z courses offered by Kirill Eremenko are a great starting point, after which you should move on to understanding the math.

Again, everything mentioned here is my personal opinion and your mileage may vary!

Source: Deep Learning on Medium