Original article was published on Deep Learning on Medium
My 10 favorite resources for learning data science online
These websites will help you keep up to date with the latest trends in data science
I think you will not argue with me when I state that data science is becoming one of the most popular fields to work in, especially given that Harvard Business Review named “data scientist” the sexiest job of the 21st century. We have come a long way from the times when terms like data science and machine learning were still unknown and everything was gathered under the umbrella of statistics. However, we are far from the end of the journey.
That can also be a divisive aspect of data science: the field is developing so rapidly that it can be difficult to even follow all the new algorithms, techniques, and approaches. So working in data science, similarly to software engineering, often requires constant learning and development. Don’t get me wrong, some people (myself included) like that a lot. Others prefer to learn for a few years and then simply reap the benefits of that knowledge. Both approaches are perfectly fine; it is a matter of personal preference.
As I mentioned, working in data science can be a journey. That is why in this article, I want to share my 10 favorite data science resources, which I frequently use for learning and for keeping up with current developments. The list focuses on online resources (blogs, videos, podcasts) and does not cover MOOCs or books, as there is more than enough content there for a separate article. Let’s start!
1. Towards Data Science
This should come as no surprise, given you are reading this article published in Towards Data Science. TDS is Medium’s biggest publication covering all data science related topics. What you can find here:
- beginner-friendly tutorials with code (in most popular languages such as Python, R, Julia, SQL, and more),
- in-depth descriptions of particular ML algorithms or techniques,
- summaries of influential papers,
- descriptions of personal pet projects,
- the latest news from the field,
- and more!
TDS creates a really nice community in which everyone is encouraged to share and participate. Additionally, I can highly recommend joining the newsletter and following TDS on Twitter to keep up with the latest and most popular articles.
Lastly, I can also recommend the Towards Data Science podcast, which can be especially helpful for people wondering how to break into data science and find their perfect role.
2. PyData (conference + videos)
PyData is the educational program of NumFOCUS — a nonprofit charity promoting open practices in research, data, and scientific computing. They organize conferences all over the world encouraging researchers and practitioners to share their insights from their work. In the talks you can find a mix of general Python best practices, examples of real-life cases the data scientists worked on (for example, how they model churn or what tools they use to generate an uplift in their marketing campaigns), and introductions to some new libraries.
Speaking from experience, it is a lot of fun to actually attend the conference in person, as you can actively take part in the presentations, ask questions, and network with people who share your interests. However, this is not always possible and there are simply too many conferences to attend, so you can find all the recordings on their YouTube channel. Normally, the recordings are published a few months after each conference.
The PyData talks are a great source of inspiration, as you can see how other companies approached a particular topic, and maybe you can apply a similar method in your company.
3. Machine Learning Mastery
Jason Brownlee’s website/blog is a gold mine of content for data scientists, especially the more junior ones. You can find a plethora of tutorials, from classic statistical modeling approaches (linear regression, ARIMA) to the latest and greatest machine/deep learning solutions. The articles are always very hands-on and contain Python code applying the particular concept to a toy dataset. What is really great about the website is that Jason clearly explains the concepts and also refers to further reading for those who want to dive extra deep into the theoretical background. You can also filter the articles by topic, in case you are interested only in imbalanced learning or how to code your first LSTM network.
4. Distill
Distill aims to provide a clear and intuitive explanation of machine learning concepts. They argue that papers are often restricted to PDF files, which cannot always show the full picture. And in times when ML gains more and more impact, it is crucial to have a good understanding of how the tools we are using actually work.
Distill uses impressive and interactive visualizations to clearly explain what is actually happening behind the scenes of the machine learning algorithms. One of my favorite articles there described t-SNE (t-distributed stochastic neighbor embedding) and showed how the generated graphs, while visually pleasing, can be misleading. It also pointed out the significance of the hyperparameters by providing an interactive tool to see the impact first-hand.
If you need any extra assurances about the quality of the content there, the steering committee behind Distill included names such as Yoshua Bengio, Ian Goodfellow, Michael Nielsen, and Andrej Karpathy.
5. Papers With Code
Papers With Code is a great initiative to create a free and open resource pool containing ML papers, together with the code and evaluation tables. You can easily browse the available papers (including the State-of-the-Art) and search by topics, for example, image colorization within the computer vision domain.
This website comes in really handy when you want to experiment with some approach or apply it to your dataset, without actually writing all the code yourself. While writing it yourself is definitely helpful and you will learn a lot, sometimes you just need to hack together an MVP to show that something actually works for your use case and adds value. After getting the required approval, you can calmly dive into the code to understand all the nuances of a particular model or architecture.
6. Kaggle
Kaggle has become the go-to platform for people wanting to participate in machine/deep learning competitions. Thousands of people take part in competitions to train the best models (often large and complex ensembles of models) to achieve the best score and gain recognition (and monetary prizes).
However, the platform itself is much more than that. For starters, Kaggle contains thousands of Kernels/Notebooks, showing the practical implementation of ML algorithms. Often, the creators also provide an in-depth theoretical explanation of the models and their hyperparameters. This Notebook contains further links to many of the most popular ML/DL algorithms applied to custom datasets in Kaggle Kernels (both Python and R).
What is more, Kaggle also contains many custom, user-uploaded datasets (at the moment of writing, over 40k) that you can use for your own analyses. You can find pretty much anything that can pique your interest, from the latest numbers concerning COVID-19 to the stats of all the Pokémon out there. Many TDS articles are written using the datasets from Kaggle. So if you want to practice your skills on something other than Titanic or Boston houses, Kaggle is a great place to start.
7. R-bloggers
I started my data science journey with R, and even after switching my main programming language to Python I still follow R-bloggers. It is a blog aggregator (you can join as well by submitting your blog) and covers a wide range of topics. While most of them are R-related, you can still learn quite a lot by reading about general approaches to data science tasks.
I do believe that one should not restrict themselves to just one programming language and ignore everything else. Maybe you will read about an interesting project/package in R and will decide to port it to Python? Alternatively, you can use rpy2 to access R packages from Python and make your life easier.
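To give a rough idea of what that looks like, here is a minimal sketch, assuming that both R and the rpy2 package are installed on your machine. It calls R's built-in sd function from the stats package on a Python list. The helper name r_standard_deviation is hypothetical and used only for illustration.

```python
def r_standard_deviation(values):
    """Compute the sample standard deviation with R's stats::sd via rpy2."""
    # Imported lazily, so that defining the function does not require R.
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    stats = importr("stats")            # load R's built-in stats package
    r_vector = ro.FloatVector(values)   # convert the Python sequence to an R numeric vector
    return float(stats.sd(r_vector)[0])  # R returns a length-1 vector, so take its first element


# Example call (requires a working R installation):
# r_standard_deviation([1.0, 2.0, 3.0, 4.0])
```

The same importr pattern should work for any R package installed in your R library, which is what makes it handy for trying out something you have just read about before deciding whether a port is worth the effort.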
While Python is currently the number 1 language in data science, there are still many packages and tools that have not been ported to Python from R. That’s why I believe R-bloggers is a very valuable resource and might be a source of inspiration for porting some R functionalities to Python.
8. arXiv
arXiv is Cornell University’s open-access repository of electronic preprints of scientific papers in fields such as computer science, machine learning, and many more. Basically, this is the place to look for the latest research and state-of-the-art algorithms. However, nowadays there are so many new articles added every day that it is basically impossible to follow everything. That is why Andrej Karpathy created the ArXiv Sanity Preserver to try to filter out the most important/relevant papers. Additionally, you can follow arXiv Daily on Twitter to receive a daily curated list of the most important research articles. Friendly warning: the number of tweets can be overwhelming.
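If you would rather pull the newest submissions yourself, arXiv also exposes a public query API that returns an Atom feed. The sketch below is one possible way to build such a query and parse the response using only the Python standard library. The function names and the default category cs.LG are illustrative choices of mine, not something prescribed by arXiv or by the tools mentioned above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # XML namespace used by Atom feeds


def build_query_url(category="cs.LG", max_results=5):
    """Build a URL asking for the most recently submitted papers in a category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml):
    """Extract (title, link) pairs from an Atom feed given as str or bytes."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.findtext("atom:id", default="", namespaces=ATOM)
        # Titles in the feed may be wrapped over several lines; collapse whitespace.
        papers.append((" ".join(title.split()), link))
    return papers


def fetch_latest(category="cs.LG", max_results=5):
    """Download and parse the newest papers (needs network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

Calling fetch_latest() returns a short list of titles and links that you can skim or filter further, for example by keyword. Keep max_results small and avoid sending requests in a tight loop, so that you do not put unnecessary load on the service.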
9. GitHub Awesome Machine Learning
This GitHub repo contains a curated list of machine learning frameworks, libraries, and software in general. For our convenience, they are grouped by language. Additionally, the repo contains lists of blogs, free books, online courses, conferences, meetups, and much more. This repository is definitely very valuable and you can lose yourself in it for quite some time exploring all the available information. Enjoy!
10. Twitter
This one can be very subjective, as in many cases Twitter is used as a social network just like Facebook. However, I try to use it exclusively for following people from the data science field and avoid click-baity content. Many researchers, authors, and otherwise famous data scientists have active Twitter accounts and they frequently share interesting/relevant content. It’s a great way to stay up to date with the new developments and “hot topics” in data science.
The list of people to follow will highly depend on the scope of your interests, for example, whether you focus on deep learning used for computer vision or maybe NLP. I would recommend starting with some of your favorite authors, be it of books or MOOCs, and then the list will naturally grow, as you will be exposed to other interesting people via retweets, etc.
Just in case you are interested, you can find the people I follow here.
Other helpful resources
The list above is by no means exhaustive, as the internet is full of very useful resources on data science. Below I list some additional resources that did not make my top 10 but are also great and I use them often:
I will keep on updating the list in case something slipped my mind or I discover something new 🙂
In this article, I showed you my 10 favorite resources for further developing my data science skillset. Do you have any favorite sources not mentioned here? Please let me know in the comments, I will be more than happy to find some more!