Medium Articles that Made Me a Better Data Scientist in July

Original article was published on AI Magazine



Improving my skillset and knowledge about contemporary machine learning

Photo by Alisa Anton on Unsplash

I am a data scientist and avid reader (and writer) of articles about data science and machine learning. It takes time to read journal articles, listen to podcasts, analyze interesting data, and play with new machine learning packages. Blog posts offer much of this knowledge in a condensed form.

When browsing, I look for articles that will teach me something new and applicable. Where is the field of machine learning heading? What is the latest research? How can I do my job as a data scientist better? How can I understand machine learning algorithms better?

My favorite articles are those that help me understand machine learning algorithms — especially algorithms that weren’t widely used (or didn’t exist) when I completed my MS in data science in 2017. The most useful articles are those that include code examples because recoding algorithms is time-consuming.

All this said, there are a lot of data science articles out there! Which to read?

In this post, I share some articles that I read and found useful in July. These helped grow my knowledge about data science and machine learning. I hope you find them just as valuable.

The Severe Limitations of Supervised Learning Are Piling Up

What is the future of machine learning research?

Supervised learning algorithms have brought a lot of value to businesses. However, the marginal value of small improvements in a supervised learning algorithm is decreasing. Why? The algorithms themselves are already pretty good and require many labels to provide good results. Labels are expensive to acquire and there are many datasets with no explicit “labels”!

Supervised learning is in the process of realizing another limitation: at its best, it only does exactly what we want it to do.

Supervised learning can only interpolate. Reinforcement learning and similar evolutionary algorithms have the potential to extrapolate.

The future of research is likely in unsupervised, semi-supervised, and reinforcement learning!

Why I Liked This Article

This article speaks to the future of machine learning research and the sort of breakthroughs I might expect to see.

NGBoost algorithm: solving probabilistic prediction problems

NGBoost is a “natural gradient” boosting algorithm that can predict the distribution of a target variable, not just a point estimate. This is important because often the uncertainty of a model, or range of probable values, is just as important as the exact predicted value.

How is this done? Per the paper, “NGBoost generalizes gradient boosting to probabilistic regression by treating the parameters of the conditional distribution as targets for a multiparameter boosting algorithm.”
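
To make the "natural gradient" idea concrete for myself, I put together a minimal sketch. It is not the ngboost package and not the full algorithm: there are no features and no base learners, so it only fits one Normal distribution to a handful of made-up target values. The function name, the data, and the hyperparameters are my own assumptions. What it does show is the core step: take the gradient of the negative log-likelihood with respect to the distribution parameters, rescale it by the inverse Fisher information, and step along it.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_normal_natural_gradient(targets, learning_rate=0.1, steps=1000):
    """Fit Normal(mu, sigma) to targets by natural-gradient descent on the NLL.

    Hypothetical helper for illustration only; parameters are (mu, log sigma).
    """
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        variance = math.exp(2.0 * log_sigma)
        residuals = [y - mu for y in targets]

        # Ordinary gradients of the mean negative log-likelihood.
        grad_mu = -fmean(residuals) / variance
        grad_log_sigma = 1.0 - fmean(r * r for r in residuals) / variance

        # For (mu, log sigma) the Fisher information is diag(1/variance, 2),
        # so the natural gradient divides each component by that entry.
        natural_mu = grad_mu * variance
        natural_log_sigma = grad_log_sigma / 2.0

        mu -= learning_rate * natural_mu
        log_sigma -= learning_rate * natural_log_sigma
    return NormalDist(mu, math.exp(log_sigma))


# Made-up target values, standing in for one leaf or one group of rows.
observed = [2.1, 1.9, 3.2, 2.8, 2.5, 1.7, 3.0, 2.4]
predicted = fit_normal_natural_gradient(observed)

# A distribution gives a range of probable values, not just a point estimate.
low, high = predicted.inv_cdf(0.025), predicted.inv_cdf(0.975)
print(f"point estimate: {predicted.mean:.3f}")
print(f"95% interval:   ({low:.3f}, {high:.3f})")
```

With no features, the fit simply recovers the sample mean and standard deviation. As I understand the paper, NGBoost applies the same kind of update at every boosting stage, with a base learner fitted to the natural gradient of each parameter.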

Why I Liked This Article

This article nicely explains a technical paper about the new NGBoost algorithm. As icing on the cake, the article also shared a Python package that makes it easy to apply NGBoost in practice. (Paper + open source code = gold)

GPT-3, a Giant Step for Deep Learning And NLP

Many articles (and tweets) have been written demonstrating the impressive capabilities of GPT-3. But how does it work? Fortunately, this article breaks down and explains the key points of the 72-page GPT-3 paper.

Why I Liked This Article

GPT-3 is a powerful model with many applications. I personally hope to incorporate it into a project sometime! As a practitioner, there is a lot of value in understanding how language models work, especially in choosing how to proceed in an NLP project. But the paper itself is a bit long to read for general understanding; this blog post highlights the key points that I need to know.

SHAP explained the way I wish someone explained it to me

Explainable ML is an important new area in machine learning. SHAP is a popular method that highlights how a black-box model uses data to make predictions.

The explainer of a black-box model should not itself be a black-box.

This article provides a visual and intuitive explanation of the SHAP algorithm.
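
To check my own understanding, here is a minimal sketch of the exact Shapley value computation that SHAP builds on. It does not use the shap package. The toy model, its feature names, and the all-zeros baseline are hypothetical, and "removing" a feature by swapping in a single baseline value is a simplification, since SHAP typically averages over background data. Brute force over all feature subsets only works for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_model(features):
    """Hypothetical black-box model with one interaction term."""
    age, income, tenure = features
    return 2.0 * age + 0.5 * income + 1.5 * age * tenure


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                # Marginal contribution of feature i when it joins the coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


instance = [1.0, 4.0, 2.0]   # age, income, tenure (made-up values)
baseline = [0.0, 0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_model, instance, baseline)

for name, contribution in zip(["age", "income", "tenure"], phi):
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between age and tenure.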

Why I Liked This Article

I have worked with model explainability methods like SHAP before, but admittedly my knowledge of the algorithm was rudimentary. After this blog post, I have a much stronger understanding and can explain how it works.

Deep Learning for Anomaly Detection: A Comprehensive Survey

This article summarizes a survey paper on deep learning for anomaly detection.

The paper/article describes the key challenges of the anomaly detection task:

  1. Difficulty of achieving a high anomaly detection recall rate
  2. Anomaly detection in high-dimensional and/or not-independent data
  3. Data-efficient learning of normality/abnormality
  4. Noise-resilient anomaly detection
  5. Detection of complex or multidimensional anomalies
  6. Anomaly explanation

There are three general ways to use deep learning for anomaly detection:

  • Deep learning for feature extraction
  • Learning feature representations of normality
  • End-to-end anomaly score learning
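
To make the workflow behind these approaches a bit more tangible, here is a small sketch of the general "learn normality, then score deviations" idea. It is deliberately not deep learning: a per-feature mean and standard deviation stands in for a learned representation of normality. All data, function names, and the threshold are made up for illustration. It also shows the first challenge from the list above, since a subtle anomaly slips under the threshold and pulls recall down.

```python
from statistics import fmean, pstdev


def fit_normality(normal_rows):
    """Summarize normal data as a (mean, std) pair per feature."""
    columns = list(zip(*normal_rows))
    return [(fmean(column), pstdev(column)) for column in columns]


def anomaly_score(row, profile):
    """Largest absolute z-score across features; higher means more unusual."""
    return max(abs(x - mean) / std for x, (mean, std) in zip(row, profile))


def recall(scores, labels, threshold):
    """Fraction of true anomalies (label 1) whose score exceeds the threshold."""
    caught = sum(1 for s, label in zip(scores, labels) if label == 1 and s > threshold)
    return caught / sum(labels)


# Made-up training data containing only normal behavior.
normal_rows = [(10.0, 1.0), (11.0, 1.2), (9.0, 0.8), (10.5, 1.1), (9.5, 0.9)]
profile = fit_normality(normal_rows)

# Made-up test data: one normal row, two obvious anomalies, one subtle anomaly.
test_rows = [(10.2, 1.0), (15.0, 1.0), (10.0, 1.25), (9.8, 2.0)]
labels = [0, 1, 1, 1]

scores = [anomaly_score(row, profile) for row in test_rows]
recall_value = recall(scores, labels, threshold=3.0)
print(f"recall at threshold 3.0: {recall_value:.2f}")
```

In a deep learning setup, I would expect the profile to be replaced by a trained network (for example an autoencoder) and the score by something like reconstruction error, while the evaluation step stays much the same.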

Why I Liked This Article

My reasons for sharing this article are simple: it is well-written and I am quite interested in this topic.