A Mathematical Explanation of Naive Bayes in 5 Minutes

Original article was published on Artificial Intelligence on Medium

A thorough explanation of Naive Bayes with an example

Photo by Courtney Cook on Unsplash

Naive Bayes. What may seem like a very confusing algorithm is actually one of the simplest once understood. Part of what makes it so easy to understand and implement is the set of assumptions it makes. That does not make it a poor algorithm, though: despite those strong assumptions, Naive Bayes is widely used in data science and has many real-life applications.

In this article, we’ll look at what Naive Bayes is, how it works with an example to make it easy to understand, the different types of Naive Bayes, the pros and cons, and some real-life applications of it.

Preliminary Knowledge

In order to understand Naive Bayes and get as much value as possible out of this article, you should have a basic understanding of the following concepts:

  • Conditional probability: a measure of the probability of event A occurring given that another event has occurred. For example, “what is the probability that it will rain given that it is cloudy?” is an example of conditional probability.
  • Joint Probability: a measure that calculates the likelihood of two or more events occurring at the same time.
  • Proportionality: the relationship between two quantities that differ by a constant multiplier, or in simpler terms, whose ratio is a constant.
  • Bayes' Theorem: according to Wikipedia, Bayes' Theorem describes the probability of an event (the posterior) based on prior knowledge of conditions that might be related to the event.

What is Naive Bayes?

Naive Bayes is a machine learning algorithm, and more specifically a classification technique. This means that Naive Bayes is used when the output variable is discrete. The underlying mechanics of the algorithm are driven by Bayes' Theorem, which you'll see in the next section.

How Naive Bayes Works

First, I’m going to walk through the theory behind Naive Bayes, and then solidify these concepts with an example to make it easier to understand.

The Naive Bayes Classifier is based on Bayes' Theorem, which states the following:
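
In its standard form, for two events A and B with P(B) > 0, the theorem reads:

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
```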

This equation can be rewritten using X (input variables) and y (output variable) to make it easier to understand. In plain English, this equation is solving for the probability of y given input features X.
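
Written out with X = (x_1, ..., x_n) as the input features and y as the class (the subscripted x_i notation is an assumption made here for the individual features), the rewritten form is:

```latex
\[
P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}
\]
```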

Because of the naive assumption (hence the name) that variables are independent given the class, we can rewrite P(X|y) as follows:
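
With x_1, ..., x_n denoting the individual features in X (notation assumed here), conditional independence given the class lets the likelihood factor into a product:

```latex
\[
P(X \mid y) = P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\]
```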

Also, since we are solving for y and P(X) does not depend on y, P(X) acts as a constant. That means we can drop it from the equation and replace the equality with a proportionality. This leads us to the following equation:
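
Using x_1, ..., x_n for the individual features in X (notation assumed here), dropping the denominator gives:

```latex
\[
P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
```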

Now that we've arrived at this equation, the goal of Naive Bayes is to choose the class y with the maximum probability. Argmax is simply an operation that finds the argument that gives the maximum value of a target function. In this case, we want the class y that yields the highest probability, not the largest value of y itself.
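
As a decision rule, with ŷ denoting the predicted class and x_1, ..., x_n the individual features (both notations assumed here), this can be written as:

```latex
\[
\hat{y} = \underset{y}{\arg\max}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
```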

Now let’s go through an example so that you can make more sense out of this algorithm.

Example of Naive Bayes

Suppose you tracked the weather conditions for 14 days and based on the weather conditions, you decided whether to play golf or not play golf.

First, we need to convert this into a frequency table, so that we can get the values of P(X|y) and P(y). Recall that we are solving for P(y|X), and that P(X) drops out of the comparison between classes:

Second, we want to convert the frequencies into ratios or conditional probabilities:
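
These two conversions can be sketched in Python. The original table is not reproduced in the text, so this sketch assumes the classic 14-day play-golf dataset, which uses the same four features as the example. The names `DAYS`, `FEATURES`, `freq`, `priors`, and `likelihoods` are illustrative only.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# ASSUMPTION: the classic 14-day play-golf dataset stands in for the
# article's table. Columns: outlook, temperature, humidity, windy, play.
FEATURES = ("outlook", "temperature", "humidity", "windy")
DAYS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Step 1: frequency tables. freq[(feature, label)][value] is a raw count.
class_counts = Counter(row[-1] for row in DAYS)
freq = defaultdict(Counter)
for *values, label in DAYS:
    for feature, value in zip(FEATURES, values):
        freq[(feature, label)][value] += 1

# Step 2: turn counts into ratios.
# priors[label] is P(y); likelihoods[(feature, value, label)] is P(x_i | y).
priors = {label: Fraction(n, len(DAYS)) for label, n in class_counts.items()}
likelihoods = {
    (feature, value, label): Fraction(count, class_counts[label])
    for (feature, label), counter in freq.items()
    for value, count in counter.items()
}

print(priors["yes"], priors["no"])                # 9/14 5/14
print(likelihoods[("outlook", "sunny", "yes")])   # 2/9
```

Combinations that never occur in the data (for example, an overcast day with no golf) are simply absent from `likelihoods` in this sketch, which is where the zero-frequency problem discussed later comes from.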

Finally, we can use the proportionality equation to predict y, given X.

Imagine that X = {outlook: sunny, temperature: mild, humidity: normal, windy: false}.

First, we’ll calculate the probability that you will play golf given X, P(yes|X) followed by the probability that you won’t play golf given X, P(no|X).

Using the chart above, we can get the following information:

Now we can simply input this information into the following formula:

Similarly, you would complete the same sequence of steps for P(no|X).

Since P(yes|X) > P(no|X), you can predict that this person would play golf given that the outlook is sunny, the temperature is mild, the humidity is normal, and it's not windy.
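
The arithmetic behind this comparison can be checked with a short Python sketch. The fractions below are an assumption: they are the values the classic 14-day play-golf dataset gives for X = {sunny, mild, normal, not windy}, since the article's own table is not reproduced in the text. The variable names are illustrative.

```python
from fractions import Fraction
from math import prod

# ASSUMPTION: ratios taken from the classic play-golf dataset.
prior = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}
likelihood = {
    # P(sunny|y), P(mild|y), P(normal humidity|y), P(not windy|y)
    "yes": [Fraction(2, 9), Fraction(4, 9), Fraction(6, 9), Fraction(6, 9)],
    "no": [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)],
}

# Unnormalised score for each class: P(y) times the product of P(x_i | y).
scores = {label: prior[label] * prod(likelihood[label]) for label in prior}

# The prediction is the class with the largest score (the argmax).
prediction = max(scores, key=scores.get)

# Dividing by the sum of the scores recovers actual posterior probabilities.
total = sum(scores.values())
posterior = {label: score / total for label, score in scores.items()}

print({label: round(float(s), 4) for label, s in scores.items()})
# {'yes': 0.0282, 'no': 0.0069}
print(prediction)  # yes
```

The scores themselves do not sum to 1 because P(X) was dropped. Normalising them, as `posterior` does, gives roughly 0.80 for yes and 0.20 for no under these assumed values, but the predicted class is the same either way.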

TLDR

To synthesize what we just did…

  1. First, we created a frequency table and then a ratio table so that we could get the values for P(X|y) and P(y).
  2. Then for a given set of input features X, we computed a value proportional to P(y|X) for each class y. In our example, we had two classes, yes and no.
  3. Lastly, we took the highest value of P(y|X) of all classes to predict which outcome was the most likely.

Types of Naive Bayes

There are three main types of Naive Bayes that are used in practice:

Multinomial

Multinomial Naive Bayes assumes that each P(xn|y) follows a multinomial distribution. It is mainly used in document classification problems and looks at the frequency of words, similar to the example above.

Bernoulli

Bernoulli Naive Bayes is similar to Multinomial Naive Bayes, except that the predictors are boolean (True/False), like the “Windy” variable in the example above.

Gaussian

Gaussian Naive Bayes assumes that the continuous values associated with each class are sampled from a Gaussian distribution, with the following likelihood:
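
In the usual formulation, each feature x_i has a class-specific mean and variance estimated from the training data (the symbols μ_y and σ_y² are assumed notation here), and the likelihood is the normal density:

```latex
\[
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^{2}}}
\exp\!\left(-\frac{(x_i - \mu_y)^{2}}{2\sigma_y^{2}}\right)
\]
```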

Pros and Cons of Naive Bayes

Pros

  • As shown above, it is quite intuitive once you understand the concept
  • It’s easy to implement and performs well in multiclass prediction
  • It works well with categorical input variables

Cons

  • You can encounter the zero-frequency problem when there’s a category in the test set that’s not in the training set (although there are workarounds for this)
  • The probability estimates it produces are not very trustworthy, even when the predicted class is correct
  • Naive Bayes holds strong assumptions, as discussed above.
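
One common workaround for the zero-frequency problem is additive (Laplace) smoothing. The article does not say which workaround it has in mind, so treat this as one option rather than the intended one. In this minimal sketch, the function name `smoothed_likelihood` and the counts are hypothetical.

```python
from fractions import Fraction


def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """Estimate P(x = value | y) with additive smoothing.

    count:       times the value was seen with class y
    class_total: number of training rows with class y
    n_values:    number of distinct values the feature can take
    alpha:       smoothing strength (alpha=1 is Laplace smoothing)
    """
    return Fraction(count + alpha, class_total + alpha * n_values)


# Hypothetical counts: a feature with 3 possible values, seen 3, 2, and 0
# times among the 5 training rows of some class.
counts = [3, 2, 0]
unsmoothed = [Fraction(c, 5) for c in counts]
smoothed = [smoothed_likelihood(c, 5, 3) for c in counts]

print(unsmoothed[2])  # 0
print(smoothed[2])    # 1/8
```

Without smoothing, the single zero would wipe out the whole product for that class. With it, every value keeps a small non-zero probability, and the smoothed estimates still sum to 1.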

Naive Bayes Applications

Below are some popular applications that Naive Bayes is used for:

  • Real-time prediction: Because Naive Bayes is fast and it’s based on Bayesian statistics, it works well at making predictions in real-time. In fact, a lot of popular real-time models or online models are based on Bayesian statistics.
  • Multiclass prediction: As previously stated, Naive Bayes works well when there are more than two classes for the output variable.
  • Text classification: This includes sub-applications like spam filtering and sentiment analysis. Since Naive Bayes works best with discrete variables, it tends to work well in these applications.
  • Recommendation systems: Naive Bayes is commonly used alongside other algorithms like Collaborative Filtering to build recommendation systems like Netflix's "recommended for you" section, Amazon's recommended products, or Spotify's recommended songs.

Thanks for Reading!

Terence Shin

Founder of ShinTwin