How to implement a Gaussian Naive Bayes classifier in Python from scratch?

Source: Artificial Intelligence on Medium

Have you ever asked yourself what the oldest Machine Learning algorithm is?

Today we have a lot of Machine Learning algorithms, from simple KNN to ensemble methods and even neural networks. Sometimes they look so complicated that you might think they were developed in recent years and that Machine Learning, in general, is something new. But the first algorithms appeared earlier than you might think.

Naive Bayes algorithm.

The Naive Bayes algorithm is one of the oldest forms of Machine Learning. Bayes’ theorem (on which this algorithm is based) and the basics of statistics were developed in the 18th century. From then until the 1950s, all the computations were done by hand, until the first computer implementations of this algorithm appeared.

But what makes this algorithm so simple that it could even be applied manually?

The simplest form of this algorithm is made up of 2 main parts:

  • The Naive Bayes formula (Theorem).
  • And a distribution (in this case Gaussian one).

The Naive Bayes Theory (shortly).

The Naive Bayes theory can, in most cases, be reduced to a single formula:

P(A|B) = P(B|A) · P(A) / P(B)

The Naive Bayes formula [source — https://miro.medium.com/max/1000/1*7lg_uLm8_1fYGjxPbTrQFQ.png]

This formula gives the probability of event A happening, given that event B has already happened.
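To make this concrete, here is a tiny worked example with made-up numbers (the 1%, 80%, and 10% below are purely illustrative):

# Bayes' theorem on made-up numbers: P(A|B) = P(B|A) * P(A) / P(B).
# Say 1% of emails are spam, 80% of spam emails contain the word "free",
# and 10% of all emails contain the word "free".
p_spam = 0.01             # P(A): prior probability of spam
p_free_given_spam = 0.80  # P(B|A): "free" appears, given spam
p_free = 0.10             # P(B): "free" appears in any email

p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)  # 0.08: seeing "free" raises the spam probability from 1% to 8%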

A full explanation of how Naive Bayes theory works is out of the scope of this article, so I highly recommend you read this article on NB theory.

What is a distribution?

A distribution basically shows how the values of a series are dispersed, and how frequently they appear in that series. Here is an example:

Gaussian Distribution [source — https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Normal_Distribution_PDF.svg/1200px-Normal_Distribution_PDF.svg.png]

As you can see in the plot above, the Gaussian (or Normal) distribution depends on 2 parameters of a series: the mean and the standard deviation. Knowing these 2 parameters of the series, we can find its distribution function. It has the following form:

f(x) = (1 / (σ√(2π))) · exp(-(x - μ)² / (2σ²))

The Gaussian Distribution Function [source — https://i.stack.imgur.com/bBIbn.png]

But why do we need this function? Very simple: the majority of data in the world is represented as continuous values, but you can’t compute the probability that a continuous variable X takes an exact value v. That probability is 0, because there are infinitely many possible values, and anything divided by infinity is zero.

So how can we solve this problem? By using the Gaussian distribution function illustrated above. Plugging into it the value x, the mean of the series, and its standard deviation, you get the probability density of x, which we can use as its likelihood. Voila.
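As a quick sketch (the series below is made up for illustration), here is what plugging a value into the density function looks like in code:

import math

# A made-up series of values, just for illustration
series = [4.9, 5.1, 5.0, 4.8, 5.2]
mean = sum(series) / len(series)
std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))

x = 5.0
density = (1 / (math.sqrt(2 * math.pi) * std)) * math.exp(-((x - mean) ** 2) / (2 * std ** 2))
print(density)  # about 2.82; a density, not a probability, so it can exceed 1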

So how does this all work together?

I don’t know why, but for me personally it is sometimes easier to understand how an algorithm works by implementing it in code. So let’s start:

  1. First, let’s import all the dependencies:
# Importing all needed libraries
import numpy as np
import math

That’s all. Yes, we need only pure numpy and the built-in math library.

2. Now let’s create a class that will contain the implementation of the algorithm, and its first function, which separates our data set by class.

# gaussClf will be the class that holds the Gaussian Naive Bayes classifier implementation
class gaussClf:

    def separate_by_classes(self, X, y):
        ''' This function separates our dataset into sub-datasets by class '''
        self.classes = np.unique(y)
        classes_index = {}
        subdatasets = {}
        cls, counts = np.unique(y, return_counts=True)
        # Relative frequency of every class, used later as the prior P(class)
        self.class_freq = {c: n / counts.sum() for c, n in zip(cls, counts)}
        for class_type in self.classes:
            classes_index[class_type] = np.argwhere(y == class_type).flatten()
            subdatasets[class_type] = X[classes_index[class_type], :]
        return subdatasets

The separate_by_classes function splits our dataset by class, so that we can later calculate the mean and standard deviation of every column separately for each class.
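A quick illustration on a made-up toy dataset (not part of the final code) shows what this returns:

# Toy data: two 2-feature samples per class
X = np.array([[1.0, 2.0],
              [1.2, 1.9],
              [3.0, 4.1],
              [3.2, 4.0]])
y = np.array([0, 0, 1, 1])

clf = gaussClf()
subdatasets = clf.separate_by_classes(X, y)
print(subdatasets[0])  # the two rows labeled 0
print(clf.class_freq)  # relative class frequencies: 0.5 for each class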

3. The fit function.

    def fit(self, X, y):
        ''' The fitting function '''
        separated_X = self.separate_by_classes(X, y)
        self.means = {}
        self.std = {}
        for class_type in self.classes:
            # Per-column mean and standard deviation for this class
            self.means[class_type] = np.mean(separated_X[class_type], axis=0)
            self.std[class_type] = np.std(separated_X[class_type], axis=0)

Next comes the fit function, where we simply calculate, for every class, the mean and the standard deviation of every column.
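Continuing the toy example from above, after fitting we can inspect the stored parameters:

clf.fit(X, y)
print(clf.means[0])  # per-column means for class 0: [1.1, 1.95]
print(clf.std[0])    # per-column standard deviations for class 0: [0.1, 0.05]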

4. The Gaussian Distribution Function.

    def calculate_probability(self, x, mean, stdev):
        ''' This function calculates the Gaussian probability density of a value x '''
        exponent = math.exp(-((x - mean) ** 2 / (2 * stdev ** 2)))
        return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

The calculate_probability function uses the mean and standard deviation of a series to calculate the probability density of a value x occurring in that series.
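If you have scipy installed, you can sanity-check this function against scipy.stats.norm (this check is optional and not part of the classifier):

from scipy.stats import norm

clf = gaussClf()
print(clf.calculate_probability(5.0, mean=4.9, stdev=0.3))
print(norm.pdf(5.0, loc=4.9, scale=0.3))  # should print the same density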

5. The predict_proba function.

    def predict_proba(self, X):
        ''' This function predicts the probability of every class for one sample '''
        # Start every class score from the log of its prior frequency
        self.class_prob = {cls: math.log(self.class_freq[cls]) for cls in self.classes}
        for cls in self.classes:
            # Add the log-density of every feature value for this class
            for i in range(len(self.means[cls])):
                self.class_prob[cls] += math.log(
                    self.calculate_probability(X[i], self.means[cls][i], self.std[cls][i]))
        self.class_prob = {cls: math.e ** self.class_prob[cls] for cls in self.class_prob}
        return self.class_prob

Here is the function that returns a dictionary with the probabilities of the sample belonging to each class. Classic sklearn estimators take a list of samples in predict_proba and return a matrix of class probabilities. To make it easier to follow, I decided to implement it for only one sample at a time.

Also, in this function I don’t divide by the evidence (the denominator of the Bayes formula), to get rid of a useless computation: it is the same value for every class, so dividing by it doesn’t change which class scores highest.
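By the way, this is also why the function sums logarithms instead of multiplying the densities directly: a product of many small densities quickly underflows to 0.0 in floating point. A small demonstration:

import math

densities = [1e-5] * 100                     # hypothetical per-feature densities
print(math.prod(densities))                  # 0.0: the true product, 1e-500, underflows
print(sum(math.log(d) for d in densities))   # about -1151.3, still usable for comparison

Note that exponentiating the summed logs at the end, as predict_proba does, can still underflow when there are very many features; comparing the log values directly would avoid that entirely.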

6. The predict function.

    def predict(self, X):
        ''' This function predicts the class of every sample in a list '''
        pred = []
        for x in X:
            pred_class = None
            max_prob = 0
            # Keep the class with the highest probability
            for cls, prob in self.predict_proba(x).items():
                if prob > max_prob:
                    max_prob = prob
                    pred_class = cls
            pred.append(pred_class)
        return pred

Here I decided to use the classic method of implementation: list in, list out.

You can see the code on my github repository.

Comparing with sklearn.

Now let’s compare our implementation with the sklearn one. In the sklearn library, Gaussian Naive Bayes is implemented as the GaussianNB class, and to import it you write this piece of code:

from sklearn.naive_bayes import GaussianNB

Running the comparison yourself is left to you; a minimal sketch follows below. So, what are the results on the iris dataset?
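Here is a minimal sketch of how such a comparison could look; the split parameters (test_size, random_state) below are my own choice, and the exact accuracy depends on them:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Our implementation
our_clf = gaussClf()
our_clf.fit(X_train, y_train)
print(accuracy_score(y_test, our_clf.predict(X_test)))

# sklearn's implementation
sk_clf = GaussianNB()
sk_clf.fit(X_train, y_train)
print(accuracy_score(y_test, sk_clf.predict(X_test)))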

Our implementation: 0.868421052631579 accuracy.

Sklearn implementation: 1.0 accuracy.

That happened because the sklearn model uses a slightly different implementation than ours (for example, it adds a small variance smoothing term to every feature for numerical stability); you can read more on the sklearn website.

Conclusion.

So, in this article, I showed you how to implement the simplest form of practically the oldest Machine Learning algorithm, the Gaussian Naive Bayes algorithm, and briefly explained how it works. I highly recommend that you study how the sklearn implementation works and try to implement BernoulliNB on your own.

Thank you.