The Best Machine Learning Algorithm for Email Classification

Originally published by Mahnoor Javed in Artificial Intelligence on Medium


Support Vector Machines

Support Vector Machines (SVMs) are a type of supervised learning algorithm used for classification, regression, and outlier detection. We can use the SVM algorithm to classify data points into two classes through a hyperplane that separates them; with a linear kernel, this decision boundary is a straight line. The SVM algorithm is quite versatile: different kernel functions can be specified for the decision function.

The SVM algorithm finds the hyperplane that separates the two classes with the greatest margin; the larger the margin, the better the classifier is expected to generalize (this is known as margin maximization).
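
As a quick illustration of this kernel flexibility, here is a minimal sketch on a toy dataset (the dataset and parameter values are illustrative only, not the email data used below):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy two-class dataset, purely for illustration
X, y = make_classification(n_samples=200, random_state=42)

# The same SVC estimator accepts different kernel functions
for kernel in ("linear", "rbf", "poly"):
    clf = SVC(kernel=kernel, C=1)
    clf.fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))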

Our classifier is scikit-learn's C-Support Vector Classification (SVC) with a linear kernel and C = 1:

clf = SVC(kernel='linear', C=1)

import sys
from time import time
sys.path.append("C:\\Users\\HP\\Desktop\\ML Code\\")
from email_preprocess import preprocess
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

### features_train and features_test are the features for the training
### and testing datasets, respectively
### labels_train and labels_test are the corresponding item labels
features_train, features_test, labels_train, labels_test = preprocess()

# defining the classifier
clf = SVC(kernel='linear', C=1)

# measuring the training time
t0 = time()
clf.fit(features_train, labels_train)
print("\nTraining time:", round(time()-t0, 3), "s\n")

# measuring the prediction time
t1 = time()
pred = clf.predict(features_test)
print("Predicting time:", round(time()-t1, 3), "s\n")

# calculating and printing the accuracy of the algorithm
print("Accuracy of SVM Algorithm:", accuracy_score(labels_test, pred))
SVM Results (Image by author)

The accuracy of the SVM algorithm is 0.9596. There is a visible tradeoff between accuracy and training time: the higher accuracy comes at the cost of a much longer training time (22.7 s compared to 0.13 s for Naïve Bayes). We can play with the amount of training data as well as the kernels to reach a selection that yields a good accuracy score with less training time!
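
One way to explore this tradeoff is a small timing loop over training-set sizes. Here is a minimal sketch, assuming the imports and the preprocess() variables from the block above are already in scope (the fractions chosen are illustrative):

# Sketch: compare accuracy and training time across training-set sizes.
# Assumes features_train, labels_train, features_test, labels_test
# from preprocess() are already defined.
for frac in (0.01, 0.1, 1.0):
    n = int(len(features_train) * frac)
    clf = SVC(kernel='linear', C=1)
    t0 = time()
    clf.fit(features_train[:n], labels_train[:n])
    train_time = time() - t0
    acc = clf.score(features_test, labels_test)
    print(f"{frac:.0%} of data: accuracy={acc:.4f}, training time={train_time:.2f} s")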

We will first slice the training dataset down to 1% of its original size, tossing out 99% of the training data. With the rest of the code unchanged, we can observe a significant reduction in training time, but at a cost: accuracy almost always goes down when we slice down the training data.

Use the following code to slice the training data to 1%:

features_train = features_train[:len(features_train)//100]
labels_train = labels_train[:len(labels_train)//100]

As can be seen, with 1% of the training data, the training time of the algorithm is reduced to 0.01 s, with a reduced accuracy of 0.9055.

SVM with 1% Training Data (Image by author)
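
The same slicing idea gives the 10% run below; only the divisor changes (a sketch, analogous to the 1% slice above):

# Keep the first 10% of the training data
features_train = features_train[:len(features_train)//10]
labels_train = labels_train[:len(labels_train)//10]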

With 10% of the training data, the accuracy is 0.9550, with a training time of 0.47 s.

SVM with 10% Training Data (Image by author)

We may also change the kernel and the value of C in scikit-learn's C-Support Vector Classification.
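
For example, the configuration reported below is a one-line change to the classifier, with the rest of the pipeline unchanged (a larger C penalizes misclassified points more heavily, allowing a more complex decision boundary):

# RBF kernel with a large C, as used in the run below
clf = SVC(kernel='rbf', C=10000)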

With 100% of the training data, an RBF kernel, and the value of C set to 10000, we get an accuracy of 0.9891 with a training time of 14.718 s.

SVM with 100% Training Data, RBF kernel, and C=10000 (Image by author)