Support Vector Machines



In my previous story I used Support Vector Machines (SVMs) to identify vehicles in an image. In this article I am going to discuss what Support Vector Machines are, their mathematical formulation, and their implementation in Python.

What are SVMs?

SVMs are supervised learning models, typically used in binary classification problems. A support vector machine constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks such as outlier detection.

We represent a hyperplane using w’x=0, where w’ is w transpose. This equation describes the set of points x lying in the null space of w’. For the sake of simplicity, however, we will use the notation w.x+b=0 for our hyperplane, which in 2-D represents a line. Here the vector w is perpendicular to the hyperplane.
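The perpendicularity of w to the hyperplane is easy to verify numerically. A minimal sketch, using a hypothetical hyperplane w = (1, 1), b = -3 (i.e. the line x1 + x2 = 3):

```python
import numpy as np

# Hypothetical hyperplane w.x + b = 0 with w = (1, 1), b = -3,
# i.e. the 2-D line x1 + x2 = 3.
w = np.array([1.0, 1.0])
b = -3.0

p1 = np.array([0.0, 3.0])   # on the line: 0 + 3 - 3 = 0
p2 = np.array([3.0, 0.0])   # on the line: 3 + 0 - 3 = 0

direction = p2 - p1          # a vector lying along the hyperplane
print(np.dot(w, direction))  # 0.0 -> w is orthogonal to the line
```

Any vector connecting two points on the hyperplane has zero dot product with w, which is exactly what "perpendicular" means here.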

An SVM does not construct just any separating hyperplane; it constructs the optimal hyperplane, and that is the reason it is more robust than a logistic regression model. Which hyperplane is the optimal one? Most of the following discussion answers this question.

One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized.

fig 1: SVM hyperplanes and support vectors

In the image there are two classes, one represented by circles and the other by triangles. Each data point is an n-dimensional vector (2-D in our case) that represents an observation. In the image, the data points x_1, x_2, x_3 & x_4 are the support vectors because they are the vectors nearest to the hyperplane H_o, and the objective is to find a hyperplane H_o such that WIDTH is maximum.

Let's say the circular data points are negative and the triangular data points are positive. To derive an expression for WIDTH we need to impose some constraints:

  1. There should be no data point in between H_1 and H_2.
  2. For each positive data point x, w.x+b≥1. For a data point that is present on H_2, w.x+b=1
  3. For each negative data point x, w.x+b≤-1. For a data point that is present on H_1, w.x+b=-1

Let's introduce a new variable y_i, which is 1 for positive data points and -1 for negative data points. Now we can represent constraints 2 and 3 using the single expression y_i(w.x_i+b)≥1.
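The combined constraint can be checked numerically. A small sketch with a hypothetical hyperplane (w = (1, 1), b = -3) and two made-up data points, one per class:

```python
import numpy as np

# Hypothetical hyperplane and toy data points for illustration.
w = np.array([1.0, 1.0])    # assumed normal vector
b = -3.0                    # assumed offset

X = np.array([[0.5, 0.5],   # negative point (circle)
              [3.0, 2.0]])  # positive point (triangle)
y = np.array([-1, 1])       # class labels as defined above

# y_i * (w.x_i + b) >= 1 must hold for every data point.
margins = y * (X @ w + b)
print(margins)              # [2. 2.] -> both constraints satisfied
```

Note how the single expression covers both cases: for the negative point, w.x+b = -2 ≤ -1, and multiplying by y = -1 flips it to 2 ≥ 1.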

fig 2: SVM WIDTH

In the above image it is easy to see that WIDTH equals the projection of the difference of two vectors lying on H_1 and H_2 onto the unit vector perpendicular to H_o. Hence

WIDTH = (x_3-x_1).(w/||w||) … equation 4

w/||w|| is the unit vector which is perpendicular to the hyperplanes.

We know that for x_1: w.x_1+b+1=0 … equation 5

We know that for x_3: w.x_3+b-1=0 … equation 6

Using equations 5 & 6, w.(x_3-x_1)=2. Substituting this result in equation 4:

WIDTH = (w.(x_3-x_1))/||w|| = 2/||w||
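This relationship can be observed on a fitted model. A sketch on synthetic, well-separated data (the blob parameters here are arbitrary choices, not from the article), reading the learned weight vector from a linear SVM and computing 2/||w||:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.6, random_state=0)

# A large C approximates the hard-margin SVM derived above.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]                  # learned weight vector
width = 2.0 / np.linalg.norm(w)   # margin width = 2/||w||
print(width)
```

The smaller ||w|| is, the wider the margin, which is why the optimization below minimizes ||w||.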

To increase WIDTH we need to minimize ||w||. For mathematical convenience we instead minimize (1/2)||w||². However, this optimization problem is constrained, hence we need to use Lagrange multipliers to honor the constraints.

fig 3: Lagrange Multiplier 1

In equation 1 of the above image, the negative term is the summation, over all sample data points, of each constraint multiplied by its Lagrange multiplier alpha.
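Written out in symbols (the standard primal Lagrangian, reproduced here in case the image does not render):

```latex
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i} \alpha_i \left[ y_i \, (w \cdot x_i + b) - 1 \right],
  \qquad \alpha_i \ge 0
```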

fig 4: Lagrange Multiplier 2

This mathematical expression implies that the optimization depends only on the dot products of pairs of sample vectors x_i and x_j.
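For reference, setting the derivatives of the Lagrangian with respect to w and b to zero gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0; substituting these back yields the standard dual problem, where the dependence on dot products is explicit:

```latex
\max_{\alpha} \; \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
\qquad \text{subject to } \alpha_i \ge 0, \;\; \sum_{i} \alpha_i y_i = 0
```

It is this dot-product form that later allows the kernel trick: replacing x_i · x_j with a kernel function.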

Implementation in Python

from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

def train_model(X_train, y_train):
    # Train a support vector classifier on the training data
    svc = SVC()
    svc.fit(X_train, y_train)
    return svc

The function train_model above takes training data and labels as input. SVC is a class defined in the sklearn.svm package; its fit() method takes training examples and the corresponding labels and trains the model. Now, to predict values using the model:

clf=train_model(X_train,y_train)
prediction = clf.predict(features)

Using the predict() function you can determine the class of a particular object. Here ‘features’ are the features of the object whose class you want to identify. This Jupyter notebook has the complete implementation of an SVM in the context of vehicle detection.
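Putting the pieces together, here is a minimal end-to-end sketch on synthetic data (standing in for the vehicle-detection features, which are not shown here), including the StandardScaler imported above:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data as a stand-in for real features.
X, y = make_blobs(n_samples=200, centers=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Fit the scaler on training data only, then train the classifier.
scaler = StandardScaler().fit(X_train)
clf = SVC()  # default RBF kernel
clf.fit(scaler.transform(X_train), y_train)

prediction = clf.predict(scaler.transform(X_test))
accuracy = clf.score(scaler.transform(X_test), y_test)
print(accuracy)
```

Scaling matters for SVMs because the optimization works with dot products of the feature vectors; features on very different scales distort the margin.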

I hope you enjoyed this post and learned something new and useful. If you find any flaws in my understanding feel free to comment.

Thanks for reading :)

Source: Deep Learning on Medium