In my previous story I used Support Vector Machines (SVMs) to identify vehicles in an image. In this article I am going to discuss what Support Vector Machines are, their mathematical formulation, and their implementation in Python.

**What are SVMs?**

SVMs are supervised learning models, typically used in binary classification problems. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks like outlier detection.

We represent a hyperplane using **w’x = 0**, where **w’** is **w** transpose. This equation represents a linear system in which **x** lies in the null space of **w’**. But for the sake of simplicity we will use the notation **w.x+b=0** for our hyperplane, which represents a line in 2-D. **Here the vector w is perpendicular to the hyperplane.**

An SVM not only constructs a hyperplane but constructs an *optimal* hyperplane, and that is the reason it is more robust than logistic regression models. Which hyperplane is the optimal one? Most of the following discussion will answer this question.

One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized.

In the image there are two classes, one represented by circles and the other by triangles. Each data point is an n-dimensional vector (2-D in our case) which represents an observation. In the image, data points **x_1, x_2, x_3 & x_4** are the support vectors because they are the vectors nearest to the hyperplane **H_o**, and the objective is to find a hyperplane **H_o** such that **WIDTH** is maximum.

Let’s say the circular data points are negative and the triangular data points are positive. To derive an expression for **WIDTH** we need to impose some constraints:

- There should be no data point between **H_1** and **H_2**.
- For each positive data point **x**, **w.x+b ≥ 1**. For a data point that lies on **H_2**, **w.x+b = 1**.
- For each negative data point **x**, **w.x+b ≤ -1**. For a data point that lies on **H_1**, **w.x+b = -1**.

Let’s introduce a new variable **y_i** which is 1 for positive data points and -1 for negative data points. Now we can represent the two inequality constraints above using the single expression **y_i(w.x_i+b) ≥ 1**.
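As a quick sanity check, the combined constraint can be verified numerically. The hyperplane parameters **w** and **b** and the four points below are made-up illustrative values, not taken from the article’s dataset:

```python
import numpy as np

# Hypothetical separating hyperplane w.x + b = 0 (illustrative values)
w = np.array([1.0, 1.0])
b = -3.0

# Toy 2-D data: two positive and two negative points, labels y in {+1, -1}
X = np.array([[3.0, 1.0], [2.0, 2.0], [1.0, 0.5], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])

# The combined constraint y_i * (w.x_i + b) >= 1 should hold for every point
margins = y * (X @ w + b)
print(margins)
print(np.all(margins >= 1))  # True when all points satisfy the constraint
```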

In the above image it is easy to see that **WIDTH** equals the projection of the difference of the vectors lying on **H_1** and **H_2** onto the unit vector perpendicular to **H_o**. Hence

**WIDTH = (x_3 - x_1).(w/||w||)** … equation 4

**w/||w||** is the unit vector perpendicular to the hyperplanes.

We know that for x_1: **w.x_1 + b + 1 = 0** … equation 5

We know that for x_3: **w.x_3 + b - 1 = 0** … equation 6

Subtracting equation 5 from equation 6 gives **w.(x_3 - x_1) = 2**; substituting this result into equation 4:

**WIDTH = w.(x_3 - x_1)/||w||**

**WIDTH = 2/||w||**
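The result **WIDTH = 2/||w||** can be checked numerically. The hyperplane and support vectors below are made-up illustrative values chosen so that **x_3** lies on **H_2** and **x_1** lies on **H_1**:

```python
import numpy as np

# Hypothetical hyperplane parameters (illustrative values)
w = np.array([1.0, 1.0])
b = -3.0

x3 = np.array([2.0, 2.0])  # lies on H_2: w.x + b = +1
x1 = np.array([1.0, 1.0])  # lies on H_1: w.x + b = -1

# equation 4: projection of (x3 - x1) onto the unit normal w/||w||
width = (x3 - x1) @ (w / np.linalg.norm(w))
print(width)                     # ~1.41421
print(2 / np.linalg.norm(w))     # same value: 2/||w||
```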

To increase **WIDTH** we need to minimize **||w||**. For mathematical convenience we minimize **(1/2)||w||²** instead. However, this optimization problem is constrained, hence we need to use Lagrange multipliers to honor the constraints.

In equation 1 of the image above, the negative term is the summation over all sample data of the constraints, each multiplied by its Lagrange multiplier alpha.

This mathematical expression implies that the optimization depends only on the dot products of pairs of sample vectors **x_i** and **x_j**.
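For readers who cannot see the image, the standard Lagrangian for this problem, written in the notation used above, is:

```latex
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2
                - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right],
\qquad \alpha_i \ge 0
```

Setting the derivatives with respect to **w** and **b** to zero gives **w = Σ α_i y_i x_i** and **Σ α_i y_i = 0**; substituting these back produces the dual problem

```latex
\max_{\alpha} \; \sum_i \alpha_i
  - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
```

in which the training vectors appear only through the dot products **x_i.x_j** — this is what makes the kernel trick possible.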

**Implementation in Python**

```python
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

def train_model(X_train, y_train):
    svc = SVC()
    svc.fit(X_train, y_train)
    return svc
```

The above function train_model takes training data and labels as input. SVC is a class defined in the sklearn.svm package. The function svc.fit() takes training examples and corresponding labels as inputs and trains the model. Now, to predict values using the model:

```python
clf = train_model(X_train, y_train)
prediction = clf.predict(features)
```

Using the predict() function you can determine the class of a particular object. Here ‘features’ are the features of the object whose class you want to identify. This Jupyter notebook has the complete implementation of an SVM in the context of vehicle detection.
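Putting the pieces together, here is a minimal end-to-end sketch. It uses synthetic blobs from sklearn in place of the vehicle-detection features, and it also applies the StandardScaler imported above, since SVMs are sensitive to feature scale:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 2-D data standing in for the vehicle-detection features
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the scaler on training data only, then transform both splits
scaler = StandardScaler().fit(X_train)
clf = SVC().fit(scaler.transform(X_train), y_train)

# Evaluate on held-out data
accuracy = clf.score(scaler.transform(X_test), y_test)
print(accuracy)
```

Scaling with statistics computed on the training split (rather than on the full dataset) avoids leaking information from the test set into the model.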

I hope you enjoyed this post and learned something new and useful. If you find any flaws in my understanding feel free to comment.

Thanks for reading :)

Source: Deep Learning on Medium