SVM Hyper-parameter Tuning using GridSearchCV

Source: Artificial Intelligence on Medium

In my previous article, I illustrated the concepts and mathematics behind the Support Vector Machine (SVM) algorithm, one of the most popular supervised machine learning algorithms for solving classification or regression problems. It is used in a variety of applications such as face detection, handwriting recognition and classification of emails. To show how SVM works in Python with the Scikit-learn package, including kernels, hyper-parameter tuning, model building and evaluation, I will use the famous Iris flower dataset to classify the types of Iris flower.

About the dataset

The Iris flower data set is a multivariate data set introduced by Sir Ronald Fisher in 1936 as an example of discriminant analysis.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so there are 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.
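As a quick sanity check of the counts above, here is a minimal sketch that loads the copy of the data set bundled with scikit-learn and confirms the number of samples, features and classes. Using load_iris() is an assumption made for convenience; the rest of this article reads the data from iris.csv instead.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy stands in for the article's iris.csv.
iris = load_iris()

n_samples, n_features = iris.data.shape
# Count how many samples belong to each of the three species.
samples_per_class = np.bincount(iris.target)

print(n_samples, n_features)
print(list(iris.target_names))
print(samples_per_class.tolist())
```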

The three Iris species are Iris setosa, Iris versicolor and Iris virginica. Given the dimensions of the flower, we will predict the class of the flower.

Import the libraries

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
# Jupyter/IPython magic: shows plots inline in a notebook (omit in a plain script)
%matplotlib inline

Read the input data from the external CSV

irisdata = pd.read_csv('iris.csv')

Take a look at the data

irisdata.head()
irisdata.info()

The head() function returns the first 5 rows of the iris data, and info() prints a short summary of it.

Visualise Data with Pairs Plots

We use Seaborn, a library for making statistical graphics in Python. It is built on top of matplotlib and closely integrated with pandas data structures. Its pairplot() function creates a grid of Axes such that each numeric variable in irisdata is shared across the y-axes of a single row and the x-axes of a single column.

import seaborn as sns
sns.pairplot(irisdata, hue='class', palette='Dark2')

A pairs plot allows us to see both the distribution of single variables and the relationships between two variables.

Train Test Split — Split your data into a training set and a testing set.

from sklearn.model_selection import train_test_split
X = irisdata.drop('class', axis=1)
y = irisdata['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
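
The split above is shuffled differently on every run, so the reported scores will vary. A minimal sketch of a reproducible, class-balanced split follows, using the random_state and stratify options of train_test_split. The bundled Iris data and the names X_demo, y_demo and the seed 42 are assumptions for illustration only.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for the article's iris.csv.
X_demo, y_demo = load_iris(return_X_y=True)

# random_state fixes the shuffle so the split is reproducible;
# stratify=y_demo keeps the class proportions equal in both subsets.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

test_counts = Counter(yd_test.tolist())
print(len(Xd_train), len(Xd_test))
print(sorted(test_counts.items()))
```

With 150 samples and test_size=0.20, this leaves 120 samples for training and 30 for testing, 10 per species.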

Apply kernels to transform the data to a higher dimension

kernels = ['Polynomial', 'RBF', 'Sigmoid', 'Linear']

# A function which returns the corresponding SVC model
def getClassifier(ktype):
    if ktype == 0:
        # Polynomial kernel
        return SVC(kernel='poly', degree=8, gamma="auto")
    elif ktype == 1:
        # Radial Basis Function kernel
        return SVC(kernel='rbf', gamma="auto")
    elif ktype == 2:
        # Sigmoid kernel
        return SVC(kernel='sigmoid', gamma="auto")
    elif ktype == 3:
        # Linear kernel (gamma is ignored by the linear kernel)
        return SVC(kernel='linear', gamma="auto")
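
Each kernel is a similarity function between two samples, so the model never has to compute the higher-dimensional coordinates explicitly. The sketch below writes the RBF and polynomial kernels out by hand for two example measurement vectors and compares them with the helpers in sklearn.metrics.pairwise. The vectors a and b are illustrative, and the sketch assumes that gamma="auto" means 1 / n_features and that SVC's default coef0 is 0, as described in the scikit-learn documentation.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

# Two example flowers: sepal length, sepal width, petal length, petal width (cm).
a = np.array([[5.1, 3.5, 1.4, 0.2]])
b = np.array([[6.3, 3.3, 6.0, 2.5]])

# gamma="auto" in SVC corresponds to 1 / n_features.
gamma = 1.0 / a.shape[1]

# RBF kernel: exp(-gamma * ||a - b||^2)
rbf_manual = np.exp(-gamma * np.sum((a - b) ** 2))

# Polynomial kernel: (gamma * <a, b> + coef0) ** degree
degree, coef0 = 8, 0.0
poly_manual = (gamma * (a @ b.T).item() + coef0) ** degree

# The same quantities computed by scikit-learn's pairwise helpers.
rbf_sklearn = rbf_kernel(a, b, gamma=gamma)[0, 0]
poly_sklearn = polynomial_kernel(a, b, degree=degree, gamma=gamma, coef0=coef0)[0, 0]

print(rbf_manual, rbf_sklearn)
print(poly_manual, poly_sklearn)
```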

Train a model

Now it’s time to train a Support Vector Machine Classifier.

Call the SVC() model from sklearn and fit the model to the training data

for i in range(4):
    # Separate data into test and training sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
    # Train an SVC model using a different kernel each time
    svclassifier = getClassifier(i)
    svclassifier.fit(X_train, y_train)
    # Make prediction
    y_pred = svclassifier.predict(X_test)
    # Evaluate our model
    print("Evaluation:", kernels[i], "kernel")
    print(classification_report(y_test, y_pred))

Since SVMs are well suited to small data sets such as irisdata, the models achieve high accuracy, except with the Sigmoid kernel. We can determine which kernel performs best based on performance metrics such as precision, recall and f1 score.
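
Because a single random split can favour one kernel by chance, a cross-validated comparison gives a steadier picture. The sketch below is one possible way to do this with cross_val_score. It assumes the bundled Iris data, 5 folds, and the default polynomial degree (3) rather than the degree=8 used earlier, so the numbers will not match the reports printed above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for the article's iris.csv.
X_cv, y_cv = load_iris(return_X_y=True)

cv_scores = {}
for kernel in ["poly", "rbf", "sigmoid", "linear"]:
    model = SVC(kernel=kernel, gamma="auto")
    # For a classifier, cv=5 runs stratified 5-fold cross-validation.
    scores = cross_val_score(model, X_cv, y_cv, cv=5)
    cv_scores[kernel] = scores.mean()

for kernel, score in cv_scores.items():
    print(f"{kernel:8s} mean accuracy: {score:.3f}")
```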

To improve the model accuracy, several parameters need to be tuned. The three major parameters are:

  1. Kernels: The main function of the kernel is to take a low-dimensional input space and transform it into a higher-dimensional space. It is mostly useful in non-linear separation problems.

  2. C (Regularisation): C is the penalty parameter, which represents the misclassification or error term. It tells the SVM optimisation how much error is bearable. This is how you control the trade-off between a smooth decision boundary with a wide margin and classifying the training points correctly.

When C is high, the model tries to classify all the training points correctly, so there is a chance of overfitting.

  3. Gamma: It defines how far the influence of a single training example reaches when calculating the plausible line of separation.

When gamma is higher, only nearby points have high influence; low gamma means far-away points are also considered when finding the decision boundary.
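
To see the effect of gamma in practice, the sketch below fits an RBF SVC with three gamma values and records the training accuracy, test accuracy and number of support vectors for each. The gamma values, the fixed C=1, the seed and the variable names are all illustrative assumptions; the same loop could be repeated over C.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for the article's iris.csv.
X_g, y_g = load_iris(return_X_y=True)
Xg_train, Xg_test, yg_train, yg_test = train_test_split(
    X_g, y_g, test_size=0.20, random_state=0, stratify=y_g
)

gamma_results = []
for gamma in [0.001, 0.1, 100]:
    clf = SVC(kernel="rbf", C=1, gamma=gamma).fit(Xg_train, yg_train)
    gamma_results.append({
        "gamma": gamma,
        "train_acc": clf.score(Xg_train, yg_train),
        "test_acc": clf.score(Xg_test, yg_test),
        # Total number of support vectors across the three classes.
        "n_support": int(clf.n_support_.sum()),
    })

for row in gamma_results:
    print(row)
```

A training accuracy that is much higher than the test accuracy at a large gamma would be a sign of overfitting.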

Tuning the hyper-parameters of an estimator

Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn, they are passed as arguments to the constructor of the estimator classes. Grid search is commonly used as an approach to hyper-parameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.

GridSearchCV combines an estimator with a grid search and cross-validation to tune hyper-parameters.
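
To make the idea concrete, here is a rough sketch of what a grid search does under the hood: enumerate every parameter combination, score each one with cross-validation, and keep the best. The small grid, the 5 folds and the variable names are assumptions chosen to keep the example fast; this is an illustration of the technique, not GridSearchCV's actual implementation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for the article's iris.csv.
X_m, y_m = load_iris(return_X_y=True)

# A deliberately small, illustrative grid.
small_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
candidates = list(ParameterGrid(small_grid))

best_score, best_params = -1.0, None
for params in candidates:
    # Evaluate this combination with 5-fold cross-validation.
    score = cross_val_score(SVC(kernel="rbf", **params), X_m, y_m, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

print(len(candidates), "combinations tried")
print(best_params, round(best_score, 3))
```

The number of model fits grows multiplicatively: 3 values of C times 3 values of gamma gives 9 combinations, each fitted once per fold.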

Import GridSearchCV from Scikit-learn

from sklearn.model_selection import GridSearchCV

Create a dictionary called param_grid and fill out some parameters for kernels, C and gamma

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['rbf', 'poly', 'sigmoid']}

Create a GridSearchCV object and fit it to the training data

grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)
grid.fit(X_train,y_train)

Find the optimal parameters

print(grid.best_estimator_)

This prints the best estimator found by the grid search.
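
Beyond best_estimator_, a fitted GridSearchCV object also exposes best_params_, best_score_ and the full cv_results_ table. The sketch below shows one way to inspect them. It fits its own small search on the bundled Iris data, so the grid, the name demo_grid and the resulting numbers are illustrative assumptions and not the results of the search above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for the article's iris.csv.
X_s, y_s = load_iris(return_X_y=True)

demo_grid = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
demo_grid.fit(X_s, y_s)

# Best parameter combination and its mean cross-validated accuracy.
print(demo_grid.best_params_)
print(round(demo_grid.best_score_, 3))

# cv_results_ holds one row per parameter combination.
results = pd.DataFrame(demo_grid.cv_results_)
top = results.sort_values("rank_test_score")[
    ["param_C", "param_gamma", "mean_test_score", "std_test_score"]
].head(3)
print(top)
```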

Take this grid model to create some predictions using the test set and then create classification reports and confusion matrices

grid_predictions = grid.predict(X_test)
print(confusion_matrix(y_test,grid_predictions))
print(classification_report(y_test,grid_predictions))
#Output
[[15 0 0]
[ 0 13 1]
[ 0 0 16]]
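
To read the confusion matrix, recall that scikit-learn puts the true classes on the rows and the predicted classes on the columns. The short sketch below recomputes accuracy, recall and precision directly from the matrix printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[15, 0, 0],
               [0, 13, 1],
               [0, 0, 16]])

total = cm.sum()
accuracy = np.trace(cm) / total            # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)      # per true class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class (column-wise)

print(total, round(accuracy, 4))
print(np.round(recall, 4))
print(np.round(precision, 4))
```

The matrix covers 45 test samples, of which 44 are on the diagonal, so the accuracy is 44/45, or about 0.978. The single error is a sample of the second class predicted as the third.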

For the code and dataset, please check out here.

Summary: Now you should know how to

  • Visualise data with Pairs Plots
  • Understand three major parameters of SVMs: Gamma, Kernels and C (Regularisation)
  • Apply kernels to transform the data including ‘Polynomial’, ‘RBF’, ‘Sigmoid’, ‘Linear’
  • Use GridSearch to tune the hyper-parameters of an estimator

Final Thoughts

Thank you for reading. I hope you now understand how to build SVMs in Python. Please leave your comments below if you have any thoughts.
