
Python Implementation of SVM, Logistic Regression, Naive Bayes, Decision Tree, and Random Forest using Scikit-learn (just 3 lines of code)

Python Implementation of 5 Machine Learning Algorithms for Classification Problems

Image by Mohamed Hassan from Pixabay

Hello Programmers!

Here I am going to show you how to implement SVM, Logistic Regression, Naive Bayes, Decision Tree, and Random Forest in Python using Scikit-learn (sklearn). And yes, it really is easy: write just three lines of Python code and you get your Decision Tree classifier.

That is the beauty of sklearn (Scikit-learn).
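As a taste of what is coming, here is the whole pattern as a minimal sketch (it assumes X_train, y_train, and X_test have already been prepared, which we do below):

# A minimal sketch of the pattern used throughout this post
# (assumes X_train, y_train, and X_test are already prepared; see below).
from sklearn.tree import DecisionTreeClassifier                # 1. import the estimator
classifier = DecisionTreeClassifier().fit(X_train, y_train)   # 2. fit it on the training set
y_pred = classifier.predict(X_test)                            # 3. predict on the test set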

Note: You can get this notebook from my GitHub; the link is at the end of the post.

So let’s get our hands dirty with some code.

First we need a dataset. I have a market dataset where the task is to predict whether a customer purchases an item or not.

What is in my dataset?

It has three columns:

  1. Age: the age of the person
  2. EstimatedSalary: the estimated salary of the person
  3. Purchased: whether the customer bought the item or not

Important Python Libraries:

  1. NumPy for handling arrays
  2. pandas for creating DataFrames
  3. scikit-learn for machine learning

Installing Libraries Using pip

#dependencies
!pip install numpy
!pip install pandas
!pip install scikit-learn


train_test_split: splits arrays or matrices into random train and test subsets; here we hold out 20% of the dataset as the test set.

# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('dataset.csv')
display(dataset.head())

X = dataset[['Age', 'EstimatedSalary']].values
y = dataset['Purchased'].values
print('-'*80)
print(f'Shape of X is {X.shape}\nShape of y is {y.shape}')

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
print('-'*80)
print(f"Lenght of X_train: {len(X_train)}\nLenght of X_test: {len(X_test)}")
print(f"Lenght of y_train: {len(y_train)}\nLenght of y_test: {len(y_test)}")


StandardScaler: Standardize features by removing the mean and scaling to unit variance

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples (or zero if with_mean=False), and s is the standard deviation of the training samples (or one if with_std=False). You can find more in the scikit-learn documentation.

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
for i in range(10):
    print(X_train[i])
print('-'*80)
for i in range(10):
    print(X_test[i])
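As an optional sanity check (not part of the original notebook), the fitted scaler exposes the mean and standard deviation it learned, and the scaled training columns should now have mean close to 0 and standard deviation close to 1:

# Optional check: StandardScaler stores the learned per-feature statistics.
print(sc.mean_)              # per-feature mean u from the training set
print(sc.scale_)             # per-feature standard deviation s from the training set
print(X_train.mean(axis=0))  # approximately 0 after scaling
print(X_train.std(axis=0))   # approximately 1 after scaling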

Algorithms

1. Support Vector Classifier (SVC)

# Fitting Support Vector Classifier to the Training set
from sklearn.svm import SVC
classifier = SVC()
print(classifier)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Accuracy on the Test set results
from sklearn.metrics import accuracy_score
print('\n'+'-'*20+'Accuracy Score on the Test set'+'-'*20)
print("{:.0%}".format(accuracy_score(y_test,y_pred)))

2. Logistic Regression

# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
print(classifier)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Accuracy on the Test set results
from sklearn.metrics import accuracy_score
print('\n'+'-'*20+'Accuracy Score on the Test set'+'-'*20)
print("{:.0%}".format(accuracy_score(y_test,y_pred)))

3. Naive Bayes

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
print(classifier)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Accuracy on the Test set results
from sklearn.metrics import accuracy_score
print('\n'+'-'*20+'Accuracy Score on the Test set'+'-'*20)
print("{:.0%}".format(accuracy_score(y_test,y_pred)))

4. Decision Tree Classifier

# Fitting Decision Tree Classifier to the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
print(classifier)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Accuracy on the Test set results
from sklearn.metrics import accuracy_score
print('\n'+'-'*20+'Accuracy Score on the Test set'+'-'*20)
print("{:.0%}".format(accuracy_score(y_test,y_pred)))

5. Random Forest Classifier

# Fitting Random Forest Classifier to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
print(classifier)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Accuracy on the Test set results
from sklearn.metrics import accuracy_score
print('\n'+'-'*20+'Accuracy Score on the Test set'+'-'*20)
print("{:.0%}".format(accuracy_score(y_test,y_pred)))

GitHub Notebook: Click Here