Artificial Intelligence Bootcamp

Original article source: Deep Learning on Medium

Machine Learning Algorithms

Now that we know the difference between the types of learning algorithms, let’s dive into some examples of each type in the context of machine learning.

K-Means Clustering

K-Means is an unsupervised learning algorithm that partitions data into a specified number of clusters. Central points, or centroids, are placed with the goal of minimizing the total distance between each point and its nearest centroid; each point is then assigned to the class of its closest centroid. The graph below gives a good example of this in two dimensions (the algorithm is not limited to two dimensions, but two is good for intuition). There are two centroids that create two categories (shown here as purple and yellow), and the closest points to each centroid are assigned to its class until all points are classified.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the banknote dataset and cluster on two of its features
data = pd.read_excel('Data Science/banknote.xlsx')
km = KMeans(n_clusters=2).fit(data.drop(['Variance of Wavelet Transform', 'Skewness of Wavelet Transform', 'Class'], axis=1))
centroids = km.cluster_centers_
print(centroids)

# Plot the points colored by cluster, with the centroids in red
plt.scatter(data['Kurtosis of Wavelet Transform'], data['Entropy of Image'],
            c=km.labels_.astype(float), s=75, alpha=0.3)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=100)
plt.show()
Code Source
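If you don't have the banknote spreadsheet handy, the same clustering step can be sketched on synthetic data (two well-separated blobs standing in for the banknote features; the blob locations and seeds here are just illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic 2-D clusters, one centered at (0, 0) and one at (5, 5)
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
cluster_b = rng.normal(loc=[5, 5], scale=0.5, size=(100, 2))
X = np.vstack([cluster_a, cluster_b])

# Fit K-Means with two clusters and inspect the learned centroids
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid near (0, 0), one near (5, 5)
```

Because the blobs are far apart, the fitted centroids land at the blob centers, which is exactly the behavior the plot above illustrates.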

This helps us identify unknown relationships within our data. In a marketing context, for example, we may notice that a particular cluster corresponds to a certain age group, gender, or product. These results can be used for further analysis or (if appropriate) to generate labels for supervised learning algorithms! A good example from my own experience is anomaly detection: using an unsupervised model (an isolation forest) to label a dataset for a supervised learning model.
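That unsupervised-to-supervised workflow can be sketched roughly like this (the data, contamination rate, and choice of logistic regression as the downstream model are all illustrative assumptions, not the exact in-house setup):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression

# Mostly "normal" points plus a few injected anomalies (synthetic)
rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(200, 2))
anomalies = rng.uniform(6, 8, size=(10, 2))
X = np.vstack([normal, anomalies])

# Unsupervised step: the isolation forest flags outliers (-1) vs inliers (1)
iso = IsolationForest(contamination=0.05, random_state=1).fit(X)
labels = (iso.predict(X) == -1).astype(int)  # 1 = anomaly, 0 = normal

# Supervised step: train a classifier on the generated labels
clf = LogisticRegression().fit(X, labels)
```

The isolation forest supplies the 0/1 labels that a supervised model can then learn (and later predict on new data).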

Support Vector Machines

The support vector machine is a supervised learning algorithm widely used for classification. The way support vector machines work is simple and intuitive in two dimensions, but, like K-Means, they are not limited to two dimensions.

Photo Source

The goal is to create an optimal hyperplane that classifies the data into their respective categories. In the chart above, the inputs (features) used to predict circle versus triangle are x and y. The support vector machine uses the inputs and the known labels (circle or triangle) to optimize the hyperplane, finding the line through the data that best separates the categories. Below is a more complicated but somewhat more practical multivariate visualization in Python.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

# Load the banknote dataset and keep two features for a 2-D plot
data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Skewness of Wavelet Transform', 'Kurtosis of Wavelet Transform', 'Class'], axis=1)
y = data['Class']
model = SVC()
model.fit(X, y)

# Plot decision region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values,
                      y=y.values,
                      clf=model,
                      legend=2)

# Update plot object with X/Y axis labels and figure title
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
plt.show()
Code Source
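When the kernel is linear, the optimized hyperplane can be read directly off the fitted model via its coefficients. A minimal sketch on synthetic, linearly separable data (the blob locations and seed are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable 2-D classes (synthetic)
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A linear-kernel SVM exposes the hyperplane w·x + b = 0 directly
model = SVC(kernel='linear').fit(X, y)
w, b = model.coef_[0], model.intercept_[0]
print(w, b)  # the separating line is w[0]*x1 + w[1]*x2 + b = 0
```

With a nonlinear kernel (like the default RBF used above), there is no single `coef_` vector, which is why the decision region has to be visualized instead.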

To actually build this model for binary classification we will:

  • Build a train/test split as mentioned above
  • Fit the training data to the model
  • Create predictions using our testing set
  • Assign the predictions to 0 or 1
  • Evaluate the predictions with a classification report
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load the data, separating the features from the label column
data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Class'], axis=1)
y = data['Class']

# Hold out a test set, then fit the classifier on the training set
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = SVC()
model.fit(X_train, y_train)

# SVC.predict already returns 0/1 class labels, so the thresholding
# below is a safeguard that leaves them unchanged
predictions = model.predict(X_test)
predictions[predictions > .5] = 1
predictions[predictions <= .5] = 0
print(classification_report(y_test, predictions))

Support Vector Classifier Results

              precision    recall  f1-score   support

           0       1.00      0.98      0.99       188
           1       0.98      1.00      0.99       155

    accuracy                           0.99       343
   macro avg       0.99      0.99      0.99       343
weighted avg       0.99      0.99      0.99       343
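The same five steps can be run end to end without the banknote file by substituting synthetic two-class data (the blob parameters and seed are illustrative assumptions; the real dataset will give different numbers than the report above):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic two-class data standing in for the four banknote features
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, size=(200, 4)),
               rng.normal(2, 1, size=(200, 4))])
y = np.array([0] * 200 + [1] * 200)

# Split, fit, predict, evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = SVC().fit(X_train, y_train)
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
```

Because the classes are well separated, precision and recall come out near 1.0, mirroring the shape of the report above.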

Support vector machines are applicable to a large number of real-world problems. In the past, I have used them in tandem with in-house systems to help balance video games.