Original article can be found here (source): Deep Learning on Medium

# Machine Learning Algorithms

Now that we know the difference between the types of learning algorithms, let’s dive into some examples of each type in the context of machine learning.

## K-Means Clustering

K-Means is an unsupervised learning algorithm used to partition data into a specified number of clusters. Central points, or *centroids*, are chosen with the goal of minimizing the total distance between each point and its nearest centroid, and each point is assigned to the class of the centroid closest to it. The plot produced by the code below gives a good example of this in two dimensions (the algorithm is not limited to two dimensions, but two are good for intuition). There are two centroids that create two categories (consider them here purple and yellow), and the points closest to each centroid are assigned to it until all points are classified.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = pd.read_excel('Data Science/banknote.xlsx')

# Cluster on the two remaining features
km = KMeans(n_clusters=2).fit(
    data.drop(['Variance of Wavelet Transform',
               'Skewness of Wavelet Transform',
               'Class'], axis=1))
centroids = km.cluster_centers_
print(centroids)

plt.scatter(data['Kurtosis of Wavelet Transform'], data['Entropy of Image'],
            c=km.labels_.astype(float), s=75, alpha=0.3)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=100)
plt.show()
```
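Under the hood, KMeans alternates between assigning points to their nearest centroid and moving each centroid to the mean of its assigned points. Here is a minimal NumPy sketch of that loop, for intuition only (scikit-learn's implementation is considerably more sophisticated):

```python
import numpy as np

def kmeans(points, k=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids at k randomly chosen data points
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        centroids = np.array([points[labels == j].mean(axis=0)
                              if (labels == j).any() else centroids[j]
                              for j in range(k)])
    return centroids, labels

# Two well-separated 2-D blobs as toy data
rng = np.random.default_rng(42)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
centroids, labels = kmeans(pts, k=2)
```

On data this cleanly separated, a handful of iterations is enough for each blob to end up with its own centroid.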

This helps us identify unknown relationships within our data. As an example in a marketing context, we may notice a particular cluster contains a certain age group, gender, or product preference. These results can be used for further analysis or (if appropriate) to come up with labels for supervised learning algorithms! A good example of this from my experience is identifying and predicting anomalies: using an unsupervised learning model (an isolation forest) to label a dataset for a supervised learning model.
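That anomaly-labeling workflow can be sketched on synthetic data as follows (a hedged illustration of the idea, not the author's actual pipeline): an `IsolationForest` flags outliers, and its output becomes the label column for a supervised classifier.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Mostly "normal" points, plus a handful of injected outliers
normal = rng.normal(0, 1, (200, 2))
outliers = rng.uniform(6, 10, (10, 2))
X = np.vstack([normal, outliers])

# Step 1: unsupervised model flags each point (-1 = anomaly, 1 = inlier)
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
pseudo_labels = (iso.predict(X) == -1).astype(int)  # 1 = anomaly

# Step 2: the pseudo-labels serve as targets for a supervised model
clf = SVC().fit(X, pseudo_labels)
```

The supervised model can then score new observations without rerunning the unsupervised step.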

## Support Vector Machines

Support vector machines are supervised learning algorithms widely used for classification. The way they work is simple and intuitive in two dimensions, but, like K-Means, they are not limited to two dimensions.

The goal is to create an optimal hyperplane that classifies the data into their respective categories. In a two-dimensional example, the inputs, or *features*, used to predict a class (say, circle or triangle) are *x* and *y*. The support vector machine uses the inputs and known labels (circle or triangle) to find the hyperplane that best separates the points into their categories. Below is a more complicated but somewhat more practical multivariate visualization in Python.

```python
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Skewness of Wavelet Transform',
               'Kurtosis of Wavelet Transform',
               'Class'], axis=1)
y = data['Class']

model = SVC()
model.fit(X, y)

# Plot decision region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values, y=y.values, clf=model, legend=2)

# Update plot object with X/Y axis labels and figure title
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
plt.show()
```
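After fitting, the classifier keeps only the boundary-defining points as *support vectors*, and (for a linear kernel) exposes the hyperplane's coefficients directly. A small self-contained example on toy data (synthetic points, not the banknote file):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable 2-D clusters
X = np.array([[0, 0], [0, 1], [1, 0],   # class 0
              [4, 4], [4, 5], [5, 4]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel='linear').fit(X, y)

# Points lying on the margin are the support vectors
print(model.support_vectors_)
# The separating hyperplane satisfies w . x + b = 0
print('w =', model.coef_[0], 'b =', model.intercept_[0])
```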

To actually build this model for binary classification we will…

- Build a train/test split as mentioned above
- Fit the training data to the model
- Create predictions using our testing set
- Assign the predictions to 0 or 1
- Evaluate the predictions using a classification report

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Class'], axis=1)
y = data['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y)

model = SVC()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
# SVC.predict already returns 0/1 class labels, so this
# thresholding is a safeguard rather than a necessity
predictions[predictions > .5] = 1
predictions[predictions <= .5] = 0

print(classification_report(y_test, predictions))
```

**Support Vector Classifier Results**

```
              precision    recall  f1-score   support

           0       1.00      0.98      0.99       188
           1       0.98      1.00      0.99       155

    accuracy                           0.99       343
   macro avg       0.99      0.99      0.99       343
weighted avg       0.99      0.99      0.99       343
```

Support vector machines are applicable to a large number of real-world problems. In the past, I have used them in tandem with in-house systems to assist in balancing video games.