# Artificial Intelligence Bootcamp

Original article source: Deep Learning on Medium

# Machine Learning Algorithms

Now that we know the difference between the types of learning algorithms, let’s dive into some examples of each type in the context of machine learning.

## K-Means Clustering

K-Means is an unsupervised learning algorithm that partitions data into a specified number of clusters. Central points, or centroids, are placed so as to minimize the total distance between each point and its nearest centroid, and every point is assigned to the class of its closest centroid. A two-dimensional example gives good intuition (the algorithm is not limited to two dimensions): with two centroids creating two categories (say purple and yellow), the closest points to each centroid are assigned to it until all points are classified.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = pd.read_excel('Data Science/banknote.xlsx')

km = KMeans(n_clusters=2).fit(data.drop(['Variance of Wavlet Transform', 'Skewness of Wavelet Transform', 'Class'], axis=1))
centroids = km.cluster_centers_
print(centroids)

plt.scatter(data['Kurtosis of Wavelet Transform'], data['Entropy of Image'], c=km.labels_.astype(float), s=75, alpha=0.3)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=100)
plt.show()
```
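The snippet above depends on a local Excel file. For a version you can run anywhere, here is a minimal sketch of the same idea on synthetic data (the blob centers and sample counts are illustrative, not from the banknote dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two well-separated synthetic clusters stand in for the banknote features
X, _ = make_blobs(n_samples=200, centers=[(-5, -5), (5, 5)],
                  cluster_std=1.0, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_              # cluster assignment for each point
centroids = km.cluster_centers_  # one centroid per cluster
```

With clearly separated blobs, the fitted centroids land near the true blob centers, one on each side of the origin.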

This helps us identify unknown relationships within our data. In a marketing context, for example, we may notice that a particular cluster corresponds to a certain age group, gender, or product. These results can be used for further analysis or (if appropriate) to come up with labels for supervised learning algorithms! A good example of this from my experience is identifying and predicting anomalies: using unsupervised learning models (isolation forests) to label a dataset for a supervised learning model.
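That unsupervised-to-supervised workflow can be sketched as follows. This is a hedged toy illustration, not the author's actual pipeline: the synthetic data, contamination rate, and choice of SVC as the downstream classifier are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Mostly "normal" points plus a few injected outliers far from the cluster
normal = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(6, 8, size=(10, 2))
X = np.vstack([normal, outliers])

# Unsupervised step: isolation forest flags anomalies (-1) vs. inliers (1)
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
pseudo_labels = (iso.predict(X) == -1).astype(int)  # 1 = anomaly

# Supervised step: train a classifier on the generated pseudo-labels
clf = SVC().fit(X, pseudo_labels)
```

The classifier can then score new points as anomalous or normal without re-running the isolation forest.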

## Support Vector Machines

The support vector machine is a supervised learning algorithm widely used for classification. The way it works is simple and intuitive in two dimensions but, like K-Means, it is not limited to two dimensions.

The goal is to create an optimal hyperplane that separates the data into their respective categories. Consider a two-dimensional example where the inputs, or features, used to predict circle versus triangle are x and y. The support vector machine uses the inputs and the known labels (circle or triangle) to optimize the hyperplane, finding the best possible line through the data to separate the points. Below is a more complicated but somewhat more practical multivariate visualization in Python.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Skewness of Wavelet Transform', 'Kurtosis of Wavelet Transform', 'Class'], axis=1)
y = data['Class']

model = SVC()
model.fit(X, y)

# Plot decision region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values, y=y.values, clf=model, legend=2)

# Update plot object with X/Y axis labels and figure title
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
plt.show()
```
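The hyperplane itself is easiest to see with a linear kernel, where the fitted boundary is just the line w·x + b = 0. A minimal sketch on hand-made separable points (these toy coordinates are illustrative, not from the banknote data):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds in 2-D: class 0 lower-left, class 1 upper-right
X = np.array([[-2, -2], [-3, -1], [-2, -3],
              [2, 2], [3, 1], [2, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel exposes the separating hyperplane directly
model = SVC(kernel='linear').fit(X, y)
w, b = model.coef_[0], model.intercept_[0]  # hyperplane: w[0]*x + w[1]*y + b = 0
```

Points on either side of that line are assigned to opposite classes, which is exactly what the decision-region plot above visualizes for a nonlinear kernel.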

To actually build this model for binary classification we will…

• Build a train/test split as mentioned above
• Fit the training data to the model
• Create predictions using our testing set
• Assign the predictions to 0 or 1
• Evaluate the predictions using a classification report
```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_excel('Data Science/banknote.xlsx')
X = data.drop(['Class'], axis=1)
y = data['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y)

model = SVC()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
# SVC.predict already returns 0/1 class labels; this thresholding is a safeguard
predictions[predictions > .5] = 1
predictions[predictions <= .5] = 0

print(classification_report(y_test, predictions))
```
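The steps above can be reproduced without the Excel file using synthetic data. This is a self-contained sketch; the generated features are a stand-in for the banknote columns, not the real dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Synthetic stand-in for the banknote features: 4 inputs, binary Class
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# 1. Build a train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Fit the model on the training data
model = SVC()
model.fit(X_train, y_train)

# 3. Predict on the held-out test set (SVC.predict returns 0/1 labels directly)
predictions = model.predict(X_test)

# 4. Evaluate with a classification report
print(classification_report(y_test, predictions))
```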

Support Vector Classifier Results

```
              precision    recall  f1-score   support

           0       1.00      0.98      0.99       188
           1       0.98      1.00      0.99       155

    accuracy                           0.99       343
   macro avg       0.99      0.99      0.99       343
weighted avg       0.99      0.99      0.99       343
```

Support vector machines are applicable to a large number of real-world problems. In the past, I have used them in tandem with in-house systems to assist in balancing video games.