Original article was published by Tenzin Migmar on Artificial Intelligence on Medium
Algorithms in Machine Learning
A wide-scope, but not comprehensive, guide to algorithms.
Algorithms are the engines that propel machine learning forward, and they’re the source of life that powers artificial intelligence, a technology that will transform virtually every aspect of our lives. In this guide, you will be introduced to a range of these algorithms and gain a rudimentary understanding of how each model works.
Linear Regression
Linear regression is a supervised learning algorithm used to predict a continuous value; in its simplest form, it models the relationship between two continuous variables. The algorithm searches for and defines a linear relationship between the independent variable (the value used to make the prediction) and the dependent variable (the value we’re trying to predict) by fitting a line of best fit through the data. The line of best fit can be modeled as a single linear equation: y = θ0 + θ1 * x, where θ0 is the intercept and θ1 is the slope. To improve the accuracy of our algorithm’s predictions, we must minimize the error between the data points and the line of best fit as much as possible. Through training, our model repeatedly updates θ0 and θ1 to minimize that error, but to measure the error, we’ll first introduce the cost function.
The cost function (J) measures the error between the predicted y values (say, our model predicts that a 3-floor house in Toronto costs 987,000) and the actual/true y values (the 3-floor house actually costs 1,200,000). A common choice is the mean squared error (MSE) or its square root, the root mean squared error (RMSE); both are minimized by the same θ0 and θ1. We want to make the cost as low as possible, and to do so, we’ll turn to Gradient Descent.
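A minimal Python sketch of the cost calculation, using the house-price example. The function names `mse` and `rmse` are our own; with only one house, the RMSE is simply the size of the miss.

```python
import math

def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: the square root of the MSE, in the units of y."""
    return math.sqrt(mse(predicted, actual))

# The single house from the text: predicted 987,000 vs. actual 1,200,000.
print(rmse([987_000], [1_200_000]))  # 213000.0
```

Squaring the errors penalizes large misses more heavily than small ones, and taking the square root brings the RMSE back into the same units as the prices.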
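The concept of Gradient Descent is to start at arbitrary values for θ0 and θ1 and then update them iteratively to reduce the error. At each step, we compute the gradient (slope) of the cost with respect to each parameter and move that parameter a small step, scaled by a learning rate α, in the opposite direction: θj := θj - α * ∂J/∂θj. Repeating this lowers the cost with each iteration until it converges to the minimum cost.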
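The update loop can be sketched in plain Python. This is a minimal illustration rather than a production implementation; it assumes the common cost J = (1/2m) * Σ(θ0 + θ1 * x - y)², whose factor of 1/2 tidies up the derivatives, a hand-picked learning rate of 0.05, and a made-up dataset lying exactly on y = 1 + 2x, so we know what the answer should be.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y = theta0 + theta1 * x by batch gradient descent on the MSE cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # arbitrary starting values
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1 / 2m) * sum(error^2)
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Step downhill, against the gradient
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

Because this cost is convex, a small enough learning rate leads gradient descent to the single global minimum; too large a learning rate makes the updates overshoot and diverge.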
Logistic Regression
A form of classification algorithm, Logistic Regression predicts in a binary fashion: each output is assigned to one of two classes, for example, malignant or benign tumors, or whether a student failed or passed history class. It is built on the logistic function, also known as the sigmoid function: σ(z) = 1 / (1 + e^(-z)), which maps any real input z to a value between 0 and 1.
Logistic Regression operates similarly to its cousin Linear Regression: it computes a weighted sum of the features, z = θ0 + θ1 * x1 + ... + θn * xn, but then passes that sum through the sigmoid function to obtain a value between 0 and 1 that can be read as a probability. A threshold (typically 0.5) then sorts these continuous values into two classes. For example, suppose we wanted to classify breast cancer tumors as malignant or benign based on features like radius, texture, concavity, etc. If the predicted probability were greater than 0.5, the tumor would be classified as malignant; otherwise, it would be classified as benign.
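The sigmoid, the 0.5 threshold, and a simple training loop fit together as follows. This sketch assumes the model is trained by gradient descent on the log loss (cross-entropy), which the article does not specify, and it uses a single made-up, already standardized feature rather than real tumor measurements.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, labels, learning_rate=0.1, iterations=5000):
    """Fit p(y=1 | x) = sigmoid(w0 + w1 * x) by gradient descent on log loss."""
    w0, w1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # For log loss, the gradient has the same form as in linear regression
        errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, labels)]
        w0 -= learning_rate * sum(errors) / m
        w1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / m
    return w0, w1

def predict(x, w0, w1, threshold=0.5):
    """Return 1 (e.g. malignant) if the predicted probability exceeds the threshold."""
    return 1 if sigmoid(w0 + w1 * x) > threshold else 0

# Made-up standardized feature values; 1 = malignant, 0 = benign.
xs = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w0, w1 = train_logistic(xs, labels)
print([predict(x, w0, w1) for x in xs])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Raising or lowering `threshold` trades false positives against false negatives, which can matter more than raw accuracy in a medical setting.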
Decision Trees
Decision trees rely on a flowchart-like model: a series of conditions learned from the data determines the prediction for a given value. Decision trees are composed of individual nodes, which come in hierarchical types: the root node is the first feature we split the data on; internal nodes are features that the data can be further split on; the end nodes, where a final prediction is made (0 or 1 in a two-class problem), are referred to as leaf nodes; and the nodes that result from splitting a node are called that node’s children.
When constructing a decision tree, the order of the features matters. For example, if we were working with the Titanic dataset, we would look to select as the root node the feature with the highest predictive power. In this case, we would choose the sex of the passenger, because women were historically far more likely to survive, having been given priority for the lifeboats. From there, we would consider the remaining features and place them in the tree according to their predictive power. In practice, predictive power at each split is usually measured with a criterion such as Gini impurity or information gain.
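One common way to score a candidate split is Gini impurity, which is 0 when a group contains a single class. The sketch below is a toy under stated assumptions: the eight passenger rows are invented for illustration (they are not real Titanic records), and only categorical features are handled.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when every label is the same, higher when mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, feature, target="survived"):
    """Weighted Gini impurity after splitting rows on each value of a categorical feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_root(rows, features):
    """Pick the feature whose split leaves the purest children."""
    return min(features, key=lambda f: split_impurity(rows, f))

# Tiny invented rows in the spirit of the Titanic data (NOT real passenger records).
rows = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 2, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
]
print(best_root(rows, ["sex", "pclass"]))  # sex
```

On these rows, splitting on sex leaves a weighted impurity of 0.1875 versus 0.375 for passenger class, so `sex` becomes the root; the same comparison would then be repeated within each child.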
The premise of decision trees is to rely on control flow to arrive at a prediction.
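Once a tree is built, prediction really is just control flow. Below is a minimal sketch of a tree structure and its traversal; the `Node` class, the feature names, and the age threshold are hypothetical and were chosen by hand rather than learned from data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A decision-tree node. Internal nodes test a feature; leaves hold a prediction."""
    feature: Optional[str] = None     # feature tested at this node (None for a leaf)
    threshold: float = 0.0            # go left if value <= threshold
    left: Optional["Node"] = None     # child for value <= threshold
    right: Optional["Node"] = None    # child for value > threshold
    prediction: Optional[int] = None  # class predicted at a leaf (0 or 1)

def predict(node, sample):
    """Walk from the root to a leaf, following one branch per node."""
    while node.prediction is None:
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction

# A hand-built, purely hypothetical tree: the root tests is_female (0 or 1),
# and an internal node splits the male branch on age.
tree = Node(
    feature="is_female", threshold=0.5,
    left=Node(feature="age", threshold=9.5,
              left=Node(prediction=1),
              right=Node(prediction=0)),
    right=Node(prediction=1),
)
print(predict(tree, {"is_female": 0, "age": 30}))  # 0
print(predict(tree, {"is_female": 1, "age": 30}))  # 1
```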
Random Forest
Random forest is a supervised learning algorithm for classification and regression problems that builds on decision trees but, true to its name, relies on a forest of them. During training, Random Forest generates many decision trees, each trained on a random bootstrap sample of the data (drawn with replacement) and considering a random subset of the features at each split. When predicting new data points, if the problem at hand involves classification, Random Forest outputs the mode of the classes predicted by the individual trees; if it involves regression, the algorithm outputs the mean prediction of the individual trees.
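The two random ingredients and the two ways of combining trees can be sketched with the standard library. This assumes the usual bagging recipe (bootstrap samples plus a random feature subset per split); the feature names and per-tree predictions below are made-up values, and the trees themselves are omitted.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on its own sample."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k distinct features to consider at a split, which decorrelates the trees."""
    return rng.sample(features, k)

def aggregate_classification(tree_votes):
    """Classification: the forest outputs the mode of the trees' predicted classes."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Regression: the forest outputs the mean of the trees' predictions."""
    return mean(tree_outputs)

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
print(len(sample))  # 10: same size as the training set, usually with repeats
print(random_feature_subset(["radius", "texture", "concavity", "area"], 2, rng))
print(aggregate_classification(["malignant", "benign", "malignant"]))  # malignant
print(aggregate_regression([250_000.0, 270_000.0, 260_000.0]))
```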
The concept behind Random Forest is that individual Decision Trees are prone to overfitting, and averaging many decorrelated trees reduces it. (Overfitting happens when our model fits the training data too closely: it performs poorly on the testing data because it wrongly assumes that real-world data will bear an identical resemblance to the training data.) Random Forests generally perform well on many datasets, but they are computationally heavy and harder to interpret than a single decision tree, although feature-importance scores can help explain them.
Support Vector Machines
Support Vector Machines (SVMs) are a supervised learning algorithm for classification and regression problems, though they are more commonly used for classification. The model plots each data point in n-dimensional space, where n is the number of features your data has, and separates the classes with a hyperplane. The value of each feature is treated as one of the coordinates of the data point. For instance, suppose our data has two features: weight and height. Each data point will be plotted in two-dimensional space, with its weight as one coordinate and its height as the other.
Now, we’ll fit a line (in higher dimensions, a hyperplane) that separates the two classes in our data, and the line must be positioned so that it leaves the largest possible margin between itself and the closest points of each class. Those closest points are the support vectors, hence the name of the algorithm. If the data cannot be separated by a straight line, then we’ll have to introduce more dimensions: the data is mapped into a higher-dimensional space where a separating hyperplane exists, which kernel functions (the "kernel trick") let SVMs do efficiently.
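A linear, soft-margin SVM can be sketched by minimizing the hinge loss with subgradient descent. This is an illustrative assumption rather than how libraries work: scikit-learn's `SVC`, for instance, relies on the libsvm solver, and this sketch does not implement kernels. The six points are made-up, standardized weight and height values.

```python
def train_linear_svm(points, labels, lam=0.01, learning_rate=0.1, epochs=1000):
    """Soft-margin linear SVM trained by batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))),
    with labels y in {-1, +1}.
    """
    dims = len(points[0])
    w = [0.0] * dims
    b = 0.0
    m = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dims):
                    grad_w[j] -= y * x[j] / m
                grad_b -= y / m
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def classify(x, w, b):
    """Which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Made-up, linearly separable points: (standardized weight, standardized height).
points = [(-2.0, -1.5), (-1.5, -2.0), (-1.0, -1.0), (1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print([classify(p, w, b) for p in points])  # [-1, -1, -1, 1, 1, 1]
```

The regularization strength `lam` trades margin width against training errors: a smaller value tolerates fewer points inside the margin.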
K-Nearest Neighbors
K-Nearest Neighbors, also known as KNN, is a supervised learning algorithm built on the assumption that data points close to each other tend to belong to the same class. The whole model relies on the idea that the distance between points can be used to make accurate predictions.
The K in K-Nearest Neighbors refers to the number of nearest neighbors we’ll consider to make a prediction. Let’s suppose we had the Iris dataset, in which we classify various species of Iris (Setosa, Virginica, and Versicolor), and our features for classification were petal width, petal length, sepal width, and sepal length.
The training data points for this dataset would be plotted according to their feature values. From there, suppose we had a testing data point, an object with an unknown classification. If we set k = 3, our algorithm would find the 3 training points nearest to the unknown object (typically by Euclidean distance), and the object would be assigned to the most common class among them.
It’s best practice to set k to an odd number, which reduces the risk of a tie; with only two classes, an odd k rules out ties entirely.
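The whole procedure fits in a few lines. This is a minimal sketch assuming Euclidean distance; the nine training rows are illustrative measurements in the style of the Iris data, not rows copied from the dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative measurements in cm, in the order
# (sepal length, sepal width, petal length, petal width). Not actual dataset rows.
train_points = [
    (5.0, 3.4, 1.5, 0.2), (4.8, 3.1, 1.6, 0.2), (5.2, 3.5, 1.4, 0.3),
    (6.0, 2.8, 4.5, 1.4), (5.8, 2.7, 4.1, 1.0), (6.3, 2.9, 4.7, 1.5),
    (6.7, 3.1, 5.8, 2.2), (6.4, 3.0, 5.5, 2.0), (7.0, 3.2, 6.0, 2.3),
]
train_labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
print(knn_predict(train_points, train_labels, (5.1, 3.3, 1.5, 0.25), k=3))  # setosa
```

Sorting every training point is fine for a toy example; for large datasets, libraries can use spatial indexes such as k-d trees to find neighbors faster.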
Artificial Neural Networks
Artificial neural networks are a type of machine learning model loosely inspired by the human brain. A typical neural network has at least 3 layers of neurons: an input layer, a hidden layer, and an output layer.
The input layer is made up of the features of our data; for example, if we wanted our neural network trained to classify handwritten digits, each pixel could be a feature. The output layer is composed of the predictions; in the same example, each output neuron may represent one digit. The hidden layer(s) usually make up the bulk of our neural network, and the neurons in each layer are linked to the neurons in the next layer through weighted connections. For a neural network used to classify images, each layer may learn to recognize a certain kind of pattern in the image. The first hidden layer may focus on small edges in the digits, the next may combine those into lines and curves, and so on. With each layer, the patterns gradually increase in complexity.
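The layer structure described above can be sketched as a forward pass in plain Python. The layer sizes are assumptions: 784 inputs for a 28 x 28 pixel image (as in the MNIST digits), two hidden layers of 16 neurons chosen arbitrarily, and 10 outputs, one per digit. The weights are random, so the untrained network's guess is meaningless.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_inputs, n_outputs, rng):
    """One fully connected layer: a weight per input-output pair, plus a bias per output."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def forward(layers, inputs):
    """Feed inputs through each layer in turn; each neuron sums its weighted inputs."""
    activations = inputs
    for weights, biases in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
                       for row, b in zip(weights, biases)]
    return activations

rng = random.Random(42)
sizes = [784, 16, 16, 10]  # input pixels, two hidden layers, one output per digit
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
image = [rng.random() for _ in range(784)]  # stand-in for real pixel intensities
scores = forward(layers, image)
print(len(scores))  # 10
print(scores.index(max(scores)))  # the predicted digit (arbitrary until trained)
```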
Depending on the complexity of our task, we may have many hidden layers, which results in a deep neural network; deep learning, the study of such networks, is a subset of machine learning. As we train the neural network, it refines the parameters of the connections between neurons (their weights, along with each neuron’s bias) until it can make accurate predictions. These adjustments are typically computed with backpropagation and applied with the same Gradient Descent idea introduced for linear regression.
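Training can be sketched on the classic XOR problem, which no single straight line can separate, so the network genuinely needs its hidden layer. This minimal example assumes sigmoid activations, a mean-squared-error loss, three hidden neurons, and a hand-picked learning rate; the gradients come from backpropagation (the chain rule applied layer by layer), and each update is a gradient-descent step.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: the output is 1 only when exactly one input is 1.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(params, x):
    """Return the hidden activations and the single output neuron's value."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def loss(params):
    """Mean squared error over the four XOR cases."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

def train(hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    """Full-batch gradient descent, with gradients computed by backpropagation."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in DATA:
            h, y = forward((w1, b1, w2, b2), x)
            # Output error: derivative of the MSE times the sigmoid's derivative
            delta_out = 2 * (y - t) * y * (1 - y) / len(DATA)
            gb2 += delta_out
            for i in range(hidden):
                gw2[i] += delta_out * h[i]
                # Pass the error back through connection weight w2[i] (chain rule)
                delta_h = delta_out * w2[i] * h[i] * (1 - h[i])
                gb1[i] += delta_h
                for j in range(2):
                    gw1[i][j] += delta_h * x[j]
        # Gradient-descent step on every weight and bias
        for i in range(hidden):
            w2[i] -= learning_rate * gw2[i]
            b1[i] -= learning_rate * gb1[i]
            for j in range(2):
                w1[i][j] -= learning_rate * gw1[i][j]
        b2 -= learning_rate * gb2
    return w1, b1, w2, b2

loss_before = loss(train(epochs=0))  # same seed, so these are the starting weights
params = train()
loss_after = loss(params)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
for x, t in DATA:
    print(x, "target", t, "output", round(forward(params, x)[1], 3))
```

Small networks like this can occasionally stall in a poor local minimum depending on the random seed; trying another seed or adding hidden neurons usually helps.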