Ultimate Showdown of Machine Learning Algorithms



Details on Parameters Chosen


1) Random Forest

n_estimators — the number of decision trees in the forest. [10, 50, 100]

max_features — the number of features to consider when looking for the best split. [‘auto’, ‘sqrt’]

max_depth — the maximum number of levels in each tree. [2, 4, 6, 8, 10]

n_jobs — the number of jobs to run in parallel; usually set to -1 to use all available cores.

criterion — the function used to measure the quality of a split. [‘gini’, ‘entropy’]

I used ‘auto’ for max_features, 8 for max_depth, -1 for n_jobs, and ‘entropy’ as the criterion, since these values usually give the best results.

The graph to find the optimum number of trees

However, to find the optimal number of trees, I used GridSearchCV. It tries every combination of the given parameters and tabulates the results. As can be seen from the figure, there is no significant increase in the test score after 80 trees, so I decided to train my classifier with 80 trees.
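For reference, here is a minimal sketch of how such a search could be set up with scikit-learn; the digits dataset and the 5-fold cross-validation are my stand-in assumptions, not the article's actual data:

```python
# A minimal sketch of the grid search described above (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)  # placeholder for the real dataset

param_grid = {'n_estimators': [10, 50, 100]}  # candidate forest sizes

# Fixed parameters as described above: 'auto' max_features, depth 8,
# entropy criterion, and n_jobs=-1 to use all available cores.
rf = RandomForestClassifier(max_features='auto', max_depth=8,
                            criterion='entropy', n_jobs=-1)

search = GridSearchCV(rf, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)                     # the best number of trees
print(search.cv_results_['mean_test_score'])   # the scores behind the graph
```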

2) K-Nearest Neighbors (KNN)

n_neighbors — the number of nearest neighbors used to classify a data point. [2, 5, 8]

n_jobs — the number of jobs to run in parallel; usually set to -1 to use all available cores.

I left the remaining parameters at their defaults, since those already give the best results.

However, to find the optimum value of n_neighbors, I used GridSearchCV, and this is the graph I got:

The graph to find the optimum number of neighbors

According to the graph, the test score declines beyond n_neighbors = 5, which means 5 is the optimum number of neighbors.
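A minimal sketch of how such a graph could be produced follows; again, the digits dataset and the 5-fold cross-validation are assumptions standing in for the actual setup:

```python
# Sweep n_neighbors with GridSearchCV and plot the mean test score.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # placeholder for the real dataset

param_grid = {'n_neighbors': [2, 5, 8]}
search = GridSearchCV(KNeighborsClassifier(n_jobs=-1), param_grid, cv=5)
search.fit(X, y)

# Plot mean test score against the number of neighbors, reproducing
# the kind of graph shown above.
plt.plot(param_grid['n_neighbors'], search.cv_results_['mean_test_score'])
plt.xlabel('n_neighbors')
plt.ylabel('mean test score')
plt.show()
```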

3) Multi-Layer Perceptron (MLP)

alpha — the L2 regularization (penalty) strength. It is often mistaken for the learning rate, but in scikit-learn’s MLPClassifier it controls how strongly large weights are penalized. [0.01, 0.0001, 0.00001]

hidden_layer_sizes — a tuple giving the number of hidden nodes in each layer. [(50,50), (100,100,100), (750,750)]

activation — the non-linear function applied to each node’s output, which determines how strongly signals are passed on through the network. [‘relu’, ‘tanh’, ‘logistic’]

solver — also known as the optimizer, this parameter tells the network which technique to use for training its weights. [‘sgd’, ‘adam’]

batch_size — the number of samples processed before the model’s weights are updated. [200, 100, 200]

I chose ‘relu’ as the activation and ‘adam’ as the solver because these parameters usually give the best results.

However, to choose the number of hidden layers and alpha, I have used GridSearchCV.

The table to find the optimum alpha and hidden_layer_sizes

As can be seen in the table, the best result is obtained when alpha is 0.001 and hidden_layer_sizes is (784,784). Therefore, I decided to use those parameters.
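A minimal sketch of this two-parameter search follows; the digits dataset, the 3-fold cross-validation, and max_iter=200 are my assumptions, not the original configuration (note that the larger hidden layer sizes make this search slow):

```python
# Grid search over alpha and hidden_layer_sizes for an MLP (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # placeholder for the real dataset

param_grid = {
    'alpha': [0.01, 0.0001, 0.00001],
    'hidden_layer_sizes': [(50, 50), (100, 100, 100), (750, 750)],
}

# relu activation and the adam solver are kept fixed, as described above.
mlp = MLPClassifier(activation='relu', solver='adam', max_iter=200)

search = GridSearchCV(mlp, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)  # best alpha / hidden_layer_sizes combination
```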

4) Convolutional Neural Networks (CNN)

learning_rate — how fast the network adjusts its weights along the gradient. [0.01, 0.0001, 0.00001]

hidden_layer_sizes — a tuple giving the number of hidden nodes in each layer. [(50,50), (100,100,100), (750,750)]

activation — the non-linear function applied to each node’s output, which determines how strongly signals are passed on through the network. [‘relu’, ‘tanh’, ‘logistic’]

solver — also known as the optimizer, this parameter tells the network which technique to use for training its weights. [‘sgd’, ‘adam’]

batch_size — the number of images processed before the model’s weights are updated. [200, 100, 200]

epochs — the number of complete passes the model makes over the training data. [10, 20, 200]

I chose ‘relu’ as the activation function and ‘adam’ as the solver because these parameters usually give the best results. In the network, I added 3 convolution layers, 2 max-pooling layers, 3 dropout layers, and a softmax activation function at the end. I did not use GridSearchCV here because the number of possible combinations is very large, and trying them all would make little difference to the results.
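Here is a minimal Keras sketch of that architecture. The layer counts (3 conv, 2 max-pool, 3 dropout, final softmax) follow the text; the filter counts, kernel sizes, and dropout rates are illustrative assumptions, not the original values:

```python
# A sketch of the CNN described above: 3 convolution layers, 2 max-pooling
# layers, 3 dropout layers, and a final softmax over 5 classes.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),          # dropout rates are assumptions
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(5, activation='softmax'),  # one output per class
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```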

5) MobileNet

input_shape — a tuple with the dimensions of an input image. [(32,32,1), (128,128,3)]

alpha — the width multiplier of the network; values below 1 shrink it, values above 1 widen it, and 1 keeps the default width. [<1, >1, 1]

activation — the non-linear function applied to each node’s output, which determines how strongly signals are passed on through the network. [‘relu’, ‘tanh’, ‘logistic’]

optimizer — also known as the solver, this parameter tells the network which technique to use for training its weights. [‘sgd’, ‘adam’]

batch_size — the number of images processed before the model’s weights are updated. [200, 100, 200]

epochs — the number of complete passes the model makes over the training data. [10, 20, 200]

classes — the number of classes to classify. [2, 4, 10]

loss — the method used to calculate the loss, i.e. the difference between predicted and actual values. [‘categorical_crossentropy’, ‘RMSE’]

First, I resized the 28×28 images to 140×140, since MobileNet requires inputs of at least 32×32; the final input_shape was therefore (140,140,1), where 1 is the number of image channels (grayscale in this case). I set alpha to 1 because it usually gives the best results. The activation function was left at its default, ‘relu’. I used ‘Adadelta’ as the optimizer, as it gave the best results. batch_size was set to 128 to train the model faster, and I used 20 epochs for better accuracy. classes was set to 5, as we have 5 classes to classify.
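For reference, here is a minimal Keras sketch of this setup. The resize helper and the dummy stand-in data are my assumptions about how the pipeline could be wired together, not the author's exact code:

```python
# A sketch of the MobileNet setup described above (tensorflow.keras assumed).
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import MobileNet

# Upscale 28x28 grayscale images to 140x140 (MobileNet needs >= 32x32 inputs).
def resize_images(x):  # x has shape (n, 28, 28, 1)
    return tf.image.resize(x, (140, 140)).numpy()

model = MobileNet(input_shape=(140, 140, 1),
                  alpha=1.0,      # default network width
                  weights=None,   # training from scratch, so a 1-channel input is allowed
                  classes=5)      # 5 output classes

model.compile(optimizer='adadelta',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Dummy stand-in data (placeholder for the real dataset): 8 random images, 5 classes.
x_train = np.random.rand(8, 28, 28, 1).astype('float32')
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 5, size=8), 5)

model.fit(resize_images(x_train), y_train, batch_size=128, epochs=20)
```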