# Details on Parameters Chosen

## 1) Random forest

**n_estimators** – the number of decision trees in the forest. [10,50,100]

**max_features** – the number of features considered when looking for the best split. ['auto','sqrt'] (for a classifier, 'auto' means the same as 'sqrt', i.e. the square root of the number of features)

**max_depth** – the maximum number of levels in each tree. [2,4,6,8,10]

**n_jobs** – the number of jobs to run in parallel; it is usually set to -1, which uses all available processors.

**criterion** – the function used to measure the quality of a split, i.e. how much a split reduces the impurity (mixing of classes) in a node. ['gini','entropy']
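To make the **criterion** concrete, here is a minimal pure-Python sketch (not scikit-learn's implementation) of the two impurity measures and of how a candidate split is scored. The helper names `gini`, `entropy` and `split_gain` are illustrative only; a tree chooses the split with the largest impurity decrease.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of the samples in a node that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [c / total for c in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). It is 0 when the node contains a single class."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def split_gain(parent, left, right, impurity):
    """Impurity decrease of a split, weighting each child by its share of the samples."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


if __name__ == "__main__":
    parent = [0, 0, 0, 1, 1, 1]
    left, right = [0, 0, 0], [1, 1, 1]  # a perfect split: each child is pure
    print(gini(parent), entropy(parent))             # 0.5 1.0
    print(split_gain(parent, left, right, entropy))  # 1.0
```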

I used 'auto' as **max_features**, *8* as **max_depth**, *-1* as **n_jobs**, and 'entropy' as my **criterion**, because these values usually give the best results.

However, to find the optimal number of trees, I used GridSearchCV. It tries every combination of the given parameter values, cross-validates each one, and records the scores in a results table (its `cv_results_` attribute). As can be seen from the figure, the test score does not increase significantly beyond 80 trees, so I trained my classifier with 80 trees.
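Grid search is exhaustive, so its cost grows with the product of the list lengths. The sketch below only enumerates the candidates the way a grid search does (scikit-learn exposes the same idea as `ParameterGrid`); `parameter_grid` is a hypothetical helper, and the 5 folds are an assumption matching scikit-learn's default `cv`.

```python
from itertools import product


def parameter_grid(grid):
    """Yield every combination of a dict of candidate-value lists (exhaustive grid)."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    # Hypothetical grid built from the candidate values listed above.
    grid = {"n_estimators": [10, 50, 100], "max_depth": [2, 4, 6, 8, 10]}
    combos = list(parameter_grid(grid))
    n_folds = 5  # assumption: 5-fold cross-validation
    print(len(combos))            # 15 candidate settings
    print(len(combos) * n_folds)  # 75 model fits in total
    print(combos[0])              # {'max_depth': 2, 'n_estimators': 10}
```

Every extra value added to any list multiplies the number of fits, which is why it pays to fix some parameters by hand and search only over the ones that matter most.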

## 2) K-Nearest Neighbors (KNN)

**n_neighbors** – the number of nearest training points that vote on the class of a new sample. [2,5,8]

**n_jobs** – the number of jobs to run in parallel; it is usually set to -1, which uses all available processors.
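The idea behind KNN fits in a few lines. This is a brute-force, pure-Python sketch assuming Euclidean distance and an unweighted majority vote (the defaults of scikit-learn's `KNeighborsClassifier` are also Euclidean distance with uniform weights); `knn_predict` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, n_neighbors=5):
    """Classify `query` by majority vote among its n_neighbors closest training points."""
    # Euclidean distance from the query to every training point, paired with its label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:n_neighbors]]
    # Most common label among the neighbours (ties go to the label seen first).
    return Counter(nearest_labels).most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5), n_neighbors=3))  # a
print(knn_predict(train_X, train_y, (5.5, 5.5), n_neighbors=3))  # b
```

A small **n_neighbors** makes the vote sensitive to noisy points, while a large one blurs the boundary between classes, which is why the value is worth tuning.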

I kept the default values for the other parameters of this model because they gave the best results.

However, to find the optimal **n_neighbors**, I used GridSearchCV and obtained this graph:

According to the graph, the test score declines once **n_neighbors** exceeds *5*, so *5* is the optimal number of neighbors.

## 3) Multi-Layer Perceptron (MLP)

**alpha** – the strength of the L2 regularization (weight penalty) term. It is often mistaken for the learning rate, which in scikit-learn's `MLPClassifier` is a separate parameter, `learning_rate_init`. [0.01, 0.0001, 0.00001]

**hidden_layer_sizes** – a tuple whose i-th element is the number of neurons in the i-th hidden layer. [(50,50), (100,100,100), (750,750)]

**activation** – the non-linear function applied to each neuron's weighted input; without it, stacked layers would collapse into a single linear transformation. ['relu','tanh','logistic']

**solver** – also known as the optimizer; the algorithm used to update the network's weights during training. ['sgd','adam']

**batch_size** – the number of training samples (here, images) processed before each weight update. [200,100,200]
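For reference, the three **activation** options are simple functions applied to each hidden unit's weighted input. A minimal pure-Python sketch; `dense_neuron` is a hypothetical helper that only shows where the activation is applied.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values through, zeroes out negatives."""
    return max(0.0, z)


def tanh(z):
    """Hyperbolic tangent: squashes any input into the range (-1, 1)."""
    return math.tanh(z)


def logistic(z):
    """Logistic sigmoid: squashes any input into (0, 1), written to avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense_neuron(inputs, weights, bias, activation):
    """One hidden unit: weighted sum of its inputs plus a bias, then the non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

For example, `dense_neuron([1.0, 2.0], [0.5, 0.25], 0.0, relu)` returns `1.0`, and `logistic(0.0)` returns `0.5`.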

I chose 'relu' as the **activation** and 'adam' as the **solver** because these parameters give the best results.

However, to choose the **hidden_layer_sizes** and **alpha**, I used GridSearchCV.

As the table shows, the best result is obtained when **alpha** is *0.001* and **hidden_layer_sizes** is *(784,784)*, so I used those values.
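Because `alpha` and the learning rate are easy to confuse, it helps to see where each one enters training. In scikit-learn's `MLPClassifier`, `alpha` scales an L2 penalty on the weights, while the step size comes from `learning_rate_init`. A sketch of the objective and a plain gradient step, assuming the common $\frac{\alpha}{2n}$ scaling of the penalty ($n$ training samples, $\ell$ the per-sample loss, $W_l$ the weight matrix of layer $l$, $\eta$ the learning rate):

$$
\mathcal{L}(W) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f(x_i; W)\big) + \frac{\alpha}{2n}\sum_{l} \lVert W_l \rVert_2^2,
\qquad
W \leftarrow W - \eta\, \nabla_W \mathcal{L}(W)
$$

A larger $\alpha$ pulls the weights toward zero and reduces overfitting, whereas $\eta$ only changes how far each update moves; Adam adapts the per-weight step size on top of this basic rule.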

## 4) Convolutional Neural Networks (CNN)

**learning_rate** – the step size the optimizer uses when updating the weights, i.e. how far the network moves against the gradient at each update. [0.01, 0.0001, 0.00001]

**hidden_layer_sizes** – a tuple whose i-th element is the number of neurons in the i-th hidden layer. [(50,50),(100,100,100),(750,750)]

**activation** – the non-linear function applied to each neuron's weighted input; without it, stacked layers would collapse into a single linear transformation. ['relu','tanh','logistic']

**solver** – also known as the optimizer; the algorithm used to update the network's weights during training. ['sgd','adam']

**batch_size** – the number of training samples (here, images) processed before each weight update. [200,100,200]

**epochs** – the number of complete passes the model makes over the training data. [10,20,200]
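**batch_size** and **epochs** together determine how many weight updates training performs. A small pure-Python sketch of that arithmetic; the helper names are hypothetical, and real frameworks usually also shuffle the data at the start of every epoch.

```python
import math


def minibatches(samples, batch_size):
    """Split one epoch's samples into consecutive mini-batches (the last may be smaller)."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def updates_per_epoch(n_samples, batch_size):
    """Weight updates in one epoch: one per mini-batch."""
    return math.ceil(n_samples / batch_size)


def total_updates(n_samples, batch_size, epochs):
    """Weight updates over the whole training run."""
    return epochs * updates_per_epoch(n_samples, batch_size)


print([len(b) for b in minibatches(list(range(10)), 4)])  # [4, 4, 2]
print(total_updates(60000, 128, 20))                      # 9380
```

The 60,000-sample figure is hypothetical, not the size of the dataset used here. Smaller batches mean more (noisier) updates per epoch; more epochs mean more passes over the same data.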

I chose 'relu' as the activation function and 'adam' as the **solver** because these parameters usually give the best results. The network has 3 convolution layers, 2 max-pooling layers, and 3 dropout layers, followed by a final softmax activation. I did not use GridSearchCV here because the number of possible combinations is very large, and they would not change the results much.
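The final softmax turns the network's raw output scores into class probabilities that sum to 1, and the predicted class is the one with the highest probability. A minimal pure-Python sketch (not the Keras implementation):

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but prevents overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most likely class
```

For example, `softmax([0.0, 0.0])` returns `[0.5, 0.5]`, and `predicted_class` above is `0`.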

## 5) MobileNet

**input_shape** – a tuple giving the dimensions of an input image as (height, width, channels). [(32,32,1),(128,128,3)]

**alpha** – the width multiplier, which scales the number of filters in every layer: values below 1 shrink the network, values above 1 enlarge it, and 1 uses the default widths. [<1, >1, 1]

**activation** – the non-linear function applied to each neuron's weighted input; without it, stacked layers would collapse into a single linear transformation. ['relu','tanh','logistic']

**optimizer** – also known as the solver; the algorithm used to update the network's weights during training. ['sgd','adam']

**batch_size** – the number of training samples (here, images) processed before each weight update. [200,100,200]

**epochs** – the number of complete passes the model makes over the training data. [10,20,200]

**classes** – the number of classes to be classified. [2,4,10]

**loss** – the function used to measure the loss, i.e. the difference between the predicted and actual values. ['categorical_crossentropy','RMSE']
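The two **loss** options measure error differently. Categorical cross-entropy compares a one-hot label with predicted class probabilities, while RMSE treats the output as a vector of numbers. A pure-Python sketch of both for a single sample; the function names are illustrative, and the `eps` clipping is an assumption to avoid `log(0)`.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    # Clip each probability into [eps, 1] so log() is always defined.
    return -sum(t * math.log(min(max(p, eps), 1.0)) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [0, 1, 0]        # one-hot label: the sample belongs to class 1
y_pred = [0.1, 0.8, 0.1]  # softmax output of a hypothetical model
ce = categorical_crossentropy(y_true, y_pred)  # equals -log(0.8)
err = rmse(y_true, y_pred)
```

Cross-entropy only looks at the probability assigned to the true class and penalizes confident mistakes heavily, which is why it is the usual choice for softmax classifiers.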

First, I resized the 28×28 images to 140×140, because MobileNet requires inputs of at least 32×32; the final **input_shape** was therefore (140,140,1), where 1 is the number of channels (the images are grayscale). I set **alpha** to 1 because it usually gives the best results. The activation function was left at its default, 'relu'. I used 'Adadelta' as the **optimizer** because it gave the best results. **batch_size** was set to 128 to train the model faster, and I trained for 20 **epochs** for better accuracy. **classes** was set to 5 because there are 5 classes to classify.
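The resizing method is not stated above. As a hedged illustration only, nearest-neighbour upscaling by an integer factor (140 / 28 = 5) simply repeats each pixel, and adding a trailing axis gives the (height, width, 1) layout of **input_shape**. The helper names are hypothetical; in practice a library routine such as TensorFlow's `tf.image.resize` would normally be used.

```python
def upscale_nearest(image, factor):
    """Enlarge a 2-D grayscale image (list of rows) by repeating each pixel `factor` times
    horizontally and each row `factor` times vertically (nearest-neighbour resizing)."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def add_channel_axis(image):
    """Turn an H x W image into H x W x 1, i.e. (height, width, channels)."""
    return [[[pixel] for pixel in row] for row in image]


digit = [[0] * 28 for _ in range(28)]  # placeholder 28x28 grayscale image
resized = add_channel_axis(upscale_nearest(digit, 140 // 28))
print(len(resized), len(resized[0]), len(resized[0][0]))  # 140 140 1
```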