How to rapidly test dozens of deep learning models in Python

Let’s develop a neural network assembly line that allows us to easily experiment with numerous model configurations.

Assembly Line of Neural Networks (Source: all_is_magic on Shutterstock & Author)

Optimizing machine learning (ML) models is not an exact science. The best model architecture, optimization algorithm and hyperparameter settings depend on the data you’re working with. Thus, being able to quickly test several model configurations is imperative in maximizing productivity & driving progress in your ML project. In this article, we’ll create an easy-to-use interface which allows you to do this. We’re essentially going to build an assembly line for ML models.

Each model is governed by a set of hyperparameters. Let’s create some functions that generalize to these hyperparameters and build ad hoc models. Here are the primary hyperparameters that govern neural networks:

  • Number of hidden layers
  • Number of neurons per layer
  • Activation functions
  • Optimization algorithm
  • Learning rate
  • Regularization technique
  • Regularization hyperparameters

We can package all of these in a hash table:

model_info = {}
model_info['Hidden layers'] = [100] * 6
model_info['Input size'] = og_one_hot.shape[1] - 1
model_info['Activations'] = ['relu'] * 6
model_info['Optimization'] = 'adadelta'
model_info["Learning rate"] = .005
model_info["Batch size"] = 32
model_info["Preprocessing"] = 'Standard'
model_info["Lambda"] = 0
model_info['Regularization'] = 'l2'
model_info['Reg param'] = 0.0005

Before we begin experimenting with various model architectures, let’s quickly visualize the data to see what we’re working with (data). Although from my experience standard scaling seems to be the de facto preprocessing method, I visualized the data using a variety of preprocessing tactics. I used PCA and t-SNE to reduce the dimensionality of the data for each preprocessing method. Below are the data visualizations which appear to be the most separable:

Source: Author

We can then define a function that will construct & compile a neural network given a hyperparameter hash table:
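A minimal sketch of such a function might look like the following. It assumes tf.keras, a binary classification target (a single sigmoid output unit) and the keys used in the model_info hash table above. The helper layer_specs() and the optimizer lookup table are illustrative assumptions, not the original code.

```python
def layer_specs(model_info):
    """Pair each hidden layer width with its activation function."""
    widths = model_info['Hidden layers']
    activations = model_info['Activations']
    if len(widths) != len(activations):
        raise ValueError('Need exactly one activation per hidden layer')
    return list(zip(widths, activations))


def build_nn(model_info):
    """Construct and compile a feed-forward network from a hyperparameter dict."""
    # Imported lazily so layer_specs() works without TensorFlow installed.
    from tensorflow import keras

    # Optional L1/L2 weight penalty; any other value disables regularization.
    reg_name = model_info.get('Regularization')
    reg_param = model_info.get('Reg param', 0.0)
    if reg_name == 'l2':
        regularizer = keras.regularizers.l2(reg_param)
    elif reg_name == 'l1':
        regularizer = keras.regularizers.l1(reg_param)
    else:
        regularizer = None

    model = keras.Sequential()
    model.add(keras.Input(shape=(model_info['Input size'],)))
    for units, activation in layer_specs(model_info):
        model.add(keras.layers.Dense(units, activation=activation,
                                     kernel_regularizer=regularizer))
    # Assumption: binary classification, hence one sigmoid output unit.
    model.add(keras.layers.Dense(1, activation='sigmoid'))

    # Assumed lookup table; extend it with any optimizer you want to test.
    optimizers = {
        'adadelta': keras.optimizers.Adadelta,
        'adam': keras.optimizers.Adam,
        'sgd': keras.optimizers.SGD,
        'rmsprop': keras.optimizers.RMSprop,
    }
    optimizer = optimizers[model_info['Optimization']](
        learning_rate=model_info['Learning rate'])
    model.compile(optimizer=optimizer, loss='binary_crossentropy',
                  metrics=[keras.metrics.AUC(name='auc')])
    return model
```

Calling build_nn(model_info) with the hash table defined earlier would return a compiled six-layer network ready for model.fit().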

We can quickly test a few baseline models now that we have a fast, flexible way of constructing and compiling neural networks. This allows us to draw quick inferences about what hyperparameters seem to be working best:

Using the function above, I evaluated over a dozen model architectures with 5-fold cross validation and found that deeper and wider architectures are necessary to obtain high performance on this data. This is most likely due to the highly non-linear structure of the data.

Aside: If you’re not familiar with k-fold cross validation, it’s a model evaluation technique that involves divvying up the data into k disjoint partitions (folds). One fold is used as the test set and the remaining k-1 folds as the training set. We then iterate through the folds so that every partition has a turn being the test set, and average the k scores. Performing k-fold cross validation allows us to obtain a robust assessment of the model’s performance.
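To make the partitioning concrete, here is a small standard-library sketch of how the folds can be generated. The function name k_fold_indices() and its parameters are illustrative assumptions, not part of the original code.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError('k must be between 2 and n_samples')
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Every fold except fold i goes into the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

In practice, scikit-learn’s KFold or StratifiedKFold classes do the same job and are usually the better choice.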

Source: Author

Although k-fold cross validation is a great way of assessing a model’s performance, it’s computationally expensive. While optimizing hyperparameters, we can instead split the data into a single training and test set to get faster, rougher estimates. We save our model after each epoch so we can retrieve it after training if need be. We also use the Tensorboard callback so we can examine how the model was trained:
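The checkpointing and logging setup could be sketched roughly as follows, assuming tf.keras. The helper names run_paths() and make_callbacks(), the 'runs' directory and the file naming scheme are all hypothetical.

```python
import os


def run_paths(model_name, root='runs'):
    """Return (checkpoint_path, log_dir) for one training run."""
    run_dir = os.path.join(root, model_name)
    # Keras fills in {epoch:02d}, giving one saved file per epoch.
    checkpoint_path = os.path.join(run_dir, 'epoch_{epoch:02d}.h5')
    log_dir = os.path.join(run_dir, 'logs')
    return checkpoint_path, log_dir


def make_callbacks(model_name, root='runs'):
    """Build the checkpoint and Tensorboard callbacks for model.fit()."""
    # Imported lazily so run_paths() works without TensorFlow installed.
    from tensorflow import keras

    checkpoint_path, log_dir = run_paths(model_name, root)
    os.makedirs(log_dir, exist_ok=True)
    return [
        # By default ModelCheckpoint saves the full model after every epoch.
        keras.callbacks.ModelCheckpoint(checkpoint_path),
        keras.callbacks.TensorBoard(log_dir=log_dir),
    ]
```

The returned list would be passed as model.fit(..., callbacks=make_callbacks('model_0')), and the logs can then be inspected with tensorboard --logdir runs/model_0/logs.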

We can then obtain a more robust performance assessment once we have gained some insights as to what hyperparameter settings are working well.

Grid search is not the go-to method for hyperparameter optimization in industry. Rather, a method referred to as the coarse-to-fine approach is more frequently employed. In the coarse-to-fine method, we start with a broad range of hyperparameters, then home in on the parameter settings that work best. We then randomly sample hyperparameter settings from the narrow range of values we want to experiment with. We can rapidly iterate over numerous model configurations now that we have a way of dynamically instantiating deep neural networks:

Aside: When calling the Tensorboard log directory from your terminal you CANNOT have spaces in the file path. On Windows, spaces in the log directory preclude Tensorboard from loading the data properly.
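One simple way to sidestep this is to strip whitespace from model names before using them in a log directory. The helper below is a hypothetical sketch, not part of the original code.

```python
import re


def safe_log_name(name):
    """Replace runs of whitespace with underscores for use in a log path."""
    return re.sub(r'\s+', '_', name.strip())
```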

The code above will also save important metrics (e.g. the area under the ROC curve) for each model into a CSV file so we can easily compare and contrast what hyperparameters lead to variations in performance.
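A minimal sketch of that bookkeeping step, using only the standard library, is shown here. The function name log_metrics() and the column layout are assumptions. The metric values themselves would come from your own evaluation code, for example scikit-learn’s roc_auc_score.

```python
import csv
import os


def log_metrics(csv_path, model_name, model_info, metrics):
    """Append one row of hyperparameters and metrics to a CSV file."""
    row = {'Model': model_name}
    # Lists such as the layer widths are stored as strings so each fits in one cell.
    row.update({key: str(value) for key, value in model_info.items()})
    row.update(metrics)
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

This sketch assumes every call passes the same hyperparameter and metric keys, so that all rows line up under the header written by the first call.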

Once we have a better idea of what hyperparameter values work well, we can begin to optimize the model within this range of values. The following function generates a randomized neural network. We can then use this function to experiment with various randomized hyperparameter settings within the range of values we have narrowed down:
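A function of this kind could be sketched as below. The function name random_model_info(), the default ranges and the candidate optimizers, activations and batch sizes are all hypothetical placeholders. You would substitute the ranges you narrowed down during the coarse search.

```python
import math
import random


def random_model_info(input_size, layer_range=(3, 8), width_range=(50, 150),
                      lr_range=(1e-4, 1e-2), seed=None):
    """Sample one hyperparameter hash table from the given (narrowed) ranges."""
    rng = random.Random(seed)
    n_layers = rng.randint(*layer_range)
    width = rng.randint(*width_range)
    # Sample the learning rate log-uniformly so that each order of
    # magnitude in the range is equally likely.
    log_lr = rng.uniform(math.log10(lr_range[0]), math.log10(lr_range[1]))
    return {
        'Hidden layers': [width] * n_layers,
        'Input size': input_size,
        'Activations': [rng.choice(['relu', 'tanh'])] * n_layers,
        'Optimization': rng.choice(['adadelta', 'adam', 'rmsprop']),
        'Learning rate': 10 ** log_lr,
        'Batch size': rng.choice([32, 64, 128]),
        'Regularization': 'l2',
        'Reg param': 10 ** rng.uniform(-5, -3),
    }
```

Each sampled hash table has the same keys as model_info, so it can be fed to the same model-building function. Passing a fixed seed makes a given configuration reproducible.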

In this article, we learned how to quickly experiment with numerous model architectures and hyperparameter settings. As always, constructive criticism is appreciated. If you liked the article or learned something new, please feel free to follow me on Medium, leave a clap, or shoot me a message. Thanks again!

Source code: here


I have discovered a bug in the code that may be caused by the Tensorboard Keras callback or the build_nn() function. It’s not a major issue, but I want to bring it to your attention. The problem is that multiple graphs are written to the Tensorboard log files when testing a list of neural nets. For instance, when we run the “Model experimentation” code above, we get this graph visualization for model_1:

Erroneous Graph (Source: Author)

As you can see, there are two distinct graphs here: one corresponding to model_0 (left) and one for model_1 (right). This doesn’t happen for model_0’s graph because it was trained prior to model_1:

Visualizing model_0 Architecture via Tensorboard (Source: Author)

However, if we load model_1 subsequent to training, we can see the architecture is correct for the hash table we passed to build_nn():

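from keras.models import load_model  # assumed import; FILE_PATH is the saved model_1 file

print("Model 1")
saved_model = load_model(FILE_PATH)
saved_model.summary()  # prints the layer table shown below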
Model 1
Layer (type)                 Output Shape     Param #
dense_8 (Dense)              (None, 110)      10120
activation_8 (Activation)    (None, 110)      0
dense_9 (Dense)              (None, 110)      12210
activation_9 (Activation)    (None, 110)      0
dense_10 (Dense)             (None, 110)      12210
activation_10 (Activation)   (None, 110)      0
dense_11 (Dense)             (None, 1)        111
activation_11 (Activation)   (None, 1)        0
Total params: 34,651
Trainable params: 34,651
Non-trainable params: 0

This suggests there are no underlying issues in using this code to obtain accurate assessments, but it prevents us from visualizing models in the pipeline. If you have any insights or suggestions, or know why this is occurring, please feel free to make changes to the code or let me know! I would like to continue looking into this, but am unable to do so as a full-time student with three part-time jobs. Thanks again for reading!

Source: Deep Learning on Medium