Dual-input CNN with Keras

Source: Deep Learning on Medium

This post details my solution for Microsoft’s Artificial Intelligence Professional Program Capstone Project, hosted by DrivenData as a data science competition.

The Microsoft Professional Program for Artificial Intelligence consists of 9 courses followed by a capstone project. You learn Python, Math, Ethics, Data Analysis, Azure Machine Learning, Computer Vision, Natural Language Processing, Speech Recognition and CNTK (Microsoft’s Cognitive Toolkit library).

The courses are hosted by edX, and you need to pass each one of them to advance to the capstone project. Passing a course on edX means passing the assessments (min 70% score), paying for the course and getting a certificate at the end. The price is $99 per course / $990 for the entire program (check the program page as the price might change). For the 9th course you choose from three options, Computer Vision, NLP and Speech Recognition (if possible, do them all).

Besides the edX certificates, you are awarded a certificate from Microsoft on successful completion of the program.

It took me ~250 hours to complete the whole program. It ran from April to October, mainly because the Capstone project is available only at the start of each quarter and I missed the train in July. At the same time, I also finished the Data Scientist with Python track from DataCamp because I felt I needed to go a little deeper into Python’s libraries for data science. Looking back, it was an excellent complement to the Microsoft track.

The capstone challenge consists of using standard AI tools to identify 11 different types of appliances from their electric signatures, quantified by current and voltage measurements.

The dataset contains current and voltage measurements sampled at 30 kHz from 11 different appliance types that are present in an average household. For each appliance, plug load measurements were post-processed to extract a two-second-long window of measurements of current and voltage. For some observations, the window contains both the transient startup state (turning the appliance on) as well as the steady-state operation (once the appliance is running). For others, the window only contains the steady-state operation. The observations were then transformed into two Mel spectrograms, one for current, and one for voltage.

Current and voltage spectrograms for a Hairdryer

In the dataset the label values represent the following appliances:

  • 0: Compact Fluorescent Lamp
  • 1: Hairdryer
  • 2: Microwave
  • 3: Air Conditioner
  • 4: Fridge
  • 5: Laptop
  • 6: Vacuum
  • 7: Incandescent Light Bulb
  • 8: Fan
  • 9: Washing Machine
  • 10: Heater

There are 988 spectrogram pairs in the training set and 659 in the test set.

For a more detailed description of the problem, see the competition page here.

For the implementation, I decided to use the Keras deep-learning library with TensorFlow as a back-end. I also chose to run it online, inside Google Colaboratory, a free Jupyter notebook environment that requires no setup and provides GPU access.

I also wanted to keep this simple and be able to experiment by changing things on the fly without too much hassle, and Jupyter notebooks are perfect for that.

Download and extract data

I’m downloading and extracting the data using wget and unzip, but it can be done in Python as well if you feel like coding it. There was no need to do it in this case, and I like this two-line approach.

!wget https://mpp0xc0ae45ef.blob.core.windows.net/drivendata-mpp-storage/data/10/public/data-release.zip
!unzip -o data-release.zip
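
For anyone who prefers to stay in Python, the same two steps could look roughly like the sketch below. It uses only the standard library; the function name download_and_extract and its dest_dir parameter are my own and not part of the original notebook.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir="."):
    """Download a zip archive from `url` and unpack it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]          # e.g. data-release.zip
    urllib.request.urlretrieve(url, str(archive))    # plays the role of wget
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)                          # plays the role of unzip -o
    return archive


# Usage (commented out so nothing is downloaded by accident):
# download_and_extract(
#     "https://mpp0xc0ae45ef.blob.core.windows.net/drivendata-mpp-storage/data/10/public/data-release.zip"
# )
```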

Import dependencies

I’m using the following libraries:

  • Pandas & NumPy — for data processing
  • scikit-image — for reading the spectrogram images
  • matplotlib — for data visualization
  • scikit-learn — for splitting the data set into training and validation sets
  • Keras — to create and train a Deep Neural Network (DNN) used to predict the appliance based on the current and voltage spectrograms

Read the data

Here I use Pandas’ read_csv function to read the train_labels.csv file into the DataFrame train_df. I also create two additional columns that map each id to its current and voltage spectrogram files.
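
A minimal sketch of this step is shown below. The column names (id, appliance) and the file naming scheme are assumptions made for illustration, and a small in-memory CSV stands in for the real train_labels.csv so that the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny stand-in for train_labels.csv; the column names are assumed.
csv_text = "id,appliance\n1000,8\n1001,1\n1002,4\n"
train_df = pd.read_csv(io.StringIO(csv_text))  # in the notebook: pd.read_csv("train_labels.csv")

# Hypothetical naming scheme: <id>_c.png for current, <id>_v.png for voltage.
train_df["current_file"] = train_df["id"].map(lambda i: f"train/{i}_c.png")
train_df["voltage_file"] = train_df["id"].map(lambda i: f"train/{i}_v.png")
print(train_df.head())
```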


Before going forward, I needed to create some helper values:

  • as_gray — I experimented with the spectrograms as RGB or Gray images
  • in_channels — How many channels the input image has: for RGB skimage gives me 4 channels and for Gray 1
  • img_rows, img_cols — the images are 128 pixels tall (height) and 118 pixels wide (width)
  • num_classes — the number of appliance types we have to predict (11)
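
Written out, the helper values listed above might look like this. The choice of as_gray = True is an assumption, since both variants were tried, and input_shape is an extra convenience name that the list does not mention.

```python
as_gray = True                       # read spectrograms as grayscale (assumed default)
in_channels = 1 if as_gray else 4    # skimage returned 4 channels for the colour images
img_rows, img_cols = 128, 118        # image height and width in pixels
num_classes = 11                     # appliance types, labels 0 to 10

# Convenience tuple (my own addition): (rows, columns, channels)
input_shape = (img_rows, img_cols, in_channels)
print(input_shape)
```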


The next two variables are hyper-parameters, because changing their values affects the time it takes to train the model, the memory requirements and even the accuracy of the model:

  • batch_size — defines the number of samples propagated through the network before each weight update. Too low and the estimate of the gradient becomes less accurate; too high and training the network requires more memory. It is a trade-off between accuracy and speed.
  • epochs — an epoch is a single pass through the entire training set while training the network.
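
To make the relationship between the two values concrete, the short calculation below counts the weight updates they imply. The values 32 and 100 are assumptions, picked from the ranges explored later in this post, and 691 is roughly 70% of the 988 labelled pairs.

```python
import math

batch_size = 32      # assumed value
epochs = 100         # assumed value

n_train = 691        # about 70% of the 988 labelled spectrogram pairs
steps_per_epoch = math.ceil(n_train / batch_size)   # weight updates in one epoch
total_updates = steps_per_epoch * epochs            # weight updates over the whole run
print(steps_per_epoch, total_updates)               # 22 2200
```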

Process current and voltage files

Next, we read the images and process them using the read_spectrograms helper function, then convert the labels from a class vector (integers) to a binary class matrix (known as one-hot encoding) so that they play well with our loss function (see below).
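
The label conversion can be illustrated with plain NumPy, as in the sketch below. The sample labels are made up; Keras also ships a to_categorical utility that produces the same kind of matrix.

```python
import numpy as np

num_classes = 11
labels = np.array([8, 1, 4, 10])     # integer class vector (made-up sample)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)                 # (4, 11)
print(one_hot[0])                    # a single 1.0 at index 8
```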

Helper functions

This function reads the spectrogram images from disk, as grayscale or RGB, converts them to NumPy arrays and then normalizes the pixel values (from 0–255) to be between 0 and 1. Before returning the result, the arrays are also reshaped to match what Keras (with TensorFlow as back-end) expects:

(samples, rows, columns, channels)
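
The normalization and reshaping part of such a helper could be sketched as below. This is not the original read_spectrograms function: reading from disk is left out, randomly generated 8-bit arrays stand in for the images, and the name to_model_input is hypothetical.

```python
import numpy as np


def to_model_input(images, in_channels):
    """Scale 0-255 pixel values to 0-1 and reshape to (samples, rows, columns, channels)."""
    batch = np.stack(images).astype("float32") / 255.0
    samples, rows, cols = batch.shape[:3]
    return batch.reshape(samples, rows, cols, in_channels)


# Two fake 128x118 grayscale "spectrograms" instead of real files.
rng = np.random.default_rng(0)
fake = [rng.integers(0, 256, size=(128, 118), dtype=np.uint8) for _ in range(2)]
x = to_model_input(fake, in_channels=1)
print(x.shape, x.dtype)
```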

Show random appliance

To check that everything is loaded correctly, I’m displaying a random sample from the training set:

Current and voltage spectrograms for a Fan

Split into training and validation sets

I’m splitting the data set into training (70%) and validation (30%) samples. The first group is used to fit the model and the second to check it during training.

Because I have 2 inputs (current and voltage), and thus 2 sets of training data, I needed to stack them so that the train_test_split function keeps each pair of images together. The stacking happens on the 4th axis and, after the split, I un-stack the results:
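
A sketch of the stack, split and un-stack idea follows. The array sizes are small and made up, the variable names are my own, and the voltage array is derived from the current array only so that the pairing is easy to verify afterwards.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n, rows, cols, ch = 20, 8, 6, 1              # small made-up sizes
x_current = np.random.rand(n, rows, cols, ch)
x_voltage = x_current + 1.0                  # fake "voltage", tied to current for checking
y = np.eye(11)[np.random.randint(0, 11, size=n)]

# Join the two inputs along the channel axis (index 3, i.e. the 4th axis).
stacked = np.concatenate([x_current, x_voltage], axis=3)

x_train, x_val, y_train, y_val = train_test_split(
    stacked, y, test_size=0.3, random_state=42)

# Slice the channels apart again to recover the two inputs.
x_train_c, x_train_v = x_train[..., :ch], x_train[..., ch:]
x_val_c, x_val_v = x_val[..., :ch], x_val[..., ch:]

print(x_train_c.shape, x_val_c.shape)
print(np.allclose(x_train_v, x_train_c + 1.0))   # pairs stayed together
```

As an alternative, train_test_split accepts any number of arrays and splits them with the same indices, so passing the current array, the voltage array and the labels in one call would also keep the pairs aligned.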

Create the model

For the network design, I got some inspiration from established networks like VGGNet, ResNet or Inception and adapted them for the small dataset that you get for this challenge.

Each spectrogram has its own convolution layers to extract features, which are then concatenated and fed to the fully connected layers below.

The network looks like this:

For each input, I have 3 convolution blocks for feature extraction, each made from the following parts:

  1. The first hidden layer is a convolutional layer called Convolution2D. The layer has 32 filters, a kernel of size 3×3 and a LeakyReLU activation function (the leaky version of the Rectified Linear Unit, ReLU, which allows a small gradient when the unit is not active). This layer is the input layer, expecting images with the shape outlined above.
  2. Next, a pooling layer that takes the max called MaxPooling2D. It is configured with a pool size of 2×2 (it halves the input in both spatial dimensions).
  3. The next layer is a regularization layer using dropout called Dropout. It is configured to randomly exclude 25% of neurons in the layer to reduce over-fitting.

The layers above are replicated two more times, increasing the number of filters to 64 and 128 (to adapt to more complex features) and setting the dropout to 25% and 40% (to prevent over-fitting).

The output of the convolution layers for both inputs goes through a concatenation layer, and the result is converted from a multi-dimensional feature map to a vector by the Flatten layer. This allows standard fully connected layers to process the output.

Next, the output from the Flatten layer is processed by a fully connected layer with 512 neurons and a leaky rectifier activation function, followed by another Dropout layer configured to randomly exclude 50% of neurons.

Finally, the output layer has 11 neurons for the 11 appliances and a softmax activation function to output probability-like predictions for each class.

The network diagram translates into this code (using Keras’ functional API):

For the optimization algorithm, I picked Adam (Adaptive Moment Estimation) and for the loss function, categorical_crossentropy (the reason we had to one-hot encode the labels). The optimization algorithm minimizes the loss function.

…and the helper function that creates the convolution layers:
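
Putting the description above together, a sketch of the network with Keras’ functional API could look like the following. It is a reconstruction from the text, not the original notebook code: the function names, the padding="same" setting and the default LeakyReLU slope are assumptions, and the imports assume the Keras bundled with TensorFlow.

```python
from tensorflow import keras
from tensorflow.keras import layers


def conv_block(x, filters, dropout):
    """Conv2D + LeakyReLU, then 2x2 max pooling, then dropout."""
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)  # padding is an assumption
    x = layers.LeakyReLU()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    return layers.Dropout(dropout)(x)


def conv_branch(inputs):
    """Three conv blocks: 32, 64, 128 filters with 25%, 25%, 40% dropout."""
    x = inputs
    for filters, dropout in [(32, 0.25), (64, 0.25), (128, 0.40)]:
        x = conv_block(x, filters, dropout)
    return x


def build_model(input_shape=(128, 118, 1), num_classes=11):
    # One input per spectrogram, each with its own convolution branch.
    current_in = keras.Input(shape=input_shape, name="current")
    voltage_in = keras.Input(shape=input_shape, name="voltage")

    merged = layers.Concatenate()([conv_branch(current_in), conv_branch(voltage_in)])
    x = layers.Flatten()(merged)
    x = layers.Dense(512)(x)
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs=[current_in, voltage_in], outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
model.summary()
```

Because the model has two inputs, fit and predict expect a list of two arrays in the same order as the inputs, current first and voltage second.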

Train the model

For training, I use a callback called ModelCheckpoint that saves the best weights, and then I call the fit function of the model object.


Theoretically I should’ve put aside some data for the model evaluation, but in this case, there was the competition’s test dataset for that. The code below loads the best weights and evaluates the model on the validation set (the result is our best accuracy during the training).
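
The training and evaluation steps can be sketched as follows. To keep the snippet runnable on its own, it uses a tiny single-input stand-in model and random data rather than the real network; the weights file name, the monitored metric (val_loss) and the batch size and epoch count are all assumptions.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in model and random data, only to keep the sketch runnable.
inp = keras.Input(shape=(8, 6, 1))
out = layers.Dense(11, activation="softmax")(layers.Flatten()(inp))
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

x_train, x_val = np.random.rand(32, 8, 6, 1), np.random.rand(16, 8, 6, 1)
y_train = np.eye(11)[np.random.randint(0, 11, 32)]
y_val = np.eye(11)[np.random.randint(0, 11, 16)]

# Save only the weights of the epoch with the best monitored value.
weights_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")  # hypothetical name
checkpoint = keras.callbacks.ModelCheckpoint(
    weights_path, monitor="val_loss", save_best_only=True, save_weights_only=True)

model.fit(x_train, y_train, batch_size=16, epochs=3, verbose=0,
          validation_data=(x_val, y_val), callbacks=[checkpoint])

model.load_weights(weights_path)             # restore the best epoch
val_loss, val_acc = model.evaluate(x_val, y_val, verbose=0)
print(f"validation accuracy: {val_acc:.3f}")
```

With the dual-input network, the x arguments would be lists of two arrays (current and voltage) instead of single arrays.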


After training the model, I run it against the competition test set which I have to load and process first:

followed by calling the predict function on the model and preparing the submission file:
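
Turning predictions into a submission table might look like the sketch below. A hand-written probability matrix with 3 classes stands in for the output of the model’s predict function (the real one has 11 columns), and the ids and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the output of model.predict on the test spectrograms:
# one row of class probabilities per test sample.
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
])
test_ids = [2000, 2001]                      # made-up ids

predicted = probabilities.argmax(axis=1)     # most likely class per row
submission = pd.DataFrame({"id": test_ids, "appliance": predicted})  # assumed columns

# In the notebook this would be written to disk, e.g. submission.to_csv(path, index=False).
print(submission.to_csv(index=False))
```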

After I passed the exam and then got 100% of the points on it (finishing 25/265), I decided to explore the Keras library and try different network architectures and hyperparameter values:

  • batch_size — 16, 32, 64
  • epochs — 50, 100, 200, 500
  • optimization algorithm — Adam, SGD, Adagrad, RMSProp, Nadam (Adam with Nesterov momentum)
  • learning rate — for this I used the ReduceLROnPlateau callback from Keras reducing the learning rate by a factor of 2–10 once learning stagnates.
  • activation function — ReLU, ELU, LeakyReLU
  • number of neurons in the fully connected layer — 256, 512, 1024
  • dropout of the last layer — 40%, 50%, 60%.
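
For the learning rate item in the list above, a ReduceLROnPlateau callback can be configured roughly as below. Only the factor range comes from the text (dividing by 2 to 10 corresponds to factor values from 0.5 down to 0.1); the monitored metric, patience and minimum learning rate are assumptions.

```python
from tensorflow import keras

reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # metric to watch (assumed)
    factor=0.5,          # new_lr = lr * factor, so this halves the rate
    patience=10,         # epochs without improvement before reducing (assumed)
    min_lr=1e-6,         # lower bound for the learning rate (assumed)
)

# Passed to training next to any other callbacks:
# model.fit(..., callbacks=[reduce_lr])
```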

The full Jupyter notebook can be found on my GitHub account.

In a future post, I will show how to use hyperparameter optimization to find the best combination for the model.