NUS Makerthon 2020: Creating an emotion detecting AI toy using Raspberry Pi and Machine Learning

Original article was published on Artificial Intelligence on Medium

NUS Makerthon 2020

Beyond learning within school curriculum, my group of friends and I always believed that we had to use our knowledge to make a positive impact on society. When my team found out about NUS Makerthon 2020, we knew that we had to sign up right away.

NUS Makerthon is a competition jointly organised by the School of Computing, School of Design and Environment, and supported by NUS Enterprise. The theme for this year’s competition: Designing for Social Connection.

Our inspiration

For this project, my team focused on one of the most common mental illnesses in Singapore: depression. While there are measures in place to support people with depression, such measures are insufficient.

Fig 1. An increasing number of people have experienced a mental disorder in their lifetime. The article revealed that depression was the most common disorder, with 1 in 16 people in Singapore having the condition at some point in their lives.

Our Product — Litmus Box

We came up with *Litmus Box*, an emotion-detecting AI toy that aims to help people with depression regain and improve their social lives by

  • providing emotional support to the user and
  • enabling psychotherapists to provide more effective treatment

After weeks of intense brainstorming and coding, I am proud to say that my team came in 2nd in the competition, winning a prize of S$3,000.

How it works

Fig 2: Prototype of Litmus Box

Litmus Box is a video-journaling toy with AI technology, and it comes in different characters. For this competition, my team chose to work with a toy that everyone is familiar with: Elmo. The inbuilt microphone and speaker allow for the recording of video journals, while the small touchscreen attached to the front allows users to operate the toy.

Users can articulate their thoughts to a soft toy through video journaling. Using artificial intelligence, Litmus Box is able to detect the emotions experienced by the user during the video.

At the end of the video journal, the toy returns a quote to the user based on the dominant emotion experienced. If the user was feeling happy, for example, the quote encourages the user to continue staying cheerful. As such, Litmus Box serves as an emotional buddy for users, standing in for a friend or family member who might not always be available.

Fig 3: Example of the quote displayed on screen when anger is the dominant emotion detected

Litmus Box also collects and compiles the data generated from each video journal. During psychotherapy sessions, users can choose to present this data to their psychotherapists. Using data analytics and visualization tools, the data collected from the video journals is presented to psychotherapists in a clear and analytical way. As psychotherapists are not always available to talk to their patients, this tool gives them a way of staying updated on their patients' therapy progress.

Technology used


Litmus Box is built on a Raspberry Pi 3 Model B hidden in the body of the soft toy. We managed to obtain a microphone from a tech shop, and the microphone stand serves as the support for the soft toy. The Raspberry Pi camera was obtained from an online shop and installed in the soft toy's eye.

User Interface

Upon configuring the microphone and camera to work on the Raspberry Pi, we next created a user interface to allow recording of video journals. Given the small screen on the soft toy, we wanted to keep the user interface simple and easy to use. We built it with Tkinter, Python's de facto standard GUI package.

The win boolean simply tracks the state of the application, while the two buttons are wrappers that call our start_recording() and stop_recording() functions, which start and stop the video recording accordingly.
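The original code listing was not preserved here, so the sketch below is a hypothetical reconstruction of the interface described above: a boolean recording flag plus two Tkinter buttons wrapping start/stop functions. The class and function names (RecorderState, build_gui) are illustrative, not the project's actual identifiers.

```python
class RecorderState:
    """Tracks whether a video journal recording is in progress (the 'win' flag)."""

    def __init__(self):
        self.win = False  # True while a recording is running

    def start_recording(self):
        if not self.win:
            self.win = True
            return True   # camera/audio capture would begin here
        return False      # ignore a second press while already recording

    def stop_recording(self):
        if self.win:
            self.win = False
            return True   # capture would stop and emotion analysis would start
        return False


def build_gui(state):
    """Wire the two buttons to the state object; kept minimal for the small screen."""
    import tkinter as tk  # imported here so the state logic above runs headless
    root = tk.Tk()
    root.title("Litmus Box")
    tk.Button(root, text="Start", command=state.start_recording).pack(side=tk.LEFT)
    tk.Button(root, text="Stop", command=state.stop_recording).pack(side=tk.LEFT)
    return root
```

On the device, `build_gui(RecorderState())` followed by `root.mainloop()` would run the interface on the attached touchscreen.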

Essentially, the GUI allows users to: 1) Start and stop a recording 2) View relevant emotion quotes

Video Recording

The start-recording function captures each frame from the PiCamera while also recording audio from the microphone in chunks using the PyAudio library.
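A rough sketch of that capture loop follows. On the real device the frames would come from `picamera` and the chunks from a `pyaudio` stream; here both are plain iterables so the bookkeeping (one frame store, one audio-chunk store) can be shown without hardware. The function name and the `chunks_per_frame` parameter are assumptions for illustration.

```python
def record_journal(frame_source, audio_source, chunks_per_frame=2):
    """Interleave video frames and fixed-size audio chunks into two stores.

    frame_source: iterable of camera frames (picamera on the device)
    audio_source: iterable of raw audio chunks (pyaudio stream.read on the device)
    """
    video_frame_store = []   # one entry per camera frame
    audio_chunk_store = []   # raw microphone chunks
    audio_iter = iter(audio_source)
    for frame in frame_source:
        video_frame_store.append(frame)
        # read a few audio chunks per frame to keep the two streams roughly aligned
        for _ in range(chunks_per_frame):
            chunk = next(audio_iter, None)
            if chunk is not None:
                audio_chunk_store.append(chunk)
    return video_frame_store, audio_chunk_store
```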

Thereafter, when the recording is stopped, each frame in the videoFrameStore is processed. Each frame is first converted to grayscale, and the location of the face is found using the facecasc.detectMultiScale method. We then send the cropped face into the model for prediction and simply store the most prominent prediction for that frame in a dictionary.

At the end of the program, we can then tally the total count of each emotion expressed across all the frames.
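The post-recording pass described above can be sketched as below. On the Pi, the grayscale conversion is `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)`, the face boxes come from `facecasc.detectMultiScale(gray)`, and `predict` is the CNN's top emotion label; they are passed in as plain callables here so the tallying logic stands alone. The function name is hypothetical.

```python
from collections import Counter

def tally_emotions(video_frame_store, detect_faces, predict):
    """Count the dominant emotion predicted for each detected face.

    video_frame_store: grayscale frames (2D arrays / nested lists)
    detect_faces: returns (x, y, w, h) boxes, like facecasc.detectMultiScale
    predict: returns the model's top emotion label for a cropped face
    """
    emotion_counts = Counter()
    for gray in video_frame_store:
        for (x, y, w, h) in detect_faces(gray):
            face = [row[x:x + w] for row in gray[y:y + h]]  # crop the face region
            emotion_counts[predict(face)] += 1              # keep the top prediction
    return emotion_counts
```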

With a better processor and camera, one could perhaps achieve higher frame rates and perform the same prediction and processing more quickly.

Emotion Detection Algorithm

Our model was trained on FER-2013, a large dataset of 35,887 grayscale, 48×48 face images labelled with seven emotions: angry, disgusted, fearful, happy, neutral, sad and surprised. As our project focused on only five emotions, we grouped some of the emotions together. We first used a Haar cascade to detect faces within each frame of the video journal, then resized each face to a 48×48 image. Using a convolutional neural network, we obtain a score for each of the five emotions, and the emotion assigned to each frame is the one with the highest score. By reducing the number of emotions to five, our model was able to achieve 70% prediction accuracy.
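The article does not state exactly which labels were merged, so the grouping below is a hypothetical illustration of the seven-to-five step: the CNN's seven per-label scores are summed within each group and the highest-scoring group wins. The specific merges (disgusted into angry, fearful into sad) are assumptions.

```python
# FER-2013's seven output labels, in the order the model scores them
FER_LABELS = ["angry", "disgusted", "fearful", "happy", "neutral", "sad", "surprised"]

# Hypothetical seven-to-five grouping; the article only says some were merged
GROUPING = {
    "angry": "angry", "disgusted": "angry",
    "fearful": "sad", "sad": "sad",
    "happy": "happy", "neutral": "neutral", "surprised": "surprised",
}

def dominant_emotion(scores):
    """Collapse per-label CNN scores into the five grouped emotions and argmax."""
    grouped = {}
    for label, score in zip(FER_LABELS, scores):
        g = GROUPING[label]
        grouped[g] = grouped.get(g, 0.0) + score
    return max(grouped, key=grouped.get)
```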

Fig 4: Emotion Detection Algorithm detecting a happy and a sad face

Data Visualization

To illustrate the change in the user's emotions over the course of the video journal, we used the animation and pyplot modules from the matplotlib Python library to create a bar chart race animation. Other matplotlib features were also used for data visualization, allowing psychotherapists to track their patients' progress more easily.

A bar chart race animation illustrates the emotions the user experienced throughout the vlog
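A minimal sketch of such a bar chart race follows, assuming the per-frame emotion counts from the tallying step as input. The function names and data layout (a list of per-frame count dicts) are assumptions; the real project's plotting code was not preserved here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display attached to the Pi is needed
import matplotlib.pyplot as plt
from matplotlib import animation

EMOTIONS = ["angry", "happy", "neutral", "sad", "surprised"]

def cumulative_totals(counts_per_frame, emotions=EMOTIONS):
    """Running total of each emotion after every video frame."""
    totals = {e: 0 for e in emotions}
    history = []
    for counts in counts_per_frame:
        for e, c in counts.items():
            totals[e] += c
        history.append(dict(totals))
    return history

def bar_chart_race(counts_per_frame):
    """Animate the running totals as a sorted horizontal bar chart."""
    history = cumulative_totals(counts_per_frame)
    fig, ax = plt.subplots()

    def update(i):
        ax.clear()
        # sort so the leading emotion rises to the top, bar-chart-race style
        order = sorted(EMOTIONS, key=lambda e: history[i][e])
        ax.barh(order, [history[i][e] for e in order])
        ax.set_xlabel("Cumulative detections")

    return animation.FuncAnimation(fig, update, frames=len(history), repeat=False)
```

The returned FuncAnimation can then be saved as a video or GIF for the psychotherapist's dashboard.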

Moving forward

While we are proud of our project, we acknowledge that there are certainly areas in which we can improve our product.

  1. Data collected from video journals could be uploaded to an online database immediately, enabling psychotherapists to detect potential relapses at any point in time and check in on their patients.
  2. The emotion detection algorithm achieves 70% accuracy, which can certainly be improved. Litmus Box could include a calibration feature to learn the facial features of each user. Natural language processing could also be applied to detect the user's emotions from the audio recording.
  3. The audio and video files are not synchronised due to differences between the frame rate and the audio recording (chunk) rate. We hope to improve on this.

