Unlocking the Mysteries of the Brain With AutoML

The original article was published by Frederik Bussler on Artificial Intelligence on Medium.

Indeed, delta brainwave activity is much higher when students are confused. The next most important frequency band is Gamma-2. Research suggests that gamma waves are associated with greater focus, especially in people who meditate.

Another important frequency band is Theta, which is commonly associated with slow thinking. Following fast-paced learning videos calls for more active brainwaves, so it fits that high theta is associated with greater confusion: confused students have a median theta value of around 115K, while non-confused students have a median theta value of around 55K.
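
A per-class median like the one above is a one-line group-by in pandas. The sketch below uses a tiny made-up table; the column names `Theta` and `confused` are assumptions for illustration, not necessarily the Kaggle dataset's exact headers.

```python
import pandas as pd

# Toy stand-in for the confused-student EEG table. All values are made up,
# and the column names "Theta" and "confused" are hypothetical.
df = pd.DataFrame({
    "Theta":    [40_000, 52_000, 61_000, 98_000, 120_000, 131_000],
    "confused": [0, 0, 0, 1, 1, 1],
})

# Median theta value per class (0 = not confused, 1 = confused)
median_theta = df.groupby("confused")["Theta"].median()
print(median_theta)
```

Medians are used rather than means because band-power values tend to have heavy right tails, so a few extreme readings would drag a mean around.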

More Data (Feature Extraction and More Channels)

Above, we used brainwave frequency bands like alpha, beta, and gamma to predict a mental state. In other words, the features were the power in various frequency bands, derived from raw EEG data recorded by just 1 channel.
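
One common way to turn a raw single-channel signal into band features is to estimate its power spectral density and integrate it over each band. The sketch below uses SciPy's Welch estimator; the band edges, the 256 Hz sampling rate, and the helper name `band_powers` are assumptions (band boundaries vary between studies and devices), and this is not necessarily how the original dataset's values were produced.

```python
import numpy as np
from scipy.signal import welch

# Conventional EEG band edges in Hz (an assumption; definitions vary)
BANDS = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 45),
}

def band_powers(signal, fs):
    """Estimate the power in each band of a 1-channel signal via Welch's method."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))  # 2-second segments
    df_hz = freqs[1] - freqs[0]  # frequency resolution of the PSD
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = psd[mask].sum() * df_hz  # rectangle-rule integral
    return powers

# Usage: a synthetic 10 Hz sine plus a little noise should land in alpha
fs = 256  # hypothetical sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
p = band_powers(x, fs)
print(max(p, key=p.get))  # alpha
```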

However, there are many, many other features we could derive from EEG data with various extraction techniques. We’ll also look at raw data that’s read from 4 electrodes, instead of just 1, to see how upping the channel count alone can improve accuracy.

Relaxed or Concentrating?

This Kaggle dataset, for instance, derives 988 features from EEG data to describe whether someone is relaxed, concentrating, or neutral. The features rely on signal-processing and statistical techniques such as the fast Fourier transform, Shannon entropy, max-min features in temporal sequences, log-covariance, and more, all computed over semi-overlapping time windows.
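
To make "features computed over semi-overlapping windows" concrete, here is a minimal sketch of that kind of pipeline. It is not the dataset's actual extraction code: the 1-second windows, 50% overlap, histogram-based entropy, the first 10 FFT magnitudes, and the helper names are all assumptions chosen for illustration. Log-covariance is computed as the matrix logarithm of the channel covariance, via an eigendecomposition.

```python
import numpy as np

def shannon_entropy(x, bins=16):
    """Shannon entropy (bits) of a signal's amplitude histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def log_covariance(window):
    """Matrix log of the channel covariance, flattened to its upper triangle."""
    cov = np.cov(window, rowvar=False)  # channels are columns
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 1e-12, None)  # guard against non-positive eigenvalues
    log_cov = (vecs * np.log(vals)) @ vecs.T  # V diag(log λ) V^T
    return log_cov[np.triu_indices(cov.shape[0])]

def window_features(eeg, fs, win_s=1.0, overlap=0.5, n_fft=10):
    """Slide overlapping windows over eeg (samples x channels); one feature row per window."""
    win = int(win_s * fs)
    step = int(win * (1 - overlap))
    rows = []
    for start in range(0, eeg.shape[0] - win + 1, step):
        w = eeg[start:start + win]
        feats = []
        for ch in w.T:
            spectrum = np.abs(np.fft.rfft(ch))[:n_fft]  # first n_fft FFT magnitudes
            feats += [ch.mean(), ch.max() - ch.min(), shannon_entropy(ch)]
            feats += spectrum.tolist()
        feats += log_covariance(w).tolist()
        rows.append(feats)
    return np.array(rows)

# Usage: 10 seconds of fake 4-channel data at a hypothetical 256 Hz
rng = np.random.default_rng(0)
eeg = rng.standard_normal((2560, 4))
X = window_features(eeg, fs=256)
print(X.shape)  # (19, 62)
```

Even this toy version yields 62 features per window from 4 channels; adding more spectral bins, statistics, and window sizes is how feature counts climb into the hundreds.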

We can upload this much more attribute-rich dataset to Apteo and select Label as the KPI. Apteo automatically selects a Gradient Boosting Classification model, which achieves an extremely high Jaccard Score of 0.972. As it turns out, extracting a large number of features lets us make very accurate predictions.

In this case, each individual attribute has a much lower importance score, but taken together, the attributes yield very high accuracy.
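
Apteo handles model selection internally, so as a rough local analogue (not Apteo's method) the sketch below trains scikit-learn's `GradientBoostingClassifier` on a synthetic wide table and reports a Jaccard score and feature importances. The data comes from `make_classification`, and the choice of `average="weighted"` for the multiclass Jaccard score is an assumption, since the article doesn't say how its score was averaged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a wide feature table: 3 classes (e.g. relaxed,
# neutral, concentrating), 200 features, only some of them informative.
X, y = make_classification(
    n_samples=600, n_features=200, n_informative=30,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Multiclass Jaccard needs an averaging rule; "weighted" is an assumption
score = jaccard_score(y_test, model.predict(X_test), average="weighted")
importances = model.feature_importances_

print(f"Jaccard (weighted): {score:.3f}")
print(f"top single-feature importance: {importances.max():.3f}")
print(f"sum of importances: {importances.sum():.3f}")
```

Because the importances are normalized to sum to 1, spreading them across hundreds of features necessarily leaves each one with a small share, even when the model as a whole is accurate.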

Happy or Sad?

Another attribute-rich dataset we can analyze is the Kaggle “EEG Brainwave Dataset: Feeling Emotions.”

Two subjects each watched 6 minutes of positive video, 6 minutes of negative video, and 6 minutes of nothing (neutral data), so 18 minutes of data were recorded per subject, for a total of 36 minutes.

The sampling rate was 150 Hz (150 samples per second), and with 2,160 seconds (36 minutes × 60) of data, we have a dataset of 324,000 data points.
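
The data-point count is just subjects × minutes × 60 × sampling rate. A tiny helper (the name `n_samples` is hypothetical) makes the arithmetic explicit:

```python
def n_samples(subjects, minutes_per_subject, fs_hz):
    """Samples recorded at fs_hz (per channel, if the device records several)."""
    seconds = subjects * minutes_per_subject * 60
    return seconds * fs_hz

total = n_samples(subjects=2, minutes_per_subject=18, fs_hz=150)
print(total)  # 324000
```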

As with the previous attribute-rich dataset, we’re able to achieve much higher accuracy than with just a few frequency bands.

Music or Reading?

To look at how a greater channel count alone can improve accuracy, we’ll analyze this dataset from Kaggle, which records EEG data from someone while they’re listening to music and while they’re reading. The recording device has 4 channels, as opposed to the 1-channel data analyzed at the beginning of the article.

The goal is simply to predict whether their brain is busy listening or reading. That’s a much simpler task than predicting whether someone is confused, and with 4x the channel count, a logistic regression classifier achieves a Jaccard Score of 1.0. In other words, we have 100% accuracy. A perfect score raises a red flag and makes it seem like our model is overfitting.
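
The sketch below reproduces the shape of this experiment on made-up 4-channel data rather than the Kaggle recordings. The class means, the standardize-then-logistic-regression pipeline, and the 70/30 split are assumptions. The key detail is scoring on a held-out split: a perfect score on data the model never saw points to easy separation rather than memorization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Synthetic 4-channel readings; made-up means, not the Kaggle data.
# Class 0 = "music", class 1 = "reading"; channel 4 is the most separated.
music = rng.normal(loc=[0, 0, 0, 0], scale=1.0, size=(n, 4))
reading = rng.normal(loc=[2, 1, 1, 6], scale=1.0, size=(n, 4))
X = np.vstack([music, reading])
y = np.array([0] * n + [1] * n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# Binary Jaccard = TP / (TP + FP + FN) for class 1, so a score of 1.0
# means there were no false positives and no false negatives.
score = jaccard_score(y_test, clf.predict(X_test))
print(f"Jaccard on held-out data: {score:.3f}")
```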

Looking at the data, however, we can see why the model achieves such high accuracy.

Below, class 0 refers to listening to music, and class 1 refers to reading. The bulk of the channel 4 data doesn’t overlap at all between the two classes (only some outliers do), which means even channel 4 alone would be enough to make fairly accurate predictions.
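
One way to check the claim that a single channel is nearly enough is to search for the best single threshold on that channel, a one-split "decision stump". The helper `best_threshold` and the synthetic values (two well-separated clusters plus a few overlapping outliers) are hypothetical illustrations, not the article's data.

```python
import numpy as np

def best_threshold(values, labels):
    """Return (threshold, accuracy) for the best single cut on one channel.

    Predicts class 1 either above or below the cut, whichever scores higher.
    """
    v = np.sort(values)
    candidates = (v[:-1] + v[1:]) / 2  # midpoints between neighbouring values
    best_t, best_acc = v[0], 0.0
    for t in candidates:
        acc_above = np.mean((values > t) == labels.astype(bool))
        acc = max(acc_above, 1.0 - acc_above)  # allow either direction
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Made-up "channel 4" values: two separated clusters plus 3 reading outliers
rng = np.random.default_rng(0)
music = rng.normal(0.0, 1.0, 200)
reading = np.concatenate([rng.normal(8.0, 1.0, 200), [0.0, 0.0, 0.0]])
values = np.concatenate([music, reading])
labels = np.array([0] * music.size + [1] * reading.size)

t, acc = best_threshold(values, labels)
print(f"threshold={t:.2f}, accuracy={acc:.3f}")
```

Only the outliers land on the wrong side of the cut, which matches the picture of a channel whose bulk distributions don’t overlap.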

Combined with the other three channels, each with varying degrees of overlap, the data shows a clear distinction between reading and listening. The model isn’t so much overfitting as solving a simple problem.

We can also see that there’s minimal overlap between the classes in the channel 2 data. The reading class spans an extremely wide range of channel 2 values, from roughly -5000 to +3000, while the music class is confined to a narrower band of around -300 to +1000, which contains few reading examples.

If we zoom in close enough on three of the channels, we can picture a hyperplane (in three dimensions, an ordinary plane) that cleanly separates the two classes.

We can’t visualize the fourth channel as a fourth dimension, but the model does use it during training, and it makes the classes even easier to separate.
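
For a logistic regression model, that separating surface is literally a hyperplane: the model scores each point as w · x + b, with one weight per channel, and predicts class 1 when the score is positive. The sketch below checks this on made-up 4-channel data; the values are synthetic, and `coef_`, `intercept_`, and `decision_function` are scikit-learn's standard attributes and methods for linear classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic 4-channel data with two well-separated classes (made-up values)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 4)),
    rng.normal(3.0, 1.0, size=(200, 4)),
])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]  # one weight per channel, plus a bias

# The decision boundary is the 4-D hyperplane w . x + b = 0;
# points on its positive side are predicted as class 1.
scores = X @ w + b
pred = (scores > 0).astype(int)
print("weights per channel:", np.round(w, 2))
print("matches clf.predict:", np.array_equal(pred, clf.predict(X)))
```

Nothing about this requires the boundary to be drawable; the fourth weight simply tilts the hyperplane along an axis we can’t plot.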


In this article, we’ve used AutoML to accurately predict confusion, relaxation or concentration, positive or negative emotions, and whether someone is reading or listening to music, with data from as few as 1 electrode.

Imagine that, instead of analyzing data from just 1 electrode, we had data from 1,024 electrodes in direct contact with the brain. That is the promise of neural implants like Neuralink. Even the current beta version of Neuralink would let us analyze extremely complex neural correlates, and future versions aim to achieve “orders of magnitude” more channels.

As we’ve seen, a higher channel count and more features enable more accurate predictive models. It’s not far-fetched to imagine that, one day, neural analyses will reveal the mysteries of consciousness itself.