7 tips for biosignals preprocessing: how to improve the robustness of your Deep Learning…

Source: Deep Learning on Medium

Why dealing with noise and distortion is so important

Typically, any classification task involving biosignals (abnormality detection, for example), such as electrocardiography (ECG), electroencephalography (EEG), or electromyography (EMG), can be treated as a time-series recognition problem.

Most learning algorithms assume that the input signals are stationary. Put simply, the patterns in the input signals must be the same as, or similar to, those in the training set, and the distribution of the signal must not change over time.

In practice, any biosignal recording is exposed to a lot of noise. These distortions add variance to the model because they violate the stationarity criterion.

This noise can have different origins; more specific information can be found here and here. These articles describe ECG noise, but the same considerations apply to other biosignals.

This means that the overall performance of your DL classifier depends heavily on the effectiveness of the preprocessing techniques.

Let’s look at how preprocessing can improve the robustness of a deep learning model in practice.

1. 50% of efficient digital signal processing is efficient analog processing

Any digital signal processing starts with effective analog signal conditioning. The most common mistake is related to aliasing.

According to the Nyquist theorem, the sampling rate of the ADC should be more than twice the highest frequency of the input signal. Any signal component that does not meet this criterion aliases into the main frequency band and masks the useful signal as additional noise.
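To make the aliasing effect concrete, here is a minimal Python sketch. The 100 Hz sampling rate and the 60 Hz interference are assumed values chosen only for illustration, and `apparent_frequency` is a hypothetical helper, not something from the original article.

```python
import numpy as np

def apparent_frequency(f_signal, fs):
    """Frequency (Hz) at which a sinusoid of f_signal Hz appears after sampling at fs Hz."""
    f_folded = f_signal % fs
    return min(f_folded, fs - f_folded)

fs = 100.0              # assumed ADC sampling rate, Hz (Nyquist frequency is 50 Hz)
f_interference = 60.0   # assumed interference above the Nyquist frequency, Hz

# Sample the interference directly, without any anti-aliasing filter.
n = np.arange(1000)
sampled = np.sin(2 * np.pi * f_interference * n / fs)

# Locate the strongest spectral component of the sampled signal.
spectrum = np.abs(np.fft.rfft(sampled))
freqs = np.fft.rfftfreq(n.size, d=1 / fs)
peak = freqs[np.argmax(spectrum)]

print(peak, apparent_frequency(f_interference, fs))
```

Both values come out at 40 Hz: the 60 Hz component folds back into the band that is supposed to contain only the useful signal.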

To prevent this, an analog low-pass filter is applied before the ADC. Hardware engineers very often consider a simple RC circuit sufficient for that purpose, but there is a huge difference in frequency response between an ideal low-pass filter and a real one:

Comparison of the frequency responses of ideal (left) and real (right) analog filters

Make sure your anti-aliasing LPF meets the suppression requirements at the Nyquist frequency (for additional details, I recommend this book):

  • 50dB for 8-bit ADC
  • 62dB for 10-bit ADC
  • 74dB for 12-bit ADC
  • 98dB for 16-bit ADC
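The figures in this list are consistent with the ideal quantization signal-to-noise ratio of an N-bit converter, 6.02·N + 1.76 dB. The sketch below reproduces them and compares the requirement with what a first-order RC filter delivers at the Nyquist frequency. The 125 Hz sampling rate and 30 Hz cut-off are assumptions for illustration, and both function names are hypothetical.

```python
import math

def ideal_adc_snr_db(bits):
    """Ideal quantization SNR of an N-bit ADC for a full-scale sine, in dB."""
    return 6.02 * bits + 1.76

def rc_attenuation_db(f, f_cutoff):
    """Attenuation of a first-order RC low-pass filter at frequency f, in dB."""
    return 10 * math.log10(1 + (f / f_cutoff) ** 2)

fs = 125.0        # assumed sampling rate, Hz
f_cutoff = 30.0   # assumed RC cut-off frequency, Hz

for bits in (8, 10, 12, 16):
    print(f"{bits}-bit ADC: about {round(ideal_adc_snr_db(bits))} dB required")

rc_db = rc_attenuation_db(fs / 2, f_cutoff)
print(f"First-order RC at {fs / 2} Hz: about {rc_db:.1f} dB")
```

Under these assumptions the RC filter provides only about 7 dB of suppression at the Nyquist frequency, far short of any of the requirements listed above.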

2. Use the same hardware for training and predictions

Different devices impose different signal recording conditions, such as non-linear distortion in the electronics, a different enclosure, different sensor positions, etc.

Since different conditions produce different signals, I’d recommend using the same hardware for training the model and for making predictions. Otherwise, the mismatch can introduce additional bias relative to the training set.

If that is not an option, you can try pre-distorting the training set, but this requires additional expertise in the hardware and noise domains.

3. Use the Nyquist theorem to accelerate training

As described above, the Nyquist theorem defines the minimum ADC sampling rate needed to preserve all of the information in the analog signal after conversion. That means that if the maximum frequency of the signal is lower than Fs/2, the signal is redundant, and this redundancy can be used to accelerate the training of the deep network.

Let’s consider an example.

Here is an ECG signal with a sampling rate of 125 Hz provided by the PhysioNet database (a 30 Hz filter was applied):

A tip for ECG preprocessing: ECG signals occupy 0–100Hz, but a 30Hz low-pass filter may be applied. It leaves the P and T waves untouched, but it decreases the amplitude of the R peak by 20-30%. This is not critical for detecting abnormalities or counting heart rate.

The power spectral density of that signal looks like this:

As shown above, most of the signal energy is concentrated between 0 and 30Hz. Let’s decimate it to 80Hz and compare it with the original signal:

Demonstration of decimation effect: signal with 80Hz (upper) and 125Hz (lower) sampling rate

The original shape is preserved, but the overall length of the signal is reduced by about 36%, from 92 to 59 samples. This translates into a comparable acceleration of training without a loss in accuracy.
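The resampling step can be sketched in Python as follows. This is not the code from the project mentioned in this article: it uses SciPy's `resample_poly`, and the test signal is a synthetic placeholder with components below 30 Hz, not real ECG. Since 80/125 = 16/25, the signal is upsampled by 16 and downsampled by 25.

```python
import numpy as np
from scipy import signal

fs_in, fs_out = 125, 80   # sampling rates from the text, Hz
up, down = 16, 25         # 125 * 16 / 25 = 80

# Placeholder signal (an assumption, not real ECG): two tones below 30 Hz plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(10 * fs_in) / fs_in
x = (np.sin(2 * np.pi * 1.2 * t)
     + 0.5 * np.sin(2 * np.pi * 12.0 * t)
     + 0.01 * rng.standard_normal(t.size))

# Check how much of the power sits below 30 Hz before throwing samples away.
f, pxx = signal.welch(x, fs=fs_in, nperseg=256)
fraction_below_30 = pxx[f <= 30.0].sum() / pxx.sum()

# Polyphase resampling applies its own anti-aliasing FIR filter.
y = signal.resample_poly(x, up, down)

print(f"power below 30 Hz: {fraction_below_30:.4f}")
print(f"samples: {x.size} -> {y.size}")
```

With the same 16/25 ratio, a 92-sample segment maps to 59 samples, which matches the numbers quoted above.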

A demonstration of the efficiency of this approach is available in my GitHub project.

Important note: make sure that decimation does not remove details that could be used for recognition. Experimenting is the only way to prove it. In practice, though, training two stacked (CNN+LSTM) models on downsampled signals is usually quicker than training one model at the original sampling rate, without a loss in performance.

4. Understand the requirements for the system

Before trying more complex filtering algorithms, such as wavelet or adaptive filtering, I’d recommend understanding which features are required for recognition.

Here’s an example.

Suppose the task for the deep learning model is arrhythmia detection while walking. Typically, ECG data recorded while walking contains low-frequency noise:

Meanwhile, a clean ECG signal looks like this:

The P and T waves are masked, and extracting them is a non-trivial task. Before trying to develop complex algorithms, let’s look at what arrhythmia actually is:

For arrhythmia detection, pulse counting alone is enough to build an efficient detector, but low-frequency baseline wander obviously adds variability and violates stationarity.

Different parts of the ECG occupy different spectral bands:

A simple 5–15Hz bandpass filter solves the problem of extracting R peaks. With that filter applied, the P and T waves are suppressed (and abnormalities related to them are no longer available for recognition), but the requirements for the system are met.
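A 5–15Hz bandpass of this kind could be sketched with SciPy as below. The Butterworth design, the filter order, the 125 Hz sampling rate, and the synthetic test signal are all assumptions made for illustration; the 10 Hz tone is only a stand-in for QRS energy, and zero-phase filtering with `sosfiltfilt` is suitable for offline processing only.

```python
import numpy as np
from scipy import signal

fs = 125.0   # assumed sampling rate, Hz

# Second-order sections are numerically safer than (b, a) coefficients.
sos = signal.butter(2, [5.0, 15.0], btype="bandpass", fs=fs, output="sos")

# Synthetic stand-in (not real ECG): slow baseline wander plus an in-band tone.
t = np.arange(int(20 * fs)) / fs
baseline = 1.0 * np.sin(2 * np.pi * 0.3 * t)   # simulated walking artifact
in_band = 0.5 * np.sin(2 * np.pi * 10.0 * t)   # stand-in for QRS energy
x = baseline + in_band

# Zero-phase filtering: no delay, but it needs the whole recording.
y = signal.sosfiltfilt(sos, x)

# Compare away from the edges, where filter transients may remain.
mid = slice(int(2 * fs), int(18 * fs))
print(np.max(np.abs(y[mid] - in_band[mid])))
```

The printed residual is small compared with the 0.5 amplitude of the in-band tone, which shows that the wander is removed while the band of interest passes almost unchanged.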

The main rule: the more complex an algorithm is, the less robust it is and the more resources (both time and money) it requires to implement. The simplest digital filtering should be the first thing you try.

5. Use the MiniMax principle in developing pipelines

The MiniMax principle is a great strategy from game theory.

The main problem with biosignals is that signal quality changes over time:

  • Case 1. High quality during low activity of the subject:
  • Case 2. Poor data quality during intense movement. P and T are masked, and there is no way to extract them from the noise with a 1-channel system:

In the first case, P, QRS, and T are detectable, which means that most abnormal ECG patterns (heart attack, atrial fibrillation, etc.) can be recognized.

In the second case, only some abnormalities related to QRS (arrhythmia, etc.) can be recognized.

As shown above, the best way to extract QRS is to apply a 5–15Hz bandpass filter; however, P and T will be suppressed.

For case 2 this is not critical, since P and T are masked by the noise anyway, but it limits the number of pathologies that can be detected when high-quality data is available at the input.

The best way to avoid this problem is to apply an adaptive filter, which changes its impulse response as the environment changes:

The idea is simple:

  1. Build a detector of data quality (linear detectors/CNN);
  2. Define a set of filters;
  3. Define a rule for changing the impulse response depending on the input signal quality.
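The three steps above can be sketched as a simple quality-driven filter switch. Everything specific here is an assumption made for illustration: the quality detector (share of power below 1 Hz), the 0.5 threshold, the two Butterworth filters, the 8-second window, and the names `low_freq_power_ratio` and `switch_filters`. A real system might use a trained CNN as the detector.

```python
import numpy as np
from scipy import signal

FS = 125.0   # assumed sampling rate, Hz

# Step 2: a small set of filters (both band choices are assumptions).
FILTERS = {
    "wide": signal.butter(2, [0.5, 30.0], btype="bandpass", fs=FS, output="sos"),
    "narrow": signal.butter(2, [5.0, 15.0], btype="bandpass", fs=FS, output="sos"),
}

def low_freq_power_ratio(window):
    """Step 1: hypothetical linear quality detector, share of power below 1 Hz."""
    f, pxx = signal.welch(window, fs=FS, nperseg=min(512, window.size))
    return pxx[f < 1.0].sum() / pxx.sum()

def switch_filters(x, window_s=8.0, threshold=0.5):
    """Step 3: per window, use the narrow filter when low-frequency noise dominates."""
    win = int(window_s * FS)
    # Offline simplification: filter the whole record with each filter once.
    filtered = {name: signal.sosfiltfilt(sos, x) for name, sos in FILTERS.items()}
    out = np.empty_like(x)
    choices = []
    for start in range(0, x.size, win):
        stop = start + win
        noisy = low_freq_power_ratio(x[start:stop]) > threshold
        name = "narrow" if noisy else "wide"
        out[start:stop] = filtered[name][start:stop]
        choices.append(name)
    return out, choices

# Synthetic demo (not real ECG): a clean window followed by a window with motion artifact.
t = np.arange(int(16 * FS)) / FS
x = np.sin(2 * np.pi * 10.0 * t)
x[t >= 8.0] += 3.0 * np.sin(2 * np.pi * 0.3 * t[t >= 8.0])

y, choices = switch_filters(x)
print(choices)
```

The wide filter is kept while the signal is clean, so low-frequency content such as P and T waves would survive, and the narrow filter takes over only when the artifact dominates.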

6. The smart way to use High-Pass filters

Usually, high-pass filtering is required to deal with baseline wander:

EEG with baseline noise

The obvious approach is to apply a high-pass filter. The main constraints are a very low cut-off frequency (0.05Hz) and high stopband suppression (> 30dB). To meet these requirements, the filter must have a high order, which means a long delay that may not be suitable for real-time applications.

An alternative way:

  • Decimate the input signal;
  • Extract the baseline noise by applying a low-pass filter with a 0.05 Hz cut-off frequency;
  • Interpolate the extracted baseline back to the original sampling rate;
  • Subtract the baseline from the original signal.

The code example (Matlab) is available in this GitHub repository.
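For readers who prefer Python, the four steps could be sketched as below. This is an independent illustration, not a port of the Matlab code: the decimation factor, the Butterworth order, the linear interpolation, and the name `remove_baseline` are assumptions. It uses zero-phase filtering, so it works offline only; a real-time version would need a causal filter instead.

```python
import numpy as np
from scipy import signal

def remove_baseline(x, fs, q=20, f_cut=0.05):
    """Estimate the baseline at a reduced rate and subtract it (hypothetical helper)."""
    # 1. Decimate; resample_poly applies its own anti-aliasing FIR filter.
    x_low = signal.resample_poly(x, 1, q)
    fs_low = fs / q
    # 2. Low-pass at the reduced rate, where a low-order filter is sufficient.
    sos = signal.butter(2, f_cut, btype="low", fs=fs_low, output="sos")
    baseline_low = signal.sosfiltfilt(sos, x_low)
    # 3. Interpolate the baseline back to the original sampling grid.
    t = np.arange(x.size) / fs
    t_low = np.arange(x_low.size) / fs_low
    baseline = np.interp(t, t_low, baseline_low)
    # 4. Subtract the baseline from the original signal.
    return x - baseline, baseline

# Synthetic demo (not real data): a 10 Hz tone riding on a 0.01 Hz drift.
fs = 100.0
t = np.arange(int(400 * fs)) / fs
useful = 0.5 * np.sin(2 * np.pi * 10.0 * t)
drift = 1.0 * np.sin(2 * np.pi * 0.01 * t)

cleaned, baseline = remove_baseline(useful + drift, fs)

# Compare away from the edges, where filter transients may remain.
mid = (t > 50) & (t < 350)
print(np.max(np.abs(cleaned[mid] - useful[mid])))
```

Because the low-pass filter runs at 5 Hz instead of 100 Hz in this example, the 0.05 Hz cut-off is no longer a tiny fraction of the sampling rate, which is what makes a short filter workable.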

7. Iterative experimenting

Like any Data Science problem, classification of biosignals is an iterative experimental process, because different filtering approaches may be suitable for different applications.

I’ve put together a shortlist of filtering techniques, from the most reliable to the least.

NOTE: This is only my personal opinion; yours may differ.

  • Digital filtering (FIR, IIR). FIR is recommended because a linear-phase FIR filter introduces no group delay distortion. It offers moderate performance, is well suited to non-specific conditions, and is very simple to implement and very robust.
  • Wavelet filtering. Strong performance, but the implementation may be complex in terms of parameter selection.
  • Adaptive filtering. This method performs worse than wavelet filtering, but it is much simpler to implement and offers good flexibility.
  • Independent Component Analysis (ICA)/Blind Source Separation (BSS). Implementations of the FastICA algorithm in the most popular programming languages are available here. I’d recommend trying it last, because:
  1. It works with multi-channel configurations only;
  2. I’ve found the robustness of this approach to be very poor, because convergence is not guaranteed;
  3. It requires relatively more computational resources and may not be suitable for real-time applications.
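To illustrate the point about group delay in the first item of the list, the sketch below compares a symmetric (linear-phase) FIR low-pass with a Butterworth IIR low-pass. The sampling rate, cut-off frequency, filter lengths, and orders are assumptions picked for illustration.

```python
import numpy as np
from scipy import signal

fs = 125.0      # assumed sampling rate, Hz
f_cut = 30.0    # assumed cut-off frequency, Hz
numtaps = 101   # assumed FIR length (odd, so the delay is a whole number of samples)

fir = signal.firwin(numtaps, f_cut, fs=fs)           # symmetric taps, linear phase
b, a = signal.butter(4, f_cut, btype="low", fs=fs)   # IIR filter for comparison

# Evaluate the group delay (in samples) across the passband.
freqs = np.linspace(1.0, 25.0, 50)
_, gd_fir = signal.group_delay((fir, [1.0]), w=freqs, fs=fs)
_, gd_iir = signal.group_delay((b, a), w=freqs, fs=fs)

fir_spread = gd_fir.max() - gd_fir.min()
iir_spread = gd_iir.max() - gd_iir.min()
print(f"FIR delay spread: {fir_spread:.2e} samples")
print(f"IIR delay spread: {iir_spread:.2f} samples")
```

The FIR filter delays every passband frequency by the same (numtaps - 1) / 2 = 50 samples, so the waveform shape is preserved, while the delay of the IIR filter varies with frequency.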


Found the article useful? Please leave your feedback via this link


Dmitrii Shubin, R&D engineer, Medical devices

Toronto, ON, Canada

Contact info:

Email: shubin.dmitrii.n@gmail.com

LinkedIn, GitHub