Application full name | Abbreviation | Input Signal | Main Kernels | Dominant Phase |
---|---|---|---|---|
Heartbeat Classifier | HeartBeatClass | ECG | Morphological filtering | Preprocessing |
Seizure Detection SVM | SeizureDetSVM | ECG | FFT, SVM | Feature extraction |
Seizure Detection CNN | SeizureDetCNN | EEG | CNN | Inference |
Cognitive Workload Monitoring | CognWorkMon | EEG | FFT, Random forest | Feature extraction |
Gesture Classifier | GestureClass | EMG | ICA, MLP | Feature extraction |
Cough Detection | CoughDet | Audio, IMU | MFCC, Random forest | Feature extraction |
Emotion Classifier | EmotionClass | PPG, GSR, ST | KNN | Inference |
Biological-Backpropagation free | Bio-BPfree | EEG | CNN gradients | Fine-tuning |

We have selected eight biomedical wearable applications that offer representative workloads and varied profiles for the processing, idle, and acquisition phases. The applications are complementary and enable the evaluation of different architectural parts (e.g., sleep mode, digital signal processing). BiomedBench will be launched with eight applications but is open to future additions that present new challenges in any of the three phases.

The table below shows each application’s computing pipeline and implementation details. It includes the most common kernels found in real-time patient monitoring across the three different processing phases: signal preprocessing, feature extraction, and inference. Additionally, it shows the arithmetic representation and the existence of a multicore version.

Application | Signal Preprocessing | Feature Extraction | Inference | Training | Arithmetic | Multicore |
---|---|---|---|---|---|---|
HeartBeatClass | MF, RMS, Rel-En | REWARD, ECG-FPDEL | RP, NFC | – | 16-bit FxP | Yes |
SeizureDetSVM | MAVG | PT, ECG-EDR, PLOMB | SVM | – | 32-bit FxP | Yes |
SeizureDetCNN | – | – | CNN | – | 16-bit FxP | Yes |
CognWorkMon | BPF, BLR | FFT | RF | – | 32-bit FxP | No |
GestureClass | – | ICA | MLP | – | 32-bit FP | Yes |
CoughDet | LPF | FFT, MFCC | RF | – | 32-bit FP | No |
EmotionClass | – | AVG | KNN | – | 32-bit FP | No |
Bio-BPfree | – | – | – | CONV-GRAD | 32-bit FP | No |

** FxP stands for fixed-point, and FP stands for floating-point.

All applications are coded in C or C++ (GitHub). A thorough description of each application follows:

## Heartbeat classifier (HeartBeatClass)

The HeartBeatClass [1] detects abnormal beating patterns in real time for common heart diseases using the ECG signal. The input signal is captured by three different ECG leads at 256 Hz with 16-bit accuracy for 15 seconds. The input signal is processed through a morphological filter (MF) [2], and the root-mean-square (RMS) combines the three signal sources before the signal is enhanced through relative energy (Rel-En) [3]. In feature extraction, the relative-energy-based wearable R-peak detection (REWARD) algorithm [3] detects the R peaks before the other fiducial points of the ECG are delineated (ECG-FPDEL) [4]. Finally, a neuro-fuzzy classifier (NFC) using random projections (RP) [1] of the fiducial points classifies the heartbeats as abnormal or not.

The application uses 16-bit fixed-point arithmetic. MF is the dominant kernel, accounting for more than 80 % of the execution time. The MF implementation uses a queue to perform dilation and erosion, which translates into data movements and min/max searches. We also include the multicore version of the application [5], in which we improved the parallelization strategy for the delineation and classification phases by using dynamic rather than static task partitioning.
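To illustrate why dilation and erosion reduce to min/max searches, here is a minimal C sketch of both operators on a 16-bit signal. It is a naive sliding-window version, not the queue-based implementation used in the benchmark; the function names and the `half` window parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Erosion: each output sample is the minimum of the input over a window
 * of up to (2 * half + 1) samples centred on it. The window is clipped
 * at the signal borders. Naive O(n * window) sketch. */
void erode_i16(const int16_t *in, int16_t *out, size_t n, size_t half)
{
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= half) ? i - half : 0;
        size_t hi = (i + half < n) ? i + half : n - 1;
        int16_t m = in[lo];
        for (size_t j = lo + 1; j <= hi; j++) {
            if (in[j] < m) {
                m = in[j];
            }
        }
        out[i] = m;
    }
}

/* Dilation: same window, but the maximum is kept instead. */
void dilate_i16(const int16_t *in, int16_t *out, size_t n, size_t half)
{
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= half) ? i - half : 0;
        size_t hi = (i + half < n) ? i + half : n - 1;
        int16_t m = in[lo];
        for (size_t j = lo + 1; j <= hi; j++) {
            if (in[j] > m) {
                m = in[j];
            }
        }
        out[i] = m;
    }
}
```

A queue-based variant avoids rescanning the whole window for every sample, at the cost of the data movements mentioned above.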

[1] Rubén Braojos, Giovanni Ansaloni, and David Atienza. 2013. A methodology for embedded classification of heartbeats using random projections. In 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, Grenoble, France, 899-904.

[2] Ruben Braojos, Giovanni Ansaloni, David Atienza, and Francisco J. Rincón. 2012. Embedded real-time ECG delineation methods: A comparative evaluation. In 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE). IEEE, Larnaca, Cyprus, 99-104.

[3] Lara Orlandic, Elisabetta de Giovanni, Adriana Arza, Sasan Yazdani, Jean-Marc Vesin, and David Atienza. 2019. REWARD: Design, Optimization, and Evaluation of a Real-Time Relative-Energy Wearable R-Peak Detection Algorithm. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, Berlin, Germany, 3341-3347.

[4] Elisabetta De Giovanni, Amir Aminifar, Adrian Luca, Sasan Yazdani, Jean-Marc Vesin, and David Atienza. 2017. A patient-specific methodology for prediction of paroxysmal atrial fibrillation onset. In 2017 Computing in Cardiology (CinC). IEEE, Rennes, France, 1-4.

[5] Elisabetta De Giovanni, Fabio Montagna, Benoît W. Denkinger, Simone Machetti, Miguel Peón-Quirós, Simone Benatti, Davide Rossi, Luca Benini, and David Atienza. 2020. Modular Design and Optimization of Biomedical Applications for Ultra-Low Power Heterogeneous Platforms. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 39, 11 (2020), 3821-3832.

## Seizure detector support vector machine (SeizureDetSVM)

The SeizureDetSVM [1] works on ECG input and recognizes epileptic episodes in real time. The ECG signal is sampled from a single lead at 64 Hz with 16-bit accuracy for 60 seconds. The preprocessing phase consists of subtracting a simple moving average (MAVG) from the acquired signal. For feature extraction, the R-peak interval (RRI) and ECG-derived respiration (EDR) time series are calculated using the Pan-Tompkins (PT) [2] and ECG-EDR [3] algorithms, respectively. From RRI, heart-rate variability (HRV) features [4] and Lorenz plot features [5] are extracted. From EDR, the linear predictive coefficients and the power of different sub-bands of the power spectral density are used [4]. For the frequency feature extraction (FFE) of RRI and HRV, the Lomb-Scargle periodogram (PLOMB) algorithm [6], [7], which involves a fast Fourier transform (FFT), is used. For inference, a support vector machine (SVM) uses all the extracted features to classify the patient’s state.

PLOMB is the dominant kernel, accounting for more than 75 % of the execution time. Since the implementation is in 32-bit FxP arithmetic, the main operations are 32-bit integer multiplications with a 64-bit intermediate result followed by a shift. This application originally contained a self-aware mechanism to determine the number of features and the complexity of the SVM. For this benchmark, we only use the full pipeline to avoid variability among executions and test the most complete version. Finally, we designed a parallel version of this application since it features a high degree of parallelism.
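The multiply-then-shift pattern described above can be sketched in C as follows. The Q16.16 format (`FXP_FRAC_BITS`) and the function name are assumptions chosen for this example; the benchmark may use a different number of fractional bits.

```c
#include <stdint.h>

/* Assumed format for this sketch: Q16.16 (16 integer bits, 16 fractional). */
#define FXP_FRAC_BITS 16

/* Multiply two 32-bit fixed-point values. The product is formed in a
 * 64-bit intermediate so no bits are lost, then shifted right to bring
 * the result back to the original format. Note: right-shifting a
 * negative signed value is implementation-defined in C; common
 * compilers perform an arithmetic shift. No saturation is applied. */
int32_t fxp_mul(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (int32_t)(wide >> FXP_FRAC_BITS);
}
```

For instance, in Q16.16 the value 1.5 is stored as 98304 and 2.0 as 131072; `fxp_mul(98304, 131072)` yields 196608, which represents 3.0.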

[1] Farnaz Forooghifar, Amir Aminifar, and David Atienza Alonso. 2018. Self-Aware Wearable Systems in Epileptic Seizure Detection. In DSD 2018. IEEE, Prague, Czech Republic, 426-432.

[2] Jiapu Pan and Willis J. Tompkins. 1985. A Real-Time QRS Detection Algorithm. IEEE Transactions on Biomedical Engineering BME-32, 3 (1985), 230-236.

[3] P. de Chazal, C. Heneghan, E. Sheridan, R. Reilly, P. Nolan, and M. O’Malley. 2003. Automated processing of the single-lead electrocardiogram for the detection of obstructive sleep apnoea. IEEE Transactions on Biomedical Engineering 50, 6 (2003), 686-696.

[4] Farnaz Forooghifar, Amir Aminifar, Leila Cammoun, Ilona Wisniewski, Carolina Ciumas, Philippe Ryvlin, and David Atienza Alonso. 2019. A Self-Aware Epilepsy Monitoring System for Real-Time Epileptic Seizure Detection. Mobile Networks and Applications 27 (2019), 677-690.

[5] Jesper Jeppesen, Sándor Beniczky, Peter Johansen, Per Sidenius, and Anders Fuglsang-Frederiksen. 2014. Using Lorenz plot and Cardiac Sympathetic Index of heart rate variability for detecting seizures for patients with epilepsy. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, Chicago, IL, USA, 4563-4566.

[6] N. R. Lomb. 1976. Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science 39, 2 (feb 1976), 447-462.

[7] Jeffrey Scargle. 1983. Studies in astronomical time series analysis. II – Statistical aspects of spectral analysis of unevenly spaced data. The Astrophysical Journal 263 (01 1983).

## Seizure detector convolutional neural network (SeizureDetCNN)

The SeizureDetCNN [1] is based on EEG data and detects epileptic seizure episodes in real time. The signal is sampled from 23 leads at 256 Hz with 16-bit accuracy for 4 seconds. This application does not feature any signal preprocessing or feature extraction kernels. Instead, the input is sent directly to the input layer of a fully convolutional network (FCN). The proposed FCN architecture has three 1D convolutional layers, each including batch normalization, pooling, and ReLU layers, as well as two fully connected layers. Because 90 % of the execution time is spent in the convolutional layers, most computations are 16-bit FxP multiply-accumulate (MAC) operations. Moreover, this application includes a parallel implementation, as a high degree of parallelism is an inherent characteristic of CNNs.
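As a rough illustration of the 16-bit FxP MAC workload, the sketch below computes a single-channel, single-filter 1D convolution without padding. The fractional-bit count (`CONV_FRAC_BITS`), the function name, and the single-channel simplification are assumptions for this example; the actual layers operate on multiple channels and filters.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed format for this sketch: 8 fractional bits (Q8.8). */
#define CONV_FRAC_BITS 8

/* 1D convolution as used in CNNs (no kernel flip, no padding):
 *   y[i] = sum_j x[i + j] * w[j],  for i = 0 .. n - k.
 * Products of 16-bit operands are accumulated in 32 bits, then the sum
 * is shifted back to the 16-bit format. No saturation is applied, so
 * the caller must keep values in range. y must hold n - k + 1 outputs. */
void conv1d_q(const int16_t *x, size_t n,
              const int16_t *w, size_t k, int16_t *y)
{
    if (k == 0 || n < k) {
        return;
    }
    for (size_t i = 0; i + k <= n; i++) {
        int32_t acc = 0;
        for (size_t j = 0; j < k; j++) {
            acc += (int32_t)x[i + j] * (int32_t)w[j]; /* one MAC */
        }
        y[i] = (int16_t)(acc >> CONV_FRAC_BITS);
    }
}
```

The inner loop is independent across output positions `i` (and, in the full network, across filters), which is what makes the kernel straightforward to parallelize.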

## Cognitive workload monitor (CognWorkMon)

The CognWorkMon [1] is designed for real-time monitoring of the cognitive workload state of a subject and is based on EEG input. The EEG signal is sampled by four leads at 256 Hz with 32-bit accuracy. The input signal is processed in 14 4-second batches for a total of 56 seconds. Preprocessing and feature extraction are executed 14× per channel before the classification phase is executed. Preprocessing involves blink removal (BLR) and a band-pass filter (BPF) through infinite impulse response (IIR) filters. Feature extraction contains time-domain features (i.e., skewness/kurtosis, Hjorth activity), frequency-domain features (i.e., power spectral density), and entropy features. A random forest (RF) uses these features to classify the stress condition of the subject.
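An IIR band-pass filter is commonly built from second-order sections. The sketch below shows one such section in direct form I. It uses floating point for readability, whereas the benchmark runs in 32-bit FxP; the struct layout, names, and coefficient values are hypothetical and would come from a filter design step.

```c
#include <stddef.h>

/* One second-order IIR section (biquad), direct form I:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 * with a0 normalised to 1. */
typedef struct {
    float b0, b1, b2;       /* feed-forward coefficients */
    float a1, a2;           /* feedback coefficients */
    float xd1, xd2;         /* previous two inputs */
    float yd1, yd2;         /* previous two outputs */
} biquad_t;

/* Process one sample and update the filter state. */
float biquad_step(biquad_t *f, float x)
{
    float y = f->b0 * x + f->b1 * f->xd1 + f->b2 * f->xd2
            - f->a1 * f->yd1 - f->a2 * f->yd2;
    f->xd2 = f->xd1;
    f->xd1 = x;
    f->yd2 = f->yd1;
    f->yd1 = y;
    return y;
}

/* Filter a whole batch in place, keeping state across batches. */
void biquad_run(biquad_t *f, float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = biquad_step(f, buf[i]);
    }
}
```

Because the state lives in the struct, the same filter instance can be carried across consecutive batches of one channel.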

The extraction of frequency features, which contains the FFT, is the most demanding computational kernel, accounting for more than 80 % of the total computation time. Because we transformed the original application into an FxP implementation with a negligible accuracy drop, the main operations are 32-bit integer multiplications with a 64-bit intermediate result followed by a shift.

## Gesture classifier (GestureClass)

The GestureClass [1] classifies hand gestures by inspecting surface EMG (sEMG) signals captured on the forearm. The idea is to extract the motor unit action potential train (MUAPT) and identify the motor neuron activity patterns to classify the hand gesture. The signal is sampled from 16 channels at 4 kHz with 24-bit accuracy for only 0.2 seconds. This application has no signal preprocessing. The authors apply a blind source separation (BSS) method to the input signal, namely independent component analysis (ICA), and classify the gesture using either an SVM or a multilayer perceptron (MLP). In BiomedBench, we select the MLP to increase the variety of the kernels under test.

GestureClass is implemented in 32-bit floating-point arithmetic. The dominant workload is the ICA, which features matrix multiplications. Hence, the main operations are 32-bit FP MACs. We have included the original parallel implementation of this application and converted it to run on single-core platforms.
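Both the ICA matrix multiplications and the MLP layers reduce to the same 32-bit FP MAC loop. A minimal sketch of one fully connected layer is shown below; the function name, the row-major weight layout, and the ReLU activation are assumptions for this example, since the source does not specify them.

```c
#include <stddef.h>

/* One fully connected layer: y = relu(W * x + b).
 * w is stored row-major with n_out rows of n_in weights each.
 * ReLU is an assumed activation for this sketch. */
void dense_relu(const float *w, const float *b, const float *x,
                float *y, size_t n_in, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {
        float acc = b[o];
        for (size_t i = 0; i < n_in; i++) {
            acc += w[o * n_in + i] * x[i]; /* one FP MAC */
        }
        y[o] = (acc > 0.0f) ? acc : 0.0f;
    }
}
```

Each output neuron is computed independently, so the outer loop can be split across cores in a multicore version.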

## Cough detector (CoughDet)

The CoughDet [1] is a novel application using non-invasive chest-worn biosensors to count the number of cough episodes people experience per day, thus providing a quantifiable means of evaluating the efficacy of chronic cough treatment. The device records audio data, sampled at 16 kHz with 32-bit precision, as well as 3-axis accelerometer and 3-axis gyroscope signals from an inertial measurement unit (IMU), each sampled at 100 Hz with 16-bit precision. Biosignals are processed every 0.3 seconds, based on the average duration of coughs in the training database [2].

Feature extraction includes computations in the time and frequency domains. Time-domain computations include the extraction of statistical values (such as zero-crossing rate, root mean square, and kurtosis) of the IMU signals. An FFT is used to extract spectral statistics (including standard deviation and dominant frequency), power spectral density, and mel-frequency cepstral coefficients (MFCC) [3] of the audio signal. Features extracted from audio and IMU signals are forwarded to an RF classifier that computes the probability of a cough event.
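Two of the time-domain statistics named above, zero-crossing rate and kurtosis, can be sketched in C as follows. The function names and the exact definitions (crossings normalised by the number of sample pairs; non-excess kurtosis from central moments) are assumptions for this example and may differ from the benchmark code.

```c
#include <stddef.h>

/* Fraction of consecutive sample pairs whose signs differ.
 * Zero is treated as non-negative. */
float zero_crossing_rate(const float *x, size_t n)
{
    if (n < 2) {
        return 0.0f;
    }
    size_t crossings = 0;
    for (size_t i = 1; i < n; i++) {
        if ((x[i - 1] >= 0.0f) != (x[i] >= 0.0f)) {
            crossings++;
        }
    }
    return (float)crossings / (float)(n - 1);
}

/* Kurtosis as m4 / m2^2, where m2 and m4 are the second and fourth
 * central moments. Returns 0 for empty or constant input. */
float kurtosis(const float *x, size_t n)
{
    if (n == 0) {
        return 0.0f;
    }
    float mean = 0.0f;
    for (size_t i = 0; i < n; i++) {
        mean += x[i];
    }
    mean /= (float)n;

    float m2 = 0.0f, m4 = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = x[i] - mean;
        float d2 = d * d;
        m2 += d2;
        m4 += d2 * d2;
    }
    m2 /= (float)n;
    m4 /= (float)n;
    return (m2 > 0.0f) ? m4 / (m2 * m2) : 0.0f;
}
```

With six IMU axes sampled at 100 Hz, each 0.3-second window contains 30 samples per axis, so these loops are short compared with the audio-side FFT and MFCC work.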

The MFCC is the most intensive kernel, as it requires the iterative computation of the FFT and of transcendental functions (i.e., the logarithm in the discrete cosine transform (DCT)). The application is implemented in 32-bit FP arithmetic, and the main operations include FP multiplications.

[1] Lara Orlandic, Jérôme Thevenot, Tomas Teijeiro, and David Atienza. 2023. A Multimodal Dataset for Automatic Edge-AI Cough Detection. In 2023 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 1-7.

[2] Lara Orlandic, Jérôme Thevenot, Tomas Teijeiro, and David Atienza. 2023. A Multimodal Dataset for Automatic Edge-AI Cough Detection. Type: dataset.

[3] Vibha Tiwari. 2010. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies 1, 1 (2010), 19-22.

## Emotion classifier (EmotionClass)

The EmotionClass [1] classifies patients’ fear status to prevent gender-based violence based on three physiological signals: galvanic skin response (GSR), PPG, and skin temperature (ST). PPG is sampled at 200 Hz with 32-bit precision, GSR is sampled at 5 Hz with 32-bit precision, and ST is sampled at 1 Hz with 16-bit precision. The acquisition window lasts 10 seconds and is divided into 10 batches, each producing a partial inference; the final classification is based on the 10 partial classifications.

EmotionClass has no signal preprocessing step. Feature extraction includes the average (AVG) of the three input signals over 1 second before forwarding them to a k-nearest neighbors (KNN) classifier. The classifier computes the distances of the new 3D tuple from the training points that have already been labeled as fear or no fear. Using *n* training points, we select the √*n* closest training points by running √*n* steps of selection sort before classifying the new tuple based on the percentage of neighboring fear-labeled points. We use 685 training points, which is a good tradeoff between accuracy and complexity [1].
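The neighbor selection described above can be sketched in C as a partial selection sort that stops after *k* steps. This is a sketch under several assumptions: the function names, the flat feature layout, the use of squared Euclidean distance (which preserves the ordering and avoids a square root), and rounding √*n* down are all choices made for this example. The decision threshold on the fear percentage is left to the caller because the source does not state it.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest r such that r * r <= n (floor of the square root). */
size_t isqrt_floor(size_t n)
{
    size_t r = 0;
    while ((r + 1) * (r + 1) <= n) {
        r++;
    }
    return r;
}

/* Count how many of the k nearest training points are labelled "fear".
 * train:   n points, 3 features each, stored as x0 y0 z0 x1 y1 z1 ...
 * is_fear: n labels, 1 = fear, 0 = no fear
 * q:       the new 3D tuple
 * dist, lab: caller-provided scratch buffers of n elements each */
size_t knn_fear_votes(const float *train, const uint8_t *is_fear, size_t n,
                      const float q[3], size_t k,
                      float *dist, uint8_t *lab)
{
    if (k > n) {
        k = n;
    }
    /* Squared Euclidean distance from q to every training point. */
    for (size_t i = 0; i < n; i++) {
        float d = 0.0f;
        for (size_t c = 0; c < 3; c++) {
            float diff = train[i * 3 + c] - q[c];
            d += diff * diff;
        }
        dist[i] = d;
        lab[i] = is_fear[i];
    }
    /* k steps of selection sort: each step is a minimum search over the
     * unsorted tail, so only the k closest points end up ordered. */
    size_t votes = 0;
    for (size_t s = 0; s < k; s++) {
        size_t m = s;
        for (size_t j = s + 1; j < n; j++) {
            if (dist[j] < dist[m]) {
                m = j;
            }
        }
        float td = dist[s];
        dist[s] = dist[m];
        dist[m] = td;
        uint8_t tl = lab[s];
        lab[s] = lab[m];
        lab[m] = tl;
        votes += lab[s];
    }
    return votes;
}
```

With 685 training points, `isqrt_floor(685)` gives *k* = 26, so the partial sort performs 26 minimum searches instead of fully sorting 685 distances.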

EmotionClass uses both 16-bit and 32-bit FxP arithmetic, as it uses different representations for the three different signals. The dominant kernel is the KNN, which includes the 32-bit FP calculation and sorting of the Euclidean distances in 3D. Sorting includes multiple minimum search iterations over the array of distances.

## Biological back-propagation-free (Bio-BPfree)

Bio-BPfree [1] is the only benchmark that performs on-device training. Its main notion is to perform training per layer by maximizing the distance between the intermediate outputs of different classes and minimizing the distance between the intermediate outputs of the same class. Bio-BPfree avoids the prohibitive memory cost of backpropagation, thus opening possibilities for on-device training in ULP devices.

We use Bio-BPfree to fine-tune the FCN presented in [2] for seizure detection. The model is originally trained on the server using a leave-one-patient-out scheme on the CHB-MIT database. Later, we retrain the model on the device with Bio-BPfree by exploiting the personalized samples acquired from the patient under test. On-device training improves the F1 score by up to 25 %, thanks to the personalized samples available on the device, while ensuring data privacy.

The implementation of Bio-BPfree is based on computing the loss function gradients with respect to the trainable parameters (CONV-GRAD). We define a custom loss function per layer [1] and then compute the gradient using the chain rule to account for the intermediate layers (e.g., ReLU, batch normalization, max pooling). The main operations are 32-bit FP MACs because of the convolution in the forward passes and the vector-matrix multiplications involved in the chain rule of the gradient computation. There is no acquisition phase. We assume that four pre-recorded input samples are already stored in the FLASH and expect the retraining to occur during the device charging phase. One epoch is executed for benchmark purposes.

Bio-BPfree (reduced): We apply Bio-BPfree to just one convolutional block of the CNN [2] and scale down the dimensions of the convolution. We use an input of length 64 with 32 channels and 64 filters. Scaling down the dimensions does not change the computational profile of the application and allows us to deploy the application in devices with RAM < 300 KiB without involving the FLASH, thus ensuring a fair comparison among platforms.

[1] Saleh Baghersalimi, Alireza Amirshahi, Tomas Teijeiro, Amir Aminifar, and David Atienza. 2023. Layer-Wise Learning Framework for Efficient DNN Deployment in Biomedical Wearable Systems. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN). IEEE, Boston, MA, USA, 1-4.

[2] Catalina Gomez, Pablo Arbelaez, Miguel Navarrete, Catalina Alvarado-Rojas, Michel Le Van Quyen, and Mario Valderrama. 2020. Automatic seizure detection based on imaged-EEG signals through fully convolutional networks. Scientific Reports 10 (12 2020).