1. Introduction
The need for advanced physiological monitoring solutions has become increasingly apparent with the rising prevalence of chronic conditions such as cardiovascular diseases, diabetes, and stress-related disorders. The World Health Organization (WHO) [1] states that work-related stress is a major problem affecting the labor market worldwide. Work-related stress is the body’s response to workplace demands and pressures that exceed an individual’s capacity to cope. Stress, along with depression and anxiety, is the second most common workplace health problem in Europe, leading to increasing sick leave. A survey by the European Agency for Safety and Health at Work revealed that 51% of Europe’s workers find stress “commonplace” in their workplace [2]. Furthermore, 66% of European employees report experiencing unhealthy levels of work-related stress [3]. The European Trade Union Institute (ETUI) estimates that work-related stress costs the EU over €100 billion annually [4,5]. This includes healthcare costs, lost productivity, and absenteeism due to stress-induced illnesses [4]. Conventional monitoring approaches often fail to provide continuous assessment, timely interventions, and personalized insights. Traditional physiological monitoring systems typically rely on specialized medical equipment, limiting their accessibility and continuous application. AI-driven embedded systems address these limitations by enabling the intelligent processing of physiological signals, adaptive learning from user data, and context-aware health assessments.
Recent advancements in miniaturized sensors, low-power microprocessors, and efficient AI algorithms have accelerated the development of embedded physiological monitoring systems. Wearable sensors can monitor parameters such as blood pressure and support the management of conditions including epilepsy, diabetes, and cardiac and gastrointestinal disorders. Vital signs can now be monitored using sensors embedded in infusion pumps, chest bands, finger pulse oximeters [6,7], wrist-worn accelerometers (to measure movement in epilepsy patients) [8], and EEG sensors to measure brain activity. These technologies converge to create platforms capable of capturing, processing, and interpreting physiological signals with high fidelity while operating within the stringent constraints of embedded environments. The Internet of Medical Things (IoMT) paradigm has further enhanced these capabilities by establishing interconnected ecosystems of physiological monitoring devices that communicate with cloud services, healthcare providers, and other stakeholders [9].
Despite significant progress, the implementation of AI in embedded systems for physiological monitoring faces numerous challenges. These include hardware limitations, power consumption constraints, signal integrity issues, algorithmic complexity, and concerns related to privacy and security. Additionally, the accuracy and reliability of AI models in diverse real-world scenarios remain critical considerations for clinical adoption and user acceptance.
This paper presents a comprehensive methodology for the practical implementation of real-time emotion recognition on resource-constrained microcontrollers using physiological signals. Our main contribution is a structured workflow that systematically addresses the critical trade-off between model performance and the tight memory constraints of embedded hardware. The proposed solution uniquely integrates a multi-step process that includes: (1) the benchmarking of various machine learning architectures to identify optimal candidates; (2) the application of model compression as a crucial step to ensure deployability on the target microcontroller platform; and (3) empirical validation of the performance of compressed models under real-world conditions. This work demonstrates a promising and structured way to integrate sophisticated AI functions into everyday devices to continuously and unobtrusively monitor physiological signals.
The paper is structured as follows: Section 2 reviews the state of the art in physiological monitoring and the challenges of deploying models on embedded systems. Section 3 details the methodology, covering the dataset, data preprocessing, feature engineering, model training, and implementation on the microcontroller. Section 4 discusses the user-study results. Finally, Section 5 concludes the paper with a summary of the key findings.
3. Materials and Methods
3.1. Dataset Description
The dataset used in this study originates from a prior experimental investigation that aimed to determine whether physiological signals can reflect consistent patterns corresponding to different emotional states experienced by an individual. Specifically, the research focused on identifying recurring physiological signal patterns associated with a set of eight states: seven specific emotions (anger, hatred, sadness, platonic love, romantic love, joy, respect) and one neutral/baseline state (no emotion).
The dataset generated and analyzed during the current study is not publicly available but is available from the corresponding author on reasonable request. All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all subjects involved in the study. Prior to their participation, all individuals were provided with a detailed explanation of the study’s objectives, procedures, and the nature of the data being collected. All participants provided verbal consent to participate.
The dataset comprises 160 individual recordings, capturing more than 5,760,000 heartbeats in total. These data were collected over a 20-day period. During each experimental session, 30 min of physiological data were continuously recorded, allowing for the detailed temporal analysis of heart-related signals across varying emotional conditions.
The data were collected from 12 healthy individuals (see Table 3). The sample was composed of five female participants, aged between 25 and 35 years, with a mean age of 29.6 years, and seven male participants, aged between 24 and 36 years, with a mean age of 30 years. Each subject participated in a single experimental session, during which they remained seated while watching and listening to video material designed to elicit specific emotional responses.
Figure 2 presents an example of a photoplethysmography (PPG) signal from the first subject of the first dataset, illustrating the nature of raw physiological data and the typical artifacts encountered in real-world acquisition scenarios. In the top panel, a 500 s segment of the PPG signal is shown. This longer recording period demonstrates the presence of low-frequency baseline drift, which is typically caused by human movement or varying sensor–skin contact pressure. The bottom panel shows a zoomed-in 4 s interval of the same signal. In this magnified view, high-frequency noise components become clearly visible.
3.2. Parameter Analysis
Physiological signal parameters can be classified into two main groups: biomedically significant parameters and signal quality parameters. Biomedically significant parameters are those directly related to human physiology and may be used to assess specific health conditions, emotional states, or other organism-related aspects. Signal quality parameters, on the other hand, are intended to evaluate the integrity, stability, and noise level of the physiological signal, all of which can significantly impact the accuracy and reliability of subsequent data analysis.
Several key physiological parameters can be derived from the PPG signal. Figure 3 illustrates two photoplethysmography (PPG) signal pulses, highlighting three key measurable parameters. These three primary parameters provide a basis for calculating the additional physiological parameters mentioned earlier.
To maintain the focus of this research work on machine learning optimization and embedded hardware deployment, the standard mathematical formulas for these physiological parameters (e.g., calculating mean blood pressure from systolic and diastolic peaks) are omitted. For this study, the parameter values were statistically aggregated (e.g., averaged) over the epochs corresponding to the duration of the specific emotional stimuli. While this approach captures the sustained physiological state associated with the target emotion, the authors acknowledge that the length of the signal window plays a critical role in emotion recognition. Short windows are highly sensitive to noise and motion artifacts, whereas excessively long windows may fail to capture transient emotional spikes. Optimizing the temporal window size for feature extraction was outside the scope of this hardware-focused study, but it remains a crucial direction for future work to further enhance the system’s sensitivity.
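The epoch-level aggregation described above can be sketched as follows; this is a minimal illustration, assuming one feature vector per detected heartbeat, and the epoch length is a placeholder rather than the value used in the study:

```python
import numpy as np

def aggregate_epochs(beat_features: np.ndarray, beats_per_epoch: int) -> np.ndarray:
    """Average per-beat feature vectors over fixed-size epochs.

    beat_features: shape (n_beats, n_features), one row per heartbeat.
    Trailing beats that do not fill a complete epoch are discarded.
    """
    n_epochs = len(beat_features) // beats_per_epoch
    trimmed = beat_features[: n_epochs * beats_per_epoch]
    return trimmed.reshape(n_epochs, beats_per_epoch, -1).mean(axis=1)

# Example: 10 beats with 3 features each, aggregated over epochs of 4 beats
feats = np.arange(30, dtype=float).reshape(10, 3)
epochs = aggregate_epochs(feats, beats_per_epoch=4)  # -> shape (2, 3)
```

In practice, `beats_per_epoch` would be derived from the duration of each emotional stimulus rather than fixed in advance.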
3.3. Data Preparation
Before model training, the essential preprocessing steps included organizing the datasets, removing signal noise, and eliminating corrupted components. This ensured that only high-quality, analyzable signals were used for further processing and parameter extraction.
Digital filtering techniques were used to reduce unwanted signal components and improve signal quality. In this work, a Butterworth Infinite Impulse Response (IIR) filter was selected due to its computational efficiency. Compared to Finite Impulse Response (FIR) filters, the IIR filter requires fewer coefficients to achieve an equivalent output response, which is particularly important in systems where computational resources are limited.
High-pass filtering was applied with a cutoff frequency of 0.5 Hz to remove baseline drift caused by low-frequency respiratory movements and blood flow artifacts. Although noise components generally lie within the 0 to 0.67 Hz frequency band, a 0.5 Hz cutoff was chosen to retain potentially useful signal components near 0.6 Hz, balancing noise suppression with signal preservation.
A 3rd-order low-pass filter with a cutoff frequency of 4 Hz was also implemented. Its main purpose was to suppress motion artifacts and eliminate the 50 Hz interference typically emitted by electrical devices. The attenuation characteristics of this filter were chosen to preserve the main physiological components of the signal, which are typically found between 0.6 Hz and 2 Hz. In addition, the filter parameters should ideally be adjusted to consider individual characteristics such as age, weight, height, and acute medical conditions, as these factors can significantly affect heart rate and signal morphology.
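The filtering stage described above can be sketched with SciPy. The filter order and cutoffs follow the text (3rd-order Butterworth, 0.5 Hz high-pass, 4 Hz low-pass); the sampling rate is an assumed placeholder, as it is not specified here:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250.0  # assumed PPG sampling rate in Hz (not specified in the text)

# 3rd-order Butterworth IIR filters: 0.5 Hz high-pass to remove baseline
# drift, 4 Hz low-pass to suppress motion artifacts and 50 Hz interference.
b_hp, a_hp = butter(3, 0.5, btype="highpass", fs=FS)
b_lp, a_lp = butter(3, 4.0, btype="lowpass", fs=FS)

def preprocess_ppg(raw: np.ndarray) -> np.ndarray:
    """Apply zero-phase filtering so systolic peak timing is not shifted."""
    return filtfilt(b_lp, a_lp, filtfilt(b_hp, a_hp, raw))

# Synthetic check: 1.2 Hz "pulse" + slow baseline drift + 50 Hz interference
t = np.arange(0, 10, 1 / FS)
raw = (np.sin(2 * np.pi * 1.2 * t)
       + 2.0 * np.sin(2 * np.pi * 0.05 * t)
       + 0.5 * np.sin(2 * np.pi * 50 * t))
clean = preprocess_ppg(raw)
```

Zero-phase filtering (`filtfilt`) is used in this sketch to avoid phase distortion of the peak locations; a causal single-pass filter would be used instead for on-device real-time processing.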
After filtering, the PPG signal was inverted so that its peaks correspond to the systolic blood pressure maxima. This step facilitates the accurate detection of peak values using peak detection algorithms. The first step of parameter extraction involved the detection of systolic (SBP) and diastolic (DBP) blood pressure peaks. Based on these initial parameters, the other required parameters were subsequently calculated.
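The peak detection step can be illustrated with SciPy’s general-purpose peak finder; the minimum inter-peak distance and prominence threshold below are illustrative assumptions, not the study’s exact configuration:

```python
import numpy as np
from scipy.signal import find_peaks

FS = 250.0  # assumed sampling rate in Hz

def detect_systolic_peaks(ppg: np.ndarray, fs: float = FS) -> np.ndarray:
    """Return sample indices of systolic peaks in an inverted, filtered PPG signal.

    The 0.4 s minimum distance (i.e., at most ~150 bpm) and the prominence
    threshold are illustrative placeholder values.
    """
    min_distance = int(0.4 * fs)
    peaks, _ = find_peaks(ppg, distance=min_distance, prominence=0.3)
    return peaks

# Synthetic 1.2 Hz pulse train over 10 s: expect 12 systolic peaks
t = np.arange(0, 10, 1 / FS)
signal = np.sin(2 * np.pi * 1.2 * t)
peaks = detect_systolic_peaks(signal)
rr_intervals = np.diff(peaks) / FS  # RR intervals in seconds
```

The RR intervals obtained this way feed directly into the HRV-related parameters discussed later.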
Figure 4 shows the points where DBP and SBP peaks are detected within the PPG signal sourced from the dataset. It is evident that the signal components were filtered, as the significant distortions present in Figure 2 are no longer observable.
Following initial data preparation, normalization was performed. This is a data processing technique aimed at rescaling the data so that different features (attributes) are on a similar scale or contribute more equally. Normalization is particularly important for many machine learning algorithms, as some are sensitive to variations in feature scales; this process mitigates such influence.
For the normalization process, the StandardScaler from the Scikit-learn v1.2 library was used. StandardScaler standardizes data by transforming them into Z-scores, following a standard normal distribution: the mean of each feature becomes 0, and its standard deviation becomes 1. This transformation is performed according to the formula

z = (x − μ) / σ,

where x is the original feature value, μ is the mean of the feature, and σ is the standard deviation of the feature.
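A minimal example of this step with the library named above, using a toy feature matrix in place of the extracted physiological parameters:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: rows = epochs, columns = physiological features
# (e.g., mean RR interval in s, heart rate in bpm) on very different scales.
X = np.array([[0.80, 72.0],
              [0.85, 68.0],
              [0.78, 75.0],
              [0.90, 64.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # z = (x - mu) / sigma, per column

# After scaling, each column has mean ~0 and standard deviation ~1.
```

Note that in a cross-validated setup the scaler should be fit on the training folds only and then applied to the held-out fold, to avoid information leakage.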
Normalization reduces differences in feature scales, which can help improve model performance, reduce training time, and make results more interpretable. Furthermore, it is particularly crucial when an algorithm relies on feature distances (e.g., k-means clustering or the K-Nearest Neighbors algorithm), as these algorithms are highly sensitive to variations in feature scales.
In the next step, cross-validation was used to assess the generalization abilities of models when they are applied to new data. The dataset was partitioned into N distinct subsets, commonly referred to as folds. The value of N was predetermined based on the size and characteristics of the dataset. In each iteration of the cross-validation process, the model was trained on N − 1 folds and tested on the remaining fold. This procedure was repeated N times, ensuring that each fold was used exactly once as the test set.
For each iteration, the performance metrics—in particular classification accuracy, precision, recall, and F1-score—were computed. After all iterations, the metric values were averaged to produce a single performance estimate that reflects the model’s ability to generalize across the entire dataset. This methodology can be mathematically described as

M̄ = (1/N) ∑_{i=1}^{N} M_i,

where M̄ is the average performance metric (e.g., accuracy); N is the number of folds; and M_i is the metric obtained in the i-th iteration.
Using cross-validation instead of a single train–test split ensures that all data points are used for both training and validation, which is particularly important in studies with limited sample sizes, as is often the case in physiological signal analysis. This helps minimize variance in performance estimation and supports more reliable conclusions regarding the model’s effectiveness.
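The fold-and-average procedure can be sketched as follows; the classifier and the fold count are placeholders (the text leaves N dataset-dependent), and the synthetic data stands in for the physiological feature matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the epoch-level feature matrix and emotion labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

N = 5  # number of folds (placeholder; chosen per dataset in the study)
kf = StratifiedKFold(n_splits=N, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kf.split(X, y):
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Averaged metric, as in the formula above: each fold serves as the test set once
mean_accuracy = np.mean(fold_scores)
```

Precision, recall, and F1-score would be accumulated per fold in the same way, e.g. via `sklearn.metrics.precision_recall_fscore_support`.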
3.4. Parameter Importance Evaluation
In this study, the importance of different parameters was analyzed using Random Forest regressors for each individual emotion. During this process, a Random Forest model was trained to predict emotional states based on the provided physiological data. This method enabled the identification of the most relevant features for each specific emotion and provided insights into their utility in emotion recognition systems.
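A sketch of this per-emotion importance analysis with scikit-learn, using synthetic data in place of the physiological features; the feature names and target construction are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["rr_interval", "sd", "hr", "hrv", "sbp", "dbp", "snr"]  # illustrative
X = rng.normal(size=(300, len(feature_names)))
# Synthetic per-emotion target driven mainly by the first two features
y_anger = 0.8 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

# One Random Forest regressor per emotion, as described in the text
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y_anger)

# Impurity-based importances sum to 1; report them as percentages per feature
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda p: p[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance * 100:.1f}%")
```

Repeating this fit once per emotion yields the per-emotion rankings summarized later in Table 4.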
The examination of the most significant parameter for each emotion revealed that the physiological differences among emotional states are diverse and complex. This finding contributes to a deeper understanding of the physiological underpinnings of emotions and their recognition. Notably, the standard deviation and RR interval emerged as highly influential parameters for most emotions, indicating that these two physiological markers are particularly critical for accurately distinguishing and interpreting different emotional states (see Figure 5).
In addition, heart rate variability (HRV) and heart rate (HR) themselves demonstrated a substantial relevance for certain emotions, making them important criteria for identifying these emotional states. These metrics provide valuable temporal insights into autonomic nervous system dynamics, which are closely linked to emotional arousal and regulation.
On the other hand, the blood pressure-related parameters—such as systolic, diastolic, and mean arterial pressure—were found to have a lower overall importance in the emotion recognition process. However, for certain emotions, these parameters may still carry a degree of predictive value. This suggests that, although blood pressure is physiologically associated with emotional responses, its influence on recognition accuracy may be more limited compared to other features such as HRV or RR interval.
The analysis presented in Table 4 highlights the most significant physiological features and their corresponding importance percentages for each emotion. Notably, the standard deviation and RR interval consistently emerged as key indicators across multiple emotional states. For instance, the RR interval was the most important parameter for emotions such as anger (16.6%), platonic love (16.2%), romantic love (17.2%), joy (19.9%), and respect (21.5%), suggesting that heart rate dynamics play a crucial role in distinguishing these emotions. Similarly, the standard deviation was most relevant for the “no emotion” state (18.3%) and hatred (15.7%), indicating the importance of signal variability. Additionally, sadness was the most strongly associated with the signal-to-noise ratio (16.1%), underscoring the relevance of signal quality. These findings emphasize that, while some parameters are broadly informative, the emotion recognition performance can be optimized by tailoring the feature sets to each specific emotion.
The dominance of the RR interval and the standard deviation (SD) as the most discriminative features is strongly supported by physiological and signal processing principles. Physiologically, the SD of the RR intervals is a primary time-domain measure of heart rate variability (HRV). HRV is directly modulated by the autonomic nervous system (ANS). High-arousal emotional states (such as anger or joy) trigger sympathetic nervous system activity, which typically shortens the RR interval and decreases the overall standard deviation of the rhythm.
Furthermore, from a signal processing perspective, these parameters exhibit high robustness against the specific noise profiles of a mouse-embedded PPG sensor. Real-world mouse usage introduces significant motion artifacts and variable skin contact pressure, which severely distort the amplitude of the PPG signal. Because the RR interval and SD are timing-based features (relying solely on the temporal detection of systolic peaks rather than absolute signal amplitude), they remain relatively stable even when the signal’s baseline wanders or its amplitude fluctuates. This explains why amplitude-dependent features, such as estimated systolic and diastolic blood pressure, ranked substantially lower in importance compared to robust temporal metrics.
3.5. Model Development
In this study, several machine learning algorithms—Random Forest, SVM, KNN, MLP, RNN, LSTM, and CNN—were applied for emotion recognition. For each algorithm, a separate model was developed and trained using both a maximal set of parameters and a reduced, optimal subset. The primary focus of model optimization was on adjusting the architecture and tuning the hyperparameters, including the learning rate and regularization coefficient. Hyperparameter optimization was carried out using grid search, random search, and Bayesian optimization methods, aiming to enhance the models’ performance for each algorithm.
To ensure the robustness and reproducibility of the models, a rigorous training protocol was implemented. The dataset was evaluated using a k-fold cross-validation strategy to avoid overfitting and ensure robust generalization. For deep learning architectures (MLP, CNN, LSTM, and RNN), given the multi-class nature of the emotion recognition task (eight different states), categorical cross-entropy was strictly applied as the loss function. Hyperparameter optimization (using grid search, random search, and Bayesian methods) yielded specific configurations for each architecture. The models were built using the Adam and RMSprop optimizers with learning rates ranging from 0.001 to 0.01. Specifically, the MLP model was trained for 200 epochs with a learning rate of 0.001. The CNN model was trained for 100 epochs using a batch size of 32 and a learning rate of 0.01, and including L1/L2 regularization (0.01) to reduce overfitting. The recurrent models (LSTM and RNN) used a time step sequence length of 8 and included dropout layers (set to 0.5) as an additional structural regularization measure. These precise configurations were chosen to balance classification accuracy with the tight computational and memory constraints of the target STM32F411 microcontroller.
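A minimal Keras sketch of the recurrent configuration described above (time-step length 8, dropout 0.5, categorical cross-entropy, Adam with a learning rate of 0.001); the hidden-layer width and the feature-vector size are assumptions, as the text does not specify them:

```python
import tensorflow as tf

N_FEATURES = 10   # assumed size of the physiological feature vector
N_CLASSES = 8     # seven emotions plus the neutral state
TIME_STEPS = 8    # sequence length, as stated in the text

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIME_STEPS, N_FEATURES)),
    tf.keras.layers.LSTM(32),          # hidden width is an assumption
    tf.keras.layers.Dropout(0.5),      # structural regularization, as in the text
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",   # multi-class loss with one-hot targets
    metrics=["accuracy"],
)
```

The RNN variant replaces the `LSTM` layer with `SimpleRNN`; the MLP and CNN configurations differ in layer types, epochs, and regularization as detailed above.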
The post-training comparison of model accuracy revealed a varying sensitivity to parameter optimization (see Figure 6). The Random Forest (RF1) model slightly improved from 86.5% to 86.7%, indicating robustness and low overfitting. RF2 demonstrated a significant accuracy increase from 48.3% to 57.0%, showing a strong dependence on parameter tuning. The KNN model maintained a stable 88.8% accuracy, regardless of parameter count. The MLP saw a minor decrease from 71.6% to 70.9%, suggesting moderate stability. LSTM accuracy declined from 79.7% to 74.8% when optimized, suggesting that the original, more complex configuration performed better. CNN and RNN models improved notably—from 57.0% to 61.2% and from 50.0% to 58.4%, respectively—demonstrating a reliance on proper parameter configuration. The SVM model slightly decreased from 12.0% to 11.0% and remained the least accurate overall.
To gain deeper insights into the classification dynamics and to evaluate the trade-offs between theoretical accuracy and embedded feasibility, confusion matrices were generated for the two most significant optimized models (Figure 7). The left matrix illustrates the performance of the K-Nearest Neighbors (KNN) model, which achieved the highest overall accuracy (88.8%) during initial benchmarking. As can be seen in the matrix, the KNN model successfully isolates the emotional states with minimal cross-talk, demonstrating strong diagonal alignment and only minor confusion between adjacent classes. However, due to its massive memory footprint (exceeding 134 MB), this model represents a theoretical upper bound rather than a practical embedded solution.
Conversely, the right matrix illustrates the performance of the optimized LSTM model, which was selected for the final embedded deployment due to its superior compression capabilities. While the LSTM matrix still demonstrates solid diagonal alignment, it reveals the realistic challenges of physiological signal classification. The model successfully distinguishes pronounced high-arousal emotions but exhibits noticeable cross-talk between specific affective states (e.g., instances of Class 6 being confused with Class 7).
3.6. Embedded Implementation of Machine Learning Models
This section presents the deployment of trained models on an STM32F411 microcontroller. A photoplethysmography (PPG) heart rate sensor is attached to a computer mouse to measure the user’s pulse through the thumb. The analog signal is processed by the microcontroller, which extracts features and performs emotion recognition. The resulting classification output is then transmitted to a computer via USB. The block diagram (Figure 8) illustrates the system architecture and signal flow used for model validation and real-time inference.
Due to the limited memory capacity of the selected STM32F411 microcontroller (Flash—512 KB, RAM—128 KB), the RF, KNN, and SVM models were discarded because their coefficient arrays exceeded the available memory, with sizes of 4 MB, 134.9 MB, and 34.5 MB, respectively.
Despite the exclusion of RF, KNN, and SVM models due to excessive coefficient array sizes, the remaining models (MLP, CNN, LSTM, and RNN) still exceeded the STM32F411 microcontroller’s memory limits, requiring model compression.
Figure 9 illustrates how memory usage varies depending on the applied compression level. A notable decrease in memory demand occurs when shifting from lossless to low compression. With low compression, memory usage was reduced to 196.58 KB for MLP, 258.54 KB for CNN, 639.84 KB for LSTM, and 1009.83 KB for RNN. However, LSTM and RNN still exceeded the 512 KB flash memory limit by 1.25× and 1.97×, respectively, and therefore required medium-level compression. Under high compression, the models consumed 100.78 KB (MLP), 236.67 KB (CNN), 411.62 KB (LSTM), and 403.66 KB (RNN), corresponding to approximately 19.7%, 46.2%, 80.4%, and 78.8% of the available flash memory. This demonstrates that compression is essential to enable model deployment on the microcontroller, and the required compression level depends on the specific model and the hardware constraints.
The LSTM model was made memory-feasible through targeted structural optimization. While the initial uncompressed LSTM model (3.1 MB) exceeded the hardware limits, its parametric nature allowed for aggressive quantization. As shown in Figure 9, by applying medium/high compression, the LSTM footprint was reduced to 411.62 KB. This footprint occupies approximately 80.4% of the available microcontroller Flash memory, leaving sufficient space for the application logic and signal processing stacks. In contrast, the KNN model is inherently non-parametric and requires the storage of the entire training dataset to perform inference. In our case, this resulted in a model size exceeding 134.9 MB, which is approximately 263 times larger than the total Flash memory of the STM32F411 microcontroller (512 KB). Quantizing or compressing a KNN model is not feasible without drastically reducing the reference dataset, which would negate its accuracy advantage.
The model compression pipeline was implemented using the TensorFlow Lite (TFLite) converter framework. This process involves translating the high-level Keras models into an optimized flatbuffer format, which includes dead-code elimination, operator fusion, and graph optimizations specifically tailored for embedded ARM Cortex-M environments. To evaluate the impact of bit-precision on memory and performance, four distinct compression levels were defined:
Lossless: Standard conversion without additional optimization parameters, maintaining original floating-point precision.
Low: Post-training dynamic range quantization, which quantizes weights to 8-bit integers while keeping activations in floating-point during inference.
Medium: Float16 quantization, reducing the precision of all weights and constants from 32-bit to 16-bit floating-point values.
High: Full integer quantization, which maps all model tensors (including input and output) to 8-bit integer precision. This compression level utilized a representative dataset to calibrate the dynamic ranges of activations.
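The four levels map onto standard TensorFlow Lite converter settings roughly as follows; this is a sketch, since the study’s exact converter flags are not given:

```python
import tensorflow as tf

def convert(model: tf.keras.Model, level: str, rep_data=None) -> bytes:
    """Convert a Keras model to a TFLite flatbuffer at a given compression level."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    if level == "low":
        # Dynamic-range quantization: int8 weights, float activations at inference
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    elif level == "medium":
        # Float16 quantization of all weights and constants
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
    elif level == "high":
        # Full integer quantization, calibrated on a representative dataset
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.representative_dataset = rep_data
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.int8
        converter.inference_output_type = tf.int8
    # "lossless": plain conversion, float32 precision kept as-is
    return converter.convert()
```

For the “high” level, `rep_data` must be a generator yielding batches of typical input signals so that the converter can calibrate the activation ranges; the resulting flatbuffer is what is ultimately flashed to the microcontroller.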
Figure 10 illustrates the changes in model accuracy as a function of the applied compression level. A slight decrease in accuracy is observed across all models as compression increases.
The observed accuracy variations across different compression levels provide a critical measure of the model’s structural robustness. Specifically, the optimal LSTM model exhibited a marginal decrease in accuracy, dropping from 79.7% (uncompressed) to 79.0% (medium compression). This represents a mere 0.88% relative loss in predictive power, while simultaneously achieving a 7.57-fold reduction in the model size. Such a minor change falls within the expected variability for physiological signal processing, where the inherent noise floor of the PPG signal and the variance within the cross-validation folds typically exceed this percentage. Furthermore, this trade-off is highly favorable for Edge-AI applications: the negligible loss in accuracy is a necessary and justified compromise to overcome the binary constraint of the STM32F411’s 512 KB Flash memory. Consequently, the compressed LSTM model remains largely comparable to its original version in practical use in real-world emotion recognition tasks, while enabling local, real-time inference on low-cost hardware.
4. Experiments and Results
The aim of the experiment was to recognize human emotions using the machine learning model that demonstrated the highest accuracy in prior testing. The experiment involved presenting video clips intended to induce the following emotional states: no emotion, anger, hatred, sadness, platonic love, romantic love, joy, and respect.
A total of four participants took part in the study—two women and two men (see Table 5). Each participant performed the same activity: while seated and holding a computer mouse embedded with a PPG (photoplethysmography) sensor, they watched video sequences specifically selected to evoke the target emotions. Each video lasted approximately five minutes, with a three-minute rest period between clips to minimize emotional overlap.
The experiment assessed the model’s average emotion recognition performance based on physiological signals captured during the video sessions. The results presented in Figure 11 reflect the mean classification accuracy achieved across all participants and emotional categories.
The experiment evaluated the performance of an LSTM model in recognizing eight distinct emotions based on physiological responses from the participants. The analysis revealed that hatred was the most accurately recognized emotion across all participants, with accuracy ranging from 84% to 91%. Anger also showed consistently high recognition rates (80–85%), making these two emotions the most reliably detected by the model. At the other end of the spectrum, platonic love, joy, and romantic love were among the least accurately recognized emotions. For example, Participant 1 showed only a 50% accuracy for platonic love, while Participant 3 achieved just 56% for joy and 58% for romantic love.
Comparing participants, Participant 4 demonstrated the most stable performance, with high accuracy in hatred (91%), anger (80%), and romantic love (88%). Participant 1 also performed well for hatred (90%) and anger (84%) but had significant drops for more subtle emotions like platonic love. Participant 2 showed similar trends, with high accuracy for hatred (84%) but notable difficulties with romantic love (65%).
The stark contrast in recognition rates between high-intensity emotions and subtle affective states is unlikely to stem from a severely imbalanced training dataset, as the stimulus epochs were proportionately distributed. Instead, this disparity underscores a fundamental physiological limitation of relying solely on PPG-derived features. From a psychophysiological perspective, emotions like hatred and anger are characterized by high arousal, triggering distinct and immediate sympathetic nervous system responses (“fight-or-flight”), which drastically alter the RR intervals and heart rate variability (HRV). Conversely, subtle states like platonic love or the resting baseline represent low-arousal conditions. The physiological responses during these states are often indistinguishable from one another using only peripheral cardiovascular metrics. Distinguishing such nuanced emotions inherently requires either more sensitive multi-modal sensing (e.g., EEG or galvanic skin response) or significantly longer observational windows to capture subtle autonomic shifts.
Overall, the LSTM model was effective in recognizing strong, physiologically pronounced emotions such as hatred and anger, but its performance declined with more nuanced affective states, particularly those related to affection or positive social bonding. This highlights areas for model refinement, especially in enhancing sensitivity to subtler emotional patterns.
5. Conclusions
The experiments highlight the importance of parameter optimization in machine learning models, with performance varying significantly across architectures. RF and CNN models achieved accuracy improvements (up to 8.7% for RF and 4.2% for CNN) when fine-tuned, while KNN maintained stable performance (88.8%) regardless of parameter adjustments. In contrast, SVM proved impractical due to excessive training time and poor accuracy (11–12%). These findings suggest that, while some models benefit from parameter tuning, others, like KNN, offer inherent stability, making them preferable for certain applications despite the computational trade-offs.
The initial model sizes significantly exceeded the typical memory constraints of microcontrollers—for instance, even the smallest Random Forest model required around 4 MB, while the KNN model exceeded 134 MB, far surpassing the common 512 KB limit. This highlighted the necessity of compression to enable deployment in embedded systems. Despite a noticeable reduction in accuracy for MLP, CNN, LSTM, and RNN models under higher compression levels, all models remained functionally effective. Notably, the LSTM model consistently achieved the highest accuracy across all compression stages, demonstrating its robustness and making it a strong candidate for real-time emotion recognition tasks on resource-constrained devices.
Emotion-specific analysis revealed disparities in recognition performance, with a high accuracy for distinct emotions like hatred (91%) and anger (85%) but lower detection rates for subtle states such as platonic love (50%). This variability suggests that expanding training datasets and refining model architectures could improve the detection of complex affective states, including stress. While preliminary results validate the feasibility of physiology-based emotion recognition, further research is needed to enhance generalization across diverse emotional spectra. The study underscores the potential of AI in affective computing while highlighting the necessity of balancing computational efficiency, memory constraints, and predictive accuracy for real-world implementation.
While the proposed embedded ML models demonstrate promising accuracy, it is important to acknowledge the limitations of this study. The dataset was collected from 12 participants within a specific age group (24–36 years), which introduces a potential risk of bias toward this group. However, the risk of overfitting was actively mitigated through several strategies. First, the longitudinal nature of the data collection (over 20 days, yielding 5.7 million heartbeats) ensured high intra-subject variability. Second, rigorous feature selection reduced input dimensionality. Most importantly, the strict memory constraints of the STM32F411 microcontroller necessitated severe model compression. This compression inherently acts as a strong regularization technique, preventing the models from having the parameter capacity to overfit the training data. Therefore, this work serves primarily as a foundational proof-of-concept for embedded Edge-AI implementation in physiological monitoring. Future studies will require larger, more diverse cohorts to eliminate bias and further validate the generalizability of these highly compressed embedded models.