Review

Scoping Review of ML Approaches in Anxiety Detection from In-Lab to In-the-Wild

by Maxine He 1,†, Abdulrahman Alkurdi 2,†, Jean L. Clore 3, Richard B. Sowers 4, Elizabeth T. Hsiao-Wecksler 1,2,5 and Manuel E. Hernandez 1,5,6,7,8,*
1 Neuroscience Program, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
2 Department of Mechanical Science and Engineering, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
3 Department of Psychiatry and Behavioral Medicine, University of Illinois College of Medicine Peoria, University of Illinois Chicago (UIC), Peoria, IL 61805, USA
4 Department of Industrial and Enterprise Systems Engineering, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
5 Department of Biomedical and Translational Sciences, Carle Illinois College of Medicine, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
6 Department of Health and Kinesiology, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
7 Department of Bioengineering, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
8 Beckman Institute, University of Illinois Urbana-Champaign (UIUC), Urbana, IL 61801, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(18), 10099; https://doi.org/10.3390/app151810099
Submission received: 24 July 2025 / Revised: 4 September 2025 / Accepted: 9 September 2025 / Published: 16 September 2025
(This article belongs to the Special Issue Application of Deep Learning and Big Data Processing)

Abstract

The field of anxiety detection using machine learning (ML) has experienced rapid growth, necessitating an updated review of commonly used ML models and their performance, anxiety-inducing methodologies, data collection conditions, and dataset utilization. Feature-based ML models have been extensively employed due to their interpretability and simplicity. However, these models require manual feature engineering, which can be labor-intensive and potentially biased. End-to-end deep learning models have emerged as alternatives, capable of utilizing raw signals directly and handling large datasets. This review aims to provide a detailed exploration of anxiety detection using ML, including the use of feature-based vs. end-to-end models, a taxonomy of stressors, performance benchmarks, challenges in deployment to real-world scenarios, and the generalizability of findings, given limitations in sociodemographic diversity and heterogeneity in the use of validated anxiety measures. A total of 105 eligible papers were retrieved from the Scopus, IEEE Xplore, and PubMed databases. Stressors were categorized into six distinct types—social, mental, physical, emotional, driving, and daily-life stressors—to provide a better overview of methodologies used to elicit anxiety. Papers were organized according to the type of data collection—lab-based or real-world conditions—and characterized by the type of anxiety instrument used, population examined, and classification performance. This review underscores the need for further investigation into model architectures and their suitability for different types of data, highlights limitations in population diversity and representation in existing studies, and advocates for a more nuanced and personalized approach to anxiety detection using machine learning.

1. Introduction

Mental health includes cognitive wellness and the ability to cope with everyday adverse events. An appropriate amount of anxiety (i.e., worrying, rumination, or fear of future threats) is essential for survival and motivation. Prolonged and heightened anxiety, however, can lead to debilitating anxiety disorders [1]. The global COVID-19 pandemic significantly worsened quality of life, with isolation, quarantine, and social avoidance further exacerbating mental health issues, leading to increased reports of stress, anxiety, and depression [2]. Anxiety disorders, particularly, have become a major global health challenge due to their high prevalence and substantial economic impact, yet access to mental health services remains limited [3,4]. Anxiety disorders are also associated with increased physical health conditions like cardiovascular diseases and weakened immunity, which could further impair decision making and cognitive performance [5,6]. These issues highlight the need for the development of effective diagnostic and therapeutic tools to improve treatment access and support mental health care systems [7,8].
The detection and monitoring of anxiety through machine learning (ML) has become an increasingly crucial area of research in affective computing. Significant advancements have been made with traditional feature-based (FB) models. Such models are popular for their interpretability and ease of implementation. They rely on manually engineered features derived from physiological or behavioral signals, enabling domain knowledge to play a critical role in their performance. However, this manual feature engineering process can be labor-intensive and may introduce biases or fail to capture subtle, complex patterns within the data [9,10].
End-to-end (E2E) approaches eliminate the need for explicit feature extraction by learning representations directly from raw data. These models, typically deep learning architectures, excel in handling complex, nonlinear relationships and temporal dependencies in time-series signals [11,12]. A comparative study [11] highlights the dependency of E2E models on dataset size and quality, while also noting that FB models retain their relevance in many affective computing applications. Despite their computational intensity and less transparent operation, E2E models have demonstrated comparable performance in anxiety-detection tasks.
As the volume of academic work in this field continues to grow, there is a pressing need to review existing model architectures, methodologies, data collection conditions, and dataset utilization. This review aims to provide a comprehensive overview of the current state of research on the application of ML to anxiety detection, a taxonomy of stressors, and performance benchmarks. This review also seeks to examine the challenges of deploying ML models in real-world scenarios and the generalizability of findings given limitations in sociodemographic diversity and heterogeneity in the use of validated anxiety measures. Lastly, this review explores future research directions for advancing the field toward robust ML models and personalized healthcare.

1.1. Defining Anxiety

Stress and anxiety are closely related concepts often used interchangeably in the literature, which can obscure definitions and reduce clarity. Stress is typically characterized as a current stimulus that triggers temporary physiological and psychological responses [13]. Anxiety, a negative affective and cognitive state that contributes to mental health disorders, consists of excessive worrying, rumination, or fear of future threats [14], as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [15]. Spielberger’s theory further differentiates anxiety into state and trait anxiety [13,16,17]. State anxiety refers to a temporary and transient negative emotional reaction accompanied by physiological responses to stressful stimuli. Trait anxiety, on the other hand, denotes an individual’s inherent tendency to experience persistent anxiety in response to future threats [18]. Thus, a focus on state anxiety allows early indicators of maladaptive responses to be identified.
While stress and anxiety share overlapping physiological responses mediated by the hypothalamic–pituitary–adrenal (HPA) axis and sympathetic nervous system (e.g., cortisol release, increased heart rate) [14,19], this review focuses on ‘state anxiety’ to describe the transient and stimuli-driven physiological responses. Many studies use the term ‘stress’ to describe these responses, while their investigations often align closely with what would be categorized as ‘state anxiety’, detecting physiological markers and psychological responses to transient threats [20,21,22,23,24]. While studies have used State–Trait Anxiety Inventory (STAI) [17] scores to confirm an emotional response to anxiety-inducing stimuli [20,21,25], validated measures are not always utilized. Furthermore, although the terms ‘stress’ and ‘stressors’ are sometimes used interchangeably to refer to both the cause and responses to adverse stimuli [26], we reserve the term ‘stressor’ specifically for the adverse stimuli presented to participants to evoke anxiety.
Throughout this review, we will use ‘anxiety’ as a simplified alternative term for state anxiety, capturing both subjective emotional states and physiological responses for understanding the impact of stressors. This approach, which distinguishes stressors (external triggers) from anxiety (cognitive, emotional, and physiological responses), aligns with Bystritsky and Kronemyer’s transdiagnostic model of the ‘stress/anxiety complex’ [26]. Here, stress and anxiety are conceptualized as interdependent phenomena on a continuum, where stressors elicit anxiety and anxiety exacerbates stress reactivity through shared neurobiological pathways (e.g., HPA axis dysregulation) [26]. While their framework underscores the interdependence of these phenomena, the persistent conflation of ‘stress’ and ‘anxiety’ in the literature necessitated the inclusion of ‘stress’ in our search strategy to ensure a comprehensive review of the current field.

1.2. Measuring Anxiety

Anxiety can be assessed across three interconnected domains: the subjective, behavioral, and physiological systems [27]. The subjective system captures an individual’s self-reported experiences of anxiety, often through questionnaires or clinical interviews. The behavioral system encompasses observable expressions, such as facial expressions, body language, and other non-verbal cues. Finally, the physiological system reflects the biological and autonomic processes underlying anxiety.
i.
Traditional Measures
Traditional anxiety assessment tools include standardized questionnaires such as the STAI [17] and the Beck Anxiety Inventory [28]. These subjective measures are straightforward to administer, though they are limited by recall bias, a systematic error due to inaccuracies in remembering past events, and by their inability to monitor changes in real time across different contexts [9,29]. Clinical interviews conducted by trained professionals include assessment of anxiety symptoms, history of mental illness, and exposure to stressful events. They offer a more precise diagnosis and identification of the causes of anxiety disorders than self-reports but are costly and time-consuming [30].
ii.
Behavioral Measures
Behavioral observations focus on non-verbal aspects like facial expressions and body language and are useful for those unable to complete questionnaires or discuss their emotions. However, these observations can be subjective and may not accurately reflect anxiety in individuals who may suppress their symptoms [31,32].
iii.
Physiological Measures
The exploration of anxiety detection has significantly benefited from physiological measures, notably through the analysis of brain and cardiac activities, electromyography (EMG), respiration, and electrodermal activity (EDA). Brain activity measurements such as electroencephalograms (EEGs), functional near-infrared spectroscopy, and magnetic resonance imaging offer insights into neurological responses to anxiety, despite challenges such as spatial resolution, susceptibility to noise, and cost [33]. Cardiac activity, particularly heart rate variability (HRV) interpolated from an electrocardiogram (ECG) or photoplethysmography (PPG), has been established as a robust indicator of anxiety, with advancements in wearable technologies enhancing real-time, non-invasive monitoring [34,35]. EMG [36] also provides peripheral insights into anxiety levels through muscle tension and autonomic nervous system activity [37,38,39]. EDA [40] and other measures, including pupil dilation [41] and skin temperature changes [42], have been incorporated into ML algorithms for anxiety detection, demonstrating the potential of non-invasive techniques [35,43]. Behavioral measures, leveraging observable physical and interactive markers, offer a complementary approach to understanding anxiety through the analysis of physical activity, voice characteristics, or other behavioral patterns, despite the need for adaptation to individual differences and real-world applicability [44,45]. These multifaceted approaches underscore the complexity of anxiety detection and the necessity for diverse measurement techniques to capture its nuanced manifestations.

1.3. Inducing Anxiety

Anxiety can be elicited in the laboratory by presenting different stressors. In-lab environments are specifically designed to conduct experiments under standardized conditions and allow researchers to carefully manipulate variables while minimizing noise and artifacts, external influences, and confounding factors. The consistent and controlled atmosphere enables the collection of high-quality data to ensure that any observed effects on anxiety are attributable to the experimental stressor rather than random variables. On the other hand, real-world conditions passively monitor participants’ responses with minimal intervention or control. Here we characterize these stressors into six main categories: social, mental, physical, emotional, driving, and daily-life stressors.
Social stressors, such as the Trier Social Stress Test (TSST) [46] are designed to induce anxiety through social evaluation and performance pressure such as public speaking or tasks involving public evaluation. Participants are asked to perform these tasks in front of an audience or evaluators, who often maintain neutral or critical expressions to trigger stress responses associated with social evaluative threat and fear of negative judgment.
Mental stressors typically involve tasks that demand high levels of cognitive processing, thus creating conditions of mental overload known to induce stress. Commonly used tasks include the Stroop Color–Word Test (SCWT), mental arithmetic, and n-back tasks. SCWT is a classic psychological task that exploits the Stroop effect [47] to evaluate cognitive interference, attention, and selective inhibition and to elicit state anxiety and acute stress responses [48]. SCWT typically involves congruent and incongruent conditions, in which participants are required to name the color of presented words. The word itself might be printed in the same (congruent) or a different color (incongruent; e.g., the word ‘blue’ printed in red font). In incongruent conditions, the conflict between the word’s meaning and the font color creates cognitive interference, requiring the brain to inhibit automatic responses, leading to increased mental effort and stress [47]. Mental arithmetic tasks are another common type of mental stressor. These tasks require quick mental calculations and induce anxiety by taxing cognitive resources, such as working memory and processing speed, as well as by causing frustration or stress due to their difficulty, the potential for errors, or time pressures [47]. The n-back working memory task is another popular mental task that requires participants to monitor a sequence of stimuli and indicate when the current stimulus matches the one presented n steps earlier [49]. This task increases in difficulty as the value of n increases, placing greater demands on the participant’s working memory and cognitive control.
Physical stressors, such as the cold pressor test [50], elicit anxiety responses through physical exertion, discomfort, or pain. During the cold pressor test, participants immerse their hand in ice-cold water for a set period, activating the sympathetic nervous system due to the discomfort associated with cold exposure.
Emotional stressors involve exposing participants to stressful images or videos, such as those from the International Affective Picture System (IAPS) [51]. These stimuli are designed to elicit negative emotional states like fear, disgust, or sadness for anxiety responses.
Driving stressors are listed as a separate category because driving encompasses a combination of cognitive, emotional, and physical demands that require attention, decision making, and coordination of motor movements [52,53]. It is also a common real-world activity, and responses induced in this context are highly relevant for studying anxiety detection in daily functioning [53]. Driving stressors usually involve simulated driving tasks that place participants in virtual or controlled real-world driving environments mimicking challenging or demanding driving conditions. These scenarios induce anxiety due to the need for constant attention, quick decision making, and the potential for simulated hazards, reflecting real-world stressors faced by drivers. Such stressors are particularly relevant for studying anxiety in contexts where performance and safety are critical.
Daily-life stressors refer to anxiety-inducing situations encountered in naturalistic settings, such as workplace challenges and academic examinations. There is a growing interest in real-world (‘in-the-wild’) conditions, where data collection occurs in naturalistic settings as participants conduct their routine daily activities with little or no interference or supervision. These scenarios provide a more ecologically valid context for studying anxiety [54,55].

1.4. Detecting Anxiety

ML-based anxiety-detection efforts predominantly utilize FB models, which require extensive domain-specific knowledge for manual feature extraction and engineering [12,44]. These models often depend on handcrafted features derived from physiological signals or behavioral data. The typical process involves the following steps: (1) data collection, where physiological signals are gathered from participants; (2) data preprocessing, which includes removing noise and artifacts; (3) feature extraction, which involves the derivation of relevant features; (4) feature selection using techniques to retain the most informative features; (5) model training, where selected features are used to train classification or prediction models, with optional hyperparameter fine-tuning to optimize performance; (6) model validation employing methods such as test datasets, k-fold cross-validation, or leave-one-participant-out validation; (7) model evaluation using performance metrics like accuracy, precision, recall, and F1-scores to assess effectiveness (Figure 1).
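To make these steps concrete, the following minimal sketch chains steps (3)–(6) with scikit-learn; the feature matrix, labels, participant groups, and parameter choices are illustrative placeholders rather than values from any reviewed study.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GroupKFold, cross_val_score

# Illustrative placeholders: each row is one window of physiological data,
# each column an engineered feature (e.g., HRV or EDA statistics).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))          # 200 windows x 40 engineered features
y = rng.integers(0, 2, size=200)        # binary labels: 0 = baseline, 1 = anxiety
groups = np.repeat(np.arange(20), 10)   # participant ID for each window

# Steps (4)-(5): feature selection and SVM training chained in one pipeline,
# so the selection is refit inside each training fold.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=15)),
    ("svm", SVC(kernel="rbf", C=1.0)),
])

# Step (6): participant-grouped cross-validation keeps all windows from one
# person in the same fold, approximating subject-independent evaluation.
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=groups)
print(f"Mean accuracy: {scores.mean():.2f}")
```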
FB models often allow easier interpretation of results and identification of features contributing the most to the classification or prediction, and experts with domain knowledge can incorporate features known to be relevant to improve the model’s performance. FB models generally require less computation power and can be suitable for small datasets. However, relying on feature extraction and selection could potentially introduce bias and information loss. For instance, important subtle indicators of anxiety might be overlooked if they are not part of the selected features. Generalizability may also be limited because handcrafted features may not generalize well to new datasets or populations, and FB models may struggle with the nonlinear nature and complexities in physiological signals. In contrast, the field has seen a significant shift toward E2E deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). These models can automatically learn hierarchical representations from raw data, reducing the need for manual feature extraction and potentially capturing more complex and subtle patterns associated with anxiety [12]. Advancements in computational power, the availability of large datasets, and improved algorithms have facilitated this transition, enabling models that can generalize better across different populations and settings.

2. Review of Anxiety Detection Using ML

2.1. Methods

The literature review methodology followed PRISMA scoping review guidelines and included a thorough search across the Scopus, IEEE Xplore, and PubMed databases, covering publications from 2010 to November 2024. The search was conducted using keyword combinations, such as ‘stress or anxiety or mental workload detection’ and ‘AI or machine learning or deep learning’ (Table 1). A systematic screening process was implemented, involving title screening, abstract evaluation, and full-text analysis. Inclusion criteria focused on original articles that used one or more stressors to induce anxiety and used physiological signals to measure anxiety. Review papers were excluded from consideration. No other exclusion criteria were applied. The review protocol was not registered.
The initial search yielded 3331 articles, i.e., 2061, 1103, and 167 articles from Scopus, IEEE Xplore, and PubMed, respectively (Figure 2). Title screening narrowed the field to 2040. Further abstract evaluation reduced this number to 446, and a full-text review culminated in 105 relevant papers. Data were independently extracted from these articles by the authors (A.A. and M.H.) and included ML architectures, ground truth questionnaires, environmental conditions, tasks for inducing anxiety, model performance, physiological signals, and datasets utilized. A more detailed summary of each paper, categorized based on stressor types, can be found in the Supplementary Materials, Tables S1–S6, and was used to evaluate the generalizability of the findings through examination of the sociodemographic information, sample size, and use of validated anxiety measures. These articles were further segregated between ML approaches that used FB models (98 articles) [21,22,23,24,25,42,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145] or E2E models (16 articles) [11,23,56,57,69,73,87,124,146,147,148,149,150,151,152,153] (Table 2 and Table 3). Seven papers overlapped and used both FB and E2E models [23,56,57,69,73,87,124] (note that [56,73] used traditional ML models for E2E).

2.2. ML Models and Architectures

2.2.1. FB Models

Early studies in anxiety detection predominantly utilized FB models, including both traditional ML and deep learning models (Table 2, alphabetized by last names). These models depend on engineered features specifically chosen for their relevance to the state of interest and are often supervised by manually created labels or ground truth. Their advantages include interpretability, simplicity, and lower computational demands, making them particularly effective when working with limited datasets. Deep learning methods within the FB paradigm, while also requiring engineered features as input, process these features through networks inspired by the human brain’s structure, consisting of interconnected nodes organized into layers: one input layer, one or more hidden layers, and an output layer [154]. Artificial Neural Networks (ANNs) serve as the foundation for more complex architectures, with feed-forward neural networks being the simplest form, in which information flows only in one direction from input to output. Building upon ANNs, specialized architectures like CNNs, originally designed to process images, have become increasingly popular for processing engineered features in anxiety detection [12]. However, regardless of whether traditional ML or deep learning is used, the performance of FB models remains dependent on the quality of the engineered features, which requires domain expertise [154].
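As a minimal illustration of this FB deep learning variant, the sketch below trains a small feed-forward network on engineered features with scikit-learn; the feature matrix, labels, and layer sizes are assumptions chosen for demonstration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Illustrative engineered-feature matrix (e.g., HRV and EDA statistics per window).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 25))
y = rng.integers(0, 2, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# A small feed-forward ANN (input layer, two hidden layers, output layer).
# Because it still consumes engineered features, it remains a feature-based model.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  max_iter=500, random_state=1),
)
ann.fit(X_train, y_train)
print(f"Held-out accuracy: {ann.score(X_test, y_test):.2f}")
```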

2.2.2. E2E Models

E2E models, typically based on deep learning, operate directly on raw data without the need for manual feature engineering (Table 3, alphabetized by last names). These models, such as CNNs and LSTMs, can capture intricate details through complex architectures and automatically learn features that capture patterns and relationships within the data. LSTM, a subset of recurrent neural networks, is a specialized deep learning architecture designed to handle sequential data by maintaining a memory cell that can selectively remember or forget information over time, making it ideal for time-series physiological signals [155]. For example, Onim & Thapliyal [143] achieved 96% accuracy using LSTM to analyze EDA, blood volume pulse (BVP), and temperature data from wearable devices for detecting stress in elderly individuals. However, these models are often criticized for being ‘black box’ models due to a lack of interpretability and are computationally intensive, requiring substantial data to prevent overfitting [114].
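For readers unfamiliar with this class of model, the following PyTorch sketch shows the general shape of an E2E classifier that maps raw multichannel windows directly to class logits; the channel count, window length, and layer sizes are arbitrary assumptions and do not reproduce any architecture from the reviewed studies.

```python
import torch
import torch.nn as nn

class RawSignalLSTM(nn.Module):
    """Minimal end-to-end classifier: raw multichannel windows in, class logits out."""
    def __init__(self, n_channels: int = 3, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, time, channels)
        _, (h_n, _) = self.lstm(x)      # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])       # classify from the final hidden state

# Illustrative batch: 8 windows of 30 s sampled at 64 Hz with 3 channels
# (e.g., EDA, BVP, and temperature from a wrist-worn device).
x = torch.randn(8, 30 * 64, 3)
logits = RawSignalLSTM()(x)
print(logits.shape)                     # torch.Size([8, 2])
```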

3. Results

3.1. ML Techniques and Performances

The review of the literature revealed a diverse range of FB machine-learning approaches applied to anxiety detection using physiological signals (Table 2). Deep learning algorithms have also gained popularity in recent studies using extracted features, suggesting a transitional phase before fully E2E models are more broadly adopted [132,135]. Traditional machine-learning models using engineered features have demonstrated consistently strong performance across multiple studies. Among classical algorithms, the support vector machine (SVM) was the most prevalent, used in 60 studies, followed by random forest (RF) in 40 studies and k-nearest neighbors (kNN) in 39 studies. SVM has been particularly effective, as shown by Schmidt et al. [122], who published the WESAD dataset (see Section 3.2) and achieved 93.12% accuracy for binary classification. RF has also shown impressive results, with [114] reporting accuracy as high as 99.5% using HRV features from the same WESAD dataset [122]. These classical FB models perform well with smaller datasets and provide interpretable results through feature importance analysis [100].
E2E approaches have shown varying degrees of success, with performance differing significantly across studies. Fan et al. [148] demonstrated strong results using a CNN with a Convolutional Attention Block Module, achieving 97.5% accuracy in three-class classification using the WESAD dataset. However, other studies show more modest results—Dziezyc et al. [11] reported 79% accuracy using fully convolutional networks on raw physiological signals. This variability in performance is particularly evident in real-world applications, as shown by [107], where performance dropped significantly from lab conditions (76.4% accuracy) to in-the-wild conditions (64.7–70.1% F1-score).
The performance of these ML models, typically measured by accuracy, also varied depending on the nature of the task used to induce stress, the environment of the data collection, and the signals utilized. For instance, laboratory studies generally reported higher accuracy, with models achieving up to 99% accuracy [63,114]. In contrast, studies that attempted to detect stress in wild or semi-wild environments faced greater challenges and often showed a decrease in model performance due to the increased noise and variability in the data [116,117].
Most studies focused on binary classification (e.g., no anxiety vs. anxiety), while a few extended to multiclass classification (e.g., low vs. moderate vs. high anxiety) for added granularity in anxiety detection. For example, Schmidt et al. [122] and Bobade et al. [67] compared binary classification with three-class classification (baseline vs. amusement vs. stress) using the WESAD dataset. However, multiclass classification often yielded lower performance due to the intrinsic complexity of multiclass problems and the subtle differences in physiological responses across tasks [60,62,67,122]. There has also been a trend of using ensemble methods to incorporate multiple models and improve results [54,55,85,97,106,131].

3.2. Open Datasets for Anxiety Detection

Several widely used datasets are employed to validate model performance on various stressor types for anxiety detection.
Social Stressors: The WESAD (Wearable Stress and Affect Detection) dataset includes data from 15 participants exposed to the TSST alongside baseline and amusement conditions [122]. It provides multimodal measurements using ECG, EDA, PPG, respiration, temperature, and acceleration.
Mental Stressors: The SWELL-KN (Smart Reasoning for Well-being at Home and at Work—Knowledge Work) dataset examines mental workload via HRV in 25 participants performing simulated office tasks, such as writing reports and reading emails, under no stress, time pressure, or interruptions [156]. The CLAS (Cognitive Load and Stress) dataset focuses on cognitive stress, featuring ECG, EDA, and respiration data collected during tasks with varying mental stressors (math problems, logic problems, and SCWT). The MAUS (Multimodal Affect and Understanding Stress) dataset used n-back tests for mental stress and provides ECG, EDA, and PPG [49]. The CogLoad dataset captures ECG, EDA, and body temperature during six mental tasks [157]. Similarly, the EEG During Mental Arithmetic Tasks Dataset provides EEG data to analyze neural activity using mental stressors [158]. The SAM40 (Stress and Anxiety Monitoring) dataset offers EEG with peripheral signals like ECG, EDA, and respiration data collected from 40 participants during SCWT and mirror image recognition tasks [159].
Driving Stressors: The SRAD (Stress Recognition in Automobile Drivers) dataset contains multimodal data of 17 healthy drivers completing driving tasks in a semi-controlled wild condition of driving around Boston, MA, USA [160]. ECG, EMG, foot and hand EDA, and respiration data were collected. The protocol included a resting state, driving in the city, and driving on the highway. The AffectiveROAD dataset emphasizes affective states during driving and includes ECG, EDA, and respiration to also provide multimodal understanding of driving stress [161].
Emotional Stressors: The DEAP (Dataset for Emotion Analysis using Physiological Signals) includes EEG, ECG, EDA, respiration, and BVP signals from 32 participants watching music videos designed to evoke specific emotional responses [162]. It features self-reported measures of arousal, valence, and dominance to annotate emotional states. The CASE (Continuously Annotated Signals of Emotion) dataset has physiological signals (ECG, BVP, EMG, EDA, respiration, and skin temperature) from 30 participants viewing video clips that contained different emotions [163].
Daily-Life Stressors: The Multimodal Dataset for Nurses provides ECG, EDA, and skin temperature data collected from nurses during physically demanding work shifts, with observational data on workload and patient interactions adding context to the stress measurements [164].
Mixed Types of Stressors: The Non-EEG Dataset for Neural Status Assessment captures non-EEG signals and uses multiple types of stressors. ECG, EDA, and respiration were collected during multiple anxiety-inducing conditions, including a physical stressor (physical exercise), mental stressor (SCWT), and emotional stressor (watching a horror movie), to evaluate neurological states [165].

3.3. Model Performances Based on Stressor Types

Table 4 separates the tasks used into different stressor categories, namely social, mental, emotional, driving, physical, and daily-life stressors. Seventy-nine studies used a single stressor in the protocol, while twenty-six studies used a combination of different types of stressors. Ninety-three studies were conducted in a lab-controlled environment, while nine studies focused on natural settings for real-world anxiety detection, five of which included both lab and in-the-wild conditions. Eight papers using the SRAD dataset were considered as a ‘semi-wild environment’, combining real-world variability with a predefined protocol to monitor drivers’ physiological states [160].

3.3.1. Social Stressors

Overall, over 82% of studies reported sex composition and more than 72% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S1). However, 70% of studies used samples of less than 20 participants and less than 1% of studies reported race or ethnicity, which limits generalizability. TSST is mostly used to induce social-evaluative threat and state anxiety, and the WESAD dataset is a common benchmark for comparing and evaluating model performance. Most studies used the study protocol design of the dataset as the ground truth, while a few also used STAI scores or clustering to label anxiety states [24,100,127,136]. ECG, BVP via PPG, EDA, and respiration were common physiological signals used. Studies often combined these signals to capture a more multimodal view of physiological changes associated with anxiety, as each signal provides unique insights into autonomic nervous system activity. Commonly extracted features included statistical measures (mean, standard deviation, min, max), HRV metrics such as root mean square of successive differences (RMSSD), low-frequency (LF) to high-frequency (HF) ratios, and EDA-derived metrics like skin conductance peaks and rise time. Notably, studies using feature selection often found that EDA- and ECG-derived features were the most predictive of anxiety levels [113,127,153].
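As an illustration of how two of these commonly extracted features are computed, the sketch below derives RMSSD and the LF/HF ratio from a simulated RR-interval series; the resampling rate, frequency bands, and simulated data are standard but assumed values, not taken from a specific reviewed study.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def hrv_features(rr_ms):
    """Compute RMSSD and the LF/HF ratio from RR intervals given in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))

    # Resample the irregularly spaced RR series at 4 Hz for spectral analysis.
    t = np.cumsum(rr) / 1000.0                      # beat times in seconds
    fs = 4.0
    t_even = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr, kind="cubic")(t_even)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs, nperseg=min(256, len(rr_even)))

    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df   # low-frequency power (0.04-0.15 Hz)
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df   # high-frequency power (0.15-0.40 Hz)
    return {"RMSSD_ms": rmssd, "LF/HF": lf / hf}

# Illustrative RR series: ~70 bpm with mild beat-to-beat variability.
rng = np.random.default_rng(2)
print(hrv_features(850 + rng.normal(0, 30, size=300)))
```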
Traditional models dominate in anxiety-detection research. SVM was frequently used for binary classification tasks and often demonstrated high accuracy. For example, Akella et al. [59] reported SVM accuracy ranging from 24% to 91% using EEG signals, and Sandulescu et al. [98] achieved SVM accuracy between 73.26% and 83.08% using PPG and EDA signals. RF, AdaBoost, and kNN are also common FB ML models showing high accuracy in anxiety detection, and cardiovascular signals and EDA signals are popular candidates as biomarkers of stress responses. Moreover, it was observed that binary classification of stress tended to achieve higher accuracy compared to multiclass classification, where distinguishing between different types of stressors or different levels can be more challenging.
More recent studies adopt deep learning models, and their performance is comparable to traditional FB models. Studies report that CNN and LSTM had a comparable, sometimes higher, performance compared to traditional ML models, sometimes achieving accuracies above 90% in binary classification tasks [131,132,135,136,143]. Bobade et al. [67] found that ANN outperformed other classifiers, achieving 95.2% accuracy in binary classification, while Chatterjee et al. [136] demonstrated that CNN-based models could achieve accuracies ranging from 90.3% to 94.2%. Studies such as [11,148,149,152] have reported promising results with deep learning and E2E architectures, indicating that these models can learn effective feature representations from raw signals. This advantage reduces the bias introduced by manual feature selection, enhancing model generalizability.
In terms of other ML performance metrics, F1-scores ranged from 75% to 97%, with deep models showing high performance. For instance, Fan et al. [148] achieved F1-scores up to 97.8% using deep CNNs, with marginal gains from attention modules. Another study [132] reported F1-scores of 96.5–97.7% on WESAD and 94.4–96.4% on SWELL using a graph CNN trained on HRV features. Chatterjee et al. [136] also reached ~90% F1-scores using a lightweight CNN for both binary and multiclass tasks. Traditional models performed well on the WESAD dataset, with SVMs reaching 91.4% [79].
Area under the curve (AUC) values provided additional insights. Dziezyc et al. [11] showed FCNs outperforming ResNet and CNN-LSTM with AUCs of 91% vs. 89% and lower. Vaz et al. [24] found RF and XGBoost achieving AUCs of 91–99% using multimodal features, while another study [145] demonstrated that self-supervised pretraining boosted AUCs from 73.2% to 99.9%. Validation methods included widespread use of leave-one-subject-out (LOSO) validation for subject-independent assessment (e.g., [67,111,121]), along with traditional 5- or 10-fold cross-validation (CV) (e.g., [11,68,148]).

3.3.2. Mental Stressors

This category is the most represented across studies, with 58 studies focusing on tasks that induce anxiety through cognitively demanding mental tasks. Overall, 10% of studies used samples of less than 20 participants, over 60% of studies reported sex composition, less than 4% of studies reported race or ethnicity, and 40% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S2). Sample sizes ranged widely in number, from as few as 5 to as many as 90 participants, and included diverse groups such as teenagers, college students, and older adults. In addition to the common mental stressors described in Section 1.3, some studies also used more unique tasks such as gaming or virtual reality exercises [89,113,139]. EEG, HRV derived from ECG or PPG, and EDA were commonly used for anxiety detection induced by mental stressors. Some studies also reported their preprocessing steps to ensure data quality, using techniques such as independent component analysis and bandpass filtering to remove noise and artifacts from EEG [89,105]. Other methods, such as template matching, threshold filters, outlier removal, empirical mode decomposition, and baseline correction, were typically applied to ECG and PPG signals for reliable HRV extraction [87,95,96,119].
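A minimal example of one such preprocessing step is given below: a zero-phase Butterworth band-pass filter of the kind often applied to EEG before feature extraction; the cut-off frequencies, filter order, and simulated signal are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low, high, order=4):
    """Zero-phase Butterworth band-pass filter for drift and noise removal."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

# Illustrative use: restrict a 10 s, 256 Hz EEG channel to 0.5-40 Hz,
# suppressing slow drift and high-frequency muscle/line noise.
fs = 256
eeg = np.random.default_rng(3).normal(size=10 * fs)
eeg_filtered = bandpass(eeg, fs, low=0.5, high=40.0)
print(eeg.shape, eeg_filtered.shape)
```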
While deep learning models like CNNs and LSTMs achieved high accuracy, traditional machine-learning models such as SVM and RF still demonstrated strong performance, particularly in binary tasks. For example, in the study by Badr et al. [63], SVMs achieved accuracies as high as 99.58% in detecting stress using EEG data during SCWT. Benchekroun et al. [112] reported RF accuracies between 71% and 84% for stress detection, and Campanella et al. [68] found similar results using a combination of PPG and EDA features.
Deep learning methods have been increasingly applied, especially in studies using EEG data. For example, Appriou et al. [134] achieved a 70% accuracy in classifying mental workload with CNNs, although the small dataset limited their effectiveness. LSTM networks, known for their ability to analyze sequential data, achieved a 95% accuracy in stress detection during gaming tasks in the study by Dhaouadi et al. [139]. Ensemble methods further enhanced performance by combining predictions from multiple models for final classification [85,106,171].
F1-scores demonstrated substantial heterogeneity, ranging from modest values of 39% in wild conditions using RF [116] to exceptional performance reaching 99% in controlled laboratory settings [117]. Notably, several studies achieved F1-scores above 90%, including [132] with Graph CNN achieving 94.39–96.37% on the SWELL dataset, and [106] reporting 99.74% using ensemble methods for EEG-based classification.
AUC metrics generally indicated robust discriminative performance. Chandra et al. (2024) [56] achieved AUC values of 99.98% using RF with EEG time-frequency features, while [112] reported AUC values ranging from 75 to 92% for ECG and PPG-based stress detection. Souchet et al. [104] demonstrated particularly strong performance with Gradient Boosting achieving 97.8% ROC AUC for VR-based SCWT anxiety detection.
Validation methodologies predominantly employ k-fold CV (typically 5- or 10-fold), with some studies utilizing LOSO approaches. Studies implementing LOSO validation generally reported lower performance metrics, reflecting the challenge of subject-independent stress detection. For instance, [134] reported LOSO accuracies of 47.97–72.73% compared to higher values typically observed in k-fold validation. The validation approach could significantly influence reported performance, suggesting important considerations for real-world generalizability of anxiety-detection systems [65].
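The contrast between these two validation schemes can be sketched with scikit-learn as follows; the data are synthetic placeholders, so the point is the fold construction rather than the scores themselves.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score

# Synthetic placeholders: 12 participants with 20 feature windows each.
rng = np.random.default_rng(4)
X = rng.normal(size=(240, 30))
y = rng.integers(0, 2, size=240)
groups = np.repeat(np.arange(12), 20)

clf = RandomForestClassifier(n_estimators=200, random_state=4)

# k-fold CV may place windows from the same participant in both training and
# test folds, which tends to inflate scores relative to subject-independent tests.
kfold = cross_val_score(clf, X, y,
                        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=4))

# LOSO: each fold holds out every window from one participant.
loso = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)

print(f"5-fold accuracy: {kfold.mean():.2f}, LOSO accuracy: {loso.mean():.2f}")
```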
Similar features to those used in social stressors from ECG and EDA signals also show potential in classifying anxiety levels. However, generalization across different stress conditions and tasks posed a significant challenge in these studies. Several researchers examined how well models trained under one set of conditions would perform under different stress-inducing scenarios. Mamdouh et al. [89] tested models trained on mental arithmetic tasks against data from virtual-reality-induced stress, finding that kNNs slightly outperformed SVMs and LDAs, with an accuracy of 87.1%. This finding underscored the challenges in transferring models between different stress contexts.

3.3.3. Physical Stressors

Overall, over 58% of studies used samples of less than 20 participants, 75% of studies reported sex composition, less than 1% of studies reported race or ethnicity, and 42% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S3). Physical stressors in these studies included physical exercise (e.g., cycle-ergometer or walking exercises), thermal stress induced by physical activity at high temperatures, and cold exposure via immersion in ice water (the cold pressor test [50]). These stressors were used to evaluate how physical demand can trigger significant anxiety responses. In the study by Delmastro et al. [72], physical anxiety was induced using a cycle-ergometer exercise in conjunction with SCWT to examine anxiety responses in older adults. The study utilized ECG and EDA signals to detect changes in anxiety levels, with RF and AdaBoost models achieving accuracies between 85.4% and 88.2%. These findings suggest that physical activity, while inducing a low level of anxiety, can positively impact cognitive performance during stressful tasks. F1-scores typically ranged from 70% to 84% and AUCs from 80% to 90%, depending on the features and classifiers used. Most validations were based on train–test splits, and only a few studies reported k-fold cross-validation results, which generally yielded slightly lower but more reliable performance metrics.
Han et al. [77] explored anxiety detection during both in-lab and real-world conditions, using a variety of stressors including physical exercises such as plank exercises and the TSST. The study recorded ECG, PPG, and EDA signals to monitor anxiety responses. The kNN model showed high accuracy, with up to 94.55% accuracy for in-lab data and 100% for certain real-world conditions when physical activities were excluded due to motion artifacts. This result highlights the challenges of detecting anxiety in dynamic, real-world environments where physical activity can introduce significant noise. In another study [119], physical anxiety was induced by the cold pressor test and walking exercises. The study aimed to compare the performance of different heart rate monitoring devices in detecting anxiety. Using ECG and PPG signals, RF achieved accuracies between 81% and 85%, demonstrating that slight differences in heart rate readings across devices did not significantly impact the accuracy of anxiety detection. Lastly, Sandulescu et al. [22,98] examined the feasibility of using a customized wearable system to detect thermal and mental anxiety. The study involved physical activity at 40 °C to induce thermal stress and used TSST for mental stress. However, the results for thermal anxiety detection were not clearly reported, leaving some uncertainty about the system’s effectiveness under thermal stress conditions.

3.3.4. Emotional Stressors

Overall, nearly 17% of studies used samples of less than 20 participants, 75% of studies reported sex composition, less than 1% of studies reported race or ethnicity, and 58% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S4). This category focused on tasks that evoked emotional reactions through emotionally challenging stimuli, such as viewing video clips with negative arousal. Several studies [104,116,145,167] used emotional visual stimuli from the IAPS [51], which provides images that induce negative, positive, or neutral emotions. For example, Ding et al. [167] used negative visual stimuli from the IAPS and a stop-signal task to induce emotional stress, with physiological signals like ECG and EDA used to predict state anxiety with regression models. The correlation coefficients between predicted and actual STAI scores using LASSO regression ranged from 0.4748 to 0.5528, suggesting moderate accuracy in capturing emotional responses. Han et al. [77] utilized ECG, PPG, and EDA signals, with kNN and SVM models achieving high accuracy in lab settings (94.55%) where motion artifacts were minimal.
Other studies, e.g., [135,146], used video clips containing emotionally charged content to elicit anxiety. Another study by Henry et al. [78] explored the generalizability of ML models focusing solely on cardiac signals using two datasets, one eliciting emotional stress through video clips with negative emotional content (CASE) and the other using a social stressor (WESAD). SVM models performed slightly better than other models, and accuracy was higher on WESAD than on CASE. ECG signals also yielded better performance than PPG due to higher signal quality. Gazi et al. [115] examined anxiety detection in individuals with arachnophobia by exposing them to spiders in a virtual environment. This study used ECG, EDA, and respiration signals, achieving an accuracy of 88% with an RF model when adding respiratory features as complementary information. Many of these studies used 5-fold or LOSO cross-validation, with LOSO showing slightly reduced performance but better estimating real-world generalizability.
While ECG and EDA are commonly used to detect anxiety under emotional stressors, researchers have also utilized pupil diameter and even facial cues. Giannakakis et al. [75] investigated the use of facial cues and camera-based PPG to detect stress during tasks such as social exposure, emotional recall, and viewing stressful images. The study reported accuracy between 85.54% and 91.68%, with kNN and AdaBoost models having the highest performance. The inclusion of facial features like eye blinks, head movement, and changes in pupil diameter, alongside heart rate data, proved to be effective in classifying stress responses to emotionally charged stimuli. Using pupil diameter features alone, Erkus et al. [73] reported accuracies ranging between 44.4% and 67.9% using LDA and coarse tree classifiers. Two studies also explored brain activity for anxiety detection. For ECG, time-domain, frequency-domain, and nonlinear HRV features such as RMSSD, LF/HF ratios, and standard deviations from the Poincaré plot were commonly used [77,115,167], while for EDA, features like mean skin conductance response and mean amplitude [145,148] also contributed to anxiety detection as indicators of sympathetic nervous system activation.

3.3.5. Driving Stressors

Overall, 60% of studies used samples of less than 20 participants, 10% of studies reported sex composition, no studies reported race or ethnicity, and only 10% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S5). Eight out of twelve studies in Table 4 utilized the SRAD dataset, and four studies [87,92,93,94] used simulated driving tasks in lab-controlled conditions. ECG, EMG, foot and hand GSR, and respiration data were collected. The protocol included a resting state (low stress), driving in the city (medium stress), and driving on the highway (high stress). Additionally, self-reported stress and anxiety levels were collected from participants.
The primary physiological signals used across these studies included ECG and EDA, with some studies [102,146,170] also incorporating additional signals such as respiration and EMG. These signals were processed to extract various features, including HRV features, skin conductance response from EDA, and ECG-derived respiration. To ensure accurate stress detection, noise reduction methods like template matching for ECG R peak detection, z-scores for removing unreliable RR intervals, baseline correction, and wavelet decomposition were commonly applied. High F1-scores (above 95%) and AUCs up to 91% were achieved across a range of models like SVM, RF, LSTM, and CNN. Siam et al. [102] achieved both 98.2% F1-score and accuracy using only 10 features, suggesting strong potential for real-time deployment.
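As a small illustration of the z-score step for discarding unreliable RR intervals, the sketch below flags beats that deviate strongly from the series mean; the threshold and simulated data are assumptions for demonstration, not values reported in the reviewed studies.

```python
import numpy as np

def clean_rr_intervals(rr_ms, z_thresh=3.0):
    """Drop RR intervals whose z-score exceeds the threshold (e.g., missed or ectopic beats)."""
    rr = np.asarray(rr_ms, dtype=float)
    z = np.abs((rr - rr.mean()) / rr.std())
    return rr[z < z_thresh]

# Illustrative series: ~60 normal beats around 850 ms with one missed R peak (~1900 ms).
rng = np.random.default_rng(5)
rr = rng.normal(850, 20, size=60)
rr[25] = 1900.0
rr_clean = clean_rr_intervals(rr)
print(f"{len(rr)} -> {len(rr_clean)} intervals after z-score cleaning")
```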
RF consistently showed high performance across multiple studies, particularly excelling in both binary and three-class stress classification tasks. RF achieved an accuracy of 98.2% in one study [102] and area under the curve values of 91.5% in [88], demonstrating its robustness in handling stress detection using physiological signals. SVM also performed well, with varying levels of success depending on the study. Cruz et al. [70] reported an impressive accuracy of 96.3% using a tree-optimized SVM model for its effectiveness in stress detection during driving tasks using ECG as the single modality. Other traditional models like kNN, DT, and LR were used with similar success. For example, kNN achieved an accuracy of 91.2% in Siam et al. [102] and 80.05% in Dalmeida et al. [71], suggesting that these models can still be valuable in stress detection, albeit typically with a lower performance compared to RF and SVM.
Arya et al. [62] reported an accuracy of 95.67% using LSTM for binary classification and 88.70% for ternary classification. CNN and CNN-LSTM hybrids were also effective, with CNN achieving 94.2% accuracy in binary classification and 85.60% in three-class classification [62]. However, the CNN-LSTM hybrid did not significantly outperform the standalone LSTM model. A notable approach in some studies was the use of automated pipeline optimization for ML, such as the Tree-based Pipeline Optimization Tool. Liu et al. [88] reported that this optimization method, with an extra trees classifier, improved the AUC to 93.4%, indicating the potential of automated ML in refining model performance. Across studies, key features for stress detection included heart rate, HRV, and skin conductance features. The consistent importance of these features across different models and datasets illustrates their potential as reliable biomarkers for anxiety detection. While deep learning models, particularly LSTM, generally outperformed traditional ML models, RF consistently showed strong performance, making it a reliable choice for stress classification tasks. These results suggest that while deep learning models offer advantages, traditional models like RF can still be highly effective, especially in cases with limited data.
In addition, Dalmeida et al. [71] explored the generalization of models trained on the SRAD dataset and tested on the AffectiveROAD dataset, although they did not report performance metrics. This work highlights the ongoing challenge of ensuring that anxiety-detection models can generalize well across different datasets and real-world scenarios. The high accuracy observed in studies like Siam et al. [102] also suggests the feasibility of applying anxiety classification in real-world scenarios, such as monitoring drivers in near-real-time with short recording periods (e.g., 1 min segments). This practical application is crucial for developing effective anxiety-detection systems for high-stress environments like driving.

3.3.6. Daily-Life Stressors

The studies using daily-life stressors are listed in Table 5. Overall, 44% of studies used samples of less than 20 participants, 67% of studies reported sex composition, 1% of studies reported race or ethnicity, and 44% of the studies used a validated instrument to evaluate anxiety (see Supplementary Table S6). Model performance on daily-life stressors varies greatly, ranging from 31.3% [23] to 100% [77] in classification accuracy. Studies incorporating lab-based and real-world data emphasized the challenges in maintaining high model accuracy outside the lab [55,77,107,116]. Models trained on lab data often showed a reduced performance when applied in real-world scenarios [77,107], highlighting the importance of robust preprocessing and the inclusion of context-aware algorithms to handle the variability found in real-life situations. Several studies [55,60,107,116,151] also noted imbalanced samples in the data collected in the real world (more non-anxious events than anxious events), and Başaran et al. [151] proposed semi-supervised learning to address the intensive labeling required for large in-the-wild datasets.
Real-world stress detection studies showed the broadest performance variability, heavily influenced by labeling quality, motion artifacts, and contextual heterogeneity. On structured datasets like SWEET, Al-Alim et al. [60] achieved outstanding results with RF and XGBoost, reporting F1-scores up to 98.98% and AUCs over 98% under 5-fold validation. In contrast, field deployments using naturalistic labeling struggled with generalization. Gjoreski et al. [55] showed that LOSO F1-scores improved from 71% to 95% when contextual features were included, emphasizing their critical role. Semi-supervised methods and domain adaptation approaches also helped to mitigate label scarcity: Başaran et al. [151] reported F1-scores of 82.6% for CNN-LSTM and ~76% for semi-supervised MLPs. Deep models trained on healthcare workers and nursing data (e.g., [54]) showed F1-scores of 91–99%, while [23] using multitask CNN reported much lower F1 (~41.5%), likely due to high noise and weak labels. Plarre et al. [96] and Toshnazarov et al. [107] demonstrated that physiological signals can predict stress even in noisy real-world data, achieving AUCs around 88–90% and generalizable F1-scores of 63–71% when contextual priors were integrated.
Data collected in laboratory settings typically resulted in higher accuracies compared to data collected in natural settings [23,55,77,96,116], while adding context information seemed to improve the model’s performance [55,116]. This difference underscores the challenges of applying lab-trained models to real-world environments, where unpredictable factors such as ambient noise, physical movements, and uncontrolled psychological states reduce the quality of signal detection [54,85,131].

4. Discussion

4.1. Summary

The review confirms the widespread application of traditional ML approaches in anxiety detection, with SVM being the most prevalent due to its ability to handle high-dimensional datasets and achieve high accuracy. Ensemble methods showed competitive results, indicating their effectiveness in combining multiple weak learners for robust predictions. Deep learning methods are increasingly popular for both FB and E2E approaches, achieving comparable or superior performance to traditional ML models. Widely used datasets like WESAD, SWELL-KN, and SRAD have enabled benchmarking and comparative analyses across ML models. Binary classification remains the most common approach, often achieving higher accuracies than multiclass classification due to the complexity of distinguishing subtle physiological differences across multiple states. However, the variation in features and model architectures employed across studies makes it challenging to compare results and identify the most reliable features or robust models for anxiety detection. This observation is supported by reviews in the field, which highlight that no single model or feature set universally outperforms others due to diverse experimental conditions and participant characteristics [14,172].
Social stressors, particularly those induced by TSST, dominate the field, as seen in studies using the WESAD dataset. These studies often achieve high accuracy because of controlled experimental designs and multimodal data collection. Mental stressors are the most represented category, with studies employing tasks like SCWT to elicit cognitive stress. Driving stressors provide valuable insights into stress detection in semi-wild environments using datasets like SRAD and AffectiveROAD. Physical and emotional stressors further expand the scope, with studies focusing on tasks like physical exercise and emotional challenge to understand diverse anxiety responses. Lastly, there has been increasing interest in translating lab findings and models to in-the-wild data, highlighting future directions in anxiety detection for real-life.

4.2. Key Observations

Comparison between FB and E2E models: The effectiveness of both approaches appears to be heavily influenced by dataset size and signal type. For ECG/PPG signals, feature-based methods often show superior performance, as demonstrated by Jahanjoo et al. [79], who achieved 95.55% accuracy using an SVM with HRV features. In contrast, E2E methods show promise with EEG data; the authors of [124] achieved 99.56% accuracy using kNN applied to PCA-reduced raw EEG data.
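As an illustration of the feature-based route reported for ECG/PPG, the sketch below derives a few common HRV features from inter-beat intervals and feeds them to an SVM. The windowing, feature set, and synthetic data are placeholders rather than the configuration of [79].

```python
# Feature-based sketch: simple HRV features from inter-beat intervals (IBIs)
# followed by an SVM classifier. Windowing and values are illustrative only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def hrv_features(ibi_ms):
    """Return [mean RR, SDNN, RMSSD, pNN50 fraction] for one window of IBIs (ms)."""
    ibi = np.asarray(ibi_ms, dtype=float)
    diff = np.diff(ibi)
    return np.array([
        ibi.mean(),                      # mean RR interval
        ibi.std(ddof=1),                 # SDNN
        np.sqrt(np.mean(diff ** 2)),     # RMSSD
        np.mean(np.abs(diff) > 50.0),    # pNN50 (fraction of successive diffs > 50 ms)
    ])

# Synthetic windows: "stressed" windows have shorter RR intervals and lower variability.
rng = np.random.default_rng(1)
windows, labels = [], []
for _ in range(200):
    stressed = int(rng.integers(0, 2))
    scale = 25.0 if stressed == 0 else 10.0
    ibi = rng.normal(800 - 100 * stressed, scale, size=60)
    windows.append(hrv_features(ibi))
    labels.append(stressed)

X, y = np.vstack(windows), np.array(labels)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], y[:150])
print("Held-out accuracy:", clf.score(X[150:], y[150:]))
```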
Real-world applications present unique challenges for both approaches. Gjoreski et al. [115] showed that while SVM performed well in laboratory conditions (67–71% accuracy), performance varied significantly in real-world settings. The study by Naegelin et al. [95] is particularly insightful, showing that combining physiological signals with contextual features (such as mouse and keyboard activity) improved real-world performance, achieving an F1-score of 62.5%. This suggests that hybrid approaches, combining feature engineering with deep learning techniques, might be the most effective for practical applications.
FB models demonstrate several key advantages. Their primary strength lies in robustness and consistency across datasets, with traditional ML models often achieving over 90% accuracy. They allow clear interpretability through feature-importance analysis. These models also perform well with limited data and require fewer computational resources. However, FB models require domain expertise for feature engineering. They may also miss complex patterns in raw data and need feature redesign for different sensor types, as shown in [112], where model performance dropped significantly when transferring between ECG and PPG signals.
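The interpretability advantage of FB models is typically exercised through feature-importance analysis. A minimal sketch follows; the feature names are hypothetical placeholders, and the data are synthetic, so the printed ranking is only a demonstration of the technique.

```python
# Sketch of feature-importance analysis for an FB model using permutation importance.
# Feature names and data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ["rmssd", "sdnn", "lf_hf_ratio", "scl_mean", "scr_count", "resp_rate"]
X, y = make_classification(n_samples=400, n_features=6, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)

# Rank features by the mean drop in held-out score when each one is shuffled.
for name, mean_drop in sorted(zip(feature_names, result.importances_mean),
                              key=lambda t: -t[1]):
    print(f"{name:12s} importance = {mean_drop:.3f}")
```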
E2E models offer distinct advantages in their ability to learn features automatically from raw data. They excel at capturing complex temporal patterns and show remarkable adaptability to different input data types, as evidenced by Ragav et al. [150] reaching 99.86% accuracy across various physiological signals. However, E2E models typically require substantial training data, as demonstrated by poor CNN performance (40–48%) on smaller datasets [134]. They also demand greater computational resources and often lack interpretability, as noted by Dziezyc et al. [11]. Lastly, Toshnazarov et al. [107] demonstrated their tendency toward performance degradation in real-world settings, with accuracy dropping from 76.4% in laboratory conditions to 64.7–70.1% in practical applications.
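For readers less familiar with the E2E route, the sketch below shows the general shape of a small 1D CNN applied to raw multichannel signal windows. The architecture, window length, and channel count are generic illustrations, not a reproduction of any model in the reviewed studies.

```python
# Generic end-to-end sketch: a small 1D CNN mapping raw multichannel signal
# windows directly to stress/no-stress logits (illustrative architecture only).
import torch
import torch.nn as nn

class RawSignalCNN(nn.Module):
    def __init__(self, in_channels=3, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),      # global average pooling over time
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                 # x: (batch, channels, samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

# One training step on a random batch (stand-in for windowed ECG/EDA/ACC data).
model = RawSignalCNN(in_channels=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 3, 1920)               # e.g., 30 s windows at 64 Hz
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
print("training loss:", float(loss))
```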
Stressor differentiation and generalization: While many reviewed studies incorporated multiple stressors in their protocols, the majority of these models were not specifically designed to evaluate the distinct impact of individual stressors on physiological signals. Instead, multiple stressors were grouped together, and analyses were not fine-tuned to distinguish between stressor types. Understanding the specific effects of each stressor could be crucial, as different stressors can elicit varied physiological responses, and accurately identifying which stressor is at play could lead to more precise anxiety-detection models and better-targeted interventions. For example, physical and psychosocial stressors may induce different patterns of physiological change [173] that are important for fine-grained anxiety detection and for personalizing stress-management interventions. By distinguishing between stressors, researchers can better understand how individuals react to specific stress conditions, leading to more personalized approaches to anxiety management. At the same time, it remains essential to ensure that anxiety responses generalize across a range of conditions so that models remain robust in diverse real-world scenarios, where multiple types of stressors may be present simultaneously.
Real-world challenges and model robustness: A significant challenge in anxiety-detection research is translating laboratory findings to practical, real-world applications. Although laboratory-based studies often report high accuracy rates, model performance generally declines when applied to real-world settings [23,55,77,96]. The majority of studies focused on distinguishing between baseline and anxious states, which, while effective in controlled lab settings, may not translate well to real-world scenarios where multiple affective states are present [60,151]. Model robustness in the presence of diverse affective states has not been thoroughly tested, representing a considerable research gap in evaluating how the testing environment influences model selection and performance.
Data quality particularly affects the reliability of in-the-wild recordings, where factors such as physical activity, user compliance with the protocol, and proper device handling [174] can significantly degrade ML performance. Wearable sensors vary in signal quality and are subject to measurement errors, especially during physical activity [175]. Physiological data are prone to contamination by artifacts arising from movement, external noise, and other non-physiological factors (e.g., fit of wearable devices, ambient temperature and humidity, and device maintenance) [176,177]. Effective methods for filtering these artifacts are crucial for separating signal from noise and ensuring data integrity. Addressing this issue involves two primary strategies: (1) noise reduction, i.e., developing methods to remove or minimize noise from the data, and (2) noise-robust detection, i.e., developing methodologies that are inherently robust to the presence of noise. Efforts have also been made to discriminate between psychological and physical stressors, for instance, by using accelerometer data to categorize the physical activity state during anxiety detection [77,119].
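Both strategies are commonly approximated in practice by filtering the signal and discarding windows flagged as motion-contaminated. The sketch below illustrates one such heuristic; the cut-off frequencies, sampling rate, and accelerometer threshold are arbitrary placeholders rather than values drawn from any reviewed study.

```python
# Sketch of artifact handling for wearable data: band-pass filter a PPG-like trace
# and mask windows whose accelerometer magnitude suggests heavy motion.
# Cut-offs, sampling rate, and the motion threshold are illustrative placeholders.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 64.0  # sampling rate in Hz

def bandpass(signal, low=0.5, high=8.0, fs=FS, order=3):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def motion_mask(acc_xyz, window_s=5.0, fs=FS, threshold=0.25):
    """Return a boolean per-window mask: True = window is usable."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    n = int(window_s * fs)
    usable = []
    for i in range(len(mag) // n):
        segment = mag[i * n:(i + 1) * n]
        usable.append(segment.std() < threshold)   # low variance = little motion
    return np.array(usable)

# Synthetic 60 s of PPG-like signal plus triaxial accelerometer data.
t = np.arange(0, 60, 1 / FS)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(len(t))
acc = 1.0 + 0.1 * np.random.randn(len(t), 3)

clean_ppg = bandpass(ppg)
mask = motion_mask(acc)
print(f"{mask.sum()} of {len(mask)} windows kept after motion screening")
```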
In-the-wild data also suffer from class imbalance, since naturalistic data collection yields far more non-anxious periods than anxious events, potentially reducing performance [55,60,107,116,151]. In addition, continuous monitoring produces large datasets with thousands of data points and has traditionally relied on intensive self-report labeling for ground truth. Semi-supervised learning has shown promising results for circumventing this dependence on labeled ground truth [151]. Future research should address the lack of in-depth, comprehensive studies in wild settings, consider a wider variety of affective states, and increase the size and diversity of participant groups to enhance the generalizability and reliability of anxiety-detection models.
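Two of the mitigations discussed here, re-weighting minority classes and semi-supervised self-training over largely unlabeled field data, can be combined in a few lines. The sketch below is a generic illustration with synthetic data, not the pipeline of [151].

```python
# Sketch: handling class imbalance (class_weight) and sparse labels (self-training),
# as is typical in in-the-wild settings. Data and proportions are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Imbalanced problem: ~10% "stressed" windows, with only ~20% of labels observed.
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.9, 0.1],
                           random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) > 0.2] = -1       # -1 marks unlabeled windows

base = LogisticRegression(class_weight="balanced", max_iter=1000)
model = SelfTrainingClassifier(base, threshold=0.9)   # pseudo-label confident windows
model.fit(X, y_partial)

print("pseudo-labels assigned:", int((model.transduction_ != -1).sum()))
print("accuracy vs. hidden ground truth:", round(model.score(X, y), 3))
```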
Sample size and generalizability across populations: Research often involves relatively homogeneous participant groups that may not represent the broader population, especially underrepresented minority groups. Many papers emphasized model enhancement yet relied on data from a limited number of participants, often fewer than 20, which restricts the robustness and generalizability of their findings. For instance, the study by Han et al. [77] was limited to three participants and yielded an 81% accuracy rate, and two studies by Gjoreski et al. [55,116] were limited to five participants each for the in-the-wild conditions. While these papers claimed to boost model performance by considering physical activity as contextual information, the small sample sizes could limit generalizability to new, unseen data due to overfitting. Most studies in this field predominantly involved young, healthy adults, often from similar cultural and ethnic backgrounds. This homogeneity fails to capture the variability in physiological and psychological responses to stress across different groups. Age and ethnicity can influence stress perception and physiological responses, potentially impacting model performance in a diverse population [178]. Likewise, gender differences can play a role in stress responses, with research indicating that males and females may exhibit different physiological and psychological reactions to stressors [179]. Consequently, models trained predominantly on one gender may perform less effectively when applied to the other. While datasets such as WESAD and SRAD have been commonly used, their small samples and limited sociodemographic information raise a potential risk of bias when models are applied to the general population.
Individual variability in stress responses: The variability in how anxiety manifests across individuals poses a significant challenge. Genetic factors, personal health history, and previous experiences all contribute to how one responds to stress [180,181], making it difficult to establish a generalized, one-size-fits-all model for detection. This variability means that physiological measures can differ widely among individuals due to factors such as trait personality and resilience [31,182]. Additionally, individuals' physiological responses to stressors may change over time as they develop coping mechanisms or undergo repeated exposure [183]. Studies need to consider these adaptive responses and potentially track changes over extended periods to understand and predict anxiety reactions accurately. Anxiety responses vary not only across individuals but also within an individual over time, depending on health, mood, and external circumstances. This intra-individual variability can complicate the interpretation of physiological data. Future research should focus on developing adaptive models that learn and adjust to individual baselines dynamically. Incorporating longitudinal tracking that continuously updates the model based on an individual's changing physiological and emotional state could provide greater accuracy.
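One lightweight way to respect individual baselines, in the spirit of the adaptive models suggested above, is to express each person's features as deviations from a running estimate of their own resting values. The sketch below shows this idea with an exponentially updated baseline; the smoothing factor, feature choices, and participant identifier are arbitrary placeholders.

```python
# Sketch of a per-subject adaptive baseline: features are expressed as z-scored
# deviations from an exponentially updated personal resting estimate.
# Smoothing factor and feature values are illustrative placeholders.
import numpy as np

class AdaptiveBaseline:
    """Track a per-subject running mean/variance and return z-scored deviations."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha          # how quickly the baseline adapts
        self.mean = {}              # subject_id -> running mean vector
        self.var = {}               # subject_id -> running variance vector

    def update(self, subject_id, features):
        x = np.asarray(features, dtype=float)
        if subject_id not in self.mean:
            self.mean[subject_id] = x.copy()
            self.var[subject_id] = np.ones_like(x)
            return np.zeros_like(x)
        m, v, a = self.mean[subject_id], self.var[subject_id], self.alpha
        z = (x - m) / np.sqrt(v + 1e-8)            # deviation from personal baseline
        self.mean[subject_id] = (1 - a) * m + a * x
        self.var[subject_id] = (1 - a) * v + a * (x - m) ** 2
        return z

baseline = AdaptiveBaseline(alpha=0.05)
rng = np.random.default_rng(2)
for step in range(5):
    hr, eda = 70 + rng.normal(0, 2), 2.0 + rng.normal(0, 0.1)
    z = baseline.update("participant_01", [hr, eda])
    print(f"step {step}: z-scored deviation = {np.round(z, 2)}")
```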
Ethical, clinical, and social implications of anxiety detection using ML: The risks associated with the use of ML to detect anxiety include ethical, clinical, and social factors such as bias, equity, data privacy and security, confidentiality, transparency, and accountability. The risk of bias, inequity, and lack of transparency in anxiety detection using ML is a significant concern. As highlighted throughout this review, the heterogeneity in study designs, data modalities, and experimental paradigms presents a significant barrier to fair model comparison and synthesis across the field. Studies vary widely in anxiety induction protocols, the types of physiological signals collected, and approaches to ground truth labeling, including the use of validated psychological instruments. In addition, inconsistencies in sensor types, device placement, sampling rates, signal quality, and preprocessing techniques further limit comparability and hinder the development of standardized benchmarks for anxiety detection. We therefore emphasize the need to establish methodological standards that enable fair cross-study comparisons, enhance reproducibility, support rigorous benchmarking, and ultimately facilitate the clinical translation of anxiety-detection systems into scalable and trustworthy digital health solutions. Given that detected anxiety may inform mental health diagnoses alongside clinical evaluation, it is important to account for risks to data privacy, security, confidentiality, and accountability associated with continuous monitoring of wearable sensor data and automatic detection of anxiety using ML. The use of these technologies in healthcare will require careful attention to the handling of identifiable behavioral and physiological markers, the design of discreet wearable systems, and mechanisms for notifying end-users or relevant parties, so as to minimize the risk of a data breach or loss of confidentiality in public or social settings.
Limitations and opportunities: The use of self-reported evaluations of anxiety as ground truth for classification, while providing a validated gold standard in clinical evaluation, requires both time and the burden of self-appraisal. Through the integration of wearable sensing, multimodal fusion, and ML approaches, biomarker ambiguity may be mitigated. However, as noted in recent work [184,185], significant wearable integration challenges, such as motion artifacts in PPG signals during physical activity, abound in real-world conditions and need to be accounted for in the selection of features and ML algorithms used for anxiety detection. Furthermore, future work should focus on examining clinical populations to evaluate whether findings obtained in predominantly healthy young adults generalize to populations with greater need for automatic and objective anxiety measures. Moreover, the exclusion of clinical populations and individuals with comorbidities in many studies may further restrict the applicability of findings to real-world clinical settings, where symptom presentation and physiological profiles may differ substantially. Lastly, privacy and consent for continuous data collection need to be addressed in future work as systems are deployed in clinical practice.
Toward anxiety prediction and intervention: The future of anxiety detection lies in moving beyond mere detection toward predicting anxiety onset and developing intervention strategies. There is a notable gap in methodologies for predicting anxiety onset. However, promising applications of real-time detection, such as virtual reality therapy and HRV biofeedback training [186,187], offer avenues to ameliorate anxiety levels. Leveraging existing anxiety-detection frameworks could significantly enhance the evaluation and improvement of anxiety therapies, facilitating coping mechanisms for negative emotions.

5. Conclusions

This review examined the current state of machine-learning methods for anxiety detection, revealing significant advancements as well as ongoing challenges in the field. The extensive review highlights a diverse range of methodologies, from traditional FB models to advanced E2E approaches, each with its own strengths and limitations. A key observation was the progression from traditional FB models toward more advanced FB deep learning models, reflecting a growing research interest in more sophisticated, data-driven approaches and model architectures. The results show the high accuracy of these models in controlled laboratory settings, while also underscoring the need for expanded research in real-world environments to assess and enhance their practical applicability. The insights from studies conducted in the wild, though limited, reveal a crucial gap in the research and a pressing need for models that can operate effectively amidst the complexities of real-life scenarios. The review also highlights the importance of understanding the physiological and psychological responses elicited by different types of stressors, including social, mental, physical, emotional, driving, and daily-life stressors. While many studies group stressors into broad binary categories, distinguishing between stressor types can enhance the precision of anxiety-detection models and support the development of more personalized interventions. Furthermore, while this review provided an opportunity for experts in neuroscience, machine learning, and psychiatry to work together to establish an organizational strategy for existing work, as well as a discussion of strengths and limitations, the field may in the future benefit from the examination of objective classification using specialized knowledge-graph methods. As the field progresses, future research should focus on improving the robustness, versatility, and generalizability of anxiety-detection models. Emphasizing real-world applicability, enhancing noise resilience, exploring innovative architectures, and expanding the understanding of affective-state variability will be key to advancing the field for mental health promotion.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app151810099/s1, Prisma-ScR checklist [188]; Table S1: Summary of selected studies utilizing social stressors [11,21,22,24,57,59,67,68,75,78,79,95,96,98,99,100,107,111,113,114,118,119,120,121,122,127,129,131,132,135,136,140,143,145,148,149,150,152,153,166]; Table S2: Summary of selected studies utilizing mental stressors [42,55,56,58,63,64,65,66,68,69,72,74,75,77,80,81,82,83,85,86,87,89,90,95,96,97,99,101,103,104,105,106,107,108,111,112,116,117,118,119,123,124,125,126,128,129,130,132,133,134,137,139,144,147,152,167,168]; Table S3: Summary of selected studies utilizing physical stressors [22,57,61,72,77,84,96,107,111,119,152,169]; Table S4: Summary of selected studies utilizing emotional stressors [73,75,76,77,78,84,91,110,115,134,141,167]; Table S5: Summary of selected studies utilizing driving stressors [62,70,71,87,88,92,93,94,102,138,146,170]; Table S6: Summary of selected studies utilizing daily life stressors [23,54,55,60,77,96,107,115,151].

Author Contributions

Conceptualization, A.A., J.L.C., R.B.S., E.T.H.-W., M.H. and M.E.H.; methodology, A.A. and M.H.; data curation, A.A. and M.H.; writing—original draft preparation, A.A., J.L.C., R.B.S., E.T.H.-W., M.H. and M.E.H.; writing—review and editing, A.A., J.L.C., R.B.S., E.T.H.-W., M.H. and M.E.H.; visualization, A.A. and M.H.; supervision, E.T.H.-W. and M.E.H.; project administration, M.E.H.; funding acquisition, J.L.C., R.B.S., E.T.H.-W. and M.E.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jump ARCHES endowment through the Health Care Engineering Systems Center.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Rebecca Smith for her expertise and assistance in the literature review process.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ANN	Artificial Neural Network
CASE	Continuously Annotated Signals of Emotion
CNN	Convolutional Neural Network
DT	Decision Tree
E2E	End-to-End
ECG	Electrocardiogram
EDA	Electrodermal Activity
EEG	Electroencephalogram
EMG	Electromyography
FB	Feature-Based
FCN	Fully Convolutional Neural Network
GCN	Graph Convolutional Network
HF	High Frequency
HPA	Hypothalamic–Pituitary–Adrenal axis
HRV	Heart Rate Variability
IAPS	International Affective Picture System
kNN	k-Nearest Neighbors
LDA	Linear Discriminant Analysis
LF	Low Frequency
LR	Linear Regression
LSTM	Long Short-Term Memory
ML	Machine Learning
MLP	Multilayer Perceptron
PPG	Photoplethysmography
RESP	Respiration
RF	Random Forest
RMSSD	Root Mean Square of Successive Differences
RNN	Recurrent Neural Network
SCWT	Stroop Color and Word Test
STAI	State–Trait Anxiety Inventory
SVM	Support Vector Machine
SWELL-KN	Smart Reasoning for Well-being at Home and at Work—Knowledge Work
TEMP	Temperature
TSST	Trier Social Stress Test
WESAD	Wearable Stress and Affect Detection

References

  1. Manderscheid, R.W.; Ryff, C.D.; Freeman, E.J.; McKnight-Eily, L.R.; Dhingra, S.; Strine, T.W. Evolving Definitions of Mental Illness and Wellness. Prev. Chronic Dis. 2010, 7, A19. [Google Scholar]
  2. Salari, N.; Hosseinian-Far, A.; Jalali, R.; Vaisi-Raygani, A.; Rasoulpoor, S.; Mohammadi, M.; Rasoulpoor, S.; Khaledi-Paveh, B. Prevalence of Stress, Anxiety, Depression among the General Population during the COVID-19 Pandemic: A Systematic Review and Meta-Analysis. Glob. Health 2020, 16, 57. [Google Scholar] [CrossRef]
  3. Canals, J.; Voltas, N.; Hernández-Martínez, C.; Cosi, S.; Arija, V. Prevalence of DSM-5 Anxiety Disorders, Comorbidity, and Persistence of Symptoms in Spanish Early Adolescents. Eur. Child Adolesc. Psychiatry 2019, 28, 131–143. [Google Scholar] [CrossRef]
  4. Wittchen, H.U.; Jacobi, F.; Rehm, J.; Gustavsson, A.; Svensson, M.; Jönsson, B.; Olesen, J.; Allgulander, C.; Alonso, J.; Faravelli, C.; et al. The Size and Burden of Mental Disorders and Other Disorders of the Brain in Europe 2010. Eur. Neuropsychopharmacol. 2011, 21, 655–679. [Google Scholar] [CrossRef]
  5. Celano, C.M.; Daunis, D.J.; Lokko, H.N.; Campbell, K.A.; Huffman, J.C. Anxiety Disorders and Cardiovascular Disease. Curr. Psychiatry Rep. 2016, 18, 101. [Google Scholar] [CrossRef] [PubMed]
  6. Segerstrom, S.C.; Miller, G.E. Psychological Stress and the Human Immune System: A Meta-Analytic Study of 30 Years of Inquiry. Psychol. Bull. 2004, 130, 601–630. [Google Scholar] [CrossRef] [PubMed]
  7. Thomas, K.C.; Ellis, A.R.; Konrad, T.R.; Holzer, C.E.; Morrissey, J.P. County-Level Estimates of Mental Health Professional Shortage in the United States. Psychiatr. Serv. 2009, 60, 1323–1328. [Google Scholar] [CrossRef]
  8. Satiani, A.; Niedermier, J.; Satiani, B.; Svendsen, D.P. Projected Workforce of Psychiatrists in the United States: A Population Analysis. Psychiatr. Serv. 2018, 69, 710–713. [Google Scholar] [CrossRef] [PubMed]
  9. Althubaiti, A. Information Bias in Health Research: Definition, Pitfalls, and Adjustment Methods. J. Multidiscip. Healthc. 2016, 9, 211–217. [Google Scholar] [CrossRef]
  10. Julian, L.J. Measures of Anxiety. Arthritis Care 2011, 63, S467–S472. [Google Scholar] [CrossRef]
  11. Dziezyc, M.; Gjoreski, M.; Kazienko, P.; Saganowski, S.; Gams, M. Can We Ditch Feature Engineering? End-to-End Deep Learning for Affect Recognition from Physiological Sensor Data. Sensors 2020, 20, 6535. [Google Scholar] [CrossRef] [PubMed]
  12. Mentis, A.F.A.; Lee, D.; Roussos, P. Applications of Artificial Intelligence−machine Learning for Detection of Stress: A Critical Overview. Mol. Psychiatry 2023, 29, 1882–1894. [Google Scholar] [CrossRef]
  13. Spielberger, C.D. Theory and Research on Anxiety; Spielberger, C.D., Ed.; Academic Press Inc.: Oxford, UK, 1966; ISBN 9781483258362. [Google Scholar]
  14. Daviu, N.; Bruchas, M.R.; Moghaddam, B.; Sandi, C.; Beyeler, A. Neurobiological Links between Stress and Anxiety. Neurobiol. Stress 2019, 11, 100191. [Google Scholar] [CrossRef]
  15. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; American Psychiatric Association: Washington, DC, USA, 2013. [Google Scholar]
  16. Spielberger, C.D. Notes and Comments Trait-State Anxiety and Motor Behavior. J. Mot. Behav. 1971, 3, 265–279. [Google Scholar] [CrossRef]
  17. Spielberger, C.D.; Gonzalez-Reigosa, F.; Martinez-Urrutia, A.; Natalicio, L.F.S.; Natalicio, D.S. The State-Trait Anxiety Inventory. Rev. Interam. Psicol. J. Psychol. 1971, 5, 3–4. [Google Scholar]
  18. Wiedemann, K. Anxiety and Anxiety Disorders. In International Encyclopedia of the Social & Behavioral Sciences; Elsevier: Amsterdam, The Netherlands, 2001; pp. 560–567. [Google Scholar]
  19. Duval, E.R.; Javanbakht, A.; Liberzon, I. Neural Circuits in Anxiety and Stress Disorders: A Focused Review. Ther. Clin. Risk Manag. 2015, 11, 115–126. [Google Scholar] [CrossRef]
  20. Ding, Y.; Cao, Y.; Duffy, V.G.; Wang, Y.; Zhang, X. Measurement and Identification of Mental Workload during Simulated Computer Tasks with Multimodal Methods and Machine Learning. Ergonomics 2020, 63, 896–908. [Google Scholar] [CrossRef] [PubMed]
  21. Mozos, O.M.; Sandulescu, V.; Andrews, S.; Ellis, D.; Bellotto, N.; Dobrescu, R.; Ferrandez, J.M. Stress Detection Using Wearable Physiological and Sociometric Sensors. Int. J. Neural Syst. 2017, 27, 1650041. [Google Scholar] [CrossRef]
  22. Sandulescu, V.; Dobrescu, R. Wearable System for Stress Monitoring of Firefighters in Special Missions. In Proceedings of the 2015 E-Health and Bioengineering Conference, EHB 2015, Iasi, Romania, 19–21 November 2015; pp. 1–4. [Google Scholar]
  23. Schmidt, P.; Dürichen, R.; Reiss, A.; Van Laerhoven, K.; Plötz, T. Multi-Target Affect Detection in the Wild: An Exploratory Study. In Proceedings of the International Symposium on Wearable Computers (ISWC), London, UK, 9–13 September 2019; pp. 211–219. [Google Scholar]
  24. Vaz, M.; Summavielle, T.; Sebastião, R.; Ribeiro, R.P. Multimodal Classification of Anxiety Based on Physiological Signals. Appl. Sci. 2023, 13, 6368. [Google Scholar] [CrossRef]
  25. Jiao, Y.; Wang, X.; Liu, C.; Du, G.; Zhao, L.; Dong, H.; Zhao, S.; Liu, Y. Feasibility Study for Detection of Mental Stress and Depression Using Pulse Rate Variability Metrics via Various Durations. Biomed. Signal Process. Control 2023, 79, 104145. [Google Scholar] [CrossRef]
  26. Bystritsky, A.; Kronemyer, D. Stress and Anxiety: Counterpart Elements of the Stress/Anxiety Complex. Psychiatr. Clin. N. Am. 2014, 37, 489–518. [Google Scholar] [CrossRef]
  27. Taschereau-Dumouchel, V.; Michel, M.; Lau, H.; Hofmann, S.G.; LeDoux, J.E. Putting the “Mental” Back in “Mental Disorders”: A Perspective from Research on Fear and Anxiety. Mol. Psychiatry 2022, 27, 1322–1330. [Google Scholar] [CrossRef]
  28. Beck, A.T.; Epstein, N.; Brown, G.; Steer, R.A. An Inventory for Measuring Clinical Anxiety: Psychometric Properties. J. Consult. Clin. Psychol. 1988, 56, 893–897. [Google Scholar] [CrossRef] [PubMed]
  29. Demetriou, C.; Ozer, B.U.; Essau, C.A. Self-Report Questionnaires. Encycl. Clin. Psychol. 2015, 1–6. [Google Scholar] [CrossRef]
  30. Arikian, S.R.; German, J.M. A Review of the Diagnosis, Pharmacologie Treatment, and Economic Aspects of Anxiety Disorders. Prim. Care Companion J. Clin. Psychiatry 2001, 3, 110–117. [Google Scholar] [CrossRef]
  31. Weinberger, D.A.; Schwartz, G.E.; Davidson, R.J. Low-Anxious, High-Anxious, and Repressive Coping Styles: Psychometric Patterns and Behavioral and Physiological Responses to Stress. J. Abnorm. Psychol. 1979, 88, 369–380. [Google Scholar]
  32. Kaplan, R.; Saccuzzo, D. Psychological Testing: Principles, Applications, and Issues; Cengage Learning: Boston, MA, USA, 1982; ISBN 9781337517065. [Google Scholar]
  33. Ancillon, L.; Elgendi, M.; Menon, C. Machine Learning for Anxiety Detection Using Biosignals: A Review. Diagnostics 2022, 12, 1794. [Google Scholar] [CrossRef] [PubMed]
  34. Kim, H.G.; Cheon, E.J.; Bai, D.S.; Lee, Y.H.; Koo, B.H. Stress and Heart Rate Variability: A Meta-Analysis and Review of the Literature. Psychiatry Investig. 2018, 15, 235–245. [Google Scholar] [CrossRef]
  35. Giannakakis, G.; Grigoriadis, D.; Giannakaki, K.; Simantiraki, O.; Roniotis, A.; Tsiknakis, M. Review on Psychological Stress Detection Using Biosignals. IEEE Trans. Affect. Comput. 2022, 13, 440–460. [Google Scholar] [CrossRef]
  36. Merletti, R.; Aventaggiato, M.; Botter, A.; Holobar, A.; Marateb, H.; Vieira, T.M.M. Advances in Surface EMG: Recent Progress in Detection and Processing Techniques. Crit. Rev. Biomed. Eng. 2010, 38, 305–345. [Google Scholar] [CrossRef]
  37. Hidaka, O.; Yanagi, M.; Takada, K. Mental Stress-Induced Physiological Changes in the Human Masseter Muscle. J. Dent. Res. 2004, 83, 227–231. [Google Scholar] [CrossRef] [PubMed]
  38. Wijsman, J.; Grundlehner, B.; Penders, J.; Hermens, H. Trapezius Muscle EMG as Predictor of Mental Stress. Trans. Embed. Comput. Syst. 2013, 12, 1–20. [Google Scholar] [CrossRef]
  39. Tsai, C.M.; Chou, S.L.; Gale, E.N.; Mccall, W.D. Human Masticatory Muscle Activity and Jaw Position under Experimental Stress. J. Oral Rehabil. 2002, 29, 44–51. [Google Scholar] [CrossRef]
  40. Turpin, G.; Grandfield, T. Electrodermal Activity. Encycl. Stress 2007, 899–902. [Google Scholar] [CrossRef]
  41. Ren, P.; Barreto, A.; Huang, J.; Gao, Y.; Ortega, F.R.; Adjouadi, M. Off-Line and on-Line Stress Detection through Processing of the Pupil Diameter Signal. Ann. Biomed. Eng. 2014, 42, 162–176. [Google Scholar] [CrossRef]
  42. Palanisamy, K.; Murugappan, M.; Sazali, Y. Descriptive Analysis of Skin Temperature Variability of Sympathetic Nervous System Activity in Stress. J. Phys. Ther. Sci. 2012, 24, 1341–1344. [Google Scholar] [CrossRef]
  43. Long, N.; Lei, Y.; Peng, L.; Xu, P.; Mao, P. A Scoping Review on Monitoring Mental Health Using Smart Wearable Devices. Math. Biosci. Eng. 2022, 19, 7899–7919. [Google Scholar] [CrossRef]
  44. Gedam, S.; Paul, S. A Review on Mental Stress Detection Using Wearable Sensors and Machine Learning Techniques. IEEE Access 2021, 9, 84045–84066. [Google Scholar] [CrossRef]
  45. Vizer, L.M.; Zhou, L.; Sears, A. Automated Stress Detection Using Keystroke and Linguistic Features: An Exploratory Study. Int. J. Hum. Comput. Stud. 2009, 67, 870–886. [Google Scholar] [CrossRef]
  46. Allen, A.P.; Kennedy, P.J.; Dockray, S.; Cryan, J.F.; Dinan, T.G.; Clarke, G. The Trier Social Stress Test: Principles and Practice. Neurobiol. Stress 2017, 6, 113–126. [Google Scholar] [CrossRef]
  47. Scarpina, F.; Tagini, S. The Stroop Color and Word Test. Front. Psychol. 2017, 8, 557. [Google Scholar] [CrossRef]
  48. Tulen, J.H.M.; Moleman, P.; van Steenis, H.G.; Boomsma, F. Characterization of Stress Reactions to the Stroop Color Word Test. Pharmacol. Biochem. Behav. 1989, 32, 9–15. [Google Scholar] [CrossRef]
  49. Beh, W.-K.; Wu, Y.-H.; Wu, A.-Y. MAUS: A Dataset for Mental Workload Assessmenton N-Back Task Using Wearable Sensor. arXiv 2021, arXiv:2111.02561. [Google Scholar] [CrossRef]
  50. Lovallo, W. The Cold Pressor Test and Autonomic Function: A Review and Integration. Psychophysiology 1975, 12, 268–282. [Google Scholar] [CrossRef]
  51. Lang, P.J.; Bradley, M.M.; Cuthbert, B.N. International Affective Picture System (IAPS): Technical Manual and Affective Ratings. NIMH Cent. Study Emot. Atten. 1997, 39–58. Available online: https://acordo.net/acordo/wp-content/uploads/2020/08/instructions.pdf (accessed on 8 September 2025).
  52. Magaña, V.C.; Pañeda, X.G.; Garcia, R.; Paiva, S.; Pozueco, L. Beside and behind the Wheel: Factors That Influence Driving Stress and Driving Behavior. Sustainability 2021, 13, 4775. [Google Scholar] [CrossRef]
  53. Chung, W.-Y.; Chong, T.-W.; Lee, B.-G. Methods to Detect and Reduce Driver Stress: A Review. Int. J. Automot. Technol. 2019, 20, 1051–1063. [Google Scholar] [CrossRef]
  54. Pasha, S.T.; Halder, N.; Badrul, T.; Setu, J.H.; Islam, A.; Alam, M.Z. Physiological Signal Data-Driven Workplace Stress Detection Among Healthcare Professionals Using BiLSTM-AM and Ensemble Stacking Models. In Proceedings of the Advances in Science and Engineering Technology International Conferences ASET, Abu Dhabi, United Arab Emirates, 3–5 June 2024; pp. 1–10. [Google Scholar] [CrossRef]
  55. Gjoreski, M.; Luštrek, M.; Gams, M.; Gjoreski, H. Monitoring Stress with a Wrist Device Using Context. J. Biomed. Inform. 2017, 73, 159–170. [Google Scholar] [CrossRef] [PubMed]
  56. Chandra, V.; Sethia, D. Machine Learning-Based Stress Classification System Using Wearable Sensor Devices. IAES Int. J. Artif. Intell. 2024, 13, 337–347. [Google Scholar] [CrossRef]
  57. Ahmad, Z.; Rabbani, S.; Zafar, M.R.; Ishaque, S.; Krishnan, S.; Khan, N. Multilevel Stress Assessment from ECG in a Virtual Reality Environment Using Multimodal Fusion. IEEE Sens. J. 2023, 23, 29559–29570. [Google Scholar] [CrossRef]
  58. Agarwal, S.; Sharma, S.; Faisal, K.N.; Sharma, R.R. Induced Stress Identification Using EEG: A Framework Based on MVMD and Machine Learning. In Proceedings of the 2024 IEEE International Students’ Conference on Electrical, Electronics and Computer Science SCEECS, Bhopal, India, 24–25 February 2024; pp. 1–5. [Google Scholar] [CrossRef]
  59. Akella, A.; Singh, A.K.; Leong, D.; Lal, S.; Newton, P.; Clifton-Bligh, R.; McLachlan, C.S.; Gustin, S.M.; Maharaj, S.; Lees, T.; et al. Classifying Multi-Level Stress Responses from Brain Cortical EEG in Nurses and Non-Health Professionals Using Machine Learning Auto Encoder. IEEE J. Transl. Eng. Health Med. 2021, 9, 1–9. [Google Scholar] [CrossRef]
  60. Abd Al-Alim, M.; Mubarak, R.; Salem, N.M.; Sadek, I. A Machine-Learning Approach for Stress Detection Using Wearable Sensors in Free-Living Environments. Comput. Biol. Med. 2024, 179, 108918. [Google Scholar] [CrossRef]
  61. AlShorman, O.; Masadeh, M.; Heyat, M.B.B.; Akhtar, F.; Almahasneh, H.; Ashraf, G.M.; Alexiou, A. Frontal Lobe Real-Time EEG Analysis Using Machine Learning Techniques for Mental Stress Detection. J. Integr. Neurosci. 2022, 21, 20. [Google Scholar] [CrossRef]
  62. Arya, L.; Chowdhary, H.; Agrawal, I.; Sreedevi, I. Towards Accurate Stress Classification: Combining Advanced Feature Selection and Deep Learning. In Proceedings of the 2023 3rd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2023, Xiamen, China, 16–18 June 2023; pp. 47–52. [Google Scholar]
  63. Badr, Y.; Al-Shargie, F.; Tariq, U.; Babiloni, F.; Al Mughairbi, F.; Al-Nashash, H. Classification of Mental Stress Using Dry EEG Electrodes and Machine Learning. In Proceedings of the 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023, Dubai, United Arab Emirates, 20–23 February 2023. [Google Scholar]
  64. Badr, Y.; Al-Shargie, F.; Tariq, U.; Babiloni, F.; Al-Mughairbi, F.; Al-Nashash, H. Mental Stress Detection and Mitigation Using Machine Learning and Binaural Beat Stimulation. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–5. [Google Scholar] [CrossRef]
  65. Bahameish, M.; Stockman, T.; Requena Carrión, J. Strategies for Reliable Stress Recognition: A Machine Learning Approach Using Heart Rate Variability Features. Sensors 2024, 24, 3210. [Google Scholar] [CrossRef] [PubMed]
  66. Beh, W.-K.; Wu, Y.-H.; Wu, A.-Y. Robust PPG-Based Mental Workload Assessment System Using Wearable Devices. IEEE J. Biomed. Health Inform. 2023, 27, 2323–2333. [Google Scholar] [CrossRef]
  67. Bobade, P.; Vani, M. Stress Detection with Machine Learning and Deep Learning Using Multimodal Physiological Data. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 51–57. [Google Scholar] [CrossRef]
  68. Campanella, S.; Altaleb, A.; Belli, A.; Pierleoni, P.; Palma, L. A Method for Stress Detection Using Empatica E4 Bracelet and Machine-Learning Techniques. Sensors 2023, 23, 3565. [Google Scholar] [CrossRef]
  69. Cui, Z.; Ma, Y.; Ma, M.; Huang, R.; Du, B. Towards a Lightweight Stress Prediction Model: A Study on Dimension Reduction and Individual Models in HRV Analysis. In Proceedings of the 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), Ocean Flower Island, China, 17–21 December 2023; pp. 1709–1716. [Google Scholar] [CrossRef]
  70. Cruz, A.P.; Pradeep, A.; Sivasankar, K.R.; Krishnaveni, K. A Decision Tree Optimised SVM Model for Stress Detection Using Biosignals. In Proceedings of the 2020 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 28–30 July 2020; pp. 841–845. [Google Scholar]
  71. Dalmeida, K.M.; Masala, G.L. Hrv Features as Viable Physiological Markers for Stress Detection Using Wearable Devices. Sensors 2021, 21, 2873. [Google Scholar] [CrossRef]
  72. Delmastro, F.; Di Martino, F.; Dolciotti, C. Cognitive Training and Stress Detection in MCI Frail Older People through Wearable Sensors and Machine Learning. IEEE Access 2020, 8, 65573–65590. [Google Scholar] [CrossRef]
  73. Erkus, E.C.; Purutcuoglu, V.; Ari, F.; Gokcay, D. Comparison of Several Machine Learning Classifiers for Arousal Classification: A Preliminary Study. In Proceedings of the 2020 Medical Technologies Congress (TIPTEKNO), Antalya, Turkey, 19–20 November 2020; pp. 1–7. [Google Scholar]
  74. Fernandez, J.; Martínez, R.; Innocenti, B.; López, B. Contribution of EEG Signals for Students’ Stress Detection. IEEE Trans. Affect. Comput. 2025, 16, 1235–1246. [Google Scholar] [CrossRef]
  75. Giannakakis, G.; Pediaditis, M.; Manousos, D.; Kazantzaki, E.; Chiarugi, F.; Simos, P.G.; Marias, K.; Tsiknakis, M. Stress and Anxiety Detection Using Facial Cues from Videos. Biomed. Signal Process. Control 2017, 31, 89–101. [Google Scholar] [CrossRef]
  76. Hag, A.; Al-Shargie, F.; Handayani, D.; Asadi, H. Mental Stress Classification Based on Selected Electroencephalography Channels Using Correlation Coefficient of Hjorth Parameters. Brain Sci. 2023, 13, 1340. [Google Scholar] [CrossRef]
  77. Han, H.J.; Labbaf, S.; Borelli, J.L.; Dutt, N.; Rahmani, A.M. Objective Stress Monitoring Based on Wearable Sensors in Everyday Settings. J. Med. Eng. Technol. 2020, 44, 177–189. [Google Scholar] [CrossRef]
  78. Henry, J.; Lloyd, H.; Turner, M.; Kendrick, C. On the Robustness of Machine Learning Models for Stress and Anxiety Recognition from Heart Activity Signals. IEEE Sens. J. 2023, 23, 14428–14436. [Google Scholar] [CrossRef]
  79. Jahanjoo, A.; Taherinejad, N.; Aminifar, A. High-Accuracy Stress Detection Using Wrist-Worn PPG Sensors. In Proceedings of the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore, 19–22 May 2024; pp. 1–5. [Google Scholar] [CrossRef]
  80. Abdul Kader, L.; Al-Shargie, F.; Tariq, U.; Al-Nashash, H. One-Channel Wearable Mental Stress State Monitoring System. Sensors 2024, 24, 5373. [Google Scholar] [CrossRef]
  81. Kalra, P.; Sharma, V. Mental Stress Assessment Using PPG Signal a Deep Neural Network Approach. IETE J. Res. 2023, 69, 879–885. [Google Scholar] [CrossRef]
  82. Kim, N.; Seo, W.; Kim, S.; Park, S.M. Electrogastrogram: Demonstrating Feasibility in Mental Stress Assessment Using Sensor Fusion. IEEE Sens. J. 2021, 21, 14503–14514. [Google Scholar] [CrossRef]
  83. Kim, H.; Kim, M.; Park, K.; Kim, J.; Yoon, D.; Kim, W.; Park, C.H. Machine Learning-Based Classification Analysis of Knowledge Worker Mental Stress. Front. Public Health 2023, 11, 1302794. [Google Scholar] [CrossRef]
  84. Kim, N.; Lee, S.; Kim, J.; Choi, S.Y.; Park, S.M. Shuffled ECA-Net for Stress Detection from Multimodal Wearable Sensor Data. Comput. Biol. Med. 2024, 183, 109217. [Google Scholar] [CrossRef]
  85. Konar, D.; De, S.; Mukherjee, P.; Roy, A.H. A Novel Human Stress Level Detection Technique Using EEG. In Proceedings of the 2023 International Conference on Network, Multimedia and Information Technology (NMITCON), Bengaluru, India, 1–2 September 2023; pp. 1–6. [Google Scholar] [CrossRef]
  86. Kurniawan, H.; Maslov, A.V.; Pechenizkiy, M. Stress Detection from Speech and Galvanic Skin Response Signals. In Proceedings of the CBMS 2013—26th IEEE International Symposium on Computer-Based Medical Systems, Porto, Portugal, 20–22 June 2013; pp. 209–214. [Google Scholar]
  87. Lingelbach, K.; Bui, M.; Diederichs, F.; Vukelic, M. Exploring Conventional, Automated and Deep Machine Learning for Electrodermal Activity-Based Drivers’ Stress Recognition. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 1339–1344. [Google Scholar]
  88. Liu, Y.; Li, H.; Wang, J.; Zhang, H.; Zheng, X. Psychological Stress Detection Based on Heart Rate Variability. In Proceedings of the International Conference on Electronic Information Engineering and Computer Science (EIECS 2022), Changchun, China, 16–18 September 2022; Yue, Y., Ed.; SPIE: Cergy, France, 2023; Volume 12602, p. 150. [Google Scholar]
  89. Mamdouh, M.; Mahmoud, R.; Attallah, O.; Al-Kabbany, A. Stress Detection in the Wild: On the Impact of Cross-Training on Mental State Detection. In Proceedings of the 2023 40th National Radio Science Conference (NRSC), Giza, Egypt, 30 May–1 June 2023; pp. 150–158. [Google Scholar] [CrossRef]
  90. Marthinsen, A.J.; Galtung, I.T.; Cheema, A.; Sletten, C.M.; Andreassen, I.M.; Sletta, Ø.; Soler, A.; Molinas, M. Psychological Stress Detection with Optimally Selected EEG Channel Using Machine Learning Techniques. CEUR Workshop Proc. 2023, 3576, 53–68. [Google Scholar]
  91. Mevlevioğlu, D.; Tabirca, S.; Murphy, D. Real-Time Classification of Anxiety in Virtual Reality Therapy Using Biosensors and a Convolutional Neural Network. Biosensors 2024, 14, 131. [Google Scholar] [CrossRef]
  92. Meteier, Q.; Capallera, M.; Ruffieux, S.; Angelini, L.; Abou Khaled, O.; Mugellini, E.; Widmer, M.; Sonderegger, A. Classification of Drivers’ Workload Using Physiological Signals in Conditional Automation. Front. Psychol. 2021, 12, 596038. [Google Scholar] [CrossRef] [PubMed]
  93. Meteier, Q.; De Salis, E.; Capallera, M.; Widmer, M.; Angelini, L.; Abou Khaled, O.; Sonderegger, A.; Mugellini, E. Relevant Physiological Indicators for Assessing Workload in Conditionally Automated Driving, Through Three-Class Classification and Regression. Front. Comput. Sci. 2022, 3, 775282. [Google Scholar] [CrossRef]
  94. Meteier, Q.; Capallera, M.; de Salis, E.; Angelini, L.; Carrino, S.; Widmer, M.; Abou Khaled, O.; Mugellini, E.; Sonderegger, A. A Dataset on the Physiological State and Behavior of Drivers in Conditionally Automated Driving. Data Br. 2023, 47, 109027. [Google Scholar] [CrossRef]
  95. Naegelin, M.; Weibel, R.P.; Kerr, J.I.; Schinazi, V.R.; La Marca, R.; von Wangenheim, F.; Hoelscher, C.; Ferrario, A. An Interpretable Machine Learning Approach to Multimodal Stress Detection in a Simulated Office Environment. J. Biomed. Inform. 2023, 139, 104299. [Google Scholar] [CrossRef] [PubMed]
  96. Plarre, K.; Raij, A.; Hossain, S.M.; Ali, A.A.; Nakajima, M.; Al’Absi, M.; Ertin, E.; Kamarck, T.; Kumar, S.; Scott, M.; et al. Continuous Inference of Psychological Stress from Sensory Measurements Collected in the Natural Environment. In Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN’11, Chicago, IL, USA, 12–14 April 2011; pp. 97–108. [Google Scholar]
  97. Rajendran, V.G.; Jayalalitha, S.; Adalarasu, K.; Thalamalaichamy, M. Analysis and Classification of EEG Data When Playing Video Games and Relax Using EEG Biomarkers. AIP Conf. Proc. 2024, 3180, 040002. [Google Scholar] [CrossRef]
  98. Sandulescu, V.; Andrews, S.; Ellis, D.; Bellotto, N.; Mozos, O.M. Stress Detection Using Wearable Physiological Sensors. Lect. Notes Comput. Sci. (Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform.) 2015, 9107, 526–532. [Google Scholar]
  99. Setz, C.; Arnrich, B.; Schumm, J.; La Marca, R.; Tröster, G.; Ehlert, U. Discriminating Stress From Cognitive Load Using a Wearable EDA Device. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 410–417. [Google Scholar] [CrossRef]
  100. Sharisha Shanbhog, M.; Medikonda, J.; Rai, S. Unsupervised Machine Learning Approach for Stress Level Classification Using Electrodermal Activity Signals. In Proceedings of the 2024 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India, 12–14 July 2024; pp. 1–6. [Google Scholar] [CrossRef]
  101. Shaposhnyk, O.; Yanushkevich, S.; Babenko, V.; Chernykh, M.; Nastenko, I. Inferring Cognitive Load Level from Physiological and Personality Traits. In Proceedings of the 2023 International Conference on Information and Digital Technologies (IDT), Zilina, Slovakia, 20–22 June 2023; pp. 233–242. [Google Scholar]
  102. Siam, A.I.; Gamel, S.A.; Talaat, F.M. Automatic Stress Detection in Car Drivers Based on Non-Invasive Physiological Signals Using Machine Learning Techniques. Neural Comput. Appl. 2023, 35, 12891–12904. [Google Scholar] [CrossRef]
  103. Silva, E.; Aguiar, J.; Reis, L.P.; Sá, J.O.E.; Gonçalves, J.; Carvalho, V. Stress among Portuguese Medical Students: The EuStress Solution. J. Med. Syst. 2020, 44, 45. [Google Scholar] [CrossRef]
  104. Souchet, A.D.; Lamarana Diallo, M.; Lourdeaux, D. Acute Stress Classification with a Stroop Task and In-Office Biophilic Relaxation in Virtual Reality Based on Behavioral and Physiological Data. In Proceedings of the 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Sydney, Australia, 16–20 October 2023; pp. 537–542. [Google Scholar] [CrossRef]
  105. Subhani, A.R.; Mumtaz, W.; Saad, M.N.B.M.; Kamel, N.; Malik, A.S. Machine Learning Framework for the Detection of Mental Stress at Multiple Levels. IEEE Access 2017, 5, 13545–13556. [Google Scholar] [CrossRef]
  106. Swapnil, S.S.; Nuhi-Alamin, M.; Rahman, K.M.; Sarkar, A.K.; Siam, M.Z.H. An Ensemble Approach to Classify Mental Stress Using EEG Based Time-Frequency and Non-Linear Features. In Proceedings of the 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), Gazipur, Bangladesh, 25–27 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
  107. Toshnazarov, K.; Lee, U.; Kim, B.H.; Mishra, V.; Najarro, L.A.C.; Noh, Y. SOSW: Stress Sensing with Off-the-Shelf Smartwatches in the Wild. IEEE Internet Things J. 2024, 11, 21527–21545. [Google Scholar] [CrossRef]
  108. Troyee, T.G.; Chowdhury, M.H.; Khondakar, M.F.K.; Hasan, M.; Hossain, M.A.; Hossain, Q.D.; Ali Akber Dewan, M. Stress Detection and Audio-Visual Stimuli Classification from Electroencephalogram. IEEE Access 2024, 12, 145417–145427. [Google Scholar] [CrossRef]
  109. Troyee, T.G.; Karim Khondakar, M.F.; Hasan, M.; Chowdhury, M.H. A Comparative Analysis of Different Preprocessing Pipelines for EEG-Based Mental Stress Detection. In Proceedings of the 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), Dhaka, Bangladesh, 2–4 May 2024; pp. 370–375. [Google Scholar] [CrossRef]
  110. Xing, M.; Fitzgerald, J.M.; Klumpp, H. Classification of Social Anxiety Disorder With Support Vector Machine Analysis Using Neural Correlates of Social Signals of Threat. Front. Psychiatry 2020, 11, 144. [Google Scholar] [CrossRef] [PubMed]
  111. Zhu, L.; Spachos, P.; Ng, P.C.; Yu, Y.; Wang, Y.; Plataniotis, K.; Hatzinakos, D. Stress Detection Through Wrist-Based Electrodermal Activity Monitoring and Machine Learning. IEEE J. Biomed. Health Inform. 2023, 27, 2155–2165. [Google Scholar] [CrossRef]
  112. Benchekroun, M.; Chevallier, B.; Beaouiss, H.; Istrate, D.; Zalc, V.; Khalil, M.; Lenne, D. Comparison of Stress Detection through ECG and PPG Signals Using a Random Forest-Based Algorithm. In Proceedings of the 2022 44th annual international conference of the IEEE engineering in medicine & Biology society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 3150–3153. [Google Scholar] [CrossRef]
  113. Chauhan, A.R.; Akhil; Kumar, S. Analysing Effectiveness of Different Physiological Biomarkers in Detecting Stress. In Proceedings of the 2023 IEEE World Conference on Applied Intelligence and Computing (AIC), Sonbhadra, India, 29–30 July 2023; pp. 71–75. [Google Scholar] [CrossRef]
  114. Dahal, K.; Bogue-Jimenez, B.; Doblas, A. Global Stress Detection Framework Combining a Reduced Set of HRV Features and Random Forest Model. Sensors 2023, 23, 5220. [Google Scholar] [CrossRef]
  115. Gazi, A.H.; Lis, P.; Mohseni, A.; Ompi, C.; Giuste, F.O.; Shi, W.; Inan, O.T.; Wang, M.D. Respiratory Markers Significantly Enhance Anxiety Detection Using Multimodal Physiological Sensing. In Proceedings of the 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Athens, Greece, 27–30 July 2021; pp. 1–4. [Google Scholar]
  116. Gjoreski, M.; Gjoreski, H.; Luštrek, M.; Gams, M. Continuous Stress Detection Using a Wrist Device-in Laboratory and Real Life. In Proceedings of the UbiComp 2016 Adjunct—Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September 2016; pp. 1185–1193. [Google Scholar]
  117. Iyer, G.G.; Udhayakumar, R.; Gopakumar, S.; Karmakar, C. Optimizing Temporal Segmentation of Multi-Modal Non-EEG Signals for Human Stress Analysis. In Proceedings of the 2024 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 26–28 August 2024; pp. 494–499. [Google Scholar] [CrossRef]
  118. Morshed, M.B.; Rahman, M.M.; Nathan, V.; Zhu, L.; Bae, J.; Rosa, C.; Mendes, W.B.; Kuang, J.; Gao, A. Core Body Temperature and Its Role in Detecting Acute Stress: A Feasibility Study. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 1606–1610. [Google Scholar] [CrossRef]
  119. Pinge, A.; Bandyopadhyay, S.; Ghosh, S.; Sen, S. A Comparative Study between ECG-Based and PPG-Based Heart Rate Monitors for Stress Detection. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bangalore, India, 4–8 January 2022; pp. 84–89. [Google Scholar] [CrossRef]
  120. Quadir, M.A.; Bhardwaj, S.; Verma, N.; Sivaraman, A.K.; Tee, K.F. IoT-Based Mental Health Monitoring System Using Machine Learning Stress Prediction Algorithm in Real-Time Application. Lect. Notes Electr. Eng. 2023, 1021 LNEE, 249–263. [Google Scholar] [CrossRef]
  121. Rashid, N.; Mortlock, T.; Al Faruque, M.A. Stress Detection Using Context-Aware Sensor Fusion from Wearable Devices. IEEE Internet Things J. 2023, 10, 14114–14127. [Google Scholar] [CrossRef]
  122. Schmidt, P.; Reiss, A.; Duerichen, R.; Van Laerhoven, K. Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA, 16–20 October 2018; pp. 400–408. [Google Scholar] [CrossRef]
  123. Karthikeyan, P.; Murugappan, M.; Yaacob, S. EMG Signal Based Human Stress Level Classification Using Wavelet Packet Transform. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 330 CCIS, pp. 236–243. ISBN 9783642351969. [Google Scholar]
  124. Mazlan, M.R.B.; Sukor, A.S.B.A.; Adom, A.H.B.; Jamaluddin, R.B.; Awang, S.A.B. Investigation of Different Classifiers for Stress Level Classification Using PCA-Based Machine Learning Method. In Proceedings of the 2023 19th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Kedah, Malaysia, 3–4 March 2023; pp. 168–173. [Google Scholar] [CrossRef]
  125. Wijsman, J.; Grundlehner, B.; Liu, H.; Hermens, H.; Penders, J. Towards Mental Stress Detection Using Wearable Physiological Sensors. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Boston, MA, USA, 30 August–3 September 2011; pp. 1798–1801. [Google Scholar]
  126. Kader, L.A.; Yahya, F.; Tariq, U.; Al-Nashash, H. Mental Stress Assessment Using Low in Cost Single Channel EEG System. In Proceedings of the 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023, Dubai, United Arab Emirates, 20–23 February 2023. [Google Scholar]
  127. Jain, A.; Kumar, R. Machine Learning Based Anxiety Detection Using Physiological Signals and Context Features. In Proceedings of the 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2–3 May 2024; pp. 116–121. [Google Scholar] [CrossRef]
  128. Sim, D.Y.Y.; Chong, C.K. Effects of Dimension Reduction Methods on Boosting Algorithms for Better Prediction Accuracies on Classifications of Stress EEGs. In Proceedings of the 2023 6th International Conference on Electronics and Electrical Engineering Technology (EEET), Nanjing, China, 1–3 December 2023; pp. 49–54. [Google Scholar] [CrossRef]
  129. Choi, J.; Ahmed, B.; Gutierrez-Osuna, R. Development and Evaluation of an Ambulatory Stress Monitor Based on Wearable Sensors. IEEE Trans. Inf. Technol. Biomed. 2012, 16, 279–286. [Google Scholar] [CrossRef]
  130. Mozafari, M.; Goubran, R.; Green, J.R. A Fusion Model for Cross-Subject Stress Level Detection Based on Transfer Learning. In Proceedings of the 2021 IEEE Sensors Applications Symposium, SAS 2021-Proceedings, Sundsvall, Sweden, 23–25 August 2021; pp. 1–6. [Google Scholar]
  131. Masrur, N.; Halder, N.; Rashid, S.; Setu, J.H.; Islam, A.; Ahmed, T. Performance Analysis of Ensemble and DNN Models for Decoding Mental Stress Utilizing ECG-Based Wearable Data Fusion. In Proceedings of the 2024 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), Tbilisi, Georgia, 24–27 June 2024; pp. 276–279. [Google Scholar] [CrossRef]
  132. Adarsh, V.; Gangadharan, G.R. Mental Stress Detection from Ultra-Short Heart Rate Variability Using Explainable Graph Convolutional Network with Network Pruning and Quantisation. Mach. Learn. 2024, 113, 5467–5494. [Google Scholar] [CrossRef]
  133. Al-Shargie, F.; Badr, Y.; Tariq, U.; Babiloni, F.; Al-Mughairbi, F.; Al-Nashash, H. Classification of Mental Stress Levels Using EEG Connectivity and Convolutional Neural Networks. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–5. [Google Scholar] [CrossRef]
  134. Appriou, A.; Cichocki, A.; Lotte, F. Modern Machine-Learning Algorithms: For Classifying Cognitive and Affective States From Electroencephalography Signals. IEEE Syst. Man Cybern. Mag. 2020, 6, 29–38. [Google Scholar] [CrossRef]
  135. Shirley Benita, D.; Shamila Ebenezer, A.; Susmitha, L.; Subathra, M.S.P.; Jeba Priya, S. Stress Detection Using CNN on the WESAD Dataset. In Proceedings of the 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), Bhubaneswar, India, 9–10 February 2024; pp. 308–313. [Google Scholar] [CrossRef]
  136. Chatterjee, D.; Dutta, S.; Shaikh, R.; Saha, S.K. A Lightweight Deep Neural Network for Detection of Mental States from Physiological Signals. Innov. Syst. Softw. Eng. 2022, 20, 405–412. [Google Scholar] [CrossRef]
  137. Mortensen, J.A.; Mollov, M.E.; Chatterjee, A.; Ghose, D.; Li, F.Y. Multi-Class Stress Detection Through Heart Rate Variability: A Deep Neural Network Based Study. IEEE Access 2023, 11, 57470–57480. [Google Scholar] [CrossRef]
  138. Zontone, P.; Affanni, A.; Piras, A.; Rinaldo, R. Convolutional Neural Networks Using Scalograms for Stress Recognition in Drivers. In Proceedings of the 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023; pp. 1185–1189. [Google Scholar] [CrossRef]
  139. Dhaouadi, S.; Ben Khelifa, M.M. A Multimodal Physiological-Based Stress Recognition: Deep Learning Models’ Evaluation in Gamers’ Monitoring Application. In Proceedings of the 2020 International Conference on Advanced Technologies for Signal and Image Processing, ATSIP 2020, Sousse, Tunisia, 2–5 September 2020; pp. 1–6. [Google Scholar]
  140. Praveenkumar, S.; Karthick, T. Automatic Stress Recognition System with Deep Learning Using Multimodal Psychological Data. In Proceedings of the 2022 International Conference on Electronic Systems and Intelligent Computing, ICESIC 2022, Chennai, India, 22–23 April 2022; pp. 122–127. [Google Scholar]
  141. Uddin, J. An Autoencoder Based Emotional Stress State Detection Approach Using Electroencephalography Signals. J. Inf. Syst. Telecommun. 2023, 11, 24–30. [Google Scholar] [CrossRef]
  142. Eisenbarth, H.; Chang, L.J.; Wager, T.D. Multivariate Brain Prediction of Heart Rate and Skin Conductance Responses to Social Threat. J. Neurosci. 2016, 36, 11987–11998. [Google Scholar] [CrossRef] [PubMed]
  143. Onim, M.S.H.; Thapliyal, H. Predicting Stress in Older Adults with RNN and LSTM from Time Series Sensor Data and Cortisol. In Proceedings of the 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Knoxville, TN, USA, 1–3 July 2024; pp. 300–306. [Google Scholar] [CrossRef]
  144. Rashmi, C.R.; Shantala, C.P. Cognitive Stress Recognition During Mathematical Task and EEG Changes Following Audio-Visual Stimuli for Relaxation. In Proceedings of the 2023 International Conference on Sustainable Communication Networks and Application (ICSCNA), Theni, India, 15–17 November 2023; pp. 612–617. [Google Scholar] [CrossRef]
  145. Tigranyan, S.; Martirosyan, A. Breaking Barriers in Stress Detection: An Inter-Subject Approach Using ECG Signals. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 2–4 July 2024; pp. 1850–1855. [Google Scholar] [CrossRef]
  146. Amin, M.; Ullah, K.; Asif, M.; Shah, H.; Mehmood, A.; Khan, M.A. Real-World Driver Stress Recognition and Diagnosis Based on Multimodal Deep Learning and Fuzzy EDAS Approaches. Diagnostics 2023, 13, 1897. [Google Scholar] [CrossRef] [PubMed]
  147. Barki, H.; Chung, W.Y. Mental Stress Detection Using a Wearable In-Ear Plethysmography. Biosensors 2023, 13, 397. [Google Scholar] [CrossRef]
  148. Fan, T.; Qiu, S.; Wang, Z.; Zhao, H.; Jiang, J.; Wang, Y.; Xu, J.; Sun, T.; Jiang, N. A New Deep Convolutional Neural Network Incorporating Attentional Mechanisms for ECG Emotion Recognition. Comput. Biol. Med. 2023, 159, 106938. [Google Scholar] [CrossRef]
  149. Huynh, L.; Nguyen, T.; Nguyen, T.; Pirttikangas, S.; Siirtola, P. StressNAS: Affect State and Stress Detection Using Neural Architecture Search. In Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers, Virtual, 21–26 September 2021; ACM: New York, NY, USA, 2021; pp. 121–125. [Google Scholar]
  150. Ragav, A.; Krishna, N.H.; Narayanan, N.; Thelly, K.; Vijayaraghavan, V. Scalable Deep Learning for Stress and Affect Detection on Resource-Constrained Devices. In Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA 2019), Boca Raton, FL, USA, 16–19 December 2019; pp. 1585–1592. [Google Scholar]
  151. Başaran, O.T.; Can, Y.S.; André, E.; Ersoy, C. Relieving the Burden of Intensive Labeling for Stress Monitoring in the Wild by Using Semi-Supervised Learning. Front. Psychol. 2023, 14, 1293513. [Google Scholar] [CrossRef]
  152. Halder, N.; Setu, J.H.; Rafid, L.; Islam, A.; Amin, M.A. Smartwatch-Based Human Stress Diagnosis Utilizing Physiological Signals and LSTM-Driven Machine Intelligence. In Proceedings of the 2024 Advances in Science and Engineering Technology International Conferences (ASET), Abu Dhabi, United Arab Emirates, 3–5 June 2024; pp. 1–8. [Google Scholar] [CrossRef]
  153. Tanwar, R.; Singh, G.; Pal, P.K. FuSeR: Fusion of Wearables Data for StrEss Recognition Using Explainable Artificial Intelligence Models. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–6. [Google Scholar] [CrossRef]
  154. Shrestha, A.; Mahmood, A. Review of Deep Learning Algorithms and Architectures. IEEE Access 2019, 7, 53040–53065. [Google Scholar] [CrossRef]
  155. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef]
  156. Koldijk, S.; Sappelli, M.; Verberne, S.; Neerincx, M.A.; Kraaij, W. The SWELL Knowledge Work Dataset for Stress and User Modeling Research. In Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey, 12–16 November 2014; pp. 291–298. [Google Scholar] [CrossRef]
  157. Gjoreski, M.; Kolenik, T.; Knez, T.; Luštrek, M.; Gams, M.; Gjoreski, H.; Pejović, V. Datasets for Cognitive Load Inference Using Wearable Sensors and Psychological Traits. Appl. Sci. 2020, 10, 3843. [Google Scholar] [CrossRef]
  158. Zyma, I.; Tukaev, S.; Seleznov, I.; Kiyono, K.; Popov, A.; Chernykh, M.; Shpenkov, O. Electroencephalograms during Mental Arithmetic Task Performance. Data 2019, 4, 14. [Google Scholar] [CrossRef]
  159. Ghosh, R.; Deb, N.; Sengupta, K.; Phukan, A.; Choudhury, N.; Kashyap, S.; Phadikar, S.; Saha, R.; Das, P.; Sinha, N.; et al. SAM 40: Dataset of 40 Subject EEG Recordings to Monitor the Induced-Stress While Performing Stroop Color-Word Test, Arithmetic Task, and Mirror Image Recognition Task. Data Brief 2022, 40, 107772. [Google Scholar] [CrossRef]
  160. Healey, J.A.; Picard, R.W. Detecting Stress during Real-World Driving Tasks Using Physiological Sensors. IEEE Trans. Intell. Transp. Syst. 2005, 6, 156–166. [Google Scholar] [CrossRef]
  161. Haouij, N.E.; Poggi, J.M.; Sevestre-Ghalila, S.; Ghozi, R.; Jaïdane, M. AffectiveROAD System and Database to Assess Driver’s Attention. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, Pau, France, 9–13 April 2018; pp. 800–803. [Google Scholar] [CrossRef]
  162. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A Database for Emotion Analysis; Using Physiological Signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31. [Google Scholar] [CrossRef]
  163. Sharma, K.; Castellini, C.; van den Broek, E.L.; Albu-Schaeffer, A.; Schwenker, F. A Dataset of Continuous Affect Annotations and Physiological Signals for Emotion Analysis. Sci. Data 2019, 6, 196. [Google Scholar] [CrossRef]
  164. Hosseini, S.; Gottumukkala, R.; Katragadda, S.; Bhupatiraju, R.T.; Ashkar, Z.; Borst, C.W.; Cochran, K. A Multimodal Sensor Dataset for Continuous Stress Detection of Nurses in a Hospital. Sci. Data 2022, 9, 255. [Google Scholar] [CrossRef]
  165. Birjandtalab, J.; Cogan, D.; Pouyan, M.B.; Nourani, M. A Non-EEG Biosignals Dataset for Assessment and Visualization of Neurological Status. In Proceedings of the 2016 IEEE International Workshop on Signal Processing Systems (SiPS), Dallas, TX, USA, 26–28 October 2016; pp. 110–114. [Google Scholar] [CrossRef]
  166. Sah, R.K.; Cleveland, M.J.; Habibi, A.; Ghasemzadeh, H. Stressalyzer: Convolutional Neural Network Framework for Personalized Stress Classification. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 4658–4663. [Google Scholar] [CrossRef]
  167. Ding, Y.; Liu, J.; Zhang, X.; Yang, Z. Dynamic Tracking of State Anxiety via Multi-Modal Data and Machine Learning. Front. Psychiatry 2022, 13, 757961. [Google Scholar] [CrossRef] [PubMed]
  168. Jiao, Y.; Wang, X.; Zhao, L.; Dong, H.; Du, G.; Zhao, S.; Liu, Y.; Liu, C.; Wang, D.; Liang, W. An Improved Sequence Coding-Based Gray Level Co-Occurrence Matrix for Mild Stress Assessment. Biomed. Signal Process. Control 2024, 95, 106357. [Google Scholar] [CrossRef]
  169. Ribeiro, G.; Postolache, O.; Ferrero, F. A New Intelligent Approach for Automatic Stress Level Assessment Based on Multiple Physiological Parameters Monitoring. IEEE Trans. Instrum. Meas. 2024, 73, 3342218. [Google Scholar] [CrossRef]
  170. Akbas, A. Evaluation of the Physiological Data Indicating the Dynamic Stress Level of Drivers. Sci. Res. Essays 2011, 6, 430–439. [Google Scholar] [CrossRef]
  171. Bh, S.; Neelima, K.; Deepanjali, C.; Bhuvanashree, P.; Duraipandian, K.; Rajan, S.; Sathiyanarayanan, M. Mental Health Analysis of Employees Using Machine Learning Techniques. In Proceedings of the 2022 14th International Conference on COMmunication Systems and NETworkS, COMSNETS 2022, Bangalore, India, 4–8 January 2022; pp. 1–6. [Google Scholar]
  172. Saganowski, S.; Perz, B.; Polak, A.G.; Kazienko, P. Emotion Recognition for Everyday Life Using Physiological Signals From Wearables: A Systematic Literature Review. IEEE Trans. Affect. Comput. 2023, 14, 1876–1897. [Google Scholar] [CrossRef]
  173. Dedovic, K.; Duchesne, A.; Andrews, J.; Engert, V.; Pruessner, J.C. The Brain and the Stress Axis: The Neural Correlates of Cortisol Regulation in Response to Stress. Neuroimage 2009, 47, 864–871. [Google Scholar] [CrossRef] [PubMed]
  174. Stuart, T.; Hanna, J.; Gutruf, P. Wearable Devices for Continuous Monitoring of Biosignals: Challenges and Opportunities. APL Bioeng. 2022, 6, 021502. [Google Scholar] [CrossRef]
  175. Boudreaux, B.D.; Hebert, E.P.; Hollander, D.B.; Williams, B.M.; Cormier, C.L.; Naquin, M.R.; Gillan, W.W.; Gusew, E.E.; Kraemer, R.R. Validity of Wearable Activity Monitors during Cycling and Resistance Exercise. Med. Sci. Sports Exerc. 2018, 50, 624–633. [Google Scholar] [CrossRef] [PubMed]
  176. Smets, E.; De Raedt, W.; Van Hoof, C. Into the Wild: The Challenges of Physiological Stress Detection in Laboratory and Ambulatory Settings. IEEE J. Biomed. Health Inform. 2019, 23, 463–473. [Google Scholar] [CrossRef]
  177. Can, Y.S.; Gokay, D.; Kılıç, D.R.; Ekiz, D.; Chalabianloo, N.; Ersoy, C. How Laboratory Experiments Can Be Exploited for Monitoring Stress in the Wild: A Bridge between Laboratory and Daily Life. Sensors 2020, 20, 838. [Google Scholar] [CrossRef] [PubMed]
  178. Choi, J.B.; Hong, S.; Nelesen, R.; Bardwell, W.A.; Natarajan, L.; Schubert, C.; Dimsdale, J.E. Age and Ethnicity Differences in Short-Term Heart-Rate Variability. Psychosom. Med. 2006, 68, 421–426. [Google Scholar] [CrossRef]
  179. Graves, B.S.; Hall, M.E.; Dias-Karch, C.; Haischer, M.H.; Apter, C. Gender Differences in Perceived Stress and Coping among College Students. PLoS ONE 2021, 16, e0255634. [Google Scholar] [CrossRef]
  180. Mueller, A.; Strahler, J.; Armbruster, D.; Lesch, K.P.; Brocke, B.; Kirschbaum, C. Genetic Contributions to Acute Autonomic Stress Responsiveness in Children. Int. J. Psychophysiol. 2012, 83, 302–308. [Google Scholar] [CrossRef]
  181. Ellis, B.J.; Jackson, J.J.; Boyce, W.T. The Stress Response Systems: Universality and Adaptive Individual Differences. Dev. Rev. 2006, 26, 175–212. [Google Scholar] [CrossRef]
  182. McEwen, B.S. The Neurobiology of Stress: From Serendipity to Clinical Relevance. Brain Res. 2000, 886, 172–189. [Google Scholar] [CrossRef]
  183. Grissom, N.; Bhatnagar, S. Habituation to Repeated Stress: Get Used to It. Neurobiol. Learn. Mem. 2009, 92, 215–224. [Google Scholar] [CrossRef]
  184. Alkurdi, A.; He, M.; Cerna, J.; Clore, J.; Sowers, R.; Hsiao-Wecksler, E.T.; Hernandez, M.E. Extending Anxiety Detection from Multimodal Wearables in Controlled Conditions to Real-World Environments. Sensors 2025, 25, 1241. [Google Scholar] [CrossRef]
  185. Alkurdi, A.; Clore, J.; Sowers, R.; Hsiao-Wecksler, E.T.; Hernandez, M.E. Resilience of Machine Learning Models in Anxiety Detection: Assessing the Impact of Gaussian Noise on Wearable Sensors. Appl. Sci. 2025, 15, 88. [Google Scholar] [CrossRef]
  186. Kothgassner, O.D.; Goreis, A.; Bauda, I.; Ziegenaus, A.; Glenk, L.M.; Felnhofer, A. Virtual Reality Biofeedback Interventions for Treating Anxiety: A Systematic Review, Meta-Analysis and Future Perspective. Wien. Klin. Wochenschr. 2022, 134, 49. [Google Scholar] [CrossRef] [PubMed]
  187. Gradl, S.; Wirth, M.; Zillig, T.; Eskofier, B.M. Visualization of Heart Activity in Virtual Reality: A Biofeedback Application Using Wearable Sensors. In Proceedings of the 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks, BSN 2018, Las Vegas, NV, USA, 4–7 March 2018; pp. 152–155. [Google Scholar]
  188. Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Typical process of developing feature-based anxiety-detection models. End-to-end models bypass the feature extraction and selection steps (Steps 3 and 4) by operating directly on raw signals.
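To make the distinction in Figure 1 concrete, the sketch below contrasts the two routes: hand-crafted features feeding a traditional classifier versus a small end-to-end network that consumes raw signal windows. This is a minimal, illustrative example assuming scikit-learn and PyTorch; the synthetic data, window length, feature set, and model settings are placeholders rather than configurations drawn from the reviewed studies.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data: 200 ten-second windows of a one-channel signal at 64 Hz.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 640)).astype(np.float32)  # raw signal windows
y = rng.integers(0, 2, size=200)                         # binary anxiety labels

# Feature-based route (Steps 3 and 4 of Figure 1): hand-crafted summary features
# feeding a traditional classifier.
def extract_features(windows):
    return np.column_stack([
        windows.mean(axis=1),                            # mean level
        windows.std(axis=1),                             # variability
        np.abs(np.diff(windows, axis=1)).mean(axis=1),   # mean successive difference
    ])

fb_model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))
fb_model.fit(extract_features(X_raw), y)

# End-to-end route: a small 1D CNN that consumes the raw windows directly,
# skipping explicit feature extraction and selection.
e2e_model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
logits = e2e_model(torch.from_numpy(X_raw).unsqueeze(1))  # shape (200, 2) class scores
```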
Figure 2. Flowchart of database results and article screening, grouped by the machine-learning approach used: feature-based (FB) or end-to-end (E2E). Seven of the selected papers used both FB and E2E models.
Table 1. Search string used for each database.
Database | Search String
PubMed | (“machine learning”) AND “anxiety” NOT (“depression” OR “Autism” OR Stroke OR “depressive” OR phobia)
IEEE Xplore | (“machine learning”) AND ((“psychological stress” OR “mental stress” OR “emotional stress” OR “mental workload” OR “stressful”) OR “anxiety”)
Scopus | TITLE-ABS-KEY (“machine learning” AND (“psychological stress” OR “mental stress” OR “emotional stress” OR “mental workload” OR “cognitive workload” OR “Cognitive stress” OR “anxiety”)) AND NOT TITLE-ABS (review OR survey OR scoping OR autism OR autistic OR diabetic) AND NOT TITLE (treatment OR suicide OR surgery OR depression OR depressed OR “anxiety disorders” OR vaccine OR child OR children OR cells OR glycemia OR tumor OR tremor OR gender OR wealth OR “mental illness” OR disorder OR “management system” OR “intelligence” OR disease) AND (LIMIT-TO (PUBSTAGE, “final”)) AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “cp”))
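For reproducibility, the database queries in Table 1 can be scripted. The sketch below illustrates submitting the PubMed string verbatim through NCBI's E-utilities using Biopython's Entrez module; the e-mail address and retmax value are placeholders, and the retrieved identifiers would still need to be merged and deduplicated with the Scopus and IEEE Xplore exports before screening.

```python
from Bio import Entrez  # Biopython wrapper around the NCBI E-utilities

Entrez.email = "reviewer@example.org"  # placeholder; NCBI requires a contact address

# PubMed search string from Table 1, passed verbatim to ESearch.
query = ('("machine learning") AND "anxiety" NOT '
         '("depression" OR "Autism" OR Stroke OR "depressive" OR phobia)')

handle = Entrez.esearch(db="pubmed", term=query, retmax=1000)
record = Entrez.read(handle)
handle.close()

pmids = record["IdList"]  # PubMed IDs of candidate articles for screening
print(record["Count"], "records matched;", len(pmids), "IDs retrieved")
```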
Table 2. Articles that used feature-based models.
FB Category | Models | References
Traditional models | SVM | [21,22,24,25,55,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111]
Traditional models | RF | [23,24,55,56,57,59,60,62,65,68,69,71,72,78,79,82,83,84,87,88,92,93,94,95,101,102,103,104,105,111,112,113,114,115,116,117,118,119,120,121,122]
Traditional models | kNN | [21,42,55,56,57,59,62,63,67,71,73,74,75,76,77,79,80,82,83,85,87,89,91,97,100,101,102,103,106,108,109,111,121,122,123,124,125]
Traditional models | Naïve Bayes | [55,63,64,65,69,71,73,75,77,80,85,100,101,103,105,111,113,125,126]
Traditional models | Boost | [21,24,55,57,59,60,62,67,72,74,75,78,79,87,95,96,101,104,106,107,113,118,121,122,127,128]
Traditional models | DT | [24,55,58,60,62,63,64,65,67,69,79,80,82,83,86,96,97,100,101,102,106,108,109,113,121,122,127]
Traditional models | LR | [24,62,65,68,69,81,82,83,84,87,100,101,102,103,104,105,107,111,117,118,129]
Traditional models | LDA | [24,59,63,64,67,73,76,79,80,85,87,89,99,108,109,121,122,130]
Traditional models | Ensemble | [54,55,85,97,106,131]
Deep learning models | CNN | [23,57,62,69,74,84,90,91,132,133,134,135,136,137,138]
Deep learning models | ANN | [42,67,81,91,101,103,106,113,131,139,140,141]
Deep learning models | MLP | [62,69,71,78,81,83,102,142]
Deep learning models | LSTM | [54,62,139,140,143,144]
Deep learning models | ResNet | [78,145]
Deep learning models | RNN | [143]
Deep learning models | CNN-LSTM | [24,55,58,60,62,63,64,65,67,69,79,80,82,83,86,96,97,100,101,102,106,108,109,113,121,122,127]
Deep learning models | GCN | [132]
SVM: support vector machine; RF: random forest; kNN: k-nearest neighbors; DT: decision tree; LDA: linear discriminant analysis; LR: linear regression; CNN: convolutional neural network; ANN: artificial neural network; RNN: recurrent neural network; MLP: multilayer perceptron; LSTM: long short-term memory; GCN: graph convolutional network.
Table 3. Articles that used end-to-end models.
E2E Models | References
CNN | [11,23,57,69,146,147,148]
FCN | [11,149]
Inception Time | [11]
LSTM | [150,151,152]
Multi-ResNet | [11]
ResNet | [57,87,149]
Encoder | [11]
Time CNN | [11]
CNN-LSTM | [11,146,151]
MLP | [11,124,149,151]
MLP-LSTM | [151]
RF and kNN | [56]
SVM, kNN, NB, LDA | [73]
Boost | [153]
CNN: convolutional neural network; FCN: fully convolutional neural network; MLP: multilayer perceptron; LSTM: long short-term memory.
Table 4. Articles grouped by stressors used to induce anxiety.
Stressor Type | References
Social Stressors | [11,21,22,24,57,59,67,68,75,78,79,95,96,98,99,100,107,111,113,114,118,119,120,121,122,127,129,131,132,135,136,140,143,145,148,149,150,152,153,166]
Mental Stressors | [42,55,56,58,63,64,65,66,68,69,72,74,75,77,80,81,82,83,85,86,87,89,90,95,96,97,99,101,103,104,105,106,107,108,109,111,112,115,116,117,118,119,123,124,125,126,128,129,130,132,133,134,137,139,144,147,152,167,168]
Physical Stressors | [22,57,61,72,77,84,96,107,111,119,152,169]
Emotional Stressors | [73,75,76,77,78,84,91,110,115,134,141,153]
Driving Stressors | [62,70,71,87,88,92,93,94,102,138,146,170]
Daily-Life Stressors | [23,54,55,60,77,96,107,116,151]
Table 5. Articles grouped by the experimental environment.
Condition | References
Laboratory setting | [11,21,22,24,55,56,57,58,59,61,63,64,65,66,67,68,69,72,73,74,75,77,78,79,80,81,82,83,84,85,86,87,89,90,92,93,94,95,96,97,98,99,100,101,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,126,127,128,129,130,131,132,133,134,135,136,137,139,140,143,144,145,147,148,149,150,152,153,166,167,168,169]
Semi-wild setting | [62,70,71,88,102,138,146,170]
In-the-wild setting | [23,54,55,60,77,96,107,116,151]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
