3.1. Sensor-Assisted Unity Architecture
The architectural overview of the proposed stress detection framework, which extends the existing Sensor-Assisted Unity Architecture, is depicted in Figure 1. The figure illustrates the key methodological innovation: a dynamic signal calibration pipeline designed to mitigate the influence of individual physiological variability. Each block represents a logical stage in the end-to-end workflow. Data acquisition feeds the calibration stage, where raw GSR and behavioral inputs are preprocessed and aligned. The dynamically calibrated signals then move into the weighted integration stage, which produces the final stress score used for classification. This flow shows how raw data are progressively transformed into interpretable stress measures.
Initially, raw physiological data (GSR) and behavioral data (such as hesitation and tremor) are captured within the VR environment. These raw data are then passed to the Dynamic Calibration stage, the core innovation, in which the raw GSR signals are compared against three distinct, concurrently calculated baselines: Global, Individual, and Pre-task. The Weighted Integration stage then uses a correlation-based analysis to automatically assign a weight to each of these three baselines. This weighting normalizes the final Stress Score against the participant’s unique physiological signature, in contrast to conventional, less accurate fixed-threshold methods.
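The weighted integration step can be illustrated with a minimal sketch. It assumes that each baseline's weight is proportional to the absolute Pearson correlation between its deviation signal and the stress label, and that the stress score is the weighted sum of the three deviations. Both choices, along with the function names and the toy values, are assumptions made for illustration and are not taken from the framework's implementation.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_weights(deltas, labels):
    """Assumed rule: weight each baseline by |r| with the labels, normalized to sum to 1."""
    strengths = {name: abs(pearson_r(values, labels)) for name, values in deltas.items()}
    total = sum(strengths.values())
    return {name: s / total for name, s in strengths.items()}

def stress_scores(deltas, weights):
    """Weighted sum of the three baseline deviations for every sample."""
    n = len(next(iter(deltas.values())))
    return [sum(weights[name] * deltas[name][i] for name in deltas) for i in range(n)]

# Hypothetical toy data: 1 = stress, 0 = no stress.
labels = [0, 0, 1, 1, 0, 1]
deltas = {
    "global": [0.1, 0.3, 0.4, 0.6, 0.2, 0.5],
    "individual": [-0.2, -0.1, 0.5, 0.7, 0.0, 0.6],
    "pre_task": [0.0, 0.1, 0.3, 0.2, 0.1, 0.4],
}
weights = correlation_weights(deltas, labels)
scores = stress_scores(deltas, weights)
```

In this toy example the individual baseline tracks the labels most closely, so it receives the largest weight.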
The proposed stress detection framework extends the Sensor-Assisted Unity Architecture by introducing a novel preprocessing pipeline for handling individual physiological variability. Implemented within a Unity-based VR environment, the system continuously monitors user behavior, including hesitation, tremor intensity, and inactivity. The comprehensive dataset combines baseline measurements, behavioral indicators, and participant identification variables to enable robust stress detection through improved normalization across simulated participants and experimental conditions.
The outputs of this architectural model form the input variables for the subsequent calibration process described in Section 3.2. This establishes a continuous workflow linking Unity-based behavioral monitoring, GSR normalization, and final decision making.
To avoid hard-coded constants and enable sensitivity analysis, all model terms were expressed symbolically, as shown in Table 1. This table presents the key parameters governing the stress detection framework, along with their physical interpretations and the default values used throughout the experiments. The pre-task parameter determines the proportion of initial trials allocated for establishing the pre-task baseline, with a default value of 0.2 representing 20% of the session. The behavioral indicator count is set to 4, encompassing hesitation, tremor intensity, inactivity, and repeated errors.
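As a rough illustration of the pre-task parameter, the sketch below averages the GSR values of the first 20% of trials in a session. The function name, the use of a simple mean, and the rounding-down rule are assumptions; only the 0.2 default comes from the text.

```python
import math

def pre_task_baseline(gsr_trials, proportion=0.2):
    """Mean GSR over the initial `proportion` of trials (assumed aggregation: simple mean)."""
    if not gsr_trials:
        raise ValueError("at least one trial is required")
    # Always keep at least one trial so that short sessions still yield a baseline.
    n_baseline = max(1, math.floor(len(gsr_trials) * proportion))
    window = gsr_trials[:n_baseline]
    return sum(window) / len(window)

# Hypothetical session of 10 trials: the first 2 trials form the baseline window.
session = [2.0, 4.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]
baseline = pre_task_baseline(session)
```

Expressing the proportion as an argument rather than a constant keeps the value available for sensitivity analysis.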
As shown in Table 2, the feature set comprises distinct variables organized into two categories.
Once the calibration weights are computed, the resulting normalized stress scores serve as the basis for the three-tier decision model introduced next. This transition maintains the methodological continuity between the calibration and classification processes.
3.2. Dataset
The foundation for the reality-aligned dataset was established using the WESAD dataset [17], a comprehensive multimodal dataset containing physiological and motion data collected from wearable devices. The original WESAD dataset comprises data from 15 subjects (after exclusion of S1 and S12 due to sensor malfunction) who participated in a controlled laboratory study designed to elicit different affective states. The dataset incorporates synchronized measurements from two primary sensing platforms: a chest-worn RespiBAN device sampling at 700 Hz, and a wrist-worn Empatica E4 device with sensor-specific sampling rates. The original protocol included baseline (label 1), stress (label 2), amusement (label 3), and meditation periods (labels 4–5), with transitional segments (label 0), validated using PANAS, STAI, SAM, and SSSQ questionnaires [17].
The chest device (RespiBAN) samples electrodermal activity at 700 Hz, offering laboratory-grade precision with minimal motion artefacts, whereas the wrist sensor (Empatica E4) operates at 4 Hz and is more susceptible to movement and peripheral thermal variation. To construct the reality-aligned dataset, the chest-based GSR was used as the reference signal due to its high fidelity. However, our Weighted Baseline Detector (WBD) does not rely on raw amplitude; instead, it uses ΔGSR (the deviation from each participant’s baseline), which inherently compensates for magnitude differences between sensor sites and captures sympathetic activation dynamics rather than device-specific amplitude.
Prior work has shown that, once baseline-normalized, wrist- and chest-recorded EDA signals exhibit highly correlated temporal stress-response patterns [8]. Accordingly, while chest-mounted sensors provide laboratory precision, the wrist-based VR configuration prioritizes ecological validity, comfort, and real-world applicability, delivering comparable stress-trend interpretation with minimal performance loss.
To further quantify wrist–chest consistency under wearable VR conditions, we analyzed the wrist channel in the reality-aligned WESAD subset. A moderate negative correlation was observed between GSR and skin temperature (, ), indicating that higher local temperature slightly reduces GSR amplitude due to peripheral vasodilation. Importantly, wrist temperature remained stable within 33–35 °C, while GSR exhibited rapid fluctuations associated with transient sympathetic activation. Because the WBD model operates on baseline deviation and includes temperature-adjusted calibration, this slow-varying peripheral effect is effectively compensated. Consequently, the wrist-worn configuration remains reliable and representative for VR stress detection, supporting wearable comfort and immersive-system operation.
3.3. Motion Artefact Filtering and Validation
To ensure that stress-related electrodermal activity signals were not distorted by rapid motion artefacts caused by head or controller movement during VR tasks, a third-order Butterworth low-pass filter with a cutoff frequency of approximately 1 Hz was applied to the raw GSR data. This frequency corresponds to the physiological bandwidth of electrodermal activity and effectively attenuates high-frequency noise while preserving true stress responses.
To further validate the robustness of wrist-based EDA signals in VR, an adaptive filtering model was applied to attenuate motion- and temperature-related artifacts while preserving true sympathetic dynamics. The filtering model can be expressed as follows:
where
denotes wrist-temperature deviation,
represents accelerometer-derived movement, and
is residual stochastic noise. Equation (1) isolates the physiological component of the EDA signal, while Equation (2) models peripheral temperature vasodilation and motion-induced artifacts, consistent with established EDA signal physiology.
As summarized in Table 3, adaptive filtering removed temperature- and motion-linked noise components while preserving true sympathetic responses, leading to substantial improvements in signal quality and classification accuracy. The removed-energy value in Table 3 exceeded 100%, which is expected given the mathematical definition of signal energy. In this study, energy was computed using the standard formulation shown in Equation (3):
$$E = \sum_{t} x(t)^{2} \qquad (3)$$
where $x(t)$ denotes the GSR amplitude at time t. Because energy is calculated from the squared amplitude, motion- and temperature-related artefacts often contain sharp transient peaks that contribute disproportionately large energy values when squared. Consequently, the total squared energy of the removed component can exceed the energy of the raw signal itself, as observed in our artefact component (mean = 0.1643 µS, SD = 1.1462 µS). This phenomenon is well established in physiological signal processing and is typical when filtering high-amplitude motion spikes or large transient artefacts.
Notably, the filtered component exhibited a mean of 0.1643 µS and a standard deviation of 1.1462 µS, accounting for 162.9% of the total signal energy, confirming that the removed portion primarily represented noise rather than physiologically meaningful stress responses.
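The following sketch illustrates why a removed-energy percentage above 100% is arithmetically possible. Squared energy is not additive across components: if the raw signal is the sum of a retained part and a removed part, the raw energy equals the two component energies plus twice their cross term, and a negative cross term lets the removed energy exceed the raw energy. The signal values are hypothetical baseline deviations chosen only to show the effect; they are not data from this study.

```python
def signal_energy(x):
    """Sum of squared amplitudes, the standard discrete signal energy."""
    return sum(v * v for v in x)

# Hypothetical baseline-deviation samples (not measured data).
raw = [0.5, -0.4, 0.6, -0.5]
removed = [0.9, -0.8, 1.0, -0.9]                   # artefact estimate
retained = [r - a for r, a in zip(raw, removed)]   # what the filter keeps

e_raw = signal_energy(raw)
e_removed = signal_energy(removed)
e_retained = signal_energy(retained)
cross = sum(f * a for f, a in zip(retained, removed))

# Energy identity: E(raw) = E(retained) + E(removed) + 2 * cross term.
identity_gap = abs(e_raw - (e_retained + e_removed + 2 * cross))

removed_energy_percent = 100 * e_removed / e_raw
```

Here the retained and removed parts are negatively correlated, so the cross term is negative and the removed energy is several times the raw energy.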
These results demonstrate that the adaptive filtering approach effectively suppressed noise-induced fluctuations such as rapid micro-motion and thermoregulatory drift without attenuating genuine stress responses, enabling reliable real-time stress detection in wearable VR settings.
Importantly, adaptive filtering substantially improved recognition performance. Prior to filtering, the raw wrist GSR produced a 38.97% false-positive rate and an overall accuracy of 61.00%. After filtering, the false-positive rate decreased to 13.15% and overall accuracy increased to 86.85%. The false-negative rate remained 0% in both cases, demonstrating that true stress responses were preserved while noise-induced activations were removed. These findings confirm that the adaptive filter enhances signal fidelity and supports reliable real-time VR stress detection.
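For readers who want to reproduce this kind of summary, the sketch below derives the rates from confusion-matrix counts. It assumes that the false-positive and false-negative rates are expressed as fractions of all samples, a reading that is consistent with the post-filtering figures summing to 100% when there are no false negatives, but the exact definition used here is an assumption. The counts are hypothetical.

```python
def detection_rates(tp, fp, tn, fn):
    """Accuracy and error rates in percent, each taken over all samples (assumed definition)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "false_positive_rate": 100 * fp / total,
        "false_negative_rate": 100 * fn / total,
    }

# Hypothetical counts for 100 samples with no missed stress events.
rates = detection_rates(tp=40, fp=15, tn=45, fn=0)
```

With zero false negatives, accuracy and the false-positive rate are complementary under this definition, so any reduction in false positives appears directly as an accuracy gain.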
Table 4 summarizes the quantitative impact of filtering. After filtering, the signal’s standard deviation decreased from 1.494 µS to 1.456 µS (a 2.55% reduction), while maintaining a high Pearson correlation of 0.982 with the unfiltered signal. These results confirm that motion artefacts were minor under our controlled VR setup and that filtering successfully suppressed transient noise without altering the slow physiological pattern associated with stress. The adopted method, therefore, provides a lightweight, real-time solution suitable for Unity-based VR environments, ensuring high signal fidelity with minimal computational overhead.
From the original WESAD multimodal sensor data, we extracted and processed key physiological and behavioral indicators to create a more focused dataset aligned with real-world stress detection scenarios. The processing pipeline involved several critical steps. During data integration, we selected primary stress-sensitive modalities from both sensing platforms, specifically focusing on galvanic skin response (GSR/EDA) data from the chest-worn device, which provides robust stress indicators with proven ecological validity in naturalistic settings.
This reality-aligned dataset integrates both physiological measurements and derived behavioral indicators. The physiological features include GSR values in microsiemens (µS), representing electrodermal activity converted from raw sensor outputs using the formula
, which captures skin conductance variations indicative of autonomic nervous system activation. The behavioral indicators comprise hesitation time, measured in seconds and ranging from 0.5 to 1.5 s, representing a derived behavioral metric that reflects decision-making latency patterns across affective states. Tremble amplitude denotes a motion-derived feature quantifying physiological tremor characteristics, extracted from accelerometer data and normalized to represent stress-induced motor variations.
Table 5 summarizes the feature categories extracted from the WESAD dataset, organized into physiological and behavioral domains. The physiological domain encompasses GSR measurements representing electrodermal activity, whereas the behavioral domain includes hesitation time, capturing decision-making latency, and tremble amplitude, quantifying tremor-related motor responses.
We transformed the original WESAD multi-class protocol labels into a three-class state classification system aligned with fundamental emotional valence theory. The Negative class was derived primarily from the original stress condition (label 2), representing negative affective states characterized by elevated arousal and negative valence. The Neutral class corresponds to baseline conditions (label 1), representing calm, non-activated emotional states. The Positive class was adapted from amusement conditions (label 3), representing positive affective states with elevated arousal and positive valence.
Table 6 presents the mapping strategy employed to transform the original WESAD multi-class protocol labels into a three-class affective state system. The table establishes the correspondence between each affective class and its original WESAD label, along with the associated affective characteristics defined by arousal and valence dimensions. This mapping enables alignment with fundamental emotional valence theory while maintaining traceability to the original experimental protocol.
Note that the original WESAD protocol included additional conditions (meditation with labels 4–5 and transitional periods with label 0) that were excluded from this three-class affective state classification.
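The label transformation described above can be sketched as a lookup that keeps labels 1 to 3 and drops everything else. The sample layout (a list of value and label pairs) and the function name are assumptions made for illustration.

```python
# Mapping from original WESAD protocol labels to the three affective classes.
WESAD_TO_AFFECT = {
    1: "Neutral",   # baseline
    2: "Negative",  # stress
    3: "Positive",  # amusement
}

def map_to_affect(samples):
    """Keep only samples with labels 1-3 and replace the label with its affective class."""
    return [
        (value, WESAD_TO_AFFECT[label])
        for value, label in samples
        if label in WESAD_TO_AFFECT  # labels 0, 4, and 5 are excluded
    ]

# Hypothetical (GSR value, WESAD label) pairs.
samples = [(1.2, 0), (1.1, 1), (2.4, 2), (1.8, 3), (1.0, 4), (0.9, 5)]
mapped = map_to_affect(samples)
```

Keeping the original label values as dictionary keys preserves traceability to the WESAD protocol.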
To enhance dataset utility for machine learning applications and improve generalizability, we implemented a systematic data expansion approach. For participant scaling, the original dataset was expanded through validated data augmentation techniques that preserve physiological signal characteristics while introducing realistic inter-individual variation.
To further clarify the data expansion process, the original WESAD dataset containing 15 subjects was systematically extended to 100 virtual participants through a controlled stochastic replication method implemented in Python (version 3.11 (64-bit)). Each synthetic participant was generated by resampling an existing real participant’s data and applying small Gaussian perturbations to both physiological and behavioral features while preserving their global statistical structure. For each feature x, the perturbation was defined as , where represents the standard deviation of that feature in the base dataset. Baseline parameters such as Global, Individual, and Pre-Task were independently adjusted using random offsets drawn from , and a fixed random seed was used to ensure full reproducibility.
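A minimal sketch of this kind of stochastic replication is given below. The noise scale (`noise_scale=0.05`), the seed value, the row layout, and the function names are hypothetical placeholders, since the exact perturbation magnitude is not reproduced here. The sketch covers only feature perturbation and omits the separate baseline offsets.

```python
import random
from statistics import pstdev

def feature_sds(participants):
    """Standard deviation of every feature across the whole base dataset."""
    names = list(participants[0][0].keys())
    return {
        name: pstdev([row[name] for rows in participants for row in rows])
        for name in names
    }

def expand_participants(participants, n_virtual=100, noise_scale=0.05, seed=42):
    """Resample real participants and add zero-mean Gaussian noise scaled by each feature's SD."""
    rng = random.Random(seed)  # fixed seed so the expansion is reproducible
    sds = feature_sds(participants)
    virtual = []
    for _ in range(n_virtual):
        base = rng.choice(participants)
        virtual.append([
            {name: value + rng.gauss(0.0, noise_scale * sds[name])
             for name, value in row.items()}
            for row in base
        ])
    return virtual

# Hypothetical base data: 3 participants with 2 rows each.
base_participants = [
    [{"gsr": 1.0, "hesitation": 0.6, "tremble": 0.10}, {"gsr": 1.4, "hesitation": 0.9, "tremble": 0.15}],
    [{"gsr": 2.1, "hesitation": 1.2, "tremble": 0.30}, {"gsr": 2.6, "hesitation": 1.4, "tremble": 0.35}],
    [{"gsr": 0.8, "hesitation": 0.5, "tremble": 0.05}, {"gsr": 1.1, "hesitation": 0.7, "tremble": 0.08}],
]
virtual_participants = expand_participants(base_participants)
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps the seed local to the augmentation step.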
Quantitative validation confirmed that the augmented dataset maintained statistical fidelity with the original data for GSR, Hesitation, and Tremble, showing nearly complete overlap between distributions. The mean and standard deviation shifts were negligible (ΔGSR = +0.89%, ΔHesitation = +0.10%, ΔTremble = 0.00%), indicating that the synthetic expansion preserved physiological plausibility and inter-participant variability. Computed averages also remained consistent with the 15-participant baseline, as summarized in Table 7, with overall deviation below 1%. These findings confirm that virtual participant generation successfully expanded sample diversity while maintaining the integrity of the physiological and behavioral signal structure.
The resulting dataset comprised 12,000 labeled samples from 100 virtual participants, maintaining equal representation of negative, neutral, and positive affective states.
For sample balancing, each class contains exactly 4000 samples representing 33.3 percent of the total, with 120 samples per participant across 12,000 total observations, ensuring balanced representation for supervised learning applications. Regarding session structure, data is organized into participant-specific sessions, maintaining traceability to original conditions while supporting cross-validation and participant-independent evaluation protocols.
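A participant-independent evaluation of the kind mentioned above can be sketched as follows: participants, not individual samples, are assigned to folds, so no participant contributes to both training and test data. The fold count of 5 and the interleaved assignment are assumptions for illustration; only the 100 participants and 120 samples per participant come from the text.

```python
def participant_folds(participant_ids, n_folds=5):
    """Yield (train_indices, test_indices) with whole participants held out per fold."""
    unique = sorted(set(participant_ids))
    groups = [unique[i::n_folds] for i in range(n_folds)]  # interleaved assignment
    for held_out in groups:
        held = set(held_out)
        train_idx = [i for i, p in enumerate(participant_ids) if p not in held]
        test_idx = [i for i, p in enumerate(participant_ids) if p in held]
        yield train_idx, test_idx

# 100 participants with 120 samples each, matching the dataset dimensions.
participant_ids = [p for p in range(100) for _ in range(120)]
splits = list(participant_folds(participant_ids))
```

Each of the 5 splits then holds out 20 participants (2400 samples) and trains on the remaining 80 (9600 samples).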
While the dataset includes three distinct affective states (Negative, Neutral, Positive) for comprehensive analysis, the practical VR stress detection system adopts a binary classification framework, distinguishing Stress from No Stress to align with real-world intervention requirements. In this binary scheme, the Negative class (label 2, stress condition) is mapped to Stress due to its combination of elevated arousal and negative valence, representing a psychologically demanding state that requires intervention. Both the Neutral (label 1, baseline) and Positive (label 3, amusement) classes are combined and mapped to No Stress. Although the Positive state exhibits elevated arousal, its positive valence indicates a beneficial rather than detrimental affective experience. This binary mapping prioritizes the detection of negative stress states that necessitate adaptive intervention in VR scenarios, as neither neutral calm nor positive engagement warrants corrective action.
Consistent with prior stress-detection literature and the WESAD protocol, Neutral and Positive conditions were therefore merged into a single No-Stress class, reflecting the absence of negative stress activation in both. Although the Positive condition exhibits physiological arousal comparable to stress, it differs fundamentally in its emotional valence: stress is defined by high arousal and negative valence, whereas amusement reflects high arousal and positive valence.
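The binary scheme reduces to a small lookup on the original WESAD labels, sketched below. The function and constant names are illustrative.

```python
# Binary mapping from WESAD labels: only the stress condition maps to "Stress".
WESAD_TO_BINARY = {
    1: "No Stress",  # Neutral (baseline)
    2: "Stress",     # Negative (stress condition)
    3: "No Stress",  # Positive (amusement)
}

def to_binary(labels):
    """Convert a sequence of WESAD labels (1-3) to Stress / No Stress."""
    return [WESAD_TO_BINARY[label] for label in labels]

binary = to_binary([1, 2, 3, 2])
```

Note that with balanced three-class data, this mapping yields a 1:2 ratio of Stress to No Stress samples.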
Multiple baseline reference points were computed to support various normalization approaches. The global baseline represents population-level reference values calculated across all participants and conditions. The individual baseline captures participant-specific baseline values accounting for inter-individual physiological differences. The pre-task baseline represents session-specific baseline measurements capturing immediate pre-stimulus states.
Table 8 describes the three baseline reference types employed in the normalization strategy, each serving a distinct role in accounting for physiological variability. The global baseline provides a universal population-level reference, the individual baseline captures participant-specific physiological characteristics, and the pre-task baseline represents session-specific resting states measured immediately before experimental stimuli. These multiple baseline approaches enable robust normalization across diverse sources of physiological variation.
To enhance stress detection sensitivity, delta features representing deviations from baseline conditions were computed.
Table 9 presents the three delta features: Delta_GSR_Global measures deviation from the overall population average, Delta_GSR_Individual quantifies deviation from each participant’s unique physiological baseline, and Delta_GSR_PreTask captures the change from the resting state measured immediately before each experimental session. These delta features provide normalized stress indicators that account for individual differences and contextual variation, enabling more robust cross-participant generalization.
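The three delta features can be computed with a short sketch such as the one below. It assumes that the global baseline is the mean GSR over all rows, that the individual baseline is the mean over each participant's rows, and that the pre-task baseline is supplied per participant. These aggregation choices and the row layout are assumptions; only the feature names come from the text.

```python
from statistics import mean

def add_delta_features(rows, pre_task_baselines):
    """Return copies of rows with the three Delta_GSR_* features added."""
    global_baseline = mean(row["gsr"] for row in rows)
    by_participant = {}
    for row in rows:
        by_participant.setdefault(row["participant"], []).append(row["gsr"])
    individual_baseline = {p: mean(values) for p, values in by_participant.items()}

    enriched = []
    for row in rows:
        p = row["participant"]
        enriched.append({
            **row,
            "Delta_GSR_Global": row["gsr"] - global_baseline,
            "Delta_GSR_Individual": row["gsr"] - individual_baseline[p],
            "Delta_GSR_PreTask": row["gsr"] - pre_task_baselines[p],
        })
    return enriched

# Hypothetical rows for two participants and their pre-task baselines.
rows = [
    {"participant": "P1", "gsr": 1.0},
    {"participant": "P1", "gsr": 3.0},
    {"participant": "P2", "gsr": 5.0},
    {"participant": "P2", "gsr": 7.0},
]
features = add_delta_features(rows, {"P1": 1.0, "P2": 5.0})
```

In this toy example, P2's first sample sits above the global mean yet below P2's own mean, which is the kind of inter-individual offset the individual and pre-task deltas are meant to absorb.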
The final reality-aligned dataset comprises 12,000 temporally ordered observations across 19 features, representing a balanced three-class classification problem with equal representation of Negative, Neutral, and Positive affective states. The dataset preserves physiological plausibility while offering sufficient scale and complexity for modern machine learning applications, effectively bridging the gap between controlled laboratory conditions and real-world stress and affect detection scenarios. The reality-aligned dataset derived from WESAD retains three affective states for comprehensive physiological validation in Section 4.2 and Section 4.3. Following dataset characterization, the proposed Weighted Baseline Detector transitions to binary classification in Section 4.4 onward, treating Negative (label 2) as “Stress” and combining Neutral (label 1) and Positive (label 3) as “No Stress.” This binary mapping emphasizes the detection of negative stress states that warrant intervention in VR training scenarios, as positive affective states (amusement) do not require corrective action despite elevated arousal, thereby aligning with operational VR stress detection objectives.