1. Introduction
In modern technological systems, humans are exposed to more information than ever before, shifting system limitations from computational power to human capacity [
1]. Research in human factors has shown that task-induced workload can substantially impair performance and cognitive resource allocation [
2,
3,
4]. Automation is often introduced in these information-heavy systems to reduce workload, but ironically, it often ends up generating more complex technical challenges than the ones it was intended to solve [
2,
4,
5,
6]. As the design question shifts from whether to automate to how much automation to implement, careful consideration of human factors has become more crucial than ever for system designers [
7].
Aviation exemplifies a domain where technological advancements have expanded the operator’s access to information, yet this abundance more often leads to information overload than to information dominance [
8]. In fifth-generation aircraft, automation reduces information overload through information fusion and automated sensor management, allowing the pilot to focus on tactical decision-making. However, automating tasks previously performed by humans has been shown to have negative effects, such as reducing the situational awareness of the operator [
9,
10,
11]. Transitions between autonomous and manual systems introduce the potential for degraded operator states, either cognitive overload (e.g., fully manual operations) or cognitive underload (e.g., fully autonomous operations), so systems should be designed such that human agents and automated agents can collaborate optimally to achieve decision-making superiority [
7,
12]. Since neither full automation nor full manual control is universally practical, the path toward adaptive hybrid systems begins with understanding and quantifying the operator’s cognitive state.
Traditional methods of assessing cognitive workload often rely on subjective self-report tools such as the NASA-TLX [
13], which capture an individual’s perceived workload during specific experimental periods. However, these active assessments are not only time-consuming and intrusive for participants but also pose challenges for experimental validity, particularly in studies with small sample sizes. Subjective measures are prone to bias, as workload ratings have been shown to increase non-linearly with actual cognitive demand [
14]. Performance-based measures have also been used (e.g., reaction time, response time, task completion time, and time to transition between operating modes or between tasks), but these measures lack validity when performance is shared by the operator and automation.
Physiological measures such as heart rate, respiration, and brain activity provide continuous, objective insights into workload and arousal [
15,
16,
17]. Unlike subjective ratings or task-based measures, they capture high-frequency dynamics but are noisy and individualized, posing statistical challenges. Physiological monitoring is a promising approach to gathering more data in real time for human factors studies, especially in specialized environments such as aviation, where access to qualified participants is limited and simulations are resource-intensive. Recent research has examined the relationships between physiological indicators and traditional subjective measures, such as correlations between NASA-TLX scores and heart rate variability. These studies have revealed complex, nonlinear patterns [
18] and strong inter-correlations among subjective scales, yet often report weak or no associations with physiological measures [
19], highlighting the nuanced challenges in aligning self-reported and physiological data.
Deterministic approaches often fall short in complex man-made systems, where intra- and inter-individual differences render behavioral data fundamentally statistical in nature [
20,
21]. Traditional frequentist approaches commonly assume that repeated observations across individuals follow the same distribution, often removing outliers to preserve population-level assumptions. However, for physiological data, where responses are highly individualized and inherently complex, this “one-size-fits-all” strategy loses power: what looks like noise at the population level may reflect a meaningful signal. To produce robust and reliable predictions, statistical methods should be tailored to the individual and account for individual variability rather than smoothing it away. Bayesian statistics offer a compelling alternative, providing a more ecologically valid framework for analyzing physiological data.
Unlike frequentist approaches, which focus on point estimates and often overlook uncertainty, Bayesian methods explicitly model uncertainty and differ in conceptualization, calculations, and functionality from traditional statistical approaches [
22]. By incorporating prior knowledge and updating beliefs as new data becomes available, Bayesian inference naturally accommodates both intra- and inter-subject variability [
23]. Epistemic uncertainty is quantified and reduced through posterior distributions as more evidence accumulates [
24]. This probabilistic foundation enables researchers to analyze complex, individualized, and often sparse psychophysiological datasets without overfitting or oversimplifying the underlying patterns. Recent publications in human factors have used Bayesian analyses for modeling driver response times during automated vehicle takeovers [
25,
26,
27], the effects of distraction and vigilance [
28,
29], and human automation interaction [
30,
31,
32]. Bayesian analyses of small-sample-size factorial designs have been explored in applications involving pilot incapacitation and flight trajectory predictions, and Bayesian methods were shown to increase the reliability and validity of results [
33,
34].
Recent studies highlight that pilot workload, fatigue, and situational awareness emerge from dynamic interactions among physiological, contextual, and operational factors rather than from any single measure. Physiological and subjective workload indicators in flight operations have been shown to diverge in nonlinear ways, and scheduling variables such as multi-day shifts and limited rest have been linked to measurable fatigue [
18,
35]. Undetected automation faults have been demonstrated to degrade situational awareness and decision-making, especially among less experienced pilots [
36,
37]. Collectively, these findings underscore that cognitive states fluctuate with context and time, motivating the need for probabilistic models to capture how physiological and performance patterns evolve under varying workload and automation conditions.
Despite growing interest in using physiology to assess cognitive states, most studies either treat psychophysiological signals as windowed features for classification or simple regression or apply Bayesian methods to discrete outcomes or small factorial designs. What is largely missing are probabilistic, time-sensitive models that operate directly on continuous multivariate physiological streams and capture both inter- and intra-subject variability. This gap is especially apparent in aviation contexts where sample sizes are small, performance is shared with automation, and subjective ratings are sparse and biased. Methods that exploit longitudinal, continuous time series per participant while borrowing strength across subjects are rare. As a result, current approaches provide limited individualized inference, weak uncertainty quantification, and little guidance for closing the loop between physiology and system design.
This study develops a hierarchical Bayesian model to analyze continuous physiological data from pilots in a flight simulator, addressing small-sample challenges and individual variability. By modeling heart rate, respiration, and EEG-derived workload, we evaluate the potential of Bayesian methods for real-time cognitive state estimation in aviation. Despite the conceptual alignment between Bayesian inference and biological signal variability, there is limited work in applying Bayesian techniques to continuous physiological time series data in human factors research. To address this gap, physiological data was collected from experienced pilots performing flight tasks under varying levels of automation and task difficulty within a high-fidelity flight simulator. A hierarchical Bayesian model was constructed to examine the psychophysiological correlates of these dynamic flight environments and to evaluate the utility of Bayesian methods in an applied small-sample aviation use case. This work explicitly models individual differences, reducing the need to discard “outliers” that may be meaningful signal. Ultimately, this work seeks to advance real-time physiological state estimation by evaluating continuous physiological variables well-suited for Bayesian modeling, and developing an adaptable, data-efficient framework capable of producing robust and generalizable predictions across shifting environmental demands, limited sample sizes, and varying cognitive states.
2. Materials and Methods
Five pilots participated in the study, which was approved by Oklahoma State University’s IRB (IRB-24-229-ATRC). The inclusion criteria were pilot certification, a current instrument rating, and experience in a Cessna 172 or similar aircraft. Participants were volunteers recruited at Oklahoma State University and in the Stillwater, Oklahoma community, and all potential participants were screened by a research team member before enrolling in the study. All five participants were male, with ages ranging from 21 to 39 years (M = 27.60, SD = 9.10) and experience ranging from 253 to 1450 flight hours (M = 706, SD = 572). An auditory n-back task was used to manipulate workload throughout the experiment, with n = 1 corresponding to low workload and n = 2 to high workload. In an n-back task, temporal sequences of stimuli are presented, and participants must decide whether the current stimulus is the same as the stimulus they heard “n” steps earlier. The flight simulator hardware can be seen in
Figure 1a, in addition to the experimental design (
Figure 1b).
A total of four 15 min flight task scenarios were flown per participant, creating a 2 (Automation on or off) × 2 (Workload high or low) within-subjects factorial design (conditions described in
Figure 1b). This within-subjects factorial design was selected to maximize statistical power and control for inter-individual variability, which is particularly important in aviation human factors studies with small participant pools and high experimental costs. Within-subjects designs allow each pilot to serve as their own control, therefore reducing noise associated with between-subjects differences in physiology and flight experience [
33]. This structure also enables assessment of both main effects (e.g., workload and automation) and subject-level variability on physiological measures. However, within-subjects designs can be limited by potential learning, adaptation, or fatigue risks across repeated trials [
4]. To mitigate these risks, the order of flight conditions was counterbalanced, and rest breaks were provided between sessions. This factorial approach has been widely adopted in automation and workload research where simulator time and certified participants are limited [
38,
39].
Figure 2 displays the flight path for one participant. The outbound portion from KSWO RWY 17 to ACOKO did not include n-back activity. After completion of the turn-around maneuver, the n-back task was started, and it concluded after landing back at KSWO. Boxcar functions were generated for automation status and workload level throughout the continuous physiological data recordings and used to segment the data into conditions based on the experimental context.
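The boxcar segmentation described above can be illustrated with a short Python sketch. It builds 0/1 indicator series for automation status and workload level on a 1 Hz timeline and combines them into one condition label per sample. The function name, interval times, and label strings are hypothetical; the study's actual event timestamps and analysis code are not given in the text.

```python
import numpy as np

def boxcar(t, intervals):
    """Return a 0/1 array that is 1 wherever t lies in any [start, stop) interval."""
    out = np.zeros(t.shape, dtype=int)
    for start, stop in intervals:
        out[(t >= start) & (t < stop)] = 1
    return out

# Hypothetical 1 Hz timeline (seconds) with made-up event times.
t = np.arange(0, 20)
automation_on = boxcar(t, [(5, 15)])    # automation engaged from 5 s up to 15 s
workload_high = boxcar(t, [(10, 20)])   # 2-back active from 10 s up to 20 s

# Combine the two boxcars into a single condition label per sample.
condition = [
    f"{'AutoOn' if a else 'AutoOff'}/{'WLHigh' if w else 'WLLow'}"
    for a, w in zip(automation_on, workload_high)
]
```

Each physiological sample can then be grouped by its label, which is one simple way to segment a continuous recording by experimental context.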
Physiological sensor hardware included an Advanced Brain Monitoring (ABM) B-Alert X10 EEG headset (Advanced Brain Monitoring Inc., Carlsbad, CA, USA) and an Equivital eq02+ Lifemonitor (Equivital Ltd., Cambridge, UK). The EEG headset collected raw brain activity at 256 Hz, which was cleaned using 50, 60, 100, and 120 Hz notch filters as well as a 0.05 Hz high-pass filter and median filter of order 56. Raw EEG signals were further processed using the headset’s proprietary artifact decontamination algorithm. After signal processing, probabilistic cognitive state estimates were generated at 1 Hz using the manufacturer’s classification algorithm [
40,
41,
42]. The Lifemonitor collected ECG signals at 256 Hz, which were used to calculate heart rate using R wave detection with a 30 s rolling average and were reported every 5 s [
43,
44]. Heart rate values were cleaned by removing extreme outliers less than 30 BPM or greater than 200 BPM and removing values outside of 3 standard deviations from an individual’s average heart rate [
43]. Respiration rate was collected using the Lifemonitor’s expansion sensor, which recorded values every 15 s.
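A minimal Python sketch of the two-step heart rate cleaning rule described above follows. It assumes that the fixed 30 to 200 BPM bounds are applied first and that the individual's mean and sample standard deviation are then computed on the remaining values; the text does not state the order, so this ordering, the function name, and the example values are assumptions.

```python
import numpy as np

def clean_heart_rate(hr, low=30.0, high=200.0, n_sd=3.0):
    """Drop implausible HR values, then values beyond n_sd SDs of the individual's mean."""
    hr = np.asarray(hr, dtype=float)
    # Step 1: remove extreme outliers outside the fixed plausibility bounds.
    plausible = hr[(hr >= low) & (hr <= high)]
    # Step 2 (assumed order): compute this individual's mean and sample SD
    # on the remaining values and keep only those within n_sd SDs.
    mean, sd = plausible.mean(), plausible.std(ddof=1)
    return plausible[np.abs(plausible - mean) <= n_sd * sd]

# Made-up series for one participant: 25 and 210 fail the fixed bounds,
# and 120 lies more than 3 SDs from the remaining mean.
raw = [70] * 20 + [72] * 20 + [25, 210, 120]
cleaned = clean_heart_rate(raw)
```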
The final dataset was up-sampled via linear interpolation into 1 s epochs, which included heart rate (HR; beats per minute), respiration rate (RR; breaths per minute) and workload brain state estimates (WL; percent probability). While ECG-based signals are more practical for real-time analysis in aviation contexts, an EEG-based metric was included alongside ECG-derived measures to address the initial objective of identifying the most promising physiological indicators. The ABM B-Alert X10 system was selected because it provides validated, probabilistic workload estimates derived from EEG signals through manufacturer-developed algorithms optimized for applied human-factors environments [
40,
41]. This allowed us to prioritize multimodal, synchronized data acquisition (EEG, ECG, respiration) rather than manual re-analysis of raw EEG features, ensuring consistent workload metrics and integration across physiological channels under the time and resource constraints typical of aviation research. In total, over 64 million database records of raw physiological time series data were collected from the sensors, and the final data frame used for analysis was a 19,571 × 3 matrix.
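The up-sampling step can be sketched in Python with `numpy.interp`, which performs piecewise-linear interpolation. The example below resamples a made-up heart rate series reported every 5 s onto a 1 s grid; the function name and values are illustrative and are not taken from the study's data or code.

```python
import numpy as np

def upsample_to_1hz(t_obs, values, t_start, t_stop):
    """Linearly interpolate irregular or slow samples onto a 1 s grid."""
    t_new = np.arange(t_start, t_stop + 1, 1.0)
    # np.interp expects increasing t_obs and holds the edge values
    # constant for any grid points outside the observed time range.
    return t_new, np.interp(t_new, t_obs, values)

# Hypothetical HR values reported every 5 s.
t_hr = np.array([0.0, 5.0, 10.0])
hr = np.array([70.0, 75.0, 70.0])
t_1hz, hr_1hz = upsample_to_1hz(t_hr, hr, 0, 10)
```

Interpolated samples carry no new information, so the effective sample size for HR and RR remains that of the original 5 s and 15 s reports.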
Bayesian Modeling Methodology
We analyzed the physiological data at two levels: the group (experimental condition) level and the participant level. This hierarchical structure is warranted for two reasons: physiological responses exhibit substantial between-subjects heterogeneity and are sampled at high frequencies (small-N, large-T), and partial pooling borrows strength across participants and conditions to stabilize individual estimates while preserving person-specific effects. The hierarchy also yields calibrated uncertainty at both levels, which is essential for aviation studies with noisy, autocorrelated signals and limited sample sizes.
A univariate approach was used as the foundation for the overall Bayesian modeling: the HR, RR, and WL data were subset, and each variable of interest was modeled separately. This approach was selected to isolate and interpret the unique statistical properties and uncertainty structures of each physiological signal before integrating them into future multivariate frameworks. Given the small sample size, high-frequency data, and differing measurement scales and distributions across modalities, this approach enhances interpretability and model stability while minimizing confounding cross-signal noise that could obscure individual-level effects. Using heart rate as an example, the observed data were treated as a series of discrete observations. HR was reported at 0.2 Hz throughout an experiment and is well approximated as a continuous variable that typically follows a bell-shaped distribution in healthy populations [
45]. The normal distribution often appears in physiological measurements due to the Central Limit Theorem, which states that the sum of many small, independent factors tends to form a normal distribution [
46]. Therefore, the heart rate data can be quantitatively described by a normal distribution where µ represents the sample mean HR and σ the sample standard deviation.
When applying a Bayesian approach, the goal is ultimately to obtain a posterior distribution of the parameter of interest [
47]. Although many frequentist procedures require normality assumptions to be statistically verified, Bayesian inference does not depend on the data being normally distributed. It is grounded in the likelihood function and the prior distribution rather than in sampling distributions or asymptotic properties of estimators, so the posterior distribution reflects the observed data and the specified model regardless of whether the data conform to a normal distribution. Because the likelihood can be chosen to match the data, Bayesian models can accommodate skewed, heavy-tailed, or otherwise non-normal data, making them particularly well-suited for analyzing physiological time series.
In Bayesian statistics, conjugacy refers to a model where the prior and posterior distributions belong to the same family of probability distributions. For example, both the prior knowledge and observed likelihoods for heart rate data follow a normal distribution; therefore, the posterior is also a normal distribution due to conjugacy. The posterior probability is given by Bayes’ theorem, in which the posterior distribution is proportional to the prior distribution multiplied by the likelihood function of the data. Since both are normal, the product of the two normal densities is proportional to another normal density [
48]. This allows us to identify the posterior distribution that updates from the observed data by combining it with the prior distribution. More information about conjugacy derivations used in the work can be found in
Appendix A.1.
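For reference, the textbook normal-normal conjugate update can be written out as below. This is a sketch under simplifying assumptions, not a reproduction of Appendix A.1: the observation variance \sigma^2 is treated as known, the observations are treated as independent, and the symbols \mu_0, \tau_0, \mu_n, and \tau_n are notation introduced here for illustration.

```latex
% Prior on the mean and likelihood for n observations (sigma^2 assumed known)
\mu \sim \mathcal{N}(\mu_0, \tau_0^2), \qquad
y_i \mid \mu \sim \mathcal{N}(\mu, \sigma^2), \quad i = 1, \dots, n

% Posterior is again normal
\mu \mid y_1, \dots, y_n \sim \mathcal{N}(\mu_n, \tau_n^2)

% Precisions add; the posterior mean is a precision-weighted average
\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}, \qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{y}}{\sigma^2} \right)
```

As n grows, the data term dominates and \mu_n approaches the sample mean \bar{y}, while \tau_n^2 shrinks toward zero.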
The Bayesian model was constructed hierarchically to account for both within- and between-subject variability in physiological responses arising from changes in primary and secondary task demands. This hierarchical framework supports robust inference at both group and individual levels, accommodates sparse or unbalanced datasets, and removes the need for manual transformation of physiological time series, making it especially well-suited for cognitive workload research in dynamic environments [
49]. More information about the hierarchical form and mathematics can be found in
Appendix A.2.
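The idea of partial pooling can be illustrated with a simplified Python sketch. It assumes that the group mean `mu`, the between-subject standard deviation `tau`, and the within-subject standard deviation `sigma` are known, whereas a full hierarchical model such as the one in Appendix A.2 would estimate them from the data. All names and numbers below are hypothetical.

```python
import numpy as np

def partial_pool(y_bar, n, sigma, mu, tau):
    """Posterior mean and SD of each participant's mean under a normal
    hierarchical model with known sigma, mu, and tau (a simplification)."""
    y_bar = np.asarray(y_bar, dtype=float)
    n = np.asarray(n, dtype=float)
    data_prec = n / sigma**2      # precision contributed by each participant's data
    group_prec = 1.0 / tau**2     # precision contributed by the group-level prior
    post_mean = (data_prec * y_bar + group_prec * mu) / (data_prec + group_prec)
    post_sd = np.sqrt(1.0 / (data_prec + group_prec))
    return post_mean, post_sd

# Two hypothetical participants: one with many samples, one with few.
post_mean, post_sd = partial_pool(y_bar=[68.0, 85.0], n=[1000, 4],
                                  sigma=8.0, mu=75.0, tau=5.0)
```

The participant with many observations keeps an estimate close to their own mean, while the sparsely observed participant is pulled toward the group mean. This is the sense in which the hierarchy borrows strength across participants.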
To visualize the results of a Bayesian approach to analyzing physiological data, 95% credible intervals were constructed by calculating a highest posterior density (HPD) interval that captures 95% of the posterior probability density function. These credible intervals were plotted alongside the posterior means over the observed physiological density curve for each participant and condition. To evaluate the fit of the model, several diagnostics were compared for each participant. Predictive performance was assessed with an 80/20 train-test split (Automation On/Workload Low scenario for comparability). We report the Coverage Probability Index (CPI), Mean Absolute Percentage Error (MAPE), and Concordance Correlation Coefficient (CCC) on held-out data. The CPI was estimated by calculating the probability that the observed heart rate values fall within the predicted 95% credible interval [
50]. The MAPE was calculated for the results of the overall prediction error [
51]. The CCC was computed for each set of predictions using the epiR package v2.0.83 in RStudio (R Foundation for Statistical Computing, Vienna, Austria) [
52].
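The interval and diagnostics described above can be sketched in Python as follows. This is an illustrative implementation using common definitions, not the study's R code: the HPD interval is approximated as the narrowest window containing 95% of sorted posterior draws, the CPI is computed as empirical coverage, and the CCC follows Lin's formula with population (1/n) moments, which may differ slightly from the estimator used by epiR. All function names and example values are assumptions.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))            # number of draws inside the interval
    widths = x[k - 1:] - x[:n - k + 1]    # width of every candidate window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def coverage(y_obs, lower, upper):
    """Fraction of held-out observations that fall inside [lower, upper]."""
    y_obs = np.asarray(y_obs, dtype=float)
    return float(np.mean((y_obs >= lower) & (y_obs <= upper)))

def mape(y_obs, y_pred):
    """Mean absolute percentage error, in percent (y_obs must be nonzero)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_obs - y_pred) / y_obs)))

def ccc(x, y):
    """Lin's concordance correlation coefficient with 1/n moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    sxy = np.mean((x - mx) * (y - my))
    return float(2.0 * sxy / (x.var() + y.var() + (mx - my) ** 2))

# Made-up right-skewed draws: the HPD interval skips the long upper tail.
draws = np.concatenate([np.linspace(0.0, 1.0, 95), [10.0, 20.0, 30.0, 40.0, 50.0]])
lo, hi = hpd_interval(draws)
```

Unlike an equal-tailed interval, the HPD interval for skewed draws is not centered on the median, which matters for the skewed posteriors that physiological data can produce.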
The hierarchical Bayesian model was developed in R using RStudio (Version 2023.12.1+402) and uses each physiological time series, together with its associated contextual characteristics, to model the pilot’s behavior throughout the experiment. HR, RR, and WL were modeled separately with the same hierarchical structure; differences lay only in their priors and measurement models (e.g., WL bounded in [0, 1]). Prior knowledge from biomedical literature can inform the selection of appropriate prior distributions for physiological variables. For instance, studies have shown that heart rate in healthy adult males typically follows a normal distribution centered around 75 beats per minute (bpm) with a standard deviation of approximately 7.7 bpm [
53]. This prior information can be further refined using individual-specific data, such as baseline resting heart rates, to create more informative and personalized priors. However, to preserve generalizability and facilitate comparisons across individuals, noninformative (or weakly informative) priors were applied uniformly in this study.
To assess the robustness of the model to prior assumptions, a prior sensitivity analysis was conducted. This involved re-running the model with a range of alternative priors, varying in both informativeness and distributional form, to evaluate the impact on posterior estimates. The goal was to ensure that key inferences were driven by the data rather than overly influenced by the choice of prior. Results of this analysis supported the stability of model outcomes, indicating that the primary conclusions held consistently across different prior specifications. The probabilistic nature of ABM’s brain state estimates quantifies workload between 0 and 1; thus, a weakly informative prior was centered at a probability of 0.50 with a standard deviation of 0.1. Therefore, the prior distributions used for each variable of interest were:
3. Results
The group-level results of this hierarchical Bayesian model for HR, RR, and WL brain state estimates can be seen in
Table 1. The hierarchical nature of the model allows for between-subjects comparisons in the means and standard deviations of each physiological variable. Participants exhibited the highest HR in the condition where Automation was on and workload was high (i.e., 2-back task). RR was the lowest in the Automation On/Workload High condition and highest in the Automation Off/Workload Low condition. WL estimates were higher while the n-back level was high, which reflects the efficacy of the secondary task working memory manipulation. In
Table 1, n indicates the number of observations for the experimental condition, which may vary due to the participant variability in completing the simulated flight.
The participant-level results for heart rate can be seen below in
Table 2. The number of per-trial observations across participants ranges from 623 to 1433. The posterior means and standard deviations are reasonable for each physiological variable. In general, participants 2 and 3 have a higher heart rate than average. Participant 2 also exhibits the highest respiration rate on average. WL estimates from participant 5 are observed to be higher than the other participants. These results highlight the vast individual differences in physiology present in the data collected.
The HR, RR, and WL estimates are shown visually with 95% credible intervals in
Figure 3. The participants’ physiological responses varied noticeably throughout the experiment. Some participants demonstrated substantial differences across conditions, for instance, participant 5’s WL estimates, while others showed more minimal changes, such as in participant 1’s HR across conditions. Overall, there was more variation present for all three physiological variables during the Automation Off/Workload High condition, as shown in the bottom left plot in each set of plots.
The diagnostic results of the Bayesian models for each participant can be seen in
Table 3 for HR, RR, and WL. Prediction diagnostics revealed clear differences in model accuracy across physiological measures. HR predictions were consistently accurate across participants, with low MAPE values (ranging from 8.15% to 10.1%) and strong agreement between predicted and observed values, as reflected in relatively high CCCs (0.766–0.904) and moderate to high CPIs (0.660–0.853). In contrast, RR predictions showed greater variability and generally poorer performance. MAPE values for RR were markedly higher (10.9% to 50.8%), and CCCs were notably lower (0.181–0.681), suggesting weaker concordance. CPIs for RR were also inconsistent, with values ranging from 0.363 to 0.853. WL predictions exhibited the highest overall MAPE (24.5% to 37.7%), yet CCCs ranged more favorably (0.610–0.853), indicating moderate agreement for some participants. CPI values for WL also varied widely (0.323–0.807), pointing to mixed coverage quality. These results suggest that predictive models performed best for HR, followed by WL, and were least reliable for RR.
To understand the implications of this Bayesian approach, the results of this model in the form of posterior estimates for mean and standard deviation can be visualized in conjunction with the observed physiological data collected throughout the experiment. Posterior density plots with corresponding 95% credible intervals (CIs) were generated for each participant for HR, RR, and WL to assess model calibration and predictive uncertainty.
Figure 4,
Figure 5 and
Figure 6 demonstrate the distribution of physiological data by participant. The colored bars on the plots correspond to the experimental conditions (Red represents the Automation Off/Workload Low condition, Green represents the Automation Off/Workload High condition, Orange represents Automation On/Workload Low condition, and Blue represents Automation On/Workload High condition).
With the complete posterior distribution over all experimental conditions, several pieces of information can be gathered from the model. For HR, observed values for all participants generally fell within the high-density regions of the posterior distributions and were encompassed by the 95% CIs, suggesting well-calibrated and reliable estimates. Participants 1 and 5 exhibited particularly tight HR posterior distributions with close alignment between predicted and observed values. In contrast, RR predictions showed greater variability across participants. Several posterior distributions, most notably for participants 3, 4, and 5, were skewed or multimodal, and observed values occasionally fell near the tails of the distributions, indicating increased uncertainty and reduced precision for RR. Workload predictions showed more consistent performance. The posterior distributions for WL were relatively symmetric and narrow, with observed values frequently aligning with the posterior modes for all participants. However, the WL credible intervals were concentrated near the center of the observed distributions and failed to capture density in the tails of the workload estimate distributions. Overall, model performance appeared strongest for HR and WL, with greater variability and lower confidence in RR estimates, consistent with trends observed in the quantitative prediction diagnostics.
4. Discussion
A Bayesian approach was adopted to quantify and predict participants’ physiological states in a flight simulator study examining the effects of automation level and task workload. The model successfully captured variation in physiological responses across experimental conditions by leveraging contextual information to segment and interpret the continuous time series data. This enabled meaningful comparisons between levels of automation and cognitive demand. A hierarchical modeling structure facilitated both within- and between-subject analyses. At the group level, heart rate was consistently elevated under high workload conditions with automation. While most participants followed this trend, one outlier underscored the value of individualized modeling, highlighting how Bayesian methods accommodate variability without discarding data as “noise.”
Heart rate was the most reliable measure, followed by workload estimates from EEG, while respiration rate proved least consistent. As anticipated, EEG-based WL estimates were higher during the 2-back task compared to the 1-back task, confirming the effectiveness of the working memory manipulation. Additionally, respiration rates were generally lower when automation was disengaged, potentially reflecting the increased cognitive and physical demands associated with manual flight control. These findings highlight the promise of Bayesian approaches for interpreting small-sample, high-frequency physiological data in aviation.
A key advantage of this framework is its ability to accommodate individual variability without discarding it as statistical noise. For example, while most participants showed elevated heart rate under high workload with automation, one deviated from this trend: an effect that traditional averaging methods would obscure. By modeling posterior distributions rather than single-point estimates, the Bayesian approach offers nuanced, data-efficient predictions. For instance, participant 5 exhibited a distinct local heart rate maximum near 82 BPM under the Automation On/Workload High condition, yet their overall mean heart rate more closely aligned with the Automation Off/Workload High condition, which produced a global maximum near 75 BPM. These observations suggest the presence of semi-stationary elevated workload states that manifest differently across task contexts, with extended periods of physiologically distinct responses.
This individualized modeling approach enables the comparison of overall trends without compromising data integrity or inflating Type I error due to repeated measures. As additional data becomes available, the model naturally converges toward more precise estimates, improving both robustness and predictive accuracy. Notably, posterior estimates from one iteration of the model can be reused as informed priors in subsequent analyses, enabling longitudinal or within-subject modeling across multiple experimental sessions. Repeated sessions from a single participant could be analyzed using their personalized posterior distribution as a prior, yielding more accurate and context-specific predictions of physiological state. This iterative capability offers broad applicability across experimental designs and time-varying datasets, underscoring the flexibility and power of Bayesian inference in psycho-physiological research.
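The reuse of a posterior as the prior for a later session can be illustrated with a small Python sketch based on the normal-normal update with a known observation standard deviation. The prior of 75 bpm with a standard deviation of 7.7 bpm echoes the literature values quoted in the Methods; the observation standard deviation, the session data, and the function name are made up for illustration.

```python
import numpy as np

def normal_update(prior_mean, prior_sd, y, sigma):
    """Posterior mean and SD of a normal mean, with known observation SD sigma."""
    y = np.asarray(y, dtype=float)
    prior_prec = 1.0 / prior_sd**2
    data_prec = len(y) / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * y.mean()) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

sigma = 8.0                                # assumed known, illustration only
session_1 = [68.0, 70.0, 71.0, 69.0]       # hypothetical HR samples
session_2 = [72.0, 74.0, 73.0]

# Session 1 updates the literature-based prior; its posterior then
# serves as the prior for session 2.
m1, s1 = normal_update(75.0, 7.7, session_1, sigma)
m2, s2 = normal_update(m1, s1, session_2, sigma)

# Under this model, updating once with all data gives the same posterior.
m_all, s_all = normal_update(75.0, 7.7, session_1 + session_2, sigma)
```

In this conjugate setting the sequential and batch results coincide, and the posterior standard deviation decreases with each added session, which reflects the convergence toward more precise estimates described above.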
Collectively, these findings highlight the utility of Bayesian modeling in capturing individual-specific and context-dependent physiological patterns, offering a powerful tool for real-time assessment of cognitive workload in complex, dynamic environments. Bayesian approaches allow researchers to explicitly model uncertainty and tailor likelihoods to the characteristics of the data, improving robustness and ecological validity in real-world applications. Physiological signals are influenced by multiple intrinsic and extrinsic factors and often shift over time, making them non-stationary. Bayesian approaches are well suited for such data because they quantify uncertainty and adapt as evidence accumulates. This combination provides the repetition and granularity needed for highly confident, data-efficient predictions.
While recent advances in noninvasive physiological and neurological sensing have significantly improved our ability to observe the human state in real time, there remains a considerable gap in understanding the complex dynamics, interdependencies, and feedback mechanisms within these signals—particularly in the context of brain activity. Although physiological metrics such as heart rate, respiration, and EEG have been quantitatively linked to constructs like workload, fatigue, and engagement [
54,
55,
56,
57], the majority of existing work analyzes these signals in isolated snapshots. Very little research has focused on modeling physiological data longitudinally, limiting our understanding of how these metrics evolve over time and interact with cognitive processes. Without temporally sensitive models, researchers risk drawing incomplete or misleading conclusions from highly individualized, non-stationary data. Furthermore, while multimodal sensing approaches have shown promise for workload prediction in simulation and surgical settings [
58,
59], these methods typically rely on machine learning classifiers that require large training datasets and offer limited transparency.
Limitations and Future Work
While the present study demonstrates the utility of hierarchical Bayesian modeling for analyzing continuous physiological data in aviation contexts, several limitations should be acknowledged. First, the sample size (five instrument-rated pilots) limits the generalizability of the findings to broader pilot populations. However, small-n designs are common in aerospace human factors, and the hierarchical Bayesian approach partially mitigates this limitation by borrowing statistical strength across participants and conditions. Second, the simulated environment may not fully capture the sensory and contextual complexity of real-world flight operations. Physiological responses in actual flight could be influenced by additional environmental and emotional stressors not present in simulation. Future work should extend this framework to in-flight studies or higher-fidelity simulators.
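The notion of borrowing statistical strength can be made concrete with a small sketch of partial pooling. It assumes the group mean and both variance components are known, whereas a full hierarchical model would estimate them from the data. The function name `partial_pool` and all values are hypothetical.

```python
def partial_pool(samples, group_mean, within_var, between_var):
    """Shrink one participant's sample mean toward the group mean.

    The weight on the participant's own data grows with the number of samples,
    so sparsely observed participants borrow more from the group.
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    data_precision = n / within_var       # information from this participant
    prior_precision = 1.0 / between_var   # information from the group level
    weight = data_precision / (data_precision + prior_precision)
    estimate = weight * sample_mean + (1.0 - weight) * group_mean
    return estimate, weight


# Hypothetical values, for illustration only.
GROUP_MEAN = 75.0    # assumed group-level mean heart rate (bpm)
WITHIN_VAR = 36.0    # assumed within-participant variance
BETWEEN_VAR = 16.0   # assumed between-participant variance

sparse = [88.0, 92.0]        # participant with very few samples
dense = [88.0, 92.0] * 50    # same sample mean, many more samples

est_sparse, w_sparse = partial_pool(sparse, GROUP_MEAN, WITHIN_VAR, BETWEEN_VAR)
est_dense, w_dense = partial_pool(dense, GROUP_MEAN, WITHIN_VAR, BETWEEN_VAR)

print("sparse:", est_sparse, "weight on own data:", w_sparse)
print("dense :", est_dense, "weight on own data:", w_dense)
```

Both hypothetical participants have the same raw mean, but the sparsely sampled one is pulled further toward the group mean. This is why high-frequency recordings from a handful of pilots can still yield stable individual estimates.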
Third, each physiological signal was modeled independently in a univariate framework. This approach was chosen because heart rate, respiration, and EEG-derived workload differ substantially in their scales, distributions, and noise characteristics, making it important to first isolate and interpret the unique statistical properties of each signal. Given the small-n but high-frequency nature of the dataset, univariate hierarchical Bayesian models offered greater stability and interpretability while reducing the risk of overfitting that could arise from a more complex multivariate framework. Prior work has successfully applied classifiers, including neural networks and multiresolution fusion frameworks, to multivariate physiological data for real-time emotion recognition and mental workload assessment [
60,
61]. However, these approaches often prioritize prediction accuracy over interpretability and lack mechanisms for uncertainty quantification or integration of prior knowledge, limitations that the Bayesian modeling framework used in the present study is designed to address. While the present work establishes this univariate foundation, future research should extend toward multivariate Bayesian structures (e.g., multivariate Normal-Wishart conjugate models) or dynamic time-series models to jointly model physiological channels and further improve ecological validity.
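For reference, the standard conjugate update for a multivariate Normal likelihood with a Normal-Wishart prior is sketched below. The notation is introduced here for illustration and is not taken from the models fitted in this study: each observation is a d-dimensional vector of physiological channels, the precision matrix is the inverse covariance, and the subscript-zero quantities are prior hyperparameters.

```latex
\begin{align}
\boldsymbol{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}
  &\sim \mathcal{N}\!\left(\boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}\right),
  \qquad i = 1, \dots, n \\
\boldsymbol{\mu} \mid \boldsymbol{\Lambda}
  &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_0, (\kappa_0 \boldsymbol{\Lambda})^{-1}\right),
  \qquad
  \boldsymbol{\Lambda} \sim \mathcal{W}\!\left(\mathbf{W}_0, \nu_0\right) \\
\kappa_n &= \kappa_0 + n, \qquad \nu_n = \nu_0 + n \\
\boldsymbol{\mu}_n &= \frac{\kappa_0 \boldsymbol{\mu}_0 + n \bar{\boldsymbol{x}}}{\kappa_0 + n} \\
\mathbf{W}_n^{-1} &= \mathbf{W}_0^{-1}
  + \sum_{i=1}^{n} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^{\top}
  + \frac{\kappa_0 n}{\kappa_0 + n}
    (\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)(\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)^{\top}
\end{align}
```

Because the posterior retains the Normal-Wishart form, its updated hyperparameters can serve directly as the prior for the next session or time window, while the off-diagonal elements of the precision matrix capture dependence between channels.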
These results carry important implications for aviation. Real-time Bayesian monitoring could inform adaptive automation, enabling systems to respond dynamically to operator state by adjusting task allocation or interface complexity. Unlike black-box machine learning classifiers, Bayesian models are lightweight, interpretable, and provide quantified uncertainty—qualities essential for aerospace applications where transparency and robustness are critical. Given its relatively non-invasive nature compared to EEG-based metrics, heart rate demonstrates strong potential for future modeling efforts in similar contexts. The current model provides discrete, condition-specific predictions for heart rate using a hierarchical Bayesian structure, yet its framework can be extended to incorporate time as an explicit variable, allowing for true time-series prediction rather than random posterior sampling on static conditions. Future work should also explore posterior updating using segmented time windows of varying lengths to evaluate signal stability, noise sensitivity, and time-dependent patterns in physiological state.
5. Conclusions
This study applied a hierarchical Bayesian modeling framework to physiological data collected from experienced pilots in a flight simulator, a setting in which sample sizes are inherently small. Within the aviation research community, concerns about generalizability and statistical power are common, particularly when using traditional frequentist approaches that rely heavily on large samples and repeated trials. Bayesian statistics offer a compelling alternative: they naturally accommodate small-n, high-resolution datasets through the integration of prior knowledge and the probabilistic modeling of uncertainty. This makes Bayesian inference particularly well suited for flight-based human factors research, where within-subjects data are rich but participant pools are often limited.
Among the physiological indicators analyzed, heart rate emerged as the most reliable and least intrusive predictor of workload, followed by EEG-derived metrics and respiration rate. These findings support the feasibility of Bayesian inference for real-time physiological monitoring and underscore its potential applications in adaptive flight deck systems that dynamically respond to pilot cognitive state. Compared to computationally intensive models such as neural networks or black-box classification algorithms, the Bayesian approach is lightweight, interpretable, and data-efficient, making it more feasible for real-time deployment in operational settings. Finally, by enabling continuous, individualized modeling of cognition and physiology, this method offers a more ecologically valid alternative to standardized questionnaires and discrete behavioral metrics. It represents a promising paradigm for advancing human-automation interaction and suggests that Bayesian inference may be a key analytical lens through which to understand and model the dynamic, complex nature of human physiological data in high-stakes environments.