Hierarchical Bayesian Modeling for Physiological Data in Small-N Aviation Human Factors Research

Kyle, Ainsley; Rouser, Brock; Paul, Ryan C.; Jurewicz, Katherina A.

doi:10.3390/aerospace12111004

¹ School of Industrial Engineering and Management, College of Engineering, Architecture, and Technology, 329 Engineering North, Oklahoma State University, Stillwater, OK 74078, USA

² School of Mechanical and Aerospace Engineering, College of Engineering, Architecture, and Technology, 300 Engineering South, Oklahoma State University, Stillwater, OK 74078, USA

* Author to whom correspondence should be addressed.

Aerospace 2025, 12(11), 1004; https://doi.org/10.3390/aerospace12111004

Submission received: 1 October 2025 / Revised: 30 October 2025 / Accepted: 3 November 2025 / Published: 11 November 2025

(This article belongs to the Section Aeronautics)

Abstract

Monitoring pilot cognitive state in real time is becoming increasingly important as automation plays a larger role in aviation. Traditional workload assessments, such as questionnaires or task-based performance metrics, provide useful insights but can be limited in rapidly changing flight environments. Physiological measures, including heart rate, respiration, and electroencephalogram (EEG), offer continuous data streams, yet their variability and complexity present challenges for analysis. This study explores the use of a hierarchical Bayesian framework to quantify patterns from physiological signals recorded during high-fidelity flight simulations. Five certified pilots flew scenarios that varied in automation level and working memory demand while heart rate, respiration rate, and EEG-derived workload estimates were monitored. The model generated individualized and condition-specific estimates, quantified uncertainty, and remained stable with a small participant pool. Heart rate appeared to be the most consistent indicator, followed by EEG-derived workload, while respiration rate was less reliable across conditions. These results suggest that Bayesian inference may provide a promising way to interpret physiological data in aviation settings and could support the development of adaptive automation that responds to pilot workload. The approach emphasizes transparency and efficiency, offering complementary value to existing modeling techniques for aerospace human factors and flight deck applications.

Keywords: Bayesian modeling; physiological measurements; human-automation interaction; aviation human factors; adaptive systems

1. Introduction

In modern technological systems, humans are exposed to more information than ever before, shifting system limitations from computational power to human capacity [1]. Research in human factors has shown that task-induced workload can substantially impair performance and cognitive resource allocation [2,3,4]. Automation is often introduced in these information-heavy systems to reduce workload, but ironically, it often ends up generating more complex technical challenges than the ones it was intended to solve [2,4,5,6]. With the shift from whether to how much automation to implement, careful consideration of human factors has become more crucial than ever for system designers [7].

Aviation exemplifies a domain where technological advancements have expanded the operator’s access to information, yet this abundance more often leads to information overload than to information dominance [8]. Automation in fifth-generation aircraft reduces information overload through information fusion and automated sensor management, which allows the pilot to focus on tactical decision-making. However, automating tasks previously performed by humans has been shown to have negative effects, such as reducing the situational awareness of the operator [9,10,11]. Transitions between autonomous and manual operation introduce the potential for degraded operator states, either cognitive overload (e.g., fully manual operations) or cognitive underload (e.g., fully autonomous operations), so systems should be designed such that human and automated agents can collaborate optimally to achieve decision-making superiority [7,12]. Since neither full automation nor full manual control is universally practical, the path toward adaptive hybrid systems begins with understanding and quantifying the operator’s cognitive state.

Traditional methods of assessing cognitive workload often rely on subjective self-report tools such as the NASA-TLX [13], which capture an individual’s perceived workload during specific experimental periods. However, these active assessments are not only time-consuming and intrusive for participants but also pose challenges for experimental validity, particularly in studies with small sample sizes. Subjective measures are prone to bias, as workload ratings have been shown to increase non-linearly with actual cognitive demand [14]. Although some other performance-based measures have been used (e.g., reaction time, response time, task completion time, time to transition between operating modes or between tasks, etc.), these measures lack validity when performance is shared by the operator and automation.

Physiological measures such as heart rate, respiration, and brain activity provide continuous, objective insights into workload and arousal [15,16,17]. Unlike subjective ratings or task-based measures, they capture high-frequency dynamics but are noisy and individualized, posing statistical challenges. Physiological monitoring is a promising approach to gathering more data in real time for human factors studies, especially in specialized environments such as aviation, where access to qualified participants is limited and simulations are resource-intensive. Recent research has examined the relationships between physiological indicators and traditional subjective measures, such as correlations between NASA-TLX scores and heart rate variability. These studies have revealed complex, nonlinear patterns [18] and strong inter-correlations among subjective scales, yet often report weak or no associations with physiological measures [19], highlighting the nuanced challenges in aligning self-reported and physiological data.

Deterministic approaches often fall short in complex man-made systems, where intra- and inter-individual differences render behavioral data fundamentally statistical in nature [20,21]. Traditional frequentist approaches commonly assume that repeated observations across individuals follow the same distribution, often removing outliers to preserve population-level assumptions. However, in the context of physiological data where responses are highly individualized and inherently complex, this “one-size-fits-all” strategy loses power. What looks like noise at the population level may reflect a meaningful signal. To produce robust and reliable predictions, statistical methods used to analyze physiological data must be tailored to the individual, accounting for individual variability rather than obscuring or smoothing it away. Bayesian statistics offer a compelling alternative, providing a more ecologically valid framework for analyzing physiological data.

Unlike frequentist approaches, which focus on point estimates and often overlook uncertainty, Bayesian methods explicitly model uncertainty and differ in conceptualization, calculations, and functionality from traditional statistical approaches [22]. By incorporating prior knowledge and updating beliefs as new data becomes available, Bayesian inference naturally accommodates both intra- and inter-subject variability [23]. Epistemic uncertainty is quantified and reduced through posterior distributions as more evidence accumulates [24]. This probabilistic foundation enables researchers to analyze complex, individualized, and often sparse psychophysiological datasets without overfitting or oversimplifying the underlying patterns. Recent publications in human factors have used Bayesian analyses for modeling driver response times during automated vehicle takeovers [25,26,27], the effects of distraction and vigilance [28,29], and human automation interaction [30,31,32]. Bayesian analyses of small-sample-size factorial designs have been explored in applications involving pilot incapacitation and flight trajectory predictions, and Bayesian methods were shown to increase the reliability and validity of results [33,34].

Recent studies highlight that pilot workload, fatigue, and situational awareness emerge from dynamic interactions among physiological, contextual, and operational factors rather than from any single measure. Physiological and subjective workload indicators in flight operations have been shown to diverge in nonlinear ways, and scheduling variables such as multi-day shifts and limited rest have been linked to measurable fatigue [18,35]. Undetected automation faults have been demonstrated to degrade situational awareness and decision-making, especially among less experienced pilots [36,37]. Collectively, these findings underscore that cognitive states fluctuate with context and time, motivating the need for probabilistic models to capture how physiological and performance patterns evolve under varying workload and automation conditions.

Despite growing interest in using physiology to assess cognitive states, most studies either treat psychophysiological signals as windowed features for classification or simple regression or apply Bayesian methods to discrete outcomes or small factorial designs. What is largely missing are probabilistic, time-sensitive models that operate directly on continuous multivariate physiological streams and capture both inter- and intra-subject variability. This gap is especially apparent in aviation contexts where sample sizes are small, performance is shared with automation, and subjective ratings are sparse and biased. Methods that exploit longitudinal, continuous time series per participant while borrowing strength across subjects are rare. As a result, current approaches provide limited individualized inference, weak uncertainty quantification, and little guidance for closing the loop between physiology and system design.

This study develops a hierarchical Bayesian model to analyze continuous physiological data from pilots in a flight simulator, addressing small-sample challenges and individual variability. By modeling heart rate, respiration, and EEG-derived workload, we evaluate the potential of Bayesian methods for real-time cognitive state estimation in aviation. Despite the conceptual alignment between Bayesian inference and biological signal variability, there is limited work in applying Bayesian techniques to continuous physiological time series data in human factors research. To address this gap, physiological data was collected from experienced pilots performing flight tasks under varying levels of automation and task difficulty within a high-fidelity flight simulator. A hierarchical Bayesian model was constructed to examine the psychophysiological correlates of these dynamic flight environments and to evaluate the utility of Bayesian methods in an applied small-sample aviation use case. This work explicitly models individual differences, reducing the need to discard “outliers” that may be meaningful signal. Ultimately, this work seeks to advance real-time physiological state estimation by evaluating continuous physiological variables well-suited for Bayesian modeling, and developing an adaptable, data-efficient framework capable of producing robust and generalizable predictions across shifting environmental demands, limited sample sizes, and varying cognitive states.

2. Materials and Methods

Five pilots participated in the study, which was approved by Oklahoma State University’s IRB (IRB-24-229-ATRC). The inclusion criteria were pilot certification, a current instrument rating, and experience in a Cessna 172 or similar aircraft. Participants were volunteers recruited at Oklahoma State University and the Stillwater, Oklahoma community area, and all potential participants were screened by a research team member before enrolling in the study. All five participants were male with ages ranging from 21 to 39 years (M = 27.60, SD = 9.10) and experience ranging from 253 to 1450 flight hours (M = 706, SD = 572). An auditory n-back task was used to manipulate workload throughout the experiment, with n = 1 corresponding to low workload and n = 2 to high workload. In an n-back task, temporal sequences of stimuli are presented, and participants must decide whether the current stimulus is the same as the stimulus they heard “n” steps ago. The flight simulator hardware can be seen in Figure 1a, in addition to the experimental design (Figure 1b).
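As a minimal illustration of the n-back rule described above, the following sketch marks which items in a stimulus sequence are targets for a given n. The function name and the letter sequence are hypothetical and are not taken from the study materials.

```python
def nback_targets(stimuli, n):
    """Flag each position whose stimulus matches the one presented n steps earlier."""
    # The first n positions can never be targets because nothing precedes them by n steps.
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


# Hypothetical auditory letter sequence (illustrative only).
letters = ["A", "B", "A", "A", "C", "A"]

one_back = nback_targets(letters, 1)  # low-workload rule (n = 1)
two_back = nback_targets(letters, 2)  # high-workload rule (n = 2)
```

The same sequence yields different targets under the 1-back and 2-back rules, which is what allows the secondary task to vary working memory demand without changing the stimuli themselves.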

A total of four 15 min flight task scenarios were flown per participant, creating a 2 (Automation on or off) × 2 (Workload high or low) within-subjects factorial design (conditions described in Figure 1b). This within-subjects factorial design was selected to maximize statistical power and control for inter-individual variability, which is particularly important in aviation human factors studies with small participant pools and high experimental costs. Within-subjects designs allow each pilot to serve as their own control, thereby reducing noise associated with between-subjects differences in physiology and flight experience [33]. This structure also enables assessment of both main effects (e.g., workload and automation) and subject-level variability on physiological measures. However, within-subjects designs can be limited by potential learning, adaptation, or fatigue risks across repeated trials [4]. To mitigate these risks, the order of flight conditions was counterbalanced, and rest breaks were provided between sessions. This factorial approach has been widely adopted in automation and workload research where simulator time and certified participants are limited [38,39].

Figure 2 displays the flight path for one participant. The outbound portion from KSWO RWY 17 to ACOKO did not include n-back activity. After completion of the turn-around maneuver, the n-back task was started and consequently concluded after landing back at KSWO. Boxcar functions were generated for automation status and workload level throughout the continuous physiological data recordings and used to segment the data into conditions based on the experimental context.
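The boxcar segmentation described above can be sketched as follows. This is an illustrative example only: the time stamps, the task onset time, and the heart rate values are made up, and the authors' actual segmentation code is not shown in the source.

```python
def boxcar(times, start, end):
    """Return 1 for every time stamp in [start, end) and 0 elsewhere."""
    return [1 if start <= t < end else 0 for t in times]


# Hypothetical 10 s recording; the n-back task is assumed to start at t = 4 s,
# after the turn-around maneuver, and to run until the end of the recording.
times = list(range(10))
nback_active = boxcar(times, 4, 10)

# Hypothetical heart rate samples aligned with `times` (beats per minute).
hr = [70, 71, 72, 71, 75, 76, 78, 77, 79, 80]

# Use the indicator to split the continuous series by experimental context.
during_task = [x for x, on in zip(hr, nback_active) if on]
before_task = [x for x, on in zip(hr, nback_active) if not on]
```

A second indicator for automation status could be built the same way and combined with this one to label each sample with its experimental condition.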

Physiological sensor hardware included an Advanced Brain Monitoring (ABM) B-Alert X10 EEG headset (Advanced Brain Monitoring Inc., Carlsbad, CA, USA) and an Equivital eq02+ Lifemonitor (Equivital Ltd., Cambridge, UK). The EEG headset collected raw brain activity at 256 Hz, which was cleaned using 50, 60, 100, and 120 Hz notch filters as well as a 0.05 Hz high-pass filter and median filter of order 56. Raw EEG signals were further processed using the headset’s proprietary artifact decontamination algorithm. After signal processing, probabilistic cognitive state estimates were generated at 1 Hz using the manufacturer’s classification algorithm [40,41,42]. The Lifemonitor collected ECG signals at 256 Hz, which were used to calculate heart rate using R wave detection with a 30 s rolling average and were reported every 5 s [43,44]. Heart rate values were cleaned by removing extreme outliers less than 30 BPM or greater than 200 BPM and removing values outside of 3 standard deviations from an individual’s average heart rate [43]. Respiration rate was collected using the Lifemonitor’s expansion sensor, which recorded values every 15 s.
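The two-stage heart rate cleaning rule can be sketched as below. The thresholds (30 and 200 BPM, 3 standard deviations) come from the text; the order of the two steps and the use of the sample standard deviation are assumptions, as the source does not specify them.

```python
import statistics


def clean_heart_rate(hr, low=30.0, high=200.0, n_sd=3.0):
    """Drop implausible values, then values far from the individual's mean."""
    # Step 1: remove values below 30 BPM or above 200 BPM.
    plausible = [x for x in hr if low <= x <= high]
    # Step 2 (assumed to run on the range-filtered data): remove values more
    # than n_sd sample standard deviations from this individual's mean.
    mean = statistics.fmean(plausible)
    sd = statistics.stdev(plausible)  # needs at least two plausible values
    return [x for x in plausible if abs(x - mean) <= n_sd * sd]


# Hypothetical series for one participant: a steady 70 BPM with one spike (120)
# and two sensor dropouts (0 and 250).
raw = [70.0] * 29 + [120.0, 0.0, 250.0]
cleaned = clean_heart_rate(raw)
```

Because the mean and standard deviation are computed per individual, a value that is unremarkable for one pilot may still be removed for another, which is the intent of the individual-level rule.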

The final dataset was up-sampled via linear interpolation into 1 s epochs, which included heart rate (HR; beats per minute), respiration rate (RR; breaths per minute) and workload brain state estimates (WL; percent probability). While ECG-based signals are more practical for real-time analysis in aviation contexts, an EEG-based metric was included alongside ECG-derived measures to address the initial objective of identifying the most promising physiological indicators. The ABM B-Alert X10 system was selected because it provides validated, probabilistic workload estimates derived from EEG signals through manufacturer-developed algorithms optimized for applied human-factors environments [40,41]. This allowed us to prioritize multimodal, synchronized data acquisition (EEG, ECG, respiration) rather than manual re-analysis of raw EEG features, ensuring consistent workload metrics and integration across physiological channels under the time and resource constraints typical of aviation research. In total, over 64 million database records of raw physiological time series data were collected from the sensors, and the final data frame used for analysis was a 19,571 × 3 matrix.
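Up-sampling to 1 s epochs by linear interpolation can be illustrated with a short sketch. The function below is a hypothetical stand-in for whatever interpolation routine the authors used, and the sample values are invented; it assumes strictly increasing time stamps.

```python
def upsample_linear(times, values, step=1.0):
    """Linearly interpolate (times, values) onto a regular grid with spacing `step`."""
    n_out = int((times[-1] - times[0]) / step) + 1
    grid, out = [], []
    i = 0  # index of the left-hand sample of the current interval
    for k in range(n_out):
        t = times[0] + k * step
        # Advance to the interval [times[i], times[i + 1]] that contains t.
        while i < len(times) - 2 and t > times[i + 1]:
            i += 1
        frac = (t - times[i]) / (times[i + 1] - times[i])
        grid.append(t)
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return grid, out


# Hypothetical heart rate reported every 5 s, up-sampled to 1 s epochs.
t_5s = [0.0, 5.0, 10.0]
hr_5s = [60.0, 70.0, 65.0]
t_1s, hr_1s = upsample_linear(t_5s, hr_5s)
```

Interpolated points add no new information: the 1 s series has more rows than the 5 s or 15 s source signals, but the same number of independent measurements, which is worth keeping in mind when interpreting the width of the resulting posterior intervals.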

Bayesian Modeling Methodology

We analyzed the physiological data at two levels: the group (experimental condition) level and the participant level. This hierarchical structure is warranted as physiological responses exhibit substantial between-subjects heterogeneity, are sampled at high frequencies (small-N, large-T), and partial pooling allows for the borrowing of strength across participants and conditions to stabilize individual estimates while preserving person-specific effects. The hierarchy also yields calibrated uncertainty at both levels, which is essential for aviation studies with noisy, autocorrelated signals and limited sample sizes.

A univariate approach was used as the foundation for the overall Bayesian modeling: the HR, RR, and WL data were subset, and each variable of interest was modeled separately. This approach was selected to isolate and interpret the unique statistical properties and uncertainty structures of each physiological signal before integrating them into future multivariate frameworks. Given the small sample size, high-frequency data, and differing measurement scales and distributions across modalities, this approach enhances interpretability and model stability while minimizing confounding cross-signal noise that could obscure individual-level effects. Using heart rate as an example, the observed data were treated as a sequence of discrete observations. HR was sampled at 0.2 Hz throughout an experiment and is well approximated as a continuous variable that typically follows a bell-shaped distribution in healthy populations [45]. The normal distribution often appears in physiological measurements due to the Central Limit Theorem, which states that the sum of many small, independent factors tends toward a normal distribution [46]. Therefore, the heart rate data can be quantitatively described by a normal distribution, where µ represents the sample mean HR and σ the sample standard deviation.

When applying a Bayesian approach, the goal is ultimately to obtain a posterior distribution of the parameter of interest [47]. Although normality assumptions must be statistically verified in frequentist statistics, Bayesian statistical approaches do not require strict adherence to distributional assumptions such as normality for valid inference. This is because Bayesian inference is grounded in the likelihood function and the prior distribution, rather than relying on sampling distributions or asymptotic properties of estimators. As a result, the posterior distribution inherently reflects the observed data and the specified functional model, regardless of whether the data conform to a normal distribution. This flexibility allows Bayesian models to accommodate skewed, heavy-tailed, or otherwise non-normal data, making them particularly well-suited for analyzing physiological time series.

In Bayesian statistics, conjugacy refers to a model where the prior and posterior distributions belong to the same family of probability distributions. For example, both the prior knowledge and observed likelihoods for heart rate data follow a normal distribution; therefore, the posterior is also a normal distribution due to conjugacy. The posterior probability is given by Bayes’ theorem, in which the posterior distribution is proportional to the prior distribution multiplied by the likelihood function of the data. Since both are normal, the product of the two normal densities is proportional to another normal density [48]. This allows us to identify the posterior distribution that updates from the observed data by combining it with the prior distribution. More information about conjugacy derivations used in the work can be found in Appendix A.1.
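To make the conjugate update concrete, the sketch below implements the textbook normal-normal case in which the observation standard deviation is treated as known. This simplification is an assumption made for illustration; the derivation in Appendix A.1 is not reproduced here and may differ. The heart rate values and the observation standard deviation of 5 BPM are hypothetical.

```python
from math import sqrt


def normal_posterior(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean and SD of a normal mean, given a normal prior and known obs_sd."""
    prior_prec = 1.0 / prior_sd ** 2       # precision = 1 / variance
    data_prec = len(data) / obs_sd ** 2    # precision contributed by the data
    post_prec = prior_prec + data_prec     # precisions add under conjugacy
    sample_mean = sum(data) / len(data)
    # The posterior mean is a precision-weighted average of prior mean and sample mean.
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, sqrt(1.0 / post_prec)


# Hypothetical heart rate observations (BPM) with an assumed observation SD of 5 BPM,
# combined with a N(75, 7.7^2) prior on the mean.
hr = [82.0, 80.0, 85.0, 83.0, 81.0]
post_mean, post_sd = normal_posterior(75.0, 7.7, hr, 5.0)
```

The posterior mean lies between the prior mean and the sample mean, and the posterior standard deviation is smaller than the prior standard deviation, reflecting the information gained from the data.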

The Bayesian model was constructed hierarchically to account for both within- and between-subject variability in physiological responses arising from changes in primary and secondary task demands. This hierarchical framework supports robust inference at both group and individual levels, accommodates sparse or unbalanced datasets, and removes the need for manual transformation of physiological time series, making it especially well-suited for cognitive workload research in dynamic environments [49]. More information about the hierarchical form and mathematics can be found in Appendix A.2.
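The partial pooling that a hierarchical structure provides can be illustrated with a simplified sketch in which the group-level mean, the between-subject standard deviation, and the within-subject standard deviation are all treated as fixed, known values. This is an assumption for illustration only: a full hierarchical model would place priors on these quantities, and the form in Appendix A.2 is not reproduced here. All numbers are hypothetical.

```python
def partial_pool(group_mean, between_sd, subject_means, subject_ns, within_sd):
    """Shrink each subject's sample mean toward the group mean.

    The weight on a subject's own data grows with the number of observations,
    so sparsely observed subjects borrow more strength from the group.
    """
    pooled = []
    for ybar, n in zip(subject_means, subject_ns):
        data_prec = n / within_sd ** 2
        group_prec = 1.0 / between_sd ** 2
        w = data_prec / (data_prec + group_prec)  # weight on the subject's own mean
        pooled.append(w * ybar + (1.0 - w) * group_mean)
    return pooled


# Two hypothetical pilots with the same sample mean (90 BPM) but very different
# amounts of data, pooled toward a group mean of 70 BPM.
estimates = partial_pool(
    group_mean=70.0, between_sd=5.0,
    subject_means=[90.0, 90.0], subject_ns=[1000, 1], within_sd=5.0,
)
```

The pilot with 1000 observations keeps an estimate close to 90 BPM, while the pilot with a single observation is pulled halfway toward the group mean, which is the stabilizing behavior described above.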

To visualize the results of a Bayesian approach to analyzing physiological data, 95% credible intervals were constructed by calculating a highest posterior density (HPD) interval that captures 95% of the posterior probability density function. These credible intervals were plotted alongside the posterior means over the observed physiological density curve for each participant and condition. To evaluate the fit of the model, several diagnostics were compared for each participant. Predictive performance was assessed with an 80/20 train-test split (Automation On/Workload Low scenario for comparability). We report the Coverage Probability Index (CPI), Mean Absolute Percentage Error (MAPE), and Concordance Correlation Coefficient (CCC) on held-out data. The CPI was estimated by calculating the probability that the observed heart rate values fall within the predicted 95% credible interval [50]. The MAPE was calculated to summarize the overall prediction error [51]. The CCC was computed for each set of predictions using the epiR package v2.0.83 in RStudio (R Foundation for Statistical Computing, Vienna, Austria) [52].
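The interval and diagnostics described above can be sketched in a few lines. The functions below are illustrative and are not the authors' code: the HPD interval is approximated from posterior draws as the shortest window containing the requested mass, and the CCC uses population (1/n) moments as in Lin's definition, which may differ slightly from the estimator implemented in epiR.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must contain
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def coverage(observed, lower, upper):
    """Fraction of held-out observations that fall inside [lower, upper]."""
    return sum(1 for y in observed if lower <= y <= upper) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error; assumes no observed value is zero."""
    errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


def ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic offset between x and y.
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)
```

Unlike the Pearson correlation, the CCC drops below 1 when predictions are perfectly correlated with the observations but shifted by a constant, so it rewards agreement rather than association alone.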

The hierarchical Bayesian model was developed in R (Version 2023.12.1+402) and uses each physiological time series, together with its associated contextual characteristics, to model the pilot’s behavior throughout the experiment. HR, RR, and WL were modeled separately with the same hierarchical structure; differences lay only in their priors and measurement models (e.g., WL bounded in [0,1]). Prior knowledge from biomedical literature can inform the selection of appropriate prior distributions for physiological variables. For instance, studies have shown that heart rate in healthy adult males typically follows a normal distribution centered around 75 beats per minute (bpm) with a standard deviation of approximately 7.7 bpm [53]. This prior information can be further refined using individual-specific data, such as baseline resting heart rates, to create more informative and personalized priors. However, to preserve generalizability and facilitate comparisons across individuals, noninformative (or weakly informative) priors were applied uniformly in this study.

To assess the robustness of the model to prior assumptions, a prior sensitivity analysis was conducted. This involved re-running the model with a range of alternative priors, varying in both informativeness and distributional form, to evaluate the impact on posterior estimates. The goal was to ensure that key inferences were driven by the data rather than overly influenced by the choice of prior. Results of this analysis supported the stability of model outcomes, indicating that the primary conclusions held consistently across different prior specifications. ABM’s brain state estimates are probabilistic and quantify workload between 0 and 1; thus, a weakly informative prior was centered at a probability of 0.50 with a standard deviation of 0.1. Therefore, the prior distributions used for each variable of interest were:

\mu_{HR} \sim \mathcal{N}(75,\ 7.7^{2})

\mu_{RR} \sim \mathcal{N}(15,\ 4^{2})

\mu_{WL} \sim \mathcal{N}(0.5,\ 0.1^{2})
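The logic of a prior sensitivity check can be sketched with the heart rate prior stated above and two alternatives. This is a simplified illustration under a known-variance normal model, not the authors' analysis: the alternative priors, the observation standard deviation of 5 BPM, and the data are all hypothetical.

```python
def posterior_mean(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean of a normal mean under a normal prior and known obs_sd."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / obs_sd ** 2
    sample_mean = sum(data) / len(data)
    return (prior_prec * prior_mean + data_prec * sample_mean) / (prior_prec + data_prec)


# Hypothetical heart rate series: 600 observations with a sample mean of 82 BPM.
hr = [80.0, 82.0, 84.0] * 200

# The first prior is the one stated in the text; the other two are made-up alternatives.
priors = {
    "stated N(75, 7.7^2)": (75.0, 7.7),
    "diffuse N(75, 100^2)": (75.0, 100.0),
    "tight, shifted N(60, 2^2)": (60.0, 2.0),
}

results = {name: posterior_mean(m, s, hr, 5.0) for name, (m, s) in priors.items()}
spread = max(results.values()) - min(results.values())
```

With several hundred observations per condition, even a deliberately mis-centered prior moves the posterior mean by only a fraction of a beat per minute, which is the kind of stability a sensitivity analysis is intended to confirm.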

3. Results

The group-level results of this hierarchical Bayesian model for HR, RR, and WL brain state estimates can be seen in Table 1. The hierarchical nature of the model allows for between-subjects comparisons in the means and standard deviations of each physiological variable. Participants exhibited the highest HR in the condition where Automation was on and workload was high (i.e., 2-back task). RR was the lowest in the Automation On/Workload High condition and highest in the Automation Off/Workload Low condition. WL estimates were higher while the n-back level was high, which reflects the efficacy of the secondary task working memory manipulation. In Table 1, n indicates the number of observations for the experimental condition, which may vary due to the participant variability in completing the simulated flight.

The participant-level results for heart rate can be seen below in Table 2. The number of per-trial observations across participants ranges from 623 to 1433. The posterior means and standard deviations are reasonable for each physiological variable. In general, participants 2 and 3 have a higher heart rate than average. Participant 2 also exhibits the highest respiration rate on average. WL estimates from participant 5 are observed to be higher than the other participants. These results highlight the vast individual differences in physiology present in the data collected.

The HR, RR, and WL estimates are shown visually with 95% credible intervals in Figure 3. The participants’ physiological responses varied noticeably throughout the experiment. Some participants demonstrated substantial differences across conditions, for instance, participant 5’s WL estimates, while others showed more minimal changes, such as in participant 1’s HR across conditions. Overall, there was more variation present for all three physiological variables during the Automation Off/Workload High condition, as shown in the bottom left plot in each set of plots.

The diagnostic results of the Bayesian models for each participant can be seen in Table 3 for HR, RR, and WL. Prediction diagnostics revealed clear differences in model accuracy across physiological measures. HR predictions were consistently accurate across participants, with low MAPE values (ranging from 8.15% to 10.1%) and strong agreement between predicted and observed values, as reflected in relatively high CCCs (0.766–0.904) and moderate to high CPIs (0.660–0.853). In contrast, RR predictions showed greater variability and generally poorer performance. MAPE values for RR were markedly higher (10.9% to 50.8%), and CCCs were notably lower (0.181–0.681), suggesting weaker concordance. CPIs for RR were also inconsistent, with values ranging from 0.363 to 0.853. WL predictions exhibited the highest overall MAPE (24.5% to 37.7%), yet CCCs ranged more favorably (0.610–0.853), indicating moderate agreement for some participants. CPI values for WL also varied widely (0.323–0.807), pointing to mixed coverage quality. These results suggest that predictive models performed best for HR, followed by WL, and were least reliable for RR.

To understand the implications of this Bayesian approach, the results of this model in the form of posterior estimates for mean and standard deviation can be visualized in conjunction with the observed physiological data collected throughout the experiment. Posterior density plots with corresponding 95% credible intervals (CIs) were generated for each participant for HR, RR, and WL to assess model calibration and predictive uncertainty. Figure 4, Figure 5 and Figure 6 demonstrate the distribution of physiological data by participant. The colored bars on the plots correspond to the experimental conditions (Red represents the Automation Off/Workload Low condition, Green represents the Automation Off/Workload High condition, Orange represents Automation On/Workload Low condition, and Blue represents Automation On/Workload High condition).

With the complete posterior distribution over all experimental conditions, several pieces of information can be gathered from the model. For HR, observed values for all participants generally fell within the high-density regions of the posterior distributions and were encompassed by the 95% CIs, suggesting well-calibrated and reliable estimates. Participants 1 and 5 exhibited particularly tight HR posterior distributions with close alignment between predicted and observed values. In contrast, RR predictions showed greater variability across participants. Several posterior distributions—most notably for participants 3, 4, and 5—were skewed or multimodal, and observed values occasionally fell near the tails of the distributions, indicating increased uncertainty and reduced precision for RR. Workload predictions showed more consistent performance. The posterior distributions for WL were relatively symmetric and narrow, with observed values frequently aligning with the posterior modes for all participants. WL posterior credible intervals were more centered in the observed distributions, failing to capture densities on the tails of the workload estimate distributions. Overall, model performance appeared strongest for HR and WL, with greater variability and lower confidence in RR estimates—consistent with trends observed in the quantitative prediction diagnostics.

4. Discussion

A Bayesian approach was adopted to quantify and predict participants’ physiological states in a flight simulator study examining the effects of automation level and task workload. The model successfully captured variation in physiological responses across experimental conditions by leveraging contextual information to segment and interpret the continuous time series data. This enabled meaningful comparisons between levels of automation and cognitive demand. A hierarchical modeling structure facilitated both within- and between-subject analyses. At the group level, heart rate was consistently elevated under high workload conditions with automation. While most participants followed this trend, one outlier underscored the value of individualized modeling, highlighting how Bayesian methods accommodate variability without discarding data as “noise.”

Heart rate was the most reliable measure, followed by workload estimates from EEG, while respiration rate proved least consistent. As anticipated, EEG-based WL estimates were higher during the 2-back task compared to the 1-back task, confirming the effectiveness of the working memory manipulation. Additionally, respiration rates were generally lower when automation was disengaged, potentially reflecting the increased cognitive and physical demands associated with manual flight control. These findings highlight the promise of Bayesian approaches for interpreting small-sample, high-frequency physiological data in aviation.

A key advantage of this framework is its ability to accommodate individual variability without discarding it as statistical noise. For example, while most participants showed elevated heart rate under high workload with automation, one deviated from this trend: an effect that traditional averaging methods would obscure. By modeling posterior distributions rather than single-point estimates, the Bayesian approach offers nuanced, data-efficient predictions. For instance, participant 5 exhibited a distinct local heart rate maximum near 82 BPM under the Automation On/Workload High condition, yet their overall mean heart rate more closely aligned with the Automation Off/Workload High condition, which produced a global maximum near 75 BPM. These observations suggest the presence of semi-stationary elevated workload states that manifest differently across task contexts, with extended periods of physiologically distinct responses.

This individualized modeling approach enables the comparison of overall trends without compromising data integrity or inflating Type I error due to repeated measures. As additional data becomes available, the model naturally converges toward more precise estimates, improving both robustness and predictive accuracy. Notably, posterior estimates from one iteration of the model can be reused as informed priors in subsequent analyses, enabling longitudinal or within-subject modeling across multiple experimental sessions. Repeated sessions from a single participant could be analyzed using their personalized posterior distribution as a prior, yielding more accurate and context-specific predictions of physiological state. This iterative capability offers broad applicability across experimental designs and time-varying datasets, underscoring the flexibility and power of Bayesian inference in psycho-physiological research.
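The reuse of a posterior as the next prior can be sketched with the Normal–Normal update from Appendix A.1. The example below is a minimal illustration, assuming a known measurement variance that is shared across sessions; the numerical values are hypothetical and not taken from the study, and `NormalBelief` and `update` are names introduced only for this sketch. It also checks that updating session by session gives the same posterior as pooling both sessions into one update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalBelief:
    mean: float
    var: float


def update(prior: NormalBelief, ybar: float, n: int, sigma2: float) -> NormalBelief:
    """Normal-Normal conjugate update with known measurement variance sigma2."""
    post_var = 1.0 / (n / sigma2 + 1.0 / prior.var)
    post_mean = post_var * (n * ybar / sigma2 + prior.mean / prior.var)
    return NormalBelief(post_mean, post_var)


# Hypothetical values for illustration only (not from the study).
prior = NormalBelief(mean=80.0, var=25.0)
sigma2 = 9.0             # assumed known measurement variance
session1 = (78.0, 200)   # (sample mean, number of samples)
session2 = (83.0, 300)

# Sequential: the posterior after session 1 becomes the prior for session 2.
after_s1 = update(prior, *session1, sigma2)
after_s2 = update(after_s1, *session2, sigma2)

# Batch: pool both sessions and update the original prior once.
n_total = session1[1] + session2[1]
ybar_total = (session1[0] * session1[1] + session2[0] * session2[1]) / n_total
batch = update(prior, ybar_total, n_total, sigma2)

print(after_s2)
print(batch)
```

The posterior variance shrinks with each session, which is the sense in which the estimates become more precise as data accumulate.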

Collectively, these findings highlight the utility of Bayesian modeling for capturing individual-specific and context-dependent physiological patterns, and they point to its potential as a tool for real-time assessment of cognitive workload in complex, dynamic environments. Bayesian approaches allow researchers to model uncertainty explicitly and to tailor likelihoods to the characteristics of the data, improving robustness and ecological validity in real-world applications. Physiological signals are influenced by multiple intrinsic and extrinsic factors and often shift over time, making them non-stationary. Bayesian methods suit such data because they quantify uncertainty and adapt as evidence accumulates. Paired with densely sampled within-subject recordings, this provides the repetition and granularity needed for confident, data-efficient predictions.

While recent advances in noninvasive physiological and neurological sensing have significantly improved our ability to observe the human state in real time, there remains a considerable gap in understanding the complex dynamics, interdependencies, and feedback mechanisms within these signals—particularly in the context of brain activity. Although physiological metrics such as heart rate, respiration, and EEG have been quantitatively linked to constructs like workload, fatigue, and engagement [54,55,56,57], the majority of existing work analyzes these signals in isolated snapshots. Very little research has focused on modeling physiological data longitudinally, limiting our understanding of how these metrics evolve over time and interact with cognitive processes. Without temporally sensitive models, researchers risk drawing incomplete or misleading conclusions from highly individualized, non-stationary data. Furthermore, while multimodal sensing approaches have shown promise for workload prediction in simulation and surgical settings [58,59], these methods typically rely on machine learning classifiers that require large training datasets and offer limited transparency.

Limitations and Future Work

While the present study demonstrates the utility of hierarchical Bayesian modeling for analyzing continuous physiological data in aviation contexts, several limitations should be acknowledged. First, the sample size (five instrument-rated pilots) limits the generalizability of the findings to broader pilot populations. However, small-n designs are common in aerospace human factors, and the hierarchical Bayesian approach partially mitigates this limitation by borrowing statistical strength across participants and conditions. Second, the simulated environment may not fully capture the sensory and contextual complexity of real-world flight operations. Physiological responses in actual flight could be influenced by additional environmental and emotional stressors not present in simulation. Future work should extend this framework to in-flight studies or higher-fidelity simulators.

Third, the study focused on univariate modeling, with each physiological signal analyzed independently. This approach was chosen because heart rate, respiration, and EEG-derived workload differ substantially in their scales, distributions, and noise characteristics, making it important to first isolate and interpret the unique statistical properties of each signal. Given the small-N but high-frequency nature of the dataset, univariate hierarchical Bayesian models offered greater stability and interpretability while reducing the risk of overfitting that could arise from a more complex multivariate framework. Prior work has successfully applied classifiers, including neural networks and multiresolution fusion frameworks, to multivariate physiological data for real-time emotion recognition and mental workload assessment [60,61]. However, these approaches often prioritize prediction accuracy over interpretability and lack mechanisms for uncertainty quantification or integration of prior knowledge, limitations that the Bayesian modeling framework used in the present study is designed to address. While the present work establishes this univariate foundation, future research should extend toward multivariate Bayesian structures (e.g., multivariate Normal–Wishart conjugacy) or dynamic time-series models to jointly model physiological channels and further improve ecological validity.

These results carry important implications for aviation. Real-time Bayesian monitoring could inform adaptive automation, enabling systems to respond dynamically to operator state by adjusting task allocation or interface complexity. Unlike black-box machine learning classifiers, Bayesian models are lightweight, interpretable, and provide quantified uncertainty—qualities essential for aerospace applications where transparency and robustness are critical. Given its relatively non-invasive nature compared to EEG-based metrics, heart rate demonstrates strong potential for future modeling efforts in similar contexts. The current model provides discrete, condition-specific predictions for heart rate using a hierarchical Bayesian structure, yet its framework can be extended to incorporate time as an explicit variable, allowing for true time-series prediction rather than random posterior sampling on static conditions. Future work should also explore posterior updating using segmented time windows of varying lengths to evaluate signal stability, noise sensitivity, and time-dependent patterns in physiological state.

5. Conclusions

This study applied a hierarchical Bayesian modeling framework to physiological data collected from experienced pilots in a flight simulator—an environment that, by nature, presents challenges in terms of small sample sizes. Within the aviation research community, concerns about generalizability and statistical power are common, particularly when using traditional frequentist approaches that rely heavily on large samples and repeated trials. Bayesian statistics offer a compelling alternative: they naturally accommodate small-n, high-resolution datasets through the integration of prior knowledge and the probabilistic modeling of uncertainty. This makes Bayesian inference particularly well-suited for flight-based human factors research, where within-subjects data are rich but participant pools are often limited.

Among the physiological indicators analyzed, heart rate emerged as the most reliable and least intrusive predictor of workload, followed by EEG-derived metrics and then respiration rate. These findings support the feasibility of Bayesian inference for real-time physiological monitoring and underscore its potential applications in adaptive flight deck systems that dynamically respond to pilot cognitive state. Compared to computationally intensive models such as neural networks or black-box classification algorithms, the Bayesian approach is lightweight, interpretable, and data-efficient, making it more feasible for real-time deployment in operational settings. Finally, by enabling continuous, individualized modeling of cognition and physiology, this method offers a more ecologically valid alternative to standardized questionnaires and discrete behavioral metrics. It represents a promising paradigm for advancing human-automation interaction and suggests that Bayesian inference may be a key analytical lens through which to understand and model the dynamic, complex nature of human physiological data in high-stakes environments.

Author Contributions

Conceptualization, A.K. and K.A.J.; methodology, A.K.; software, B.R.; validation, A.K., B.R., and K.A.J.; formal analysis, A.K.; investigation, A.K.; resources, R.C.P.; data curation, B.R.; writing—original draft preparation, A.K.; writing—review and editing, A.K. and K.A.J.; visualization, B.R.; supervision, K.A.J. and R.C.P.; project administration, R.C.P.; funding acquisition, K.A.J. and R.C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to privacy restrictions.

Conflicts of Interest

The authors declare no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:

HR	Heart rate
RR	Respiration rate
HRV	Heart rate variability
EEG-WL	Electroencephalogram-based workload estimates

Appendix A

Appendix A.1. Normal–Normal Conjugacy Derivations

Consider a segment with n observations and sample mean $\bar{y}$. With known measurement variance $\sigma^{2}$ and a Normal prior $\mu \sim N(\mu_{0}, \sigma_{0}^{2})$, the posterior distribution can be described as:

$$p(\mu \mid \bar{y}) \propto \exp\left(-\frac{(\bar{y} - \mu)^{2}}{2(\sigma^{2}/n)} - \frac{(\mu - \mu_{0})^{2}}{2\sigma_{0}^{2}}\right)$$ (A1)

To find the posterior distribution form, we expand the quadratic in $\mu$ and complete the square in Equation (A1) to expose the Normal kernel. This allows us to identify the posterior distribution that “learns” from the observed data $\bar{y}$ by combining it with the prior belief $\mu_{0}$, weighted by their respective precisions. Thus, the Normal–Normal conjugate update for the posterior variance and posterior mean is given, respectively, in Equations (A2) and (A3):

$$\sigma_{\text{post}}^{2} = \left(\frac{n}{\sigma^{2}} + \frac{1}{\sigma_{0}^{2}}\right)^{-1}$$ (A2)

$$\mu_{\text{post}} = \sigma_{\text{post}}^{2} \left(\frac{n \bar{y}}{\sigma^{2}} + \frac{\mu_{0}}{\sigma_{0}^{2}}\right)$$ (A3)
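The completing-the-square step that leads from Equation (A1) to Equations (A2) and (A3) can be written out as follows. This is the standard textbook derivation, shown as a sketch; the constants $C$ and $C'$ collect every term that does not depend on $\mu$ and are symbols introduced only for this illustration.

```latex
\begin{align*}
-\frac{(\bar{y}-\mu)^{2}}{2(\sigma^{2}/n)} - \frac{(\mu-\mu_{0})^{2}}{2\sigma_{0}^{2}}
&= -\frac{1}{2}\left[\left(\frac{n}{\sigma^{2}}+\frac{1}{\sigma_{0}^{2}}\right)\mu^{2}
   - 2\left(\frac{n\bar{y}}{\sigma^{2}}+\frac{\mu_{0}}{\sigma_{0}^{2}}\right)\mu\right] + C \\
&= -\frac{1}{2\sigma_{\text{post}}^{2}}\left[\mu^{2} - 2\mu_{\text{post}}\,\mu\right] + C \\
&= -\frac{(\mu-\mu_{\text{post}})^{2}}{2\sigma_{\text{post}}^{2}} + C'
\end{align*}
```

The second line uses the identities $1/\sigma_{\text{post}}^{2} = n/\sigma^{2} + 1/\sigma_{0}^{2}$ and $\mu_{\text{post}}/\sigma_{\text{post}}^{2} = n\bar{y}/\sigma^{2} + \mu_{0}/\sigma_{0}^{2}$, so the final line is the kernel of a Normal density with mean $\mu_{\text{post}}$ and variance $\sigma_{\text{post}}^{2}$.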

Appendix A.2. Hierarchical Pooling Methodology

Let $n_{p}$ denote the number of samples for participant p and $n_{g}$ the number of samples for condition group g, with $\bar{y}_{p}$ and $\bar{y}_{g}$ being their respective means. Each group-level mean is assigned a Normal prior and updated to a Normal posterior $\mu_{g,\text{post}}$, as described in the previous section. These group-level estimates are then used to inform individual participant-level posteriors, enabling the model to borrow strength across conditions and individuals. The explicit partial pooling equations can be found in Table A1.

A prior variance regularization parameter $\tau$ governs the relative influence of the prior group mean versus the observed data. Larger values of $\tau$ yield more conservative posteriors (i.e., less sensitive to new data), whereas smaller values allow the model to adapt more readily to incoming observations. As data accumulates, either within a participant (increasing $n_{p}$) or across participants (increasing $n_{g}$), the influence of $\tau$ diminishes, allowing the posterior to become increasingly data-driven. Additionally, a hyperparameter $\delta$ is introduced to capture between-group variability, facilitating flexible modeling of condition-level effects.

Table A1. Formulas used for Group Level and Individual Level Posterior Estimates.

Parameter	Group Level Posterior Estimates	Individual Level Posterior Estimates
Posterior form	$\mu_{g} \mid \bar{y}_{p} \sim N(\mu_{g,\text{post}}, \sigma_{g,\text{post}}^{2})$	$\mu_{p} \mid \mu_{g} \sim N(\mu_{p,\text{post}}, \sigma_{p,\text{post}}^{2})$
Mean	$\mu_{g,\text{post}} = \frac{\frac{\mu_{0}}{\sigma_{0}^{2}} + \frac{n_{g} \bar{y}_{g}}{\sigma^{2}}}{\frac{1}{\sigma_{0}^{2}} + \frac{n_{g}}{\sigma^{2}}}$	$\mu_{p,\text{post}} = \frac{\tau \mu_{g} + \frac{n_{p} \bar{y}_{p}}{\delta^{2}}}{\tau + \frac{n_{p}}{\delta^{2}}}$
Variance	$\sigma_{g,\text{post}}^{2} = \frac{1}{\frac{1}{\sigma_{0}^{2}} + \frac{n_{g}}{\sigma^{2}}}$	$\sigma_{p,\text{post}}^{2} = \frac{1}{\tau + \frac{n_{p}}{\delta^{2}}}$
Hyperparameter for between-group variability		$\delta = \sqrt{\frac{\sum_{i=1}^{n_{p}} \left(\frac{\sum_{j=1}^{n_{g}} (\mu_{j} - \mu_{i})}{n_{g}}\right)^{2}}{n_{p}}}$
Hyperparameter for within-group variability		$\tau = \frac{\sum_{i=1}^{n_{p}} \frac{\sum_{j=1}^{n_{g}} \sigma_{i,j}}{n_{g}}}{n_{p}}$

Notation key: p participant, g condition, t time index; $\mu_{0}, \sigma_{0}^{2}$ prior mean/variance; $\sigma_{p}^{2}$ within-person variance; $\sigma_{g}^{2}$ between-person variance.
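The two-stage update in Table A1 can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: τ and δ are passed in as fixed numbers rather than computed from the hyperparameter formulas, all numerical values are hypothetical, and the function names are introduced only for this sketch. It shows how the participant-level posterior mean moves from the group mean toward the participant's own sample mean as $n_{p}$ grows.

```python
def group_posterior(mu0, sigma0_sq, ybar_g, n_g, sigma_sq):
    """Group-level Normal posterior (mean, variance), following Table A1."""
    precision = 1.0 / sigma0_sq + n_g / sigma_sq
    mean = (mu0 / sigma0_sq + n_g * ybar_g / sigma_sq) / precision
    return mean, 1.0 / precision


def individual_posterior(mu_g, ybar_p, n_p, tau, delta):
    """Participant-level Normal posterior (mean, variance), following Table A1."""
    precision = tau + n_p / delta**2
    mean = (tau * mu_g + n_p * ybar_p / delta**2) / precision
    return mean, 1.0 / precision


# Hypothetical inputs for illustration only (not from the study).
mu_g, var_g = group_posterior(mu0=85.0, sigma0_sq=100.0,
                              ybar_g=92.0, n_g=5000, sigma_sq=60.0)

ybar_p = 81.0          # participant's own sample mean
tau, delta = 0.5, 8.0  # assumed fixed here instead of estimated

results = []
for n_p in (1, 50, 1000):
    mean_p, var_p = individual_posterior(mu_g, ybar_p, n_p, tau, delta)
    results.append((n_p, mean_p, var_p))
    print(f"n_p={n_p:5d}  posterior mean={mean_p:6.2f}  posterior var={var_p:.4f}")
```

With few samples the participant estimate stays near the group mean; with many samples the term $n_{p}/\delta^{2}$ dominates τ and the estimate approaches the participant's own mean, which is the behavior described in the text.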

References

1. Parnell, K.J.; Banks, V.A.; Allison, C.K.; Plant, K.L.; Beecroft, P.; Stanton, N.A. Designing flight deck applications: Combining insight from end-users and ergonomists. Appl. Ergon. 2021, 95, 103450.
2. Kahneman, D. Attention and Effort; Prentice-Hall: Englewood Cliffs, NJ, USA, 1973.
3. Hockey, G.R.J. Compensatory control in the regulation of human performance under stress and high workload: A cognitive-energetical framework. Biol. Psychol. 1997, 45, 73–93.
4. Szalma, J.L.; Claypoole, V.L. Vigilance and workload in automated systems: Patterns of association, dissociation, and insensitivity. In Human Performance in Automated and Autonomous Systems; CRC Press: Boca Raton, FL, USA, 2019; pp. 41–65.
5. Bainbridge, L. Ironies of automation. Automatica 1983, 19, 775–779.
6. Hancke, T. Ironies of Automation 4.0. IFAC-PapersOnLine 2020, 53, 17463–17468.
7. Parasuraman, R.; Sheridan, T.B.; Wickens, C.D. A model for types and levels of human interaction with automation. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2000, 30, 286–297.
8. Svoboda, A.; Boril, J.; Bauer, M.; Costa, P.C.G.; Blasch, E. Information overload in tactical aircraft. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC); IEEE: San Diego, CA, USA, 2019; pp. 1–5.
9. Edwards, T.; Homola, J.; Mercer, J.; Claudatos, L. Multifactor interactions and the air traffic controller: The interaction of situation awareness and workload in association with automation. IFAC-PapersOnLine 2016, 49, 597–602.
10. Endsley, M.R.; Kiris, E.O. The out-of-the-loop performance problem and level of control in automation. Hum. Factors 1995, 37, 381–394.
11. Naranji, E.; Sarkani, S.; Mazzuchi, T. Reducing human/pilot errors in aviation using augmented cognition and automation systems in aircraft cockpit. AIS Trans. Hum.-Comput. Interact. 2015, 7, 71–96.
12. Chen, J.Y.C.; Barnes, M.J.; Harper-Sciarini, M. Supervisory control of multiple robots: Human-performance issues and user-interface design. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2011, 41, 435–454.
13. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology: Human Mental Workload; Hancock, P.A., Meshkati, N., Eds.; North-Holland: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183.
14. Estes, S. The workload curve: Subjective mental workload. Hum. Factors 2015, 57, 1174–1187.
15. Dehais, F.; Lafont, A.; Roy, R.; Fairclough, S. A neuroergonomics approach to mental workload, engagement and human performance. Front. Neurosci. 2020, 14, 268.
16. Orphanidou, C. A review of big data applications of physiological signal data. Biophys. Rev. 2019, 11, 83–87.
17. Zhang, T.; Yang, J.; Liang, N.; Pitts, B.J.; Prakah-Asante, K.; Curry, R.; Duerstock, B.; Wachs, J.P.; Yu, D. Physiological measurements of situation awareness: A systematic review. Hum. Factors 2023, 65, 737–758.
18. Alaimo, A.; Esposito, A.; Orlando, C.; Simoncini, A. Aircraft pilots workload analysis: Heart rate variability objective measures and NASA-Task Load Index subjective evaluation. Aerospace 2020, 7, 137.
19. Mansikka, H.; Virtanen, K.; Harris, D. Comparison of NASA-TLX scale, modified Cooper-Harper scale and mean inter-beat interval as measures of pilot mental workload during simulated flight tasks. Ergonomics 2019, 62, 246–254.
20. Adams, J.A.; Webber, C.E. Monte Carlo model of tracking behavior. Hum. Factors 1963, 5, 81–102.
21. Johannsen, G.; Rouse, W.B. Mathematical concepts for modeling human behavior in complex man–machine systems. Hum. Factors 1979, 21, 733–747.
22. Jurewicz, K.A.; Neyens, D.M. Bayesian approach to multimodal data in human factors engineering. In Multimodal and Tensor Data Analytics for Industrial Systems Improvement; Gaw, N., Pardalos, P.M., Gahrooei, M.R., Eds.; Springer: Cham, Switzerland, 2024; pp. 357–371.
23. Hoff, P.D. A First Course in Bayesian Statistical Methods; Springer: New York, NY, USA, 2009.
24. Cowles, M.K. Applied Bayesian Statistics: With R and OpenBUGS Examples; Springer: New York, NY, USA, 2013.
25. Alambeigi, H.; McDonald, A.D. A Bayesian regression analysis of the effects of alert presence and scenario criticality on automated vehicle takeover performance. Hum. Factors 2023, 65, 288–305.
26. Dinparast Djadid, A.; Lee, J.D.; Domeyer, J.; Schwarz, C.; Brown, T.L.; Gunaratne, P. Designing for the extremes: Modeling drivers’ response time to take back control from automation using Bayesian quantile regression. Hum. Factors 2021, 63, 519–530.
27. Wei, R.; McDonald, A.D.; Mehta, R.K.; Garcia, A. Active inference models of AV takeovers: Relating model parameters to trust, situation awareness, and fatigue. Hum. Factors 2024, 66, 2889–2903.
28. Biondi, F.N.; McDonnell, A.S.; Mahmoodzadeh, M.; Jajo, N.; Balasingam, B.; Strayer, D.L. Vigilance decrement during on-road partially automated driving across four systems. Hum. Factors 2024, 66, 2179–2190.
29. Neyens, D.M.; Boyle, L.N.; Schultheis, M.T. The effects of driver distraction for individuals with traumatic brain injuries. Hum. Factors 2015, 57, 1472–1488.
30. Boskemper, M.M.; Bartlett, M.L.; McCarley, J.S. Measuring the efficiency of automation-aided performance in a simulated baggage screening task. Hum. Factors 2022, 64, 945–961.
31. Driggs, J.; Vangsness, L. Judgments of difficulty (JODs) while observing an automated system support the media equation and unique agent hypotheses. Hum. Factors 2025, 67, 347–366.
32. Huang, J.; Choo, S.; Pugh, Z.H.; Nam, C.S. Evaluating effective connectivity of trust in human–automation interaction: A dynamic causal modeling (DCM) study. Hum. Factors 2022, 64, 1051–1069.
33. Schmid, D.; Stanton, N.A. Exploring Bayesian analyses of a small-sample-size factorial design in human systems integration: The effects of pilot incapacitation. Hum.-Intell. Syst. Integr. 2019, 1, 71–88.
34. Zhang, X.; Mahadevan, S. Bayesian neural networks for flight trajectory prediction and safety assessment. Decis. Support Syst. 2020, 131, 113246.
35. Bartulović, D.; Steiner, S.; Fakleš, D.; Mavrin Jeličić, M. Correlations among fatigue indicators, subjective perception of fatigue, and workload settings in flight operations. Aerospace 2023, 10, 856.
36. Lonca, Z.; Rzucidło, P. Investigation of the impact of an undetected instrument landing system failure on crew situational awareness. Aerospace 2025, 12, 845.
37. Bartulović, D.; Steiner, S.; Fakleš, D.; Mavrin Jeličić, M. Simulating flight crew workload settings to mitigate fatigue risk in flight operations. Aerospace 2023, 10, 904.
38. Stanton, N.A.; Li, W.C.; Harris, D. Editorial: Ergonomics and human factors in aviation. Ergonomics 2019, 62, 131–137.
39. Alambeigi, H.; McDonald, A.D.; Manser, M.; Shipp, E.; Lenneman, J.; Pulver, E.M.; Christensen, S. Predicting driver errors during automated vehicle takeovers. Transp. Res. Rec. 2023, 2677, 410–420.
40. Berka, C.; Levendowski, D.J.; Cvetinovic, M.M.; Petrovic, M.M.; Davis, G.; Lumicao, M.N.; Zivkovic Vladimir, T.; Popovic Miodrag, V.; Olmstead, R. Real-time analysis of EEG indexes of alertness, cognition, and memory acquired with a wireless EEG headset. Int. J. Hum.–Comput. Interact. 2004, 17, 151–170.
41. Berka, C.; Johnson, R.; Whitmoyer, M.; Behneman, A.; Popovic, D.; Davis, G. Biomarkers for effects of fatigue and stress on performance: EEG, P300, and heart rate variability. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2008, 52, 192–196.
42. Johnson, R.R.; Popovic, D.P.; Olmstead, R.E.; Stikic, M.; Levendowski, D.J.; Berka, C. Drowsiness/alertness algorithm development and validation using synchronized EEG and cognitive performance to individualize a generalized model. Biol. Psychol. 2011, 87, 241–250.
43. Akintola, A.A.; van de Pol, V.; Bimmel, D.; Maan, A.C.; van Heemst, D. Comparative analysis of the Equivital EQ02 Lifemonitor with Holter ambulatory ECG device for continuous measurement of ECG, heart rate, and heart rate variability: A validation study for precision and accuracy. Front. Physiol. 2016, 7, 391.
44. Liu, Y.; Zhu, S.H.; Wang, G.H.; Ye, F.; Li, P.Z. Validity and reliability of multiparameter physiological measurements recorded by the Equivital Lifemonitor during activities of various intensities. J. Occup. Environ. Hyg. 2013, 10, 78–85.
45. Celka, P.; Vesin, J.M.; Vetter, R.; Grueter, R.; Thonet, G.; Pruvot, E. Parsimonious modeling of biomedical signals and systems: Applications to the cardiovascular system. In Nonlinear Biomedical Signal Processing; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2000; pp. 92–132.
46. Dudley, R.M. Central limit theorems for empirical measures. Ann. Probab. 1978, 6, 899–929.
47. Glickman, M.E.; van Dyk, D.A. Basic Bayesian methods. In Topics in Biostatistics; Ambrosius, W.T., Ed.; Humana Press: Totowa, NJ, USA, 2007; pp. 319–338.
48. Bernardo, J.M.; Smith, A.F.M. Bayesian Theory; John Wiley & Sons: Chichester, UK, 2009.
49. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Chapman & Hall/CRC: New York, NY, USA, 2013.
50. Lin, L.; Hedayat, A.S.; Sinha, B.; Yang, M. Statistical methods in assessing agreement: Models, issues, and tools. J. Am. Stat. Assoc. 2002, 97, 257–270.
51. Zhang, B.; Ren, H.; Huang, G.; Cheng, Y.; Hu, C. Predicting blood pressure from physiological index data using the SVR algorithm. BMC Bioinform. 2019, 20, 109.
52. Stevenson, M.; Sergeant, E. epiR: Tools for the Analysis of Epidemiological Data, R package version 2.0.83; The R Project for Statistical Computing: Warsaw, Poland, 2025; Available online: https://CRAN.R-project.org/package=epiR (accessed on 9 March 2025).
53. Quer, G.; Gouda, P.; Galarnyk, M.; Topol, E.J.; Steinhubl, S.R. Inter- and intraindividual variability in daily resting heart rate and its associations with age, sex, sleep, BMI, and time of year: Retrospective, longitudinal cohort study of 92,457 adults. PLoS ONE 2020, 15, e0227709.
54. Borghini, G.; Vecchiato, G.; Toppi, J.; Astolfi, L.; Maglione, A.; Isabella, R.; Caltagirone, C.; Kong, W.; Wei, D.; Zhou, Z.; et al. Assessment of mental fatigue during car driving by using high resolution EEG activity and neurophysiologic indices. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; IEEE: San Diego, CA, USA, 2012; pp. 6442–6445.
55. Gevins, A.; Smith, M.E. Neurophysiological measures of cognitive workload during human–computer interaction. Theor. Issues Ergon. Sci. 2003, 4, 113–131.
56. Hopstaken, J.F.; van der Linden, D.; Bakker, A.B.; Kompier, M.A.J. The window of my eyes: Task disengagement and mental fatigue covary with pupil dynamics. Biol. Psychol. 2015, 110, 100–106.
57. Lee, Y.H.; Liu, B.S. Inflight workload assessment: Comparison of subjective and physiological measurements. Aviat. Space Environ. Med. 2003, 74, 1078–1084.
58. Harrivel, A.R.; Stephens, C.L.; Milletich, R.J.; Heinich, C.M.; Last, M.C.; Napoli, N.J.; Abraham, N.; Prinzel, L.J.; Motter, M.A.; Pope, A.T. Prediction of cognitive states during flight simulation using multimodal psychophysiological sensing. In AIAA Information Systems—AIAA Infotech @ Aerospace; American Institute of Aeronautics and Astronautics: Grapevine, TX, USA, 2017.
59. Zhou, T.; Cha, J.S.; Gonzalez, G.; Wachs, J.P.; Sundaram, C.P.; Yu, D. Multimodal physiological signals for workload prediction in robot-assisted surgery. J. Hum.-Robot Interact. 2020, 9, 1–26.
60. Verma, G.K.; Tiwary, U.S. Multimodal fusion framework: A multiresolution approach for emotion classification and recognition from physiological signals. NeuroImage 2014, 102, 162–172.
61. Wilson, G.F.; Russell, C.A. Real-time assessment of mental workload using psychophysiological measures and artificial neural networks. Hum. Factors 2003, 45, 635–643.

Figure 1. (a) Experimental Apparatus and (b) Experimental Conditions.

Figure 2. Participant flight path.

Figure 3. Posterior credible intervals for HR (Top), RR (Middle), and WL (Bottom).

Figure 4. Stacked Density Plots with Posterior Estimates for HR. (a) Participant 1, (b) Participant 2, (c) Participant 3, (d) Participant 4, (e) Participant 5. Note: Heart Rate is measured in Beats per Minute (bpm). Solid lines represent estimated means and dashed lines represent estimated standard deviations.

Figure 5. Stacked Density Plots with Posterior Estimates for RR. (a) Participant 1, (b) Participant 2, (c) Participant 3, (d) Participant 4, (e) Participant 5. Note: Respiration Rate is measured in Breaths per Minute (b/m). Solid lines represent estimated means and dashed lines represent estimated standard deviations.

Figure 6. Stacked Density Plots with Posterior Estimates for WL. (a) Participant 1, (b) Participant 2, (c) Participant 3, (d) Participant 4, (e) Participant 5. Note: Workload Estimates are measured in percentages. Solid lines represent estimated means and dashed lines represent estimated standard deviations.

Table 1. Bayesian Modeling Group Level Results.

Automation Status	Workload	n	HR Posterior Mean	HR Posterior SD	RR Posterior Mean	RR Posterior SD	WL Posterior Mean	WL Posterior SD
Off	Low	4776	88.3	0.112	15.0	0.040	0.544	0.002
Off	High	4948	89.8	0.152	14.9	0.055	0.579	0.003
On	Low	4819	89.6	0.112	12.3	0.040	0.560	0.002
On	High	5027	92.1	0.109	12.8	0.039	0.588	0.002

Table 2. Summary of Bayesian posterior means and standard deviations for HR, RR, and WL participant-level results.

Participant	Automation Status	N-Back Level	n	HR Post Mean	HR Post SD	RR Post Mean	RR Post SD	WL Post Mean	WL Post SD
1	Off	Low	623	76.0	3.10	13.4	1.89	0.519	0.057
	Off	High	714	73.6	3.81	14.0	1.36	0.617	0.041
	On	Low	709	74.2	2.91	12.5	1.45	0.545	0.044
	On	High	756	77.5	2.81	13.8	1.49	0.561	0.045
2	Off	Low	1209	99.2	2.23	18.2	1.59	0.511	0.048
	Off	High	1433	118	3.11	18.1	2.32	0.532	0.070
	On	Low	1124	102	2.31	17.3	1.89	0.554	0.057
	On	High	1072	99.6	2.36	18.4	2.11	0.527	0.064
3	Off	Low	1066	102	2.37	12.1	1.94	0.524	0.059
	Off	High	942	92.4	3.47	14.8	2.21	0.572	0.067
	On	Low	1292	106	2.15	6.17	1.77	0.559	0.053
	On	High	1252	111	2.19	6.84	1.41	0.599	0.043
4	Off	Low	998	77.9	2.45	15.5	1.31	0.467	0.040
	Off	High	1007	80.3	3.18	14.8	1.52	0.474	0.046
	On	Low	958	77.9	2.50	14.5	1.74	0.466	0.053
	On	High	967	80.6	2.49	13.8	1.72	0.478	0.052
5	Off	Low	881	77.1	2.61	14.8	1.44	0.720	0.044
	Off	High	852	75.5	3.62	11.6	1.33	0.751	0.040
	On	Low	736	70.7	2.85	12.3	1.52	0.704	0.046
	On	High	980	81.6	2.47	12.5	1.51	0.770	0.046

Table 3. Bayesian Model Diagnostics.

	Participant	MAPE	CCC	CPI
HR	1	8.24	0.766	0.660
	2	8.15	0.897	0.853
	3	9.11	0.820	0.733
	4	9.25	0.904	0.806
	5	10.1	0.801	0.710
RR	1	28.1	0.354	0.707
	2	18.0	0.681	0.533
	3	10.9	0.181	0.853
	4	50.8	0.196	0.763
	5	28.4	0.651	0.363
WL	1	37.4	0.648	0.807
	2	24.5	0.610	0.323
	3	33.6	0.817	0.663
	4	37.7	0.853	0.567
	5	28.4	0.676	0.747
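Two of the diagnostics in Table 3 have widely used definitions that can be sketched directly: the mean absolute percentage error (MAPE) and Lin's concordance correlation coefficient (CCC). The code below is a minimal sketch that assumes the table uses these standard forms; the exact computation behind Table 3, and the definition of CPI, are not given in this section, so CPI is not attempted. The observed and predicted values are hypothetical.

```python
from statistics import fmean


def mape(observed, predicted):
    """Mean absolute percentage error in percent (observed values must be nonzero)."""
    return 100.0 * fmean([abs((o - p) / o) for o, p in zip(observed, predicted)])


def ccc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical heart rate values (bpm) for illustration only.
observed = [70.0, 75.0, 80.0, 85.0]
predicted = [72.0, 74.0, 83.0, 84.0]

print(f"MAPE = {mape(observed, predicted):.2f}%")
print(f"CCC  = {ccc(observed, predicted):.3f}")
```

Unlike the Pearson correlation, the CCC penalizes both scale and location shifts between predictions and observations, so it equals 1 only when the two series agree exactly.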

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

