Review

Quantifying Pilot Performance and Mental Workload in Modern Aviation Systems: A Scoping Literature Review

1 School of Industrial Engineering and Management, 329 Engineering North, College of Engineering, Architecture, and Technology, Oklahoma State University, Stillwater, OK 74078, USA
2 School of Mechanical and Aerospace Engineering, 300 Engineering South, College of Engineering, Architecture, and Technology, Oklahoma State University, Stillwater, OK 74078, USA
* Author to whom correspondence should be addressed.
Aerospace 2025, 12(7), 626; https://doi.org/10.3390/aerospace12070626
Submission received: 6 June 2025 / Revised: 9 July 2025 / Accepted: 10 July 2025 / Published: 12 July 2025
(This article belongs to the Section Air Traffic and Transportation)

Abstract

Flight deck automation changes the nature of traditional piloting tasks, ultimately changing the cognitive requirements of the pilot. It is unclear how pilot performance should be measured as automation increases. The objective of this work is to understand the variability in experimental methodology regarding how pilot performance is measured since the introduction of flight deck automation. There were 90 articles included in this scoping literature review. Less than half of the articles investigated pilot performance (~40%), about half of the articles investigated mental workload (~45%), and almost 70% of the articles collected psychophysiological data; however, only 20% of the articles investigated human–automation interaction despite automation increasing in the flight deck. Designing resilient systems that support the needs of the pilot requires consideration of human–system dynamics. As aircraft systems become more autonomous, performance metrics are increasingly derived from the human operator, reflecting a shift towards human-centered evaluation. Thus, it becomes more important to understand and model the relationship between performance, mental workload, and psychophysiological data when humans work with automation.

1. Introduction

Just 50 years ago, there was a five-person crew present in the cockpit of every civil airliner: two pilots, a flight engineer, a navigator, and a radio operator [1]. Tasks were divided among several positions in the cockpit, with all crew simultaneously working to ensure safe flights [2]. As a result of technological innovations, the radio operator and navigator positions became less demanding, eventually leading to the elimination of dedicated crew positions to perform these functions. Continuing the crew reduction trend, the 1980s saw the adoption of two-crew cockpits without a flight engineer on the flight deck, requiring only the Captain and First Officer [2,3]. These reduced crew operations are compelling due to pilot shortages and economic savings, fostering an environment where exploring advancements in flight deck technology and automation is of utmost importance [4].
Manual control of an aircraft implies lateral-directional and longitudinal control through fine-motor inputs by the pilot with reference to raw data without cues from a flight director, or control reference following via an autopilot, auto-thrust, or other flight management systems [5]. To automate a task is to have a computer carry out certain functions that the human operator would normally perform [6]. Automation can differ in type and complexity from simply organizing the information to integrating information sources or to suggesting and even carrying out decision options. Automation can be present in the cockpit in multiple ways. Auto-flight systems are used to translate aircraft horizontal and vertical path guidance into a display interface for the pilot, and autopilot is a component that replaces the pilot’s manual control input to follow automated guidance [7]. There are also implementations of flight deck technology between automated and manual modes whereby a task is simplified by automation and the pilot operates in an outer loop while technology silently closes complex inner loops of the system. For example, automation has been implemented in 5th-generation aircraft to reduce information overload through information fusion and automated sensor management, which allow the pilot to focus on tactical decision-making [8,9,10].
With the evolution of mode technology, automation pushes pilots into a supervisory role rather than detaching them completely from control. The training mantra of aviate, navigate, and communicate has been extended to add manage systems as well [11]. Enhanced and increasingly precise information can be delivered to the pilot in a matter of milliseconds [12]. The increased complexity of technology with more pilot authority is introduced with the intent of increasing airspace capacity [13]. For example, free flight (or user-preferred routing) is supported by the inclusion of the Cockpit Display of Traffic Information (CDTI) in the cockpit that enhances the traditional Traffic Alert and Collision Avoidance System (TCAS) technology by providing pilots with an accurate picture of traffic [13]. In even newer highly automated systems, the delegation of flight roles and tasks may be more of a collaborative framework than the traditional manual control modes of the past [14].
By leveraging automation and remote assistance to maintain operational safety and efficiency, flight configurations are possible that involve fewer onboard crew members [15]—referred to as Reduced-Crewing Operations (RCO). A key example is Single-Pilot Operations (SPO), which reduce the traditional two-pilot configuration to one pilot for commercial aircraft [16]. This shift towards greater autonomy and distributed control mirrors developments in Unmanned Aircraft Systems (UAS), which operate without onboard pilots and increasingly rely on similar human–autonomy teaming principles [17]. RCO and SPO concepts, and their flight information structure with automation at its core, inevitably increase complexity and safety risks [18]. However, the historical reduction in flight deck crew from five to two in commercial aviation, combined with the rise in overall air traffic density, has coincided with a decrease in transport-category aircraft accident rates [19]. In part, the reduction in accident rates is attributed to modern aircraft worthiness standards, which require designers to continually improve the automation and reliability of aircraft systems. This trend indicates that the integration of automation has enhanced aviation safety, contrary to concerns about its potential negative impact [20].
The probability of aircraft equipment errors has dropped by about 70% over the past century; however, in the past 20 years, the proportion of total flight accidents due to human error factors is as high as 80% [21]. It is essential for flight safety that flight crew effectively divide their attention between flying tasks and system management in all flight phases to avoid accidents that may be attributed to human error. In situations where automation requires human intervention or decision-making, such as responding to alarms and assessing the urgency of interconnected factors in real time, it is critical for pilots to keep a high level of situation awareness (SA) to understand how their decisions can affect the ongoing safety of the flight [22]. SA relates to the perception of elements in an environment, comprehension of their meaning, and projection of their future status and is a critical cognitive construct whose breakdown is a leading contributor to human factors errors and compromised flight safety [23]. This is particularly important in high-stakes scenarios like tactical decision making and mission command, where quick and accurate judgment is needed to ensure safe outcomes [24].
Bainbridge (1983) identifies the ironies of automation that arise from humans not being suited for passive monitoring tasks. The irony is that by automating tasks, new difficulties are introduced that require even more technical ingenuity to resolve [25]. Baxter et al. (2012) revisit these ironies after 30 years of technical innovation and find that, in the modern glass cockpit era, pilots’ cognitive resources and attention have been reallocated to supervisory tasks rather than their role in the cockpit being simplified. The more we depend on and delegate to technology, the more research we need to make resilient systems that act as a last line of defense when failures inevitably occur [11]. Hancke (2020) also revisits Bainbridge’s seminal paper, finding that automation is producing increasingly complex systems in which the consequences of potential failures are amplified. A human-centered approach to human–automation interaction may help overcome these problems [26].
The automated flight deck has been associated with benefits in operational efficiency and safety, but it has also created “automation-surprise” accidents [27]. Introducing automation to a previously manual system leads to changes in cognitive workload of the operator, with the intent to reduce operator workload [28,29]. Cognitive workload refers to the mental effort required to perform a task and has long been associated with Human Factors research and safety-critical performance [30,31,32,33]. The NASA Task Load Index (NASA-TLX) is a widely used subjective assessment tool aimed at measuring an operator’s perceived workload across multiple dimensions [34]. It has been shown that performance can degrade when task demands exceed the brain’s capacity to process information; however, cockpit automation has sometimes reduced mental workload in phases of flight when workload was already low or increased workload when it was already high [33,35,36]. The perceived unpredictability of the machine’s behavior due to inconsistencies between the machine function and the pilot’s mental model can result in decreased safety [37].
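As an illustration of how such a multidimensional rating is typically scored, the following minimal Python sketch computes a raw (unweighted) and a weighted NASA-TLX score from the six subscale ratings; the ratings and pairwise-comparison weights shown are invented for demonstration only.

```python
# Minimal sketch of NASA-TLX scoring (raw and weighted).
# The subscale ratings (0-100) and pairwise-comparison weights are illustrative only.

ratings = {                  # 0 = very low, 100 = very high
    "Mental Demand": 70,
    "Physical Demand": 20,
    "Temporal Demand": 55,
    "Performance": 30,       # higher = poorer perceived performance
    "Effort": 65,
    "Frustration": 40,
}

# Raw TLX: unweighted mean of the six subscale ratings.
raw_tlx = sum(ratings.values()) / len(ratings)

# Weighted TLX: each subscale is weighted by how often it was chosen
# across the 15 pairwise comparisons (weights sum to 15).
weights = {
    "Mental Demand": 5, "Physical Demand": 1, "Temporal Demand": 3,
    "Performance": 2, "Effort": 3, "Frustration": 1,
}
assert sum(weights.values()) == 15

weighted_tlx = sum(ratings[k] * weights[k] for k in ratings) / 15
print(f"Raw TLX: {raw_tlx:.1f}, Weighted TLX: {weighted_tlx:.1f}")
```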
The relationship between mental workload and SA is also well documented in the literature, with empirical studies finding a significant decrease in SA as automation takes over responsibilities previously handled by human operators [38,39,40]. A loss of SA can be caused by complacency and elimination of generating alternatives, which results in automation failures when the human who lacks higher awareness levels must intervene [36,41,42]. For example, loss of pilot awareness about the commanded autopilot descent modes is a reported cause in several Airbus A320 accidents [43]. The occurrences of accidents and incidents due to insufficient SA and mode confusion expose the need for human-centered design solutions [44]. The development of resilient aerospace systems necessitates consideration of how increased automation influences pilot roles, operational demands, cognitive resources, and decision-making.
Another unintended consequence of automation is that it can impose long periods of inactivity that induce degraded cognitive states of low vigilance and mind wandering [40]. Thus, there is a push to design automation that is adaptive to the individual and the context to keep the operator in an optimal state of vigilance and workload by adjusting the automation dynamically. This requires a continuous measurement of the pilot’s cognitive state, potentially through complex data collection methods such as real-time physiological monitoring. The idea of introducing physiological data into the human–machine interface could allow the system to be aware of operators’ states without eliciting a response [40,45]. Passive brain monitoring techniques have been shown to detect operator cognitive states like workload, fatigue, or engagement [46,47,48]. Previous research has used physiological measurements to indicate the presence of drowsiness and sleepiness in long-haul pilots, and the data highlighted the vigilance decrement with observed periods of mind wandering [49]. Continuous monitoring of the operator’s state may be useful in the development of closed-loop systems that can detect and adapt based on the potential for degraded performance.
As flight deck technology continues to evolve, the role of the pilot will evolve as well. In some cases, this is beneficial to safety and productivity by eliminating monotonous and repetitive tasks for the pilot. The military has identified potential benefits of automation in “Triple D” environments—meaning dull, dirty, or dangerous tasks [50]. In other cases, it pushes the human pilot into a role that can be extremely unsafe if the system is not designed to efficiently collaborate and share information with the operator [39]. In 2010, twenty-three civilians were killed in a UAS accident where a Predator drone failed to pass along critical information to a human operator [51]. Hazardous states of awareness are states or conditions where an operator’s performance is decreased to a potentially hazardous level. These states can be related to physical/physiological (e.g., fatigue) or cognitive/affective (e.g., anger) conditions; these conditions can be intrinsic or extrinsic in the environment [52]. In order to identify whether a pilot is operating in a hazardous state of awareness before an accident occurs, researchers need to understand how to mitigate the cognitive risks associated with passive monitoring roles in automated environments.
Traditionally, when evaluating a pilot’s performance, it is common to measure the product of performance using metrics such as the mean and standard deviation of error from an ideal trajectory. However, significant pilot control input activity may not be reflected in changes to the modern aircraft’s altitude or flight pattern as automation assumes greater control over flight mechanics [53]. These traditional methods of evaluating flight performance may not be sufficient when investigating performance in a flight deck environment with increased automation, or automation that is adaptive to the operator. Consequently, assessing only the outcome of flight is no longer sufficient, and future-ready systems must incorporate performance measurements that reflect the nuances of pilot engagement, workload, and readiness. As automation becomes more capable, system resilience increasingly depends on the pilot’s ability to monitor, adapt, and intervene appropriately.
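As a point of reference, the sketch below (using simulated altitude data, not data from any reviewed study) shows how such product-of-performance metrics are typically computed against an ideal reference trajectory.

```python
import numpy as np

# Illustrative sketch: traditional "product of performance" metrics computed as the
# deviation of a flown profile from an ideal reference (e.g., a glide slope).
# The altitude samples are simulated for demonstration only.

ideal_altitude = np.linspace(3000.0, 1500.0, 50)                        # reference profile (ft)
flown_altitude = ideal_altitude + np.random.normal(0.0, 25.0, size=50)  # simulated tracking error

error = flown_altitude - ideal_altitude
mean_error = error.mean()             # bias above/below the reference path
sd_error = error.std(ddof=1)          # variability of the deviation
rmse = np.sqrt(np.mean(error ** 2))   # overall tracking accuracy

print(f"mean error = {mean_error:.1f} ft, SD = {sd_error:.1f} ft, RMSE = {rmse:.1f} ft")
```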
The objective of this work is to perform a scoping literature review to understand the variability in how pilot performance is measured in increasingly automated flight deck environments. Performance measurement techniques are evaluated to identify ecologically valid and empirically sound metrics in modern flight decks. This scoping literature review seeks to identify gaps in existing research on measuring pilot performance in tasks involving human–automation interaction on the flight deck. A secondary aim of this work is to understand how physiological measurements can contribute to human–automation interaction. Ultimately, the goal of this review is to answer the following question: what research will aid in developing flight deck environments designed to support effective human–automation collaboration given the evolving system demands?

2. Materials and Methods

This work uses the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol for data gathering, analysis, and reporting. PRISMA guidelines were followed to ensure transparency, reproducibility, and thoroughness of reporting in this review [54,55]. Peer-reviewed journal articles published between 1 January 1982 and 10 March 2024 were considered. These dates were chosen because Honeywell first introduced its flight management system (FMS) into service in 1982. The inclusion criteria used to select studies were works that evaluated the interactions of pilots in a real or simulated flight scenario, utilized empirically collected data, and clearly stated the criteria used to evaluate performance. Works studying both fixed- and rotary-wing pilots were accepted, but UAS scenarios were excluded. To ensure the quality and peer-reviewed rigor of the literature included in this review, only journal articles were considered. Non-journal publications such as books, conference posters, dissertations, and technical reports were excluded due to their varied review standards, limited reproducibility, and often restricted methodological transparency.
To obtain the dataset, articles were searched for across multiple databases using specified keywords and the agreed upon inclusion criteria. The electronic databases searched were Science Direct, ProQuest, and IEEE Xplore. These databases were selected for their strong coverage of human factors, aviation psychology, and aerospace systems engineering. While limiting the databases searched may increase the risk of missing relevant studies, these databases were chosen for their balance of disciplinary relevance, indexing reliability, and access to high-impact peer-reviewed journals that play a central role in evaluating pilot performance in modern automated flight decks. The keywords selected helped ensure the articles were focused on pilot interactions and performance evaluation. All keywords were combined to create the following search query: (“Flight Deck” OR “Cockpit” OR “Aviation” OR “Plane”) AND (“Automation” OR “Autonomy” OR “Autopilot”) AND/OR (“Performance” OR “Pilot” OR “Investigate”).
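For transparency, a minimal sketch of how the Boolean query above can be assembled programmatically from the three keyword groups is shown below; the helper function is illustrative, and each database's advanced-search syntax may differ slightly.

```python
# Minimal sketch: assembling the Boolean search string from the keyword groups above.
# The helper is illustrative; individual databases may require adjusted syntax.

context_terms = ["Flight Deck", "Cockpit", "Aviation", "Plane"]
automation_terms = ["Automation", "Autonomy", "Autopilot"]
outcome_terms = ["Performance", "Pilot", "Investigate"]

def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = f"{or_group(context_terms)} AND {or_group(automation_terms)} AND/OR {or_group(outcome_terms)}"
print(query)
```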
After identifying the sample of articles, the screening process was performed by two researchers using Rayyan, a mobile and web-based tool for performing systematic reviews [56]. First, a title review was performed to screen the papers in accordance with the inclusion criteria. Next, two researchers independently reviewed each article’s abstract and reached consensus on inclusions and exclusions. Lastly, a full-text review was conducted to ensure that each included article met the inclusion criteria stated previously; this step involved reviewing the entire body of each article to justify its inclusion in the overall analysis. Data were extracted from each individual study, including the constructs of interest, the experimental manipulations, the data collected, and the methods of evaluation used.
Risk of bias assessment was performed on the selected articles by the first author in accordance with the recommendations of the Cochrane Collaboration [57]. A Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) approach was taken, which provides a structured framework for evaluating potential bias across domains. A second reviewer was involved in the risk of bias assessment through a partial consensus approach in which a subset of papers was assessed and any ambiguities were discussed to mitigate potential interpretation bias. Bias for each selected article was ranked as “low”, “some concerns”, or “high” within each of the following domains: D1: bias arising from the randomization process, D2: bias due to deviations from intended intervention, D3: bias due to missing outcome data, D4: bias in measurement of the outcome, and D5: bias due to selection of reported result. Results from the assessments were visualized using the robvis visualization tool [58]. To maximize the sensitivity of the risk assessment method within this specific sample of articles, the researchers assumed that subjective measures of workload or situational awareness adequately assess the construct of interest, corresponding to a rating of “low” D4 measurement bias.
To extract quantitative data for analysis, the articles were primarily analyzed by isolating the categories of performance measurement techniques (e.g., flight metrics, NASA-TLX) used to quantify pilot performance as well as the focus of the article (e.g., mental workload, fatigue, automation). The articles were then classified into categories based on how performance was measured and what research questions the work aimed to investigate. Each article could be assigned by the researchers to multiple performance measurement and research focus classifications, and descriptive statistics were computed on the resulting data, as sketched below.
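A minimal sketch of this tagging-and-counting step is shown below, using pandas and three invented records tagged with the research focus and performance measure codes defined later in Table A1.

```python
import pandas as pd

# Illustrative sketch of the extraction/classification step: each article carries one or
# more research-focus codes and performance-measure codes (as in Table A1), and
# descriptive statistics are computed over the tags. The three records are invented.

articles = pd.DataFrame({
    "article": ["Study A", "Study B", "Study C"],
    "focus": [["P", "WL"], ["Au"], ["P", "WL", "F"]],
    "measures": [[3, 5], [5, 11], [8, 9]],
})

focus_counts = articles.explode("focus")["focus"].value_counts()
measure_share = articles.explode("measures")["measures"].value_counts() / len(articles) * 100

print(focus_counts)    # number of articles tagged with each research focus
print(measure_share)   # percentage of articles using each measurement technique
```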
The results of the PRISMA approach for searching and selecting articles can be seen in Figure 1. The search resulted in a sample of 5680 articles, 332 of which were duplicates. A total of 1912 works were excluded as non-peer-reviewed journal publications (e.g., conference posters and book chapters), 3067 papers were excluded by title, and 277 by abstract. The researchers adopted a conservative screening approach whereby, if there was any degree of uncertainty about inclusion, the article was kept for full-text review. A total of 21 texts were removed after full-text review, leaving 71 articles in the sample. The references of the 71 included articles served as a fourth database and were additionally screened for inclusion. A further 19 articles were identified through the references of the included articles, and 90 studies were included as the final dataset. All papers are described in Table A1 in Appendix A.
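The reported counts can be reconciled arithmetically as follows.

```python
# Sanity check of the PRISMA flow reported above.

identified = 5680
duplicates = 332
non_journal = 1912
excluded_by_title = 3067
excluded_by_abstract = 277
excluded_full_text = 21
from_reference_lists = 19

after_screening = identified - duplicates - non_journal - excluded_by_title - excluded_by_abstract
included_from_search = after_screening - excluded_full_text
final_sample = included_from_search + from_reference_lists

print(after_screening, included_from_search, final_sample)  # 92, 71, 90
```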

3. Results

A total of 90 articles were selected for analysis, of which 44 were published before 2020 and 46 were published in or after 2020. A summary of performance measurement techniques used in the final dataset is shown in Table 1. A summary of the general interests or manipulations of the experiments in the paper dataset is shown in Table 2. Figure 2 shows the frequency of use of each construct and measurement technique organized by the publication year of the article. The vertical line in Figure 2 signifies the division between construct of interest and measurement technique.
Figure 2 shows that physiological data started to become more commonly measured around 2015, with even more interest in 2019. Workload was a research construct of interest in 2002 but appeared to be studied less frequently in the following ten years. Workload again drew more interest during 2019–2020 but was studied less frequently after 2020. NASA-TLX and flight parameters have exhibited increased use in recent years, as have physiological measurements like eye tracking, EEG, and ECG data. This is likely attributable to increasing interest in attributing flight performance to the pilots themselves rather than to aircraft features such as handling characteristics. Additionally, with increasingly capable processing hardware and advanced modeling techniques, researchers have turned to physiological data as a source of performance measurement. Eye tracking consistently garners more research attention, likely because scan-pattern data are more directly interpretable than EEG/ECG signals, which are more abstract.
Figure 3 shows the results of the bias assessment for the selected articles. The visualization shows each article’s score in all five of the ROBINS-I bias categories and a summary for overall categorical scores. Overall, 17 works (18.89%) had a high risk of bias, 26 (28.89%) had some concerns regarding biases present, and 47 (52.22%) had a low risk of bias. The most frequent source of bias in the reviewed articles was D3 bias due to small sample sizes, and many of the studies reviewed had seven or fewer pilots participating. Combined with the results from the previous figure, future studies will need to address complications utilizing physiological data when small sample sizes are used. Research focused on predictive capabilities of physiological outcomes necessitates more rigorous demonstration of generalizability.

3.1. Quantifying Pilot Performance and Workload

Traditional flight performance metrics (e.g., RMSE of glide slope deviations) or other context-related aviation measures were used to quantify performance in 38.89% (35/90) of the articles sampled. Of those works, 11 (31.43%) also collected NASA-TLX ratings and 8 (22.86%) also collected research-specific subjective assessments. A total of 25.71% (9/35) of the articles that utilized flight performance metrics sought to evaluate human–automation interaction. Furthermore, 42.86% (15/35) investigated workload-related research questions, 14.29% (5/35) evaluated the effects of expertise, 11.43% (4/35) studied attention allocation, and 8.57% (3/35) evaluated the effects of fatigue. Of those works measuring flight performance, 14.29% (5/35) also used secondary task performance measures (e.g., alarm responses). A total of 19 articles (54.29%) collecting flight performance metrics also measured some modality of psychophysiological data.
Of the works sampled, 40 (44.44%) studied an intervention or analysis of the mental workload of pilots in the cockpit. A total of 8 articles (22.86%) collected secondary task performance to quantify workload and 11 articles (31.43%) collected a research specific active assessment. Of those articles studying workload, 18 (40.91%) administered the NASA-TLX subjective workload rating scale to quantify the perceived mental workload of pilots throughout an experiment, 14 (35%) utilized both NASA-TLX and physiological data, and 32 (80%) of the articles used physiological data to passively monitor the workload state of each pilot. Electrocardiogram (ECG) was the most frequently used physiological device for studying workload, followed by electroencephalogram (EEG; 32.5%; 13/40) and eye movement tracking (31.43%; 11/35). Of the works that used psychophysiological data analysis to investigate pilot workload, 8 articles (25%) applied an advanced statistical or neural network approach for analyzing the physiological data.
Of the articles sampled, 70% (63/90) collected psychophysiological data, and of those, only 19 (30.16%) collected more than one physiological variable of interest. Only one work investigated the use of multimodal physiological data (e.g., EDA and ECG) alongside flight performance and subjective assessments for the use of probing adaptive automation during flight [52]. A total of four works collected psychophysiological data during an experiment aimed at investigating characteristics of flight deck automation [23,38,52,53]. A total of 32 (50.79%) of the articles collecting physiological data were studying workload, 13 (20.63%) aimed to understand or model pilot attention, 7 (11.11%) studied the effects of fatigue, 7 (11.11%) studied the effects of expertise, and 1 (1.59%) investigated single-pilot operations.
Of the psychophysiological data collected, eye movement behavior was the most frequent (52.38%; 33/63), followed by brain activity (EEG; 36.51%; 23/63), heart rate activity (ECG; 36.51%; 23/63), electrodermal activity (EDA; 11.11%; 7/63), respiration rate (9.52%; 6/63), and frontal near infrared spectroscopy (fNIRS; 4.76%; 3/63). Two studies collected four physiological variables, eight collected three physiological variables, and nine collected two physiological variables. Eye movements and EEG were frequently collected together (36.84%; 7/19), as were EEG and ECG data (36.84%; 7/19). Of the works that measured ECG, 16 (69.57%) also collected another physiological variable, and 8 (34.78%) of the works that collected EEG analyzed more than one physiological variable. A total of 20.63% (13/63) of the works that collected physiological data used some kind of advanced statistical technique to model or classify the resulting physiological signatures.
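The sketch below illustrates how such modality co-occurrence statistics can be derived from study-level indicator data; the four example rows are invented and do not correspond to specific reviewed studies.

```python
import pandas as pd

# Illustrative sketch: co-occurrence of physiological modalities across studies.
# Rows are invented indicator records (1 = modality collected in that study).

modalities = ["eye", "EEG", "ECG", "EDA", "resp", "fNIRS"]
studies = pd.DataFrame(
    [
        [1, 1, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [1, 0, 0, 0, 0, 0],
    ],
    columns=modalities,
)

co_occurrence = studies.T @ studies                  # pairwise counts (diagonal = per-modality totals)
multimodal_share = (studies.sum(axis=1) > 1).mean()  # proportion of studies collecting >1 modality

print(co_occurrence)
print(f"{multimodal_share:.0%} of these example studies collected more than one modality")
```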

3.2. Evaluating Human–Automation Interactions

In total, 17 articles (18.89%) examined the effects of automation or the results of a pilot interacting with some kind of automation. The majority of these (52.94%; 9/17) used flight performance metrics to evaluate the pilot’s performance. Traditional flight performance measures were most frequently paired with NASA-TLX ratings (44.44%; 4/9) and alarm response metrics (33.33%; 3/9). Overall, seven works (41.18%; 7/17) used research specific active assessments to measure human–automation interactions, seven (41.18%; 7/17) works used either NASA-TLX or a situational awareness scale (e.g., SART), five articles (29.41%; 5/17) used alarm detection based metrics, and five articles (29.41%; 5/17) measured physiological data.
One work examined the effects of pilot expertise and fatigue in a glass cockpit environment [121]. Attention was most frequently studied alongside automation (23.53%; 4/17). Three works studied the interactions between automation and workload on the flight deck [44,122,123]. One work investigated automation in an SPO context [2] and one studied the effects of flight deck automation on decision making [124].
Some of the research questions posed revolved around the characteristics of human–automation interaction on the flight deck. The effects of automation interface and context features on situational awareness were studied, and it was found that the impact of automation on SA is moderated by task features [124]. Error detection in autopilot mode selection [43], human–automation teaming after an anomaly [125], the effects of automated decision aids on performance and workload [122], and automation reliability [126] were all investigated with context-specific performance measures that corresponded to the research question of the article.

4. Discussion

The objective of this work was to perform a scoping literature review of existing research on measuring pilot performance in the context of modern flight technology. The articles in this review utilized a range of performance measurement techniques in addition to conventional flight path deviation metrics, such as secondary task performance or alarm responses, subjective questionnaires, behavioral analysis, and psychophysiological measurements. Overall, the highest risk of bias in the selected articles was risk of bias due to missing outcome data. Studies involving piloting or aviation tasks have notoriously small sample sizes, which can affect the statistical power of results. Additionally, the articles in this sample are extremely skewed towards male participants. There are notable differences in physiology due to gender [127], and future works should prioritize balanced recruitment strategies to counterbalance the effects of potentially moderating factors such as gender.
A total of 35 out of the 90 articles evaluated used flight performance metrics to quantify the performance of the pilot, and only 9 of those articles aimed to investigate the effects of automation. Less than 20% of the articles in this review sought to further understand human–automation interaction despite increasing levels of automation in the modern flight deck environment. A total of 70% of the works reviewed collected some kind of physiological data, and 19 articles collected more than one channel of physiological data. Less than half of the articles sampled investigated mental workload (~45%), and the majority of these (80%) used physiological data to analyze the pilot’s cognitive processes.
Measures of mental workload can be categorized as performance-based, linked to subjective self-assessments, or associated with neurophysiology. Many of the works sampled used simplified levels of mental workload (e.g., low and high) to achieve the experimental manipulations required to draw conclusions about within-subjects factors. However, workload is a complex, multifaceted construct and varies throughout a dynamic task. There is a need to develop more robust and configurable explanatory models of mental workload that can be operationalized in empirical studies. Theoretical perspectives on mental workload should be formalized and synthesized into a working model in future research efforts. Dehais (2020) applies a neuroergonomic approach rooted in biology to understanding mental workload that includes two axes (engagement and arousal), and while the ideal metrics for engagement and arousal are still unknown, this work is a well-formulated and biologically sound step towards using physiological data for empirically quantifying the limitations of human cognition [128]. However, the analysis of physiological data still involves many technical and interpretive difficulties that are yet to be solved.
Of the articles aimed at investigating automation, 70% did not use complex datasets, and only five articles studying automation collected physiological data. Eye movements were used to predict fatigue and expertise levels in a simulated glass cockpit [121], investigate the effects of automation reliability on mind wandering [126], and evaluate monitoring strategies on automated flight decks [27]. Automated features such as an angle of attack display [122] and a tactical mission management system [37] were evaluated in relation to the pilot’s workload; however, in these works, flight performance metrics were used to quantify performance and workload was measured with the NASA-TLX. An aircraft-to-aircraft conflict decision aid was evaluated with handoff performance, secondary task performance, and NASA-TLX [35]. Although the research questions of interest in these works are valuable to the research community, the methods for assessing pilot performance must adapt to systems characterized by higher levels of automation. Flight performance indicators and point estimate workload ratings may not provide the sensitivity needed to reflect the pilot’s real-time condition in advanced cockpit environments.
Dong et al. (2023) used AI-based bidirectional long short-term memory (BiLSTM) with flight operations and behavioral data to model the pilot’s intention during SPO [18]. Similarly, H. Wang et al. (2022) used extended symbol aggregation approximation theory and an intelligent icon method to extract the pilot’s intention from physiological data [129]. Intention is thought to be a necessary part of future human–automation collaboration, where the automated partner is able to determine the implicit intentions of the single pilot through their time-series behavior [18]. Both studies demonstrate that intention can be quantified through a variety of means but neglect intention recognition when multiple crew members are conducting in-flight operations. It is also believed that the intention recognition process directly drives the generation of situational awareness, which can be degraded due to the shift in pilots’ roles towards supervision.
Overall, psychophysiological data were collected by the majority of the studies sampled, at almost 70% of the articles. The results of this literature review show that psychophysiological data are capable of capturing the effects of training [130], flight complexity [131,132], anxiety and attention [133], fatigue [121,134], situational awareness [135,136], time on task effects [137,138], and mental workload [139,140]. However, there is a need to apply more robust modeling and analysis techniques to physiological data. Tracing cognitive processing must be carried out implicitly by modeling the derivatives of physiological features, as demonstrated in a study utilizing EEG in a passive-BCI system to model pilots’ situational assessments [141].
A total of 13 works applied complex data analysis techniques to physiological data collected in a flight environment, and future research should continue to explore the methods available for physiological data analysis. Several machine learning techniques such as a windowed means classifier [141], local and global network [59], support vector machine (SVM) algorithm [60], and multimodal deep learning network [61] were used to classify pilots’ cognitive states based on brain activity. Dehais et al. (2019) [139] found that workload could be classified with frequency features but not event-related potentials (ERPs).
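As a hedged illustration of this kind of classification pipeline (not the implementation of any cited study), the sketch below trains a support vector machine on synthetic physiological features to separate low- from high-workload epochs.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Minimal sketch of an SVM-based cognitive-state classifier.
# Features and labels are synthetic placeholders, not data from any reviewed study.

rng = np.random.default_rng(0)
n_epochs = 200
# Example feature columns: theta power, alpha power, mean heart rate, pupil diameter
X = rng.normal(size=(n_epochs, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n_epochs) > 0).astype(int)  # 0 = low, 1 = high workload

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```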
Temporal fluctuations in the theta frequency band (4–8 Hz) derived from Fourier transformations on raw EEG data were found to be significant predictors of reaction time [62], shown to be closely related to mental workload [63,64] and task difficulty [65], and were used by Gorji et al. (2023) to train a machine learning algorithm to classify the mental workload state of a pilot during real flight [66]. Although frontal theta spectral power density is thought to be significantly related to workload, these works all used different data pre-processing pipelines, EEG headset technologies, and experimental environments. Further research is needed to identify EEG metrics with high internal and external experimental validity as cognitive correlates of workload.
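A minimal, illustrative sketch of deriving theta-band power from a raw EEG channel with a Fourier-based (Welch) estimate is given below; the synthetic signal and the 4–8 Hz band definition stand in for the varied pipelines used in the cited works.

```python
import numpy as np
from scipy.signal import welch

# Illustrative sketch: theta-band (here 4-8 Hz) spectral power from one EEG channel.
# The signal is synthetic; the cited studies used different pre-processing pipelines.

fs = 256                                  # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)              # 10 s of data
eeg = 10e-6 * np.sin(2 * np.pi * 6 * t) + 5e-6 * np.random.randn(t.size)  # 6 Hz rhythm + noise

freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)               # power spectral density (V^2/Hz)
theta_mask = (freqs >= 4) & (freqs <= 8)
theta_power = psd[theta_mask].sum() * (freqs[1] - freqs[0])  # integrated band power (V^2)

print(f"theta band power: {theta_power:.3e} V^2")
```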
As technologies become more advanced, sensitive, and accessible for research, the results of physiological research may become more tangible and applicable to theoretical models of cognitive constructs like mental workload and situational awareness. The ability to measure a human’s cognitive processing and functioning in applied settings is becoming increasingly feasible. Similarly, as technology evolves on the flight deck and traditionally manual tasks are now allocated to automation, there is a large need to investigate the implications of supervisory control on the performance of human operators. Empirical studies have measured the pilot’s return to manual performance after automation fails [1,67,68]. However, in these safety-critical systems, failure is likely rare yet potentially catastrophic. Studying the response to unexpected automation failures is difficult, expensive, and time-consuming as the buildup of expectancies must be mitigated by the rareness of failure events [35]. Understanding human–automation teaming in routine operations is essential for designing resilient, redundant systems that minimize the risk of failure.

5. Conclusions

The progressive “de-crewing” trend over the past half-century in aviation systems has resulted in more automation on the flight deck, which changes the task demands for pilots. The works sampled in this scoping literature review reflect the research community’s shift from defining good performance solely as deviation from the ideal flight path to also including factors from inside the cockpit. Performance is increasingly being measured from the pilot rather than the plane, whether by behavioral, subjective, or physiological assessments.
The results of this scoping literature review highlight the need for future work in the flight deck environment to investigate the combination of mental workload, pilot performance, and human–automation interaction using complex data sets such as psychophysiological data. Future studies should examine how pilot-centered metrics like control behavior and physiological data can enhance performance assessment in defined flight contexts. A key limitation is that much of the research on psychophysiological measurements has been conducted in laboratory environments or low-fidelity task simulations. Yet, the utility and interpretation of these measures may differ substantially when examined under ecologically valid operational conditions.
Many of the difficulties in operationalizing physiological measures lie in the complexity of the datasets and the minimal modeling techniques available for complex physiological data. Physiological data are frequently multivariate, longitudinal, and neither independent nor identically distributed. These features contribute to violations of the assumptions of normality, linearity, and independence underlying traditional regression techniques. The methods with which statisticians and data analysts can model physiological data are limited. Some works have used machine learning approaches, but these works frequently aim to classify the operator’s physiology into simplified contextual categories (e.g., high and low mental workload, flight phase, etc.). Much work is needed to understand how researchers can model the complexity and uncertainty of physiological data in meaningful ways and applications. Continuous tracking of performance and workload enables future studies to more accurately investigate human–automation interaction in contexts where pilots are either fully engaged in demanding tasks or placed in supervisory roles by advanced flight deck technologies.
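One common way to respect the non-independence of repeated physiological measurements is a linear mixed-effects model with a random intercept per pilot; the sketch below uses statsmodels and synthetic data purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative sketch: a random-intercept mixed-effects model for repeated
# physiological measurements nested within pilots. The data are synthetic.

rng = np.random.default_rng(1)
pilots = np.repeat(np.arange(10), 20)                      # 10 pilots x 20 epochs each
workload = rng.uniform(0, 1, size=pilots.size)             # task-demand covariate
pilot_offset = rng.normal(scale=5, size=10)[pilots]        # pilot-specific baseline shift
heart_rate = 70 + 15 * workload + pilot_offset + rng.normal(scale=3, size=pilots.size)

df = pd.DataFrame({"pilot": pilots, "workload": workload, "heart_rate": heart_rate})

# The random intercept per pilot accounts for within-pilot correlation of repeated measures.
model = smf.mixedlm("heart_rate ~ workload", data=df, groups=df["pilot"]).fit()
print(model.summary())
```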
Advanced, reliable automation is reshaping the tasks required of human operators. Whether in commercial aviation, military fighter jets, Air Traffic Control, or UAS, interaction with flight deck automation is widespread and increasingly common. To be prepared to design systems that have features like dynamic task delegation, bump-less hand offs, and emergency event prevention, we must first understand and accurately quantify the cognitive performance of the pilot. The automation or agent must be aware of the pilot or operator’s state to behave reliably in the closed-loop system. Studying failure responses of unexpected events is difficult, expensive, and time-consuming, so we must identify the methods available for use during routine exercises as well. By passively tracing what is perceived and processed, researchers can measure outcome variables without requiring any actions, inputs, or behaviors from pilots that are busy managing multiple tasks.
To ensure resilient and pilot-centered flight decks, automation must evolve dynamically to recognize and respond to real-time cognitive states. Consequently, performance measurement techniques must evolve to meet the demands of an increasingly complex flight deck.

Author Contributions

Conceptualization, A.R.K., K.A.J., B.R. and R.C.P.; methodology, A.R.K. and K.A.J.; validation, A.R.K. and B.R.; formal analysis, A.R.K. and B.R.; investigation, A.R.K. and B.R.; data curation, A.R.K. and B.R.; writing—original draft preparation, A.R.K.; writing—review and editing, B.R., K.A.J. and R.C.P.; visualization, A.R.K. and B.R.; supervision, K.A.J. and R.C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets and scripts used in this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to extend gratitude to the members of the Human-Systems Engineering and Applied Statistics (HSEAS) Lab for their help throughout the literature review process.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ECG: Heart rate/electrocardiogram
EEG: Electroencephalogram
EDA: Electrodermal activity
fNIRS: Frontal near infrared spectroscopy

Appendix A

Table A1. All papers included in the literature review.
# | Authors | Research Focus | Performance Measures
1 | (Massé et al., 2022) [137] | P, WL, F | 1, 8
2 | (Sarter et al., 2007) [27] | P, Au, A | 6, 7, 11
3 | (Alreshidi et al., 2023) [69] | P, A | 2, 8
4 | (Klaproth et al., 2020) [141] | P | 1, 2, 8
5 | (Schulte, 2002) [37] | Au | 3, 4, 5
6 | (Lefrançois et al., 2021) [70] | P | 5, 6
7 | (Maik et al., 2021) [130] | P, WL | 3, 5, 6
8 | (Yang et al., 2023) [21] | P, A | 5, 6, 7
9 | (Brams et al., 2018) [71] | P, E | 1, 6
10 | (Metzger & Parasuraman, 2005) [35] | Au | 1, 3, 5
11 | (Schmid & Stanton, 2019) [72] | Au, WL, S | 2, 3
12 | (Xing et al., 2023) [73] | P, WL, A | 1, 3, 6, 7, 9
13 | (Wickens et al., 2002) [13] | WL | 5, 6
14 | (Gateau et al., 2018) [74] | WL | 2, 5, 10
15 | (Mosier et al., 2013) [124] | Au, A, D | 11
16 | (Binias et al., 2023) [62] | P, A | 1, 8
17 | (Zanoni et al., 2023) [75] | P, WL | 5, 12
18 | (Verdière et al., 2018) [40] | Au | 2, 10
19 | (Gouraud et al., 2018) [126] | Au, A | 1, 4, 6, 11
20 | (Jankovics & Kale, 2019) [76] | P, MP, WL | 5, 6, 7, 9
21 | (Causse et al., 2011) [77] | D | 1, 9
22 | (Ma et al., 2022) [78] | P, A | 6
23 | (Fellah et al., 2016) [79] | C | 5, 12
24 | (Anneke & Nils, 2022) [65] | P, MP, WL, F | 1, 3, 8, 10, 11
25 | (Lassalle et al., 2017) [80] | P, MP, WL | 6, 9, 13, 14
26 | (Mohanavelu et al., 2020) [81] | P, WL | 1, 3, 5, 9
27 | (H. Sun et al., 2023) [82] | E | 5, 7, 11
28 | (Silva et al., 2021) [83] | P, MP, WL | 5, 9, 11, 13
29 | (Takahashi et al., 2022) [84] | Au | 5
30 | (Duchevet et al., 2022) [2] | P, Au, E, S | 5, 6, 7, 11
31 | (W.-C. Li et al., 2020) [44] | P, WL, Au, A | 3, 6, 11
32 | (Wei et al., 2014) [85] | P, WL | 1, 2, 3, 9
33 | (Haarmann et al., 2009) [142] | P, MP, Au | 5, 9, 11, 13, 14
34 | (Ahmadi et al., 2022) [86] | P | 6
35 | (Causse et al., 2024) [87] | P, WL, S | 10
36 | (Lin et al., 2012) [88] | WL, A | 11
37 | (X. Wang et al., 2020) [64] | P, MP, WL | 6, 8, 9
38 | (Lutnyk et al., 2023) [89] | P, MP | 6, 13
39 | (Yiu et al., 2022) [90] | P, WL | 2, 3, 8
40 | (Q. Li et al., 2023) [91] | P | 5, 8
41 | (Mohanavelu et al., 2020) [109] | P, MP, WL | 2, 8, 9
42 | (Chen et al., 2022) [92] | P, MP, WL | 2, 3, 6, 9, 14
43 | (Haslbeck & Zhang, 2017) [93] | P, E | 5, 6
44 | (Johnson & Pritchett, 1995) [43] | Au | 1, 5
45 | (Farjadian et al., 2017) [125] | Au | 1, 5
46 | (Jin et al., 2021) [94] | P, E | 6, 11
47 | (Hernández-Sabaté et al., 2024) [95] | P, WL | 1, 3, 8
48 | (Dong et al., 2023) [18] | S | 2, 7, 11
49 | (Thomas, 2011) [96] | Au | 11
50 | (Lounis et al., 2021) [97] | P, E, A | 5, 6
51 | (Suppiah et al., 2020) [98] | WL | 3
52 | (H. Wang et al., 2022) [129] | P, MP | 2, 9, 13, 14
53 | (Zhang et al., 2019) [99] | P, MP, WL | 3, 5, 6, 9, 14
54 | (Ververs et al., 2011) [100] | Au | 3, 5, 11
55 | (Gontar et al., 2017) [101] | WL | 1, 3, 5
56 | (Taheri Gorji et al., 2023) [66] | P, WL | 2, 5, 8, 11
57 | (Binias et al., 2023) [62] | P | 1, 8
58 | (Y. Li et al., 2013) [102] | P, MP | 8, 9
59 | (J. Sun et al., 2019) [103] | P, WL | 1, 3, 10
60 | (W.-C. Li et al., 2022) [104] | WL, A | 3, 4
61 | (Huettig et al., 1995) [105] | P | 3, 6
62 | (Socha et al., 2022) [106] | F | 5
63 | (Han et al., 2020) [61] | P, MP, WL, F | 2, 8, 9, 13, 14
64 | (Samel et al., 1997) [107] | P, MP, F | 3, 8, 9, 11
65 | (Škvareková et al., 2020) [108] | P, WL, E | 6
66 | (Y. Wang et al., 2024) [59] | P, WL | 1, 2, 8
67 | (K et al., 2020) [109] | P | 3, 8
68 | (Shao et al., 2021) [110] | P, A | 1, 6
69 | (Dorneich et al., 2017) [111] | P, Au | 1, 3, 7, 11
70 | (Lee et al., 2023) [112] | P | 1, 2, 8
71 | (Alaimo et al., 2020) [113] | P, WL, F | 3, 5, 9
72 | (Mansikka et al., 2016) [114] | P, WL | 5, 9
73 | (Bennett, 2018) [50] | WL, F | 11
74 | (Mansikka et al., 2019) [115] | P, WL | 3, 9, 11
75 | (Diaz-Piedra et al., 2019) [131] | P, MP, WL | 3, 5, 6
76 | (Allsop & Gray, 2014) [133] | P, MP, A | 5, 6, 9, 11
77 | (Astolfi et al., 2011) [116] | P | 8
78 | (Di Nocera et al., 2007) [117] | P, WL | 3, 6
79 | (Di Stasi et al., 2015) [132] | P, WL | 8, 11
80 | (Thomas et al., 2015) [134] | P, MP, WL, F | 5, 6, 8, 9, 11
81 | (van de Merwe et al., 2012) [136] | P, A | 6
82 | (van Dijk et al., 2011) [135] | P, A | 6, 11
83 | (Bellenkes et al., 1997) [118] | P, E, A | 5, 6
84 | (Naeeri et al., 2021) [121] | P, Au, F, E | 6
85 | (Itoh et al., 1990) [119] | P, WL | 6, 9, 11
87 | (Wilson, 2002) [140] | P, MP, WL | 8, 9, 11, 13
88 | (Gibb et al., 2008) [120] | E | 5
89 | (Dehais et al., 2019) [139] | P, WL | 2, 8
90 | (Bromfield et al., 2023) [122] | Au | 3, 5
Note: The Research Foci are defined as automation (Au), physiological measurements (P), multimodal physiology (MP), Single-pilot/reduced-crewing operations (S), workload (WL), expertise (E), fatigue (F), attention (A), and decision-making (D). The Performance Measures are defined as (1) alarm detection/signal detection theory, (2) advanced statistical methods or machine learning algorithm, (3) NASA-TLX subjective workload scale, (4) SAGAT or SART situational awareness rating technique, (5) traditional flight performance metrics, (6) oculometrics/gaze behavior, (7) behavioral data analysis, (8) electroencephalogram (EEG), (9) electrocardiogram (ECG), (10) functional near-infrared spectroscopy (fNIRS), (11) research-specific subjective questionnaires, (12) joystick/grip measurements, (13) electrodermal activity (EDA), and (14) respiration activity.

References

  1. Griffiths, N.; Bowden, V.; Wee, S.; Loft, S. Return-to-Manual Performance can be Predicted Before Automation Fails. Hum. Factors 2024, 66, 1333–1349. [Google Scholar] [CrossRef] [PubMed]
  2. Duchevet, A.; Imbert, J.P.; Hogue, T.D.L.; Ferreira, A.; Moens, L.; Colomer, A.; Cantero, J.; Bejarano, C.; Vázquez, A.R. HARVIS: A digital assistant based on cognitive computing for non-stabilized approaches in Single Pilot Operations. Transp. Res. Procedia 2022, 66, 253–261. [Google Scholar] [CrossRef]
  3. Vu, K.P.L.; Lachter, J.; Battiste, V.; Strybel, T.Z. Single Pilot Operations in Domestic Commercial Aviation. Hum. Factors 2018, 60, 755–762. [Google Scholar] [CrossRef]
  4. Myers, C.; Ball, J.; Cooke, N.; Freiman, M.; Caisse, M.; Rodgers, S.; Demir, M.; McNeese, N. Autonomous Intelligent Agents for Team Training. IEEE Intell. Syst. 2019, 34, 3–14. [Google Scholar] [CrossRef]
  5. Haslbeck, A.; Hoermann, H.J. Flying the Needles: Flight Deck Automation Erodes Fine-Motor Flying Skills Among Airline Pilots. Hum. Factors 2016, 58, 533–545. [Google Scholar] [CrossRef]
  6. Parasuraman, R.; Sheridan, T.B.; Wickens, C.D. A model for types and levels of human interaction with automation. IEEE Trans. Syst. Man. Cybern. Part A Syst. Hum. 2000, 30, 286–297. [Google Scholar] [CrossRef]
  7. Geiselman, E.E.; Johnson, C.M.; Buck, D.R. Flight Deck Automation: Invaluable Collaborator or Insidious Enabler? Ergon. Des. 2013, 21, 22–26. [Google Scholar] [CrossRef]
  8. Butterworth-Hayes, P. Pilot training for fifth-generation fighters. Aerosp. Am. 2012, 50, 4–6. [Google Scholar]
  9. Svoboda, A.; Boril, J.; Bauer, M.; Costa, P.C.G.; Blasch, E. Information Overload in Tactical Aircraft. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), San Diego, CA, USA, 8–12 September 2019; pp. 1–5. Available online: https://ieeexplore.ieee.org/abstract/document/9081763 (accessed on 2 August 2024).
  10. Summerfield, D.; Raslau, D.; Johnson, B.; Steinkraus, L. Physiologic Challenges to Pilots of Modern High Performance Aircraft. In Aircraft Technology; IntechOpen: London, UK, 2018; Available online: https://www.intechopen.com/chapters/61486 (accessed on 4 June 2025).
  11. Baxter, G.; Rooksby, J.; Wang, Y.; Khajeh-Hosseini, A. The ironies of automation: Still going strong at 30? In Proceedings of the ECCE’12: 30th European Conference on Cognitive Ergonomics, New York, NY, USA, 28–31 August 2012; pp. 65–71. [Google Scholar] [CrossRef]
  12. Parnell, K.J.; Banks, V.A.; Allison, C.K.; Plant, K.L.; Beecroft, P.; Stanton, N.A. Designing flight deck applications: Combining insight from end-users and ergonomists. Cogn. Technol. Work 2021, 23, 353–365. [Google Scholar] [CrossRef]
  13. Wickens, C.D.; Helleberg, J.; Xu, X. Pilot maneuver choice and workload in free flight. Hum. Factors 2002, 44, 171. [Google Scholar] [CrossRef]
  14. Harris, D. Distributed Cognition in Flight Operations. In Engineering Psychology and Cognitive Ergonomics Applications and Services; Lecture Notes in Computer Science; Harris, D., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 125–133. [Google Scholar]
  15. National Academies of Sciences, Engineering, and Medicine. Review of Methods Used by the U.S. Department of Energy in Setting Appliance and Equipment Standards; National Academies Press: Washington, DC, USA, 2021; Available online: https://www.nap.edu/catalog/25992 (accessed on 7 July 2025).
  16. Brand, Y.; Schulte, A. Workload-adaptive and task-specific support for cockpit crews: Design and evaluation of an adaptive associate system. Hum-Intell. Syst. Integr. 2021, 3, 187–199. [Google Scholar] [CrossRef]
  17. Austin, R. Introduction to Unmanned Aircraft Systems (UAS). In Unmanned Aircraft Systems; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2010; pp. 1–15. [Google Scholar] [CrossRef]
  18. Dong, L.; Chen, H.; Zhao, C.; Wang, P. Analysis of Single-Pilot Intention Modeling in Commercial Aviation. Int. J. Aerosp. Eng. 2023, 2023, 9713312. [Google Scholar] [CrossRef]
  19. Passarella, R.; Veny, H.; Fachrurrozi, M.; Samsuryadi, S.; Vindriani, M. Evaluating the influence of the international civil aviation organization on aircraft accident rates and fatalities: A seven-decade historical data analysis. Acadlore Trans. Appl. Math. Stat. 2023, 1, 33–43. [Google Scholar] [CrossRef]
  20. O’Connor, R.; Roberts, Z.; Ziccardi, J.; Koteskey, R.; Lachter, J.; Dao, Q.; Johnson, W.; Battiste, V.; Vu, K.-P.L.; Strybel, T.Z. Pre-study Walkthrough with a Commercial Pilot for a Preliminary Single Pilot Operations Experiment. In Human Interface and the Management of Information Information and Interaction for Health, Safety, Mobility and Complex Environments; Yamamoto, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 136–142. [Google Scholar]
  21. Yang, J.; Qu, Z.; Song, Z.; Yu, Q.; Chen, X.; Li, X. Initial Student Attention-Allocation and Flight-Performance Improvements Based on Eye-Movement Data. Appl. Sci. 2023, 13, 9876. [Google Scholar] [CrossRef]
  22. Dehais, F.; Causse, M.; Vachon, F.; Régis, N.; Menant, E.; Tremblay, S. Failure to Detect Critical Auditory Alerts in the Cockpit: Evidence for Inattentional Deafness. Hum. Factors 2014, 56, 631–644. [Google Scholar] [CrossRef]
  23. Endsley, M.R. Toward a Theory of Situation Awareness in Dynamic Systems. Hum. Factors 1995, 37, 32–64. [Google Scholar] [CrossRef]
  24. McGiffin, J. Mission (Command) Complete: Implications of JADC2. Jt. Force Q. 2024, 113, 14. [Google Scholar]
  25. Bainbridge, L. Ironies of automation. Automatica 1983, 19, 775–779. [Google Scholar] [CrossRef]
  26. Hancke, T. Ironies of Automation 4.0. IFAC-Pap. 2020, 53, 17463–17468. [Google Scholar] [CrossRef]
  27. Sarter, N.B.; Mumaw, R.J.; Wickens, C.D. Pilots’ Monitoring Strategies and Performance on Automated Flight Decks: An Empirical Study Combining Behavioral and Eye-Tracking Data. Hum. Factors 2007, 49, 347–357. [Google Scholar] [CrossRef]
  28. Balfe, N.; Sharples, S.; Wilson, J.R. Impact of automation: Measurement of performance, workload and behaviour in a complex control environment. Appl. Ergon. 2015, 47, 52–64. [Google Scholar] [CrossRef]
  29. Harris, W.C.; Hancock, P.A.; Arthur, E.J.; Caird, J.K. Performance, Workload, and Fatigue Changes Associated with Automation. Int. J. Aviat. Psychol. 1995, 5, 169–185. [Google Scholar] [CrossRef] [PubMed]
  30. Moray, N. Models and Measures of Mental Workload. In Mental Workload: Its Theory and Measurement; Moray, N., Ed.; Springer: Boston, MA, USA, 1979; pp. 13–21. [Google Scholar] [CrossRef]
  31. Hancock, P.; Warm, J. A Dynamic Model of Stress and Sustained Attention. J. Hum. Perform. Extreme Environ. 2003, 7, 4–28. [Google Scholar] [CrossRef] [PubMed]
  32. Young, M.S.; Brookhuis, K.A.; Wickens, C.D.; Hancock, P.A. State of science: Mental workload in ergonomics. Ergonomics 2015, 58, 1–17. [Google Scholar] [CrossRef]
  33. Wickens, C.D. Multiple Resources and Mental Workload. Hum. Factors 2008, 50, 449–455. [Google Scholar] [CrossRef]
  34. Hart, S.G. Nasa-Task Load Index (NASA-TLX); 20 Years Later. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2006, 50, 904–908. [Google Scholar] [CrossRef]
  35. Metzger, U.; Parasuraman, R. Automation in Future Air Traffic Management: Effects of Decision Aid Reliability on Controller Performance and Mental Workload. Hum. Factors 2005, 47, 35–49. [Google Scholar] [CrossRef]
  36. Hancock, P.A.; Jagacinski, R.J.; Parasuraman, R.; Wickens, C.D.; Wilson, G.F.; Kaber, D.B. Human-Automation Interaction Research: Past, Present, and Future. Ergon. Des. 2013, 21, 9–14. [Google Scholar] [CrossRef]
  37. Schulte, A. Cognitive Automation for Tactical Mission Management: Concept and Prototype Evaluation in Flight Simulator Trials. Cogn. Technol. Work. 2002, 4, 146–159. [Google Scholar] [CrossRef]
  38. Edwards, T.; Homola, J.; Mercer, J.; Claudatos, L. Multifactor interactions and the air traffic controller: The interaction of situation awareness and workload in association with automation. IFAC-PapersOnLine 2016, 49, 597–602. [Google Scholar] [CrossRef]
  39. Endsley, M.R.; Kiris, E.O. The Out-of-the-Loop Performance Problem and Level of Control in Automation. Hum. Factors 1995, 37, 381–394. [Google Scholar] [CrossRef]
  40. Verdière, K.J.; Roy, R.N.; Dehais, F. Detecting Pilot’s Engagement Using fNIRS Connectivity Features in an Automated vs. Manual Landing Scenario. Front. Hum. Neurosci. 2018, 12, 6. [Google Scholar]
  41. Jones, D.G.; Endsley, M.R. Sources of situation awareness errors in aviation. Aviat. Space Environ. Med. 1996, 67, 507–512. [Google Scholar]
  42. Naranji, E.; Sarkani, S.; Mazzuchi, T. Reducing Human/Pilot Errors in Aviation Using Augmented Cognition and Automation Systems in Aircraft Cockpit. AIS Trans. Hum.-Comput. Interact. 2015, 7, 71–96. [Google Scholar] [CrossRef]
  43. Johnson, E.N.; Pritchett, A.R. Experimental Study of Vertical Flight Path Mode Awareness. IFAC Proc. Vol. 1995, 28, 153–158. [Google Scholar] [CrossRef]
  44. Li, W.C.; Horn, A.; Sun, Z.; Zhang, J.; Braithwaite, G. Augmented visualization cues on primary flight display facilitating pilot’s monitoring performance. Int. J. Hum.-Comput. Stud. 2020, 135, 102377. [Google Scholar] [CrossRef]
  45. Pope, A.T.; Bogart, E.H.; Bartolome, D.S. Biocybernetic system evaluates indices of operator engagement in automated task. Biol. Psychol. 1995, 40, 187–195. [Google Scholar] [CrossRef]
  46. Khan, M.J.; Hong, K.S. Passive BCI based on drowsiness detection: An fNIRS study. Biomed. Opt. Express 2015, 6, 4063–4078. [Google Scholar] [CrossRef]
  47. Roy, R.N.; Frey, J. Neurophysiological Markers for Passive Brain–Computer Interfaces. In Brain–Computer Interfaces; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2016; pp. 85–100. [Google Scholar] [CrossRef]
  48. Zander, T.O.; Kothe, C. Towards passive brain–computer interfaces: Applying brain–computer interface technology to human–machine systems in general. J. Neural Eng. 2011, 8, 025005. [Google Scholar] [CrossRef]
  49. Wright, N.; McGown, A. Vigilance on the civil flight deck: Incidence of sleepiness and sleep during long-haul flights and associated changes in physiological parameters. Ergonomics 2001, 44, 82–106. [Google Scholar] [CrossRef]
  50. Bennett, S.A. Pilot workload and fatigue on short-haul routes: An evaluation supported by instantaneous self-assessment and ethnography. J. Risk Res. 2018, 21, 645–677. [Google Scholar] [CrossRef]
  51. Shanker, T.; Richtel, M. In new military, data overload can be deadly. The New York Times, 16 January 2011. [Google Scholar]
  52. Darzi, A.; Gaweesh, S.M.; Ahmed, M.M.; Novak, D. Identifying the Causes of Drivers’ Hazardous States Using Driver Characteristics, Vehicle Kinematics, and Physiological Measurements. Front. Neurosci. 2018, 12, 568. [Google Scholar] [CrossRef]
  53. Ebbatson, M.; Harris, D.; Huddlestone, J.; Sears, R. The relationship between manual handling performance and recent flying experience in air transport pilots. Ergonomics 2010, 53, 268–277. [Google Scholar] [CrossRef] [PubMed]
  54. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ 2009, 339, b2535. [Google Scholar] [CrossRef]
  55. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. Available online: https://www.bmj.com/content/372/bmj.n71 (accessed on 7 July 2025).
  56. Ouzzani, M.; Hammady, H.; Fedorowicz, Z.; Elmagarmid, A. Rayyan—A web and mobile app for systematic reviews. Syst. Rev. 2016, 5, 210. [Google Scholar] [CrossRef]
  57. Higgins, J.P.T.; Altman, D.G.; Gøtzsche, P.C.; Jüni, P.; Moher, D.; Oxman, A.D.; Savović, J.; Schulz, K.F.; Weeks, L.; Sterne, J.A.C.; et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ 2011, 343, d5928. [Google Scholar] [CrossRef] [PubMed]
  58. McGuinness, L.A.; Higgins, J.P.T. Risk-of-bias VISualization (robvis): An R package and Shiny web app for visualizing risk-of-bias assessments. Res. Synth. Methods. 2021, 12, 55–61. [Google Scholar] [CrossRef]
  59. Wang, Y.; Han, M.; Peng, Y.; Zhao, R.; Fan, D.; Meng, X.; Xu, H.; Niu, H.; Cheng, J. LGNet: Learning local–global EEG representations for cognitive workload classification in simulated flights. Biomed. Signal Process Control 2024, 92, 106046. [Google Scholar] [CrossRef]
  60. Mohanavelu, K.; Poonguzhali, S.; Janani, A.; Vinutha, S. Machine learning-based approach for identifying mental workload of pilots. Biomed. Signal Process Control 2022, 75, 103623. [Google Scholar] [CrossRef]
  61. Han, S.Y.; Kwak, N.S.; Oh, T.; Lee, S.W. Classification of pilots’ mental states using a multimodal deep learning network. Biocybern. Biomed. Eng. 2020, 40, 324–336. [Google Scholar] [CrossRef]
  62. Binias, B.; Myszor, D.; Binias, S.; Cyran, K.A. Analysis of Relation between Brainwave Activity and Reaction Time of Short-Haul Pilots Based on EEG Data. Sensors 2023, 23, 6470. [Google Scholar] [CrossRef]
  63. Li, H.; Zhu, P.; Shao, Q. Rapid Mental Workload Detection of Air Traffic Controllers with Three EEG Sensors. Sensors 2024, 24, 4577. [Google Scholar] [CrossRef] [PubMed]
  64. Wang, X.; Gong, G.; Li, N.; Ding, L. Use of multimodal physiological signals to explore pilots’ cognitive behaviour during flight strike task performance. Med. Nov. Technol. Devices 2020, 5, 100030. [Google Scholar] [CrossRef]
  65. Hamann, A.; Carstengerdes, N. Investigating mental workload-induced changes in cortical oxygenation and frontal theta activity during simulated flights. Sci. Rep. 2022, 12, 6449. [Google Scholar]
  66. Taheri Gorji, H.; Wilson, N.; VanBree, J.; Hoffmann, B.; Petros, T.; Tavakolian, K. Using machine learning methods and EEG to discriminate aircraft pilot cognitive workload during flight. Sci. Rep. 2023, 13, 2507. [Google Scholar] [CrossRef]
  67. Manzey, D.; Reichenbach, J.; Onnasch, L. Human Performance Consequences of Automated Decision Aids: The Impact of Degree of Automation and System Experience. J. Cogn. Eng. Decis. Mak. 2012, 6, 57–87. [Google Scholar] [CrossRef]
  68. Onnasch, L.; Wickens, C.D.; Li, H.; Manzey, D. Human Performance Consequences of Stages and Levels of Automation: An Integrated Meta-Analysis. Hum. Factors 2014, 56, 476–488. [Google Scholar] [CrossRef]
  69. Alreshidi, I.; Moulitsas, I.; Jenkins, K.W. Multimodal Approach for Pilot Mental State Detection Based on EEG. Sensors 2023, 23, 7350. [Google Scholar] [CrossRef] [PubMed]
  70. Lefrançois, O.; Matton, N.; Causse, M. Improving Airline Pilots’ Visual Scanning and Manual Flight Performance through Training on Skilled Eye Gaze Strategies. Safety 2021, 7, 70. [Google Scholar] [CrossRef]
  71. Brams, S.; Hooge, I.T.J.; Ziv, G.; Dauwe, S.; Evens, K.; De Wolf, T.; Levin, O.; Wagemans, J.; Helsen, W.F. Does effective gaze behavior lead to enhanced performance in a complex error-detection cockpit task? PLoS ONE 2018, 13, e0207439. [Google Scholar] [CrossRef]
  72. Schmid, D.; Stanton, N.A. Exploring Bayesian analyses of a small-sample-size factorial design in human systems integration: The effects of pilot incapacitation. Hum.-Intell. Syst. Integr. 2019, 1, 71–88. [Google Scholar] [CrossRef]
  73. Xing, G.; Sun, Y.; He, F.; Wei, P.; Wu, S.; Ren, H.; Chen, Z. Analysis of Human Factors in Typical Accident Tests of Certain Type Flight Simulator. Sustainability 2023, 15, 2791. [Google Scholar] [CrossRef]
  74. Gateau, T.; Ayaz, H.; Dehais, F. In silico vs. Over the Clouds: On-the-Fly Mental State Estimation of Aircraft Pilots, Using a Functional Near Infrared Spectroscopy Based Passive-BCI. Front. Hum. Neurosci. 2018, 12, 187. [Google Scholar]
  75. Zanoni, A.; Garbo, P.; Masarati, P.; Quaranta, G. Frustrated Total Internal Reflection Measurement System for Pilot Inceptor Grip Pressure. Sensors 2023, 23, 6308. [Google Scholar] [CrossRef]
  76. Jankovics, I.; Kale, U. Developing the pilots’ load measuring system. Aircr. Eng. Aerosp. Technol. 2019, 91, 281–288. [Google Scholar] [CrossRef]
  77. Causse, M.; Baracat, B.; Pastor, J.; Dehais, F. Reward and Uncertainty Favor Risky Decision-Making in Pilots: Evidence from Cardiovascular and Oculometric Measurements. Appl. Psychophysiol. Biofeedback 2011, 36, 231–242. [Google Scholar] [CrossRef] [PubMed]
  78. Ma, S.; Guo, J.; Zeng, S.; Che, H.; Pan, X. Modeling eye movement in dynamic interactive tasks for maximizing situation awareness based on Markov decision process. Sci. Rep. 2022, 12, 6449. [Google Scholar] [CrossRef]
  79. Fellah, K.; Guiatni, M.; Ournid, A.K.; Boulahlib, M.A. Fuzzy-PID side-stick force control for flight simulation. Aeronaut. J. 2016, 120, 845–872. [Google Scholar] [CrossRef]
  80. Lassalle, J.; Rauffet, P.; Leroy, B.; Guérin, C.; Chauvin, C.; Coppin, G.; Saïd, F. Communication and Workload Analyses to Study the Collective Work of Fighter Pilots: The COWORK2 method. Cogn. Technol. Work 2017, 19, 477–491. [Google Scholar] [CrossRef]
  81. Mohanavelu, K.; Poonguzhali, S.; Ravi, D.; Singh, P.K.; Mahajabin, M.; Ramachandran, K.; Singh, U.K.; Jayaraman, S. Cognitive Workload Analysis of Fighter Aircraft Pilots in Flight Simulator Environment. Def. Sci. J. 2020, 70, 131–139. [Google Scholar] [CrossRef]
  82. Sun, H.; Zhou, X.; Zhang, P.; Liu, X.; Lu, Y.; Huang, H.; Song, W. Competency-based assessment of pilots’ manual flight performance during instrument flight training. Cogn. Technol. Work. 2023, 25, 345–356. [Google Scholar] [CrossRef]
  83. Silva, J.R.; Ribeiro, M.W.; Deolindo, C.S.; Aratanha, M.A.; de Andrade, D.; Forster, C.H.; Figueira, J.M.; Corrêa, F.L.; Lacerda, S.S.; Machado, B.S.; et al. Quantitative assessment of pilot-endured workloads during helicopter flying emergencies: An analysis of physiological parameters during an autorotation. Sci. Rep. 2021, 11, 17734. [Google Scholar]
  84. Takahashi, M.D.; Fujizawa, B.T.; Lusardi, J.A.; Goerzen, C.L.; Cleary, M.J.; Carr, J.P.; Waldman, D.W. Comparison of Autonomous Flight Control Performance Between Partial- and Full-Authority Helicopters. J. Guid. Control Dyn. 2022, 45, 885–901. [Google Scholar] [CrossRef]
  85. Wei, Z.; Zhuang, D.; Wanyan, X.; Liu, C.; Zhuang, H. A model for discrimination and prediction of mental workload of aircraft cockpit display interface. Chin. J. Aeronaut. 2014, 27, 1070–1077. [Google Scholar] [CrossRef]
  86. Ahmadi, N.; Romoser, M.; Salmon, C. Improving the tactical scanning of student pilots: A gaze-based training intervention for transition from visual flight into instrument meteorological conditions. Appl. Ergon. 2022, 100, 103642. [Google Scholar] [CrossRef] [PubMed]
  87. Causse, M.; Mouratille, D.; Rouillard, Y.; El Yagoubi, R.; Matton, N.; Hidalgo-Muñoz, A. How a pilot’s brain copes with stress and mental load? Insights from the executive control network. Behav. Brain Res. 2024, 456, 114698. [Google Scholar] [CrossRef] [PubMed]
  88. Lin, C.J.; Lin, P.H.; Chen, H.J.; Hsieh, M.C.; Yu, H.C.; Wang, E.M.; Ho, H.L. Effects of controller-pilot communication medium, flight phase and the role in the cockpit on pilots’ workload and situation awareness. Saf. Sci. 2012, 50, 1722–1731. [Google Scholar] [CrossRef]
  89. Lutnyk, L.; Rudi, D.; Schinazi, V.R.; Kiefer, P.; Raubal, M. The effect of flight phase on electrodermal activity and gaze behavior: A simulator study. Appl. Ergon. 2023, 109, 103989. [Google Scholar] [CrossRef]
  90. Yiu, C.Y.; Ng, K.K.H.; Li, X.; Zhang, X.; Li, Q.; Lam, H.S.; Chong, M.H. Towards safe and collaborative aerodrome operations: Assessing shared situational awareness for adverse weather detection with EEG-enabled Bayesian neural networks. Adv. Eng. Inform. 2022, 53, 101698. [Google Scholar] [CrossRef]
  91. Li, Q.; Ng, K.K.H.; Yiu, C.Y.; Yuan, X.; So, C.K.; Ho, C.C. Securing air transportation safety through identifying pilot’s risky VFR flying behaviours: An EEG-based neurophysiological modelling using machine learning algorithms. Reliab. Eng. Syst. Saf. 2023, 238, 109449. [Google Scholar] [CrossRef]
  92. Chen, J.; Xue, L.; Rong, J.; Gao, X. Real-time evaluation method of flight mission load based on sensitivity analysis of physiological factors. Chin. J. Aeronaut. 2022, 35, 450–463. [Google Scholar] [CrossRef]
  93. Haslbeck, A.; Zhang, B. I spy with my little eye: Analysis of airline pilots’ gaze patterns in a manual instrument flight scenario. Appl. Ergon. 2017, 63, 62–71. [Google Scholar] [CrossRef]
  94. Jin, H.; Hu, Z.; Li, K.; Chu, M.; Zou, G.; Yu, G.; Zhang, J. Study on How Expert and Novice Pilots Can Distribute Their Visual Attention to Improve Flight Performance. IEEE Access 2021, 9, 44757–44769. [Google Scholar] [CrossRef]
  95. Hernández-Sabaté, A.; Yauri, J.; Folch, P.; Álvarez, D.; Gil, D. EEG Dataset Collection for Mental Workload Predictions in Flight-Deck Environment. Sensors 2024, 24, 1174. [Google Scholar] [CrossRef] [PubMed]
  96. Thomas, L.C. Evaluation of Levels of Automation for Non-Normal Event Resolution. SAE Int. J. Aerosp. 2011, 4, 1191–1196. [Google Scholar] [CrossRef]
  97. Lounis, C.; Peysakhovich, V.; Causse, M. Visual scanning strategies in the cockpit are modulated by pilots’ expertise: A flight simulator study. PLoS ONE 2021, 16, e0247061. [Google Scholar] [CrossRef]
  98. Suppiah, S.; Liu, D.; Sang, A.L.; Dattel, A.; Vincenzi, D. Impact of Electronic Flight Bag (efb) on Single Pilot Performance and Workload. Int. J. Aviat. Aeronaut. Aerosp. 2020, 7, 4. [Google Scholar] [CrossRef]
  99. Zhang, X.; Sun, Y.; Qiu, Z.; Bao, J.; Zhang, Y. Adaptive Neuro-Fuzzy Fusion of Multi-Sensor Data for Monitoring a Pilot’s Workload Condition. Sensors 2019, 19, 3629. [Google Scholar] [CrossRef] [PubMed]
  100. Ververs, P.M.; He, G.; Suddreth, J.; Odgers, R.; Engels, J.; Wyatt, I.; Hughes, K.; Hamblin, C.; Feyereisen, T. Design and Flight Test of a Primary Flight Display Combined Vision System. SAE Int. J. Aerosp. 2011, 4, 738–750. [Google Scholar] [CrossRef]
  101. Gontar, P.; Schneider, S.A.E.; Schmidt-Moll, C.; Bollin, C.; Bengler, K. Hate to interrupt you, but… analyzing turn-arounds from a cockpit perspective. Cogn. Technol. Work. 2017, 19, 837–853. [Google Scholar] [CrossRef]
  102. Li, Y.; Zhang, T.; Deng, L.; Wang, B.; Nakamura, M. EEG Physiological Signals Correlation under Condition of +Gz Accelerations. J. Multimed. 2013, 8, 64–71. [Google Scholar] [CrossRef]
  103. Sun, J.; Cheng, S.; Ma, J.; Xiong, K.; Su, M.; Hu, W. Assessment of the static upright balance index and brain blood oxygen levels as parameters to evaluate pilot workload. PLoS ONE 2019, 14, e0214277. [Google Scholar] [CrossRef]
  104. Li, W.C.; Zhang, J.; Court, S.; Kearney, P.; Braithwaite, G. The influence of augmented reality interaction design on Pilot’s perceived workload and situation awareness. Int. J. Ind. Ergon. 2022, 92, 103382. [Google Scholar] [CrossRef]
  105. Huettig, G.; Hotes, A.; Tautz, A. Design and Evaluation of an ATC-Display in Modern Glass Cockpit. IFAC Proc. Vol. 1995, 28, 541–545. [Google Scholar] [CrossRef]
  106. Socha, V.; Hanáková, L.; Weiss, J.; Matyáš, R.; Karapetjan, L.; Pilmannová, T.; Kušmírek, S. The Influence of Fatigue on an Instrument Approach. Transp. Res. Procedia 2022, 65, 275–282. [Google Scholar] [CrossRef]
  107. Samel, A.; Wegmann, H.M.; Vejvoda, M. Aircrew fatigue in long-Haul operations. Accid. Anal. Prev. 1997, 29, 439–452. [Google Scholar] [CrossRef]
  108. Škvareková, I.; Pecho, P.; Ažaltovič, V.; Kandera, B. Number of Saccades and Fixation Duration as Indicators of Pilot Workload. Transp. Res. Procedia 2020, 51, 67–74. [Google Scholar] [CrossRef]
  109. Mohanavelu, K.; Poonguzhali, S.; Adalarasu, K.; Ravi, D.; Chinnadurai, V.; Vinutha, S.; Ramachandran, K.; Jayaraman, S. Dynamic cognitive workload assessment for fighter pilots in simulated fighter aircraft environment using EEG. Biomed. Signal Process Control 2020, 61, 102018. [Google Scholar]
  110. Shao, F.; Lu, T.; Wang, X.; Liu, Z.; Zhang, Y.; Liu, X.; Wu, S. The influence of pilot’s attention allocation on instrument reading during take-off: The mediating effect of attention span. Appl. Ergon. 2021, 90, 103245. [Google Scholar] [CrossRef] [PubMed]
  111. Dorneich, M.C.; Dudley, R.; Letsu-Dake, E.; Rogers, W.; Whitlow, S.D.; Dillard, M.C.; Nelson, E. Interaction of Automation Visibility and Information Quality in Flight Deck Information Automation. IEEE Trans. Hum.-Mach. Syst. 2017, 47, 915–926. [Google Scholar] [CrossRef]
  112. Lee, D.H.; Jeong, J.H.; Yu, B.W.; Kam, T.E.; Lee, S.W. Autonomous System for EEG-Based Multiple Abnormal Mental States Classification Using Hybrid Deep Neural Networks Under Flight Environment. IEEE Trans. Syst. Man. Cybern. Syst. 2023, 53, 6426–6437. [Google Scholar] [CrossRef]
  113. Alaimo, A.; Esposito, A.; Orlando, C.; Simoncini, A. Aircraft Pilots Workload Analysis: Heart Rate Variability Objective Measures and NASA-Task Load Index Subjective Evaluation. Aerospace 2020, 7, 137. [Google Scholar] [CrossRef]
  114. Mansikka, H.; Simola, P.; Virtanen, K.; Harris, D.; Oksama, L. Fighter pilots’ heart rate, heart rate variation and performance during instrument approaches. Ergonomics 2016, 59, 1344–1352. [Google Scholar] [CrossRef] [PubMed]
  115. Mansikka, H.; Virtanen, K.; Harris, D. Comparison of NASA-TLX scale, modified Cooper-Harper scale and mean inter-beat interval as measures of pilot mental workload during simulated flight tasks. Ergonomics 2019, 62, 246–254. [Google Scholar] [CrossRef]
  116. Astolfi, L.; Toppi, J.; Borghini, G.; Vecchiato, G.; Isabella, R.; De Vico Fallani, F.; Cincotti, F.; Salinari, S.; Mattia, D.; He, B.; et al. Study of the functional hyperconnectivity between couples of pilots during flight simulation: An EEG hyperscanning study. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 2338–2341. Available online: https://ieeexplore.ieee.org/abstract/document/6090654 (accessed on 9 October 2024).
  117. Di Nocera, F.; Camilli, M.; Terenzi, M. A Random Glance at the Flight Deck: Pilots’ Scanning Strategies and the Real-Time Assessment of Mental Workload. J. Cogn. Eng. Decis. Mak. 2007, 1, 271–285. [Google Scholar] [CrossRef]
  118. Bellenkes, A.H.; Wickens, C.D.; Kramer, A.F. Visual scanning and pilot expertise: The role of attentional flexibility and mental model development. Aviat. Space Environ. Med. 1997, 68, 569–579. [Google Scholar] [PubMed]
  119. Itoh, Y.; Hayashi, Y.; Tsukui, I.; Saito, S. The ergonomic evaluation of eye movement and mental workload in aircraft pilots. Ergonomics 1990, 33, 719–732. [Google Scholar] [CrossRef]
  120. Gibb, R.; Schvaneveldt, R.; Gray, R. Visual Misperception in Aviation: Glide Path Performance in a Black Hole Environment. Hum. Factors 2008, 50, 699–711. [Google Scholar] [CrossRef] [PubMed]
  121. Naeeri, S.; Kang, Z.; Mandal, S.; Kim, K. Multimodal Analysis of Eye Movements and Fatigue in a Simulated Glass Cockpit Environment. Aerospace 2021, 8, 283. [Google Scholar] [CrossRef]
  122. Bromfield, M.A.; Milward, T.; Everett, S.B.; Stedmon, A. Pilot performance and workload whilst using an angle of attack system. Appl. Ergon. 2023, 113, 104101. [Google Scholar] [CrossRef]
  123. Ruff, H.A.; Narayanan, S.; Draper, M.H. Human Interaction with Levels of Automation and Decision-Aid Fidelity in the Supervisory Control of Multiple Simulated Unmanned Air Vehicles. Presence 2002, 11, 335–351. [Google Scholar] [CrossRef]
  124. Mosier, K.L.; Fischer, U.; Morrow, D.; Feigh, K.M.; Durso, F.T.; Sullivan, K.; Pop, V. Automation, task, and context features: Impacts on pilots’ judgments of human-automation interaction. J. Cogn. Eng. Decis. Mak. 2013, 7, 377–399. [Google Scholar] [CrossRef]
  125. Farjadian, A.B.; Annaswamy, A.M.; Woods, D. Bumpless Reengagement Using Shared Control between Human Pilot and Adaptive Autopilot. IFAC-PapersOnLine 2017, 50, 5343–5348. [Google Scholar] [CrossRef]
  126. Gouraud, J.; Delorme, A.; Berberian, B. Out of the Loop, in Your Bubble: Mind Wandering Is Independent From Automation Reliability, but Influences Task Engagement. Front. Hum. Neurosci. 2018, 12, 383. [Google Scholar] [CrossRef]
  127. Corsi-Cabrera, M.; Ramos, J.; Guevara, M.A.; Arce, C.; Gutiérrez, S. Gender differences in the EEG during cognitive activity. Int. J. Neurosci. 1993, 72, 257–264. [Google Scholar] [CrossRef] [PubMed]
  128. Dehais, F.; Lafont, A.; Roy, R.; Fairclough, S. A Neuroergonomics Approach to Mental Workload, Engagement and Human Performance. Front. Neurosci. 2020, 14, 268. [Google Scholar] [CrossRef] [PubMed]
  129. Wang, H.; Pan, T.; Si, H.; Zhang, H.; Shang, L.; Liu, H. Time-Varying Pilot’s Intention Identification Based on ESAX-CSA-ELM Classification Method in Complex Environment. Appl. Sci. 2022, 12, 4858. [Google Scholar] [CrossRef]
  130. Friedrich, M.; Lee, S.Y.; Bates, P.; Martin, W.; Faulhaber, A.K. The influence of training level on manual flight in connection to performance, scan pattern, and task load. Cogn. Technol. Work 2021, 23, 715–730. [Google Scholar]
  131. Diaz-Piedra, C.; Rieiro, H.; Cherino, A.; Fuentes, L.J.; Catena, A.; Di Stasi, L.L. The effects of flight complexity on gaze entropy: An experimental study with fighter pilots. Appl. Ergon. 2019, 77, 92–99. [Google Scholar] [CrossRef]
  132. Di Stasi, L.L.; Diaz-Piedra, C.; Suárez, J.; McCamy, M.B.; Martinez-Conde, S.; Roca-Dorda, J.; Catena, A. Task complexity modulates pilot electroencephalographic activity during real flights. Psychophysiology 2015, 52, 951–956. [Google Scholar] [CrossRef]
  133. Allsop, J.; Gray, R. Flying under pressure: Effects of anxiety on attention and gaze behavior in aviation. J. Appl. Res. Mem. Cogn. 2014, 3, 63–71. [Google Scholar] [CrossRef]
  134. Thomas, L.C.; Gast, C.; Grube, R.; Craig, K. Fatigue Detection in Commercial Flight Operations: Results Using Physiological Measures. Procedia Manuf. 2015, 3, 2357–2364. [Google Scholar] [CrossRef]
  135. van Dijk, H.; van de Merwe, K.; Zon, R. A Coherent Impression of the Pilots’ Situation Awareness: Studying Relevant Human Factors Tools. Int. J. Aviat. Psychol. 2011, 21, 343–356. [Google Scholar] [CrossRef]
  136. van de Merwe, K.; van Dijk, H.; Zon, R. Eye Movements as an Indicator of Situation Awareness in a Flight Simulator Experiment. Int. J. Aviat. Psychol. 2012, 22, 78–95. [Google Scholar] [CrossRef]
  137. Massé, E.; Bartheye, O.; Fabre, L. Classification of Electrophysiological Signatures With Explainable Artificial Intelligence: The Case of Alarm Detection in Flight Simulator. Front. Neuroinform. 2022, 16, 904301. [Google Scholar]
  138. Xue, H.; Zhang, Q.; Zhang, X. Research on the Applicability of Touchscreens in Manned/Unmanned Aerial Vehicle Cooperative Missions. Sensors 2022, 22, 8435. [Google Scholar] [CrossRef] [PubMed]
  139. Dehais, F.; Duprès, A.; Blum, S.; Drougard, N.; Scannella, S.; Roy, R.N.; Lotte, F. Monitoring Pilot’s Mental Workload Using ERPs and Spectral Power with a Six-Dry-Electrode EEG System in Real Flight Conditions. Sensors 2019, 19, 1324. [Google Scholar] [CrossRef]
  140. Wilson, G.F. An Analysis of Mental Workload in Pilots During Flight Using Multiple Psychophysiological Measures. Int. J. Aviat. Psychol. 2002, 12, 3–18. [Google Scholar] [CrossRef]
  141. Klaproth, O.W.; Vernaleken, C.; Krol, L.R.; Halbruegge, M.; Zander, T.O.; Russwinkel, N. Tracing Pilots’ Situation Assessment by Neuroadaptive Cognitive Modeling. Front. Neurosci. 2020, 14, 795. [Google Scholar]
  142. Haarmann, A.; Boucsein, W.; Schaefer, F. Combining electrodermal responses and cardiovascular measures for probing adaptive automation during simulated flight. Appl. Ergon. 2009, 40, 1026–1040. [Google Scholar] [CrossRef]
Figure 1. Flowchart of PRISMA approach implemented in article search and selection strategy.
Figure 2. Timeline heatmap of construct and measurement usage by year.
Figure 3. Risk of bias assessment results [59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120].
Table 1. Summary of performance measurement techniques.

Performance Measurement Technique | Frequency
Alarm detection/SDT | 21
Classification algorithm | 16
NASA-TLX (perceived workload) | 27
SAGAT/SART (perceived situational awareness) | 5
Subjective questionnaires | 24
Contextual/flight metrics | 35
Joystick/control metrics | 3
Behavioral data | 8
Oculometrics/gaze behavior/eye movement | 33
Heart rate/electrocardiogram (ECG) | 23
Electroencephalogram (EEG) | 23
Respiration activity | 6
Electrodermal activity (EDA) | 7
Functional near-infrared spectroscopy (fNIRS) | 5
Table 2. Summary of article classifications.

Manuscript Focus and Constructs of Interest | Frequency
Physiological measurement | 64
Multimodal physiology | 19
Automation/decision aid | 17
Single-pilot operations/reduced-crewing operations | 3
Workload/mental load | 40
Expertise | 9
Fatigue | 9
Attention | 17
Decision-making | 2