Air Traffic Controllers’ Rostering: Sleep Quality, Vigilance, Mental Workload, and Boredom: A Report of Two Case Studies

Fatigue in air traffic management (ATM) is a well-recognized safety concern. International organizations such as ICAO and EASA have responded by advocating for fatigue risk management systems (FRMSs). EU Regulation 2017/373, implemented in January 2020, mandates specific requirements for air navigation service providers (ANSPs) regarding controller fatigue, stress, and rostering practices. These regulations are part of broader safety management protocols. Despite ongoing efforts to raise awareness about fatigue in ATC, standardized operational requirements remain elusive. To address this gap, Eurocontrol recently published “Guidelines on fatigue management in ATC rostering systems” (23 April 2024). This initiative aims to facilitate the adoption of common fatigue management standards across operations. However, neither EU Regulation 2017/373 nor existing documentation provides definitive rostering criteria. ANSPs typically derive these criteria from a combination of scientific research, best practices, historical data, and legal and operational constraints. Assessing and monitoring fatigue in the real-world ATC setting is complex. The multifaceted nature of fatigue makes it difficult to study, as it is influenced by many factors including sleep quality, circadian rhythms, psychosocial stressors, individual differences, and environmental conditions. Long-term studies are often required to fully understand these complex interactions. This paper presents two case studies that attempt to create an evidence-based protocol for fatigue risk monitoring in ATC operations. These studies utilize a non-invasive approach and collect multidimensional data. The cases involved en-route and tower (TWR) controllers from different ATC centers. The results highlight the importance of fatigue assessment in ATC and shed light on the challenges of implementing fatigue monitoring systems within operational environments.


Introduction
Air traffic controllers (ATCs) face a uniquely demanding profession. They must rapidly manage complex and dynamic situations, identifying and resolving potential conflicts whilst optimizing flight paths, all without compromising safety. This requires them to perceive, understand, and anticipate the characteristics and trajectories of numerous aircraft simultaneously. These demanding tasks necessitate a significant investment of both mental and physical resources from individual controllers. However, these resources are not limitless and can deteriorate over time, potentially impacting operator performance and consequently jeopardizing the safety of air traffic control operations. In addition to the mentally taxing nature of the work, shift work presents another significant challenge. Many strategic air traffic control (ATC) centers operate without interruption, 24 h a day, seven days a week, and controllers must work in shifts to cover these hours.
Within these operational contexts, workload, situational awareness, and vigilance are recognized as critical factors that can negatively influence controller performance if compromised. This can manifest as increased response times and a higher error rate [1].
It is important to note that here, the term workload refers to the "mental workload" phenomenon that has been extensively studied in the human factors literature [2,3]. Other studies, by contrast, use the term workload to indicate task load (e.g., the number of aircraft that must be monitored), which in our case is one of the independent variables affecting the amount of cognitive resources available to execute one or more tasks. For example, a study by Di Mascio et al. [4] employed a modified CAPAN method to investigate the human element by examining TWR controller workload. Their findings suggest that exceeding the maximum manageable airside traffic volume would result in an unacceptable workload for TWR controllers.
To mitigate the impact of these phenomena on the entire air traffic control system, a regulatory framework has been established. This framework outlines measures for detecting, monitoring, and managing fatigue. Commission Implementing Regulation (EU) 2017/373 (ATS.OR.135) specifies requirements and provides guidance for developing a fatigue management policy. This policy includes procedures for identifying, preventing, mitigating, and monitoring fatigue levels in ATCs. Additionally, it emphasizes the importance of providing appropriate training and support for ATC personnel [5].
Beyond regulations, fatigue has been extensively researched across various disciplines. Terenzi, Ricciardi, and Di Nocera [6] conducted a recent literature review to provide guidance for effective ATC shift planning. Their primary focus was on identifying appropriate parameters for defining rest periods for ATCs. This review considered three key elements:
• Task-specific characteristics: the demands and nature of specific ATC tasks.
• Physiological needs of the operator: the biological requirements for rest and recovery.
• Definition of rest periods within the shift schedule: how and when rest breaks are incorporated into ATC shifts.
Research has consistently shown that, from a physiological perspective, night shifts lead to decreased vigilance and sleep deprivation [7]. In terms of task characteristics, high workload and the monotony associated with some ATC tasks provide limited opportunity for controllers to implement strategies to counteract fatigue, making them more susceptible to its effects [8]. Conversely, highly challenging tasks that are intrinsically interesting can help resist fatigue [9]. The review suggested a maximum shift duration of 7-10 h, to be evaluated based on shift rotation patterns. Additionally, it emphasized the need for longer rest periods following night shifts, as these hours have a more significant impact on fatigue perception.
Despite extensive research on ATC shift scheduling requirements, standardized methods for fatigue assessment and monitoring remain elusive. Therefore, the primary objective of the studies reported here was to investigate the consequences of fatigue in ATCs. By analyzing sleep patterns, vigilance levels, mental workload, and boredom, these studies aimed to identify the challenges associated with fatigue monitoring within an operational ATC environment. The results of these studies will contribute to refining fatigue management strategies and pave the way for future research and practical applications.

Rationale and Instruments
This study investigated the impact of air traffic controller rostering systems on fatigue and alertness levels. Two case studies were conducted to achieve this objective. These case studies had two main goals:
• To assess the functional state of ATCs: This involved using a combination of subjective and objective measures to evaluate their fatigue and alertness. Additionally, it aimed to provide insights into the effectiveness of the current rostering system.
• To identify critical elements for a fatigue monitoring protocol: the findings can be used to inform the development of a standardized approach for monitoring fatigue in ATCs.
Case Study 1 compared two groups of air traffic controllers (ATCs): en-route controllers working at an area control center (ACC) who assist aircraft during the en-route flight phase (either landing or simply traversing the airspace) (hereinafter, Group 1) and TWR controllers working at a high-density airport who manage take-offs, landings, and ground movements (hereinafter, Group 2).
Case Study 2 involved ATCs from three different work settings: controllers working at a high-density control tower (hereinafter, Group 3), controllers working at a low-density control tower (hereinafter, Group 4), and controllers working at an area control center (hereinafter, Group 5).
ATCs from all groups were assessed during different work shifts (morning, afternoon, and night). During each shift, we asked them to self-report sleep quality and duration, provide a subjective assessment of mental workload and boredom, and perform a vigilance task:
• Sleep log. The sleep log was a set of questions to be filled out daily upon awakening. It allowed the recording of information regarding sleep duration and quality for each sleep period.
• Psychomotor Vigilance Task (PVT) [10]. The PVT is a brief, reaction-time-based test that takes just a few minutes to complete. It assesses a participant's ability to sustain attention and react quickly to visual stimuli appearing on a device screen. Participants must press a button as quickly as possible when they see the stimulus. The time between the stimulus appearing and the participant's response is recorded. This reaction time serves as a measure of alertness and information processing speed. The PVT software was installed on smartphones left at the workplace for the duration of the data collection period, making it readily accessible to the air traffic controllers, who self-administered the test before and after each work session.
• NASA Task Load Index (NASA-TLX) [11]. The NASA-TLX is widely regarded as the gold standard for subjectively assessing mental workload across various operational and research settings. It employs six subscales: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration. Operators rate each scale from 0 to 100, with the combined score providing a comprehensive index of mental workload associated with a specific task. Controllers filled out the questionnaire after each work session.
• The Multidimensional State Boredom Scale (MSBS) [12]. The MSBS is a 29-item questionnaire designed to assess an operator's experience of boredom after completing a task. The MSBS measures five distinct aspects of boredom through its subscales: Disengagement, High Arousal, Low Arousal, Inattention, and Time Perception. The total score is commonly used as an indicator of overall boredom. Controllers filled out the questionnaire after each work session.
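As an illustration of how a combined NASA-TLX score can be obtained from the six subscale ratings (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration), the following minimal Python sketch computes the unweighted ("raw") score, i.e., the mean of the six ratings. The text does not state whether the raw or the weighted variant was used, so the raw variant is an assumption here; the function name `raw_tlx`, the dictionary keys, and the example ratings are hypothetical.

```python
from statistics import mean

# The six NASA-TLX subscales, each rated from 0 to 100.
TLX_SUBSCALES = (
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
)


def raw_tlx(ratings: dict[str, float]) -> float:
    """Return the unweighted ("raw") NASA-TLX score: the mean of the six ratings.

    Assumption: the raw variant is used; the weighted variant would also
    need the pairwise-comparison weights, which are not modeled here.
    """
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return mean(ratings[name] for name in TLX_SUBSCALES)


# Hypothetical ratings for one work session (illustrative values only).
example_session = {
    "mental_demand": 70, "physical_demand": 10, "temporal_demand": 55,
    "performance": 30, "effort": 60, "frustration": 15,
}
print(raw_tlx(example_session))
```

Rejecting incomplete questionnaires, rather than averaging over the available subscales, keeps scores comparable across sessions; a field protocol might instead flag such sessions as missing.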

Case Study 1
As reported above, Group 1 and Group 2 were involved in this study. The ACC handling the en-route traffic had a rostering schedule defined by three eight-hour shifts (morning, afternoon, and night). The control tower responsible for the traffic management of the high-density airport (more than 12,000 movements in 2021, when the study was conducted) had the same rostering schedule. Both ATC centers operate 24 h a day, 7 days a week (24/7).

Participants
Eighteen air traffic controllers (ATCs) from Group 1 and Group 2 (as described above) volunteered to participate in this study. However, after reviewing the collected data, five participants were excluded due to either an excessive amount of missing data in their questionnaires or not having worked at least one shift during each time slot (morning, afternoon, and night). This resulted in a final group of 13 controllers (9 en-route ATCs and 4 TWR controllers; mean age = 48 years, standard deviation = 5.56; 7 females and 6 males). To ensure comprehensive data collection, operators were recruited for the study based on their scheduling plan, with the goal of having each shift covered at least three times.
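The shift-coverage part of the exclusion rule described above (at least one shift worked in each time slot) can be sketched in Python as follows. This is only an illustration: the participant identifiers and shift logs are hypothetical, and the second criterion (an "excessive" amount of missing questionnaire data) is not quantified in the text, so it is deliberately left out.

```python
REQUIRED_SLOTS = {"morning", "afternoon", "night"}


def covers_all_slots(shifts_worked) -> bool:
    """True if the participant worked at least one shift in every time slot."""
    return REQUIRED_SLOTS.issubset(shifts_worked)


def retained_participants(shift_log: dict[str, list[str]]) -> list[str]:
    """Return the identifiers of participants who meet the coverage criterion.

    shift_log maps a (hypothetical) participant identifier to the list of
    shifts for which that participant provided data.
    """
    return sorted(pid for pid, shifts in shift_log.items()
                  if covers_all_slots(shifts))


# Hypothetical shift logs (illustrative only, not study data).
example_log = {
    "P01": ["morning", "afternoon", "night", "morning"],
    "P02": ["morning", "afternoon"],           # no night shift: excluded
    "P03": ["night", "afternoon", "morning"],
}
print(retained_participants(example_log))
```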

Procedure
During a preliminary meeting, the study objectives were explained to the air traffic controllers, and informed consent was obtained. Participants were briefed on the study duration (20 days) and the specific times they would be required to complete the questionnaires and the Psychomotor Vigilance Task (PVT). Additionally, the controllers received comprehensive instructions to enable them to administer the tests independently.

Data Analysis and Results
Compliance with completing the sleep log and end-of-shift questionnaires varied among participants. In some instances, PVT data were missing from either the beginning or the end of a shift. This resulted in missing data points within the final dataset. However, because multiple measurements were collected for each shift, mean values were calculable for most participants across various time points.
To maximize sample size, data from en-route and TWR controllers were initially combined for analysis. However, the following tables present the results disaggregated by subgroup, including means and standard deviations for both en-route and TWR controllers. Analysis of variance (ANOVA), a statistical test used to assess the differences between the means of two or more groups, was employed for data analysis.
Sleep log analysis. Sleep duration and quality were analyzed using a repeated-measures ANOVA design, with work shift (morning, afternoon, and night) as an independent variable. The analysis of sleep duration revealed a statistically significant difference between work shifts (F(2,10) = 32.92, p < 0.001) (Table 1). Post-hoc tests indicated that air traffic controllers reported a shorter sleep duration during night shifts compared to morning and afternoon shifts. The sleep quality analysis showed a similar pattern (F(2,10) = 4.81, p < 0.05) (Table 2).
PVT. A repeated-measures ANOVA design was used to analyze the difference between pre-work and post-work reaction times, with work shift (morning, afternoon, and night) as a factor. No statistically significant differences were found (F(2,10) = 0.33, p > 0.05) (Table 3).
NASA-TLX. The NASA-TLX total score was analyzed using a repeated-measures ANOVA design with work shift (morning, afternoon, and night) as a factor. No statistically significant difference was found (F(2,10) = 0.09, p > 0.05) (Table 4).
MSBS. MSBS scores were analyzed using a repeated-measures ANOVA design with work shift (morning, afternoon, and night) as a factor. No statistically significant difference was found (F(2,18) = 0.02390, p > 0.05) (Table 5).
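For readers less familiar with the repeated-measures ANOVA used in these analyses, the following minimal Python sketch shows how the F statistic for a single within-subject factor (here, work shift) is obtained by partitioning the total variability into shift, participant, and residual components. It is a textbook univariate formulation, not necessarily the exact procedure or software used in this study; the data are made up for illustration, and the p-value (which requires the F distribution) is not computed.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA (univariate approach).

    data: list of rows, one per participant, each holding one score per
    condition (e.g., morning, afternoon, night), with no missing values.
    Returns (F, df_effect, df_error).
    """
    n = len(data)                       # participants
    k = len(data[0]) if n else 0        # conditions
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 participants and >= 2 complete conditions")

    grand_mean = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total sum of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj   # condition x participant residual

    df_effect = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Made-up scores for 4 participants x 3 shifts (illustrative only).
example_scores = [
    [4, 6, 2],
    [6, 6, 6],
    [5, 7, 3],
    [5, 5, 5],
]
f_value, df_effect, df_error = rm_anova_f(example_scores)
print(f"F({df_effect},{df_error}) = {f_value:.2f}")
```

Removing the between-participant variability (`ss_subj`) from the error term is what distinguishes this design from a between-groups ANOVA: each controller serves as his or her own control across shifts.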

Discussion Case Study 1
This case study revealed a high degree of consistency in workload, boredom, and vigilance levels across all shifts (morning, afternoon, and night). Interestingly, vigilance was not impacted by the specific work session within a shift.
While the implemented rostering system appeared to have no significant effect on the measured variables, the night shift appeared to negatively impact sleep duration and quality. It is important to note, however, that some ATCs, in particular TWR controllers, frequently worked double shifts (morning and night). This resulted in some overlap between sleep data reported for the night before a morning shift and the night after a night shift.
The main limitation of Case Study 1 is missing data. Moreover, the case study design, with its rigid scheduling of data collection on specific dates, presented challenges. This top-down approach may have placed an undue burden on the ATCs by requiring them to dedicate time for data collection on specific days, regardless of their work schedule. Therefore, a key takeaway from Case Study 1 is the importance of flexible data collection that reflects the actual distribution of shifts. Instead of a rigid schedule, allowing ATCs to choose the days for completing the tests could potentially lead to more reliable data. For example, data collection could stop once the necessary conditions are met (e.g., collecting data for three days each of morning, afternoon, and night shifts, for a total of nine days, not necessarily consecutive). This approach would empower ATCs to manage the pace of data collection, potentially leading to more reliable and consistent measurements.
Building upon the findings and methodology of Case Study 1, Case Study 2 implemented a more flexible data collection approach, as described above, whilst maintaining the same data collection instruments.

Case Study 2
One high-density airport (more than 75,000 movements in 2022) (Group 3), one low-density airport (approximately 1000 movements in 2022) (Group 4), and one ACC (Group 5) were involved in Case Study 2.
The control tower of the high-density airport (Group 3) had a roster of three eight-hour shifts (morning, afternoon, and night) and operated 24/7, whilst the control tower of the low-density airport (Group 4) had a roster of two eight-hour shifts (morning 07:00-15:00, afternoon 15:00-23:00). No night shift was scheduled at this center, whose operating hours were 16/7. Finally, the ACC (Group 5) handling the en-route traffic had the usual roster of three eight-hour shifts (morning, afternoon, and night) and operated 24/7.

Procedures: Changes from the First Study
This work incorporated the valuable lessons learned from Case Study 1 regarding the research procedures, including study presentation, data collection, and the experimental plan.
Firstly, the study was introduced to participants through individual in-person meetings, where the purpose and methodology were explained in detail. When in-person meetings were not possible, remote meetings were conducted with the same objectives.
Secondly, due to the complexity of the existing rostering system, an ecological approach was adopted for data collection and the experimental plan. This approach empowered ATCs to take ownership of the testing process. They were individually responsible for following the protocol and choosing suitable days within a set period to complete the self-administered tests. Data collection stopped once enough data were obtained from three occurrences each of morning, afternoon, and night shifts (a total of nine days, not necessarily consecutive). This flexible approach allowed ATCs to manage the pace of data collection, ultimately resulting in a reduction in missing data.
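The stopping rule described above (three occurrences of each shift type, not necessarily consecutive) can be expressed as a simple check on each controller's log of completed test days. The Python sketch below is illustrative only: the function names, the shift labels, and the example log are hypothetical, and the protocol's actual tracking mechanism is not described in the text.

```python
from collections import Counter

DEFAULT_SHIFTS = ("morning", "afternoon", "night")


def remaining_sessions(completed, shifts=DEFAULT_SHIFTS, required=3):
    """Return how many more test days are needed for each shift type."""
    counts = Counter(completed)
    return {shift: max(0, required - counts[shift]) for shift in shifts}


def collection_complete(completed, shifts=DEFAULT_SHIFTS, required=3):
    """True once every shift type has been covered `required` times."""
    return all(n == 0 for n in
               remaining_sessions(completed, shifts, required).values())


# Hypothetical log of test days completed so far by one controller.
example_days = ["morning", "night", "morning", "afternoon", "morning", "night"]
print(remaining_sessions(example_days))
print(collection_complete(example_days))
```

For a center with a two-shift roster, such as the low-density tower (Group 4), the same check could be run with `shifts=("morning", "afternoon")`.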

Data Analysis and Results
Similar to Case Study 1, data analysis employed a repeated-measures ANOVA design with "Shift" (morning, afternoon, night) as the independent variable. All measures were treated as dependent variables. This analysis was initially conducted on the entire ATC group to maximize sample size. However, the following tables present the results disaggregated by subgroup (high-density vs. low-density airport and ACC), including the mean and standard deviation for each group. It is important to note that Group 4 (low-density airport) did not contribute data for night shifts due to their specific rostering schedule (see Section 2).
Sleep log analysis. A repeated-measures ANOVA revealed statistically significant effects for both sleep duration (F(2,22) = 32.877, p < 0.001) (Table 6) and sleep quality (F(2,22) = 17.598, p < 0.001) (Table 7) as reported by ATCs in the sleep log. Duncan's post-hoc test indicated that these effects were primarily due to the difference between the night shift and the other two shifts.
Psychomotor Vigilance Task. A repeated-measures ANOVA on the difference in PVT reaction times before and after the shift revealed a significant effect of work shift (F(2,32) = 3.7291, p < 0.05) (Table 8). Duncan's post-hoc test indicated that this difference was primarily due to the night shift, which showed a larger increase in reaction times (delta) compared to the other shifts.
NASA-Task Load Index. A repeated-measures ANOVA on the NASA-Task Load Index (NASA-TLX) scores revealed no significant differences in workload between work shifts (F(2,32) = 0.31734, p > 0.05) (Table 9).
Multidimensional State Boredom Scale. A repeated-measures ANOVA on the Multidimensional State Boredom Scale scores revealed no significant differences in boredom between work shifts (F(2,32) = 1.6551, p > 0.05) (Table 10).
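The reaction-time delta analyzed for the PVT can be illustrated with a short Python sketch that pairs the pre-shift and post-shift measurements of each work session and summarizes the differences by shift. Two assumptions are made here and are not taken from the text: the delta is computed as post-shift minus pre-shift (so that positive values indicate slowing), and sessions lacking either measurement are skipped. The reaction times below are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_delta_by_shift(sessions):
    """Median change in PVT reaction time (ms) per shift type.

    sessions: iterable of (shift, pre_rt_ms, post_rt_ms) tuples, where a
    missing measurement is given as None.
    Assumption: delta = post - pre, so a positive value means slower
    responses at the end of the work session.
    """
    deltas = defaultdict(list)
    for shift, pre_rt, post_rt in sessions:
        if pre_rt is None or post_rt is None:
            continue  # a delta needs both measurements of the session
        deltas[shift].append(post_rt - pre_rt)
    return {shift: median(values) for shift, values in deltas.items()}


# Made-up reaction times in milliseconds (illustrative only).
example_sessions = [
    ("morning", 310, 315),
    ("morning", 300, 320),
    ("morning", 305, None),   # post-shift test not performed
    ("night", 320, 360),
    ("night", 310, 330),
    ("night", 330, 380),
]
print(median_delta_by_shift(example_sessions))
```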

Discussion Case Study 2
Field studies, especially those relying on self-reported data, often involve balancing completeness, methodological rigor, and ecological validity. Whilst ATCs generally complied with data collection requests, unforeseen events like illness or training absences impacted participation. Despite adjustments based on lessons learned from Case Study 1, some questionnaires remained incomplete and some tests were not performed, resulting in missing data within the final dataset.
Case Study 2 confirmed high levels of consistency in workload and boredom results across all shifts (morning, afternoon, and night). Vigilance was not affected by the specific work shift when considering the morning or afternoon shifts. However, a slight decrease in vigilance was observed during the night shift. Similar to Case Study 1, sleep duration and quality reported by ATCs varied according to shift, with the night shift showing the most significant difference compared to the other two. Incomplete sleep log data, however, limited the strength of this finding.
Group 4 (low-density airport) displayed lower boredom and mental workload compared to the other groups. Whilst this subgroup represented a small number of ATCs (8 out of 10 at the airport), it nonetheless reflects the operational staff currently employed there.
The primary difference between this group and the others lies in their two-shift rostering system, which excludes night shifts entirely.

General Discussions and Conclusions
Shift work is a defining feature of air traffic controllers' lives, potentially exposing them to increased fatigue and associated risks, such as performance decline, which could affect safety. These two case studies investigated fatigue in air traffic controllers through an agile and user-friendly "toolkit" measuring sleep duration and quality, mental workload, boredom, and vigilance. The observed consistency in workload and boredom results across all shifts (morning, afternoon, and night) is encouraging. This suggests that the current rostering system may not significantly impact these aspects.
Whilst self-reported data like workload and boredom can be subjective, a slight decrease in vigilance during the night shift warrants further investigation. Building upon established methods, this study incorporates the PVT as an objective measure to complement the subjective data. The PVT is a validated test that measures reaction time to visual cues, serving as a sensitive indicator of sleepiness, fatigue, and sleep disruption. Implementing the PVT alongside subjective measures provides a multifaceted approach to understanding fatigue risk in ATC. Research, such as the study by Peukert and Meyer [13], has documented detrimental effects of night shifts on vigilance for various shift workers, including nurses [14]. Their work, along with the PVT results from this study, highlights the need for further investigation into night shift scheduling and its impact on ATC vigilance.
The most critical limitation identified in these case studies is the inconsistent compliance of operators in completing questionnaires and performing the PVT. Whilst a flexible approach was attempted, it appears a more structured strategy might be necessary.
To improve ongoing data collection, several strategies could be considered:
• Introducing specific training to educate ATCs on self-monitoring techniques for fatigue and on the importance of data collection.
• Exploring the collection of data through each controller's personal device to ensure everyone receives tests at the beginning and end of their shift, reducing the chance of forgetting.
• Rewarding the ATCs' efforts to some extent (e.g., an extra day off) to increase compliance. This benefit could mitigate the somewhat superficial attitude of some participants.
• Emphasizing the importance of the ATCs' contribution to safety, specifically highlighting the value of their data for improving fatigue monitoring systems and ultimately enhancing safety within ATC operations.
Despite these limitations, the information obtained from these two case studies provides valuable insights for developing minimally invasive fatigue monitoring systems, which can provide systematic information on the functional status of air traffic controllers. This information can be used to optimize rostering systems and mitigate fatigue risks, ultimately improving safety within air traffic control operations.

Table 1. Case Study 1: Average sleep duration by shift and group in minutes (standard deviations in brackets).

Table 2. Case Study 1: Average sleep quality by shift and group (standard deviations in brackets). The scale range was 1 (lowest) to 5 (highest).

Table 3. Case Study 1: Median PVT reaction time differences (pre-post working session) by shift and group (ranges in brackets).

Table 4. Case Study 1: Average NASA-TLX score by shift and site (standard deviations in brackets). The scale range was 0 (lowest) to 100 (highest).

Table 6. Case Study 2: Average sleep duration by shift and group in minutes (standard deviations in brackets).

Table 7. Case Study 2: Average sleep quality by shift and group (standard deviations in brackets).

Table 8. Case Study 2: Median PVT reaction time differences (pre-post working session) by shift and site (ranges in brackets).

Table 9. Case Study 2: Average NASA-TLX score by shift and site (standard deviations in brackets). The scale range was 0 (lowest) to 100 (highest).

Table 10. Case Study 2: Average MSBS score by shift and site (standard deviations in brackets). The scale range was 1 (lowest) to 7 (highest).