Fatigue in Emergency Services Operations: Assessment of the Optimal Objective and Subjective Measures Using a Simulated Wildfire Deployment

Under controlled laboratory conditions, neurobehavioral assays such as the Psychomotor Vigilance Task (PVT) are sensitive to increasing levels of fatigue, and in general, tend to correlate with subjective ratings. However, laboratory studies specifically curtail physical activity, potentially limiting the applicability of such findings to field settings that involve physical work. In addition, laboratory studies typically involve healthy young male participants that are not always representative of a typical working population. In order to determine whether these findings extend to field-like conditions, we put 88 Australian volunteer firefighters through a multi-day firefighting simulation. Participants were required to perform real-world physical and cognitive tasks under conditions of elevated temperature and moderate sleep restriction. We aimed to examine changes in fatigue in an effort to determine the optimum objective and subjective measures. Objective and subjective tests were sensitive to fatigue outside laboratory conditions. The PVT was the most sensitive assay of objective fatigue, with the Samn-Perelli fatigue scale the most sensitive of the subjective measures. The Samn-Perilli fatigue scale correlated best with PVT performance, but explained a small amount of variance. Although the Samn-Perelli scale can be easily administered in the field, the wide range of individual variance limits its efficacy as a once-off assessment tool. Rather, fatigue measures should be applied as a component of a broader fatigue risk management system. Findings provide firefighting agencies, and other occupations involving physical work, guidance as to the most sensitive and specific measures for assessing fatigue in their personnel.


Introduction
Fatigue arises as a result of task related factors (e.g., task demand), inadequate sleep, circadian misalignment and extended time awake [1,2]. Management of fatigue-related risk in operations that use long hours and/or night work requires multiple levels of control [3][4][5]. While almost all fatigue mitigation strategies involve organizational controls such as setting maximum limits on work hours or minimum rest break requirements, others may incorporate controls at the level of the individual. In order to identify instances of increased risk, both objective fatigue monitoring and subjective fatigue reporting can be used [5]. These methods are based on the premise that in situations where sleep is inadequate, cognitive performance decreases and subjectively, people report they are tired or fatigued [5][6][7]. Assessment of fatigue levels to identify increased fatigue-related risk therefore requires measures that are sensitive to changes across time and under a range of conditions. Neurobehavioral assays such as the Psychomotor Vigilance Task have consistently shown to be sensitive to increasing levels of fatigue, demonstrated by numerous laboratory studies [8,9]. In addition, objective performance correlates reasonably well with subjective ratings of fatigue [10]. However, laboratory studies specifically curtail physical activity, potentially limiting the applicability of findings to field settings that involve physical work. Akerstedt and colleagues [11] reported that reductions in ratings of sleepiness are dependent on context, including "free activity". Thus, if physical activity masks subjective feelings of sleepiness or fatigue, self-reports may not be appropriate in occupational settings that involve physical work. Assessment of measures of fatigue in the presence of physical activity and personal protective equipment (e.g., helmets, gloves, jackets, boots) is an important current gap in the literature.
The current study examined changes in fatigue measured using both objective and subjective tests in a simulated wildfire deployment. The environmental and occupational stressors experienced during wildfire suppression operations are associated with elevated likelihood of fatigue-extended work shifts, night work, inadequate sleep opportunity and inadequate sleeping environments, with implications for health and safety of personnel [12,13]. The simulation involved real-world physical tasks that are performed routinely by Australian tanker-based firefighters, under conditions of elevated ambient temperature and moderate sleep restriction. The aims were to determine the most sensitive objective and subjective measures to fatiguing conditions, in a population of active firefighters. Findings will provide firefighting agencies, and other occupations involving physical work, guidance as to the most sensitive and specific measures for assessing fatigue in their personnel.

Participants
A total of 88 healthy active volunteer rural fire fighters (Males, N = 77, Females; N = 11) took part in the study. The participants had an overall mean (˘SD) age of 38.42˘14.42, and an average body mass index (BMI) of 27.8 kg/m²˘4.53, reflective of Australian bush firefighters [14]. Participants, who volunteered to take part in the study, were recruited from various state and territory rural fire agencies across Australia. Exclusion criteria included current or pre-existing injury or condition preventing performance of fire ground duties, diagnosed sleep disorder, and pregnancy. Participants were randomly assigned to either the normal sleep/cool (n = 29), normal sleep/hot (n = 20), sleep restricted/cool (n = 26), or sleep restricted/hot (n = 13) condition (see Section 2.2). Participants self-selected suitable dates for testing but were not aware of the condition ahead of testing. A total of 21 study trials were conducted across three locations. The number of participants in each study trial ranged from 2 to 5. A total of 98 participants began the trials, however 10 voluntarily withdrew at various points in the trial (these participants were excluded from the analysis).

Procedure
We developed a laboratory-based simulation of a fire-ground tour that incorporated three consecutive 12-h day shifts [15]. The study protocol spanned four days, including a study briefing, familiarisation of the tasks, and adaption to sleeping conditions (stretcher bed) on the evening prior to testing (arrival 6 pm), and a morning testing session on day four. Participants lived in a simulated environment for the duration of the study and remained inside except when using external amenities. Refer to Figure 1 for an outline of the protocol. All components of the simulation, including day and night temperatures, sleeping environment, as well as physical and cognitive test batteries were designed in conjunction with subject matter experts and using field data [15]. testing (arrival 6 pm), and a morning testing session on day four. Participants lived in a simulated environment for the duration of the study and remained inside except when using external amenities. Refer to Figure 1 for an outline of the protocol. All components of the simulation, including day and night temperatures, sleeping environment, as well as physical and cognitive test batteries were designed in conjunction with subject matter experts and using field data [15].  Participants completed 15, two-hour testing sessions over 4 days. Each session consisted of 55 min of physical work designed to mimic fire-ground tasks (see Section 2.3), followed by physiological testing lasting 20 min, and a cognitive battery lasting 20 min. Testing sessions were completed in the participants own firefighting personal protective clothing throughout the simulation. This included a two-piece jacket and trouser set made from Proban cotton fabric (Solvay, Brussels, Belgium), suspenders, boots, gloves, helmet, and goggles (amounting to~5 kg). On completion of each session, participants had a 15-30 min break before beginning the next session. Participants were not required to wear any protective equipment during meal breaks, physiological testing, between testing sessions, or during free time. Cognitive testing sessions were completed in full personal protective equipment with the exception of gloves.
All participants were given an 8 h sleep opportunity (10:00 pm-6:00 am) on the adaptation night (night 1), and the recovery night (night 4). In the normal sleep/cool and normal sleep/hot conditions, participants were also given 8 h sleep opportunity on nights 2 and 3. However, in the sleep restricted/hot and sleep restricted/cool conditions, participants were given 4 h sleep opportunity (2 am-6 am) on nights 2 and 3. Ambient temperature for participants in the normal sleep/cool and sleep restricted/cool conditions was kept to 18-20 Degrees Celsius (C) during the day and night, For the sleep restricted/hot and sleep restricted/cool conditions, the temperature was raised to 33-35 C during the day (between 6 am-6 pm) and lowered to 20-23 C during the night (6 pm-6 am). Temperature was monitored using HOBOware Pro Software, and three ZW-003 Temperature Data Nodes (Onset Computer Corporation, Bourne, MA, USA) positioned around the room. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Central Queensland University (H12/01-016) and Deakin University Human Research Ethics Committees (210-170).

Physical Tasks
Participants were required to complete a series of six physical tasks designed to simulate real-world fire-fighting [16] during each 55 min circuit. During a testing circuit, each firefighter was placed on a specific work-rest schedule, and rotated through a series of 5-min tasks. Tasks included dragging a weighted tyre, raking debris, different forms of walking with a weighted hose whilst avoiding obstacles, holding a weighted hose rake in a static position, and rolling up a 25 m fire hose to operational standard (see paper by Vincent et al., [17], that resulted from the outcomes of the same study for details relating to the physical tasks).

Cognitive Tasks
The cognitive test battery included tasks that require both lower order (e.g., reaction time) and higher order tasks (e.g., requiring executive function) to tap a range of cognitive functions used by personnel during incident response [15] and reported to be variously sensitive to fatigue in laboratory settings (see below).
Psychomotor Vigilance Task (PVT): The PVT assesses vigilance and reaction time and is sensitive to the effects of sleep loss, time-on-task and time-of-day [8,18,19]. While longer tests are commonly used in the laboratory, a validated 5-min version, run on a palm pilot, was used due to time constraints [18]. PVT variables used in this study include mean reciprocal reaction time calculated by 1/RT*1000 [6,8], and lapses: which classifies RT ě500 milliseconds as a lapse in attention.
Memory Task: Participants were given 20 s to read an authentic emergency message on a handheld pager device (SAGRN Samsung SFA-170 Pager, Samsung). After a 5-min gap (when they were completing the PVT), participants were given 2 min to recall three aspects of the message that differed each time (either the incident number, location of incident, type of incident, and map reference). The maximum score that could be achieved for each testing session was 3 (1 point given for each aspect recalled).
Stroop Colour Word Test: Participants were required to respond to two different stimuli over two different tests-a "matching colour word", and a "non-matching colour word" task [20,21]. Each component lasted 2 min. In the matching colour-word condition, participants were presented with a series of coloured words (e.g., red blue, yellow and green) displayed on a black computer screen. Participants were instructed to select the colour of the text/font by pressing the corresponding colour key on a colour coded computer keypad. In the non-matching colour word condition, the colour of the word and the word were different (e.g., the word red was written in green font). Participants were required to select the colour (font) of the word, and not the word itself. This second task produces a phenomenon known as the "stroop effect", where response time is delayed in the second part of the task. That is, there is a cognitive trade-off between speed and accuracy. Percentage correct and reaction time for correct responses were used as the dependent variables for this test.
Go-No-Go: The Go-No-Go is a task that requires the participant to inhibit their responses. In this task, the participant must differentiate between responding to "go" stimuli, interspersed with "no go" (no response required) stimuli which have a lower rate of presentation frequency (see [22]). The Go No-Go task presented four shape stimuli in the center of the screen for a period of 200 ms, followed by a black screen interval of 1300 ms, three of the shape stimuli were respond or "go stimuli" and one was no response or "no-go", stimuli in which they still had time to respond. Each test went on average for 4 mins 35 s, with 181 images shown and 63% (median of 61.88%-68.3%) of these consisting of "go" images. Participants were instructed by researchers to press the space bar as quickly as possible, with their dominant index finger, right or left, in response to the "go images", and then refrain from pressing the space bar when the no-go images appear, Percentage (%) correct: Stimuli responded to that were "go" stimuli; and stimuli left for 1.5 ms that were "no go" stimuli. Go-No-Go was measured as percentage correct, and reaction time.
Occupational Safety Performance Assessment Test (OSPAT): The OSPAT is an unpredictable tracking task that assesses hand-eye coordination, sustained attention and reaction time, and is widely used as a fitness for duty measure in a variety of industries (see [23]). For this task, participants were instructed to keep the computer cursor positioned on the target displayed on the screen as much as possible for the duration of the test. This task lasted 60 s. A global performance measure for each test is determined by summing the "error" distance between the cursor and target and the rate at which the subject adapted to the random changes. This measure indicated how "well" the subject performed on the task. Scores typically fall between 10 and 20 with higher scores indicating better performance. The algorithm used to determine the dependent measure derived from OSPAT cannot be disclosed as it is subject to commercial confidence.

Subjective Measures
Prior to cognitive testing, participants were required to assess their own level of fatigue using the Samn-Perelli Fatigue Scale, where 1 = fully alert, wide awake and 7 = completely exhausted, unable to function effectively [24]. Participants were also required to rate their motivation to perform the next cognitive battery, alertness at that time point, and predicted performance on the next cognitive battery using visual analogue scales [25]. That is, they responded to the question "How motivated are you to perform well?" by placing a single stroke at the applicable point along a 100 mm line, with the anchors "not motivated at all" at the left-hand end and "very motivated" at the right-hand end; "How alert do you feel?" with anchors "not alert at all" on the left and "very alert" on the right; and "How well do you think you will perform?" with anchors "very poorly" on the left, and "very well" on the right.

Objective Measures
To assess the objective measure over the 15 sessions of the study that best reproduced the true fatigue of the subjects, we made a fundamental assumption. There were several days of testing, and several sessions within each day (see Figure 1). We used the very last session prior to a recovery period (Circuit 13), and assumed that this was a period of maximum fatigue for most subjects. Conforming to this assumption, mean performance, as a composite of all standardised objective and subjective measures, was worst on this day. Baseline (Circuit B) and Recovery (Circuit 14) sessions had the highest mean performance, and were used as comparison sessions.
Our data-analytic approach, therefore, became to find the best measure that showed maximum responsiveness to demonstrating fatigue in this test condition, relative to the comparison conditions. We went about this analysis in two ways: In the first step, we analysed which measure showed maximum responsiveness to the change in performance from the first "baseline" measure where participants were not fatigued, to the final session where they experienced the most fatigue. Next, we repeated this analysis to determine which measures showed the maximum responsiveness from this test condition of maximum fatigue to the recovery period on the following day.
As a proportion, Go-No-Go Percentage Correct was arcsin transformed to yield an approximately normal response. All other measures of objective fatigue were approximately normal, except for some moderately extreme values that had the potential to bias the analyses. A lower-bound threshold for OSPAT was set at´8 (below which scores were set to´8), whilst upper-bound thresholds (above which, scores were set to the threshold), Stroop Reaction Time (1.5) and PVT Lapses (2). Thresholding affected less than 1% of OSPAT, Stroop Reaction Time, and PVT Lapses scores. These transformations had the effect of making the test of differences more conservative by stabilizing the variance of the measures.
In the study, subjects were assigned to conditions of normal sleep/cool, sleep restricted/cool, normal sleep/hot, or sleep restricted/hot. Although all subjects were subject to fatigue, our assumption is that the sleep restricted/hot conditions produced a particular stress on subjects, and the other conditions should have a variable (and expected lessor) impact on the experience of fatigue. Therefore, we also included in our analysis a between-subjects consideration of how condition assignment affected these fatigue change scores. All change-score residuals conformed to the Shapiro-Wilk test of normality, except for Stroop reaction time (W = 0.965, p = 0.016). Inspection of the residual distribution confirmed that it was symmetrical and mildly platykurtic. Therefore, this violation of assumptions was not treated as a concern. Table 1 shows the result of ANOVA models applied to each change measure, with condition (treated as a single 4 level factor) as the independent variable. Notably, we excluded an intercept term in the model to assess the significance of the deviation of each score from 0 (i.e., no change from baseline). Panel A in Table 1 shows the results for changes from the baseline measure to the test condition with maximum fatigue. The PVT Mean Reaction Time shows the strong performance amongst the measures by demonstrating a strong deviation from the null hypothesis of no change (r 2 " 0.41). Go-No-Go reaction time deviated slightly more from baseline (r 2 " 0.48). However, the PVT Mean Reaction Time shows stronger responsiveness than the Go-No-Go to the sleep restricted/hot manipulations compared to the other less stressed conditions. Table 1 Panel B shows the changes from the test condition with maximum fatigue to the recovery period the following day. The PVT Mean Reaction Time shows the best performance by demonstrating maximum change (r 2 " 0.40), as well as the greatest absolute change for the sleep restricted/hot condition of subjects that experienced the greatest stress. In this case, Go-No-Go reaction time was markedly less sensitive overall (r 2 " 0.18).

Subjective Measures
The analysis outlined above was repeated for the 4 subjective measures: Fatigue (Samn-Perelli Fatigue Scale), Alertness, Motivation, and Pre-Performance. The purpose of this analysis, as outlined in Table 2, was to find the subjective measure that showed the greatest responsiveness to our experimental session with the highest induced fatigue by comparing it with the baseline (see Table 2, Panel A) and the recovery session (see Table 2, Panel B). Moreover, as with the objective measures, we assumed that the best measure would show greatest responsiveness, in terms of a change score, for the sleep restricted/hot condition. This analysis showed that all measures of subjective fatigue performed reasonably well in detecting deviations from baseline to test. However, the Samn-Perelli Fatigue Scale measure yielded the greatest variance in both the change in baseline to test (r 2 " 0.60) and the change in test to the recovery period (r 2 " 0.70). Similarly, all measures of subjective fatigue showed the relative best performance in being sensitive to changes for those participants in the most stressed sleep restricted/hot condition. However, the Samn-Perelli Fatigue Scale subjective measure was again superior (β = 0.90) in terms of sensitivity to this condition relative to other conditions and other measures.

Subjective versus Objective Measures
A final analysis was conducted in which we compared the subjective measures in terms of correspondence with objective PVT Mean Reaction Time scores. The premise of the analysis was that an effective subjective measure should be more strongly associated with the preferred objective measure of fatigue. Linear mixed effects models were applied, with a random intercept for between-subjects variation, and fixed effects for the subjective measures. Data from the baseline, test, and recovery sessions were included in the analysis. Table 3 (Models 1-4) summarizes the results of a stepwise backward variable selection process, in which the worst performing indicator was progressively eliminated from an original set including Samn-Perelli Fatigue Scale, Alertness, Motivation and Pre-Performance. Models 4-7 allow comparison of each of the subjective indicators in isolation. Samn-Perelli Fatigue Scale was selected by the stepwise regression procedure as the best single predictor of PVT Mean Reaction Time scores. Samn-Perelli Fatigue Scale individually predicted 11.5% of the variance in PVT Mean Reaction Time scores. Although this effect size is not particularly high, it is nevertheless substantially higher than other subjective measures tested. Of the subjective measures, Samn-Perelli Fatigue Scale was most responsive to the experimental manipulations, and most highly correlated with the corresponding objective measure. We therefore concluded that Samn-Perelli Fatigue Scale was the most effective subjective measure of fatigue.

Discussion
Under conditions designed to increase fatigue, and in the presence of physical work tasks, the PVT was the most sensitive assay of fatigue in firefighters working a simulated deployment. The Samn-Perelli fatigue scale was the most sensitive of the subjective measures, and was the most highly correlated with PVT performance. The findings have both theoretical and practical implications.
The majority of laboratory studies examining changes in cognitive performance under conditions of sleep restriction minimize or eliminate physical activity. The current analysis however, demonstrates changes in PVT performance even in the presence of intermittent physical activity. This suggests that while some masking may be occurring [11], physical activity does not eliminate performance decrements associated with fatigue. In addition, the PVT was the most sensitive objective test of fatigue in this simulation, reinforcing the idea that simple and monotonous tasks such as vigilance are most susceptible to fatigue [26,27]. It is likely that the 10-min version of the PVT would show even greater sensitivity to fatigue. In addition, the PVT is a monotonous task requiring sustained attention.
The level of fatigue elicited by the protocol may not have been sufficient to impact higher order tasks in the same manner. Lastly, the Samn-Perelli fatigue scale mirrored PVT performance more accurately than the visual analogue scales measuring alertness, motivation and performance prediction.
In terms of practical implications, one of the challenges in reducing fatigue-related accidents in occupational settings is to identify instances of elevated risk [3,4]. This requires assessment tools that reliably signal elevated fatigue before performance is impaired to the point of error. Objective and subjective tests that are sensitive to fatigue in all conditions, including in the presence of physical activity and other external stressors, may be applied as part of a fatigue risk management system. The ease with which the Samn-Perelli scale can be administered in the field, compared to objective performance tests, suggests it as the preferable option in such a system. However, the design and analysis of the current study allowed us to control for individual differences, something that is not possible in occupational settings. An alternative method is to use subjective scales in a repeated manner to determine changes within an individual, and/or in combination with other measures of performance.
It is important to note that the current work involved only day shifts and a moderate level of sleep restriction. Under more extreme conditions, including night work, the relationship may not hold. It is also relevant to note that: (1) The fatigue scale, although the most reliable subjective measure, only accounted for a small percentage of the variance; and (2) Our analysis did include several other subjective assessments of performance, none of which correlated well with PVT performance. This suggests that the specific questions asked of individuals in the field are important and should focus on tangible states such as those listed in the Samn-Perelli scale. While other subjective scales could be applied in both experimental and operational settings, the choice of the Samn-Perelli scale in this study was made on the basis of the specific descriptors. Fatigue as a concept is increasingly familiar to personnel in occupations that involve extended shifts or night/shift work. While debate continues as to the use of terms that describe sleepiness, fatigue, tiredness and alertness, the use of a fatigue scale to assess fatigue with the goal of developing more robust fatigue risk management scales was the logical choice.
While the aim of this study was to identify the most sensitive measures of fatigue in the presence of physical work, the generalizability of the findings to firefighter performance is an obvious question. The performance of the physical tasks during the simulation was not impacted by sleep [17], but reaction time and vigilance are both important faculties for preserving safety on the fireground. Therefore, the finding that PVT performance declines under certain conditions has practical implications, and indicates that at a moderate level of fatigue, lower order elements of cognitive performance are impacted. A next step in the research would involve monitoring of firefighting performance in live-fire scenarios.

Conclusions
The study found that neurobehavioral performance and subjective rating were impacted as expected under conditions likely to elicit elevated levels of fatigue, in spite of the repeated bouts of physical activity. This finding is important as the majority of studies examining performance changes in response to fatiguing conditions are conducted in the laboratory in the absence of physical activity, whereas high-risk occupations do involve physical work (e.g., emergency services, mining, military etc.). Importantly, the Samn-Perelli fatigue scale most strongly correlated with changes in cognitive performance, suggesting it may be a useful self-assessment tool in certain occupational settings. These findings need to be tested further in live-fire scenarios and also during night shifts.