Severity-Mapped Vibrotactile Cues to Support Interruption Management with Weather Messaging in the General Aviation Cockpit

Abstract: Despite the increasing availability of technologies that provide access to aviation weather information in the cockpit, weather remains a prominent contributor to general aviation (GA) accidents. Pilots fail to detect the presence of new weather information, misinterpret it, or otherwise fail to act appropriately on it. When cognitive demands imposed by concurrent flight tasks are high, the risks increase for each of these failure modes. Previous research shows how introducing vibrotactile cues can help ease or redistribute some of these demands, but there is untapped potential in exploring how vibratory cues can facilitate “interruption management”, i.e., fitting the processing of available weather information into flight task workflow. In the current study, GA pilots flew a mountainous terrain scenario in a flight training device while receiving, processing, and acting on various weather information messages that were displayed visually, in graphical and text formats, on an experimental weather display. Half of the participants additionally received vibrotactile cues via a connected smartwatch with patterns that conveyed the “severity” of the message, allowing pilots to make informed decisions about when to fully attend to and process the message. Results indicate that weather messages were acknowledged more often and faster when accompanied by the vibrotactile cues, but the time after acknowledgment to fully process the messages was not significantly affected by vibrotactile cuing, nor was overall situation awareness. These findings illustrate that severity-encoded vibrotactile cues can support pilot awareness of updated weather as well as task management in processing weather messages while managing concurrent flight demands.


Introduction
Despite the increasing availability and affordability of technologies that enable general aviation (GA) pilots to receive weather information in the cockpit [1,2], weather remains a primary factor contributing to many fatal GA accidents in the US and abroad [3]. Weather conditions can change unexpectedly, and most GA weather-related accidents occur when pilots inadvertently transition from visual flight rules (VFR) into instrument meteorological conditions (IMC) [3], which can result in spatial disorientation and increased risk of losing aircraft control [4]. There are a number of potential reasons why increased access to weather information fails to improve weather-related flight decision-making.
One potential reason for the continued problems in processing weather may be that these technologies introduce too much information, or at least too much in a particular format [2]. "Data overload" problems such as these are often the target of human factors research [5,6], describing when the cognitive resources have been exceeded by the information processing demands. Additionally, pilots may not be aware of the availability of new weather information, having failed to be alerted via the technologies; they could receive but misinterpret the information, or they could process the information fully yet not apply the information appropriately to decisions regarding the flight plan [1,3,7]. Whether the [...]

Table 1. Example weather messages received in the cockpit [11].
Message Acronym Explanation

While in flight, pilots can be challenged at times to fit the tasks of monitoring routine weather reports (e.g., Aerodrome Routine Meteorological Reports (METARs)) as well as processing any unscheduled weather messages (e.g., Significant Meteorological Information (SIGMET)) into their workflow while concurrently performing other flight-related tasks. Under normal flight conditions, there is flexibility in when and how pilots attend to these incoming messages. For example, pilots can choose to listen to continuously looped recordings of routine reports by tuning to a particular radio channel and can call ATC to read out or clarify any unscheduled messages. This flexibility supports some degree of pilot "interruption management", an important concept to consider in aviation human factors [7,13,14] and a topic of research in a variety of work domains that require humans to divide attention and information-processing resources among multiple tasks that overlap in time [15-18].
Given that current cockpit technologies support communicating weather messages in multiple media formats (e.g., visual and/or auditory), there is an opportunity to further support pilots' interruption management by spreading some aspects of the messaging to information processing channels, such as the tactile channel, that are more "available" for the pilot, given the demands of concurrent flight tasks [12]. For example, pilots can listen to auditory representations of the messages when visual demands of flight tasks are high, and/or can display a message as visual text, which can often be faster to process and allows the message to be persistent (rather than the transient nature of the auditory communication). Persistent visual displays can be sampled as part of pilots' scan patterns, reducing the demand on working memory resources by providing an easy means of referencing specific details in the message.
Especially when adverse weather develops, a pilot's cognitive resources will be in high demand for other flight tasks (e.g., aviating, navigating, and communicating). Therefore, it is critical that weather messages be announced with salient cues that reliably capture attention even when visual and auditory resources are engaged in other flight tasks [19]. However, capturing attention does not always support effective interruption management, as it can result in an automatic reorientation of pilots' cognitive resources when reorientation may not be warranted. This can be detrimental to overall flight safety if other, more safety-critical ongoing tasks are interrupted. In this way, having a bright and colorful (thus highly salient) graphical display of weather information in a prominent cockpit location can inappropriately draw attention and even lead to attentional tunneling, in which pilots attend increasingly to the graphical representation at the cost of normal instrument scanning and outside-the-cockpit visual references, leading to a reduction in flight situational awareness [20]. Under the higher levels of cognitive workload that pilots can experience in adverse weather, scan patterns are often more rapid and irregular [21,22], which increases the chances for pilot error [23]. Thus, it is important to consider non-visual means of announcing the availability of new weather information so that pilots are aware of the message and can process it fully when flight-related workload and workflow allow.
Aviation, much like a number of visually, auditorily, and cognitively demanding domains, can benefit from offloading some of the more heavily demanded resources by redistributing messages so they can be processed by relatively available perceptual and cognitive channels [12,24]. The sense of touch represents an underutilized display channel in the cockpit, and according to human information processing theory [12], engaging this channel can improve multitasking performance when visual and/or auditory resources are in high demand for concurrent tasks [5,24]. Previous research has investigated how these so-called "tactile" displays might be introduced into the cockpit to improve spatial awareness [25,26], improve awareness of cockpit automation behavior [27], and to guide attention to visual cockpit displays [28]. An example of a haptic display integrated into the cockpit is the "stick shaker" that warns pilots of impending stall conditions. Tippey et al. [29] were the first to evaluate how vibratory cues presented via smartwatches could be used to improve GA pilots' reception of new weather messages, showing better detection and faster responses to these messages on a visual display when they were announced by a vibratory cue.
One key limitation of the Tippey et al. [29] study was that the vibratory cues were salient (thus effectively capturing attention), but the vibration patterns that mapped to the different types of weather messages were not easily distinguishable due, in part, to the hardware limitations of the smartwatch chosen for that study. An interesting finding was that the vibration patterns were less distinguishable when pilots experienced higher cognitive workload related to flight tasks [29]. This led to pilots reliably receiving weather messages, but because interruption management was not well supported (pilots were not able to make informed decisions about whether and when to shift attention to the incoming message), concurrent task processing could be disrupted and overall flight safety may be impacted negatively [29].
Atmosphere 2021, 12, 341
To improve upon the Tippey et al. [29] study, efforts were made in the current study to meaningfully encode the "severity" of an incoming weather message (partial information that can be used to inform whether and when to reorient attention to process the full message) into dimensions of a vibrotactile cue that is highly distinguishable and intuitively interpretable. Previous research has shown how vibrotactile displays can support interruption management by conveying task-relevant information via vibrotactile patterns that require minimal cognitive engagement to interpret [5,16].
Following the design requirements of creating a set of vibratory signals that are maximally distinguishable and identifiable under varied workload, Roady [30] conducted a series of studies with vibratory cues which were varied according to signal intensity (low, medium, and high gain), frequency, rhythmicity (straight cadence vs. syncopation), and dynamism ("melodic" vibration patterns that changed constantly with time vs. those with relatively static levels in vibratory display dimensions). A very large set of generated patterns was evaluated in a controlled experiment that manipulated the imposed workload in an aviation-like task environment (NASA's Multi-Attribute Task Battery; [31]), ultimately resulting in the final selection of three patterns that maximized "perceptual distance" and were identifiable as low, moderate, or high severity with high accuracy [30].
The current study applied the vibratory patterns designed in Roady [30] to a considerably more complex flight environment, testing the effectiveness of cues presented via a smartwatch for supporting pilot interruption management in the reception and processing of weather messages. The evaluation context was a flight scenario that imposed a range of workloads from very low to very high as weather and visibility degraded over the course of the flight. Pilots that received the vibratory cues paired with incoming weather messages had the opportunity to infer the severity of the message by the encoded vibratory pattern and to use that information in their decision making about whether and when to reorient attention and information processing resources to process the full message. It was expected that the pilots who received these vibratory cues would be more likely to receive and faster to acknowledge the arrival of the message, but that the time to process the full message would depend on the scenario-imposed task load relative to the interpreted severity. For example, a message that is announced with a vibratory cue conveying "moderate" severity may be processed immediately (or faster) when other flight-related demands are low, but may appropriately show longer response times when flight task demands are high.
The findings of this study provide further evidence of the benefits of integrating vibrotactile cues to support multitasking performance and safety in visually and/or auditorily demanding work contexts. Aviation is a domain that has historically welcomed haptic and tactile displays (the "stick shaker" stall warning is a great example), and the introduction of vibrotactile cues that can be reliably interpreted and differentiated can lead to further improvements in flight safety by better supporting flight management when encountering adverse weather.

Experiment
Thirty-six general aviation pilots participated in the study (TAMU IRB approval # 2014-0154D), which took place in a flight training device at the Federal Aviation Administration (FAA) William J. Hughes Technical Center (WJHTC) in Atlantic City, NJ. Participants were at least 18 years old, held an active Private Pilot License (PPL), and had flown in the previous 6 months. The reported mean age for 32 of the participants (4 participants' biographical data were missing) was 54.2 years old (min = 19 years, max = 80 years, standard deviation = 16.9 years). The pilots had varying levels of flight experience, with a mean of 1102.56 flight hours (min = 100, max = 5500, median = 600, standard deviation = 1253.68 h). They also had a mean of 87.7 instrument flight hours (min = 0, max = 500, median = 20, standard deviation = 136.6 h). This research investigated the effectiveness of severity-mapped vibratory cues delivered via a smartwatch to improve pilots' acknowledgement and response to weather messages in a simulated flight scenario. Additionally, situation awareness was assessed via periodic question probes to determine the extent to which cues may have distracted or disrupted concurrent flight-related activities.

Experimental Variables
The primary independent variable investigated was whether or not vibratory cues accompanied the incoming weather messages. This variable was handled as a between-subjects factor, with participants divided into "Vibration" (which received coded vibratory cues with each weather message (WM) arrival; see Section 2.3) and "No Vibration" groups, each with 18 participants.
In this study, pilots' performance data were collected with regard to reception of weather messages, decision making, and situation awareness. Dependent measures were associated with the presentation of coded weather messages (WMs) (see Section 2.3 for more information on these messages) and Situation Awareness Probes (SAPs).
The variable Acknowledgment Rate (AR) represented the proportion of presented cues that participants "acknowledged", which was evidenced either by verbal response (e.g., "I see that I have a new weather message") or by another observed action that followed directly from that message (such as pressing a button to read the message text or calling air traffic control for clarification). The AR variable was calculated separately for WMs and SAPs, coded as Weather Message Acknowledgment Rate (WM.AR) and Situation Awareness Probe Acknowledgment Rate (SAP.AR), respectively. In the small number of cases in which flight-related decisions led to the scenarios ending early (e.g., calling air traffic control (ATC) and requesting changes to the flight plan, such as turning around or diverting to another destination), some late-scenario WMs and SAPs were never issued to the pilots and thus not considered in the AR calculations.
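As a rough illustration of the acknowledgment rate described above, the following Python sketch divides the number of acknowledged cues by the number of cues actually issued to one participant, leaving out any cue that was never issued because the scenario ended early. The `Cue` record and its field names are hypothetical and are not taken from the study's analysis scripts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One scripted cue (a WM or an SAP) for one participant; hypothetical record."""
    label: str          # e.g. "WM1" or "SAP2"
    issued: bool        # False if the scenario ended before the cue was presented
    acknowledged: bool  # True if a verbal or physical acknowledgment was observed


def acknowledgment_rate(cues: List[Cue]) -> Optional[float]:
    """Proportion of issued cues that were acknowledged (None if none were issued)."""
    issued = [c for c in cues if c.issued]
    if not issued:
        return None
    return sum(c.acknowledged for c in issued) / len(issued)


# Illustrative data only: WM4 was never issued because the scenario ended early.
example = [
    Cue("WM1", issued=True, acknowledged=True),
    Cue("WM2", issued=True, acknowledged=True),
    Cue("WM3", issued=True, acknowledged=False),
    Cue("WM4", issued=False, acknowledged=False),
]
print(acknowledgment_rate(example))  # 2 of the 3 issued cues were acknowledged
```

Returning `None` when no cue was issued keeps such a participant out of any group mean rather than counting them as a rate of zero.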
Acknowledgment Times (ATs) were also collected for both WMs and SAPs, coded as Weather Message Acknowledgement Time (WM.AT) and Situation Awareness Probe Acknowledgment Time (SAP.AT), respectively. WM.AT was measured as the time between the arrival of the message (whether cued or not) and the first verbal or physical indication that showed the pilot's awareness of the message. SAP.AT was measured as the time between the complete delivery of an SAP query (e.g., the final utterance in the request from ATC that represented the probe) until the first verbal or physical indication of the pilot's acknowledgement of that SAP query. The AT measure is indicative of pilots' attentional state and the salience and informativeness of the visual and vibratory cues associated with WMs, as well as the auditory (radio-based) cues associated with SAPs.
Response Times (RTs) for both WMs and SAPs, coded as Weather Message Response Time (WM.RT) and Situation Awareness Probe Response Time (SAP.RT), respectively, were measured as the time between the point of acknowledgement until the pilot's full response had been delivered. The point of "full response" was determined via consensus coding by multiple experimenters and represented when pilots had verbally (via think-aloud protocol; see Section 2.4) or demonstrably (through aircraft interaction) responded to the message. This measure is indicative of pilots' abilities in interruption management, balancing the task load between activities for maintaining safe flight and dedicating resources to processing WMs and SAP queries.
In some cases, the pilots never acknowledged one or more weather messages that were presented to them, as is indicated in the WM.AR measure. As a result, these instances were treated as missing data points and did not factor into the mean calculations of WM.AT and WM.RT. The impact that the WM.AR has on the mean WM.AT and mean WM.RT should be kept in mind when interpreting these latter measures.
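The timing measures defined above can be sketched in Python as follows, under the assumption that each event is coded with three timestamps: the cue onset (message arrival for a WM, or the end of the query for an SAP), the first acknowledgment, and the full response. The function names and the example timestamps are hypothetical; unacknowledged events are returned as `None` and left out of the means, mirroring the missing-data handling described in the text.

```python
from statistics import mean
from typing import List, Optional, Tuple


def ack_and_response_time(
    onset_s: float,
    acknowledged_s: Optional[float],
    full_response_s: Optional[float],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (AT, RT) in seconds; None marks a missing data point."""
    if acknowledged_s is None:
        return None, None  # never acknowledged: both measures are missing
    at = acknowledged_s - onset_s
    rt = None if full_response_s is None else full_response_s - acknowledged_s
    return at, rt


def mean_ignoring_missing(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the observed values only (None if nothing was observed)."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


# Illustrative timestamps in seconds from scenario start, not study data.
events = [(100.0, 104.0, 130.0), (300.0, None, None), (500.0, 502.5, 520.5)]
times = [ack_and_response_time(*e) for e in events]
mean_at = mean_ignoring_missing([at for at, _ in times])
mean_rt = mean_ignoring_missing([rt for _, rt in times])
print(mean_at, mean_rt)
```

Because the second event is dropped rather than assigned a long time, the means describe only the acknowledged messages, which is why the acknowledgment rate has to be read alongside them.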

Flight Environment and Scenario
An FAA WJHTC Flight Training Device (FTD) (see Figure 1a) was configured to perform similarly to a Mooney aircraft, having out-the-window visuals generated using Active Sky Next [32] for PREPAR3D [33]. The simulated scenario was a flight from Santa Fe, New Mexico (KSAF), to Albuquerque, New Mexico (KABQ), developed based on historical National Transportation Safety Board (NTSB) reports of weather-related accidents. The scenario involved mountainous terrain and weather patterns (mountain turbulence and convective activity) that progressively worsened as the pilots approached Albuquerque. Members of the experimental team who were Certified Flight Instructors (CFIs) role-played as ATC (see Figure 1b) and followed a script which precisely timed some communications (such as SAPs) but allowed for improvised responses to any queries from the pilots.

Table 2 summarizes the key scenario events. The weather conditions and visibility progressively worsened during the flight, which was cleared for take-off from KSAF under Visual Flight Rules (VFR) with 12 statute miles of visibility. As the aircraft progressed south, the visibility gradually worsened and Instrument Meteorological Conditions (IMC) were realized shortly after making a turn westward for the approach to KABQ. This final turn (into IMC conditions) also crossed over the Sandia mountain range, which introduced rising terrain and mountain obscuration that made it extremely challenging to safely navigate, while also making it virtually impossible to turn the aircraft around in order to escape the hazardous flight environment.
Participants did not have any prior experience with this particular scenario, but they were adequately trained for familiarity with the flight environment and displays with a training scenario set in the eastern United States.
During the flight, pilots received four scripted weather messages (WMs), which varied in severity, at key points in the scenario that imposed different levels of workload on the pilots (see Table 2). For example, WM1 was delivered at a point with good visibility, relatively little weather development, and with autopilot engaged. WM2, WM3, and WM4 were delivered in increasingly higher-workload contexts, with additional workload imposed by autopilot failure, degrading weather and visibility, increased frequency of ATC communications, and the addition of turbulence. Pilots were told that their response to these messages (including their time to acknowledge, fully process, and act on the messages) would be measures of interest in this study, but that they should keep flight safety as their top priority.
Table 2 (fragment). Weather: yellow storm cell appears over KABQ, grows in severity.
To assess whether the additional weather messages may positively or negatively impact overall flight situation awareness, three SAPs were distributed to occur in low-, moderate-, and high-workload contexts of the flight. These probes inquired about the pilot's flight plans and intentions, as well as weather, altitude, and position information. Following the Situation Present Assessment Method (SPAM) [34], these probes were relevant to and embedded in the task itself, so that both the accuracy and the timing of the response provide insight into the pilot's situational awareness at that point.

Weather Message Displays
Inside the cockpit, terrain and weather information was available on a tablet computer with a proprietary experimental interface developed by AeroTech Research (ATR; [35]) to look and function similarly to existing commercial applications, such as Foreflight [36]. This display included an active graphical map with "layers" of information that could be toggled to be displayed or hidden using touchscreen soft buttons in the menu bar at the top of the map (see Figure 3). The map also supported functionality to zoom in and out, with concentric lines indicating the map scale and the aircraft proximity to various scenario areas of interest.
The tablet display was the primary means by which new weather messages were delivered to the pilots in-flight. Incoming weather messages included those listed in Table 1 but also other communications that may be relevant to weather-related decisions, such as pilot reports (PIREPs). The arrival of a new message was announced by a color change in the associated soft button on the menu bar (see Figure 3b, highlighting an incoming PIREP). The full text of each incoming message (either or both of encoded and verbose text formats) was then accessed by pressing the associated button. This opened a pop-up text overlay on top of (and obscuring most of) the map. The message text could be toggled to be hidden or brought back into focus as often as pilots desired for the remainder of the scenario.
Each incoming weather message was characterized with "summary" information that was intended to convey the severity of the weather developments or, alternatively, the severity with which pilots should process the full message (by accessing and reading the displayed message text). The summary statement was modeled after those which were found to be beneficial for supporting pilot workload and task management in previous weather technology interaction research [1,9]. These statements typically included the type of weather message, and a severity reference ("low", "moderate", or "severe"), which pilots can take into account when deciding whether and when to devote attentional resources to access and read the full message while concurrently maintaining safe flight parameters. In addition to the highlighting of the soft buttons on the tablet display, the summary messages were displayed visually on a Samsung Gear S3 smartwatch, which all participants wore on their left wrist (see Figure 4).
For the Vibration (V) participant group, vibratory cues from the smartwatch were also presented to coincide with the arrival of the WM and the summary statement (the No Vibration (NV) participant group had all other display aspects except vibratory cues). The vibratory patterns lasted 1 s and were encoded to communicate the severity of the WM ("low", "moderate", or "severe") through properties of syncopation, intensity, and duration that were found to be maximally distinguishable and intuitively identifiable under varied workload conditions [30].
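To make the idea of severity encoding concrete, the following Python sketch represents each cue as a sequence of (duration, intensity) segments that sums to 1 s. The specific segment values are invented for illustration and are not the patterns selected in Roady [30]; the names `PATTERNS`, `pattern_for`, and `total_duration_ms` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# One segment is (duration in ms, intensity in [0, 1]); intensity 0.0 is a pause.
Segment = Tuple[int, float]

# Hypothetical patterns for illustration only, NOT those selected in Roady [30].
PATTERNS: Dict[str, List[Segment]] = {
    # Single steady, low-intensity buzz.
    "low": [(1000, 0.3)],
    # Two even pulses at medium intensity (straight cadence).
    "moderate": [(400, 0.6), (200, 0.0), (400, 0.6)],
    # Three uneven pulses at full intensity (syncopated rhythm).
    "severe": [(150, 1.0), (100, 0.0), (300, 1.0), (100, 0.0), (350, 1.0)],
}


def pattern_for(severity: str) -> List[Segment]:
    """Look up the vibration pattern for a severity label (case-insensitive)."""
    key = severity.lower()
    if key not in PATTERNS:
        raise ValueError(f"unknown severity: {severity!r}")
    return PATTERNS[key]


def total_duration_ms(pattern: List[Segment]) -> int:
    """Total length of a pattern, pauses included."""
    return sum(duration for duration, _ in pattern)


for name in PATTERNS:
    print(name, total_duration_ms(pattern_for(name)), "ms")
```

Keeping every pattern at the same total length means that severity is carried by rhythm and intensity within the cue rather than by how long the watch vibrates.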

Procedure
After reviewing and signing the consent form, participants completed a demographics questionnaire based on flight qualifications and experience with mobile and wearable technologies and were given a formal flight briefing by CFIs from the experimental team. The briefing included a modified version of the aeronautical map illustrated in Figure 2 as well as current visibility and weather conditions (which were supportive of flying under VFR). Participants were then trained in the FTD in a 10-min simplified training scenario set in the eastern United States, which allowed them to practice manually controlling the aircraft, interacting over the radio with ATC, and accessing route and weather information via the tablet display. Participants in the "Vibration" group were also given several example presentations which were repeated until participants demonstrated an ability to determine the severity of incoming messages by correctly interpreting the vibratory cue pattern. Participants demonstrated their understanding and ability to perform the tasks to experimenters prior to the completion of the FTD training session.
Participants were also trained to provide think-aloud verbal protocol data and practiced this during the training session while piloting the aircraft. This technique provides insight into the decision-making thought process of the pilots, as has been used in previous aviation studies [9,37-39]. The think-aloud protocol provided the experimenters with insight into when pilots noticed weather message cues and how they used the summary information to determine when to access the full message while concurrently managing other flight demands.
In all cases, participants were instructed to interact in the FTD and make flight-related decisions as if they were in an actual aircraft in a real flight context. In this sense, the pilots' primary task was always to safely fly the aircraft. Participants were told that performing the think-aloud protocol as well as attending to scenario events such as communicating with ATC and receiving and reviewing weather messages were all secondary to flight safety and should only be performed when safety was minimally compromised.
After the training session, participants completed the experimental flight from KSAF to KABQ. The flight scenario lasted about 20-25 min and ended when one of the following conditions was met: (a) pilots requested an alternative flight plan from ATC; (b) via think-aloud protocol, pilots expressed their clear intent to change the flight plan; (c) pilots flew into the IMC conditions and attempted to land at KABQ; (d) the pilots crashed the aircraft.

Results
The following analyses were performed using R Version 4.0.3. For reporting purposes, Group labels of "No Vibration" and "Vibration" were simplified to "NV" and "V" respectively.

Acknowledgement Rate
Table 3 summarizes the mean WM.AR across the two groups, "NV" and "V", and across the four weather messages within each group. WM.AR data from both groups ("NV" and "V") violated the assumption of normality as demonstrated by the Shapiro-Wilk test. To account for the violated normality assumption, Welch's t-test [40] was used to compare the two equal-sized groups in terms of WM.AR. The mean WM.AR for participants in the "V" group (M = 0.92, SD = 0.15) was significantly higher (t(28.30) = −2.08, p = 0.046) than for participants in the "NV" group (M = 0.78, SD = 0.24). This represented a medium-sized effect, r = 0.36.
As shown in Table 3, the WM.AR was fairly consistent for both groups across WMs 1-3 but was much lower for WM4, especially for the "NV" group. Based on experiment notes, for several pilots, WM4 came in while they were talking on the radio (higher workload). In addition, for several pilots, the scenario ended shortly (~1.5 min) after WM4 came in, therefore not giving them much time to acknowledge it.
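For readers who want to trace the group comparison of WM.AR reported in this section, the Python sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from summary statistics, and converts a t value to the effect size r using r = sqrt(t² / (t² + df)). It assumes that this conversion is the one behind the reported r, which the text does not state. Because the published means and standard deviations are rounded, recomputing from them gives values close to, but not identical with, the reported t(28.30) = −2.08. The function names are illustrative.

```python
import math
from typing import Tuple


def welch_from_summary(
    m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int
) -> Tuple[float, float]:
    """Welch's t and Welch-Satterthwaite df from group means, SDs, and sizes."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors per group
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def effect_size_r(t: float, df: float) -> float:
    """Convert a t statistic to the correlation-type effect size r."""
    return math.sqrt(t**2 / (t**2 + df))


# Rounded summary statistics from the text: "NV" group first, then "V" group.
t, df = welch_from_summary(0.78, 0.24, 18, 0.92, 0.15, 18)
print(f"from rounded summaries: t = {t:.2f}, df = {df:.2f}")

# Effect size from the t and df as reported in the text.
print(f"r from reported t and df: {effect_size_r(-2.08, 28.30):.2f}")
```

The negative sign of t simply reflects that the "NV" mean was entered first; swapping the group order flips the sign without changing df or r.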
It is important to keep in mind the WM.ARs when considering later measures. For example, as will be shown, pilots in both groups had a quicker response time to WM4 than WM1, but this only includes the pilots who actually acknowledged the message, a number which was much lower than for WM1.
Table 4 and Figure 5 illustrate the mean WM.AT to the four WMs for each participant Group. Since some participants did not acknowledge all the WMs, there were several missing data points, which were excluded from the time-based analysis. Note that WM1 was considered "low" severity, WM2 and WM3 "moderate" severity, and WM4 "high" severity, and flight-related workload generally increased throughout the scenario (refer to Table 2).
However, not all ANOVA assumptions were met. Fourteen data points were identified as outliers but were determined to be due to natural variation rather than data entry or measurement errors and were therefore kept. The Shapiro-Wilk test and Q-Q plots indicated a deviation from normality in the data. Levene's test indicated a difference in variance across the between-subjects variable, Group ("NV" vs. "V"). Box's M-test indicated equal covariances. Mauchly's test of sphericity indicated that the variances of group differences were not equal.
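The caveat about reading time-based measures alongside acknowledgement rates can be illustrated with a small sketch. All values below are invented for illustration and are not data from the study; the function names are hypothetical, and the sketch assumes the acknowledgement rate is the proportion of presented messages that were acknowledged.

```python
from statistics import mean

# Hypothetical acknowledgement times in seconds for four pilots.
# None marks a message that was never acknowledged.
ack_times = {
    "WM1": [12.0, 20.0, 16.0, 24.0],
    "WM4": [6.0, None, None, 10.0],
}

def ack_rate(times):
    """Proportion of presented messages that were acknowledged."""
    return sum(t is not None for t in times) / len(times)

def mean_ack_time(times):
    """Mean acknowledgement time over acknowledged messages only."""
    observed = [t for t in times if t is not None]
    return mean(observed) if observed else None

# Pair each message's rate with its mean time so they are read together.
summary = {msg: (ack_rate(t), mean_ack_time(t)) for msg, t in ack_times.items()}
for msg, (rate, avg) in summary.items():
    print(f"{msg}: rate = {rate:.2f}, mean time = {avg:.1f} s")
```

In this made-up example the later message has the shorter mean time, but that mean rests on only half of the pilots, which is why the two measures need to be interpreted together.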

Acknowledgement Time
Given the violated normality assumption, a robust mixed ANOVA [41], which makes use of trimmed means, was also performed on the data, which found Group, [...]
Again, WM.AT was calculated only for the pilots who actually acknowledged the messages, so while the WM.AT for WM4 was lower, it also had a much lower WM.AR, particularly for group "NV".
Table 5 and Figure 6 show the mean WM.RT to the four WMs, divided by Group. Since some participants did not respond to all the WMs, there are some missing data points. [...], p = 0.08, were significant. However, not all ANOVA assumptions were met. Eleven data points were statistically identified as outliers but were determined to be legitimate and therefore kept. The Shapiro-Wilk test and Q-Q plots indicated a deviation from normality in the data. Levene's test indicated homogeneity of variances, while Box's M-test could not be computed due to the large number of missing values for Message 4. Mauchly's test showed the assumption of sphericity to be met.
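The trimmed means underlying the robust ANOVA can be illustrated with a minimal sketch. The trimming proportion of 20% is an assumption (a common default in robust methods; the level used in the study is not stated here), the function name is hypothetical, and the times are invented for illustration.

```python
import math

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = math.floor(proportion * len(ordered))  # observations removed per tail
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

# Hypothetical times in seconds with one extreme value.
times = [4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 9.0, 43.0]

plain = sum(times) / len(times)
robust = trimmed_mean(times)
print(f"ordinary mean: {plain:.1f} s, 20% trimmed mean: {robust:.1f} s")
```

Because the extreme observations in each tail are discarded before averaging, a single very slow response has far less influence on the trimmed mean than on the ordinary mean, which is the motivation for using it when outliers are retained and normality is violated.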

Response Time
Given the violated normality assumption, a robust mixed ANOVA was also performed [41], which found [...]
The robust ANOVA post-hoc comparison method [41] does not appear to be able to handle missing values, which our data contain. Therefore, to compare response time across the different message numbers, we made use of the non-robust post-hoc method. This post-hoc test showed WM.RT for WM1 to be significantly higher than for both WM3 (p < 0.001) and WM4 (p = 0.01). There were no other significant differences among the message responses.
Again, it should be noted that WM.RT was calculated only for the pilots who actually acknowledged the messages, so while the WM.RT for WM4 was lower, it also had a much lower WM.AR, particularly for group "NV".

Situation Awareness Probes
It is important to clarify that situation awareness probes (SAPs) were never associated with vibratory cues, and therefore, SAP presentations did not differ in any way between groups. However, Group was analyzed as the primary factor in the SAP response analysis to determine if the presence of (and, potentially, reliance on) vibratory cuing of WMs also impacted pilots' situation awareness and, therefore, response to SAPs.

Acknowledgement Rate
Three participants were not presented with the third SAP due to the scenario being terminated early (for example, after pilots called ATC to request a deviation, the scenario was stopped by experimenters). Besides these occurrences, 100% of SAPs were acknowledged; therefore, it was not deemed necessary or appropriate to perform a statistical comparison of acknowledgment rates between groups, and instead, time-based measures were more meaningful for assessing situation awareness.
Table 6 and Figure 7 show the mean SAP.AT for each of the three SAPs, divided by Group. Because three participants did not receive the third SAP, three of the (36 × 3) = 108 possible data points were missing and, thus, excluded from the time-based analysis. A two-way mixed ANOVA was performed, which found neither Group, F(1, 31) = 0.79, p = 0.38, nor SAP Number, F(2, 62) = 3.06, p = 0.054, nor the interaction of Group and SAP Number, F(2, 62) = 0.51, p = 0.60, to have a significant effect on SAP.AT.
Again, some of the ANOVA assumptions were violated. Of the 105 data points, six were identified as outliers but were determined to be due to natural variation rather than data entry or measurement errors and were therefore included in the analysis. The Shapiro-Wilk test for each combination of factor levels showed several p-values less than 0.05, indicating a deviation from normality in the data. Q-Q plots showed some points falling outside of the reference lines, which, again, indicated non-normality. There was homogeneity of variances, as assessed by Levene's test (p > 0.05). Box's M-test for homogeneity of covariances was not statistically significant (p > 0.001), indicating equal covariances. Mauchly's test showed the assumption of sphericity to be met.
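The degrees of freedom reported for the two-way mixed ANOVA on SAP.AT are consistent with an analysis restricted to participants with complete data. The sketch below shows the arithmetic; it assumes that the three pilots who missed the third SAP were dropped from the ANOVA entirely (the text only states that the missing data points were excluded), and the function name is hypothetical.

```python
def mixed_anova_df(n_participants, n_groups, n_levels):
    """Degrees of freedom for a two-way mixed ANOVA with one between-subjects
    factor (n_groups) and one within-subjects factor (n_levels)."""
    between_error = n_participants - n_groups
    within_effect = n_levels - 1
    within_error = within_effect * between_error
    return {
        "group": (n_groups - 1, between_error),
        "within": (within_effect, within_error),
        "interaction": ((n_groups - 1) * within_effect, within_error),
    }

# 36 pilots, minus the 3 who did not receive the third SAP (assumed dropped).
complete_cases = 36 - 3
df = mixed_anova_df(complete_cases, n_groups=2, n_levels=3)
print(df)
```

Under that assumption the result matches the reported F(1, 31) for Group and F(2, 62) for SAP Number and for the interaction.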

Acknowledgement Time
Given the violated normality assumption, a robust mixed ANOVA [41] was also performed, which found neither Group, F(1, 29.93) = 1.49, p = 0.23, nor SAP Number, F(2, 21.69) = 1.93, p = 0.17, nor the interaction of Group and SAP Number, F(2, 21.60) = 0.87, p = 0.44, to have a significant effect on SAP.AT.
Table 7 and Figure 8 show the mean SAP.RT for each of the three SAPs, divided by Group. Because three participants did not receive the third SAP, three of the (36 × 3) = 108 possible data points were missing and, thus, excluded.
Again, however, not all ANOVA assumptions were met. Ten data points, while being statistical outliers, were deemed legitimate and therefore kept. The Shapiro-Wilk test and Q-Q plots both indicated non-normality. Levene's test showed homogeneity of variances, while Box's M-test indicated equal covariances. Mauchly's test of sphericity indicated that the variances of group differences were not equal.

Response Time
Given the violated normality assumption, a robust mixed ANOVA [41] was also performed, which found neither Group, F (1, 32.

Discussion
This study builds on previous work that used vibratory notifications to support pilot situation awareness and performance by effectively guiding attention in the cockpit [13,27,29]. The current study investigated the extent to which pilots' awareness of weather dynamics and management of concurrent flight tasks could be supported when the availability of new weather information is announced via vibratory cues. Furthermore, as a follow-up to Tippey et al. [29], the current study took special steps to design vibrotactile cues that featured patterns which could reliably be distinguished and intuitively associated with the concept of "severity" [30]. Thirty-six general aviation pilots completed the study in a flight training device. The experimental scenario gradually added workload through an autopilot failure, visibility that decreased until IMC was reached, turbulence, rising terrain, and increasing proximity to weather cells, as listed in Table 2. Weather messages were delivered to the participants at specific points in time, and half of the participants also received a severity-coded vibratory alert.
The results indicate that the participant group receiving the severity-mapped vibrations through a smartwatch showed a significantly higher likelihood of acknowledging the arrival of weather messages compared to the group that did not receive the vibratory cues. Particularly for WM4, which represents the highest flight-related workload context, the highly salient "high severity" vibratory cue led to a much higher reception of the message as compared to the No Vibration group. Furthermore, those in the Vibration group acknowledged the messages sooner than those in the No Vibration group, indicating that attention is effectively drawn when there is new information worth processing. After this acknowledgment, both groups took similar amounts of time to fully respond to the messages, indicating that there were no unforeseen adverse effects of display configuration (i.e., including vibratory cues or not) on the ability to visually process and act on the full message.
There was no statistical difference between the Vibration and No Vibration groups in terms of acknowledging and responding to SAPs. While the probes used in this study were quite simple and all of them were correctly responded to, the lack of impact on the timing of responses shows that SA was relatively consistent between these groups [34]. Situation awareness probes were not the primary measure of interest in this study, and for future work, it is recommended that probes include queries that require more complex responses and that evaluate pilots' awareness of critical flight variables over longer timescales (i.e., asking about current, trending, and predicted near-future levels of various safety-critical types of flight data).
As with any research involving the complexities of aviation, there are a number of limitations in interpreting the results of this study and scaling its findings to practice. First, the study was conducted in a flight training device with a controlled and scripted scenario, and while efforts were made to add realism to the experiment, the artificiality of this context likely led pilots to make decisions under considerably different stress and time pressure than those imposed by an aircraft in real flight during adverse weather. Additionally, the lack of key environmental stimuli such as motion cues (the FTD did not include a motion base) means that some aspects of pilot workload (such as physical reaction to the forces from aerodynamic maneuvering and the cognitive load involved in processing information in a moving frame) were not well represented. Finally, a key factor in the FTD-based study was the absence of substantial vibratory "noise" that originates from engine operation as well as turbulence and other external sources in real flight. This vibratory noise propagates through the airframe to the pilot and shows potential to mask encoded vibratory signals presented to the wrist [42,43], thus suggesting that the smartwatch-based vibrotactile cues may be less effective in real flight.
To investigate the concerns of vibrotactile masking in an aircraft, the experimenters conducted in-flight evaluations of the perceptibility and identifiability of vibratory signals [44]. Evaluators wore a smartwatch on each surface (palmar and dorsal sides) of each wrist and assumed several common postures (e.g., resting the hand/wrist/arm on the flight yoke, seat armrest, and airframe itself) in several aircraft of varied engine numbers and sizes (from 150 HP single-engine to 600 HP dual-engine) and during several phases of flight. Signals with maximum intensity, higher dynamism, and moderate or high syncopation were found to best support perception and identifiability [44]. The characteristics of intensity, dynamism, and syncopation that were most effective in real flight also adequately describe the vibratory cues used in the current FTD study.

Conclusions
While general aviation pilots should check weather conditions before take-off, new and affordable portable devices are increasingly available to provide weather messages in the cockpit mid-flight. However, greater availability of information does not necessarily translate to information received, as pilots can miss or misinterpret the weather messages. In each case, awareness of weather dynamics can be improved when the arrival of weather information is announced with a salient cue, such as a vibrotactile cue, that also provides sufficient partial information to support interruption management. In other words, pilots need to be aware of the presence of new weather information and should be able to make informed decisions about when it is appropriate to reallocate cognitive resources to process the message given concurrent flight demands. This study investigated the effectiveness of severity-encoded vibratory cues used to announce the availability of new weather messages, as well as the relative urgency in attending to them. The results indicate a higher acknowledgement rate and a shorter time to acknowledge the weather messages when accompanied by the vibrotactile cues. The acknowledgement rate for both groups was lower for the last message, but much more so for the non-vibration group. These lower acknowledgement rates should be kept in mind when interpreting the time-based measures. It appears that response time to the messages was affected by the encoded severity, as pilots completed responses faster to more severe messages.
Pilots who did not have the benefit of the vibrotactile cues responded faster to the later messages than to the earlier ones, while those with the cues responded at a relatively consistent pace. This may reflect how overall workflow can be better managed in processing the full messages when pilots are aware of the messages earlier and can make informed decisions to switch their attention to a message when the workflow supports the switch (rather than becoming aware of the message later and having less information about its nature, which often leads to immediate transitions that could unnecessarily disrupt ongoing tasks). Finally, the lack of significant differences in situation awareness measures suggests that there are no obvious adverse consequences of introducing vibratory display functionality with regard to flight-relevant awareness and performance. As a topic of future study, more in-depth flight performance and safety metrics should be consulted to gain further understanding of the interaction among concurrent flight tasks with weather information-seeking and reasoning tasks so that the interruption management potential of vibratory cues can be better understood and applied more broadly in weather technologies. By including interruption management as a design goal, weather technologies can be made more effective, keeping pilots more informed of weather dynamics while minimally impacting performance on other flight-related tasks, therefore reducing the risk of GA accidents when adverse weather occurs.