Engagement in Non-Driving Related Tasks as a Non-Intrusive Measure for Mode Awareness: A Simulator Study

Research on the role of non-driving related tasks (NDRT) in the area of automated driving is indispensable. At the same time, the construct mode awareness has received considerable interest in regard to human–machine interface (HMI) evaluation. Based on the expectation that HMI design and practice with different levels of driving automation influence NDRT engagement, a driving simulator study was conducted. In a 2 × 5 (automation level × block) design, N = 49 participants completed several transitions of control. They were told that they could engage in an NDRT if they felt safe and comfortable to do so. The NDRT was the Surrogate Reference Task (SuRT) as a representative of a wide range of visual–manual NDRTs. Engagement (i.e., number of inputs on the NDRT interface) was assessed at the onset of a respective episode of automated driving (i.e., after transition) and during ongoing automation (i.e., before subsequent transition). Results revealed that over time, NDRT engagement increased during both L2 and L3 automation until stable engagement at the third block. This trend was observed for both onset and ongoing NDRT engagement. The overall engagement level and the increase in engagement were significantly stronger for L3 automation compared to L2 automation. These results outline the potential of NDRT engagement as an online non-intrusive measure for mode awareness. Moreover, repeated interaction is necessary until users are familiar enough with the automated system and its HMI to engage in NDRTs. These results provide researchers and practitioners with indications about users’ minimum degree of familiarity with driving automation and HMIs for mode awareness testing.


Introduction
The market introduction of vehicles equipped with SAE Level 3 (L3) automated driving systems (ADS) is only a matter of time. Automated driving promises numerous benefits: among others, it is expected to foster efficiency in terms of time usage. The driver may divert his/her attention to non-driving related activities while the ADS is executing vehicle guidance. SAE Level 2 (L2) driving automation, which is already commercially available, is also capable of controlling vehicle guidance, but the driver still has to constantly monitor the system functioning [1]. L3 automated driving systems differ from L2 automation in that the driver has to be readily available as a fallback performer in case the system requests a transition to manual control. Thus, with the transition from L2 to L3 automation, the human driver's role shifts from that of an active system supervisor to a fallback-ready user who may engage in non-driving related tasks (NDRT). The availability of different driving modes (i.e., L1, L2, and L3) in one vehicle poses additional challenges to the driver to understand his/her role accordingly and not to confuse different automation modes and levels. Mode awareness as a critical issue in driving automation requires further research efforts for ensuring safe operation of different automated driving functions. Knowledge on the assessment of mode awareness, however, is scarce. Addressing this issue, the present study examines engagement in a representative visual-manual NDRT during different levels of automated driving as a non-intrusive measure for mode awareness. In the following, we first outline theoretical backgrounds on mode awareness and methodology to assess this construct. Subsequently, the research question and hypotheses are derived based on the preceding considerations.
Information 2020, 11, 239

Background
In the automotive context, the evaluation of HMIs has a long history. The distraction potential of in-vehicle information systems (IVIS) is the main focus for manual driving (SAE L0). Here, test procedures to assess visual workload associated with the IVIS have already been established [2,3]. However, the change of the driver's role from manual driver to supervisor in L2 and fallback performer in L3 automation renders the application of these methods unfeasible. For example, NHTSA distraction guidelines only permit 2 s per glance and 12 s total glance duration on IVIS. It is questionable whether these numbers, which were proposed for manual driving, are also suitable for L2 automation. In addition, with the driving automation executing longitudinal and lateral vehicle control, distance and lane keeping are not applicable measures for indicating the suitability of an HMI in this particular context. In contrast, a variety of constructs related to safe driver-automation interaction such as trust [4][5][6][7], controllability [8][9][10], understanding in the form of mental models [11][12][13], or usability [14] could be used as criteria. Research has shown that these pose challenges to the design and evaluation of automated vehicle HMIs. For an outline of evaluation methods for automated vehicle HMIs see [15]. One further step towards an ADS method validation concerns the investigation of mode awareness. This term was proposed by Sarter and Woods [16]. The authors report that even pilots, who can be considered highly skilled and trained operators of flight automation, can face situations where they are not certain of roles and responsibilities for the aircraft operation task. Such situations can lead to dangerous outcomes and consequently a safety-related assessment is indispensable.
Mode awareness is a central aspect for appropriate and safe human-automation interaction in general and in the context of driving automation in particular. For example, Gopinath and Johansen [17] outline that mode awareness of operators is of crucial importance for safety when interacting with production robots. By appropriate design of the automation and the corresponding HMIs, safety risks can be mitigated (e.g., [18]). In the driving automation context, Feldhuetter, Segler and Bengler [19] provide evidence that drivers' mode awareness is reduced when the vehicle is equipped with additional driving automation functions (see also [20]). Similar to the proposal by Gopinath and Johansen [17], they investigated whether an adaptive HMI design could support mode awareness, but could not find an effect. Other research supports their hypothesis that HMI design can affect drivers' visual behavior. For example, Kraft, Naujoks, Woerle and Neukum [21] report the impact of the HMI design on glance distributions during active L2 automation. In this study, a reduced and simple display produced positive effects in terms of distraction on both a self-reported and behavioral level. In addition, familiarity-dependent practice effects occurred for glance patterns. In general, behavioral adaptation to automated driving can be expected as outlined in [22]. An appropriate design of L3 automated vehicle HMIs can support self-reported usability and trust in automation (Hergeth, 2016). Since trust is expected to determine reliance behavior [6,23], we assume that such HMI variations can also affect behavioral parameters concerning NDRT engagement. This influence of HMI design on user behavior is of high importance since the HMI must convey information about the driver's role during active L2 and L3 functioning.
Investigating mode awareness between driving episodes, Feldhuetter and colleagues [24] tested whether manual driving episodes as intermittent features between transitions of L2 and L3 automation can help to promote mode awareness. In this experiment, they operationalized mode awareness via the visual attention towards driving-relevant areas and engagement in NDRTs. The study shows that there is a difference of visual attention allocation and NDRT engagement. However, it remains unknown whether this observation is stable or prone to changes over time. As there is research indicating behavioral changes in interaction with driving automation when interacting repeatedly [14,21], NDRT-related behavior might also change. Especially findings of more accurate mental models over time [11][12][13] lead to the question whether mode awareness is also dependent on the familiarity with the driving automation.
As indicated above, reliance behavior is suggested to be closely tied to NDRT engagement during automated driving [7]. The difference between L2 and L3 is that the driver is responsible for supervising the automation in L2 whereas he/she has to be readily available to perform the driving task fallback in L3. For the HMI design, this indicates that L2 automation systems require a feature ensuring that drivers are attentive to the supervising task, either by steering wheel input or by tracking gaze towards the forward roadway (see e.g., [25]). By issuing a so-called "hands-on request" or "attention request", the system draws the driver's attention back towards the supervising task. In comparison, such interface features are not part of an L3 system as it allows NDRT engagement. L3 systems only request driver input at operational design domain (ODD) limits or system malfunctions [26]. Thus, NDRT-related behavior should differ depending on the understanding of the current level of automation (i.e., mode awareness), given that the interface is designed in accordance with the prior considerations. The design of automated vehicle HMIs is therefore a crucial aspect for the facilitation of visual attention towards relevant events inside or outside the vehicle [27,28]. A study by Llaneras and colleagues [29] found that drivers tend to engage in NDRTs during reliable L2 automation that does not monitor or restrict behavior. This leads to risky driving and diverts attention away from the roadway and supervision of the system. Therefore, investigation and comparison of NDRT engagement during L2 and L3 automation is of high importance. It is expected that HMI features such as hands-on or attention requests during L2 automation lead to improved mode awareness, with drivers better understanding their roles and responsibilities (i.e., supervising during L2). This understanding eventually translates into observable behavior of less NDRT engagement during L2 as compared to L3 automation.
The studies outlined above show that there is a growing body of research on mode awareness in the driving automation domain. Additionally, the HMI considerations outlined above suggest that NDRT engagement can serve as an indicator of mode awareness. However, commonly agreed methodological approaches are still missing. In relation to the theoretical and conceptual developments, the present study's aim was to investigate how mode awareness can be assessed in a non-intrusive way. It seeks to extend the findings on understanding as reported in [13]. Results of this publication showed that the general understanding of roles and responsibilities (i.e., mode awareness) was high for both L2 and L3 automation. However, the question remained whether this understanding also translates into observable behavior. Non-intrusive measurements of mode awareness bear advantages both for researchers and practitioners and for the real-world application of driver-monitoring systems. On the one hand, during the development and evaluation of automated vehicle HMIs, mode awareness represents a critical issue that needs to be assessed. With the availability of a non-intrusive measure, research methodology benefits from the present research. On the other hand, real-world applications could use driver monitoring technology to detect potential losses of mode awareness based on the driver's current behavior. Thus, an ADS might take necessary precautions such as displaying warning messages, which are already in use today for fatigue detection.

Research Question and Hypotheses
From theoretical considerations outlined above, the following research question is derived: How does NDRT engagement calibrate for different levels of automation (i.e., for different graphical HMI designs) and with rising system experience? The following two hypotheses are formulated for this research question:

Hypothesis 1 (H1). Drivers change their engagement in NDRTs over time;
Hypothesis 2 (H2). There is more NDRT engagement during an active L3 ADS compared to an active L2 driving automation.

Sample
A total of N = 59 participants took part in the driving simulation experiment. N = 10 drop-outs occurred because four participants did not complete the experimental procedure and six incomplete datasets were collected. This left N = 49 (13 female, 36 male) participants for data analysis. Mean age of the final sample was 30.96 years (SD = 9.08, MAX = 62, MIN = 21). All participants were BMW Group employees, held a German driver's license, and had normal or corrected-to-normal vision.

Driving Simulation and Non-Driving Related Task
The study was conducted in a moving-base driving simulator (see Figure 1, left). The integrated vehicle's console contained all necessary instrumentation and was identical to a BMW 5 series with automatic transmission. Seven 1080p projectors provided a 240° horizontal × 45° vertical frontal field of view. One LCD screen positioned behind the back seats inside the vehicle mockup and two outside projections with the same specifications served as rear view. The motion system consisted of a hydraulic hexapod with six degrees of freedom, capable of up to 7 m/s² transitional acceleration and 4.9 m/s² continuous acceleration. The Surrogate Reference Task [30] was displayed on a 12.3" tablet mounted on the center stack console and was active during the entire experimental drive (see Figure 1, right). NDRT engagement was measured using a task that is representative of many NDRTs in terms of demands and distraction potential to obtain high external validity. The Surrogate Reference Task (SuRT, [31]) is such a representative task since it is used as a generic visual-manual secondary task in distraction studies. In addition, it has also been used as an NDRT in automated driving studies [7,9,32]. The SuRT requires participants to identify a target stimulus (i.e., large circle) within an array of distractors (i.e., small circles). By varying the number of distractors and the size difference between target and distractors, the NDRT demand and resulting workload can be adjusted specifically. An advantage of the SuRT is its potential to support high experimental control; on the downside, it is not a naturalistic NDRT and thus motivation to extensively engage in the SuRT could be limited.

The interface on which the SuRT was presented did not display a score to the drivers to make NDRT engagement completely voluntary and free of a potential competitive character. The circles could be selected by touching the surface with a finger. When the participant selected the correct circle, it turned green before the subsequent pattern emerged. In case the wrong target was selected, it turned red and the pattern stayed until it was solved correctly.

Study Design and Procedure
The study employed a 2 × 5 mixed within-between subjects design. The within-subject factor "block" had five levels from the first to the fifth block of use cases. The between-subjects factor "feedback" had two levels where participants either received feedback on their interaction success after each use case or not. Because the between-subjects factor was out of scope for the present research question, this research reports results of the within-subject factor "block".
Upon arrival, participants were welcomed and gave informed consent. After a brief explanation of the study purpose, the experimenter led them to the vehicle mockup. To accustom themselves to the simulator setup, participants had to complete at least two correct trials with the SuRT at standstill. Subsequently, they completed a five-minute manual familiarization drive without NDRT engagement. Prior to the experimental drive, the experimenter outlined the procedure and explained that participants would encounter two automated systems, an L2 driving automation and an L3 ADS. They also received information stating that they would not have to constantly monitor the correct functioning of the L3 ADS. Concerning NDRT engagement, participants were instructed before each block that they could freely decide whether to engage in the NDRT when the automation was active. In doing so, the experimenter did not specify the level of automation or explicitly name either of the two functions. Furthermore, there was no additional incentive for executing the NDRT. The subsequent experimental drive included five blocks, each consisting of six driver-initiated control transitions. After the successful completion of each interaction, there was a 20-s time window where users' NDRT-related behavior was observed. Table 2 provides an overview of the windows of observation for NDRT-related behavior. Subsequently, there was a brief inquiry during the drive that occurred six times for each block [33]. After the use case specific questions, there was another time window of at least 20 s and up to one minute where users could freely engage in the NDRT before the instruction for the next use case. After each block, participants were told to pull over to the right shoulder, stop there, and complete the block inquiry.

Use Cases
The present experiment included driver-initiated transitions between manual, L2, and L3 automated driving [34] as use cases (UCs). Considering both upward and downward transitions, one experimental block consisted of six use cases. For the present analysis, only transitions to an automated driving mode are of interest. Consequently, transitions to manual are not considered here. The use cases with transition type, automation level at use case initiation, target automation level, and use case numbering are shown in Table 1. To counteract sequential effects, participants were randomly assigned to one of six possible block sequences that were created using a Latin square. In total, each participant completed 30 use cases. To standardize instructions, we recorded samples for each use case that were triggered by the experimenter.
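To illustrate the counterbalancing step, the following is a minimal sketch of how six use case sequences could be derived from a Latin square. The text does not state which Latin square construction was used, so the cyclic construction here is an assumption, as are the labels UC5 and UC6 (only UC1 to UC4 are named in the text) and the helper names.

```python
import random

# Hypothetical labels: UC1-UC4 appear in the text, UC5/UC6 are placeholders
# for the two remaining use cases of a block.
USE_CASES = ["UC1", "UC2", "UC3", "UC4", "UC5", "UC6"]


def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting the item list.

    Each item occurs exactly once per row (sequence) and once per
    column (position within the block).
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_sequence(square, rng):
    """Randomly pick one of the sequences for a participant."""
    return rng.choice(square)


square = cyclic_latin_square(USE_CASES)
participant_sequence = assign_sequence(square, random.Random())
```

A cyclic square balances the position of each use case but not the order of neighbouring use cases; a balanced (Williams) design would be needed if first-order carry-over effects were a concern.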


Automated Driving System
As soon as the driver activated the respective function, it carried out longitudinal and lateral vehicle guidance. The longitudinal and lateral vehicle guidance of the L2 and L3 automation was identical. The L3 ADS was capable of executing independent lane change maneuvers (e.g., overtaking slower vehicles ahead, pulling back to the right lane). The L2 driving automation set speed was the current velocity and could be adjusted without restrictions. The L3 ADS set speed was 130 km/h and could be adjusted to slower speeds. If adjusted to a faster speed than 130 km/h, it deactivated the L3 ADS and activated the L2 driving automation. Vehicle following distance (time headway) to a lead vehicle was 2 s.
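The set-speed rule described above can be sketched as a small function. This is only an illustration: the function name and the mode labels are hypothetical, and the assumption that the requested speed is carried over as the L2 set speed after the fallback is not stated in the text.

```python
L3_MAX_SET_SPEED_KMH = 130  # L3 set speed stated in the text


def adjust_set_speed(mode, requested_kmh):
    """Return the (mode, set speed) pair after a set-speed adjustment.

    mode is "L2" or "L3". Requesting more than 130 km/h while L3 is
    active deactivates L3 and activates L2. Keeping the requested value
    as the new L2 set speed is an assumption of this sketch.
    """
    if mode == "L3" and requested_kmh > L3_MAX_SET_SPEED_KMH:
        return "L2", requested_kmh
    return mode, requested_kmh
```

For example, `adjust_set_speed("L3", 120)` leaves L3 active, whereas `adjust_set_speed("L3", 140)` returns the L2 mode.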

Human-Machine Interface
The visual HMI was shown on the instrument cluster. It showed the vehicle and its surroundings in both L2 and L3 automated driving. The HMI for automated driving resembled a combination of adaptive cruise control and additional steering assistance [35]. The present HMI constitutes a representative solution for an automated system due to the conceptual similarity to solutions in prior research [4,36]. The L2 vehicle surroundings and L3 vehicle surroundings differed in (1) their informational content (i.e., higher level of detail in L3: visibility of adjacent lanes and vehicles) and (2) their perspective (i.e., larger field of view in L3). Thus, specifically the distance between the eye point and the vehicle, the angle between the direct line of sight and the road, and the opening angle of the field of view were manipulated. Figure 3 schematically depicts the configurations for L2 and L3 automation of the vehicle surround views from a profile perspective. An activated L2 automation was colored in green while an activated L3 ADS was colored in blue. In addition, during activated L3 ADS, the steering wheel was illuminated in blue color. The L2 driving automation displayed a hands-on request (HOR) after 15 s of hands-free driving. The HOR was displayed as hands grabbing a steering wheel [37,38] and yellow pulses on the illuminated steering wheel. The system functions could be activated with a button on the left side of the steering wheel for both levels of automation. For a more comprehensive description of the operating elements, see [14].
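As a hedged illustration of the hands-on request timing, the sketch below checks whether an HOR would be due. The function and parameter names are hypothetical, and treating exactly 15 s as already due is an assumption.

```python
HOR_DELAY_S = 15.0  # hands-free duration before the HOR, as stated in the text


def hands_on_request_due(mode, hands_off_since, now):
    """Return True if the L2 hands-on request should be displayed.

    hands_off_since is the time (in seconds) at which the driver released
    the steering wheel, or None while the hands are on the wheel.
    The L3 ADS issues no hands-on request.
    """
    if mode != "L2" or hands_off_since is None:
        return False
    return now - hands_off_since >= HOR_DELAY_S
```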

Dependent Variables
The present study operationalized NDRT engagement as input with the finger on the NDRT surface. Table 2 visualizes the windows of observation for the dependent variables. To find out about the onset of engagement, we counted the total number of inputs on the surface for a time interval of 20 s after successful completion of each use case (NDRT observation window 1). Since it can be assumed that it takes some time for the NDRT engagement to set in and then to stabilize, we also investigated NDRT-related behavior at the end of an automated driving episode where the onset had most likely occurred and NDRT engagement was on a stable level. For that purpose, there was another window of observation covering the 20 s just before the onset of the subsequent use case (NDRT observation window 2).
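The two observation windows can be expressed as simple counts over touch timestamps. The following is a minimal sketch, not the authors' pre-processing code: the function names, the timestamp format (seconds on a common clock) and the half-open window boundaries are assumptions.

```python
from bisect import bisect_left

WINDOW_S = 20.0  # length of both observation windows, as stated in the text


def count_inputs(tap_times, start, end):
    """Count touch inputs with start <= t < end (half-open by assumption)."""
    taps = sorted(tap_times)
    return bisect_left(taps, end) - bisect_left(taps, start)


def onset_count(tap_times, uc_completed_at):
    """Window 1: the 20 s after successful completion of a use case."""
    return count_inputs(tap_times, uc_completed_at, uc_completed_at + WINDOW_S)


def ongoing_count(tap_times, next_uc_onset_at):
    """Window 2: the 20 s just before the onset of the subsequent use case."""
    return count_inputs(tap_times, next_uc_onset_at - WINDOW_S, next_uc_onset_at)


# Hypothetical touch timestamps for one automated driving episode that
# starts at t = 0 s and ends with the next use case at t = 60 s.
taps = [1.0, 5.5, 19.9, 20.0, 42.0, 58.0, 61.0]
onset = onset_count(taps, 0.0)
ongoing = ongoing_count(taps, 60.0)
```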

Statistical Procedure and Data Analysis
NDRT data were pre-processed and visualized using MATLAB Version 2015 (MathWorks Inc., Natick, MA, USA). Statistical tests were calculated using IBM SPSS Statistics Version 23 (IBM, Armonk, NY, USA). For observation window 1, means and standard deviations (SD) were computed for onset NDRT input frequency by use case and block. In contrast, when observation window 2 started, the transition of control already dated back too far for a comparison of NDRT-related behavior on use case level (i.e., considering the respective previous level of automation) to be useful for that period of time. Therefore, we compared NDRT engagement during observation window 2 only in regard to the level of automation that was active at that time. For that purpose, the sum of NDRT inputs during active L2 automation (after UC2 and UC4) and active L3 ADS (after UC1 and UC3), respectively, was calculated for each participant and block. Means and standard deviations (SD) were computed for these ongoing input sums. A significance level of α = 0.05 was applied for inferential testing unless stated otherwise. To control for alpha inflation due to multiple testing, correction after [39] was applied if necessary.

Onset Input Frequency
Table 3 shows descriptive statistics (i.e., M, SD) of NDRT input frequency within the 20 s after UC completion by use case and block. Means and standard errors of onset input frequency by use case and block are depicted in Figure 4. Descriptive values revealed that the overall number of NDRT inputs during the 20 s after task completion was on a low level, with mean input frequency not exceeding two. Furthermore, there was a tendency towards more NDRT engagement with increasing system experience in all four use cases. However, the observed increase was stronger for transitions to L3 automation (UC1 and UC3) than for transitions to L2 automation (UC2 and UC4).
Independent of the block, descriptive data showed considerably more NDRT engagement after transitions to L3 than after transitions to L2. A 4 × 5 (UC × block) repeated measures analysis of variance (ANOVA) was conducted for onset input frequency. Results revealed significant main effects for both use case and block as well as a significant interaction effect (see Table 4). These inferential results indicate that mean input frequency differed significantly over time and for the different use cases, but the effect of the block depended on the respective use case. The effect sizes showed large effects ([40]; see Table 4). To examine these effects in detail, planned contrast analyses were performed to compare onset input frequency for the two different levels of automation (L2: after UC2 and UC4; L3: after UC1 and UC3) and for consecutive blocks. Results are displayed in Table 5. Regarding the two levels of automation, results revealed that there was significantly more NDRT engagement during active L3 than during active L2 automation; the effect size (see Table 5) indicated a strong effect [40]. Comparisons between consecutive blocks showed a mixed picture: Mean NDRT input frequency was significantly higher in block 2 than in block 1. There were also significantly more NDRT inputs in block 3 as compared to block 2; medium to large effect sizes were obtained [40]. The remaining contrasts between successive blocks did not reach significance (see Table 5). The results of the planned contrast analyses indicate that NDRT engagement increased within the first three system encounters and stabilized in subsequent system encounters.
Table 4. Inferential statistics (i.e., F, df1, df2, p, ηp²) of main and interaction effects for onset input frequency. Statistically significant effects are colored in gray.
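For readers who want to relate the reported F statistics to the effect sizes, partial eta squared can be recovered from F and its degrees of freedom via ηp² = (F · df1) / (F · df1 + df2). The sketch below applies this identity and labels the result with commonly cited benchmarks (0.01, 0.06, 0.14), which are conventions for eta squared and are used here as an assumption; the F value in the example is hypothetical and not taken from Table 4.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Label an effect size using commonly cited benchmark values."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# Hypothetical example: a block effect with df1 = 4 and df2 = 4 * (49 - 1) = 192.
example = partial_eta_squared(10.0, 4, 192)
example_label = effect_size_label(example)
```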

Ongoing Input Frequency
Descriptive statistics (i.e., M, SD) of ongoing NDRT input sums within the 20 s before the onset of the upcoming use case by level of automation (L2: after UC2 and UC4; L3: after UC1 and UC3) and block can be found in Table 6. Figure 5 depicts means and standard errors of ongoing NDRT inputs by level of automation and block. The descriptive values showed tendencies similar to those for onset NDRT engagement: The number of inputs during the 20 s before onset of the upcoming use case, summed separately for active L2 and L3 automation, was relatively small, with means not exceeding four. Furthermore, a trend towards more NDRT engagement with rising system experience could be observed for both levels of automation, with a seemingly weaker upward trend for L2 automation. However, descriptive NDRT engagement tended to stabilize after the first three system encounters. Descriptive data also indicated notably more ongoing NDRT engagement during active L3 automation than during active L2 automation in all five blocks.
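The ongoing engagement measure described above amounts to counting NDRT inputs inside a fixed time window that ends at the onset of the upcoming use case. A minimal Python sketch of such a count is given below; the function name, the timestamp format (seconds since the start of the drive), and the example values are assumptions made for illustration, and the half-open window boundary is a design choice of this sketch rather than a detail reported in the study.

```python
from bisect import bisect_left


def count_inputs_in_window(input_times, window_end, window_length=20.0):
    """Count inputs with window_end - window_length <= t < window_end.

    input_times   -- timestamps of NDRT inputs in seconds (any order)
    window_end    -- onset time of the upcoming use case in seconds
    window_length -- window size in seconds (20 s for the ongoing measure)
    """
    times = sorted(input_times)
    window_start = window_end - window_length
    # Index of the first input at or after each boundary; the difference
    # is the number of inputs falling inside the half-open window.
    return bisect_left(times, window_end) - bisect_left(times, window_start)


# Hypothetical timestamps (NOT data from the study); use case onset at 100 s.
example_inputs = [1.0, 85.0, 95.5, 99.9, 100.0, 101.0]
ongoing_count = count_inputs_in_window(example_inputs, window_end=100.0)
print(ongoing_count)
```

Summing such counts over the two use cases belonging to each level of automation would yield one value per participant, level, and block.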

Inferential statistics for ongoing input frequency are reported in Table 7. There was a significant main effect of level of automation as well as of block. This means that ongoing NDRT engagement was significantly higher during L3 automation than during L2 automation and differed over time. Furthermore, there was a significant interaction effect indicating that the effect of block on NDRT engagement depended on the level of automation that was active. The effect sizes (see Table 7) indicated large effects [40].

Discussion and Conclusions
This research investigated NDRT engagement at different levels of automated driving. The results of N = 49 participants showed that the levels of driving automation and the accordingly designed HMIs lead to differences in NDRT engagement. An increase in NDRT engagement over time was observed for both automation levels, although this increase was stronger in L3 than in L2 automation. These results indicate that users' behavioral adaptation occurs during initial system encounters. They also show that an HMI design that follows considerations for L2 and L3 driving automation leads to specific behavioral patterns. The following section discusses the obtained results and relates them to prior considerations about NDRT engagement and mode awareness.
Overall, NDRT engagement differed between L3 and L2 automation, with significantly more engagement in L3 than in L2 automation, as indicated by statistically significant main effects in Tables 4 and 7. These differences can be traced back to two possible sources. First, the L3 HMI permitted hands-free driving while the L2 HMI included hands-on requests. Second, the HMI designs differed in adaptations of informational content and perspective. Ultimately, no conclusive statement can be made as to which HMI variation led to the differences in the observed behavior between the automation levels. Referring back to initial considerations of the HMI design for automated vehicles, it is important to include a form of feedback for L2 automation that prompts drivers to supervise the driving automation. If such feedback is absent (as in the present L3 case), NDRT engagement is high. This observation supports the results by Llaneras and colleagues [29]. The difference between NDRT engagement during L2 and L3 automation was observed for both onset (see Figure 4) and ongoing (see Figure 5) NDRT engagement. These observations are in accordance with the findings reported in [19]. The results reported herein extend those findings by repeatedly observing engagement in an NDRT. Here, similar results were obtained for L2 and L3 automation. Namely, engagement in NDRTs at initial contacts with driving automation, independent of the level of automation, is at a low level. Engagement rises in both instances, as indicated by significant main effects for the block factor in both Tables 4 and 7. However, the rise in NDRT engagement was much stronger for L3 automation than for L2 automation, as indicated by the significant interaction effects in the same tables. These results show that mode awareness might be captured not only by users' NDRT engagement in one block but also by its course over time (e.g., five repetitions).
The behavioral adaptation of NDRT engagement corresponds to related research that investigated human-automation interaction across repeated interactions [13,14,21]. A closer investigation of differences between the blocks by means of planned contrast analysis (see Table 5) showed that a change over time is present from the first up to the third encounter. From then on, stable engagement in NDRTs can be assumed. This has implications for study designs concerning automated driving and engagement in NDRTs. When setting up a study, researchers should be aware that behavioral adaptation requires a certain number of repeated trials until reliable user behavior is present. One example is the study by Hergeth and colleagues [7], where the authors investigated whether NDRT engagement and the corresponding glance behavior could be an indicator of reliance behavior and a marker for trust in automation. Indeed, they considered familiarization with the NDRT and the automated driving system, including N = 8 repeated NDRT engagements.
NDRT engagement was also present during L2 driving automation. By definition, users of L2 driving automation are responsible for supervising the driving task at all times and may not leave the control loop [1]. Even though NDRT engagement during L2 automation was at a descriptively low level, there were participants who diverted their attention away from supervising the driving automation. This observation has implications for the design of L2 automation. It should be noted that secondary task activities occur even in manual driving [41]. Such distraction during manual driving (i.e., engaging in NDRTs) is considered a safety risk and should be minimized [1]. In contrast, there is first evidence that this tendency can be used in a beneficial way during automated driving, as it might be turned into controlled engagement. For example, Paetzold and colleagues [42] did not find differences in reaction time to automation errors between participants who were engaged in an NDRT and those who were not. In the same vein, Hensch and colleagues [43] found effects of display position and secondary task on the driver's glance behavior in both automated and manual driving. In particular, they report longer eyes-on-display time for NDRTs in head-up display configurations. However, due to its proximity to the driving environment, a head-up display might enable faster identification of and reaction to critical situations such as system failures. Thus, there are still challenges for the conceptual development of HMI designs for L2 automated vehicles.
In sum, this study supports the view that NDRT-related behavior can be used to distinguish between levels of automation and their HMI conceptualization. Indeed, the differences in drivers' NDRT behavior support the conclusion that mode awareness for the HMIs in L2 and L3 automation was at a high level. This difference is apparent not only in overall engagement but also in how engagement changed over time. Moreover, the study demonstrated a methodological approach to evaluating NDRT behavior at two points within an episode (i.e., onset vs. ongoing), both of which led to similar results. In particular, the fact that NDRT engagement changes over time implies that research needs to focus on prolonged periods and that drivers first need to adapt to this technology before they can use it appropriately.

Limitations and Future Research
This study comes with a number of limitations. First, there were no incentives for engaging in the NDRT. In real-road driving, drivers might disengage from the driving task only if the NDRT has a rewarding character. It therefore remains unknown whether NDRT engagement, especially in L2 automation, would have remained at such a low level had rewards been offered in this study. Second, the NDRT consisted of the SuRT alone, which is a standardized method for inducing visual-manual distraction. On the one hand, this NDRT covers only two modalities of distraction (i.e., visual and manual); on the other hand, it might not be a very motivating NDRT. For example, Purucker and colleagues [44] used a more naturalistic set of NDRTs in their study, which increases the external validity of the findings. Third, the NDRT was mounted in a fixed position in the center console. Engagement might increase if the NDRT is located closer to the line of sight [43]. Thus, future research has to determine how NDRT-related behavior at different levels of automation evolves for differing activities, modalities, and locations in the vehicle interior. Moreover, the present research only provides insights on the group level that support the predictive character of the SuRT as a measure for mode awareness. This does not permit inferences on the individual level. There is still room for future research to determine whether and to what extent engagement in the SuRT predicts mode awareness on an individual level.