User Monitoring in Autonomous Driving System Using Gamified Task: A Case for VR/AR In-Car Gaming

Background: As Automated Driving Systems (ADS) technology is assimilated into the market, the driver’s obligation will change to a supervisory role. A key point to consider is the driver’s engagement in a secondary task that keeps the driver/user in the control loop. This paper aims to monitor driver engagement with a game and identify any impact the task has on hazard recognition. Methods: We designed a driving simulation using Unity3D and incorporated three tasks: No-task, AR-Video, and AR-Game. The driver engaged in an AR object interception game while monitoring the road for threatening road scenarios. Results: There was a significant difference between the tasks (F(2,33) = 4.34, p = 0.0213), identifying the game task as significant with respect to reaction time and ideal for the present investigation. Game scoring followed three profiles/phases: learning, saturation, and decline. From the profiles, it is possible to quantify/infer drivers’ engagement with the game task. Conclusion: The paper proposes an alternative form of monitoring that also has utility, i.e., entertaining the user. Further experiments with AR-Games in a real-world car environment will be performed to confirm the performance, following the recommendations derived from the current test.


Introduction
Automation has brought about a paradigm shift in the automotive industry, with the much-anticipated Autonomous Driving Systems (ADS) technology emerging at the turn of the 21st century. To date, driver assistance has attracted the interest of researchers and automobile manufacturers, as reflected in the systems developed so far. Assistive technologies such as emergency braking, lane-keeping, and cruise control have been deployed successfully [1]. In the next decade, assistive technologies will give way to conditional autonomous driving, in which the autonomous system takes over longitudinal (acceleration and deceleration) and lateral (steering and turns) control of the vehicle [2]. Automation levels are standardized by the Society of Automotive Engineers (SAE), ranging from SAE level 0 for no automation to SAE level 5 for full autonomy [3]. Of particular interest for this paper are SAE level 3 (conditional automation), level 4 (high automation), and level 5 (full automation). In these levels, the driver's role is primarily that of a supervisor, with level 5 requiring no human intervention. In level 3, the system assumes environmental monitoring and control. If the system encounters uncertainties (e.g., missing road markings, foggy weather, system failure), it issues a takeover request (ToR) to a fallback system, which in this case is the human driver. In level 4, the system assumes control and issues a ToR as in level 3, with the difference that, if the fallback driver is unresponsive, the system can retain control. If the human driver does not respond to a ToR within the stipulated time, the system will slow down and park the car. In level 5, no human intervention is required. A use case for level 3 would be a geo-fenced section that supports ADS, where the user must resume control once the vehicle exits the zone. Level 4 is best described as a robo-taxi, where the system can operate autonomously but stops and parks if no human driver assumes control after a ToR is issued. To this end, in automated driving, the driver/user of the ADS would be free to engage in secondary tasks (non-driving related tasks) during transit, with the caveat that he/she is monitoring the road. The current research addresses the problem of vigilance decrement in autonomous vehicles and how engaging in a secondary task (gaming) can be employed to aid road monitoring (hazard perception). The recognition time of staged popup traffic will be used to quantify road monitoring, while game scores and user gaze information will be used to deepen insight into driver behavior.

Non-Driving Related Tasks (NDRT) and Driving-Related Tasks (DRT)
Non-driving related tasks encompass all secondary tasks the driver engages in [4]. NDRTs include operating handheld devices, operating in-car systems, and communicating with passengers or on calls, amongst others. To date, research and policies have focused on dissuading drivers from engaging in distracting secondary tasks owing to the threat these activities pose both to the driver and to other motorists [5,6]. Research has also been conducted to understand driver behavior in an NDRT environment for ADS. The authors of [4] investigated the effects of NDRTs on the quality of takeover in varying traffic situations. They employed two tasks: a visual surrogate reference task representative of eyes-off-road and an n-back test as a mind-off-road engagement. No significant difference was reported between the two types of distraction. The authors of [7] evaluated the influence on drivers of reading news and emails, watching a video clip, and engaging with a tablet. Another paper [8] used video and tablet gaming NDRTs to evaluate driving behavior in a critical conditional takeover. The authors concluded that the NDRTs had no influence on reaction time. The authors of [9] found that engaging in distractions can reduce drowsy tendencies by up to 27% in automated driving. From the standpoint of embracing distractions as a potential means of enhancing safety, the system designer's focus is to optimize the transition from distraction to resuming control.
Since ADS will eliminate the need for active driving inputs and constant monitoring of the road, activities performed by the driver will not be categorized as a distraction [9]. This is the paradigm shift brought about by automation, where distraction is desirable in a car environment, i.e., the driving-related tasks (DRT) concept. DRT encompasses tasks performed by the driver to aid the overall driving process [5]. DRT may include tasks like checking the speedometer or monitoring the side-view mirror, amongst others. As noted in the report [5], DRT can be a potential source of hazard in conventional driving. However, as ADS takes full shape, driving will become the distraction as the roles are reversed. DRT in ADS is redefined, moving from a conventionally hazardous task to a positive engagement that enhances the driving experience. In this case, DRTs seek to promote an overall improvement in the driving experience. To this end, activities that promote proper sitting posture, proper hand position on the steering wheel, road monitoring, and leg-pedal positioning would be considered DRT. Intuitively, tasks that promote road monitoring and hands on the steering wheel would improve the quality of the takeover.
Fatigue is expected to set in more quickly in autonomous cars than in conventional driving due to reduced engagement [10,11]. Research by [12] reported that drivers in simulated automated cars showed physical signs of fatigue after 15-30 min of driving, whereas manual drivers endured longer. The motivation for studies directed towards gaming stems from the fact that, on long-distance journeys, drivers and passengers alike are confined with few interactivity options. During transit, many travelers have experienced blank stares out of the window, restlessness, impatience, and the like [13]. A remedy for this has been recreational activities allowed by the mode of travel [14].
With the introduction of automation, gaming can be explored in a car environment without compromising the safety of stakeholders.
With the current technological advancements, the driver can be engaged in various activities, each placing the driver in a different state. To reduce the chances of failure in a takeover, the authors of [9] argue that the ADS will necessarily be tasked with monitoring the driver to assess the readiness to take over control. One way of achieving that is to monitor the task the driver is undertaking. The authors of [15] argue that engagement with gamification in driving can reduce the risks associated with boredom and reduced vigilance. With this in mind, we have conceptualized a driver engagement model based on the content source and management routines, as shown in Table 1.

Table 1. Driver engagement model based on content source and management routines.
State | Task type | Description
Monitoring | No distractions | The challenge with this level is that it is hard to maintain a non-distracted status for an extended period due to monotony in ADS.
1 | ADS Managed Tasks | The driver engages with tasks like watching movies, games, etc., that are managed by the ADS system. This gives the advantage of the ease of passing relevant drive information and an indirect driver monitoring system.
2 | External Devices Tasks | The driver engages with tasks with connected devices (smartphones, tablets, etc.). This allows for active sharing of relevant information.
3 | Passive State | The driver engages with tasks unrecognized by the system. This covers all tasks, including unconnected devices and naps.

The desirable state is for the driver to be non-distracted and actively monitoring the road (monitoring state). Since this state is hard to maintain, we conceptualize three other states. At the primary level (state 1), the NDRT content is managed by the vehicle (ADS), i.e., start, stop, pause, interrupt, and other probes to focus the driver's attention. The foreseeable advantage of this is that information delivery can be optimized to integrate with the current road conditions. In this mode of engagement, an integrated environment (virtual or augmented) can be created that allows the user to perform both road monitoring and engagement with a desirable activity, the concept of DRT, thereby adding value to the driving experience. This research explores the possibility of such a system and the hurdles accompanying its development.
On the secondary level, external personal devices are linked to the system such that interruptions can rely on pertinent information instead of the driver having to build his/her situational awareness. Several authors [7,12,16] have investigated NDRTs at this level. The third state is the passive one, with unconnected devices or tasks blind to the system like a deliberate nap. A passive level will be the ultimate experience of an ADS in level 5.

Related Works and Current State of Automation
This paper focuses on gaming as a lucrative engagement that is both entertaining and an indirect environment contextualization scheme. We propose to monitor the driver's engagement, as described in the literature, with video and gaming tasks. This is an intuitive distraction that can keep the driver engaged. We found few works geared towards this line of inquiry. Similar research is reported in [17], focusing on a cross-car multiplayer game for the driver in level 3 and higher ADS. The authors designed a fully immersive multiplayer game for the driver, assuming a full heads-up display (HUD) and ad-hoc vehicle-to-vehicle communication for co-located vehicles. A cooperative in-car game is proposed in [18], targeting cooperation between parents and passenger kids using handheld devices. Another paper [19] applied VR in an actual moving car with rendered underwater scenarios meant to offer a restful/mindful driving experience, while [20] used a similar moving vehicle with a VR system to render a flying (helicopter) environment with shootable game objects. One foreseeable challenge that is bound to affect the driver, who becomes the new passenger in ADS, is motion sickness [21,22]. An in-car game can remedy this by optimally synchronizing the motion of the car with the experienced content. This is the strategy proposed by holoride ®, VR ride, and others [23,24]. Other research has focused on using handheld gadgets and on tying the game experience to geo-locations, passengers, and others [18,25-27].
In the recent past, cars have been manufactured with inbuilt games, a very new concept, albeit one played in a parked car. To this end, manufacturers like Tesla ®, Mercedes-Benz ®, and others are paving the way for the future of gaming in vehicles. Tesla introduced the first game in cars (Atari games) playable on the screen [28]. In early 2019, Mercedes-Benz ® introduced a video game (Mario Kart) on the center screen [29,30]. The deployed games principally target a parked vehicle, thus allowing the car's steering wheel and pedals to be used. As a development beyond the parked-car games, Audi ®, holoride ®, and the Disney team released an in-car VR experience focusing on passengers in transit [23]. The game integrates (meshes) with car motion to deliver synchronized content in the VR world, which is argued to relieve carsickness. This and other developments will further encourage games in cars targeting different in-car users.
Automakers have tested several concept cars with different features. Two such concepts of particular interest to this paper are Zoox ® , an Amazon-owned robot-taxi, and Chevrolet Env 2.0 ® [31,32]. The concept cars feature an overhaul to the conventional interior design with a notable elimination of the current infotainment system. This implies that an alternative information/entertainment system is needed that best suits ADS and supports the new driving experience.
The current paper and previous research on in-car gaming have predominantly been prototypes and proofs of concept. The general objectives have been time-filler, dislocation of cyber-physical space, immersive gameplay, and exploration of new entertainment gaming concepts. Time-filler emerges from the monotony of driving with limited engagement activities. Cyber-physical dislocation applies to individuals who prefer to be in a different (virtual) environment instead of a congested public transit system, as discussed by [33]. The proposed gaming approach, a form of human-machine interface, can sustain vigilance in ADS and increase information flow by aiding visual attention using gamified tasks.

Present Study
The overall objective of this paper is to study driver behavior in an ADS with gamified tasks in a 3D VR simulated driving environment. As previous studies have shown, boredom in an ADS, particularly in a monotonous route, would be far worse than a conventional driving experience [15]. With the introduction of AR, users would be presented with new content overlaid in any route, making the journey enjoyable and less tedious. We propose a game designed to be played by the driver of an ADS during autonomous mode. In-car gaming is a new concept that needs special consideration concerning interactivity, immersion, and situational awareness. From a design perspective, immersion and situational awareness can be regulated by matching the real world to virtual spaces, while interactivity is handled using the most convenient control scheme [18,34].
At present, the link between gameplay and awareness has not been explored; most inquiries are directed at the resumption of control and not at the development of awareness using games. Conventional approaches utilize quiz games or touch interfaces inside consoles or tablets to assess ToR [8,35]. The setup is predominantly an eyes-off-the-road setup that negatively affects situational awareness. The authors of [36] argue that situational awareness increases with an increase in takeover time, meaning that the longer the user has eyes on the situation, the better the performance. The closest research towards awareness is on AR displays. AR displays and HUDs that mesh real-world data with overlaid contextual information have been shown to increase awareness [37]. In the proposed model, game elements appear within the drive path region of interest to synchronize the actual with the virtual world. The position of this research is that an environment-centric design, where the game mechanism is within the monitored zone, will indirectly assist road monitoring and increase situational awareness.
Situational awareness has been identified in driving as key to safe driving [36,38,39]. At the onset of level 3 automation in ADS, concerted efforts will be needed to maintain the driver's visual search path to the driving environment. The reported traffic incidences involving self-driving cars have partly been due to the safety driver being disconnected from the environment [40,41]. As ADS develops, there will be an improvement in system performance, but passengers and drivers alike will need to be aware of the developing situation on the roads. One such way is the use of un-obstructing AR-Games to guide vision to emerging/developing situations.
According to [3], a fallback-ready user should be receptive to requests or imminent vehicle system failure whether a takeover request is issued or not. This calls for sustained vigilance on the driver's side, which is the focus of this paper. As such, the current research is concerned with the following:

1. Investigate how well the driver can recognize threatening driving scenarios while engaging in a game. This will be indicated by the time taken to press the designated button (recognition time) accurately.
2. Evaluate drivers' engagement with the task using score profiles to make inferences on the drivers' state. Game scores will be used as an indirect measure to infer engagement in the task and, by association, vigilance.
3. Investigate driver's interaction with the virtual environment. Based on the interaction model, the driver may be overly engrossed or disinterested in the task. Eye data will be utilized to identify trends in gaze behavior to confirm road monitoring. The gaze trends will be used as design recommendations for indirect monitoring systems.
The contribution of the present study is an investigation into the use of VR and a head-mounted display as an alternative in-car infotainment system, and a characterization of the driver state using gaming modalities. A business case for level 4 and above is well captured by [42], where the captured engagement state and visual attention metadata are applied in tourism. As a pilot study, the current paper seeks to shed light on design considerations for infotainment systems. The findings will be applied in a real-world environment with a moving car to study various aspects of entertainment in an automated vehicle.

Driving Simulator
We designed a custom car simulator scene with the Unity3D game engine for driver analysis. A virtual car was configured with a rule-based autopilot (using waypoints) to mimic autonomous driving. Thus, no control inputs were required from the user. The vehicle was designed to move straight (along the z-axis) with minimal speed variation and turns. A rural terrain was adopted with a two-lane asphalt road and minimal terrain details. The scene was designed to be monotonous, with only road markings, grass-covered terrain throughout, and a clear blue sky. Several authors have explored the environmental details of the driving simulator. Turns, terrain details, and landscape colors, amongst others, have been found to affect driver vigilance [43,44]. The present design, informed by similar simulations targeting monotony, discouraged the user's gaze from wandering to terrain details and instead directed it to the game-relevant objects (popup traffic and game elements). A Fove ® 3D Head Mounted Display (HMD) was used for VR content rendering in the prototype game. The simulation was run on a Windows 10 PC with an Intel ® Core i7 processor and a GeForce GTX 1070 graphics card.
In the design, we included staged popup traffic events to assess the driver's threat recognition time. The setup is shown in Figure 1. The ego vehicle moves along the z-axis in lane one of a two-lane drive path. Game elements comprise a controllable (paddle) object (denoted in blue) and corresponding collectible objects (shown in yellow) that appear along the drive path. Popup traffic appears in locations A, B, or C. Popup traffic appeared every 2 km over a total distance of 25 km, covered at a constant cruising speed of 30 m/s. Twelve (12) popup instances were displayed in each experiment. The test subjects were required to press a button immediately after popup traffic appeared. The entire drive-path region of interest covers the pavements and the two lanes, as shown in Figure 1. Figure 2 shows sample scenes with popup traffic and game elements as experienced by the driver.
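The popup timing implied by these figures can be checked with a short calculation. The following Python sketch assumes, hypothetically, that popups sit at exact 2 km multiples starting at the 2 km mark; the function name and the exact placement are illustrative and not taken from the simulator.

```python
def popup_schedule(total_km=25.0, spacing_km=2.0, speed_mps=30.0):
    """Return a list of (distance_m, onset_time_s) for each staged popup.

    Assumes (hypothetically) that popups are placed at exact multiples of
    the spacing and that the car cruises at a constant speed.
    """
    count = int(total_km // spacing_km)  # popups that fit in the route
    schedule = []
    for i in range(1, count + 1):
        distance_m = i * spacing_km * 1000.0
        schedule.append((distance_m, distance_m / speed_mps))
    return schedule


schedule = popup_schedule()
print(len(schedule))   # number of popup instances
print(schedule[-1])    # distance and onset time of the last popup
```

Under these assumptions the route yields twelve popups, spaced roughly 67 s apart, which agrees with the twelve instances reported per experiment.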

Game Mechanism Design
The paper proposes introducing gaming in a car as a pastime activity that will improve engagement and offer entertainment. Conventional games played in cars use handheld devices like tablets or phones. Thus, the primary mode of interaction is clicking or a touch panel. An example is the setup in [7,35], which interacts with games available on external tablets or phones and which we refer to as a vehicle-centric design. The proposed system uses an elaborate control mechanism to manipulate an interception game in 3D space along the drive path, in this case an environment-centric design. As opposed to conventional systems, 3D perception will be an essential consideration.
In the design, the driver actively engages with game elements on the road while monitoring traffic. A controllable paddle object (player) is located a few meters ahead of the car's position, visible to the driver. The paddle slides along the x-axis and moves with the car along the z-axis. As the car moves autonomously, collectible objects are spawned ahead on the drive path by the game engine, as shown in Figure 2a. The spawned objects appeared at an interval of 2 s. When the controlled paddle (a Unity 3D game object) collides with the mesh of a spawned object, a score is registered (intercepted); otherwise, the object is counted as missed. This is made possible by Unity 3D's physics system, which checks for the interaction of game objects. Missed objects are recorded together with their spawn points so that the scoring profile can be analyzed by position.
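The interception rule can be illustrated outside the game engine. The Python sketch below is not the Unity implementation: it replaces the physics-based mesh collision with a simple one-dimensional overlap test along the x-axis, and the class name and the paddle and object sizes are hypothetical. The score decrement on a miss follows the penalty described for the game.

```python
from dataclasses import dataclass, field


@dataclass
class InterceptionGame:
    paddle_half_width: float = 0.5    # hypothetical size, in metres
    object_half_width: float = 0.25   # hypothetical size, in metres
    score: int = 0
    missed_spawn_x: list = field(default_factory=list)

    def resolve(self, paddle_x: float, object_x: float) -> bool:
        """Score one collectible as it reaches the paddle's z position."""
        reach = self.paddle_half_width + self.object_half_width
        hit = abs(paddle_x - object_x) <= reach
        if hit:
            self.score += 1                      # intercepted object
        else:
            self.score -= 1                      # penalty for a miss
            self.missed_spawn_x.append(object_x) # keep the spawn position
        return hit


game = InterceptionGame()
game.resolve(paddle_x=0.0, object_x=0.5)   # within reach: counted as a hit
game.resolve(paddle_x=0.0, object_x=2.0)   # out of reach: counted as a miss
print(game.score, game.missed_spawn_x)
```

Keeping the spawn position of each miss is what allows the missed-object profile to be broken down by lateral position afterwards.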
The driver moves the paddle with a joystick controller, shown in Figure 3, to intercept the spawned AR-Game objects (elements). Each successfully intercepted object adds to the player's points. A missed object incurs a penalty (a decrement in the displayed score), and the miss is recorded so that the missed-object profile can be analyzed further. Score progress is logged and displayed on the dashboard of the car. Gaze information was recorded using the inbuilt eye-tracking system of the FOVE ® HMD. The data is logged together with car position and game score progression for every frame at a sampling rate of 65-75 frames per second.
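One possible way to turn the logged hit/miss sequence into a score profile is a windowed hit rate. The Python sketch below is only an illustration: the function name and the window size are hypothetical, and the paper does not state how its score profiles were computed.

```python
def windowed_hit_rate(outcomes, window=10):
    """Fraction of intercepted objects in consecutive, non-overlapping windows.

    outcomes: sequence of 1 (intercepted) or 0 (missed), in spawn order.
    A trailing partial window is dropped.
    """
    rates = []
    for start in range(0, len(outcomes) - window + 1, window):
        chunk = outcomes[start:start + window]
        rates.append(sum(chunk) / window)
    return rates


# Hypothetical outcome sequence, for illustration only.
outcomes = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
rates = windowed_hit_rate(outcomes, window=4)
print(rates)
```

With objects spawned every 2 s, a window of ten objects would correspond to about 20 s of play; a rising, flat, or falling rate over successive windows is the kind of trend a score profile would capture.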
We propose to set up the video on the HUD so that the driver can transition from video to road monitoring effortlessly. This is achieved by projecting the content on a HUD-type screen instead of locating the video player below the driver's view in the cockpit, as is the case in current car designs. With this setup, the video is rendered in the upper part of the windscreen, leaving enough of the window clear for road monitoring. Furthermore, the setup is meant to keep the driver's eyes on the road, which, according to the literature, significantly affects recognition time.

Experiment Setup
The subjects sat comfortably in an office chair with the camera view inside the virtual car positioned in a typical driver seat in a 3D VR environment. Figure 3 shows a test subject using 3D VR and steering wheel input controls for the game. We used the Thrustmaster ® steering wheel attached to the PC running the simulator for popup object buttons and game controller. The user was expected to recognize the threat and push a corresponding button on the steering wheel shown in Figure 3 as a popup object button. No auditory warnings were issued when the traffic appeared. Therefore, the driver relied on a visual search to identify relevant information from the drive path and react accordingly. We experimented with the user engaged in a game and compared the hazard recognition time to a case where the user has no or mild physical engagement (i.e., watching a video clip or no-task).
Video and game tasks have been used to evaluate hazard perception in ADS by numerous studies due to the ease of presentation and the entertaining utility of the tasks. Video tasks can represent many handheld gadget usages like social media browsing, while games represent a plethora of user-based inputs. The user can shift visual attention momentarily in a video clip, leveraging audio content to keep up with content, which is thus a less demanding task. A no-task is included to represent a case where the driver monitors the road with no additional task. No-task and video are theorized to have similar attentional requirements contrasted with a game task in the study. The time taken (recognition time) to detect threats on the road is expected to be impacted by the task or the driver's loss of attention. Interaction with the AR elements is evaluated in the form of game scores and gaze information.
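Recognition time can be derived from the logged popup onsets and button presses. The Python sketch below assumes timestamps in seconds and a hypothetical response window (`timeout_s`); the actual window used in the experiment is not stated, and the function name is illustrative.

```python
def recognition_times(popup_onsets, press_times, timeout_s=5.0):
    """Pair each popup onset with the first button press that follows it.

    Returns one entry per popup: the elapsed time in seconds, or None when
    no press arrives within the (hypothetical) timeout window.
    """
    presses = sorted(press_times)
    results = []
    for onset in popup_onsets:
        elapsed = None
        for press in presses:
            if onset <= press <= onset + timeout_s:
                elapsed = press - onset
                break
        results.append(elapsed)
    return results


# Hypothetical timestamps, for illustration only.
times = recognition_times([10.0, 80.0, 150.0], [12.5, 158.0])
print(times)
```

Entries of None mark popups with no response inside the window, which would need to be handled separately (for example, reported as missed events) before computing summary statistics.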

Participants
The participants in the study were students, recruited following approval from the Gifu University ethics committee. Thirteen subjects (four female and nine male) of different nationalities took part in the experiment, with an average age of 26.6 years (std 6.2). Real-life driving or gaming experience was not considered in the current study. Of the participants, four had prior exposure to gaming, including VR games.
A preparatory drive scene was presented before the recording of data. In this test, the subjects were introduced to the controls and buttons and the general objective of the experiment. The participants were divided into two groups: a mixed-content group (3 subjects comprising a gamer, a non-gamer, and a female subject) and a game-only group (10 subjects). The focus of the grouping was not the investigation of task effects on recognition time. Instead, it was to identify games as the most involving of the tasks and hence ideal for the present investigation. The mixed-content group thus made up 23% of the participants. The mixed-content group was presented with three tasks: video, game, and no-task in random order. Each subject conducted 3 experiments, each lasting approximately 30 min, with sufficient (user informed) break time between sessions. Owing to the prolonged VR usage, we limited the group to a sample of subjects who could complete the experiment with minimal or no simulator sickness. The game-only group was presented with gaming tasks only, to avoid pre-exposure. Each subject's gaze information, button presses, scores, and interaction with game elements were logged in an Excel file for further processing. The experiment lasted between 20 and 30 min per subject. No incentives were offered to the subjects. Data analysis was performed using Matlab ® software.

Comparison of Tasks
The mixed-content group tried the three tasks in random order to compare recognition time across tasks. Figure 4 shows a boxplot of the recognition time recorded in all the popup cases with or without a DRT. The evaluated tasks (No-Task, Game (AR-Game), and Video) differed in recognition time by less than 1 s for the subjects. Outliers in the No-Task scene represent instances where the driver was not paying attention to road events. The figure shows that driving with an AR-Game gives a slightly slower recognition time with the advantage of consistency (a compact interquartile range).
There were statistically significant differences between the means of the groups as reported by one-way ANOVA (F(2,33) = 4.34, p = 0.0213, η² = 0.2081). From the figure, the driver in No-Task had the best recognition time (mean = 2.492, std = 0.083) followed by AR-Video (mean = 2.506, std = 0.179). AR-Game (mean = 2.627, std = 0.081) was the slowest, as expected. In the No-Task and Video tasks, the user's hands are not occupied with any activity and, as such, the button can be pressed immediately after traffic pops up. In the case of an AR-Game, the user is actively controlling a gamepad and takes more time transitioning from active game control to the button press. A noticeable difference is the interquartile range (IQR), which was found to be 0.09, 0.2, and 0.07 s for the No-Task, AR-Video, and AR-Game tasks, respectively.
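The relationship between the reported F statistic and the effect size can be cross-checked. The Python sketch below is a plain re-implementation of one-way ANOVA for illustration, not the Matlab analysis used in the study; the function names and the small data set are hypothetical. It also uses the standard identity η² = F·df_between / (F·df_between + df_within).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_sq


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported F statistic and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


# Hypothetical data, used only to exercise the function.
f_stat, df_b, df_w, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w, eta_sq)

# Cross-check using only the values reported in the text.
print(round(eta_squared_from_f(4.34, 2, 33), 3))
```

Applied to the reported F(2,33) = 4.34, the identity gives an η² of about 0.208, which agrees with the reported 0.2081 up to rounding of F.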

Gaming Recognition Time
The focus of the study was to investigate the impact gaming would have on reaction time. Figure 5 shows the boxplot of the recognition time for 10 of the users evaluated in the experiment. In all cases, the drivers reacted on average about 3 s after the popup traffic appeared. The data capture one case where the driver did not press any button within the allotted time; this is shown as an outlier for driver no. 7. In the plot, the line represents the median, and the boxes are the interquartile ranges (IQR) of the 12 popup instances. Participants who had prior gaming experience responded quickly, as shown (drivers 1, 2, 3, and 9). Figure 6 shows the progression of the average reaction time of all ten subjects. From the figure, reaction time improves (as shown by the mean and standard deviation) as the game progresses and deteriorates towards the end of the task. The popup incidents in the middle (popups 6, 7, and 8) had the quickest reaction times, while the last incident had the worst performance compared to the average of 3 s.
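As an illustration of how the per-driver boxplot quantities can be derived, the sketch below computes the median, quartiles, and IQR for one driver's 12 popup instances and flags outliers. The 1.5 × IQR whisker rule is an assumption (it is the common boxplot convention, not something stated for Figure 5), and the sample times and function name are hypothetical.

```python
from statistics import median, quantiles


def boxplot_summary(times):
    """Summarize recognition times (s) the way a boxplot would."""
    # quantiles() with n=4 returns the three quartile cut points.
    q1, _, q3 = quantiles(times, n=4)
    iqr = q3 - q1
    # Assumed whisker rule: points beyond 1.5 * IQR from the box are outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [t for t in times if t < low or t > high]
    return {"median": median(times), "q1": q1, "q3": q3,
            "iqr": iqr, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical driver: 11 ordinary responses and one missed button press
    # logged at the timeout value.
    driver_times = [2.8, 2.9, 3.0, 3.1, 2.7, 2.9, 3.0, 2.8, 3.1, 2.9, 3.0, 5.0]
    print(boxplot_summary(driver_times))
```

With data of this shape, a missed button press logged at the timeout would be expected to appear among the outliers, similar to the case described for driver no. 7.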

User Gaze Tracking
As mentioned, the experiment also recorded eye gaze as the user interacted with both popup traffic and game elements. The results below show gaze direction progression overlaid onto the conceptual scene setup shown in Figure 1. Naturally, users' gazes react to moving and fixed objects differently. As popup objects appear, there is a quick reflex to attend to the stimuli. Similarly, when guiding the player object to intercept a static object in 3D space, a pursuit (tracking) gaze is employed. Gaze information was recorded to reveal tendencies towards fixation or scanning of the environment around the objects of interest. The results are shown in Figure 7. The figure shows scatter plots of the driver's gaze direction (x-axis) 5-7 s before a button press. The x-axis of the drive path (region of interest) ranged between −5 and 5 from the center view in the figure. The center view (zero point) represents the position the gaze would take in the VR environment if the driver looked straight ahead. From the result, the user's gaze is actively engaged in the road environment on both the left and right sides of the travel lane. Dense regions of the scatter plots (concentration points) reveal gaze fixation, that is, instances where objects of interest are located in close proximity (localized) in the scene. Sparse regions correspond to gaze movement (scanning of the environment or transitioning to the next object). Two distinct patterns in the gaze information are thus identified: localized fixation and scanning. The patterns were identified in all of the participants (with varying scatter density) as they visually interacted with objects in the scene.
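The fixation and scanning patterns above were identified from the scatter plots. One way such a distinction could be automated is a simple dispersion threshold over short windows of gaze samples, sketched below. This is a swapped-in illustration rather than the method used in the experiment; the window length, the threshold, and the function name are assumptions.

```python
def label_gaze_windows(x_positions, window=5, max_dispersion=1.0):
    """Label consecutive windows of horizontal gaze samples.

    A window whose samples stay within `max_dispersion` (same units as the
    x-axis, here the -5..5 drive-path range) is labeled a fixation; a wider
    spread is labeled scanning. Both parameters are illustrative assumptions.
    """
    labels = []
    for start in range(0, len(x_positions) - window + 1, window):
        chunk = x_positions[start:start + window]
        dispersion = max(chunk) - min(chunk)
        labels.append("fixation" if dispersion <= max_dispersion else "scanning")
    return labels


if __name__ == "__main__":
    # Hypothetical gaze trace: a localized cluster near the center view,
    # then a sweep across the drive path.
    trace = [0.1, 0.2, 0.1, 0.0, 0.2, -4.0, -1.0, 2.0, 4.0, 3.0]
    print(label_gaze_windows(trace))
```

A threshold of this kind would need to be tuned against the recorded data before it could stand in for visual inspection of the scatter density.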

Intercepted Objects Profile
Figure 8 shows the intercepted AR-Game elements (score) distributed over 100-s intervals (segments) from the start to the end of the simulation. Each segment represents a period in which 50 elements were spawned in the driving scene, and the score is the number of those elements that the user intercepted. The accumulated scores have a direct relation with user game interactions. After initial learning of the basics, the scores are expected to rise to the level allowed by the user's hand coordination skills. The results show a general profile of scoring progression, which we refer to as the intercepted objects profile. Three distinct profiles/patterns were observed: a learning phase (positive gradient), saturation (constant gradient), and a decline (negative gradient). This is highlighted in Figure 8a,c using trend lines. An intermediary stage of saturation was also noted but is subsumed in Figure 8b during the transition. The overall percentages of trends are shown in Figure 8d. From the figure, 70% of users showed a positive gradient, while 30% of users had a saturated trend. A declining trend was noted in half (50%) of participants.
The phases were derived after observing the data from all users. Additional testing of subjects conformed to the three phases and did not provide new information. The significance of each of the profiles is discussed in a later section. Figure 9 shows the percentage score of the drivers for the entire course. The figure shows an average score of 83.4%, indicating that the game is playable to a satisfactory level.
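The three phases can be expressed as a simple rule on the slope of a least-squares trend line fitted to the per-segment scores. The sketch below is one possible formulation, not the procedure used to produce Figure 8: the `flat_band` tolerance, the reading of saturation as a near-zero slope, the function names, and the sample scores are all assumptions.

```python
def trend_slope(values):
    """Least-squares slope of values against their segment index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two segments to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify_phase(segment_scores, flat_band=0.5):
    """Map a run of segment scores to learning, saturation, or decline.

    `flat_band` (score points per segment) is an assumed tolerance below
    which the trend is treated as flat.
    """
    gradient = trend_slope(segment_scores)
    if gradient > flat_band:
        return "learning"
    if gradient < -flat_band:
        return "decline"
    return "saturation"


if __name__ == "__main__":
    # Hypothetical scores out of 50 spawned elements per 100-s segment.
    print(classify_phase([30, 35, 40, 44]))
    print(classify_phase([44, 44, 45, 44]))
    print(classify_phase([45, 42, 38, 33]))
```

Because a single user can pass through more than one phase, the rule would be applied to successive runs of segments rather than to the whole session at once.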

Missed Objects Profile
Missed objects are as informative as intercepted objects because they point to the reasons behind an error. As with any gaming engagement, the rules for losing are equally essential and telling. In the game, a missed object is reported when the paddle did not intercept the game element. The missed objects profile is formulated by plotting the deviation index (the distance from the paddle to the target game element). This information is also tied to the respective spawn point to formulate a profile. The deviation index is calculated as shown in Equation (1).
$$I_{dev}(i) = \left| \mathit{Paddle\_pos} - \mathit{missed\_obj}_i \right| \qquad (1)$$
where i represents the spawn position (1 to 5), Paddle_pos represents the current position of the paddle, and missed_obj represents the position of the currently missed object.
The deviation index, I_dev, is given as the absolute separation distance between the paddle and the spawned AR-Game element. For each of the spawn points 1-5, the corresponding average deviation index is logged and is shown in Figure 10. The deviation increases as the spawn point moves away from the middle, giving a "U-shape" centered on spawn point 3 (the driver's center view). The standard deviation of the deviation index also increases away from the center spawn point.
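The per-spawn-point averaging behind the profile can be sketched as follows. The record layout (spawn point, paddle position, missed object position), the use of one-dimensional positions, and the sample values are assumptions made for illustration; only the absolute-distance definition of the deviation index is taken from the text.

```python
from collections import defaultdict
from statistics import mean


def mean_deviation_by_spawn(misses):
    """Average deviation index per spawn point.

    `misses` is an iterable of (spawn_point, paddle_pos, missed_obj_pos)
    tuples, with positions taken along a single lateral axis (assumed).
    """
    by_spawn = defaultdict(list)
    for spawn_point, paddle_pos, missed_obj_pos in misses:
        # Deviation index: absolute separation between paddle and missed object.
        by_spawn[spawn_point].append(abs(paddle_pos - missed_obj_pos))
    return {sp: mean(devs) for sp, devs in sorted(by_spawn.items())}


if __name__ == "__main__":
    # Hypothetical misses: two at the far-left spawn point, one at the center.
    log = [(1, 0.0, -4.0), (1, -1.0, -4.0), (3, 0.5, 0.0)]
    print(mean_deviation_by_spawn(log))
```

Plotting the resulting averages against spawn points 1-5 would give a profile of the kind shown in Figure 10.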

Discussion
This paper aimed to investigate the use of games in an autonomous car environment. To this end, we sought to gather driver behavior and tendencies to infer the engagement level and driver state. The investigation sought to answer the question of what the effects of engaging in a simple interception game in an ADS environment will be. We designed an AR-Game inside a driving simulator to analyze the driver's engagement with game elements and staged popup traffic. Similar research has no agreed-upon reference standard or evaluation scheme and places more emphasis on the design of the approach [15,17,26,27].

Recognition Time and Visual Search
In the experiment, recognition of popup objects was compared between three engagement levels: no task, watching a video clip, and gaming. The results are shown in Figure 4. The difference between the task means was statistically significant (p = 0.0213, η² = 0.2081). Of the two secondary tasks, the game was the slower in recognition time, making it an ideal test ground for investigating the tradeoff between engagement and road monitoring. An important observation is the interquartile ranges of the tasks evaluated. The tasks had an IQR of 0.09, 0.2, and 0.07 s for No-Task, AR-Video, and AR-Game, respectively. This suggests that the AR-Game driver has a consistent RT compared to other tasks, which fluctuate with attention shifts. The overall recognition time for all subjects yielded a reaction time of 2.9 s, which agrees with the findings of other researchers [8,45]. The findings suggest that a driver engaging in an AR-Game would not be impaired by the gaming elements in recognizing threatening scenarios. On the contrary, it might help in maintaining vigilance. As seen in Figure 6, reaction time improved with time as users got used to the control mechanism and deteriorated towards the end, possibly due to fatigue or lost interest in the game.
As far as takeover time is concerned, the literature review suggests no difference in performance with visual-loaded or cognitive-loaded secondary tasks [7,16]. The main effect has been reported on metrics like time-to-hands-on-steering and time-to-eyes-on-road, amongst others [8]. These effects point to the mode of task presentation and interactivity as opposed to loading. From the literature and the comparison of tasks with no task, we assume that multimodal secondary tasks would not alter the recognition of hazards.
In addition, we explored a potential advantage of AR-Games: directing the driver's visual search patterns to different sections of the road. From Figure 7, we identified two visual search patterns: scanning and localization of the visual search. As objects are populated in the drive path (popup traffic and game objects), the users' gaze reacts to each item. If the objects appeared within the same region, gaze fixation was noted instead of the dispersed gazes seen while scanning. The two patterns identified above represent cases where the user shifts his/her focus to interact with objects. Authors [46,47] reported that the driver's gaze is more dispersed in the environment in ADS than in manual driving. As opposed to a dispersed gaze, the results presented in this paper follow a systematic and focused transition from one object to another.
We confirmed positive tracking of AR elements that can be applied in refining the users' gaze to relevant information along the drive path. At the advent of self-driving cars, highlighting personalized content that the user is most intrigued by or relevant information outside of the car environment, such as traffic signs, would be an added advantage. Researchers have linked gaze wandering with boredom and lost interest [48]. In this paper, we have only tracked the progression of the gaze to ascertain the way users react visually to objects. Further analysis would be needed to ascertain cases of lost interest in the gaming activity using gaze information.

Score Profile as an Engagement Model
We managed to introduce and control gaming elements in the driving path to reduce the monotony of the driving scene. The overall score for the drivers is shown in Figure 9. The overall percentage score represents individual control skills, which reflect the ease of game control and interaction. The drivers had an average score of 83.4% and a minimum of 72.3%. Game score progression was theorized to reflect the user's engagement, which is related to the driver state. Considering the monotonous scene employed in the setup, the driver is engaged only with game control or road monitoring. Missed objects will therefore arise from a loss of focus or a shortfall in hand coordination skills. Consequently, as the participant engages in game control, the progression trend will highlight the participant's engagement level. The intercepted object trends in Figure 8 show how the user interacts with gaming elements and indicate engagement as either enthusiasm, saturation, or declining interest.
Considering this is a game the users have never encountered before, a learning phase is expected at the onset, followed by either a decline or sustained scoring depending on the user's impression of the game. A positive gradient trend is indicative of learning as the game progresses. This was present in 70% of the drivers new to the simulation. The saturation trend would indicate users who have learned the control scheme and actively perform at their personal peak within the constraints. This was mainly present in the gamer group, with a prevalence of 30% as shown in Figure 8d. The trend was either sustained in cases where the user enjoyed the game or gave way to a downtrend. A declining trend was reported in 50% of participants towards the end of the experiment. This agrees with the deteriorated recognition time in Figure 6 towards the end of the experiment, pointing to fatigue or lost interest. A transition from a positive to a negative gradient would be an ideal point for introducing levels and/or other gaming elements to keep the user engaged. If fed to the system, the trends identified in the results would add to the pool of feedback information on the current state of the driver. This way, the system will have a form of contextualization of the state of the driver.

Game and Design Consideration
As noted from Figure 10, AR-Game elements on the sidewalks were easily missed compared to center objects. This agrees with the results from different authors on the anisotropic perception of object size, distance, and positioning in a VR environment [17,49]. From these results, the deviation index increases as the position of the playable object departs from the center. The "U-shape" error pattern is a critical consideration in the enjoyment/challenge of AR elements. It points to the hurdle that must be considered in terms of 3D world reconstruction due to visual perception and biases in the human brain. Careful consideration of the playable/interactable environment is recommended.
Various research inquiries have been made touching on VR usage and interfaces in cars [22,50,51]. The key points are collision within constrained spaces, social acceptance of HMDs, and the control mechanism, amongst others. From the preceding pilot test, we recommend a limited application of full-body control schemes. This would include tasks that require standing, sways/body rotations, and raised hands, amongst others. The current game did not require any full-body activity, but inadvertently, most users were observed to exhibit body sways and shifts that inherently alter the center of gravity. In a moving car, the interplay of user head motions has been cited as a potential source of discomfort [52]. Besides this, moving body parts increases the chances of collision with physical barriers.
In the foreseeable future, games will become an integral part of the in-car experience. A car in motion is a potent source of somatosensory information that would significantly enhance the game experience if adequately integrated. At present, video games are predominantly audio-visual in delivery mode. The inclusion of car movement dynamics will add to the realism of the experience. A primary consideration is safety. If an accident occurs while the user is immersed in a game that is disconnected from the developing road situation, the physiological unpreparedness might be too high. An analogy is an airplane in a nosedive with no one to issue a brace-for-impact warning. From this paper's discussion and consideration points, we recommend that games in cars take an augmented approach that builds on the drive path environment to maintain situational awareness irrespective of the automation level. An obscured drive path would need an intervention mechanism to build awareness.

Limitations and Future Works
The experiment employed a 3D VR game prototype in place of an actual driving environment. A real driving scenario would be preferable, but the ADS technology is not fully matured; a substitute of VR has been utilized to give insights and design for future development. The limitation in this has to do with the lost sense of danger which might impact the generalization of the recognition time. However, intuitively, in a real threat scenario, threats would be processed with higher priority, not lesser. Motion sickness manifesting as mild eye fatigue was reported by one subject, but the greater majority did not have any physical discomfort. The test subjects comprised a relatively young population. In the case of older test subjects, the effects of VR usage might be more pronounced.
The design of the experiment is also not exhaustive; as a proof of concept, the study was conducted with limited test subjects, targeting university students in a controlled environment. Additionally, the setup considered a traffic scenario with no competing stimuli. Further investigations will be conducted incorporating diverse groups with a variety of stimuli in a real-world environment. Studying behavior in a real-world environment with near-natural stimulation of physical car movements, acceleration, braking, and other dynamics, will offer important information for the design of automated vehicles and infotainment systems. In the future, the 3D VR/AR game will be tested in an actual moving car following the recommendations derived from this test.

Conclusions
In conclusion, this paper has explored what the driver will be engaged with for SAE Level 3 automation and above. The proposed scheme uses VR game tasks that add value to the automation system instead of just entertaining. In this paper, the driver engagement model identified as ADS-managed content has been evaluated. The proposed scheme is the use of an AR-Game that meshes with the driving scene. During the driving scene, the user was presented with popup traffic to evaluate recognition time. From the experiment, we found little to no effect on recognition time when drivers engaged in an AR-Game. With the advent of ADS, secondary tasks will be needed to add value to the driving experience and maintain vigilance.
From the discussion, driver monitoring through score progression in the road environment using a gaming modality has been achieved. Learning, saturation, and decline profiles were identified as the prominent trends that would be useful in contextualizing the engagement model. When the ADS manages the content, it will be possible to infer the driver state with no adverse effects on recognizing threatening driving scenarios.
Gaze information results suggested that it is possible to focus drivers' visual attention and tie it to a relevant source of information using game elements. The results confirmed the anisotropy of objects in a 3D environment, as seen in the missed object profile. This is a design feature that should be considered when designing in-car games that have 3D interactivity. Further research should be conducted to understand drivers' behavior in a multi-stimuli environment and various gaming options. The overall findings indicate that gaming in-car would be advantageous with negligible impacts on road monitoring performance where such is needed.