Multirobot Confidence and Behavior Modeling: An Evaluation of Semiautonomous Task Performance and Efficiency

There is considerable interest in multirobot systems capable of performing spatially distributed, hazardous, and complex tasks as a team leveraging the unique abilities of humans and automated machines working alongside each other. The limitations of human perception and cognition affect operators’ ability to integrate information from multiple mobile robots, switch between their spatial frames of reference, and divide attention among many sensory inputs and command outputs. Automation is necessary to help the operator manage increasing demands as the number of robots (and humans) scales up. However, more automation does not necessarily equate to better performance. A generalized robot confidence model was developed, which transforms key operator attention indicators to a robot confidence value for each robot to enable the robots’ adaptive behaviors. This model was implemented in a multirobot test platform with the operator commanding robot trajectories using a computer mouse and an eye tracker providing gaze data used to estimate dynamic operator attention. The human-attention-based robot confidence model dynamically adapted the behavior of individual robots in response to operator attention. The model was successfully evaluated to reveal evidence linking average robot confidence to multirobot search task performance and efficiency. The contributions of this work provide essential steps toward effective human operation of multiple unmanned vehicles to perform spatially distributed and hazardous tasks in complex environments for space exploration, defense, homeland security, search and rescue


Introduction
Researchers have long sought to enable multiple robots working together as a team [1][2][3][4][5][6][7][8][9][10] to perform distributed tasks such as area exploration, search, and surveillance [11][12][13][14][15][16][17][18][19] and complex tasks in hostile conditions such as the assembly of structures in orbit, lunar, and planetary environments [20][21][22][23][24][25][26]. Advances in sensing, computing, and materials may make fully autonomous multirobot systems possible in certain circumstances. However, many applications will demand human supervision and intervention to satisfy safety requirements, overcome technical limitations, or authorize critical actions. Human operators will be expected to perform tasks such as approving targets and resolving navigational impasses for manned-unmanned teams with robotic or optionally manned vehicles. Near-term teams of robots and humans will benefit from the unique advantages of human cognition, reasoning, ingenuity, and soft skills. Even after significant advances, humans will likely often retain a vital role as the authority ultimately responsible for safety and operating within established constraints.
Human interaction with multiple mobile robots involves information from many sources, multiple frames of reference, and competing tasks. Factors affecting single robot control via video-based interfaces include restricted fields of view, difficulty ascertaining orientations of the environment and robot, unnatural and occluded viewpoints, limited depth information, time delay, and poor video quality [27]. Increasing the number of robots multiplies these challenges, with each robot having potentially unique and dynamic orientations, camera perspectives, and sensory frames of reference. The demands of multitasking can overload the operator and limit the scalability of human-robot interaction as the number of robots increases [28][29][30][31][32][33].
Increasing automation does not necessarily improve performance. Our group's prior user study measured search task performance with four robots operated at each of three levels of autonomy [34,35]. Automation and augmented reality (AR) graphics were intended to allow the participant to focus on higher-level tasks. With a fixed number of robots, successively higher levels of robot autonomy were expected to improve performance. However, the results revealed performance might decrease as autonomy increases past some threshold. We observed that many operators over-relied on automation and that operator inattention might have contributed to the unexpected drop in performance.
In this paper, we hypothesize that measuring attention and incorporating it as feedback in the system can mitigate these factors and improve performance. This work describes a robot confidence model which varies robot behavior in response to indicators of operator attention and the results of a user study that demonstrate its utility. The term robot confidence is used here as a metaphor to describe the mapping of attention-related inputs to robot behaviors. This research contributes techniques of incorporating operator attention as feedback to enable effective and efficient control of multiple semiautonomous mobile robots by a human operator.

Robot Confidence
Concepts related to confidence are often linked to human trust in autonomy and allocation of control or how a human operator uses available autonomy levels. Operator confidence typically refers to the self-assurance of a human in their ability to perform a task or trust in a robot's ability to function autonomously. Research includes the impact of transparency and reliability on operator confidence [36]. Models estimating human self-confidence have been developed for purposes such as automatically choosing between manual and autonomous control [37].
Research related to robot confidence is typically aimed at altering human trust in autonomy or allocating control authority. A common objective is convincing the operator to shift the allocation of control toward autonomy or manual operation as appropriate to optimize performance. For example, a robot may provide visual feedback indicating its self-confidence to influence the operator's trust [38]. Alternatively, a model of robot confidence might be used to directly distribute authority, such as setting shared-controller gains to amplify or attenuate inputs from a teleoperator and ultrasonic sensors [39]. Other research includes a robot expressing its certainty in performing a policy learned from a human teacher [40][41][42] and modeling a robot's confidence in a human co-worker [43] or its ability to predict human actions in a shared environment [44]. A similar concept is algorithm self-confidence applied, for example, to a visual classification algorithm [45].
Other works have modeled aspects of human cognition enabling robots to self-assess and adapt their behaviors [46][47][48]. The authors in [49] proposed an artificial neural network-based model of emotional metacontrol for modulating sensorimotor processes, and used intrinsic frustration and boredom to intervene in a visual search task. The authors in [50] modeled curiosity in an intrinsic motivation system used to maximize a robot's learning progress by avoiding unlearnable situations and shifting its attention to situations of increasing difficulty.

Human Interaction with Multiple Robots
Automation is necessary for a human operator to effectively control multiple robots. Research often focuses on how many robots can be operated [31] and methods to do so efficiently [28,32]. General approaches to address operator overload due to multitasking include redesigning tasks and interfaces to reduce demands, training operators to develop automaticity and improve attention management, and automating tasks and task management [51]. Research toward interaction with multiple semiautonomous robots includes task switching and operator attention allocation [52][53][54], such as identifying where an operator should focus and influencing the operator's behavior accordingly via visual cues in a graphical user interface [55]. Other work includes determining which aspects of a given task are most suitable for automation [16], measuring and influencing operator trust in team autonomy [19], using intelligent agents to help human operators manage a team of multiple robots [13], and using augmented reality interfaces to integrate information from multiple sources and project it into a view of the real world using a common frame of reference [35,56,57].

Understanding the User's Intent
Dragan et al. [58] address the fundamental problem of teleoperation interfaces being limited by the inherent indirectness of these systems. The report discusses intelligent and customizable solutions for the adverse effects of remote operations. They state that the decision on what assistance must be provided to operators must be contextual and dependent on the prediction of the user's intent. Their main recommendation is that a robot learns specific policies based on examples. Chien, Lin, and Lee [52] proposed a hidden Markov model (HMM) to examine operator intent, and performed offline HMM analysis of multirobot interaction queuing mechanisms. Several groups [59][60][61] (including our own) have used eye-gaze tracking to determine the user intent for zooming the camera. Eye-gaze data clusters can be used as inputs into a classification algorithm (such as one based on linear discriminant analysis) to determine the user intent for zooming the camera. Similarly, [59] uses simultaneous eye-gaze displays of multiple users to show their mutual intent.
Goldberg and Schryver [60] developed an off-line method for predicting a user's intent to change or maintain the zoom (i.e., magnification/reduction) and gave camera zoom control as an example application. Latif, Sherkat, and Lotfi [62,63] proposed a gaze-based interface to drive a robot and change the on-board camera view. Zhu, Gedeon, and Taylor [64] developed a gaze-based pan and tilt control that continually repositioned the camera viewpoint to bring the user's fixation point to the video screen center. Kwok et al. [59] applied eye-gaze tracking to allow control of two bimanual surgical robots by independent operators.

Eye Tracking for Human-Robot Interaction
Many metrics have been proposed for evaluating the performance of human-robot teams [53,65,66]. However, operator awareness, intent, and workload are influenced by various factors linked to task conditions, human perception, and cognition, which make these challenging to define and measure. Established methods are typically subjective [67][68][69][70][71][72][73][74]. A number of physiological indicators can be observed, including many related to eye movements that are measurable using non-invasive techniques [69,75,76].
Attention is a cognitive function and thus is difficult to measure directly. Eye tracking technology enables physiological measurements linked to various aspects of human cognition, including attention [77][78][79][80][81]. The role of eye movements and visual attention in reading, scene perception, visual search, and other information processing has long been studied [82][83][84][85]. Eye gaze describes the point where a person is looking. A fixation is a relatively stable visual gaze at an area of interest. A saccade is a rapid ballistic movement to a new area of interest. The detection of fixations and saccades from raw eye movement data was a principal focus of early work toward eye tracking for human-computer interaction (HCI) [60][86][87][88][89][90]. Advances in video-based eye tracking [91,92] and automatic fixation and saccade detection [93] have inspired applications for HCI [94,95].
Gaze-directed pointing is the archetype of interactive eye-tracking use cases and a core motivation for fundamental work such as fixation and saccade detection [93]. Overt attention directed at a user interface element is a strong indicator of user intent to interact with that element. Spatial input is a highly intuitive use of eye tracking for interactive systems. Considerable research has been conducted toward gaze-based pointing [77][78][79][82,86,87,94,96]. Interactive applications include hands-free user input for the disabled [96,97], camera control [60][61][62][63][64][98,99], and automotive applications [100][101][102][103]. Techniques have been developed to teleoperate a robot using eye gaze, including an interface to drive a robot and change the view of an on-board camera [62,63]. The user interface featured graphical overlays for control elements. Gaze input commands were activated by either dwell time or a foot clutch, enabling hands-free teleoperation.
Beyond gaze-based control, the human eye offers a unique window into perceptual and cognitive processes. Eye gaze has been used as a proxy for attention [77][78][79]. The authors in [80] modeled dynamic operator overload based on the operator's attention to a critical situation associated with impending failure. The response time before initial fixation represented delayed attention. The number of fixations on an object represented the allocation of attention. Fixation has been applied as a measure of attention allocation for an online predictive model of operator overload during supervisory control of multiple unmanned aerial vehicles (UAVs) within a simulation environment [81]. A logistic regression model, developed to predict vehicle damage when an operator failed to correct a collision course, was applied to generate real-time alerts. The model was a function of the delay prior to allocating visual attention to the vehicle, how much attention was diverted away from the vehicle once attended, and how much time remained before the collision would occur. Fixation has also been used to measure situation awareness [104,105], operator fatigue [106], and workload [75,76][107][108][109][110].

Augmented Reality for Human-Robot Interfaces
Human control of multiple mobile robots requires considerable divided attention, integrating information from many sources, and switching between multiple frames of reference. Projecting sensed data onto the real-world scene, at the point of observation or the point being observed, may help alleviate the cognitive burden of mentally integrating information from various sources. Augmented reality (AR) is the registration and visual integration of computer-generated graphics and real-world environments [111,112]. Demonstrated techniques include overlaying sensed data onto individual robots via wearable head-up display [56] and superimposing arrows on 20 robots to create a gradient toward a target location [57].
Telerobotic systems very often rely on real-time video from the perspective of or external to the robot. One challenge of teleoperation is limited visuospatial perspective. AR techniques such as color-coded orientation cues that visually map controller input axes (on the joystick hardware) to end effector axes (on the display) can improve telemanipulator navigation, with significant reductions in trajectory distance, deviations from the ideal path, and navigation error [113].
AR can also reduce visual search and mental integration demands. During traditional neuronavigation, a surgeon must mentally transform two-dimensional medical imaging data into three-dimensional structures and project this information on her or his view of the patient. Systems for augmented neuronavigation can perform transformations by computer and display composite video with models of structures of interest projected on the surgical site, resulting in significantly lower task time and fewer errors [114].

Robot Confidence Model
A discrete robot confidence model was developed to dynamically adapt the behavior of a robot in response to operator attention directed at the robot or its activities. For this work, robot confidence is a value that increases upon attention to the robot and decreases over time while the operator does not attend to the robot. Each attention indicator in the input vector $x$ has a corresponding weight in $p$, which can be used to bias the inputs. The maximum value $u$ of the weighted inputs is computed to establish the highest estimate of attention given the inputs. Robot confidence $c_k$ at the current timestep $k$ is a function of the maximum weighted input ($u$), the confidence at the preceding timestep, and a constant minimum confidence value. The previous confidence value is decreased by a constant decrement value before taking the maximum value of these inputs. Similar to the calculation of $u$, the maximum value is again taken to yield the highest confidence given all inputs, feedback, and constraints. The computed confidence value is then used to adapt robot behaviors according to predefined rules.
The key features of this model are the aggregation of weighted attention indicators as a maximum value, a second maximum for the actual confidence value calculation, and the decremented previous confidence value. It should be noted that confidence at any given time might exceed the maximum weighted input ($u$) if the decremented previous confidence is higher. That is, confidence can only decrease by at most $c_d$ per timestep, even when $u$ has a lower value than the decremented confidence ($c_{k-1} - c_d$). In other words, a sudden drop in attention does not result in an immediate dramatic reduction in confidence. Instead, confidence gradually decreases over time. Additionally, note the confidence value continues decreasing during long periods of inattention but does not drop below the minimum value $c_{\min}$.
Figure 1 acknowledges that robot behaviors observed by the operator may influence attention. Although this could be exploited to create a second feedback loop by, for example, exercising exaggerated or unexpected motions or flashing visual alerts, our implementation of the model sought to avoid overtly influencing attention in order to minimize external factors affecting overall system performance.
For $n$ binary indicators of operator attention in $x$ and corresponding parameters in $p$, Equation (3) computes a maximum weighted input $u$, where $p \circ x$ is the element-wise product of $x$ and $p$:

$$u = \max(p \circ x) \tag{3}$$

Equation (4) defines robot confidence $c_k$ at timestep $k$, where $c_d$ is a confidence decrement subtracted from the previous confidence value $c_{k-1}$ and $c_{\min}$ is a minimum confidence value:

$$c_k = \max(u,\; c_{k-1} - c_d,\; c_{\min}) \tag{4}$$

Figure 2 illustrates an implementation of the model. The diagram shows how the confidence value changes during a notional sequence of inputs. This example has an input vector $x = [x_1 \; x_2]$ with two binary indicators of attention $x_1, x_2 \in \{0, 1\}$ and two associated weight parameters $p = [p_1 \; p_2] = [25 \; 10]$, a confidence decrement $c_d = 10$, and minimum value $c_{\min} = 0$. Starting with initial robot confidence $c_0 = 0$, Figure 2 depicts the resulting sequence of events.

This example highlights several key features of the model. First, multiple simultaneous indicators of attention are not cumulative. The indicator with the highest weighted value takes precedence. Second, consecutive instances of the same indicator are not cumulative. Confidence is sustained at the same value until the indicator is no longer present or another indicator takes precedence. Lastly, confidence decreases at most $c_d$ per timestep and never drops below the floor value $c_{\min}$. In other words, confidence never decreases by more than $c_d$ even when the inputs would otherwise yield a lower confidence value.
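The update rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the input sequence are hypothetical (the sequence is not the one shown in Figure 2), while the parameters $p = [25 \; 10]$, $c_d = 10$, $c_{\min} = 0$, and $c_0 = 0$ are taken from the example in the text.

```python
from typing import List, Sequence


def max_weighted_input(x: Sequence[int], p: Sequence[float]) -> float:
    """Largest element of the element-wise product of x and p (Equation (3) as described)."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    return max(p_i * x_i for p_i, x_i in zip(p, x))


def update_confidence(c_prev: float, x: Sequence[int], p: Sequence[float],
                      c_d: float, c_min: float) -> float:
    """One timestep of the confidence update (Equation (4) as described)."""
    u = max_weighted_input(x, p)
    # Confidence is the highest of: current attention, decayed history, and the floor.
    return max(u, c_prev - c_d, c_min)


def run(inputs: Sequence[Sequence[int]], p: Sequence[float],
        c_d: float, c_min: float, c_0: float = 0) -> List[float]:
    """Apply the update to a sequence of input vectors and return the confidence trace."""
    trace = []
    c = c_0
    for x in inputs:
        c = update_confidence(c, x, p, c_d, c_min)
        trace.append(c)
    return trace


# Hypothetical input sequence (NOT the sequence from Figure 2).
inputs = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
trace = run(inputs, p=(25, 10), c_d=10, c_min=0)
print(trace)  # [10, 25, 25, 15, 10, 0, 0]
```

In this trace, the simultaneous indicators at the second step yield 25 rather than 35, the repeated first indicator holds confidence at 25 rather than accumulating, and the drop after attention ends is limited to 10 per timestep.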
The model uses a maximum-value approach to aggregate indicators of attention and update the confidence value. The weighted maximum in Equation (3) for input aggregation avoids the problem of multiple indicators influencing confidence more or less depending on whether they are registered simultaneously. The maximum in Equation (4) to update the confidence value prevents consecutive indicators from having a cumulative effect, especially for high-frequency indicators such as fixations. Thus, the model accommodates indicators that might occur simultaneously but at different and potentially variable frequencies.
For example, the indicators implemented for the user study described later are eye gaze fixations and direct input commands from the operator. A single fixation duration could be less than 100 ms, and multiple fixations are likely between operator commands, which might occur seconds or minutes apart. In addition, multiple fixations are likely during periods of focused attention, but the number and duration of fixations can vary. These could be a relatively small number of long fixations or a higher number of short fixations. Individual operator differences and the design of fixation detection algorithms also contribute to variability in fixation counts and durations.

Multirobot Test Platform with Eye Tracking
A multirobot test platform was developed to implement and test the robot confidence model described above. The platform consisted of four semiautonomous robots in a flat, unobstructed environment, an operator control station, and a camera mounted above the operating environment that supplied video to the control station. Figure 3 shows the control station, which the operator used to command and observe the robots. The display showed video of the test environment with graphics superimposed to provide information about the robots, tasks, and user input. The user inputted robot trajectories using a computer mouse (not shown). An ET1000 (The Eye Tribe ApS, Copenhagen, Denmark) eye tracker with a reported accuracy of 0.5°–1° was mounted below the display and used to obtain eye-gaze data at 60 Hz for online estimation of the screen coordinates where the operator directed their attention. Video S1 in the Supplementary Material highlights the robots as shown by the operator display (0:02), the operator in contact with forehead and chin rests to stabilize head position and orientation (0:12), and the eye tracker located below the display (0:24).
Figure 4 summarizes how the test platform was used to implement and test the confidence model. The eye tracker seen in the left panel (Figure 4a) provided gaze data which were processed to estimate operator attention directed at specific robots. These data and direct operator interaction with robots via command input were used to compute a confidence value for each robot. This value increased in response to attention directed at a robot and decreased over time while the operator attended to other objects. The center panel (Figure 4b) shows a visual representation of the confidence value. The platform could superimpose a light green arc on the robots to communicate confidence values to the operator; however, this graphic was not employed for the study presented here. The arc would gradually shorten to indicate decreasing confidence during periods of inattention to the robot, as depicted in the series of three time-lapsed images from top to bottom within the panel. The behavior of a robot changed according to its confidence value. A user study was conducted to measure search task performance and efficiency in relation to confidence and behavior changes, represented by the right panel (Figure 4c).
Figure 5 shows the robots within the test environment (left) and the presentation of this environment augmented with superimposed graphics via the operator display (right). The robots were 23 cm (9 in) wide and 25.4 cm (10 in) long. Independently driven rubber tracks enabled differential steering, including pivot turns (i.e., turning in place). Design details are available at https://github.com/lucas137/trackedrobot. The AprilTag visual fiducial system [115,116] was used to estimate the location and orientation of the robots in the test environment using video frames from the overhead camera. These data were used to identify operator interactions with specific robots, predict collisions, and superimpose graphics.
A solid black circle was drawn on the robots as a high-contrast background for additional color-coded graphics, including a smaller solid light gray circle indicating the robot's navigation status. The operator defined robot trajectories by inputting one or more waypoints which the robot automatically maneuvered to visit. The platform predicted robot collisions based on their trajectories, suspended the motion of robots just prior to the collision, and changed the color of the robots' status graphic from light gray to orange. The robots remained suspended until the operator canceled or redefined one of the trajectories to resolve the collision. Software-defined obstacles were used to create navigation constraints. These virtual obstacles were drawn on the operator display as solid black rectangles. The platform did not allow the operator to input waypoints that would result in a trajectory crossing an obstacle. A continuous obstacle was placed around the perimeter of the test environment to prevent the operator from defining trajectories outside this space.
The software projected graphics in relation to objects in the test environment, presented the resulting composite video via the operator display, computed confidence values for each robot based on operator attention, and issued commands to the robots according to motions requested by the operator, robot confidence, and behavior rules.
The platform had two trajectory input modes. The operator initiated the primary input mode by left-clicking a robot and then inputted waypoints by moving the mouse to each desired waypoint and using a single click to add it to the trajectory. The input was concluded by double-clicking to add the final waypoint. The robot then immediately executed the trajectory until it reached the final waypoint, or the operator canceled execution. The operator could cancel execution by clicking the robot, which stopped all motion and discarded the remaining unfinished trajectory. A trajectory being inputted could be discarded prior to execution by right-clicking anywhere on the screen. Inputted trajectories were drawn with a thin white line from the robot's location to its current destination waypoint and between each subsequent waypoint. This provided the operator a visual representation of the remaining path during execution. During trajectory input, a thicker line and a circle at each waypoint were drawn. A light blue circle at the mouse cursor location indicated the pending waypoint that would be added upon a single or double click. Invalid pending waypoints were drawn yellow, and a yellow box was drawn around the obstacle the proposed trajectory would violate.
The second trajectory input mode provided the operator a means to orient the robot and move in tight quarters. The operator initiated this input by clicking a robot, holding the mouse button, and dragging the mouse to a destination waypoint. The robot immediately executed a pivot turn to face the destination and then moved in a straight line until either the robot reached it or the operator released the mouse button. If the destination was behind it, the robot pivoted to align its back with the destination and then moved backward to reach it. Otherwise, the robot pivoted to align its front and moved forward. If the operator moved the mouse while still holding the mouse button, the destination changed accordingly. The robot pivoted again as needed to face the new destination before resuming its motion toward it. As the robot always pivoted first, the operator could reorient the robot at its current location by picking a destination in the desired direction and releasing the mouse button when the robot achieved its new orientation but before it started moving toward the destination.
The robot confidence model was implemented with two indicators of operator attention ($x$): operator eye gaze fixations on the robot and operator input commands to the robot. For both indicators, the weight parameter value was 240 (i.e., $p = [240 \; 240]$). The confidence decrement was $c_d = 1$ and the minimum value was $c_{\min} = 0$. Each robot's confidence value was updated at approximately 60 Hz, when data samples from the eye tracker were processed to determine fixations. Operator input was processed at approximately 24 Hz, the test platform video display frame rate.
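As a rough worked example of what these parameter values imply, the following estimates how long confidence takes to decay from its maximum to its minimum when no further attention is registered. It assumes the update rate $f$ is exactly 60 Hz (the text states "approximately") and that no fixation or command occurs during the interval; the symbols $f$ and $t_{\text{decay}}$ are introduced here for illustration only.

```latex
% Decay from the maximum weighted input (p = 240) to the floor (c_min = 0),
% losing c_d = 1 per update at an assumed update rate f = 60 Hz.
\[
  t_{\text{decay}}
    = \frac{p - c_{\min}}{c_d} \cdot \frac{1}{f}
    = \frac{240 - 0}{1} \cdot \frac{1}{60\ \text{Hz}}
    = 4\ \text{s}
\]
```

Under these assumptions, a robot that receives a single fixation or command and is then ignored would return to minimum confidence after about four seconds.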
Three robot behavior modes were defined with rules affecting a robot's velocity according to its confidence value while operating autonomously:
• Velocity boost;
• Velocity drop;
• Constant velocity (control).
In velocity boost mode, a robot moved faster than the nominal baseline velocity during periods of high confidence. Similarly, velocity drop mode reduced robot speeds during periods of low confidence. Robot speed was not affected by confidence in constant velocity mode, which served as a control for comparing performance in the other modes.
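A minimal sketch of how such a rule might look is given below. The thresholds, speed factors, and nominal speed are hypothetical placeholders chosen for illustration; this section does not state the values used in the study.

```python
from enum import Enum


class BehaviorMode(Enum):
    VELOCITY_BOOST = "velocity boost"
    VELOCITY_DROP = "velocity drop"
    CONSTANT_VELOCITY = "constant velocity"


# All numeric values below are hypothetical, not taken from the study.
NOMINAL_SPEED = 0.5      # normalized baseline speed (0 = stopped, 1 = maximum)
HIGH_CONFIDENCE = 180.0  # assumed threshold for "high confidence"
LOW_CONFIDENCE = 60.0    # assumed threshold for "low confidence"
BOOST_FACTOR = 1.5       # assumed speed-up applied in velocity boost mode
DROP_FACTOR = 0.5        # assumed slow-down applied in velocity drop mode


def commanded_speed(mode: BehaviorMode, confidence: float) -> float:
    """Return the normalized speed for a robot operating autonomously."""
    if mode is BehaviorMode.VELOCITY_BOOST and confidence >= HIGH_CONFIDENCE:
        return min(1.0, NOMINAL_SPEED * BOOST_FACTOR)  # faster than baseline
    if mode is BehaviorMode.VELOCITY_DROP and confidence <= LOW_CONFIDENCE:
        return NOMINAL_SPEED * DROP_FACTOR             # slower than baseline
    return NOMINAL_SPEED  # control mode, or confidence between thresholds
```

In constant velocity mode the function ignores the confidence value entirely, which is what allows that mode to serve as the control.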

Experiment
A user study was conducted to measure search task performance and efficiency with respect to the 3 behavior modes (velocity boost, velocity drop, and constant velocity). Figure 6 contains a video frame from the control interface with added labels to point out features of the study. The task was to find multiple targets distributed within the test environment. The positions of these targets were software-defined, similar to the obstacles described above, but were hidden until located by the user. A robot "detected" a target when it was positioned within a configured distance, faced the target within ±45°, and had no obstacles between it and the target. As seen at the top of Figure 6, a green line was drawn to indicate a target detection. The line started at the robot and went through the target to a point beyond it. The length of this line was constant to avoid revealing the exact target location.
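The three detection conditions can be illustrated with a short sketch. It assumes planar coordinates, a robot heading in radians, and a caller-supplied line-of-sight test; the function and parameter names are hypothetical and the obstacle check is left abstract because the obstacle representation is not described here.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def detects_target(robot_xy: Point,
                   heading_rad: float,
                   target_xy: Point,
                   max_range: float,
                   line_of_sight_clear: Callable[[Point, Point], bool] = lambda a, b: True,
                   half_fov_deg: float = 45.0) -> bool:
    """True when the target is in range, within the angular window around the
    robot's heading, and not blocked by an obstacle."""
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    if math.hypot(dx, dy) > max_range:
        return False  # beyond the configured detection distance
    bearing = math.atan2(dy, dx)
    # Wrap the heading error into [-pi, pi] before comparing to the window.
    error = math.atan2(math.sin(bearing - heading_rad),
                       math.cos(bearing - heading_rad))
    if abs(error) > math.radians(half_fov_deg):
        return False  # robot is not facing the target closely enough
    return line_of_sight_clear(robot_xy, target_xy)
```

Wrapping the heading error avoids false negatives when the heading and bearing lie on opposite sides of the ±180° discontinuity.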
Video S2 in the Supplementary Material contains annotations noting target detection (0:07, 0:39, 0:46, 1:17) and target location (1:01, 1:24) events. To localize a target, the user positioned two robots to detect it, resulting in detection lines intersecting at the target point. The user then clicked the intersection to get credit for finding the target. This requirement was designed to simulate a real-world task that must be accomplished using multiple robots. A green circle with a light gray border was drawn to indicate a located target (top of Figure 6). A target could only be located once during the study trial, so this graphic persisted for the remaining duration of the trial.
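The geometry behind this localization step can be sketched as the intersection of two detection lines, each defined by a robot position and a direction toward the target. The helper names and the click tolerance below are hypothetical; how the platform scored a click is not specified here.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def intersect_lines(p1: Point, d1: Point, p2: Point, d2: Point,
                    eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the lines p1 + t*d1 and p2 + s*d2, or None if the
    lines are (nearly) parallel."""
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < eps:
        return None
    qx, qy = p2[0] - p1[0], p2[1] - p1[1]
    t = (qx * d2[1] - qy * d2[0]) / cross
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


def click_locates_target(click: Point, intersection: Point,
                         tolerance: float) -> bool:
    """Hypothetical scoring rule: the click must fall within `tolerance`
    of the point where the two detection lines cross."""
    return math.dist(click, intersection) <= tolerance


# Two robots whose detection lines cross at (2, 2).
crossing = intersect_lines((0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (-1.0, 1.0))
```

Two robots viewing the target from nearly the same direction produce almost parallel lines, so the intersection becomes ill-conditioned; the sketch returns None in that case.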
The path plan on the left side of Figure 6 shows a robot trajectory inputted by the user. In addition, small blue circles were drawn to mark the robot's location at regular time intervals, as seen on the right side of the figure. These persistent breadcrumbs provided a history of areas explored in search of targets. Study participants were asked to locate as many hidden targets as possible during each 5-min trial. A circular light green trial countdown graphic, seen in the center of Figure 6, shortened until disappearing at the end of the trial and displayed the number of seconds remaining.
Figure 6. The control interface displayed video with superimposed graphics, including robot paths, obstacles, targets, and a countdown timer showing how many seconds remained during a user study trial.
A within-subjects design was selected to minimize participant-level variations such as spatial ability [117][118][119]. Three sets of 11 target locations were used to enable repeated measures for each robot behavior mode. Targets were randomly selected such that the overall difficulty of finding targets was roughly equal for all 3 sets. Each participant completed 9 study trials during a single session, 1 trial per combination of robot behavior mode and target set. This yielded 3 repeated measures for each behavior mode per participant. The order of behaviors and target sets was randomized and counterbalanced. Participants were presented with on-screen instructional material, received hands-on training, and completed self-paced practice exercises to develop proficiency using the test platform and performing the search task. The platform's eye tracker was calibrated for each participant. The training trials used fewer targets, which were easier to find than those in the 3 study target sets, to ensure participants would quickly discover them and gain experience completing the target localization task. In all cases, the number of discoverable targets was not revealed to participants during the study.
The study collected data for three search task metrics. Search performance was measured for each trial by recording the ratio of targets detected at least once by any of the robots and the ratio of targets located by the study participant. Search efficiency was computed for each trial by multiplying the ratio of targets located by the average robot motor speed during the trial. Motor speeds were normalized values ranging from 0 (no motion) to 1 (maximum speed). Thus, all three search task metrics could range from 0 to 1, with 1 being the best possible value.
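As a worked illustration of these definitions, the sketch below computes the three metrics for one trial. It assumes the average motor speed is a plain mean over all logged speed samples; the function name, argument names, and example numbers are hypothetical.

```python
from typing import Dict, Sequence


def search_metrics(n_targets: int, n_detected: int, n_located: int,
                   motor_speeds: Sequence[float]) -> Dict[str, float]:
    """Compute the three per-trial search task metrics, each in [0, 1]."""
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    if not motor_speeds:
        raise ValueError("at least one motor speed sample is required")
    detected_ratio = n_detected / n_targets
    located_ratio = n_located / n_targets
    mean_speed = sum(motor_speeds) / len(motor_speeds)  # normalized 0..1
    return {
        "detected": detected_ratio,
        "located": located_ratio,
        "efficiency": located_ratio * mean_speed,
    }


# Hypothetical trial: 11 targets, 8 detected, 5 located, mean speed 0.5.
metrics = search_metrics(11, 8, 5, [0.25, 0.5, 0.75])
```

Because efficiency is the product of two values in [0, 1], it can never exceed the located ratio; it equals that ratio only if the robots ran at maximum speed for the whole trial.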

Data Analysis
To understand how the robot behavior modes and other factors were related to the observed search performance and efficiency, mixed-effects regression models were constructed to explain the observed data by study trial. Linear mixed-effects models offer a robust statistical method capable of handling a variety of situations such as unbalanced data and missing values, and can be extended via generalized linear mixed-effects models to analyze data with non-normal error distributions [120][121][122][123][124]. Regression analyses were conducted using R (version 3.6.1) [125]. Linear mixed-effects models were fitted by maximum likelihood using the lmer function of the R package lme4 (version 1.1.21) [126]. The general form of the R formulas used to specify the models is:
y ~ behavior + confidence + targetset + time + (1 | participant)
where behavior is an unordered categorical variable with 3 levels for the robot behavior modes (velocity boost, velocity drop, constant velocity), confidence is a continuous variable for average robot confidence value during a given trial, targetset is an unordered categorical variable with 3 levels identifying which target set was used for the trial, and time is a continuous variable for time of day in decimal hours.
The explanatory variables for robot behavior mode, confidence value, target set, and time of day were entered into the model as fixed-effects terms. The behavior and confidence terms were of primary interest, while target set and time of day were included to account for variations in the data that may be due to these other factors. Both continuous variables, confidence and time of day, were centered and scaled for model fitting. The model also included a random-effects term with random intercepts by participant to account for correlation due to repeated measures.
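The preprocessing of the continuous predictors can be sketched as below. This assumes "centered and scaled" means subtracting the mean and dividing by the sample standard deviation (a z-score); the helper names are hypothetical and the analysis itself was carried out in R.

```python
import statistics
from typing import List, Sequence


def decimal_hours(hour: int, minute: int, second: int = 0) -> float:
    """Convert a clock time to time of day in decimal hours."""
    return hour + minute / 60 + second / 3600


def center_and_scale(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - mean) / sd for v in values]


# Hypothetical trial start times: 09:00, 10:00, and 11:00.
start_times = [decimal_hours(9, 0), decimal_hours(10, 0), decimal_hours(11, 0)]
scaled = center_and_scale(start_times)  # [-1.0, 0.0, 1.0]
```

Putting confidence and time of day on a common scale keeps their coefficients comparable and tends to help the model fit converge.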
The response variable y is the measure of performance for which the model is fitted. A total of three models were fitted:
• Ratio of targets detected;
• Ratio of targets located;
• Search task efficiency.
The PBmodcomp function of the pbkrtest package (version 0.4.7) [127] was used to perform parametric bootstrap model comparisons to test whether each explanatory variable contributed significantly to the model fit. For each comparison, PBmodcomp compared the full model with a reduced model which omitted the variable being tested, and reported the fraction of simulated likelihood ratio test (LRT) values greater than or equal to the observed LRT value. A total of 30,000 simulations were performed per comparison.
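The reported fraction can be illustrated with a few lines of code. The sketch assumes the simulated LRT values have already been obtained by refitting the full and reduced models to data simulated from the reduced model, which is outside its scope. The function name and the optional add-one variant (a common convention for bootstrap p-values) are illustrative and are not claims about PBmodcomp's internals.

```python
from typing import Sequence


def bootstrap_p_value(observed_lrt: float,
                      simulated_lrts: Sequence[float],
                      add_one: bool = False) -> float:
    """Fraction of simulated LRT values >= the observed LRT value."""
    n = len(simulated_lrts)
    if n == 0:
        raise ValueError("at least one simulated value is required")
    extreme = sum(1 for s in simulated_lrts if s >= observed_lrt)
    if add_one:
        # Variant that counts the observed value among the simulations.
        return (extreme + 1) / (n + 1)
    return extreme / n


# Toy example with four simulated values; the study used 30,000.
p = bootstrap_p_value(3.0, [0.5, 1.0, 3.0, 4.2])  # 2 of 4 values -> 0.5
```

With 30,000 simulations the smallest nonzero fraction is 1/30,000, so the p-value resolution is far finer than the 0.05 threshold applied in the results.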

Results
Data were collected from 12 healthy volunteers who had normal or corrected-to-normal vision (three females, nine males; mean age = 28.9, SD = 4.4). Each study session lasted approximately 2 h, during which the participant completed all nine study trials, one trial per combination of the within-subject conditions (robot behavior mode and target set), for a total of 108 samples collected. About 1 h of the session was spent inducting the participant, conducting practice trials before the study trials, and receiving feedback from the participant after all trials were completed. The size of the study was influenced in part by the two-factor counterbalancing scheme used to balance both robot behavior and target set, preliminary data collection, and prior studies conducted by our group.
Linear mixed-effects models were fitted to explain the observed search task metrics data while accounting for correlation due to repeated measures with each participant. Figure 7 shows added variable plots with fitted lines from each mixed-effects model, with grouping levels for robot behavior mode and target set. Marginal prediction values were obtained via the predict.merMod function from the lme4 package [126]. The fitted models have two continuous variables: average robot confidence and time of day. We focused on confidence as the continuous variable of interest, while holding time at its median value. Raw confidence values were projected onto the x-axis to show the distribution of data with respect to fitted values. The y-axes are set to the same scale for comparison.
The fitted lines for ratio of targets detected (Figure 7a) exhibit less variation than those for ratio of targets located (Figure 7b) and search task efficiency (Figure 7c). Predicted values for target set three (dashed line) reflect somewhat lower performance across all three metrics, perhaps indicating this set of targets was generally more difficult to find than the other sets. Taking this into consideration, predictions for targets located and efficiency are noticeably higher for both velocity boost and velocity drop robot behavior modes versus the control (constant velocity mode). However, these differences appear similar in magnitude to those by target set. The slopes of the fitted lines indicate potential positive relationships between average robot confidence by trial and all three measures of performance. The slopes for targets detected and targets located appear similar, and both are higher than the slope for search task efficiency.
Parametric bootstrap model comparisons were used to test the contribution of each explanatory variable to the fitted mixed-effects models. The p-value for each term was the fraction of simulated likelihood ratio test (LRT) values greater than or equal to the observed LRT value. p-values less than 0.05 indicated a model term that contributed significantly to the model fit (i.e., removing the term from the model significantly decreased the goodness of fit). Table 1 summarizes the model comparison results. No main effect of robot behavior mode was found for any of the three metrics (p ≥ 0.32). In other words, whether and which confidence-based behavior (or the control) was used for a given trial does not help account for differences in search performance or efficiency. However, a significant main effect of average robot confidence was observed for targets detected and located (both p < 0.01) as well as search efficiency (p < 0.05). In addition to these principal results, Table 1 shows results for the target set and time terms. Although care was taken to select random target locations such that the target sets were of equal difficulty, it is reasonable to expect some variation in the data due to differences between the sets. Time of day was included as a fixed-effects term to account for potential variation due to circadian rhythm (e.g., operator fatigue). However, neither target set nor time contributed significantly to any of the model fits (p ≥ 0.074 and p ≥ 0.18, respectively).

Discussion
This work developed a generalized robot confidence model which transforms multiple indicators of operator attention to a single confidence value which can be used to adapt robot behaviors. Specifically, we employed confidence as a metaphor relating indicators of operator attention and robot behaviors which respond to these indicators, and observed correlations between average confidence and three measures of multirobot search performance and efficiency.
Prior works related to robot confidence have focused on the allocation of control between human and robot [39], influencing operator behavior [38], or otherwise directly communicating the robot's self-assessed state [40][41][42][43]. Other work is aimed at intrinsic motivations of the robot [49,50] or understanding of its environment [44]. While our model of confidence could drive overt feedback to the operator or be applied only to internal processes of the robot, the implementation presented here is directed at minimally intrusive adjustment of physical behavior to mitigate the challenges of human interaction with multiple mobile robots. The online application distinguishes this work from others which estimated operator attention offline [52] or used human eye gaze for training [128].
Our model produces a confidence value for each robot using a weighted-maximum to aggregate any number of inputs that may exhibit a high degree of variability, such as eye gaze fixations near a point of interest, along with a decremented previous value as feedback and a minimum confidence limit. A maximum-value approach was used to aggregate attention indicators and update the confidence value (see Equations (3) and (4)). This approach makes selective use of the available information to determine the confidence value. Future work might explore more sophisticated methods such as artificial neural networks and learned behaviors [49,50], hidden Markov models (HMMs) [52], and graph convolution networks [128].
We implemented the proposed robot confidence model using eye gaze fixation and user input as indicators of attention, along with adaptive behaviors which were automatically selected at threshold levels of confidence. The resulting system assessed operator attention in real-time to determine the confidence value of each robot and altered robot behavior accordingly.
We expected the user study to demonstrate improved search task performance and efficiency when robot velocity increased or decreased in response to high or low robot confidence, respectively. Parametric bootstrap comparisons of mixed-effects models found this confidence-based robot behavior was not significant to models explaining the observed search task metrics. Instead, we found the by-trial average confidence value contributed significantly to the models. This finding was evidence of positive relationships with all three metrics: targets detected, targets located, and search efficiency. This result suggests the confidence value itself has utility as a predictor of task performance and efficiency.
Future work might incorporate our confidence model as a real-time predictor of search task outcomes which can be leveraged to improve effective human supervision of multiple mobile robots in the field or for training. For example, graphics might be superimposed on a control interface display to communicate robot confidence to the operator to inform the allocation of resources and decision making. Our implementation centered on robot velocity. Future work might examine other robot behaviors, including team behaviors, adapted according to real-time confidence.
We envision a team of robots that will, through both imitation and reinforcement learning, automatically create robot behavior policies that improve performance. In this scenario, the human operator would seamlessly (via observation and interaction) adjust the robot's confidence based on its performance. When mission objectives or milestones are reached, the robot would again be rewarded to boost confidence. Additionally, other factors could be added to attenuate robot behavior, such as signal time delay, signal corruption, power degradation, or terrain constraints. Robot behavior policies such as "stay in pairs" or "move to be in a camera's field of view" could be learned and executed automatically. Robot teams paired with humans would, over time, become more efficient.
Furthermore, if simulation environments with sufficient resolution could be developed, robot policies under various configurations and conditions could be learned over many iterations in simulation. Applications include using duplicate robots in controlled environments for policy training before execution in the field, for example, training with terrestrial robots to develop policies for planetary exploration.

Conclusions
In this paper, we hypothesize that measuring attention and incorporating it as feedback in the system can mitigate factors affecting the function of multiple semiautonomous robots and improve performance. We presented a generalized robot confidence model which transforms key operator attention indicators to a robot confidence value for each robot to enable the robots' adaptive behaviors. This model was implemented using operator eye gaze fixations and command inputs as the attention indicators, and successfully evaluated to reveal evidence linking average robot confidence to multirobot search task performance and efficiency. This work provides essential steps toward effective human operation of multiple unmanned vehicles to perform spatially distributed and hazardous tasks in complex environments for space exploration, defense, homeland security, search and rescue, and other real-world applications.