How Can Physiological Computing Benefit Human-Robot Interaction?

Abstract: As systems grow more automated, the human operator is all too often overlooked. Although human-robot interaction (HRI) can be quite demanding in terms of cognitive resources, existing systems do not yet take the mental states (MS) of operators into account. As humans are not infallible agents, this gap can lead to hazardous situations. The growing number of neurophysiology and machine learning tools now allows efficient monitoring of operators' MS. Sending feedback on MS in a closed-loop solution is therefore within reach. Involving a consistent automated planning technique to handle such a process could be a significant asset. This perspective article aims to provide the reader with a synthesis of the significant literature, with a view to implementing systems that adapt to the operator's MS to improve the safety and performance of human-robot operations. First, the need for this approach is detailed for remote operation, an example of HRI. Then, several MS identified as crucial for this type of HRI are defined, along with relevant electrophysiological markers. A focus is placed on prime degraded MS linked to time-on-task and task demands, as well as on collateral MS linked to system outputs (i.e., feedback and alarms). Lastly, the principle of symbiotic HRI is detailed, and one solution is proposed to include the operator state vector into the system using a mixed-initiative decisional framework to drive such an interaction.


Introduction
In recent years, user state monitoring based on psychophysiological and neuroscientific methods has developed in various fields, such as gaming and transportation [1,2]. However, to this day, and to our knowledge, these methods are mostly used for ex post analyses and are seldom implemented to provide online measures and system adaptation. Yet, with the rise of increasingly complex and autonomous systems, the state of the human agent is of crucial interest to enhance both operation safety and performance, be it for local or remote operation. Moreover, operations are increasingly performed at a distance. That is why the following subsections detail the need for human-centered research in remote human-robot interaction (HRI) and, more specifically, for a physiological feature-based approach. They also present the interaction modes and autonomy levels to consider when taking into account the human agents' state derived from these physiological features. This article is not centered on safety assessment; we therefore refer readers to References [3,4] for details on HRI safety.

Interaction Modes and Autonomy Levels
Fairly intuitively, one can identify two general modes of interaction between humans and robots/artificial agents for remote operation: supervisory control vs. direct control. However, the difference might not be that drastic, and interaction modes could in fact be viewed as a continuum [6,16-18], depending on:
• the frequency of human intervention;
• the type of control (i.e., manual vs. automatic);
• the embedded capacities of the robots/artificial agents (i.e., to what extent they can achieve tasks autonomously).
While automation can be seen as replacing routine manual processes, autonomy refers to something more complex: emulating human processes rather than replacing them [19]. In the literature, there are differing views of what "autonomy" is. Here, we consider a continuum reflected by the various degrees or levels of system autonomy [20]. At one end is what is usually considered true teleoperation, a.k.a. direct control, in which there is no artificial support at all and the human does all the work. At the other end is the case of no human intervention, in which the artificial agent does all the work, a.k.a. an extreme form of supervisory control [21]. Such extreme setups are rarely used, and the interaction usually relies on intermediate levels of autonomy. In addition, a fully autonomous system does not necessarily exclude humans from the loop. Indeed, rules of engagement [22] or ethical decisions [16,23] are, to date, preferably entrusted to a human decision-making process. New forms of adaptive or adjustable autonomy have been designed to take into account the involvement of the human operator [24]. They also answer a need for authority sharing while modeling conflicts between human and artificial agents [25,26]. From a human-centered point of view, the system can help the operator, for instance, by means of an artificial cognitive agent during the mission [27]. In another vein, the mixed-initiative framework allows humans and artificial agents to opportunistically seize the initiative from each other [28]. This idea has been proposed to ease the control of large robotic teams by a human operator [29]. However, the open question is how to determine when, or quantify why, a given agent should take over from the other during mission execution.
To this day, engineers and researchers mostly use activity modeling, and sometimes subjective [30] and behavioral data [31], to determine these autonomy levels [32]. However, as stated above, human performance cannot reflect all the mental phenomena that arise during operations. Operators' mental states therefore need an in-depth evaluation using physiological measures. The offline use of physiological measures to assess the operation of professional tasks is a first step towards improving both performance and safety. A step further is the online integration of information about the human operator directly into the system. This is known as physiological computing [33]. Such systems can be called biocybernetic [34] or, more recently, symbiotic systems [35], passive brain-computer interfaces [36], or physiologically attentive user interfaces [37,38]. They take physiological parameters from the operator as inputs and derive an estimate of a given mental state [39], using various processing methods that generally include a machine learning step. Hence, global systems that are composed of human and artificial agents, and that take into account information on all involved agents, would allow tasks to be dynamically reallocated between humans and automation, a challenge listed by Sheridan [6].
This task reallocation, which can be roughly defined as Mixed-Initiative Interaction (MII) [28,40,41], is particularly interesting as it could mitigate the occurrence of critical situations. MII is a promising and flexible framework that makes it possible to integrate the notion of agents' current capabilities [42]. An MII system would allow the agent that is currently best suited to seize control when necessary. However, this requires agent monitoring systems, potentially comprising physiological computing tools when a human agent is involved. The following sections detail current research on physiological computing and how we argue it could be successfully applied to HRI, in particular to remote operation. They first define the mental states deemed relevant to characterize and estimate in a remote operation framework, along with their classical electrophysiological markers. Then, they describe current research on how to estimate these mental states and how to integrate this information into the whole system.

Situation Awareness, Resource Engagement and Associated Mental States
Human mental states are numerous, and it seems impossible, and possibly even irrelevant, to try to estimate every one of them. However, several of them play a major part in error occurrence. They are therefore particularly relevant to characterize and estimate in order to improve human-system interaction in general, including human-robot interaction in the case of remote operation. In the Human Factors domain, Situation Awareness (SA) is a mental state that has gathered much attention since its introduction in the aeronautical context. Endsley defined SA as "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future" [43]. Therefore, lapses in SA can occur due to difficulties in perception (low level) and/or in comprehension and projection (high level) [44]. With the current rise in automation, the challenge is to design systems that give the operator enough information to compensate for the cues that are not perceived directly (see Endsley [45] for a review). Cognitive processes, such as perception, attention, memory, and integration, are necessarily involved for SA to occur. Lapses in SA, due to either low- or high-level impairment, result in performance deterioration, such as piloting errors, and can therefore have critical consequences. As indicated by Endsley [45], up to 76% of SA errors in pilots would be due to a problem in perception, caused either by system failures or by cognitive issues (e.g., perceptual or attentional failures). Fatigue and attentional problems, as well as elevated stress and workload, are well-known mental states that impact SA [45,46].
Due to its multifaceted nature, SA is difficult to measure directly at the physiological level. Researchers therefore mainly focus on mental states that are linked to SA and whose physiological markers are easier to detect. These states all depend on resource engagement. Several researchers have proposed that a finite set of information-processing resources would explain why performance degrades under heavy task demands or concurrent task performance [47]. Accordingly, over-engagement can be seen as engaging all resources in processing only one sub-task or one sensory channel (e.g., vision; a.k.a. attentional tunneling). Disengagement, in contrast, can be seen as reallocating resources to another, usually internal, task [48-50]. Since both over-engagement and disengagement lead to performance degradation, it seems reasonable to estimate resource engagement and, more particularly, to detect resource depletion.

Prime Mental States
Several factors, external and internal, can generate such a depletion of resources. Among these are the time spent on a task, also called time-on-task, and task demands. These two factors are usually main characteristics of the task at hand. They relate to a temporally global resource engagement, and both directly generate several mental states, which we will call prime mental states. As operators spend more time on their task, their performance is known to fluctuate, with periods of degraded performance (i.e., increased reaction time and decreased accuracy) [51]. This phenomenon can be explained in terms of engaged resources and is due to the occurrence of several mental states, including mental fatigue and mind wandering.
Mental fatigue is a state that occurs when performing a long and tiring task that requires subjects to remain focused [52]. Mind wandering is defined as an attentional disengagement from the task during episodes when thoughts compete with information processing for the task at hand. This leads to reduced processing of external events in general [53,54] and to a performance decrement for the task at hand. These episodes of resource disengagement occur in a non-linear fashion as time-on-task increases. Both mental states would impact situation awareness from the first processing steps, that is to say, the perceptual steps. Moreover, mental fatigue seems particularly relevant to estimate during both prolonged supervisory control and prolonged direct control. Mind wandering, however, seems more likely to occur during supervisory control. An example is the frequent occurrence of boredom during Unmanned Aerial Vehicle (UAV) monitoring tasks [55].
Regarding task demands, operators' performance decreases when they face a particularly difficult task, and the same holds when the task is too easy. Hence, operator performance follows an inverted U-shape as a function of task demands [56]. In neuroscience and human factors, this modulation in task demands or difficulty, and the associated effort invested in the task, is usually referred to as cognitive workload [57]. This broad concept can also be understood in terms of required and engaged resources. Cognitive workload can be modulated by varying several factors:
• the load in working memory (e.g., the number of items to keep in memory);
• divided attention or multitasking (i.e., the number of tasks to perform in parallel);
• the stress imposed on the operator (e.g., temporal or social pressure).
These factors often overlap in a given task, in particular during remote operation.

Collateral Mental States
Resource depletion can also indirectly generate other mental states that we will call collateral mental states. A collateral mental state, such as automation surprise, can for instance arise when prime mental states (e.g., high workload) coincide with specific events. These events are critical system responses localized in time: feedback, parameter displays, and alarms in general. In this example, the operator will not process an alarm in the same way when all resources are engaged (i.e., over-engagement) as when they are in a nominal state. In this case, these system output-related mental states are linked to a temporally local resource engagement. Examples of such system output-related mental states are the following:

• Inattentional sensory impairments, such as inattentional blindness and inattentional deafness. These attentional phenomena consist of "missing" alarms when all attentional resources are engaged in another sensory modality. Inattentional deafness, for instance, is well studied in the aeronautical context: pilots under high workload miss auditory alarms when they are over-engaged in the visual modality (e.g., fascinated by the landing track) [58,59].
• Automation surprise, in which the operator is surprised by the behavior of the automation [60]. Cases reported in the aeronautical domain generally last several minutes. However, a subtype of automation surprise is confusion in response to a brief unexpected event, such as a specific alarm.
To return the global system to its nominal state, it is important to detect such a state in the operator. It does not matter whether the operator's confusion arises from a failure of the artificial agents or of the human ones. It might also be elicited by a general attentional disengagement of the operator, who is then unable to process system outputs correctly and is confused by any negative feedback. In any case, this state might lead the operator to make bad decisions and should be detected and taken into account in order to avoid system failure.
From the authors' point of view, the main mental states listed above seem particularly relevant to characterize and estimate for hazardous tasks, such as those performed during remote operation. The next section gives the classical electrophysiological markers that reflect these mental states.

Physiological Features
The mental states described above are directly linked to situation awareness and resource engagement. They can all be measured and assessed, to a certain extent, using portable, cheap, and non-invasive recording methods. These include electrocardiography (ECG) and electroencephalography (EEG), which record cardiac and cerebral activity, respectively. For recent neuroergonomic literature on eye-tracking measures (i.e., measures of ocular behavior and pupil diameter), readers can refer to References [61,62]. For literature on near-infrared spectroscopy measures (i.e., another measure of cerebral activity), readers can refer to References [63-65]. In this article, we focus on electrophysiological markers. Current EEG and ECG acquisition devices are particularly cheap, non-invasive, and portable means of recording physiological data, and are therefore well suited for recordings in real-life settings. In theory, any type of physiological measure can be used to perform physiological computing. In practice, however, the easiest to compute and most reliable ones are selected. The following list of physiological features is not comprehensive. It reflects the main trends identified for research and development in the physiological computing domain. Other electrophysiological measures include EMG (electromyography) and EDA (electrodermal activity, or galvanic skin response). We have chosen not to focus on these. They require attaching electrodes to the hands and/or forearms of the human operators. We believe this could impede the actual performance of teleoperation tasks and corrupt the signals during manual operation. Moreover, the temporal resolution of EDA is on the order of several seconds, which is too slow for critical settings.
We have therefore chosen to focus on electrophysiological metrics with high temporal resolution (on the order of milliseconds) that can be recorded during manual operation and that have proven effective for mental state monitoring in other fields.

Temporal Features
Temporal features are frequently used to characterize mental states. A well-known time-domain metric that can be computed from ECG is the heart rate (HR). It is expressed in beats per minute (bpm) and computed as the inverse of the inter-beat interval (IBI) [66,67]:

$$\mathrm{HR} = \frac{60}{\mathrm{IBI}}, \qquad \mathrm{IBI} = \frac{1}{K-1}\sum_{n=1}^{K-1}\left(r_{n+1} - r_n\right),$$

with $r_n$ the timestamp (in seconds) of the $n$th R peak (i.e., the highest positive peak), $K$ the number of R peaks in the analysis window, and IBI the mean interval between two successive beats (i.e., the mean R-R interval). Another relevant metric is heart rate variability (HRV). In the time domain, it can be computed as the variability of the R-R intervals, for instance as their standard deviation:

$$\mathrm{HRV} = \sqrt{\frac{1}{K-1}\sum_{n=1}^{K-1}\left(r_{n+1} - r_n - \mathrm{IBI}\right)^2}.$$

HR and HRV are both affected by engagement. When engagement rises with cognitive workload, HR increases and HRV decreases [68,69]. Conversely, when engagement drops with longer time-on-task or lower workload, HR decreases and HRV increases [68-70]. Automation surprise has been reported to increase HR [71].

Regarding EEG features in the time domain, the main marker is the event-related potential (ERP) [72]. An ERP is the segment of EEG signal that starts at the occurrence of a specific stimulation, or event, such as an alarm, and ends at a selected time (e.g., 800 ms post-stimulation). ERPs can be averaged across trials to better reveal slow modulations in voltage (i.e., positive and negative deflections). These modulations are quite specific to the nature of the stimulation and/or the operator's state. This averaging increases the signal-to-noise ratio [67]. When only one window of signal is used, the analysis is called "single-trial". Single-trial data are better suited than averaged data for online mental state estimation. The "raw" ERP (i.e., all the samples of EEG signal in a given time window) can be used to estimate a given mental state.
Yet, to reduce the number of features, researchers often compute the mean amplitude or select the peak value in specific time windows that correspond to documented deflections, called ERP components.
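As a minimal sketch of the time-domain ECG metrics described above, the following Python function computes HR, the mean IBI, and HRV. The function name is hypothetical, and the sketch assumes that R peaks have already been detected and are given as timestamps in seconds. HRV is taken here as the standard deviation of the R-R intervals, which is only one of several common time-domain definitions (RMSSD is another).

```python
import numpy as np


def heart_rate_features(r_peaks_s):
    """Return (HR in bpm, mean IBI in s, HRV in s) from R-peak timestamps in seconds.

    HRV is the (population) standard deviation of the R-R intervals.
    """
    r = np.asarray(r_peaks_s, dtype=float)
    if r.size < 3:
        raise ValueError("need at least three R peaks")
    rr = np.diff(r)          # R-R intervals r_{n+1} - r_n
    ibi = rr.mean()          # mean inter-beat interval (s)
    hr = 60.0 / ibi          # beats per minute
    hrv = rr.std()           # variability of the R-R intervals (s)
    return hr, ibi, hrv


# Hypothetical R-peak timestamps (s) alternating 0.8 s and 0.9 s intervals
peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
hr, ibi, hrv = heart_rate_features(peaks)   # hr ≈ 70.6 bpm, ibi = 0.85 s, hrv = 0.05 s
```

In practice, ectopic beats and missed R-peak detections should be corrected first, since a single missed peak merges two R-R intervals into one and inflates both IBI and HRV.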
The amplitude of the various deflections, or components, has been repeatedly linked to resource engagement. Hence, mental fatigue and mind wandering are known to reduce the amplitude of components such as the P300. This positive deflection occurs roughly between 300 and 500 ms post-stimulation and is maximal at posterior electrode sites [53,54,73]. Task demands and cognitive workload are reflected in the same way, by an attenuation of the ERP deflections [69,70,74-76]. So are inattentional sensory impairments: for instance, N100 and P300 amplitudes are reduced when auditory stimuli are not consciously perceived and reported [77,78].
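The sketch below illustrates single-trial epoching, trial averaging, and mean-amplitude extraction in a P300-like 300-500 ms window. It assumes a single, already filtered EEG channel, a known sampling rate, and event onsets given as sample indices. The function names, the -200/800 ms epoch limits, and the synthetic data are illustrative assumptions, and baseline correction is omitted for brevity.

```python
import numpy as np


def epoch(signal, fs, event_samples, tmin=-0.2, tmax=0.8):
    """Cut single-trial epochs, shape (n_events, n_times), around event onsets.

    Epochs that would extend past the edges of the recording are skipped.
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    trials = [signal[s + start:s + stop] for s in event_samples
              if s + start >= 0 and s + stop <= len(signal)]
    return np.array(trials)


def mean_amplitude(epochs, fs, tmin, win_start, win_end):
    """Mean amplitude of each epoch within [win_start, win_end) s post-stimulus."""
    i0 = int(round((win_start - tmin) * fs))
    i1 = int(round((win_end - tmin) * fs))
    return epochs[:, i0:i1].mean(axis=1)


# Synthetic example: a 5-unit deflection 300-500 ms after each event (fs = 100 Hz)
fs = 100
signal = np.zeros(1000)
events = [100, 300, 500, 980]            # the last event is too close to the end
for s in events[:3]:
    signal[s + 30:s + 50] = 5.0

epochs = epoch(signal, fs, events)                          # shape (3, 100)
single_trial = mean_amplitude(epochs, fs, -0.2, 0.3, 0.5)   # one value per trial
erp = epochs.mean(axis=0)                                   # averaged ERP
```

The single-trial values are what an online system would feed to a classifier. The averaged `erp` is mainly useful offline, to inspect components.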
Regarding automation surprise, a relevant EEG temporal feature that can be extracted is an error potential (ErrP). This type of event-related potential is specific to the detection of an unexpected event with amplitudes proportional to the frequency of errors [39,79]. ErrPs are notably elicited by an unexpected system output and are characterized by a negative deflection at fronto-central electrode sites, followed by a positive component at centro-parietal sites. The latency of these deflections depends on the type of error that elicits ErrPs (for a review on ErrPs, see Reference [80]).

Spectral Features
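The HRV ECG feature can be computed not only in the time domain, as seen above, but also in the frequency domain. In practice, one first computes the power spectral density (PSD) of the R-R interval series extracted from the ECG. This series is treated as a random time signal $x(t)$, usually resampled at a uniform rate. For the Fourier transform $X(f)$ of the signal, the PSD can be expressed as the square of its magnitude:

$$S_x(f) = |X(f)|^2, \qquad X(f) = \int_{-\infty}^{+\infty} x(t)\, e^{-2i\pi f t}\, \mathrm{d}t.$$

Frequency-domain HRV is then computed as the LF/HF ratio, that is, the ratio of the power in a low-frequency band ([0.04, 0.15] Hz) to the power in a high-frequency band (typically [0.15, 0.4] Hz) [66]:

$$\frac{\mathrm{LF}}{\mathrm{HF}} = \frac{\int_{0.04}^{0.15} S_x(f)\,\mathrm{d}f}{\int_{0.15}^{0.4} S_x(f)\,\mathrm{d}f}.$$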
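Here is a minimal numpy-only sketch of the frequency-domain HRV computation. It assumes R-peak timestamps in seconds, linear interpolation of the R-R series at a hypothetical 4 Hz, and a plain periodogram as the PSD estimate. Welch or autoregressive estimators are common alternatives, and the HF limits follow the usual [0.15, 0.4] Hz convention. Because the frequency step is constant, summing PSD bins gives the same ratio as integrating them. The beat generator is a hypothetical test signal.

```python
import numpy as np


def lf_hf_ratio(r_peaks_s, fs_interp=4.0):
    """LF/HF ratio computed from R-peak timestamps given in seconds."""
    r = np.asarray(r_peaks_s, dtype=float)
    rr = np.diff(r)                         # R-R intervals (s)
    t_rr = r[1:]                            # each interval is stamped at its closing beat
    t = np.arange(t_rr[0], t_rr[-1], 1.0 / fs_interp)
    x = np.interp(t, t_rr, rr)              # uniformly resampled R-R series
    x = x - x.mean()                        # remove the mean (0 Hz) component
    f = np.fft.rfftfreq(x.size, d=1.0 / fs_interp)
    psd = np.abs(np.fft.rfft(x)) ** 2       # periodogram |X(f)|^2 (unnormalized)
    lf = psd[(f >= 0.04) & (f < 0.15)].sum()
    hf = psd[(f >= 0.15) & (f < 0.40)].sum()
    return lf / hf


def synthetic_r_peaks(duration_s, lf_amp, hf_amp):
    """Hypothetical beat train whose R-R interval oscillates at 0.1 Hz and 0.25 Hz."""
    peaks = [0.0]
    while peaks[-1] < duration_s:
        t = peaks[-1]
        rr = (0.8 + lf_amp * np.sin(2 * np.pi * 0.1 * t)
              + hf_amp * np.sin(2 * np.pi * 0.25 * t))
        peaks.append(t + rr)
    return np.array(peaks)


ratio_lf = lf_hf_ratio(synthetic_r_peaks(300.0, lf_amp=0.05, hf_amp=0.01))  # LF-dominated
ratio_hf = lf_hf_ratio(synthetic_r_peaks(300.0, lf_amp=0.01, hf_amp=0.05))  # HF-dominated
```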
The power of the EEG signal in several frequency bands can also be extracted. The main bands of interest for mental state monitoring of awake operators are the δ (1 to 4 Hz), θ (4 to 7 Hz), α (8 to 12 Hz), and β (13 to 30 Hz) bands. Table 1 details the power modulations commonly reported in the literature for the following mental states: mental fatigue, mind wandering, and mental workload. Beyond the power of a single frequency band, several authors have proposed power ratios as good indices of workload and engagement. The ratio of θ power at the Fz electrode site to α power at the Pz electrode site (θFz/αPz) is frequently used [75]. So is the ratio of β power to the sum of θ and α powers (β/(θ + α)) computed over all electrode sites [81].
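The two power ratios can be sketched with a plain FFT periodogram. Several choices here are illustrative assumptions: the channel names, each channel being a 1-D array at a common sampling rate, and summing band powers across channels for β/(θ + α) rather than, e.g., averaging per-channel ratios. Welch's method would usually give less noisy estimates on real EEG.

```python
import numpy as np

BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 7.0), "alpha": (8.0, 12.0), "beta": (13.0, 30.0)}


def band_power(x, fs, band):
    """Power of a 1-D signal in [low, high) Hz from its periodogram.

    The scaling is identical for all bands, so it cancels in power ratios.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    low, high = band
    mask = (f >= low) & (f < high)
    return psd[mask].sum() * fs / x.size     # rectangle-rule integral over the band


def engagement_ratios(eeg, fs, ch_names):
    """Return (theta_Fz / alpha_Pz, beta / (theta + alpha) summed over all channels)."""
    ch = {name: i for i, name in enumerate(ch_names)}
    theta_fz = band_power(eeg[ch["Fz"]], fs, BANDS["theta"])
    alpha_pz = band_power(eeg[ch["Pz"]], fs, BANDS["alpha"])
    theta = sum(band_power(x, fs, BANDS["theta"]) for x in eeg)
    alpha = sum(band_power(x, fs, BANDS["alpha"]) for x in eeg)
    beta = sum(band_power(x, fs, BANDS["beta"]) for x in eeg)
    return theta_fz / alpha_pz, beta / (theta + alpha)


# Synthetic 4 s, two-channel example at 256 Hz (hypothetical signals)
fs = 256
t = np.arange(0, 4, 1.0 / fs)
fz = np.sin(2 * np.pi * 6 * t) + 0.1 * np.sin(2 * np.pi * 10 * t)
pz = 0.5 * np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)
r1, r2 = engagement_ratios(np.vstack([fz, pz]), fs, ["Fz", "Pz"])
```

Here the 6 Hz (θ) component at Fz has four times the power of the 10 Hz (α) component at Pz, so `r1` is 4.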

Spatial Features
Beyond simple temporal and spectral features, the literature offers a variety of feature extraction pipelines. For instance, spatial information can increase the discriminability of two mental states. This can be information on which sensors are most relevant for detecting a given mental state, or on how the different signals are linked to one another. To exploit it, one can use sensor selection algorithms that automatically detect the relevant sensors, or spatial filtering algorithms that combine the signals into more discriminant ones [85]. Temporal or spectral features are then usually extracted from the new signals obtained through this signal conditioning step. For instance, after a spatial filtering step, one can compute the log-variance of a signal filtered in the α band, or extract event-related potentials, to estimate the workload of an operator [76,86].
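Common spatial patterns (CSP) is one widely used spatial filtering algorithm for two-class problems. The sketch below learns CSP filters with numpy and then computes the log-variance features mentioned above. It is not necessarily the method of [85]. It assumes epochs already band-pass filtered (e.g., in the α band) and stored as arrays of shape (n_epochs, n_channels, n_times). The synthetic data are hypothetical.

```python
import numpy as np


def csp_filters(epochs_a, epochs_b, n_pairs=1):
    """CSP spatial filters, shape (2 * n_pairs, n_channels).

    The first n_pairs rows maximize variance for class a relative to class b;
    the last n_pairs rows do the opposite.
    """
    def mean_cov(epochs):
        covs = []
        for e in epochs:
            e = e - e.mean(axis=1, keepdims=True)
            c = e @ e.T
            covs.append(c / np.trace(c))            # trace normalization
        return np.mean(covs, axis=0)

    ca, cb = mean_cov(epochs_a), mean_cov(epochs_b)
    evals, evecs = np.linalg.eigh(ca + cb)          # whiten the composite covariance
    P = np.diag(evals ** -0.5) @ evecs.T
    d, B = np.linalg.eigh(P @ ca @ P.T)             # class-a variance share, ascending
    W = (B.T @ P)[np.argsort(d)[::-1]]              # rows sorted by descending share
    return np.vstack([W[:n_pairs], W[-n_pairs:]])


def log_var_features(epochs, W):
    """Log-variance of each spatially filtered epoch, shape (n_epochs, n_filters)."""
    filtered = np.einsum("fc,ect->eft", W, epochs)
    return np.log(filtered.var(axis=2))


rng = np.random.default_rng(0)
class_a = rng.normal(size=(20, 2, 200)) * np.array([[3.0], [1.0]])  # strong on channel 0
class_b = rng.normal(size=(20, 2, 200)) * np.array([[1.0], [3.0]])  # strong on channel 1
W = csp_filters(class_a, class_b)
feats_a, feats_b = log_var_features(class_a, W), log_var_features(class_b, W)
```

The filters should be learned on training data only. Otherwise, the evaluation of the downstream classifier is optimistically biased.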
Furthermore, spatial features, such as connectivity matrices, can be computed. For instance, correlation, covariance, or coherence matrices can be computed from the signals of all sensors. It has been shown, for example, that mental fatigue can be estimated using EEG covariance matrices [87].
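Here is a minimal sketch of covariance-based spatial features. It computes one covariance matrix per epoch, then applies a log-Euclidean mapping: it takes the matrix logarithm and keeps its upper triangle. This turns each symmetric positive-definite matrix into a vector usable by standard classifiers. The mapping is one common choice among Riemannian approaches and is an assumption here, not necessarily the approach used in [87]. The data are hypothetical.

```python
import numpy as np


def covariance_matrices(epochs):
    """Sample covariance (ddof=1) of each epoch, shape (n_epochs, n_channels, n_channels)."""
    centered = epochs - epochs.mean(axis=2, keepdims=True)
    return np.einsum("ect,edt->ecd", centered, centered) / (epochs.shape[2] - 1)


def log_euclidean_vector(cov):
    """Upper triangle of log(cov); off-diagonal terms weighted by sqrt(2)
    so that the vector norm equals the Frobenius norm of log(cov)."""
    evals, evecs = np.linalg.eigh(cov)
    log_cov = evecs @ np.diag(np.log(evals)) @ evecs.T
    rows, cols = np.triu_indices(cov.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * weights


rng = np.random.default_rng(1)
epochs = rng.normal(size=(5, 4, 250))        # hypothetical: 5 epochs, 4 channels
covs = covariance_matrices(epochs)
features = np.array([log_euclidean_vector(c) for c in covs])   # shape (5, 10)
```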

Operator Mental State Assessment
The previous section described some noteworthy mental states that favor error occurrence (e.g., cognitive workload), along with associated markers (e.g., heart rate variability (HRV)). These markers (also called features) and, more broadly, the data streams coming from human operators can be exploited to infer useful information, such as mental state estimates. As physiological signals are highly susceptible to noise, most processing pipelines include a preprocessing step before feature extraction to enhance the signal-to-noise ratio. This preprocessing is detailed in the following subsection. This section then presents the machine learning framework and the usual tools for estimating mental states. The subsequent section describes some techniques for supervising human-machine teams based on the resulting estimates.

Preprocessing
Electrophysiological signals have to be preprocessed before the feature extraction stage. This type of signal is strongly affected by electromagnetic noise in laboratory, office, and ground station conditions (e.g., power-line noise at 50 or 60 Hz, depending on the country). The same holds in operational settings, such as inside vehicles and aircraft. The usual first step is to apply frequency filters to remove signal drifts and noise from external electromagnetic sources.
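As a deliberately simple sketch of this first filtering step, the function below applies zero-phase band-pass and notch filtering by zeroing FFT bins. The band edges and the 50 Hz line frequency are assumptions (use 60 Hz where relevant). Brick-wall masks can cause ringing, so real pipelines usually rely on FIR or IIR filters instead (e.g., from `scipy.signal`).

```python
import numpy as np


def fft_bandpass_notch(x, fs, low=1.0, high=70.0, line_freq=50.0, notch_width=1.0):
    """Keep [low, high] Hz and drop a notch_width-wide band around line_freq."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (f >= low) & (f <= high)                  # removes drifts and high-frequency noise
    keep &= np.abs(f - line_freq) > notch_width / 2  # removes power-line noise
    return np.fft.irfft(X * keep, n=len(x))


# Hypothetical 2 s recording at 500 Hz: 10 Hz activity + 50 Hz line noise + offset
fs = 500
t = np.arange(1000) / fs
raw = np.sin(2 * np.pi * 10 * t) + 2.0 * np.sin(2 * np.pi * 50 * t) + 3.0
clean = fft_bandpass_notch(raw, fs)
```

Because whole recordings are needed for the FFT, this approach suits offline analysis. Online systems typically use causal filters applied sample by sample or in short blocks.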
Next, a denoising step can remove influences from physiological sources that are not of interest for a specific application. For instance, one can remove the impact of eye movements on the cerebral signal, since eye movements produce high-amplitude artifacts in the EEG signal. To remove this activity when it is considered artifactual, one can use regression or source separation methods. This is usually done with a reference signal recorded by electrodes positioned above, below, and at the outer canthi of the eyes, a method called electro-oculography (EOG). However, ocular activity can in fact be quite relevant for estimating mental states linked to time-on-task, task demands, and system outputs, and might rightly be kept in the EEG signal. Using ocular activity extracted from the EEG signal avoids the need for facial electrodes and is relevant for monitoring mental fatigue in operators [88].
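The regression approach can be sketched as follows. Each EEG channel is modeled as clean EEG plus a linear combination of EOG channels. The propagation coefficients are estimated by ordinary least squares, and the estimated ocular contribution is subtracted. Array shapes and the synthetic mixing coefficients are assumptions. EOG electrodes also pick up some cerebral activity, so this correction may remove part of the signal of interest.

```python
import numpy as np


def regress_out_eog(eeg, eog):
    """Return (corrected EEG, estimated coefficients B), modeling eeg as clean + B @ eog.

    eeg: (n_eeg_channels, n_times); eog: (n_eog_channels, n_times).
    """
    eeg_c = eeg - eeg.mean(axis=1, keepdims=True)
    eog_c = eog - eog.mean(axis=1, keepdims=True)
    # Least squares on eog_c.T @ B.T = eeg_c.T, one column per EEG channel
    B_t, *_ = np.linalg.lstsq(eog_c.T, eeg_c.T, rcond=None)
    B = B_t.T
    return eeg - B @ eog_c, B


rng = np.random.default_rng(2)
clean = rng.normal(size=(3, 5000))                    # hypothetical cerebral activity
eog = 10.0 * rng.normal(size=(2, 5000))               # hypothetical vertical/horizontal EOG
B_true = np.array([[0.5, 0.1], [0.2, 0.0], [0.0, 0.3]])
contaminated = clean + B_true @ eog
corrected, B_hat = regress_out_eog(contaminated, eog)
```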

Classification Principle
A physiological marker, or more generally a vector of markers, is reliable for a given mental state if its values characterize the mental state of interest well. In other words, the underlying probability distribution of such a vector is known (or assumed) to differ significantly depending on the state of the operator. This property often makes it possible to generalize from examples. Here, the aim of statistical classification is to compute (or learn) a prediction function from a dataset that contains vectors of feature values. This function has to associate the most plausible mental state with any new vector, not just with those present in the dataset [89-91]. Figure 1 illustrates the classification process. The dataset used for learning purposes, i.e., to compute the prediction function, is called the training set. Within the framework of statistical classification, the training set contains, for each vector of feature values, the corresponding desired output of the prediction function.
In the case of physiological data classification, features are physiological markers and the desired output is the condition: "human in the mental state number 1" or "number 2". In practice, physiological data are recorded on volunteers who have been asked to perform specific tasks, known to make them reach particular mental states or to avoid them. The considered datasets are therefore called labeled datasets, the labels being the desired outputs (i.e., the mental state under which the vectors of features have been recorded). Since it uses labeled datasets, the classification is referred to as supervised learning.
More formally, an $n$-sized, $d$-dimensional labeled dataset $(X, y) \in \mathbb{R}^{n \times d} \times \{0, 1\}^n$ is a dataset in which each sample (vector of features) is denoted by $X_i \in \mathbb{R}^d$, $i \in \{1, \ldots, n\}$. The associated label, or class, is $y_i \in \{0, 1\}$, with, for instance, "1" for "high workload" and "0" for "low workload". Within this formalism, a prediction function is a function $c : \mathbb{R}^d \to \{0, 1\}$ predicting the label $c(X_{n+1}) \in \{0, 1\}$ of any new sample $X_{n+1} \in \mathbb{R}^d$ (a sample not present in the training set). An algorithm that computes such a prediction function is called a classifier. Usually, the labeled dataset is divided into two parts: the training set, used to train the classifier, and the testing set, used to assess the error of the resulting prediction function. Since the testing set is not used to learn the prediction function, it is an appropriate dataset for checking the generalization properties of this function.
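To make this formalism concrete, the sketch below splits a labeled dataset (X, y) into training and testing sets and builds a prediction function c that maps feature vectors to labels in {0, 1}. The nearest-centroid rule is only a deliberately simple placeholder classifier: it assigns each sample to the class with the closest mean. It is not one of the methods discussed in this article. The synthetic HR/HRV-like values and the helper names are hypothetical.

```python
import numpy as np


def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Randomly split (X, y) into training and testing sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = perm[:n_test], perm[n_test:]
    return X[train], y[train], X[test], y[test]


def fit_nearest_centroid(X_train, y_train):
    """Return a prediction function c mapping samples in R^d to labels in {0, 1}."""
    centroids = np.stack([X_train[y_train == label].mean(axis=0) for label in (0, 1)])

    def c(X_new):
        X_new = np.atleast_2d(X_new)
        dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)               # index of the closest centroid = label

    return c


# Hypothetical (HR in bpm, HRV in ms) samples: rest (y = 0) vs. teleoperation (y = 1)
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([70.0, 60.0], 3.0, size=(100, 2)),
               rng.normal([85.0, 40.0], 3.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
X_tr, y_tr, X_te, y_te = split_train_test(X, y)
c = fit_nearest_centroid(X_tr, y_tr)
accuracy = np.mean(c(X_te) == y_te)
```

With physiological time series, successive samples are correlated, so random splits can overestimate accuracy. Splitting by blocks of time or by session is often safer.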

Classification Performance
Passive brain-computer interface research uses many classifiers to compute more or less powerful prediction functions. Their power depends on the number of dimensions $d$, the size of the dataset $n$, its values $(X_i)_{i=1}^{n} \in (\mathbb{R}^d)^n$, and the mental states of interest $(y_i)_{i=1}^{n}$. The usual performance metric for a prediction function is the mean accuracy: the number of samples (of the testing set) whose labels are correctly predicted, divided by the size of the testing set. Figure 2 gives an example of the prediction functions resulting from some popular classification methods on three datasets. The datasets used are such that $\forall i \in \{1, \ldots, n\}$, $X_i = (\mathrm{HR}, \mathrm{HRV}) \in \mathbb{R}_+^2$ (as detailed in Section 2.2.2, HR: heart rate; HRV: heart rate variability). Labels encode the mental states of interest: $y = 1$ (blue dots) if the human operator performs a robot teleoperation task, and $y = 0$ (red dots) if he/she is resting.
Figure 2. Prediction functions of some popular classifiers on three datasets, computed using scikit-learn [92]. The datasets are HR (Heart Rate) and HRV (Heart Rate Variability) values of a human operator during a rest session (red dots, y = 0) and during a mission involving a robot teleoperation task (blue dots, y = 1), described in References [42,93]. The first two rows consider two different participants (Part. A and Part. B), while the last row is based on the union of the previous datasets (Part. A & B). The accuracy, which ranges from 0 to 1, is indicated in black on the top right of each graph. Data from the testing set are shown as the more transparent points.
Formally, given a testing set $(X, y) \in \mathbb{R}^{n \times d} \times \{0, 1\}^n$, the mean accuracy of $c$ is

$$a(c) = \frac{\#\{\, i \mid y_i = c(X_i) \,\}}{n}.$$

The testing set may be unbalanced, i.e., the number of samples with label 1 ("positive" data), $P := \#\{\, i \mid y_i = 1 \,\}$, may be very large (or very small) compared to the number of samples with label 0 ("negative" data), $N := \#\{\, i \mid y_i = 0 \,\}$. In that case, an adjusted version of the mean accuracy may be used instead, based on the following more specific metrics. The number of samples whose label is $l \in \{0, 1\}$ and whose prediction is $p \in \{0, 1\}$ is denoted by $m_{p,l}(c) := \#\{\, i \mid c(X_i) = p \text{ and } y_i = l \,\}$. It allows a more precise evaluation of the classifier $c : \mathbb{R}^d \to \{0, 1\}$. Using this notation, the numbers of true positives, false positives, true negatives, and false negatives are, respectively, $\mathrm{TP} := m_{1,1}(c)$, $\mathrm{FP} := m_{1,0}(c)$, $\mathrm{TN} := m_{0,0}(c)$, and $\mathrm{FN} := m_{0,1}(c)$. These values may be summarized by a $2 \times 2$ confusion matrix, generally used as an approximation of prediction probabilities:

$$\begin{pmatrix} \frac{\mathrm{TP}}{P} & \frac{\mathrm{FP}}{N} \\[4pt] \frac{\mathrm{FN}}{P} & \frac{\mathrm{TN}}{N} \end{pmatrix} \approx \begin{pmatrix} \Pr(c(X) = 1 \mid y = 1) & \Pr(c(X) = 1 \mid y = 0) \\ \Pr(c(X) = 0 \mid y = 1) & \Pr(c(X) = 0 \mid y = 0) \end{pmatrix},$$

with $n = P + N$. The rate $\frac{\mathrm{TP}}{P}$ is often referred to as sensitivity or true positive rate, and $\frac{\mathrm{TN}}{N}$ as specificity or true negative rate. The mean accuracy can be computed from these metrics as $a(c) = \frac{\mathrm{TP} + \mathrm{TN}}{n}$. So can the adjusted accuracy for unbalanced datasets, $\tilde{a}(c) = \frac{1}{2}\left(\frac{\mathrm{TP}}{P} + \frac{\mathrm{TN}}{N}\right)$.
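These metrics translate directly into code. The sketch below, with hypothetical helper names, computes TP, FP, TN, and FN, the mean accuracy a(c), and the adjusted accuracy ã(c). For binary labels, the latter matches what scikit-learn calls balanced accuracy (`sklearn.metrics.balanced_accuracy_score`). The usage example shows why the adjustment matters. A classifier that always predicts "1" on a 90/10 unbalanced testing set reaches a mean accuracy of 0.9, but an adjusted accuracy of only 0.5.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, fp, tn, fn


def accuracies(y_true, y_pred):
    """Return the mean accuracy a(c) and the adjusted accuracy for unbalanced data."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    pos, neg = tp + fn, tn + fp                 # P and N
    mean_acc = (tp + tn) / (pos + neg)
    adjusted_acc = 0.5 * (tp / pos + tn / neg)  # mean of sensitivity and specificity
    return mean_acc, adjusted_acc


y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100                              # always predicts "1"
mean_acc, adjusted_acc = accuracies(y_true, y_pred)   # 0.9 and 0.5
```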

Some Famous Classifiers
Many classifiers have been developed on theoretical or empirical bases, and each has pros and cons depending on the type of data. The following subsections detail linear and quadratic discriminant analyses (LDA and QDA), Support Vector Machines (SVM), and k-Nearest Neighbours (KNN). LDA is surely one of the most famous classifiers. It has been applied to features extracted from ECG or EEG data to predict, quite efficiently, mental fatigue (e.g., [68,86,94]), mental workload (e.g., [68,86,95,96]), and inattentional deafness (e.g., [97,98]). A combination of classifiers can also be used, as done by Singh and collaborators, who used KNN and SVM to detect periods of rest, stress, or cognitive workload [38]. For a complete description of the state of the art of mental state classifiers for EEG signals, see Reference [91].

Linear and Quadratic Discriminant Analyses
A method derived from classical statistics, known as discriminant analysis [99] and proposed by R.A. Fisher, assumes that, for each class $l$, the data $\{ X_i \in \mathbb{R}^d \mid y_i = l \}$ are normally distributed. In a nutshell, once the parameters of these distributions have been estimated, a new vector is assigned to the class whose distribution gives it the highest likelihood. While the covariance matrices of the normal distributions are assumed to be equal in Linear Discriminant Analysis (LDA), this assumption is dropped in Quadratic Discriminant Analysis (QDA). After estimating the parameters of the two Gaussian distributions, one for each label $l$, the prediction is based on the posterior probabilities of the classes: using Bayes' rule, the decision for a new vector is the class with the highest resulting probability.
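To make the LDA/QDA decision rule concrete, here is a minimal NumPy sketch, not a reference implementation (the helper names are hypothetical). It estimates one Gaussian per class, with a pooled covariance matrix for LDA or one covariance matrix per class for QDA, and predicts with Bayes' rule, i.e., the class maximizing log-likelihood plus log-prior. The two-cloud data set is synthetic.

```python
import numpy as np


def fit_gaussian_da(X, y, shared_cov):
    """Estimate class priors, means and covariance matrices.

    shared_cov=True  -> LDA: one pooled covariance matrix for all classes.
    shared_cov=False -> QDA: one covariance matrix per class.
    """
    classes = np.unique(y)
    priors, means, covs = {}, {}, {}
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for l in classes:
        Xl = X[y == l]
        priors[l] = len(Xl) / len(X)
        means[l] = Xl.mean(axis=0)
        covs[l] = np.cov(Xl, rowvar=False)
        pooled += (len(Xl) - 1) * covs[l]
    if shared_cov:
        pooled /= len(X) - len(classes)
        covs = {l: pooled for l in classes}
    return classes, priors, means, covs


def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff) + len(x) * np.log(2 * np.pi))


def gda_predict(model, x):
    """Bayes rule: class maximizing log-likelihood + log-prior (log posterior up to a constant)."""
    classes, priors, means, covs = model
    scores = [log_gaussian(x, means[l], covs[l]) + np.log(priors[l]) for l in classes]
    return classes[int(np.argmax(scores))]


# Toy 2-D data: two Gaussian clouds centred on (0, 0) and (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)), rng.normal([3, 3], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
lda = fit_gaussian_da(X, y, shared_cov=True)
qda = fit_gaussian_da(X, y, shared_cov=False)
print(gda_predict(lda, np.array([0.2, -0.1])), gda_predict(qda, np.array([2.8, 3.1])))
```

Library implementations (e.g., scikit-learn's LinearDiscriminantAnalysis, which offers a shrinkage option) add regularization of the covariance estimate, which is often useful when features are numerous compared with training samples.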

Support Vector Machine
A more recent algorithm, the Support Vector Machine (SVM) [100], does not assume that the data are normally distributed. This classification algorithm takes as input a penalty parameter $C > 0$ and a function called a kernel. The kernel function is used to map the vectors to be classified from a lower-dimensional space ($\mathbb{R}^d$) to a higher-dimensional space in which they are more easily linearly separable, i.e., in which a hyperplane separating the two classes can be found. Popular kernel functions include the linear kernel $K_l(x, y) := \langle x, y \rangle = \sum_{i=1}^{d} x_i y_i$, the polynomial kernel $K_p(x, y) := (\langle x, y \rangle + r)^p$ (with $p \in \mathbb{N}$), and the Gaussian radial basis function (RBF) kernel $K_r(x, y) := e^{-\gamma \|x - y\|^2}$, with $\|x\|^2 = \sum_{i=1}^{d} x_i^2 = \langle x, x \rangle$. The classification results obtained with these three kernels are shown in Figure 2. Given a kernel $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, an important theoretical result called the representer theorem [101] ensures that the solution $f$ can be written as a weighted sum of kernel evaluations at the training points (up to a bias term), and this solution is computed using convex optimization. The predicted class of a vector $x \in \mathbb{R}^d$ is given by the sign of the resulting function $f$; thus, the set of all $x \in \mathbb{R}^d$ such that $f(x) = 0$ is a separating boundary. In the SVM optimization problem, the margin, that is, the smallest distance between the points $x$ such that $f(x) > 1$ and those such that $f(x) < -1$, is maximized, while the classification error multiplied by $C$ is minimized: a larger value of $C$ leads to a smaller margin but to more correctly classified training data. This algorithm is considered state of the art in terms of classification performance, with guarantees stemming from the convexity of the optimization problem.
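The three kernels and the form of the SVM decision function can be sketched as follows. This is an illustration only: in practice the coefficients come from solving the convex SVM optimization problem, which is not shown here, so the support vectors and coefficients below are hand-picked, hypothetical values chosen so that the decision function reduces to f(x) = 2 x[0]. The parameter names r, p, and gamma follow the kernel definitions above; the function names are ours.

```python
import numpy as np


def linear_kernel(x, y):
    return float(np.dot(x, y))


def polynomial_kernel(x, y, r=1.0, p=2):
    return float((np.dot(x, y) + r) ** p)


def rbf_kernel(x, y, gamma=1.0):
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))  # exp(-gamma * ||x - y||^2)


def decision_function(x, support_vectors, dual_coefs, b, kernel):
    """Kernel expansion f(x) = sum_i c_i K(X_i, x) + b, where c_i stands for alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b


def svm_predict(x, support_vectors, dual_coefs, b, kernel):
    """The predicted class (+1 or -1) is the sign of f(x)."""
    return 1 if decision_function(x, support_vectors, dual_coefs, b, kernel) >= 0 else -1


# Hypothetical, hand-picked solution: with the linear kernel, f(x) = 2 * x[0].
svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
coefs = [1.0, -1.0]
print(svm_predict(np.array([0.5, 3.0]), svs, coefs, 0.0, linear_kernel))   # 1
print(svm_predict(np.array([-2.0, 1.0]), svs, coefs, 0.0, linear_kernel))  # -1
```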

k-Nearest Neighbors
The k-nearest neighbors classifier (k-NN) [102] is one of the simplest classification algorithms in machine learning. It relies on a distance defined in the feature space (e.g., the Euclidean distance) and predicts the majority label among the k nearest neighbors according to this distance.
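A plain-Python sketch of k-NN with the Euclidean distance follows (it needs Python 3.8+ for math.dist). The labels "rest" and "workload" and the 2-D feature vectors are hypothetical; in practice, each vector would hold physiological features. An odd k avoids most ties in binary problems.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k training vectors closest to x (Euclidean distance)."""
    neighbours = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, most_common(1) returns the label encountered first, i.e. the one
    # whose nearest member is closest to x.
    return votes.most_common(1)[0][0]


X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["rest", "rest", "rest", "workload", "workload", "workload"]
print(knn_predict(X_train, y_train, (0.5, 0.5)))  # rest
print(knn_predict(X_train, y_train, (5.5, 5.5)))  # workload
```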

Other Algorithms, Recent Advances, and Challenges
The previous list of classification algorithms is far from comprehensive for brain-computer interface applications. Among the remaining algorithms, one can cite random forests (RF) [103], which are based on the majority vote (ensemble learning) of decision trees. Neural networks (NN) [104], such as the multi-layer perceptron (MLP), are also successful and have given birth to deep learning [105], which is beginning to be used to classify EEG data when the database is large enough [106]. Neural networks optimize the parameters of successive transformations applied to the data, usually using gradient descent algorithms (backpropagation [107]) to minimize the classification error. Each transformation is usually composed of a linear combination of its inputs with learned weights (e.g., a convolution) followed by a non-linear function (e.g., a sigmoid) called the activation function. The intermediate results of each transformation, up to the prediction, are called neurons. Since each step outputs several neurons, networks are often represented as successive layers of neurons, and the more layers there are, the deeper the network is considered to be. Recent improvements in deep neural networks (e.g., network structure, new transformations, sampling of training data) have allowed deep learning methods to reach performance comparable to the state of the art on a motor-imagery EEG dataset [108]; the authors even implemented a method to visualize the features used by the resulting classifier. However, up to now, neural networks have not yet shown their superiority over other machine learning algorithms in physiological computing, as they have in image classification. This is probably due to the size of physiological datasets, which are generally too small for these networks to learn enough. The machine learning techniques presented here are rather classical algorithms, and their use in BCI is presented in more detail in the review of Reference [109].
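The forward and backward passes described above can be illustrated with a one-hidden-layer perceptron written directly in NumPy. This is a minimal sketch under our own assumptions (sigmoid activations, mean cross-entropy loss, full-batch gradient descent, arbitrary hyperparameters); real EEG pipelines would rely on a deep learning framework and far more data. On the XOR toy problem, which is not linearly separable, the training loss should decrease over the epochs; whether all four points end up correctly classified depends on the initialization and hyperparameters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer perceptron trained by full-batch gradient descent (backpropagation)."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.5, hidden), 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted linear combination followed by a sigmoid activation.
        H = sigmoid(X @ W1 + b1)      # hidden layer (one column per neuron)
        out = sigmoid(H @ W2 + b2)    # output neuron, read as P(label = 1)
        losses.append(-np.mean(y * np.log(out + 1e-12) + (1 - y) * np.log(1 - out + 1e-12)))
        # Backward pass: gradients of the mean cross-entropy loss.
        d_out = (out - y) / len(y)               # w.r.t. the output pre-activation
        d_H = np.outer(d_out, W2) * H * (1 - H)  # w.r.t. the hidden pre-activations
        # Gradient-descent updates (d_H was computed with the old W2).
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_H)
        b1 -= lr * d_H.sum(axis=0)
    return (W1, b1, W2, b2), losses


def mlp_predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int)


# XOR: not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
params, losses = train_mlp(X, y)
print(losses[0], losses[-1], mlp_predict(params, X))
```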
New algorithms have been developed that use matrices and tensors as features. These matrices and tensors can be built from connectivity features between sensors (e.g., EEG electrodes) or sources (after a source reconstruction step; for more information on source reconstruction, see, e.g., Reference [110]). Examples of such measures are correlation, covariance, or coherence matrices. A given mental state can then be estimated by computing distances between these objects. This has notably been done for mental fatigue estimation using the Frobenius distance between the covariance matrices of EEG signals [87]. More recently, the use of the Riemannian distance has yielded highly accurate mental state predictions [111].
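The two distances mentioned above can be computed as follows for symmetric positive-definite (SPD) covariance matrices, e.g., C = np.cov(window) for a (channels x samples) EEG window. The Riemannian distance sketched here is the affine-invariant one, sqrt(sum_i log^2 lambda_i), where the lambda_i are the eigenvalues of A^{-1}B; we assume this is the metric meant, although other Riemannian metrics exist. The nearest-prototype rule and the labels state_A/state_B are hypothetical illustrations, and how the class prototypes are obtained (for instance, some average of training covariance matrices) is left out.

```python
import numpy as np


def frobenius_distance(A, B):
    return float(np.linalg.norm(A - B, ord="fro"))


def riemannian_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    sqrt(sum_i log(lambda_i)^2), lambda_i being the eigenvalues of A^{-1} B."""
    L = np.linalg.cholesky(A)          # A = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ B @ L_inv.T            # symmetric, same eigenvalues as A^{-1} B
    eigvals = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


def nearest_prototype(C, prototypes, distance=riemannian_distance):
    """Assign covariance matrix C to the label whose prototype matrix is closest."""
    return min(prototypes, key=lambda label: distance(C, prototypes[label]))


prototypes = {"state_A": np.eye(2), "state_B": 2 * np.eye(2)}
print(nearest_prototype(np.diag([1.9, 2.1]), prototypes))  # state_B
```

Unlike the Frobenius distance, the affine-invariant distance is unchanged when both matrices are transformed as W A W^T and W B W^T for an invertible W, which makes it insensitive to, e.g., a common linear mixing of the channels.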
There are currently three main technical challenges:
• Finding physiological features that are robust to the acquisition environment and tasks. Indeed, interactions between features have been found to significantly impact and decrease classification performance [86,95]. Therefore, one should try to find markers that are context-independent and that could be used efficiently both in the lab and in the field.
• Developing classification pipelines that are capable of transfer learning. Classifiers are rarely immune to performance decrements caused by a change of task, participant, or even session. Pipelines that are robust to inter-subject, inter-session, and inter-task variability should therefore be pursued.
• Performing the estimation online and closing the loop, that is to say, feeding the mental state estimates to a decisional system that can, e.g., adapt the functioning of the whole system accordingly (e.g., assign tasks or send alarms to the operator). This topic is addressed in the next part.

Closing the Loop: Towards Flexible Symbiotic Systems
The present section considers research that aims to add human operators as measurable agents into the control loop (see Figure 3). Here, we explain how the addition of computational steps (e.g., data preprocessing, feature computation, classification) and their outputs could be beneficial to drive the human-robot interaction. As stated earlier, the systems developed following this approach are called neuroadaptive or physiological computing systems, as well as passive Brain-Computer Interfaces (BCIs), and they are likely to become more common as machines are increasingly operated by a limited and decreasing number of human agents. In particular, BCIs can provide useful information about the human operator to the automated system. A BCI is a system that performs direct information transfer from a brain to a computer through brain activity measurements, thereby enabling the control of devices without the use of psycho-motor activity [112]. In this article, we are interested in passive BCIs, i.e., BCIs in which the human operator does not try to voluntarily control his/her brain activity: the latter is only used to improve the interaction between the operator and the automated system [113,114]. In addition to the distinction between active and passive BCIs, which relates to the type of control exerted by the user, system interventions, or counter-measures, can be explicit or implicit, i.e., system adaptations may or may not be consciously registered by the user [34]. The choice between explicit and implicit adaptations mostly depends on how much information one decides the operator should receive.

Goal
In this section, the principle of these symbiotic systems is first presented. Next, current work on human-robot interaction driving systems that exploit some form of human state detection is described. Emphasis is placed on approaches based on sequential decision-making, in which automated planning models under uncertainty have been used. Note that, in our view, human behavior and the events encountered during remote operation are rarely deterministic or fully observable.

Symbiotic Systems: Principle
Just as the human agent adapts his/her behavior to the feedback given by the system, an automated system should adapt its behavior to the human state vector, either at the user interface level (shallow adaptation) or at the global decisional level (deep adaptation). For instance, Singh and collaborators proposed a physiologically attentive user interface to perform shallow adaptations [38], while Prinzel and collaborators [115] developed a psychophysiological adaptive automation system with adaptive task allocation based on the engagement index [81]. Mixed-initiative systems have also been used to produce deep adaptations, for instance to decide whether to launch alarms [116], how to present information to the operator depending on task priority and mission goals [9], or even when to request an action from the human operator [117].
In order to improve the performance of remote robot operation by allowing this adaptation to take place, a major technical challenge for these systems is the requirement to function online, or in "real time". Here, the expression "real time" differs from "Real Time Computing": it simply means that the system reacts to incoming data with short delays.
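One simple way to meet this "real time" requirement is to buffer the incoming signal in a sliding window and emit a new estimate every few samples, so that the estimation delay is bounded by the step size plus the processing time. The class below is a hypothetical sketch: the classify argument stands for whatever feature extraction and classification pipeline is used, and mean_level is only a placeholder.

```python
from collections import deque


class SlidingWindowEstimator:
    """Buffers the last `window` samples and emits an estimate every `step` new samples."""

    def __init__(self, window, step, classify):
        self.buffer = deque(maxlen=window)
        self.step = step
        self.classify = classify
        self._since_last = 0

    def push(self, sample):
        """Add one sample; return a new estimate when one is due, otherwise None."""
        self.buffer.append(sample)
        self._since_last += 1
        if len(self.buffer) == self.buffer.maxlen and self._since_last >= self.step:
            self._since_last = 0
            return self.classify(list(self.buffer))
        return None


def mean_level(window):
    """Hypothetical placeholder for a feature extraction + classification pipeline."""
    return "high" if sum(window) / len(window) > 0.5 else "low"


estimator = SlidingWindowEstimator(window=4, step=2, classify=mean_level)
stream = [0, 0, 0, 0, 1, 1, 1, 1]
estimates = [e for e in (estimator.push(x) for x in stream) if e is not None]
print(estimates)  # ['low', 'low', 'high']
```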
The data from the human operator can be classified according to the way they are acquired:
• Proximal behavioral data: operator actions on the interface through the mouse, keyboard, buttons, joystick, etc. [31,118].
• Distal behavioral data: data obtained using remote sensors (passive operator), such as eye trackers, audio and video streams, etc. [117,119,120].
Next, we propose a general framework we believe to be a strong candidate for such symbiotic systems.

One Solution: Mixed-Initiative Interaction Driving Systems
As discussed earlier, physiological and behavioral markers could be used to estimate the operator state vector. As illustrated by Figure 3, this operator state vector can be exploited by a decisional system jointly with the artificial agents' state vectors. More precisely, a decisional framework can decide, based on the current states, which action is the most relevant to perform given the mission context and long-term goals. Examples of such actions are waiting for an answer from the human operator, setting an artificial agent to autonomous mode, or taking over the human operator's task if he/she fails. In the literature, this approach is known as mixed-initiative interaction [28,40,41].
The classical mixed-initiative approach defines the role of the human and artificial agents according to their recognized skills [40,41]. In our view, mixed-initiative should be considered in depth, especially for human-robot interaction. Indeed, the agents should be allowed to take the initiative of performing tasks that are not necessarily assigned to them. This position is also advocated by Jiang and Arkin [28], who define mixed-initiative human-robot interaction (MI-HRI) as: 'A collaboration strategy for human-robot teams where humans and robots opportunistically seize (relinquish) initiative from (to) each other as a mission is being executed, where initiative is an element of the mission that can range from low-level motion control of the robot to high-level specification of mission goals, and the initiative is mixed only when each member is authorized to intervene and seize control of it.' An interesting example of such a mixed-initiative system is given in Reference [118]. This approach relies on a statistical analysis to determine which agent (i.e., human or artificial) is the most efficient for a given task, without that agent being the only one capable of performing it. This is certainly an interesting topic concerning role allocation and authority sharing between the human and artificial agents. In other words, the human and the artificial agents are both able to perform the same task, and, when the agent initially expected to perform a task fails (even if it is the human agent), the other can take the initiative to accomplish it. In this sense, we advocate that the human operator should no longer be considered a providential agent, contrary to classical operational contexts, which assume that the human operator will be able to take over when sensors or automation fail [121][122][123].
As discussed in the first sections, degraded mental states could diminish human capabilities. Hence, cybernetic systems should be able to compensate for such weaknesses while ensuring application or mission performance. In the next section, we discuss works from the literature that report cybernetic (closed-loop) systems that use behavioral and/or physiological data to infer the human state vector and adapt their behavior accordingly.

Mixed-Initiative Symbiotic Interaction Systems: Existing Work
To our knowledge, the literature on interaction based on mixed-initiative symbiotic systems is still scarce. Yet, a few studies have shown the feasibility of the approach. Some of them use only subjective and behavioral data and close the loop to trigger adaptations for mission accomplishment (long-term decisions), while others use physiological data but apply reactive, human-centered strategies (short-term decisions) without taking the overall system performance into account. The works discussed in the following section come close to the main idea of such closed-loop systems. As far as the authors know, mixed-initiative interaction driving systems that seek to maximize mission performance and that include physiological computing to monitor the human operator have not yet been fully implemented [42].
Gombolay and collaborators studied mixed-initiative human-robot teaming in which human factors are considered by a robot in the decision-making process [30]. The robot assigns tasks to the team by taking into account the subjective workload and workflow preferences of its human teammates. Interestingly, they found that human workflow preferences could be orthogonal to the goal of maximizing the team's overall performance. Unfortunately, in this work, subjective feedback is only used for a priori task allocation, and no online human state estimation is performed for task (re)planning.

- Actions and sequences of actions
Beyond subjective measures, which can only be collected before or after the task, or in an interrupting manner during the task, behavioral measures can be easily and unobtrusively performed online. For instance, de Souza and collaborators used a search and rescue mission in which human operators and artificial agents (UAVs) must collaborate to deliver first-aid kits [9]. This approach models the human operator's utility using Prospect Theory, with subjectively perceived probabilities learnt from experimental data. Based on this model, the supervisory system can predict the human operator's response to a given request from artificial agents in a given context, and then choose how to present the information to the operator. The approach is based on Game Theory and is designed to maximize the chances that the human operator makes a decision aligned with the operational guidelines. The results demonstrate that the system can influence human decisions, in particular when operators are emotionally involved.
In Charles et al. [31], an interaction model learning approach is proposed to approximate a Markov Decision Process (MDP) from data collected through crowdsourcing. The authors integrated the human actions on the interface as a state variable that models the dynamics of user intention, as well as its influence on the evolution of the other state variables during manual or autonomous robot control. Simulation results showed that the optimized collaboration strategy (MDP policy) based on the learned interaction model increased the overall mission performance compared to a random or a fixed strategy.
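Once an MDP model such as the one learned by Charles et al. is available, an optimized policy can be computed, for example by value iteration. The two-state model below (operator "ok" or "degraded", robot in "manual" or "autonomous" mode) and all of its transition probabilities and rewards are invented for illustration and do not come from [31]; only the algorithm itself is standard.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-8):
    """T[s][a][s2]: transition probability; R[s][a]: expected immediate reward.
    Returns the optimal state values and a greedy (optimal) policy."""
    V = {s: 0.0 for s in states}
    while True:
        Q = {s: {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                 for a in actions}
             for s in states}
        new_V = {s: max(Q[s].values()) for s in states}
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            policy = {s: max(Q[s], key=Q[s].get) for s in states}
            return new_V, policy
        V = new_V


# Hypothetical model: manual control pays off only while the operator is in a good state.
states = ["ok", "degraded"]
actions = ["manual", "autonomous"]
T = {
    "ok": {"manual": {"ok": 0.8, "degraded": 0.2}, "autonomous": {"ok": 0.9, "degraded": 0.1}},
    "degraded": {"manual": {"ok": 0.1, "degraded": 0.9}, "autonomous": {"ok": 0.6, "degraded": 0.4}},
}
R = {"ok": {"manual": 1.0, "autonomous": 0.5}, "degraded": {"manual": -1.0, "autonomous": 0.5}}
V, policy = value_iteration(states, actions, T, R)
print(policy)  # {'ok': 'manual', 'degraded': 'autonomous'}
```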
In the same vein, Nikolaidis and collaborators proposed an elegant way to estimate different types of human operators (safe or efficient) based on their sequence of actions in an industrial human-robot interaction context [124]. The decisional framework, which estimates the behavioral profile of the operator, is based on a Partially Observable Markov Decision Process (POMDP). The POMDP adapts the behavior of the artificial agent to the current estimate of the human operator's profile. The same decisional framework is also used by Hoey and collaborators in another operational context [120], in which the system processes video inputs and provides assistance to people with dementia, either (i) through verbal or visual prompts or (ii) by enlisting the help of a human caregiver.
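The core of such POMDP-based frameworks is the belief update, i.e., a Bayes filter over hidden states. The sketch below tracks a hypothetical operator profile ("safe" or "efficient") from observed human actions; we assume that the profile does not change during the interaction (identity transition model) and we invent the observation probabilities. This is not the model of [124]; the same update could, in principle, take other observations, such as gaze data from an eye tracker.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete POMDP belief update (Bayes filter):
    b'(s2) is proportional to O[s2][action][observation] * sum_s T[s][action][s2] * b(s).
    """
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        predicted = sum(T[s][action].get(s2, 0.0) * belief[s] for s in states)
        unnormalized[s2] = O[s2][action].get(observation, 0.0) * predicted
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: v / z for s, v in unnormalized.items()}


# Hypothetical model: the operator profile is static while the robot waits.
profiles = ("safe", "efficient")
T = {s: {"wait": {s: 1.0}} for s in profiles}
O = {
    "safe": {"wait": {"slow_move": 0.8, "fast_move": 0.2}},
    "efficient": {"wait": {"slow_move": 0.3, "fast_move": 0.7}},
}
belief = {"safe": 0.5, "efficient": 0.5}
for obs in ["fast_move", "fast_move"]:
    belief = belief_update(belief, "wait", obs, T, O)
print(belief)
```

A POMDP policy would then choose the robot's next action as a function of this belief rather than of a single, possibly wrong, profile estimate.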

- Vocal commands
Atrash and Pineau proposed a human-robot interaction approach, also based on the POMDP framework, to drive an automatic wheelchair [119,125]. High-level user commands are inferred by a vocal recognition system, and feedback is given to the user via a mounted display. In Reference [125], a method to learn the reward function of such a POMDP is presented, while, in Reference [119], the observation function is learnt. These works demonstrate the capability of Bayesian techniques to adjust the POMDP model from (numerous) interaction experiences.

- Ocular behavior
Gateau and collaborators proposed an integrated system that models the non-deterministic behavior of the human operator based on his/her time-to-answer and his/her availability, which is measured by means of an eye tracker [117]. The eye-tracking device indicates the regions of the screen the human operator might be paying attention to. Exploiting this information in the closed loop makes it possible to design a decision-making system that issues requests to the human operator according to his/her supposed availability. This POMDP-based approach showed that the human operator's performance on the secondary task increases when the system takes the operator's availability into account in the closed loop, without decreasing the overall system's performance.

Adaptive Interaction Exploiting Physiological Data for Human State Estimation
To our knowledge, physiological measurements to estimate human (hidden mental) states (cf. Section 2) have never been tested for inclusion in a human-robot mixed-initiative interaction control system (e.g., a high-level mission control loop). For instance, Reference [116] proposes a POMDP-based approach in which a mixed-initiative human-robot mission is modeled under the assumption that a degraded (partially observable) cognitive state could be estimated [126]. However, this system was not evaluated experimentally, and the study only provides simulation results.
Yet, outside of the mixed-initiative approach, work has been done to use physiological data in a closed-loop fashion. Indeed, some works in human-machine interaction, adaptive automation, or active and passive BCI, in different operational contexts, have integrated physiological data to trigger adaptation. Examples of such works are detailed hereafter.
- Active BCIs
In the active BCI literature, i.e., works that enable the voluntary control of interfaces, exoskeletons, or wheelchairs, the inclusion of physiological data in the control loop has been studied for a few decades (see Reference [127] for a review). However, these systems usually use the outputs of a classification algorithm in a straightforward manner and neither use planning algorithms nor consider the potential use of mixed-initiative designs. Nevertheless, Ghosh and collaborators proposed a Markov Decision Process (MDP) approach to control a wheelchair using EEG data [128]. In their study, the planning problem is solved by reinforcement learning methods. The framework aims to infer users' intentions and adapt the system's behavior accordingly. Based on the detection of ErrPs (see Section 2.2), the system learns the value associated with an action performed in a given state. As perspectives, the authors propose using Partially Observable MDPs to handle misclassifications of user intention, and integrating the user's cognitive load into the reward during policy learning.
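The idea of learning action values from ErrP detections can be sketched with tabular Q-learning. Everything below is hypothetical: a 1-D corridor with the user's intended destination at cell 4, and a simulated, error-free ErrP detector that fires whenever the chosen action moves away from that destination, turned into a reward of -1 (and +1 otherwise). This is neither the algorithm nor the setup of [128]; a real ErrP classifier would be noisy, which is one motivation for the POMDP extension mentioned above.

```python
import random

ACTIONS = ["left", "right"]


def simulated_errp(state, action, goal):
    """Hypothetical, error-free ErrP detector: True when the action moves away from the goal."""
    moves_toward_goal = (action == "right") == (goal > state)
    return not moves_toward_goal


def step(state, action, n_states):
    """Deterministic corridor dynamics; moving into a wall leaves the state unchanged."""
    if action == "right":
        return min(state + 1, n_states - 1)
    return max(state - 1, 0)


def q_learning(n_states=5, goal=4, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(goal)  # random start cell, left of the goal
        for _ in range(50):
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = step(s, a, n_states)
            r = -1.0 if simulated_errp(s, a, goal) else 1.0  # ErrP detected -> negative reward
            target = r if s2 == goal else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if s == goal:
                break
    return Q


Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)}
print(policy)
```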
- Passive BCI for active BCI
Interestingly, the work of Zander and collaborators demonstrates the value of using a passive BCI to detect the errors generated by an active BCI through the extraction of ErrPs [114]. After detection, the system performs a corrective action, which speeds up user performance over a short-term horizon. Yet, again, no automated planning technique was used in this work to plan a sequence of actions. The development of a dynamic model able to mimic users' future ErrPs as a function of the context, which is a necessary step for long-term automated action planning, was out of the scope of this reactive system. However, this work highlights the potential of such a passive and implicit estimation of users' hidden (mental) states to increase human performance.
- Passive BCIs for mental workload management
Since the early years of passive BCIs, mental workload has been one of the most studied mental states. For instance, Prinzel and collaborators presented a study in which adaptive automation was performed to track performance and to decrease participants' workload [115]. The system used EEG-based spectral features to decide greedily (based on a threshold) when to switch between automatic and manual control modes during a tracking task (a modified version of the Multi-Attribute Task Battery; MATB) coupled with an auditory oddball task. Their adaptive automation improved performance while lowering workload compared to a random decision strategy. This work also demonstrates the possible benefits of taking physiological features into account to adapt the system's behavior. Note that, in this study, the decision rule was based only on the evolution of an EEG-based engagement index [81]. Here, again, no long-term planning technique, potentially based on the evolution of an engagement index, was used. In our view, a model able to predict the evolution of such an index would favor better-suited adaptations than reactive decision rules, being less prone to short-term variations and triggering actions only when necessary for long-term performance maximization.
More recently, Arico and collaborators studied the effect of adaptive automation on mental workload reduction in a realistic Air Traffic Management (ATM) task, in particular during highly demanding conditions [129]. Various automation schemes were defined beforehand by specialists and were triggered when an EEG-based mental workload index exceeded a threshold, which was user-defined during the training phase. Again, no automated planning technique able to reason about long-term mission or task goals was used.
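For comparison with the long-term planning advocated here, the reactive threshold rules used in these studies can be sketched in a few lines. The EEG engagement index is commonly computed as beta / (alpha + theta) band power; the band limits (theta 4-8 Hz, alpha 8-13 Hz, beta 13-22 Hz), the threshold value, and the mapping from low engagement to manual mode (to re-engage the operator) are our assumptions, not necessarily the settings of [115] or [129]. A single-channel periodogram on a synthetic signal is used for simplicity.

```python
import numpy as np


def band_power(signal, fs, low, high):
    """Sum of periodogram power in the [low, high) Hz band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= low) & (freqs < high)
    return float(power[mask].sum())


def engagement_index(signal, fs):
    """beta / (alpha + theta), with assumed band limits."""
    theta = band_power(signal, fs, 4, 8)
    alpha = band_power(signal, fs, 8, 13)
    beta = band_power(signal, fs, 13, 22)
    return beta / (alpha + theta)


def reactive_mode(index, threshold):
    """Greedy threshold rule: low engagement -> give control back to the operator."""
    return "manual" if index < threshold else "automatic"


fs = 128
t = np.arange(2 * fs) / fs  # 2 s of a synthetic single-channel signal
engaged = 2 * np.sin(2 * np.pi * 18 * t) + np.sin(2 * np.pi * 10 * t)  # beta-dominated
relaxed = np.sin(2 * np.pi * 18 * t) + 2 * np.sin(2 * np.pi * 10 * t)  # alpha-dominated
threshold = 1.0  # hypothetical; would be calibrated per user in practice
for sig in (engaged, relaxed):
    print(reactive_mode(engagement_index(sig, fs), threshold))  # automatic, then manual
```

Because this rule looks only at the current window, a brief fluctuation around the threshold can make it switch modes back and forth, which illustrates the argument above for predictive, long-term models.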
Note that, in the existing literature, besides being seldom applied to remote operation, the approaches are either human-centered or artificial-agent-centered: they design either a system that models the behavior of the humans and adapts itself to the type of human it interacts with (e.g., Reference [30,114,115,119,120,124,128,129]), or a system that drives the human actions in order to maximize the performance of the entire system (e.g., Reference [9,116,117]). However, considering the definition of MI-HRI given by Jiang and Arkin [28], which we also advocate, these works are only first steps that pave the way toward mixed-initiative collaboration strategies. Hence, the next research steps should promote the design of systems in which the initiative is genuinely mixed, i.e., each agent (human or artificial) can intervene and seize control. Besides, from a human point of view, the utility (or necessity) of allowing such an artificial system to seize control from us still remains to be defined, notably for ethical reasons.

Research Gaps and Future Directions
In our view, in order to improve the safety and performance of symbiotic systems, researchers and designers should no longer consider the human operator as an unfailing agent. Indeed, as discussed previously, the human operator's mental states can impact their performance and even prevent them from making efficient decisions or, should the artificial agents fail, from adequately taking over. As argued in Reference [42], the mixed-initiative framework presents a reasonable solution because it offers the opportunity to determine a cooperation strategy that defines the role of the involved agents according to their recognized skills and current capabilities. Note that such a framework, if used as an interaction driving system, requires (i) monitoring the capabilities of all involved agents (human and artificial) given the operational context, and (ii) the ability to model the evolution of each agent's individual behavior [31,130], as well as knowledge of the monitoring systems' output performance.
Automated planning techniques are based on models of the system. Note that interaction models are not straightforward to obtain [31]. However, if enough data are available about the effects of the agents' actions, and if the performance of the monitoring systems is known, it is possible to explore planning models that could determine the mixed-initiative strategy. In Reference [116,117], automated planning models (e.g., POMDPs) were applied to trigger actions (e.g., role assignment, launch of implicit or explicit counter-measures) to maximize operation performance while respecting safety specifications. These works have demonstrated the value of long-term reasoning to mitigate performance decreases or critical situations. This paves the way for the integration of richer monitoring systems (e.g., physiological computing-based ones) into such mixed-initiative interaction driving models. Both requirements (physiological computing for monitoring systems and long-term model-based action planning) still need further development, in particular in ecological settings.
In this article, a non-exhaustive review of the mental states of interest for operator monitoring was given. It does not include work on affective states, which are, however, also relevant to characterize and enhance human-robot interaction. Affective computing is a well-developed field, and affective states can be estimated quite efficiently using machine learning tools on a variety of physiological markers [131]. Therefore, in addition to estimating states related to time-on-task, task demands, and system outputs, further research and development should also focus on incorporating affective computing pipelines into the system. As detailed by Pongsakornsathien and collaborators, research and engineering work also needs to focus on sensor fusion and sensor networks, by taking into account the specificity and minimum performance requirements of each sensor to increase the reliability and accuracy of mental state estimation [132].
In addition, this paper advocates the use of sensors and their specific pipelines for mental state estimation in HRI to enhance mission performance. However, plugging the human operator into the system in such a way raises social and ethical issues. Despite the importance of these aspects, they are out of the scope of this article, and readers should refer to studies that present formal methods for linking ethics and automated decision-making [23], that propose a user-centered method to design, develop, and test assistive robots [133], or that discuss the acceptability of wearable sensors [134].
Lastly, it should be noted that, although remote operation is a rising form of HRI whose applications in risky settings justify research and development to enhance both operation safety and performance by taking into account the state of the operator, local operation of robots, such as in Industry 4.0 or in the operating room, could also benefit from physiological computing and mixed-initiative interaction systems [135][136][137].

Conclusions
This perspective article was meant to provide the reader with a thorough understanding of a recent and growing field called physiological computing, with a focus on the benefits it could bring to human-robot interaction developments for remote operation. This review of the literature shows that there is a need for studies that use physiological data to infer operators' mental states online in order to adapt the interaction, particularly in the context of remote operation, and that use methods such as automated planning techniques in order to progress towards mixed-initiative architectures. In our view, such developments would provide safer and more efficient human-robot interaction systems, which would be an invaluable contribution for remote operation in risky settings.
Author Contributions: Conceptualization: R.N.R. and T.G. Writing: R.N.R., N.D. and C.P.C.C. Review and editing: T.G. and F.D. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.