A Robot Learning Method with Physiological Interface for Teleoperation Systems

: The human operator largely relies on the perception of remote environmental conditions to make timely and correct decisions in a prescribed task when the robot is teleoperated in a remote place. However, due to the unknown and dynamic working environments, the manipulator’s performance and efﬁciency of the human-robot interaction in the tasks may degrade signiﬁcantly. In this study, a novel method of human-centric interaction, through a physiological interface was presented to capture the information details of the remote operation environments. Simultaneously, in order to relieve workload of the human operator and to improve efﬁciency of the teleoperation system, an updated regression method was proposed to build up a nonlinear model of demonstration for the prescribed task. Considering that the demonstration data were of various lengths, dynamic time warping algorithm was employed ﬁrst to synchronize the data over time before proceeding with other steps. The novelty of this method lies in the fact that both the task-speciﬁc information and the muscle parameters from the human operator have been taken into account in a single task; therefore, a more natural and safer interaction between the human and the robot could be achieved. The feasibility of the proposed method was demonstrated by experimental results.


Introduction
With significant progress in computer science, information science, automation, and artificial intelligence techniques, telerobots have been extensively applied in areas as diverse as telemedicine [1,2], telerehabilitation [3], minimally invasive surgery [4], disaster rescue and relief operation [5], maintenance and exploration in deep sea or out space [6,7], surgeon training [8], and telemanufacture [9], etc. Telerobots provide an alternative interactive way between the human operator and the teleoperation in order to enhance perception and motion ability of the human beings [10,11]. It is the integration of human intelligence and the robot's advantages under the constraint of long distance [12,13]. The performance of teleoperation largely depends on the perception of remote environmental conditions.
During the recent years, many achievements indicated that some related algorithms such as impedance control, virtual fixture, and shared control, could further improve the performance of teleoperation. According to literature [14], a switched-impedance control algorithm was presented and implemented for teleoperated heart surgery and two switched reference impedance models were designed for the master and slave robots to achieve both the stabilities of motion and force feedback. Mojtaba et al. proposed a nonlinear adaptive impedance controller with two impedance models in Cartesian space for the master and slave robots based on the consideration of dynamic uncertainties. Besides, the parameters of the impedance model were adjusted according to the objectives and requirements of the applications. This proposed adaptive impedance control method has been successfully applied in certain medical applications [3]. Furthermore, according to literature [15], a force reflection framework based on gain force observer, was developed to acquire the force information. By considering internal uncertainties and time-varying delay of the system, a satisfactory performance of the system was guaranteed for the teleoperation system. Moreover, haptic assistance method involving shared control was a choice to enhance the performance of teleoperation [16][17][18]. In a previous study [16], a shared-based control method was proposed for a teleoperation system and implemented in haptic shared control (HRC) and state shared control (SSC). A decision-making model with impedance control algorithm was presented to improve the tracking capability in partially unstructured environments for a teleoperation system [18]. Virtual fixture-based control algorithms were proposed to improve the performance of robot teleoperation for a cooperative task [19,20]. In our previous study, a neural network (NN) control method with wave variable was proposed to cope with the problems of time delay and dynamics uncertainties for the teleoperation system [21]. According to literature [22], a variable gain control method with tremor attenuation was developed to adapt to the unstructured environments and to reduce the reliance on humans' operation skill. The above-mentioned algorithms could successfully deal with the problems of model uncertainty. However, they could not guarantee the improvement of efficiency of the prescribed task and release of the work pressure of the human operator.
In order to improve the efficiency of the teleoperation and to reduce the workload of the human operator, many researchers have focused on how to deliver humans' skills to the robots with a short processing time and few trials. The results indicated that robot learning method could significantly improve the efficiency of the teleoperation and reduce the operational pressure for the human operator. Schaal proposed a learning form demonstration (LfD) framework for the learning control method [23]. Lin et al. [24] proposed a remote lead through teaching (RLTT) with a dynamic time warping (DTW) method to learn the task knowledge from human demonstration [24]. Moreover, a hidden semi-Markov model (HSMM) was presented to learn a task model for the purpose of offering assistance for remote operation tasks [25]. A probabilistic method was developed to construct a task model in the assistive teleoperated operation. The performance of the teleoperation was improved by the method and demonstrated by remotely operated vehicle (ROV) tasks [26,27]. In order to simplify the complexity of the system and to cope up with the varying dynamical interaction, Huang et al. [28] developed a hierarchical interactive learning (HIL) algorithm with dynamic movement primitives (DMPs) and locally weighted regression (LWR) to learn the task trajectories for an exoskeleton system. Furthermore, researchers also employed the human-in-the-loop method with shared control and LWR for robot learning online in the non-trivial task [29,30]. Pervez et al. utilized Gaussian mixture model (GMM) with DMP model to encode the demonstrated trajectories in peg-in-hole tasks. The experimental results demonstrated that the developed method could cope up with the problem of teleoperation tasks with large variabilities [31]. Ravankar et al. [32] investigated the path smoothing algorithms for the robot navigation task. A novel human-exoskeleton interface was proposed [33] to demonstrate the performance of cable driven arm exoskeleton (CAREX) with a cable tension planner for smooth motion. The above-mentioned robot learning methods could successfully improve the work efficiency of the teleoperation by learning human skills. However, these algorithms rarely consider the relationship between remote environments and the human operator operational characteristic in the process of human-robot interaction (HRI).
In a specific task with HRI, a human operator can adjust his stiffness of upper limb to deal with external disturbance. For example, the human operator can increase his stiffness or force when the resistance force of external environment gets larger in a natural way. Inspired by this, we tend to transfer the human stiffness to the robot in a real-time way enhance its interactive abilities. To be specific, electromyography (EMG) signals represent the comprehensive impact of motor unit action potentials (MAUP) of the muscle fiber both in time and in space. The EMG signals have the following three applications: indicator of the muscle activation, representation of force generated by the human muscle, and a viewer of the fatigue for the human muscle. As an indicator of the muscle activation, the surface electromyography (sEMG) signal includes more information such as force and stiffness related to human motion control information. Therefore, we extract the stiffness from the sEMG signals and transfer it to the slave device of the teleoperation system. EMG-based technologies are widely used for the robotics. Meattini et al. [34] proposed an sEMG-based HRI interface with pattern recognition and factorization methods to provide a natural interaction between humans and robots. Furthermore, a dynamic switching method [35] was presented to predict human operator's behavior and to provide the most effective control according to the given context or the specific task in myoelectric training tool (MTT) platform. Antuvan et al. embedded a human as a part of the controller in the system to provide a myoelectric interface in different control tasks. The method was verified through the experiments involving the muscle activation of human upper limb [36]. According to literature [37], an HRI control interface was developed by integration of EMG signals and different functionally effective muscle (FEM) methods for a robotic wheelchair. A position-independent decoding movement intention method based on proportional myoelectric interface was proposed for the purpose of estimation of possibility of various arm positions [38]. It was concluded that the muscle activation could be used as an interaction tool to capture the information details of the remote operation environments. In our previous study [39], a hidden semi-Markov model with Gaussian mixture method was proposed to carry out the repetitive tasks with enhanced intelligence.
Inspired by the robot learning algorithms and EMG-based control method, in this study, a nonlinear regression model with physiological interface was constructed to relieve workload of the human operator to improve efficiency of the teleoperation system, and to capture the information details of the remote operation environments. First, the DTW method was employed to guarantee the demonstration data in the same time scale before proceeding with other steps in demonstration process (In our previous work [39], we did not consider the influence of the length of demonstration data). Secondly, a human-centric interaction method based on muscle activation was developed to collect the operation information and to actively capture the remote environment on-line. Besides, in order to improve the efficiency of the teleoperation system and to reduce human operator's workload in a natural way, the LWR method was proposed to model the teleoperated prescribed task based on the collected task trajectories and the human muscle stiffness. Finally, the feasibility and efficiency of the proposed method was verified by the experimental results.
The remaining part of this study is organized as follows. Section 2 presents equipment used for a MYO armband, Touch X and Baxter robot. In Section 3, the proposed algorithms including physiological interface design, task demonstrations, and learning method are described. Results are presented in Section 4. The conclusion and the future work are presented in Section 5. The main notations are presented in Table A1 in order to facilitate reading. Figure 1a shows that the MYO armband was used to detect raw sEMG signal, which consists of eight electrodes and nine-axis inertial measurement units (IMU). The MYO armband was produced by Thalmic Labs, Canada and it can be easily adorned in upper limbs of human with a default frequency of 200 Hz. Figure 1b demonstrates the detection of the sEMG signals using the MYO armband with eight channels.

Touch X
The Touch X serves as a master device to control remote slave device ( Figure 2a). The Touch X with six degrees of freedom (DoF) was manufactured by 3D Systems, Inc. and it could capture the position (three DoFs) and its orientations (another three DoFs) with force feedback. The Touch X is widely applied in various areas in terms of simulation, virtual assembly, robot control, etc. (The kinematics information of the Touch X with respect to its forward/inverse kinematics and Jacobian matrix have been deeply studied [40,41]). In this study, Touch X was controlled via the Matlab/Simulink [42].

Baxter Robot
Figure 2b exhibits that the Baxter robot (Rethink Robotics. made) was composed of two arms with seven DoFs, a torso and a head with a radar. Each robot arm was designed with eight links and seven rotational joints (The mechanical structure between Baxter and Touch X is different, therefore, a workspace matching method was developed in our previous studies [21,22]).   Figure 3 shows that the developed scheme consists of the following three modules: physiological interface module, demonstrations and robot learning module, and robot execution module.

Method
• Physiological interface module. Physiological interface module consists of sEMG signal processing unit and sEMG-signals-stiffness translation unit. In this module, raw sEMG signal is collected by the MYO armband. The human stiffness can be obtained through the sEMG-signals-stiffness unit.

•
Demonstrations and robot learning module. This module is the main part of the proposed frame. The data processing unit is used to process task trajectories from the remote Baxter robot.
The collected data contains task trajectories and human stiffness. The collected task trajectories and collected stiffness are ready for the demonstration and learning process. DTW unit is used to align the demonstration data in a united time scale. The robot learning model is mainly used to obtain a generative model according to the demonstrated information related to the collected task trajectories and collected muscle stiffness. • Robot execution module. This module is mainly used to enable the remote Baxter robot to execute the task according to the learned task trajectories and learned muscle stiffness.

Physiological Interface Design
In the teleoperation system, the human operator manipulates the robot remotely, and the slave movement follows that of the master synchronously. In general, the interaction information between the human operator and the remote environments can be reflected by variability of the muscle activation of the human operator in the process of manipulation.
In this study, the MYO armband with 200 Hz was used to collect the raw sEMG signal with eight channels [22]. According to the sampled sEMG signal, the sum of sEMG signal can be defined as follows: where u indicates the amplitude of sEMG signal andû i , i = 0, 1, . . . , 7 represents the raw signal. The obtained sEMG signal is influenced by noise, therefore, a moving average filter is presented as follows:ǔ where K is the moving window size.ǔ is the filtered sEMG signals through the moving average filter. Then, an envelope of sEMG signals can be obtained as follows: where W M−j indicates the weight parameter and M is an empirical parameter in the experiment. According to literature [43,44], the relationship between the sEMG signals and the indicator of the muscle activation can be presented as follows: where −3 ≤ β ≤ 0 in this study.
In order to achieve friendly and natural interaction between the human operator and the remote robot, muscle stiffness of human upper limbs is taken into consideration. To this end, muscle stiffness can be presented as a linear relationship according to literatures [22,39] whereā ∈ [ā min ,ā max ] indicate the parameters to guarantee stability of the slave. ρ ∈ [ρ min , ρ max ] represent the muscle activation.

Task Demonstrations
In the data collection stage, the demonstration data consist of several observations of the same task. After collecting demonstration data, a DTW method was proposed to synchronize the collected data.

Data Constitution
The proposed teleoperation system employed a heterogeneous master-slave structure, therefore we utilized motion of the slave as the data source for demonstrations. Owing to the workspace differences between the master and the slave, a workspace mapping between the master and the slave was carried out to ensure the accuracy of tracking.
The collected demonstrated information contains the position of the slave and human muscle stiffness. In the process of demonstration, the demonstration data can be organized as follows: where P = {P x , P y , P z } andā indicate the collected task trajectories and collected muscle stiffness, respectively.

Data Preprocessing
Lengths of the demonstration data differ in time frame, thus it is impossible to train a task model in teleoperation. Therefore, the first step of data preprocessing was to align the demonstration data with different data lengths (if the demonstration data are not aligned in a united time scale, the task model could change both in temporal states and spatial states).
The alignment of the demonstration data can be performed according to the entire obtained demonstration data in comparison with a reference demonstration data. In this study, the DTW method was employed to find the optimal alignment of the demonstration data in the time scale [45]. The introduced DTW method could deal with the problem of the spatial distortion for the demonstration data. By employing the DTW method, the demonstration data can be presented as where L(.) indicates the length of trajectory, and x 1 and x 2 are the two different time lengths of trajectories. D{L(x 1 ), L(x 2 )} is the wrap path distance (Figure 4). D opt {L(x 1 ), L(x 2 )} is the optimal wrap path distance.

Learning Algorithm
The demonstration data are high dimensional; therefore, it is difficult to find a function to describe the demonstration data globally. In this study, the locally weighted regression (LWR) algorithm was employed to explore the approximate function locally between the input and the output for the given aligned demonstration data.
According to the demonstration data T = {(x(i), y(i)), i = 1, 2, . . . , N} N i , the local relationship between the output y(i) and the input x(i) can be obtained as follows: where y(i) denote the output values of the ith local model, w(i) represent the weights of the y(i).
In the learning algorithm, input x i consists of the collected task trajectories P = {P x , P y , P z } and the collected stiffnessā. x i are four-dimensional vectors, y i are four-dimensional vectors.
Based on the receptive fields of the model, the weight w i can be computed by Gaussian forms as follows: where x is the fitting position. ϑ > 0 is used to adjust the Gaussian weight. ϑ = 2 is used in this study. Equation (9) concludes that the weight value becomes larger as the input x(i) is near to the fitting position x. For the given input data x(i), the log likelihood for the probability p(y(i)|x(i)) between y(i) and x(i) can be presented as where φ is parameter vector of the LWR method. With Inspired by literature reports [46,47], the maximum of (φ) by minimizing the H(φ) can be obtained as follows: On minimizing the H(φ), we have By using partial derivatives notion for Equation (13), we obtain To set Equation (14) to be 0, a diagonal weight matrix W can be represented as follows: Based on Equations (14) and (15), the nonlinear model can be obtained as with After the task is learned by using the LWR method, the slave robot was operated according to the learned task trajectories and learned human stiffness.

Experimental Setup
In order to verify the performance of the proposed algorithm, a practical sweep task was performed in this study. Figure 5 demonstrates the construction of the experimental platform as follows: • Hardware equipment. The experiment hardware equipment consist of the Touch X, the Baxter robot, and the MYO armband with eight channels. The left panel of Figure 5, exhibits that a red garbage bucket and a yellow tapeline (as garbage) are placed on the testbed. A paint brush is installed in the right arm of the Baxter robot. In the sweep task, a paint brush was used as a sweeping tool, and a yellow tapeline as garbage. Touch X was performed by the human operator to telecontrol the Baxter robot to sweep the garbage into the red bucket. By performing the prescribed sweep task, the task trajectories and human stiffness can be learned by using the proposed method. The slave robot was manipulated according to the learned task model. Figure 6 exhibits the experimental process of the proposed method. Figure 6 (left) shows the demonstration and learning processes. The human operator directly telecontrols the slave. The proposed approach was trained by collected prescribed task trajectories and human stiffness. In the robot execution process, the slave executed the task based on the learned trajectories and learned human stiffness.

Results
The human operator performed the sweep task three times. By collecting data in the process of demonstration, the demonstration data are presented in Figure 7a-c. Clearly, the completion time of the sweep task was inconsistent for each demonstration. Accordingly, the demonstrated trajectory curves were mismatched in the time scale for demonstrations.   The DTW algorithm was employed to normalize the demonstration data in the same time scale. Figure 8a-d, show that the demonstration data retain its original characteristics in the same time domain after the DTW algorithm preprocessing. Compared to the results shown in Figure 7a-c, the preprocessing results guaranteed the uniformity of the demonstrations and make preparations for the next steps of robot learning task. In Figure 8d, values of demonstrated muscle stiffness in starting point were 0.4060, 0.1340 and 0.0110.     Based on the learned model by using the LWR algorithm, the Baxter robot can execute the sweep task automatically with the learned human muscle stiffness. The sweep task fully indicates the dexterity of human arms in the process of operation. In this experiment, human muscle stiffness was employed based on muscle activation to reflect the interaction information in demonstration phase. Figures 10I-VI show the process of robot execution for the sweep task. By employing the proposed LWR algorithm with physiological interface, the sweep task was performed successfully within 0-10 s.     In order to validate the performance of the learning model, we evaluate average value of robot in the process of robot execution in Table 1. It can be seen that the average error for learned trajectories in x-y-z directions and learned stiffness were 0.0130, 0.0018, 0.0346 and 0.0124, respectively.

Discussion
First of all, the proposed LWR method constructs a nonlinear model for a specific task. Compared to traditional robot learning methods [23,24], the proposed method combines the task trajectories with the muscle stiffness during the demonstration process.
Moreover, the human operator can regulate the muscle stiffness according to the remote environments in the HRI. That is to say, the variability of muscle stiffness represents the operational characteristics of human operator. Therefore, a model to describe the relationship between the prescribed task, task trajectories, and the muscle stiffness are very useful in the learning process. Thus, the learned model can be used to improve the efficiency of the reproduction of the repetitive task. In the proposed method, we transmitted the learned model, i.e., learned trajectories and learned human stiffness, to the controller of the slave robot of the teleoperation system after robot learning. Then, the slave robot can be executed according to the learned model.
Noteworthy, the experimental results in this work are preliminary and the experiments were conducted mainly to show the feasibility and effectiveness of the proposed method. The stiffness of the human operator varies from person to person. Moreover, even for the same person, the stiffness property at different time could be different (time-varying). Therefore, it is hard to find a baseline to compare the results under "with stiffness" and "without stiffness" conditions. Different people may have different stiffness, and thus may show different motion model. Therefore, it is difficult to define fair comparison criteria for different/multiple experiment subjects. There is no problem foreseen to generalize the proposed method to different subjects as the implementation procedures will not change and each step is not affected by the specific human subject. However, surely the obtained specific muscle activation/stiffness pattern(s) would be different from person to person.
This work makes different contribution from our previous work [39,48], wherein a hidden semi-Markov model with Gaussian mixture method was proposed to carry out the repetitive tasks with enhanced intelligence. In this work, a locally weighted regression method with DTW was proposed for a prescribed task. Moreover, in our recent study, it was found that the length of demonstration data could significantly impact the performance of the teleoperation. To handle this problem, we employed the DTW method to deal with the issue related to different lengths. However, in our previous work [39], the influence of the length of demonstration data was not taken into account.

Conclusions and Future Work
In this study, a robot learning method with physiological interface was proposed for the teleoperated sweep task. In order to capture the information details of the remote operation environments in HRI, a novel method of human-centric interaction with human muscle stiffness was first developed. On the one hand, the sEMG signal of the human operator was used to capture the human's muscle stiffness. On the other hand, the collected muscle stiffness represented the human operator operational characteristic in the process of manipulation. Integration of the DTW method and the LWR method enabled the robots to learn the task trajectories and human muscle stiffness in the same time scale after human demonstrations in teleoperation system. The remote robot can execute the task according to the learned task trajectories and learned stiffness. Finally, the effectiveness of the proposed method was demonstrated by the experimental results.
Undeniably, a lot more systematic explorations are demanded to introduce the vision information and rotation information to the perception system of the teleoperation system which will be pursued in future. Besides, the relationship between shared control and the human stiffness in fatigue status should be comprehensively analyzed. Moreover, the shared control strategy with respect to robot learning method is an interesting research topic in the teleoperation system. The level of autonomy between direct teleoperation and robot-learning-based automation is worth to exploit to explain the cooperative strategies for HRI. Moreover, the smoothness of task trajectories and safe trajectory generation with respect to the muscle stiffness are very important aspects for the complex task. In order to obtain the model with smoothness of task trajectories in-depth exploration of the intelligent segmentation algorithm to divide the complicated task into certain subtasks which can be approximated explicitly by locally smooth mathematical functions/polynomial will be carried out in future.
Author Contributions: J.L. conceived the method, performed the experiments and wrote the paper; C.Y. conceived the method and helped to improve it; H.S. and C.L. further helped to improve it. All authors have read and approved the manuscript.
Funding: This work was partially supported by Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/S001913 .

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In order to facilitate reading, the main notations are presented in Table A1. x(i) Input with related to LWR.

y(i)
Output with related to LWR.

w(i)
Weights of the y(i).
(φ) Log likelihood for the probability with related to LWR.
W Diagonal weight matrix of y(i).
φ Nonlinear model with related to LWR.