Probabilistic Sensitivity Amplification Control for Lower Extremity Exoskeleton

To achieve ideal force control of a functional autonomous exoskeleton, sensitivity amplification control is widely used in human strength augmentation applications. The original sensitivity amplification control aims to increase the sensitivity of the closed-loop control system through positive feedback, without any sensors between the pilot and the exoskeleton; the measurement system can therefore be greatly simplified. Nevertheless, the controller lacks the ability to reject disturbances and has little robustness to parameter variations, so a relatively precise dynamic model of the exoskeleton system is required. Moreover, the human-robot interaction (HRI) cannot be interpreted quantitatively as merely a particular part of the driven torque. Therefore, a novel control methodology, termed probabilistic sensitivity amplification control, is presented in this paper. The innovation of the proposed control algorithm is two-fold: distributed hidden-state identification based on sensor observations and evolving learning of the sensitivity factors to cope with the variational HRI. We verify the feasibility of probabilistic sensitivity amplification control against other state-of-the-art algorithms with several experiments, i.e., distributed identification model learning and walking with a human subject. The experimental results show promising application feasibility.


Introduction
In recent decades, with the dramatic progress in computing, control and sensing, many types of prostheses, orthoses and exoskeletons have been designed and developed to assist and support human limbs in various tasks. The exoskeleton is a typical class of wearable mechatronic robotic systems designed to increase the operator's strength and endurance. A particular feature of this kind of exoskeleton is that it is fully driven by the pilot's intention and follows the pilot's movements in synchrony. Potential applications include functional rehabilitation for the injured, physical assistance for the disabled and elderly and fatigue relief for heavy-duty workers. To recognize human intention, two main groups of sensors, i.e., cognitive-based sensors and physical-based sensors, are usually applied to capture the human interaction information or the muscle signals.
By measuring the electric signals from the musculoskeletal system or the nervous system, the operator's intention read by the cognitive-based sensors can be interpreted as the input to the exoskeleton system controller. A successful practical application of these kinds of exoskeletons is

Figure 1. Proposed control framework. In the feedforward control loop, the sensitivity amplification control (SAC) controller with the evolving sensitivity factor (black dashed arrow) provides the necessary driven torque according to the variational HRI (human-robot interaction, given in double blue arrows). Moreover, to improve the control performance, the hidden states are identified in the feedback control loop according to the observation information. Note that the red arrows stand for input and output, while the black arrows present the signal direction.

Distributed Hidden-State Identification
For realizing efficient control of the exoskeleton, it is essential to understand the human intention. According to our control scheme, the intention can be interpreted as the system state variables. Although tremendous attention has been paid to filtering and smoothing [16][17][18], these algorithms focus on inference rather than on the identification process. In addition, the Gaussian process (GP) state-space model presented in [19] does not address the crucial issue of computational expense, which we discuss carefully in this section. A graphic explanation of the crucial difference between GP state-space inference [17][18][19] and our proposed distributed hidden-state identification is shown in Figure 2.

Figure 2. Graphic explanation of the Gaussian process (GP) state-space inference and distributed hidden-state identification. Derived from the distributed GP framework, the hidden state variables (embedded in the gray circles) are identified by the dynamic models h_i(·) and the measurement models g_i(·) separately. The connections are presented as red arrows, and the influence of the input on the control plant is shown as blue arrows. Note that the hyper-parameters φ_h, σ_h related to the dynamic models are shared in the training process (green arrows), along with the hyper-parameters φ_g, σ_g for the measurement models. (a) GP state-space inference; (b) proposed distributed hidden-state identification.
In this section, we first give a comprehensive explanation of the distributed model learning for the identification process based on expectation maximization. Then, we introduce online inference under a distributed framework with Markov chain Monte Carlo, along with the fusion method, the generalized product-of-experts (gPoE).

Model Learning
Although the Gaussian process has had a substantial impact in many fields, its crucial limit is that the computational expense scales as O(N^3) for training, while prediction costs O(N^2) if the kernel matrix is cached. Thus, we employ the distributed Gaussian process [20] to address this issue, since this method can handle arbitrarily large datasets through hierarchical product-of-experts (PoE) models.
The Gaussian process, fully specified by a mean function µ_f and a covariance function k_f, is a probability distribution over functions f. The latent function f can be learned from a training dataset {X, z}, where X = [x_1, ..., x_n], z = [z_1, ..., z_n] and z_i = f(x_i) + ω, ω ∼ N(0, σ_ω²). The posterior predictive distribution of the function value f_* = f(x_*), for a given set of hyper-parameters θ, is defined by its mean and covariance, respectively:

µ(x_*) = k_*^T (K + σ_ω² I)^{−1} z,
σ²(x_*) = k_** − k_*^T (K + σ_ω² I)^{−1} k_*,

with x_* the test input, K = k(X, X), k_* = k(X, x_*) and k_** = k(x_*, x_*). Note that the hyper-parameters θ of the covariance function consist of the length-scale l, the signal variance σ_f² and the noise variance σ_ω². These hyper-parameters are of great significance, since they define the behavior of the non-parametric model [21].
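As a minimal sketch of the posterior predictive computation above (using the squared-exponential covariance; the function names and default hyper-parameter values are illustrative, not the paper's):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance k_f with length-scale l and signal variance."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, z, x_star, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior predictive mean and covariance of f(x_star) given {X, z}."""
    K = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    k_star = rbf_kernel(X, x_star, length_scale, signal_var)    # k(X, x*)
    k_ss = rbf_kernel(x_star, x_star, length_scale, signal_var) # k(x*, x*)
    L = np.linalg.cholesky(K)                                   # stable solve of K^{-1}
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    v = np.linalg.solve(L, k_star)
    mean = k_star.T @ alpha                                     # k_*^T (K + s^2 I)^{-1} z
    cov = k_ss - v.T @ v                                        # k_** - k_*^T (K + s^2 I)^{-1} k_*
    return mean, cov
```

The Cholesky factorization avoids forming the explicit inverse, which is the usual numerically stable route for the O(N^3) training cost mentioned above.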
According to the independence assumption, the marginal likelihood p(y|X, θ) can be decomposed into M individual factors:

p(y|X, θ) ≈ ∏_{m=1}^{M} p(y^{(m)}|X^{(m)}, θ),

with M the total number of experts and {X^{(m)}, y^{(m)}} the data assigned to the m-th expert. In the model learning process, the hyper-parameters θ are shared by all M GPs. A usual method for optimizing the hyper-parameters is evidence maximization; nevertheless, we apply a more efficient method in this paper, i.e., expectation maximization with Monte Carlo sampling.
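The factorization can be sketched as follows (an illustration with assumed hyper-parameter defaults, not the paper's implementation): splitting the training data over M experts that share one set of hyper-parameters replaces one O(N^3) evidence computation with M computations of size N/M.

```python
import numpy as np

def log_evidence(X, z, length_scale=1.0, signal_var=1.0, noise_var=0.1):
    """log N(z; 0, K + sigma_w^2 I) for a single GP expert."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = signal_var * np.exp(-0.5 * d2 / length_scale ** 2) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, z))
    # -0.5 z^T K^{-1} z - 0.5 log|K| - (n/2) log 2*pi
    return -0.5 * z @ a - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def distributed_log_evidence(X, z, M, **hyp):
    """Factorized approximation: sum the log-evidences of M experts that share
    the same hyper-parameters.  Cost drops from O(N^3) to O(M * (N/M)^3)."""
    parts = zip(np.array_split(X, M), np.array_split(z, M))
    return sum(log_evidence(Xm, zm, **hyp) for Xm, zm in parts)
```

With M = 1 the factorized evidence reduces exactly to the full GP evidence, which is a convenient sanity check for an implementation.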
Therefore, the general class of latent state-space models with Gaussian processes is given below:

x_k = h(x_{k−1}, a_{k−1}) + ε_k, ε_k ∼ N(0, Σ_ε),
z_k = g(x_k) + ν_k, ν_k ∼ N(0, Σ_ν),

with k = 1, ..., T. Here, x ∈ R^D is a latent state that evolves over time, while z ∈ R^D can be read from actual measurement data. Both the additive system noise and the measurement noise are assumed i.i.d. If not stated otherwise, both the measurement function g and the dynamic function h are implicit and are instead described by probability distributions over functions, i.e., Gaussian process non-parametric models h ∼ GP_h and g ∼ GP_g, respectively.
Limited by the space of this paper, we briefly explain the learning process of the dynamic function h ∼ GP_h. In the expectation step, we compute the posterior distribution of the system states q(X) = p(X^{(m)}|y^{(m)}, θ), where X^{(m)} = (x_{k−1}, a_{k−1}) and y^{(m)} = x_k. Nevertheless, this posterior distribution is not Gaussian and hence is not easy to obtain analytically. We therefore draw samples from the posterior distribution, and the marginal likelihood is approximated by Monte Carlo integration, as shown in Figure 3. Consequently, in the maximization step, the hyper-parameters are obtained by maximizing the Monte Carlo estimate of E_q[log p(y^{(m)}, X^{(m)}|θ)] with stochastic conjugate gradient optimization. The whole learning routine is given in Algorithm 1. Note that although the learning procedure is time-consuming, this is not a problem, since the training of the hyper-parameters is performed off-line.
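The Monte Carlo integration step can be illustrated on a toy latent-variable model (a stand-in for the GP case, where the integral over the hidden states is likewise intractable; the prior x ∼ N(0, 1) and likelihood y|x ∼ N(θx, 0.1) are purely illustrative assumptions):

```python
import numpy as np

def mc_marginal_loglik(y, theta, S=20000, rng=None):
    """Monte Carlo estimate of log p(y | theta) for a toy model:
    latent x ~ N(0, 1), observation y | x ~ N(theta * x, 0.1).
    p(y | theta) = integral p(y | x, theta) p(x) dx, approximated by
    averaging the likelihood over S samples of the latent variable."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(S)  # samples of the latent variable
    lik = np.exp(-0.5 * (y - theta * x) ** 2 / 0.1) / np.sqrt(2 * np.pi * 0.1)
    return np.log(np.mean(lik) + 1e-300)
```

For this toy model the marginal is available in closed form (y ∼ N(0, θ² + 0.1)), so the sampling estimate can be checked directly; in the GP state-space case no such closed form exists, which is exactly why sampling is needed.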

Distributed Identification Process
To identify the latent state x_k, we have to compute the posterior distribution p(x_k|z_{1:k}). Note that even though the system states can be observed, the measurement data are not accurate owing to the measurement noise. Thus, the system state x_k is treated as a hidden variable that should be identified. Moreover, the state at time step k depends only on the state at time step k − 1 and the observation at time step k, which is hence a typical hidden Markov model. According to the previous measurements and the current time step prediction, Bayes' law yields:

p(x_k|z_{1:k}) ∝ p(z_k|x_k) p(x_k|z_{1:k−1}),

where the one-step-ahead prediction p(x_k|z_{1:k−1}) serves as the prior of the posterior distribution and the likelihood p(z_k|x_k) is determined by the measurements. In addition, to improve the identification process, we reuse the observation data, i.e., current, previous and future: for the recursive process, the smoothed hidden states γ(x_{k−1}) are computed backwards from k = T to 2. In light of the work in [22], the updated posterior distribution of the latent state for the m-th (m = 1, ..., M) GP can be obtained. Unlike [22], to infer a hidden state from a corresponding observation input, we have to fuse the predictions of all M GPs. Therefore, in this paper, we compare four fusion algorithms, i.e., the product of experts (PoE) [23], the generalized product of experts (gPoE) [24], the Bayesian committee machine (BCM) [25] and the robust Bayesian committee machine (rBCM) [20]. The only difference between the gPoE and the PoE lies in the precisions, since the importance weights are set to β_m = 1/M to avoid an unreasonable error bar. The improvement made in the BCM is that it incorporates the GP prior explicitly when predicting the joint distribution; hence, the prior precision is added to the posterior predictive distribution. The rBCM adds the flexibility of the gPoE to the BCM to enhance predictive performance.
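A minimal sketch of the gPoE fusion rule for M univariate Gaussian predictions, assuming the equal weights β_m = 1/M described above (the function name is ours, not the paper's):

```python
import numpy as np

def gpoe_fuse(means, variances):
    """Generalized product-of-experts fusion of M Gaussian predictions.
    The fused precision is sum_m beta_m / var_m with beta_m = 1/M, which keeps
    the error bar reasonable when experts disagree (plain PoE uses beta_m = 1,
    so its precision grows with M and becomes overconfident)."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    beta = np.full(len(means), 1.0 / len(means))
    prec = np.sum(beta / variances)          # fused precision
    var = 1.0 / prec
    mean = var * np.sum(beta * means / variances)
    return mean, var
```

Note that with identical experts the fused variance equals each expert's variance, whereas plain PoE would shrink it by a factor of M, which is exactly the "unreasonable error bar" the weights are meant to avoid.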
Moreover, we conduct an experiment on a one-dimensional problem [26]. Figure 4 shows the performance comparison of the four distributed identification processes (PoE, gPoE, BCM, rBCM) and the Rauch-Tung-Striebel smoother for GP dynamic systems (GP-RTSS) [18], which is implemented under the full GP scheme. The PoE and BCM distributed schemes overestimate the precisions in the regime of the series of hidden states, which leads to overconfident predictions compared with GP-RTSS. The prediction of the rBCM, in contrast, is too conservative, although the rBCM aims to provide a more robust solution. The gPoE distributed scheme provides a more reasonable identification of the hidden states than any other model in Figure 4. Moreover, Figure 5 shows two aspects of the distributed identification framework with different numbers of GP experts: (1) a comparison of approximation quality; (2) computational expense.
According to the information in Figure 5, for a given set of training data, the computation time decreases significantly as the number of GP experts increases, reaching a minimum at five GP experts. Interestingly, there is no significant difference (root-mean-square error (RMSE): 0.01; negative log-likelihood (NLL): 0.2) between the identification framework with one GP expert (full GP) and that with more than one expert. Furthermore, note that a better approximation quality can be expected with ten GP experts according to the RMSE minimum, or with eight GP experts according to the NLL criterion. Thus, although we do not argue that our proposed hidden-state identification scheme outperforms the full GP identification algorithm in terms of precision, the computational expense is dramatically reduced. The overall identification procedure is summarized in Algorithm 2.

Evolving Learning of the Sensitivity Factor
The sensitivity amplification control proposed in [9] aims to improve the control performance without directly measuring the interaction with the pilot. A sensitivity factor α is inserted into the control law derived from the dynamic equation M(q)q̈ + C(q, q̇)q̇ + P(q) = T + d:

T = (1 − α^{−1})(M̂(q)q̈ + Ĉ(q, q̇)q̇ + P̂(q)),

so that the interaction between the pilot and the exoskeleton becomes:

d = α^{−1}(M(q)q̈ + C(q, q̇)q̇ + P(q)).

With perfect estimation M̂ = M, Ĉ = C and P̂ = P and a sufficiently large α, the interaction d approaches zero. Nevertheless, the sensitivity factor should not be chosen arbitrarily large, since the robustness weakens as α grows. Moreover, in an actual situation, the interaction varies owing to stochastic environmental factors; therefore, choosing a fixed sensitivity factor is not a good option. An improvement has been made in [27], where the sensitivity factor is learned with Q-learning. Compared with the classic SAC, the interaction force is reduced. However, that learning algorithm mainly copes with a discrete region of interest. In addition, the optimization over the constrained domain of the sensitivity factor is not addressed in the reference.
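As a minimal single-joint sketch of the SAC law, assuming scalar dynamics M·q̈ + C·q̇ + P = T + d (the full controller uses the multi-joint model, and the function name is ours):

```python
def sac_torque(ddq, dq, M_hat, C_hat, P_hat, alpha):
    """Sensitivity amplification control torque (single-joint sketch).
    Assumes dynamics M*ddq + C*dq + P = T + d.  With perfect model estimates
    the residual human-robot interaction is d = (M*ddq + C*dq + P) / alpha,
    so a larger sensitivity factor alpha leaves less torque for the pilot."""
    return (1.0 - 1.0 / alpha) * (M_hat * ddq + C_hat * dq + P_hat)
```

The sketch makes the trade-off concrete: as alpha grows the actuator supplies nearly the whole inverse-dynamics torque, but any model error in M_hat, C_hat or P_hat is amplified in the same proportion, which is the robustness concern discussed above.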
In this paper, the sensitivity factor evolves in real time over a continuous constrained domain. By applying the (1 + 1) covariance matrix adaptation evolution strategy ((1 + 1)-CMA-ES) [28], the sensitivity factor seeks its optimal value by minimizing a tracking-error penalty.

(1 + 1)-CMA-ES
Derived from the CMA-ES, a crucial feature of the (1 + 1)-CMA-ES is that it operates directly on the Cholesky decomposition and is about 1.5-times faster than the classic CMA-ES. The (1 + 1)-CMA-ES adapts not only the entire covariance matrix of its offspring candidates, but also the global step size. Since the covariance matrix C is positive definite, it admits the Cholesky decomposition C = AA^T, with A an n × n matrix. Thus, the offspring candidate y is written:

y = x + σAz, (15)

where σ is the step size, x is the corresponding parental candidate and z ∈ R^n is drawn from a standard normal distribution. In addition, the adaptation strategy maintains a search path in the form of an exponentially fading record s ∈ R^n. After a successful offspring candidate, the covariance matrix is updated:

C ← (1 − c_cov^+)C + c_cov^+ ss^T, (16)

with c_cov^+ ∈ (0, 1). Rather than recomputing the Cholesky decomposition in every loop, the matrix A can be updated directly:

A ← sqrt(1 − c_cov^+) A + (sqrt(1 − c_cov^+)/‖w‖²)(sqrt(1 + (c_cov^+/(1 − c_cov^+))‖w‖²) − 1) s w^T, (17)

with w = A^{−1}s. To enhance the learning efficiency, the algorithm should also learn from failure cases: if the offspring candidate is extremely unsuccessful, an analogous covariance matrix update with a coefficient c_cov^− ∈ (0, 1) subtracts the unsuccessful search direction, and Equation (19) gives the equivalent update of the matrix A. The graphic interpretation of the CMA-ES is shown in Figure 6. More specifically, the contributions of the samples are ranked according to the corresponding penalties, which will be explained in the next subsection. Not all samples, but only several high-contribution points are used during the update procedure. Thus, the new distribution evolves from the distribution of the previous step with an adaptive step size.
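A simplified, self-contained sketch of the strategy follows. For clarity it rebuilds the Cholesky factor from C each iteration rather than using the direct update of Equation (17), uses a smoothed 1/5th-success rule for the global step size, and omits the failure and constraint updates; the constants are illustrative, not the paper's.

```python
import numpy as np

def one_plus_one_cma_es(f, x0, sigma=1.0, iters=400, seed=0):
    """Simplified (1+1)-CMA-ES sketch: one parent, one offspring per iteration,
    rank-one covariance update along the fading search path s, and
    1/5th-success-rule step-size control."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    fx = f(x)
    n = len(x)
    C = np.eye(n)                        # covariance matrix, C = A A^T
    s = np.zeros(n)                      # exponentially fading search path
    c_cov, c_path = 0.2, 0.3             # illustrative learning rates
    p_target, damp = 0.2, 1.0 + n / 2.0  # target success rate and damping
    for _ in range(iters):
        A = np.linalg.cholesky(C)
        z = rng.standard_normal(n)
        y = x + sigma * A @ z            # offspring candidate (Equation (15))
        fy = f(y)
        success = fy < fx
        # global step size: grow on success, shrink on failure
        sigma *= np.exp((float(success) - p_target) / damp)
        if success:
            x, fx = y, fy
            s = (1 - c_path) * s + np.sqrt(c_path * (2 - c_path)) * (A @ z)
            C = (1 - c_cov) * C + c_cov * np.outer(s, s)  # success update
    return x, fx
```

On a convex quadratic this sketch converges quickly; the direct factor update of Equation (17) would remove the O(n^3) Cholesky recomputation inside the loop, which is the efficiency argument made above.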

Constraint Optimization
Since the sensitivity factor α^{−1} ∈ (0, 1) is constrained regarding the optimization domain, the variance of the offspring distribution is reduced along the constraint normal directions in the vicinity of the corresponding parental candidate. For the constraints v_j, j = 1, ..., m, an exponentially fading vector is defined following [29] and updated whenever the optimization is restricted by the boundary:

v_j ← (1 − c_c)v_j + c_c Az, (20)

where c_c ∈ (0, 1) is a parameter that determines the convergence speed. Equation (20) can be seen as a low-pass filter that accumulates the components of the infeasible steps Az along the constraint normal. In these situations, the Cholesky factor A should also be updated to respect the constrained boundary:

A ← A − (β/m) Σ_j Γ_{g_j(y)} (v_j w_j^T)/(w_j^T w_j), (21)

with w_j = A^{−1}v_j and Γ_{g_j(y)} = 1 if g_j(y) > 0, and zero otherwise. The update size is controlled by the parameter β. Moreover, in contrast to Equations (17) and (19), the Cholesky factor A is updated without a scalar greater than one, so that the update step size decreases as the search approaches the boundary of the constraints.
The procedure for generating offspring candidates is explained in Figure 7. In the direction of the normal vector, the variance of the offspring is reduced, as shown in Figure 7a. After the operation, the evolved variance is tangential to the constraint boundary in Figure 7b. Thus, the offspring candidates can be sampled from the evolved Cholesky factor A. Note that, in Figure 7c, after the constraint optimization, the constraint angle has increased owing to the change of the offspring distribution.
In terms of the sequence of hidden states (q_i^des, q̇_i^des, q̈_i^des), as well as the actual data, the penalty function for updating the sensitivity factor is given in Equation (22), where q̇_i and q_i are the joint velocity and angle, respectively. Note that, unlike the control algorithm in [27], where the penalty function is built from the trajectory data of the previous loop, the desired control input (q_i^des, q̇_i^des, q̈_i^des) is computed and predicted by the identification process; therefore, our scheme is more suitable for improving real-time control performance. The first term in Equation (22) penalizes high jerk, while the last two terms penalize the tracking error. Intuitively, with less interaction force between the human and the exoskeleton, the pilot feels more comfortable; from this point of view, penalizing the last two terms may enhance the tracking performance. The main routine of the evolving learning of the sensitivity factor is presented in Algorithm 3.

Algorithm 3. Evolving learning of the sensitivity factor.
repeat
  Generate an offspring sample according to Equation (15);
  if g(y) > 0 then
    v_j is updated according to Equation (20);
    the Cholesky factor A is updated according to Equation (21);
  else
    the success probability and the global step size are estimated;
    if the offspring is successful then
      the Cholesky factor A is updated according to Equation (17);
    else if the penalty is inferior to that of its fifth-order ancestor then
      the Cholesky factor A is updated according to Equation (19);
    end if
  end if
until f(α) converges
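Since Equation (22) is not reproduced here, the following is only an assumed form of the penalty consistent with the description above (jerk term plus velocity and position tracking errors); the weights and the finite-difference jerk approximation are illustrative, not the paper's values.

```python
def tracking_penalty(q_des, dq_des, q, dq, ddq_prev, ddq, dt, w=(0.1, 1.0, 1.0)):
    """Per-joint penalty for evolving the sensitivity factor (assumed form of
    Equation (22)).  The first term punishes high jerk, approximated by a
    finite difference of the acceleration; the last two terms punish velocity
    and position tracking error against the identified desired trajectory."""
    jerk = (ddq - ddq_prev) / dt
    return (w[0] * jerk ** 2
            + w[1] * (dq_des - dq) ** 2
            + w[2] * (q_des - q) ** 2)
```

A perfectly tracked trajectory with constant acceleration yields zero penalty, so the evolution strategy is rewarded only for reducing jerk and tracking error.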

Experiment and Discussion
Although the overall algorithm has been introduced in the previous parts of this paper, its control performance remains to be verified. Consequently, in this section, we first briefly explain our exoskeleton platform and the distributed identification model learning; then, we compare several state-of-the-art control methods (SAC, SAC + Q-learning) with our proposed control framework.

Experiment Setting and Model Learning
The developed single-leg exoskeleton platform is shown in Figure 8; it is a robust and ergonomic device for endurance enhancement and strength augmentation. The necessary auxiliary facilities are an embedded industrial personal computer (IPC), a programmable multi-axis controller (PMAC), Copley actuators and a power supply unit. The power module provides three voltages, i.e., 5 V, 12 V and 24 V. The observation data, derived from an inertial measurement unit (IMU) and magnetic encoders, are collected by the PMAC. The high-level control scheme is computed on the embedded PC, and the driving commands are sent to the actuators through the Copley driver. For more details, we refer to our previous work in [30].

Figure 8. Exoskeleton robotic platform. The safety of our pilot is addressed from two aspects, i.e., hardware precautions, as well as programming. Accordingly, motion beyond the permitted range is prevented by mechanical limits, and the system is shut down if an error occurs.
Before the implementation of the experiments, the dynamic and measurement models should be learned off-line. To learn the dynamic and measurement models, the necessary components are the inputs, as well as the observations. For the dynamic models, the training inputs are given as a series of tuples (x_k, a_k) with k ∈ [1, ..., n], and the next-step states x_{k+1} serve as the training outputs. The input signal a < 40 Nm is selected randomly and is constrained for safety. In addition, for the measurement models, the training inputs are defined as the system states x_k, given by the angles θ_1, θ_2 and the corresponding angular velocities θ̇_1, θ̇_2, for the hip joint and the knee joint, respectively. To observe the system states, we apply the encoders and the IMUs to measure the joint angles and the angular velocities separately; thereby, the training outputs z_k of the measurement models are defined.
The learning result of the dynamic model of the hip joint is presented in Figure 9. Learning the hyper-parameters takes each GP about 30-40 trials, and the initial value of each run may differ owing to the random initialization. Moreover, to demonstrate the feasibility, we test the learned model on 600 unused data pairs, as shown in Figure 10. The error of the identification result compared with the system value is about 0.01 rad, as given in the lower part of Figure 10.

Flat Walking Experiment
In the experiment, the pilot is asked to swing his left leg naturally, as shown in Figure 11. The control algorithms compared in the experiment are SAC [10], SAC + Q-learning [27], SAC + CMA-ES and our proposed algorithm.
The tracking performances of the knee joint and the hip joint are given in Figure 12a,b. Since the hidden states θ_1, θ_2 are identified, the single exoskeleton leg can follow the human motion in real time. Furthermore, owing to the CMA-ES algorithm, the evolution of the sensitivity factor α for the hip joint is shown in Figure 12c, along with the evolution for the knee joint in Figure 12d. To be more specific, the updated distribution is derived from the contributions of 15 samples selected from 20 candidates. The lower bound and the upper bound are set to three and 10, respectively, according to experimental experience. From Figure 12c, the sensitivity factor of the hip joint increases as the pilot puts down his leg, which means that the HRI force decreases and the exoskeleton does not demand much additional torque from the pilot. Consequently, setting a fixed sensitivity factor is not a good option. Moreover, compared with the hip-joint sensitivity factor, the knee-joint sensitivity factor stays lower, fluctuating around four. The reason for this is that the interaction between the thigh and the exoskeleton is more compliant than the interaction between the shank and the exoskeleton; therefore, the sensitivity of the hip joint is better than that of the knee joint. To assess the control performance of our proposed algorithm, we compare the HRI forces of the four algorithms, as shown in Figure 13. Since there is no sensor mounted between the pilot and the exoskeleton, the HRI force is estimated by the disturbance observer. Intuitively, the pilot feels more comfortable with less interaction force in the walking pattern; therefore, the control performance can be evaluated from the interaction force. Ideally, the interaction force would be zero, which means that the exoskeleton follows the human motion perfectly.
Thus, to quantitatively evaluate the four control schemes, we compute the RMSE of the HRI force of each joint. As shown in Table 1, compared with the SAC control scheme, both SAC + Q-learning and SAC + CMA-ES achieve a lower RMSE in both joints, since both algorithms learn an adaptive sensitivity factor. The difference between SAC + Q-learning and SAC + CMA-ES is small, with SAC + CMA-ES performing slightly better. Moreover, a significant reduction of the RMSE can be seen in Table 1 for our proposed algorithm, which is nearly half that of the other control schemes.

Figure 13. HRI force of the compared control schemes: (c) SAC + CMA-ES; (d) proposed algorithm. Note that the HRI force is estimated by the disturbance observer, since there is no sensor mounted between the human and the exoskeleton.

Conclusions
In this paper, a novel control framework has been proposed to address the crucial issues of SAC, i.e., weak robustness, inability to reject disturbances and model uncertainties. The proposed control methodology, termed probabilistic sensitivity amplification control, is two-fold: distributed hidden-state identification and evolving learning of the sensitivity factors. The identification aims to enhance the robustness and to reject undesired disturbances, and its computational expense is significantly reduced owing to the distributed GP scheme. The evolving learning is mainly designed to reduce the HRI between the pilot and the exoskeleton: since the sensitivity factor evolves under the tracking-error penalty, the controller can learn from different HRIs. Finally, several experiments have been implemented to verify the feasibility of our control scheme, and the experimental results show promising application feasibility. In future work, we will extend our control scheme to more complex situations, e.g., upper-limb and whole-body exoskeletons.