Online Control for Biped Robot with Incremental Learning Mechanism

Abstract: In this paper, we develop a new online walking controller for biped robots that integrates a neural-network estimator and an incremental learning mechanism to improve control performance in dynamic environments. With the aid of an iterative update algorithm, newly incoming data can be used directly to update the original well-trained model, avoiding a time-consuming retraining procedure. A second key technical problem is how to maintain zero-moment-point stability while simultaneously counteracting the effect of the yaw moment. To this end, an interval type-2 fuzzy weight identifier is developed, which assigns a weight to each walking sample to deal with the imbalanced distribution of the training data. The effectiveness of the proposed control scheme has been verified through a full-dynamics simulation and a practical robot experiment.


Introduction
In recent years, biped robots have received considerable attention owing to their unique bipedal movement, excellent suitability to human society, and theoretical importance. To date, a number of active control approaches have been proposed, e.g., stability-criterion-based methods [1][2][3], model-based methods [4][5][6], and optimization-based methods [7][8][9]. In addition, many real robot platforms have been successfully developed, including Atlas, MABEL, ASIMO, and NAO [10,11]. To achieve bipedal locomotion stability, the zero-moment point (ZMP) was proposed and has become the most popular stability criterion. In [12], Kajita et al. designed a ZMP tracking servo controller and proposed a bipedal walking pattern generation algorithm based on the cart-table model. In [13], a modified walking pattern method was presented that utilizes allowable ZMP variation, so that both step length and walking period can be adjusted independently without any extra step. Subsequently, Shin et al. [14] proposed a practical gait synthesis algorithm that optimizes gait parameters while guaranteeing locomotion stability. Moreover, Caron et al. [15] defined the pendular support area and presented a whole-body controller for locomotion across arbitrary multicontact stances. Despite these contributions, the stability established in [12][13][14][15] relies on the assumption that the effect of the yaw moment on stability can be ignored, which is in fact a restrictive condition. As pointed out in [16][17][18], a yaw moment is inevitably generated by the motion of the swing leg, which may lead to slipping or falling.
To remove this limitation, much effort has been devoted to this field, and some interesting results were reported in [16][17][18][19][20][21]. In particular, Hirabayashi et al. [16] proposed a waist-rotation-based yaw moment compensation algorithm in which the biped robot was modeled as a 3D inverted pendulum. In [22], by fusing a waist joint control technique with an optimized swing-leg reference generation method, a fast walking pattern generation approach was presented to counteract the effect of the yaw moment. Inspired by human walking, Xing et al. [18] designed an arm-swing-based control scheme to cancel the factors that produce the yaw moment. To further improve control performance, in [19], the rate of change of angular momentum was smoothly integrated into the yaw moment equation, and locomotion stability was ensured by a Eulerian ZMP resolution approach. Moreover, Yang et al. [21] constructed a practical control scheme that compensates the yaw moment by controlling the lower limbs. Although much progress has been made in dynamic balance control, some challenging difficulties remain open. In most existing yaw moment compensation schemes for biped robots, including those mentioned above, only a few joints are involved, which places a heavy burden on the driving motors and may result in unnatural gaits. In practice, it is difficult to generate natural and efficient gaits in real time in response to external disturbances from the environment.
To address this problem, a series of optimization-based methods have been proposed. In [23], a spline-based estimation of distribution algorithm was proposed by formulating gait pattern generation as a multiobjective optimization problem. In [24], in addition to ensuring ZMP stability, energy efficiency was also guaranteed by fusing the moving ZMP criterion with a Fourier series approximation technique. With the aid of Newton-Raphson iteration, locomotion stability was achieved and the walking speed of the robot was successfully regulated online in [25]. Furthermore, Wang et al. [26] presented an SVM-based learning control system for biped robots, in which a novel SVM objective function with energy-related slack variables was proposed. This objective function followed the principle that the slack variables are determined by the energy cost, which means that samples with lower energy consumption contribute more to the SVM regression. This provides an interesting clue for learning biped walking locomotion. However, this method generally depends on a well-trained model, which may not always be available in practical applications.
Motivated by this observation, in this paper we further address the online control problem for biped robots. To remove the restrictions just mentioned, the main difficulty in designing our control scheme lies in developing a protocol that compensates the yaw moment while maintaining zero-moment-point stability. To overcome this difficulty, an online walking control approach is presented. In summary, the novelties and contributions of this paper are as follows:
1. Compared with the control scheme developed in [26], ours is newly equipped with a neural-network estimator and an incremental learning mechanism, with which newly incoming data can be used directly to update the original well-trained model in real time. This makes it possible for a robot to achieve better locomotion stability in dynamic environments, e.g., when moving from flat ground to uneven terrain;
2. Traditional optimization-based methods, such as those in [23][24][25][26], involve many adaptation laws that must be updated or computed online, which may impose a computational burden during control implementation. To remove this restriction, we fuse the random vector functional-link neural network with an incremental mechanism, so that retraining from scratch is avoided. Furthermore, by designing an interval type-2 fuzzy weight identifier (IT2FWI), both horizontal and vertical locomotion stability are taken into account in the training procedure.
The rest of this paper is organized as follows. In Section 2, the kinematics and dynamics of the biped robot are given and some preliminaries are presented. In Section 3, we propose an online control scheme based on an incremental learning algorithm (Algorithm 1) and establish a weighted neural-network estimator. In Section 4, a simulation and an experiment are carried out to verify the effectiveness of our scheme. In Section 5, conclusions are given.

Overview of Biped Robot BRZ-4
BRZ-4 is a half-size biped robot that serves as a test bed, as shown in Figure 1. BRZ-4 is 66.2 cm tall, weighs 2.4 kg, and has 17 degrees of freedom (DOFs); specifically, three DOFs for the hip joint, one for the knee joint, and two for the ankle joint. Each joint is driven by a DYNAMIXEL MX-64-T motor, and the mechanical structure of the robot is 3D printed. To collect the necessary feedback information, rotary encoders integrated in the motors measure the joint motion. The kinematic model of BRZ-4 is depicted in Figure 1, and its physical parameters are given in Table 1. A control system consisting of the biped robot and a ground workstation is set up to achieve stable walking. With MATLAB integrated in the ground workstation, the proposed method collects the real-time motion of the robot and sends control signals to BRZ-4 through an RS485 bus. To reduce the number of cables, the driving motors are connected in daisy chains.

Kinematics and Dynamics
Usually, a biped robot can be simplified as a connective link model, as shown in Figure 1, in which each link has a uniform mass distribution. The relationship between joint velocity and end-effector velocity is defined as
$$\dot{r} = J(q)\,\dot{q} \tag{1}$$
where $\dot{r} \in \mathbb{R}^{m}$ is the task-space velocity, $q = [q_1, \dots, q_n]^{\top} \in \mathbb{R}^{n}$ represents the joint angles, $n$ is the number of degrees of freedom, $\dot{q} = [\dot{q}_1, \dots, \dot{q}_n]^{\top}$ denotes the joint angular velocities, and $J(q) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix from joint space to task space. From the Lagrangian approach, the dynamics of the biped robot can be expressed as
$$M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau + F_e \tag{2}$$
where $M(q) \in \mathbb{R}^{n \times n}$ is the positive definite inertia matrix, $C(q,\dot{q}) \in \mathbb{R}^{n \times n}$ is the Coriolis and centrifugal matrix, $G(q) \in \mathbb{R}^{n}$ is the gravity vector, $F_e \in \mathbb{R}^{n}$ denotes the external disturbance, and $\tau \in \mathbb{R}^{n}$ is the vector of joint torques.
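The velocity relation ṙ = J(q)q̇ can be illustrated with a minimal sketch on a hypothetical planar two-link leg (thigh and shank). The link lengths and the functions foot_position, jacobian, and task_velocity are assumptions for illustration only, not the BRZ-4 model; the sketch simply shows that the analytic Jacobian maps joint velocities to the foot velocity.

```python
import math

# Hypothetical planar two-link leg; lengths are assumed values, not BRZ-4 parameters.
LEN_THIGH, LEN_SHANK = 0.30, 0.30  # metres


def foot_position(q):
    """Forward kinematics: foot (x, z) relative to the hip for joint angles q = (q1, q2)."""
    q1, q2 = q
    x = LEN_THIGH * math.sin(q1) + LEN_SHANK * math.sin(q1 + q2)
    z = -LEN_THIGH * math.cos(q1) - LEN_SHANK * math.cos(q1 + q2)
    return [x, z]


def jacobian(q):
    """Analytic 2x2 Jacobian d(x, z)/d(q1, q2) of foot_position."""
    q1, q2 = q
    c1, c12 = math.cos(q1), math.cos(q1 + q2)
    s1, s12 = math.sin(q1), math.sin(q1 + q2)
    return [[LEN_THIGH * c1 + LEN_SHANK * c12, LEN_SHANK * c12],
            [LEN_THIGH * s1 + LEN_SHANK * s12, LEN_SHANK * s12]]


def task_velocity(q, q_dot):
    """Task-space velocity r_dot = J(q) q_dot."""
    J = jacobian(q)
    return [sum(J[row][col] * q_dot[col] for col in range(2)) for row in range(2)]
```

A central finite difference of foot_position along q̇ should agree with task_velocity, which is a quick way to check a hand-derived Jacobian before using it in a controller.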

Bipedal Locomotion Stability
The zero-moment-point (ZMP) stability criterion is one of the most popular stability criteria and has been successfully applied to real biped robot platforms including ASIMO, NAO, and HRP. By definition, the ZMP is the point on the sole at which the horizontal components of the net moment caused by inertial and gravity forces are zero, as shown in Figure 2. Hence, the following equations hold:
$$M_x = 0, \qquad M_y = 0 \tag{3}$$
where $M_x$ and $M_y$ are the x-axis and y-axis moments of the inertial and gravity forces, respectively.
As pointed out in [18,19], the ZMP stability criterion cannot guarantee moment equilibrium about the vertical axis, since it neglects the influence of the yaw moment on locomotion stability. In fact, an undesired yaw moment about the support leg is generated by the motions of the components of the biped robot in different planes. Thus, we have
$$M_z + M_R = 0 \tag{4}$$
where $M_z$ denotes the yaw moment and $M_R$ is the moment generated by the ground reaction force. Specifically, the yaw moment is defined as
$$M_z = \Big[\sum_{i} m_i\,(r_i - r_{zmp}) \times (\ddot{r}_i + g)\Big]_z \tag{5}$$
where $m_i$ is the mass of the $i$th connective link, $r_i$ is the position vector of the center of the $i$th connective link, $r_{zmp}$ denotes the ZMP position vector, $g = [0, 0, g]^{\top}$ is the gravitational acceleration vector, and $[\,\cdot\,]_z$ denotes the z-component. Moreover, the ZMP coordinates $r_{zmp} = (x_{zmp}, y_{zmp}, 0)$ take the following forms [27]:
$$x_{zmp} = \frac{\sum_i m_i(\ddot{z}_i + g)\,x_i - \sum_i m_i \ddot{x}_i z_i}{\sum_i m_i(\ddot{z}_i + g)} \tag{6}$$
$$y_{zmp} = \frac{\sum_i m_i(\ddot{z}_i + g)\,y_i - \sum_i m_i \ddot{y}_i z_i}{\sum_i m_i(\ddot{z}_i + g)} \tag{7}$$
where $[x_i, y_i, z_i]$ is the position of the center of the $i$th link.
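The ZMP coordinates and the yaw moment can be evaluated directly once link masses, center positions, and accelerations are known. The sketch below assumes each link is a point mass at its center (so rotational inertia terms are ignored) and uses the hypothetical function names zmp and yaw_moment; it is an illustration of the standard formulas, not the authors' implementation.

```python
G = 9.81  # magnitude of gravity; the gravity vector is g = [0, 0, G]


def zmp(masses, pos, acc):
    """Return (x_zmp, y_zmp) for point-mass links.

    masses: link masses m_i; pos: centers [x_i, y_i, z_i]; acc: accelerations [x_dd, y_dd, z_dd].
    """
    den = sum(m * (a[2] + G) for m, a in zip(masses, acc))
    x = sum(m * ((a[2] + G) * p[0] - a[0] * p[2]) for m, p, a in zip(masses, pos, acc)) / den
    y = sum(m * ((a[2] + G) * p[1] - a[1] * p[2]) for m, p, a in zip(masses, pos, acc)) / den
    return x, y


def yaw_moment(masses, pos, acc, r_zmp):
    """z-component of sum_i m_i (r_i - r_zmp) x (r_dd_i + g)."""
    mz = 0.0
    for m, p, a in zip(masses, pos, acc):
        dx, dy = p[0] - r_zmp[0], p[1] - r_zmp[1]
        # gravity acts along z, so it drops out of the z-component of the cross product
        mz += m * (dx * a[1] - dy * a[0])
    return mz
```

In the static case (all accelerations zero) the ZMP reduces to the ground projection of the center of mass and the yaw moment vanishes, which is a convenient sanity check.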
To evaluate locomotion stability in the horizontal plane, the ZMP stability margins introduced in [28] are adopted:
$$S_x = \frac{l_{zx}}{l_{cx}}, \qquad S_y = \frac{l_{zy}}{l_{cy}}, \qquad r_{zmp} \in \Omega_{zmp}$$
where $l_{zx}$ and $l_{zy}$ denote the x-axis and y-axis distances between the ZMP and the boundaries, respectively; $l_{cx}$ and $l_{cy}$ represent the x-axis and y-axis distances between the center of the foot sole and the boundaries, respectively; and $\Omega_{zmp}$ denotes the region enclosed by the ZMP boundaries.
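The distances l_zx, l_zy, l_cx, and l_cy can be computed as in the following sketch, which assumes an axis-aligned rectangular sole centered at foot_center (a simplification not stated in the text). The normalized ratios returned at the end (1 at the sole center, 0 on the boundary) are one common normalization and are an assumption, not necessarily the exact margin definition of [28]; the function name zmp_margins is hypothetical.

```python
def zmp_margins(zmp_xy, foot_center, half_length, half_width):
    """Return (l_zx, l_zy, l_cx, l_cy, inside, normalized) for a rectangular sole.

    Assumes an axis-aligned rectangle of size 2*half_length x 2*half_width.
    """
    dx = abs(zmp_xy[0] - foot_center[0])
    dy = abs(zmp_xy[1] - foot_center[1])
    l_zx = half_length - dx   # x-distance from the ZMP to the nearer boundary
    l_zy = half_width - dy    # y-distance from the ZMP to the nearer boundary
    l_cx, l_cy = half_length, half_width  # sole center to boundary
    inside = l_zx >= 0.0 and l_zy >= 0.0  # ZMP lies within Omega_zmp
    normalized = (l_zx / l_cx, l_zy / l_cy)  # assumed normalization
    return l_zx, l_zy, l_cx, l_cy, inside, normalized
```

A negative l_zx or l_zy signals that the ZMP has left the support region, which is the situation the margin is meant to flag.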

Online Control System Design
In this section, the control system design procedure is described in detail, and a walking control framework is presented, as shown in Figure 3, in which random vector functional-link neural networks (RVFLNNs) [29] are adopted to approximate f(·) and an incremental learning mechanism is incorporated into the neural network. Moreover, an interval type-2 fuzzy weight identifier (IT2FWI) is designed to improve control performance.

Weighted Neural-Network Estimator
From Equations (5)-(7), both ZMP stability and the yaw moment depend on the position, velocity, and acceleration of each link. According to [24,25] and the results in [28], locomotion stability can be ensured by regulating the robot joints. Thus, the following mapping function is considered:
$$\Delta q = f(\Delta X_z, M_z)$$
where $f(\cdot)$ is a nonlinear mapping function, $\Delta q = [\Delta q_1, \dots, \Delta q_n]^{\top}$ denotes the corrections of all joints, $\Delta X_z$ is the ZMP error, and $M_z$ denotes the yaw moment. By approximating $f(\cdot)$, $\Delta q$ can be obtained. In our scheme, RVFLNNs [29] are adopted to estimate $f(\cdot)$. Unlike traditional neural networks, RVFLNNs avoid a long training process and learn quickly, because they use a flat network whose input weights and biases are randomly generated and kept fixed, so that only the output weights need to be computed.
Given $N$ training samples $\{X_i, T_i\}_{i=1}^{N}$, let $\hat{c}_i$ denote the weight of the $i$th sample. The approximation task is then formulated as the following optimization problem:
$$\arg\min_{W} \; \sum_{i=1}^{N} \hat{c}_i \,\lVert A_i W - T_i \rVert_2^2 + \lambda \lVert W \rVert_2^2$$
where $A = [K^n \mid H^m]$ is the input matrix whose $i$th row $A_i$ corresponds to the $i$th sample; $K^n = [K_1, \dots, K_n]$ is the input node set with input nodes $K_i = \phi(X W_{ei} + \beta_{ei})$, where $X$ is the input sample data; $H^m = [H_1, \dots, H_m]$ is the enhancement node set with enhancement nodes $H_i = \phi(K^n W_{hi} + \beta_{hi})$; $\beta_{ei}$ and $\beta_{hi}$ are biases; $W_{ei}$ and $W_{hi}$ are weight matrices; $\phi(\cdot)$ is the sigmoid function; $T = [T_1, \dots, T_N]^{\top}$ is the desired output matrix; and $\lambda$ is a penalty coefficient. The connecting weight matrix $W$ is computed as $W = A^{+} T$.
By applying the Moore-Penrose inverse, we have
$$A^{+} = \left(\lambda I + A^{\top} \hat{C} A\right)^{-1} A^{\top} \hat{C}$$
where $A^{+}$ is the (weighted, regularized) pseudo-inverse of $A$ and $\hat{C} = \mathrm{diag}(\hat{c}_1, \hat{c}_2, \dots, \hat{c}_N)$ is the weight matrix of the sample data.
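A minimal NumPy sketch of the weighted estimator, assuming the weighted ridge form of the optimization problem: random input-node and enhancement-node weights are drawn once and kept fixed, and only W is solved for. The layer sizes, random seed, penalty λ, and toy target are hypothetical; in the proposed scheme the sample weights ĉ_i would come from the IT2FWI rather than being set to one.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rvfl_features(X, We, be, Wh, bh):
    """Build A = [K | H] from input nodes K and enhancement nodes H."""
    K = sigmoid(X @ We + be)  # input (feature) nodes with fixed random weights
    H = sigmoid(K @ Wh + bh)  # enhancement nodes driven by K
    return np.hstack([K, H])


def weighted_output_weights(A, T, c, lam):
    """Solve argmin_W sum_i c_i ||A_i W - T_i||^2 + lam ||W||^2 in closed form."""
    AtC = A.T * c  # equals A.T @ np.diag(c) without forming the diagonal matrix
    lhs = AtC @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(lhs, AtC @ T)


rng = np.random.default_rng(0)
n_in, n_feat, n_enh = 2, 20, 40  # hypothetical layer sizes
We = rng.standard_normal((n_in, n_feat))
be = rng.standard_normal(n_feat)
Wh = rng.standard_normal((n_feat, n_enh))
bh = rng.standard_normal(n_enh)

# Toy data: the two inputs stand in for (ZMP error, yaw moment); the target is hypothetical.
X = rng.uniform(-1.0, 1.0, size=(200, n_in))
T = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)
c = np.ones(len(X))  # sample weights c_i (all equal in this toy run)
A = rvfl_features(X, We, be, Wh, bh)
W = weighted_output_weights(A, T, c, lam=1e-3)
rms_fit = float(np.sqrt(np.mean((A @ W - T) ** 2)))
```

Using np.linalg.solve on the regularized normal equations avoids forming an explicit inverse; raising a sample's ĉ_i increases its influence on W in exactly the way the weighted objective prescribes.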

Incremental Learning Method Design
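Let $A_{m+1} = [A_m \;\; Y_p]$, where $Y_p$ denotes the newly added block. Then the pseudo-inverse of $A_{m+1}$ can be expressed as follows [29]:
$$A_{m+1}^{+} = \begin{bmatrix} A_m^{+} - D B^{\top} \\ B^{\top} \end{bmatrix}$$
where $D = A_m^{+} Y_p$, $C = Y_p - A_m D$, and
$$B^{\top} = \begin{cases} C^{+}, & C \neq 0 \\ (I + D^{\top} D)^{-1} D^{\top} A_m^{+}, & C = 0 \end{cases}$$
Thus, the new weight matrix is obtained by the following equation:
$$W_{m+1} = \begin{bmatrix} W_m - D B^{\top} T \\ B^{\top} T \end{bmatrix} \tag{14}$$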
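The block pseudo-inverse update can be checked numerically with the NumPy sketch below. It implements the plain (unweighted, unregularized) pseudo-inverse form exactly as written above, so the sample weights and λ are ignored here; the function names, the random test matrices, and the tolerance used to decide whether C = 0 are hypothetical.

```python
import numpy as np


def pinv_add_columns(A, A_pinv, Y, tol=1e-10):
    """Return (pinv([A Y]), D, B^T) given A and its pseudo-inverse A^+."""
    D = A_pinv @ Y
    C = Y - A @ D
    if np.linalg.norm(C) > tol:
        Bt = np.linalg.pinv(C)
    else:
        # C = 0: the new columns lie in the column space of A
        Bt = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    return np.vstack([A_pinv - D @ Bt, Bt]), D, Bt


def update_weights(W, D, Bt, T):
    """W_{m+1} = [W_m - D B^T T; B^T T], reusing W_m instead of retraining."""
    BtT = Bt @ T
    return np.vstack([W - D @ BtT, BtT])


rng = np.random.default_rng(1)
A = rng.standard_normal((50, 8))  # existing input matrix A_m (hypothetical)
Y = rng.standard_normal((50, 3))  # newly added block Y_p (hypothetical)
T = rng.standard_normal((50, 2))  # desired outputs (hypothetical)
A_pinv = np.linalg.pinv(A)
W = A_pinv @ T
A_new_pinv, D, Bt = pinv_add_columns(A, A_pinv, Y)
W_new = update_weights(W, D, Bt, T)
```

The update only needs pinv of the small residual C instead of the full augmented matrix, which is where the saving over retraining from scratch comes from; W_new should match pinv([A Y]) @ T up to rounding.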

Interval Type-2 Fuzzy Identifier Design
One of the main difficulties in developing the proposed control scheme is how to assign an appropriate learning weight to each walking sample. In this paper, an interval type-2 fuzzy weight identifier (IT2FWI) is designed to deal with the uncertainty of the walking samples.
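The rules of the interval type-2 fuzzy logic system (FLS) are given as follows:
$$R^{l}: \text{IF } z \text{ is } B_{l,1} \text{ and } y \text{ is } B_{l,2}, \text{ THEN } \hat{c} \text{ is } O_l$$
where $z$ and $y$ represent the ZMP stability margin and the yaw moment, respectively; $\hat{c}$ denotes the weight of the sample data; $B_{l,j}$ and $O_l$ denote the linguistic variables of the fuzzy sets; $l = 1, 2, \dots, L$; and $L$ is the total number of fuzzy rules. A Gaussian membership function is adopted to map crisp inputs to fuzzy sets because of its clear physical meaning. The primary membership function of the ZMP stability margin is a Gaussian with fixed standard deviation $\sigma$ and uncertain mean:
$$\phi_{B}(z) = \exp\!\left(-\frac{(z - m)^2}{2\sigma^2}\right), \qquad m \in [m_1, m_2]$$
According to [30], the output fuzzy set $O(\hat{c})$ is obtained by combining the rule consequents with the maximum operation '$\vee$', where the $i$th rule fires with the interval $[\underline{f}^{i}, \overline{f}^{i}]$ given by
$$\underline{f}^{i} = \prod_{j=1}^{n} \underline{\phi}_{B_{i,j}}, \qquad \overline{f}^{i} = \prod_{j=1}^{n} \overline{\phi}_{B_{i,j}}, \qquad n = 2$$
and $\underline{\phi}$ and $\overline{\phi}$ are the lower and upper membership functions, respectively.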
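A Gaussian primary membership function with fixed standard deviation σ and a mean that is only known to lie in an interval [m_low, m_high] (the form stated for the IT2FWI) has standard closed-form upper and lower bounds, sketched below together with the product firing interval over the n = 2 inputs. The function names and the sample parameter values are hypothetical; the tuned parameters used in the paper are not reproduced here.

```python
import math


def it2_gaussian(x, m_low, m_high, sigma):
    """Return (lower, upper) membership of x for a Gaussian with mean in [m_low, m_high]."""
    def g(m):
        return math.exp(-0.5 * ((x - m) / sigma) ** 2)

    # Upper bound: nearest admissible mean (1 inside the uncertain-mean interval).
    if x < m_low:
        upper = g(m_low)
    elif x > m_high:
        upper = g(m_high)
    else:
        upper = 1.0
    # Lower bound: farthest admissible mean.
    lower = g(m_high) if x <= 0.5 * (m_low + m_high) else g(m_low)
    return lower, upper


def firing_interval(inputs, rule_params):
    """Product t-norm firing interval [f_lower, f_upper] of one rule.

    inputs: crisp inputs, e.g. (ZMP stability margin, yaw moment).
    rule_params: one (m_low, m_high, sigma) tuple per input for this rule.
    """
    f_lo, f_hi = 1.0, 1.0
    for x, (m_low, m_high, sigma) in zip(inputs, rule_params):
        lo, hi = it2_gaussian(x, m_low, m_high, sigma)
        f_lo *= lo
        f_hi *= hi
    return f_lo, f_hi
```

The gap between the lower and upper memberships is the footprint of uncertainty that lets the identifier tolerate noisy or ambiguous walking samples.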
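Utilizing center-of-sets type reduction and the Karnik-Mendel method, we have $\hat{c} = [\hat{c}_{low}, \hat{c}_{high}]$ with
$$\hat{c}_{low} = \frac{\sum_{i=1}^{L_p} \overline{f}^{i} c_i + \sum_{i=L_p+1}^{L} \underline{f}^{i} c_i}{\sum_{i=1}^{L_p} \overline{f}^{i} + \sum_{i=L_p+1}^{L} \underline{f}^{i}}, \qquad \hat{c}_{high} = \frac{\sum_{i=1}^{R_p} \underline{f}^{i} c_i + \sum_{i=R_p+1}^{L} \overline{f}^{i} c_i}{\sum_{i=1}^{R_p} \underline{f}^{i} + \sum_{i=R_p+1}^{L} \overline{f}^{i}}$$
where $\hat{c}$ is an interval set whose left and right limits are $\hat{c}_{low}$ and $\hat{c}_{high}$, respectively; $\hat{c}_i$ is the centroid of the type-2 interval consequent set $O_i$; $C = (\hat{c}_1, \dots, \hat{c}_L)$ represents the original rule-ordered consequent values, and $c = (c_1, \dots, c_L) = QC$ satisfies $c_1 \le c_2 \le \dots \le c_L$, with the firing intervals reordered by the same $L \times L$ permutation matrix $Q$; and $L_p$ and $R_p$ are the switch points found by the Karnik-Mendel iterations. Thus, the defuzzified output is
$$\hat{c} = \frac{\hat{c}_{low} + \hat{c}_{high}}{2} \tag{22}$$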
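The Karnik-Mendel iterations can be sketched as follows for rule centroids c_i and firing intervals [f_lower, f_upper]. The code sorts the centroids (the permutation Q), iterates each endpoint until the switch point stops changing, and returns the midpoint as the defuzzified sample weight. Function names are hypothetical, and the firing strengths are assumed non-negative with a positive sum.

```python
def _km_endpoint(c, f_lo, f_hi, left, max_iter=100):
    """One Karnik-Mendel endpoint; c must be sorted in ascending order."""
    f = [(a + b) / 2.0 for a, b in zip(f_lo, f_hi)]
    y = sum(fi * ci for fi, ci in zip(f, c)) / sum(f)
    for _ in range(max_iter):
        # switch point k: last index with c[k] <= y
        k = max((i for i in range(len(c)) if c[i] <= y), default=0)
        if left:   # upper strengths on small centroids pull y down
            f = f_hi[:k + 1] + f_lo[k + 1:]
        else:      # upper strengths on large centroids pull y up
            f = f_lo[:k + 1] + f_hi[k + 1:]
        y_new = sum(fi * ci for fi, ci in zip(f, c)) / sum(f)
        if abs(y_new - y) < 1e-12:
            return y_new
        y = y_new
    return y


def km_type_reduce(c, f_lo, f_hi):
    """Return (c_low, c_high) after reordering rules by centroid (permutation Q)."""
    order = sorted(range(len(c)), key=lambda i: c[i])
    cs = [c[i] for i in order]
    lo = [f_lo[i] for i in order]
    hi = [f_hi[i] for i in order]
    return _km_endpoint(cs, lo, hi, True), _km_endpoint(cs, lo, hi, False)


def defuzzify(c_low, c_high):
    """Midpoint of the type-reduced interval."""
    return (c_low + c_high) / 2.0
```

For small rule bases the result can be cross-checked by enumerating every choice of lower or upper firing strength per rule, since the extreme weighted averages occur at those vertices; KM reaches the same endpoints in a few iterations.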

Experiment Results and Analysis
In this section, the effectiveness of our control scheme is evaluated through simulations and experiments. The robots are required to perform two typical tasks: walking on flat ground and climbing stairs. The first is carried out on a physical platform, while the second is carried out on a simulation platform.

Experiment: Walking on Flat Ground
As illustrated in Figure 1 and Table 1, the biped robot BRZ-4 is set up as the test bed, which consists of two parts: the ground workstation and BRZ-4. We apply the proposed control algorithm to BRZ-4. Specifically, our scheme contains off-line and online learning parts. To improve efficiency, the off-line training is carried out in MATLAB, while the online learning is implemented in C. The control commands and the states of the robot are transmitted through the RS485 bus. The basic components of the hardware system are shown in Figure 4.
One goal of the experiment is to control the biped robot to track the desired gait such that stable walking is achieved. Under the newly constructed control framework, the desired gait, which contains a start gait, a periodic normal gait, and a stop gait, is planned by a spline-based parametric optimization technique [31]. The planned gait is generated in the ground workstation, and the corresponding control commands are transmitted to BRZ-4 through the RS485 bus in real time. The planned results are illustrated by the stick animation in Figure 5, in which the CoM trajectory is highlighted in red.
In the construction of the proposed IT2FWI, Gaussian functions with a fixed standard deviation σ and an uncertain mean are taken as the primary membership functions. The parameters of the membership functions of the ZMP stability margin and the yaw moment were chosen by a trial-and-error procedure.
A comparison between the proposed control scheme and the one in [26] is carried out on the BRZ-4 platform. The ZMP response trajectories are plotted in Figure 6. As indicated in this figure, all ZMP trajectories remain within the convex boundaries of the supporting foot, which implies that both methods ensure locomotion stability in the horizontal direction.
On the other hand, the undesired yaw moment has a significant impact on the locomotion stability of biped robots, as pointed out in [16][17][18]. We now test the effectiveness of our control scheme in compensating for the yaw moment. The evolutions of the yaw moment with our method and with the one in [26] are shown in Figure 7. As seen from the comparison, both methods successfully suppress the yaw moment. In addition, the root-mean-square (RMS) errors of the x-axis and y-axis ZMP trajectories and of the yaw moment are recorded in Table 2. With the proposed scheme, the RMS errors of the x-axis and y-axis ZMP trajectories and of the yaw moment are about 5.9%, 9.9%, and 20.7% lower, respectively, than those obtained with [26]. Moreover, compared with the method in [26], the online learning time is dramatically reduced from 1.56 s to 0.23 s.
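The RMS errors and the relative improvements quoted for Table 2 follow from two simple formulas, sketched below. The helper names are hypothetical, and no trajectory data from Table 2 is reproduced; the only numbers used are the reported learning times.

```python
import math


def rms(errors):
    """Root-mean-square of a sequence of tracking errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_reduction(baseline, proposed):
    """Fractional reduction of `proposed` relative to `baseline` (0.2 means 20% lower)."""
    return (baseline - proposed) / baseline


# Online learning time reported in the text: 1.56 s with [26] vs 0.23 s with the proposed scheme.
time_saving = relative_reduction(1.56, 0.23)
```

Applied to the reported times, (1.56 - 0.23) / 1.56 is about 0.85, i.e. roughly an 85% reduction in online learning time.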

Remark 1.
Compared with [26], which also focuses on optimization-based learning control design, the learning mechanism in our scheme is divided into two parts: off-line learning and online learning. By employing a flat network structure and deriving the weight-matrix update in Equation (14), the proposed method avoids retraining from scratch. As a result, the online learning time is dramatically reduced, as shown in Table 2.

Simulation: Climbing Stairs
In this case, we consider a new robot, BRZ-5, whose basic parameters are listed in Table 3. As indicated in Figure 8, BRZ-5 is 121 cm tall and weighs 14.9 kg. The whole simulation contains two kinds of gaits, a walking gait and a stair-climbing gait, each consisting of six step cycles. In this simulation, the robot is required to climb stairs after walking six steps on flat ground. Comparative simulations between the proposed method and the one in [26] are conducted in PyBullet, a real-time physics simulation platform. To facilitate comparison and analysis, the same parameter settings and initial conditions are used for both methods.
The comparative simulation results are given in Figures 9 and 10: Figure 9 shows snapshots of the stair-climbing task, while Figure 10 plots the ZMP response trajectories. From Figure 9, with both methods the robot maintains balance during the first six step cycles on flat ground. However, in the next six step cycles, the robot falls down with the method in [26], whereas the robot controlled by the proposed scheme successfully completes the stair-climbing task. Similar results are observed in Figure 10, in which the ZMP response trajectories are shown as a dot-dash black line and a solid blue line. As indicated in Figure 10, with the approach in [26], the ZMP response trajectory stays basically within the ZMP boundaries during the first six steps, while an obvious deviation appears from the 8th to the 12th step. Compared with the control scheme in [26], ours exhibits better generalization performance.

Remark 2.
A brief analysis of the above comparative results is given here. Strong environmental adaptability is one of the keys to the large-scale application of biped robots. However, it is almost impossible to handle all kinds of dynamic disturbances from the environment with a single well-trained model. Unlike the approach in [26], our scheme integrates an incremental updating mechanism. With the aid of an iterative update algorithm, newly incoming data can be used directly to update the original well-trained model, which avoids retraining from scratch. Thus, with this incremental updating mechanism, the adaptive capacity of the robot is further improved.

Conclusions
This paper presented a walking control framework for biped robots to deal with online learning and control problems. Under the new framework, an incremental learning algorithm was constructed such that newly incoming data can be integrated into the well-trained model in real time without a retraining process. Finally, experimental and simulation results verified the effectiveness of the proposed scheme.

Conflicts of Interest:
The authors declare no conflicts of interest.