Nonlinear Model Predictive Control of Single-Link Flexible-Joint Robot Using Recurrent Neural Network and Differential Evolution Optimization

: A recurrent neural network (RNN) and differential evolution optimization (DEO) based nonlinear model predictive control (NMPC) technique is proposed for position control of a single-link ﬂexible-joint (FJ) robot. First, a simple three-layer recurrent neural network with rectiﬁed linear units as an activation function (ReLU-RNN) is employed for approximating the system dynamic model. Then, using the RNN predictive model and model predictive control (MPC) scheme, an RNN and DEO based NMPC controller is designed, and the DEO algorithm is used to solve the controller. Finally, comparing numerical simulation ﬁndings demonstrates the efﬁciency and performance of the proposed approach. The merit of this method is that not only is the control precision satisﬁed, but also the overshoots and the residual vibration are well suppressed.


Introduction
The control of the flexible-joint (FJ) robot has been a major research topic in the field of control theory and engineering for several decades [1][2][3][4][5][6][7]. The FJ robot benefits from the characteristic of inbuilt compliance that provides low output impedance, shock tolerance, and accurate force control [8]. Due to its benefits, the FJ robot has been widely used in many applications where robot interacts with environments or with humans, such as monopod hopping robots and exoskeletons [9]. However, the FJ robot is an under-actuated strong coupling nonlinear system [10]. The control of such a complex nonlinear system is a difficult task. Therefore, the goal of this study is to design a suitable controller for a single-link FJ robot, which can also be utilized for complicated nonlinear systems.
Model-free methods have been generally employed in the field of FJ robot control. The earliest influential control approach is the traditional proportional-derivative (PD) method with gravity compensation [11,12]. Different types of PD controllers have been proposed because of their simplicity and practicability [13][14][15]. To deal with the overshoots and residual vibration, a fuzzy proportional-integral-derivative (PID) controller was proposed to suppress the elastic torsional vibration [16], and a nonlinear state feedback controller was employed to suppress the residual vibration of FJ robot [10]. Although these techniques in the aforementioned references have acquired relatively excellent performance in the FJ robot control, there are still certain issues requiring attention. For instance, the parameters of model-free controllers must be adjusted according to the requirements of system performance. The control performance is sensitive to the controller parameters, and these parameters are complex to adjust. Unlike the model-free method, model predictive control (MPC) as a primary model-dependent control method is an efficient controller to handle the • First, an RNN and DEO based NMPC method is proposed for the position control of a single-link FJ robot. The merit of this process is that not only is the control precision satisfied, but also the overshoots and the residual vibration is well suppressed. • To overcome the difficulty of modeling, a simple three-layer RNN with leaky rectified linear units as an activation function (ReLU-RNN) is established to approximate the FJ robot dynamic model with satisfactory precision. Then, according to the RNN predictive model and MPC approach, an RNN and DEO based NMPC controller is designed, in which the DEO algorithm is applied to solve the controller. • Finally, to demonstrate the efficiency and performance of this technique, some numerical simulation comparisons between our method and the PD method and the differential dynamic programming (DDP) [57] MPC approach have been established. Numerical simulation findings illustrate that the performance of this technique is superior to that of the PD and DDP MPC methods.
The remainder of this paper is organized as follows. In Section 2, the dynamic model of the single-link FJ robot, including the direct-current (DC) motor dynamics is established. In Section 3, the controller design is indicated. The numerical simulations are displayed in Section 4. Finally, the conclusion is given in Section 5.

Single-Link FJ Robot System Model
In this section, we establish the single-link FJ robot dynamic model with the DC motor dynamics being considered. The single-link FJ robot system, which can rotate in vertical plane, is shown in Figure 1. The system comprises two parts, as shown in Figure 1. The left part is the motor side, which includes a motor drive board, a DC motor, and a gear reduction box. The right part is the link side, which is composed of a massless link and a load. The two sections are linked by an elastic element, which is modeled as a linear spring. The FJ robot rotates in a vertical plane with the assumption that the elastic element can only deform in the direction of joint rotation [4]. The driving torque provided by the DC motor is τ m , and the gear reduction ratio is 1:N. The motor side torque is τ 2 = Nτ m . The stiffness of the linear spring is K. The angular position of the motor side is θ 2 , and the link side angular position is θ 1 . When the joint rotates, the joint can deform in the direction of rotation, and the torque can be represented by τ 1 (ϕ) = K(θ 2 − θ 1 ), where ϕ = θ 2 − θ 1 denotes the deformation of a linear spring.θ 1 andθ 2 stand for the angular velocity of the link side and the motor side, respectively. Similarly,θ 1 andθ 2 symbolize the angular acceleration of the link side and the motor side, respectively. For the sake of simplicity, we presume the viscous damping on the motor side and the link side to be B 1 (θ 1 ) = K f 1θ 1 and B 2 (θ 2 ) = K f 2θ 2 , where K f 1 and K f 2 denote the damping coefficient of the motor side and the link side, respectively. G(θ 1 ) = mglsin(θ 1 ) represents gravity, where m is the quality of the load, g is the gravity acceleration, and l is the length of the massless link. The rotary inertia of the link side and the motor side are J 1 and J 2 , respectively. Then, based on the Euler-Lagrangian equations, the system dynamics is formulated as (1) [58] Since the motor is employed to actuate the system, the motor dynamics are also considered to institute the system dynamic model. The motor dynamics are depicted as (2) where K τ is motor torque coefficient, i denotes motor armature current, R represents armature circuit resistance, L stands for armature circuit inductance, K e is back electromotive coefficient,θ m denotes the angular velocity of the motor rotor, and U V symbolizes motor armature voltage. The torque produced by the motor is transmitted to the motor side using a gear reduction box as shown in Figure 1. We suppose that there is no transmission loss. Then, based on Equation (2), we attain According to the above analysis, combining (1)-(3), the system dynamics including the motor dynamics can be described as (4) Let us define Then, the system dynamics can be formulated by following state-space expression (5) and (6) where X(t) = [x 1 (t), x 2 (t), x 3 (t), x 4 (t), x 5 (t)] T denotes the system state vector, and Y(t) symbolizes the system output. u(t) stands for the control input of the system. This model contains unmodeled parts, such as an accurate friction model, gear backlash, and mechanical transmission efficiency. Besides, precise model parameters are difficult to obtain. This type of nonlinear system is complex to control as the model is unknown. Thus, we present an RNN and DEO based NMPC method, which can be utilized for complicated nonlinear systems.

Nonlinear Model Predictive Control
Based on our system, the fundamental scheme of NMPC is detailed in this subsection. We utilize the discrete-time nonlinear autoregressive exogenous dynamic model to represent the system state-space Equation (5), which is capable of predicting future states for long-time series. At time step k, the state X(k + 1) is predicted by (7) where X k = [X T (k), X T (k − 1), ..., X T (k − d x + 1)] represents system state time series from kth time step through k − d x + 1th time step, correspondingly, U k = [u(k), u(k − 1), ..., u(k − d u + 1)] depicts the control input time series. d x and d u stand for the length of time series of system state and control input, respectively. f p (·) signifies a nonlinear function. For long-time series prediction, the predicted system state is transmitted into X k recurrently, for example (8) At k + 1th time step predicted system state X(k + 1) is transmitted into X k , and system state time series is updated as Similarly, exogenous control input is transmitted into U k and the control input time series is updated ]. This formula is not only useful for establishing NMPC controller but also convenient for approximating the model using NN. Then, we consider the discrete-time nonlinear system (7) to express the MPC scheme. Equation (7) including constraints can be rewritten as (9) where X ⊂ 5 symbolizes system state vector constraints, and U ⊂ denotes control input constraints. f n (·) stands for the nonlinear function, which is approximated by ReLU-RNN. N is the prediction horizon. A nonlinear MPC controller works by minimizing the performance criterion such as (10) where X(k) = [X k , X k+1 , ..., X k+N−1 ] and U(k) = [u 0 , u 1 , ..., u N−1 ] symbolize the system state information and control input to be optimized, respectively. The cost function is denoted by J(X(k), U(k)). U (k) = [u 0 , u 1 , ..., u N−1 ] signifies the optimized control input series. Each state-input pair satisfies Equation (9) with constraints. When the control input series are optimized, only the first term u 0 is applied to the system until the next time step, and the system state measurements are updated at the next time step. Then, the optimization procedure is repeated at each time step, which runs as a closed-loop.
In the field of robot control, it is very important to realize accurate position control, speed control and torque control. In practical applications, the accuracy of position control will directly affect the performance of the robot. When performing position control, the residual vibration is easy to be inspired due to the existence of elastic elements [10]. Therefore, position control is the most important and more challenging in FJ robot control. In this study, in order to achieve accurate position control, the position variable is selected as the control objective of the NMPC controller, so that we design the quadratic cost function as (11) subject to the terminal constraint (12)X wherex 1 (k + j + 1) denotes predicted position state and x re f 1 (k + j + 1) represents the reference trajectory. u(k + j) stands for the system control input at time step k + j. Due to the constraints of the control input in the real system, this term is introduced into the objective function as adjustment. α and β are the penalty coefficients of the performance criterion and control input, respectively.
We conclude from the above analysis that this technique is flexible as we have the option to design the cost functions for various objectives. For example, we can execute the velocity and torque control by simply varying the cost functions. However, implementing an NMPC controller is rather challenging, and there are two main problems in designing the controller. The first is how to create an accurate predictive model, and the second is how to solve the optimization problem successfully. We utilize a ReLU-RNN to approximate the system dynamic model in Section 3.2 to overcome the difficulty of nonlinear system modeling. The DEO algorithm, which will be described in Section 3.3 is used to optimize the control inputs.

Dynamics Model Approximation Using ReLU-RNN
We approximate the discrete-time dynamic model (7) by utilizing a simple three-layer ReLU-RNN, which is displayed in Figure 2.
where W x i and W u i are the weight vectors of the system state and control input for ith hidden neuron. b i signifies the bias of ith hidden neuron. The ith hidden neuron output h i,k is evaluated by (14) h where δ(·) represents the nonlinear activation function. We choose the leaky ReLU function as the nonlinear activation function, selected because it is useful for computing efficiently and preventing gradients from disappearing. The leaky ReLU function is detailed as (15) where ϑ indicates the input variable of leaky ReLU function. Then, the predicted output is expressed as (16)x where W o j symbolizes the weight vector of jth output neuron concerning hidden layer neurons, and b o j is the bias of jth output neuron. h k = [h 1,k , h 2,k , ..., h q,k ], where q is the number of hidden neurons. Combining (13)- (16), ReLU-RNN is capable of estimating system dynamic model by (17) where W and b represent the weights and bias of the NN, respectively. Each batch of training data contains 1000 randomly selected state-input pairs throughout the training process. The state-input pairs are generated by the simulation of discretized system model (5). We choose the the mean squared error (MSE) as the loss function, which is denoted as (18) whereX k = {X(k + 1),X(k + 2), ...,X(k + N)} symbolize the predicted values of system states. According to (18), the backpropagation method may be used to obtain the weight gradients (X,X) ∂W and bias gradients (X,X) ∂b . Then, we adopt the Adam algorithm [59], a type of gradient descent method, to train the network. Using Intel(R) Core(TM) i7-8550U CPU, the learning rate is set to 1.0 × 10 −5 , and the training process is completed after 50 min. The parameters of ReLU-RNN are set as follows. d x and d u are set to 5. The number of neurons in the hidden layer is 15, in the input layer is 30, and in the output layer is 5. The training findings are displayed in Section 4.

RNN and DEO Based NMPC Controller
In this subsection, we first introduce the DEO algorithm, which is based on the NMPC technique. Then, the RNN and DEO based NMPC controller is designed in detail.
The standard DEO is commonly indicated as DE/rand/1/bin [44]. A randomly selected population P consists of N P individuals corresponding to the prediction horizon of NMPC, each individual is an N-dimensional vector, which is represented by U i = [u i,1 , u i,2 , ..., u i,N ]. The U i corresponds to the control input U k+N that will be optimized. The evolutionary generation time in DEO is expressed by G = 0, 1, 2, ..., G m , where G m signifies the highest generation time. At Gth generation, the ith individual of the Gth generation population is designated as . u L and u U are the lower band and upper band of the control input, respectively. The population will vary with the evolution process, P G stands for the Gth generation population, and the initial population P 0 is randomly generated with the boundary constraint [u L , u U ]. The basic DEO algorithm operation procedure contains initialization, mutation, crossover, and selection, which are detailed as follows. Initialization: To establish the initial point of the optimization search, the population needs to be initialized. Generally, one way to build an initial population is to randomly select from the values within a given boundary constraint. It is a common assumption that all populations with random initialization conform to a uniform probability distribution. Typically, each jth element of the ith individual in the P 0 is initialized by (19) u 0 i,j = u L + rand(0, 1) · (u U − u L ), (i = 1, 2, ....., N P , j = 1, 2, ....., N), where rand(0, 1) denotes a uniformly distributed random number in [0, 1]. Mutation: For each individual vector U G i , a mutant vector V G i = [ν G i,1 , ν G i,2 , ..., ν G i,N ] at generation G is generated by (20) where r 1 , r 2 , r 3 ∈ {1, 2, 3, ..., N P } represent randomly chosen indices. F ∈ [0, 2] is the zoom factor of the difference vector (U G r 2 − U G r 3 ). If the element ν G i,j of the mutant individual violates the feasible region boundary of the search space, a simple method to treatment this problem is to replace the element with a novel one formulated by Equation (19). Another method is boundary absorption, which is described as (21) The mainstream mutation strategies are described as follows (22)- (27) (1) DE/rand/1/bin (2) DE/rand/2/bin (4) DE/best/2/bin (5) DE/current-to-best/1/bin (6) DE/rand-to-best/1/bin where r 1 , r 2 , r 3 , r 4 , r 5 ∈ {1, 2, ..., N P } are randomly chosen individual indices. U G best denotes the best fitness individual vector of Gth generation.
Crossover: To maintain the diversity of the population, the crossover operation is introduced. Binomial crossover strategy is most frequently utilized, which is expressed as (28) where Z G i = [z i,1 , z i,2 , ..., z i,N ] stands for the trial vector. C r ∈ [0, 1] is the crossover rate, which determines how many elements are inherited from the mutant vector. randint(j) is a randomly generated integer of [1, N], which is used to make sure that at least one element of the trial vector is inherited from the mutant vector.
Selection: In the selection procedure, the fitness function of the DEO algorithm is designed according to the control objective. Since our goal is to execute position control of a single-link FJ robot, the cost function detailed in Equation (11) is designed as the fitness function (29) of the DEO algorithm.
The best individual in current population is selected by calculating the fitness function. The selection method is represented by (30) In this paper, we adopt DE/best/2/bin (25) as the mutation progress of the DEO algorithm. The adaptive mutation factor is applied to scale the difference vector, which is described as (31) where F 0 signifies the initial mutation factor. The adaptive mutation factor is 2F 0 , which is a big value at the beginning of the evolution. Then, the diversity of individuals can be maintained, and it benefits for avoiding premature. In the later evolution period, the mutation rate is close to F 0 , the better individual is retained, and the damage of the optimal solution is avoided. In this approach, the probability of searching for the global optimal solution is enhanced. The flow chart of the DEO algorithm is shown in Figure 3. The RNN and DEO based NMPC controller architecture is illustrated in Figure 4. First, we employ the ReLU-RNN described in Section 3.2 to approximate the system dynamic model. Then, the ReLU-RNN predictive model is employed for predicting system forward dynamics, which is capable of integrating into NMPC architecture for designing the NMPC controller. Finally, the DEO algorithm is utilized to optimize the control inputs, only the first term u 0 is applied to the system, and the whole procedure runs as a closed-loop.
Based on the sampled system state information and the control inputs that will be optimized, the predicted position statesx 1 (k + j + 1) will be obtained via the ReLU-RNN predictive model (17). Then, the fitness function (29) is computed, and the system control inputs can be optimized via the DEO algorithm. The process of optimization via DEO algorithm is detailed in Algorithm 1.

Output:
The best f (U G m best ), and the best individual U G m best 1: Initialize parameters: C r , F 0 , and N P 2: Randomly initial population: Evaluate f (U G i ) and select the best individual U G i , i = 1, 2, ..., N P

5:
Let U G best ← U G i 6: Evaluate adaptive mutation factor:

22:
end if 23: end for 24: end if 27: end for 28: end for 29: return The best f (U G m best ), and the best individual U G m best . A corresponding summary of the RNN and DEO based NMPC scheme can be presented as follows: Step1. Obtaining the current system states and the saved history system states information along with system control inputs from the single-link flexible joint robot system. Step2. Based on the system state information, using the ReLU-RNN predictive model to predict future position states with N time steps. Step3. According to the predicted system state information and the designed cost function, using the DEO (Algorithm 1) to solve the NMPC controller. Step4. Applying the first term (u 0 ) of the optimized control inputs to the system until the next time step. Step5. Time step proceeds forward one step (k = k + 1). Then, it updates the saved history system state information, and returns to Step 1.
As usual in stability proofs, we will assume that the ReLU-RNN predicitve model is perfect, so that the predicted and real state trajectories coincide: X(k + j) =X(k + j) if u(k + j) = u (k + i).
Let define where G(X(k), U(k)) = α x 1 (k + j + 1) − x re f (33) With this assumption we have We have assumed that the terminal constraint is satisfied, the optimization problem was assumed to be feasible, so we can make U(k + N) = 0 and stay at X(k) = 0, which gives Since G(X(k), U(k)) ≥ 0, we can conclude that J (X(k + 1), U(k + 1)) ≤ J (X(k), U(k)). Thus, J (X(k), U(k)) is a Lyapunov function, and we conclude by Lyapunov's theorem that the equilibrium X(k) = 0, U(k) = 0 is stable.

Numerical Simulations
In this section, MATLAB 2019b is employed to create numerical simulations to demonstrate the effectiveness and performance of our suggested method. To verify the superiority of this method, the conventional PD and DDP MPC methods were considered comparatives. DDP MPC is an NMPC technique that uses DDP to solve the MPC controller. To achieve a fair comparison, the predictive model and cost function used in DDP MPC were the same as the RNN and DEO based NMPC method, and the parameters of the controllers were carefully adjusted.
The model approximated by ReLU-RNN, which has been described in Section 3.1, was used for designing the NMPC controller. The discretized system dynamic model (5) was simulated in MATLAB, which is regarded as the real system platform. Table 1 lists the parameters of the simulated model.

Parameters
Values Parameters Values The MSE (18) was employed to evaluate the performance of the approximated model. Table 2 shows the model approximation results. The findings revealed that the prediction precision of the learned model was relatively accurate.  Figure 5 displays the progress of the multi-step prediction of the ReLU-RNN predictive model. Correspondingly, Figure 6 shows the absolute errors. The figures show that, even if a forward prediction was 20 time steps, the performance of the ReLU-RNN predictive model was also satisfied, and it could be used to establish an NMPC controller.  The experimental results proposed by [44,62] have shown that the DEO has good convergence properties. To demonstrate the convergence of DEO, the cost values of the optimization process are plotted in Figures 7 and 8. Figure 7 displays the cost values in evolutionary iteration at each time step. It can be seen that the cost value converged to a fixed value after 80 iterations. Figure 8 shows the optimized cost values at the target tracking process. We can see that the cost values converged to a small value with the increase of time step, which indicates that the DEO could solve the proposed controller effectively.   Table 3 indicates the state error of different controllers. The findings illustrate that both controllers could efficiently control the system.    Figure 9 indicates that there were some overshoots and residual vibration in the system response when controlled by the PD and DDP MPC methods. This is due to the existence of an elastic element in the FJ robot, which led to the overshoots and residual vibration being easily inspired. Nevertheless, from Figure 9, we can see that the proposed controller was able to reduce the overshoots and suppress the residual vibration. Table 3 demonstrates that our controller had a certain degree of precision control, and the precision was better than the PD controller. The DDP MPC controller achieved higher precision than our controller, but a closer look at the tracking progress in Figure 9, shows that the tracking process of our controller was smooth, with few overshoots and the vibration was well suppressed. Figure 10 depicts the controller actions. The control signal of the DDP MPC controller fluctuated greatly, the PD controller presented smaller fluctuations, and the proposed controller had the smallest fluctuations. The fluctuations in the controller signal had a great influence on the system, potentially reducing the service life of the robot and even leading to mechanical damage. The influence of controller signal fluctuations was, to some extent, more essential than control precision. It indicates that our strategy was more suitable for FJ robot control.
The need for a closed-loop system is important in the presence of external disturbances. To verify that the proposed controller is robust to external disturbances, we added external disturbances to the system. Figures 11 and 12 show the system responses with external disturbances. As can be seen from Figure 11, the system responded quickly and remained stable. The control performance was also fairly satisfactory. Figure 12 depicts the control actions, which demonstrates that the proposed controller could be solved by the DEO efficiently and it could achieve a good robustness against external disturbances. Based on the above investigation, we conclude that the performance of the RNN and DEO based NMPC method was better than that of the PD and DDP MPC methods. In addition, this method achieved a good robustness against external disturbances. The merit of the proposed method was that not only was the control precision satisfied, but also the overshoots and residual variation were suppressed well.

Conclusions
This work presents an RNN and DEO based NMPC approach for position control of a single-link FJ robot. First, the system dynamic model has been approximated using a simple three-layer ReLU-RNN. Then, according to the RNN predictive model and MPC method, the RNN and DEO based NMPC controller was designed, in which the DEO algorithm was utilized to optimize the control inputs. Finally, through comparative numerical simulations, the effectiveness and performance of the proposed technique have been verified. The simulation findings have shown that the suggested method is superior to that of the PD and DDP MPC methods, which is capable of minimizing overshoots and suppressing residual variation with the control precision satisfied.
The parallel DEO can speed up the optimization process because DEO is a stochastic optimization algorithm that is inherently parallel. In the future, considering the optimization solution time, we will evaluate the RNN and parallel DEO based NMPC approach that can be utilized for implementing real-time NMPC, and it will be further verified by experiments. In addition, we intend to apply it to multi-degree-of-freedom FJ robot applications.