Constrained Optimal Control for Nonlinear Multi-Input Safety-Critical Systems with Time-Varying Safety Constraints

Abstract: In this paper, we investigate the constrained optimal control problem of nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints. By utilizing a barrier function transformation, together with a new disturbance-related term and a smooth safety boundary function, a nominal system-dependent multi-input barrier transformation architecture is developed to deal with the time-varying safety constraints and uncertain disturbances. Based on the obtained transformation system, the coupled Hamilton–Jacobi–Bellman (HJB) equations are established to obtain the constrained Nash equilibrium solution. Because the coupled HJB equations are difficult to solve directly, a single critic neural network (NN) is constructed for each control input to approximate its optimal performance index function. It is proved theoretically that, under the influence of uncertain disturbances and time-varying safety constraints, the system states and neural network parameters are uniformly ultimately bounded (UUB) under the proposed neural network approximation method. Finally, the effectiveness of the proposed method is verified by two nonlinear simulation examples.


Introduction
To solve the optimal control problem of safety-critical systems (e.g., autonomous vehicles and intelligent robots), safety should be the basic requirement. Failure to ensure the safety of such systems may result in serious consequences, such as casualties, environmental pollution, and equipment damage. Safety control design refers to a control strategy that satisfies the safety specification stipulated by the physical or environmental constraints of the system. The barrier function (BF) method [1,2] has been proved to be an effective way to enforce safety constraints or state constraints, and has attracted wide attention in recent years. Optimal control in the modern control domain usually relies on solving the complex Hamilton-Jacobi-Bellman (HJB) equation [3][4][5]. However, the HJB equation is a nonlinear partial differential equation for which an analytical solution is generally unavailable. When designing controllers that are both safe and optimal, the proper combination of safety and performance goals is an issue worth studying.
It has been proved that the dynamic programming (DP) method is a feasible and effective method to solve the HJB equation and derive the optimal solution. However, as the dimension of the variables increases, dynamic programming suffers from the "curse of dimensionality". Adaptive dynamic programming (ADP) [6][7][8][9][10] uses function approximators, such as neural networks (NNs), to approximate the cost function in the HJB equation, which has been proved to be a valid way to overcome the curse of dimensionality. It is an emerging method that combines developments in artificial intelligence and control, and has become a hotspot of international optimization research in recent years [11][12][13][14][15]. In [11][12][13], the authors studied the optimal control problem with disturbance by using the reinforcement learning (RL) method. For random differential equation systems with coexisting parametric uncertainties and severe nonlinearities, Zhang et al. [14] studied the problem of event-triggered adaptive tracking control. Vamvoudakis et al. [15] proposed an online continuous-time learning algorithm based on policy iteration to learn the optimal control solutions of known nonlinear systems. In [16][17][18], the robust control problem was transformed into the optimal control problem of the nominal system by selecting an appropriate utility function. On the other hand, game theory [19][20][21][22][23][24] has become a powerful tool to optimize the coordination and cooperation of multiple controllers, and has proved useful in many practical control problems. In fact, many real-world systems can be viewed as non-zero-sum (NZS) games, where each controller of the system tries to minimize its own cost function. Many researchers translate the non-zero-sum game problem [25,26] into the problem of solving the coupled HJB equations, but solving the coupled HJB equations remains very difficult [27][28][29]. The development of adaptive dynamic programming and game theory has prompted many scholars to conduct relevant research. For robust trajectory tracking control of uncertain nonlinear systems with multiple inputs, Qin et al. [28] proposed a new adaptive online learning method to learn the Nash equilibrium solution. Song et al. [29] developed a non-strategic integral reinforcement learning (IRL) method to effectively solve the NZS game control problem with unknown system dynamics. Ming et al. [30] proposed a single-network adaptive control method to obtain the optimal solution of the NZS differential game for autonomous nonlinear systems. All of the above methods can effectively solve the NZS game optimal control problem. However, few studies have addressed the NZS game with disturbances and time-varying safety constraints, which motivates the present study.
For safety constraints, methods based on barrier functions and adaptive dynamic programming have received a lot of attention in recent years. Marvi et al. [31] proposed a barrier-certified method to learn the safe optimal controller and ensure the operation of the safety-critical system within its safety zone while providing optimal performance. By introducing the barrier function into the utility function, Xu et al. [32] added a penalty mechanism to the utility function and solved the state-constraint problem that was difficult to handle with the traditional ADP method. Liu et al. [33] proposed an adaptive control method to obtain the safe solution of nonlinear stochastic systems. In addition, the barrier function transformation method has been shown to transform a safety-critical system with safety constraints into a general system without constraints in different scenarios, such as the zero-sum game [34], non-zero-sum game [35], tracking control [36], and event-triggered control [37]. However, without exception, the above results rely on the implicit assumption that the safety constraints are constant. In fact, the constant constraint is only a special case of time-varying constraints. In practice, time-varying constraints arise in a wide range of application scenarios, such as a UAV or a manipulator working in a complex environment.
For the constrained optimal control problem with time-varying safety constraints and uncertain disturbances, the constrained Nash equilibrium solutions are obtained by introducing a novel barrier function transformation and constructing coupled HJB equations. The novelty of this paper is reflected in the following points:
(1) A novel barrier function transformation method is proposed by introducing a smooth safety boundary function and a barrier function with a single variable. Compared to previous works [34,35], the proposed method no longer requires the safety constraints to be time-invariant and can deal with both time-invariant and time-varying safety constraints.
(2) In order to obtain the constrained Nash equilibrium solution of the multi-input barrier transformation system with uncertain disturbances, a reasonable performance index function and coupled HJB equations are designed for the nominal system by introducing a disturbance-related term. It is proved that the obtained constrained Nash equilibrium solution can make the safety-critical system asymptotically stable under the uncertain disturbances and time-varying safety constraints.
(3) A single critic neural network is used to approximate the performance index function online to obtain the constrained control input. It is proved theoretically that the proposed barrier function transformation and neural network approximation method make the system state and NN parameters uniformly ultimately bounded (UUB) while the time-varying safety constraints are satisfied. In addition, two simulation examples verify the feasibility and effectiveness of the proposed method.
The remainder of this article is organized as follows: Problem formulation and barrier transformation are given in Section 2. Section 3 employs the coupled Hamilton-Jacobi-Bellman equations to obtain the approximate optimal solution online. Section 4 shows the effectiveness of the proposed method by giving two simulation examples. Finally, conclusions are given in Section 5.

Problem Formulation and Barrier Transformation
Consider the following nonlinear multi-input safety-critical system: where $C$ denotes the set of admissible system states, and $U_1$, $U_2$ denote the sets of admissible control inputs. It is supposed that $f(x)$, $g_1(x)$, and $g_2(x)$ are Lipschitz continuous, and $f(0) = 0$. It is also assumed that the system (1) is stabilizable. The uncertain disturbance term $d$ satisfies $d^T d < \delta^T \delta$, where $\delta$ is a given function with $\delta(0) = 0$, and $\phi(\cdot)$ is a fixed function denoting the uncertainty with $\phi(0) = 0$. Given the initial system state $x_0$, the purpose of this article is to find the constrained control inputs $u_1$, $u_2$ that make the system state $x$ converge to the ideal value under the impact of the uncertain disturbances and time-varying safety constraints.
Remark 1. In some papers, for example [31,35], the system state is constrained by constants, that is, $x \in (\zeta_a, \zeta_A)$, where $\zeta_a$ and $\zeta_A$ represent the lower and upper bounds of the system state. We consider a more complex and interesting case where the system safety constraints are time-varying and can be mathematically expressed as $x \in (\zeta_a(t), \zeta_A(t))$, where $\zeta_a(t)$ and $\zeta_A(t)$ are bounded smooth time-varying functions.
It is worth noting that the constraints given by $(\zeta_a(t), \zeta_A(t))$ can follow many common trajectories, including sinusoidal waveforms, damped sinusoids, ramps, and so on. In our study, we discuss a more useful form. We design the constraints $(\zeta_a(t), \zeta_A(t))$ as the following smooth transformation functions, which satisfy the following conditions: . . . where Similar constraints are imposed in many practical applications (e.g., a vehicle entering a narrow road from a wide road, a drone entering a tunnel, or a robotic arm working in a narrow space).

Remark 2. The parameters of the smooth transformation function can be chosen such that $l_1 = l_2$, $l_3 = l_4$. In that case, the proposed method imposes time-invariant safety constraints on the system state. In addition, the defined smooth transformation function can be extended to scenarios with more complex safety requirements, such as more frequent changes of constraints and different types of constraints.
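The smooth transformation functions and their parameters are not reproduced in the text above. As a minimal sketch of the idea, assuming a tanh-shaped transition between two constant levels (the function name `smooth_boundary` and the parameters `l_before`, `l_after`, `t_switch`, and `width` are hypothetical and are not the paper's $l_1, \dots, l_4$), a bounded smooth time-varying bound could look like this:

```python
import math


def smooth_boundary(t: float, l_before: float, l_after: float,
                    t_switch: float, width: float = 0.5) -> float:
    """Hypothetical smooth safety bound that moves from l_before to l_after.

    The blend factor rises smoothly from 0 to 1 around t_switch, so the
    bound is infinitely differentiable in t and stays between the two levels.
    This is an illustrative assumption, not the paper's exact function.
    """
    blend = 0.5 * (1.0 + math.tanh((t - t_switch) / width))
    return l_before + (l_after - l_before) * blend


if __name__ == "__main__":
    # An upper bound that tightens from 3.0 to 1.0 around t = 5 s.
    for t in (0.0, 4.0, 5.0, 6.0, 10.0):
        print(t, round(smooth_boundary(t, 3.0, 1.0, 5.0), 4))
```

Setting `l_before == l_after` makes the bound constant, which mirrors the observation that time-invariant constraints are a special case of the time-varying ones.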
Considering the system (1) with the uncertain disturbances and time-varying safety constraints, we use the proposed barrier function and smooth transformation function to convert the multi-input safety-critical system with uncertain disturbances and time-varying safety constraints into a transformation system with uncertain disturbances only. We define According to the chain rule and Equations (6) and (7), the transformed system dynamics $\dot{s}$ can be defined as where Based on Formula (8), the transformation system $s = [s_1; \cdots; s_n]$ can be written as where )) and use $s$ to represent $s(t)$ in the following description.
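Equations (6) and (7) are not reproduced above, so the following is only a sketch under a stated assumption: it uses the logarithmic barrier function that is common in barrier-transformation work with constant bounds, evaluated at the current values of the bounds $\zeta_a(t) < 0 < \zeta_A(t)$. The function names `barrier` and `barrier_inverse` are hypothetical.

```python
import math


def barrier(x: float, lower: float, upper: float) -> float:
    """Assumed log barrier: maps x in (lower, upper) to an unconstrained s.

    Requires lower < 0 < upper so that x = 0 maps to s = 0.
    s grows without bound as x approaches either bound.
    """
    if not (lower < 0.0 < upper):
        raise ValueError("bounds must satisfy lower < 0 < upper")
    if not (lower < x < upper):
        raise ValueError("state must lie strictly inside the bounds")
    return math.log((upper / lower) * (lower - x) / (upper - x))


def barrier_inverse(s: float, lower: float, upper: float) -> float:
    """Inverse of `barrier`: any finite s maps back inside (lower, upper)."""
    e = math.exp(s)
    return lower * upper * (1.0 - e) / (upper - lower * e)


if __name__ == "__main__":
    s = barrier(1.5, -1.0, 2.2)
    print(s, barrier_inverse(s, -1.0, 2.2))
```

The point of the construction is visible in the inverse: as long as the transformed state $s$ stays finite, the recovered state stays strictly inside the bounds that were used in the transformation.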
After the proposed barrier transformation, the constrained optimal control problem for the safety-critical system (1) with uncertain disturbances and time-varying safety constraints has been transformed into the constrained optimal control problem for the transformation system (9) with uncertain disturbances only. Before proceeding, we establish the following properties of the transformation system (9).
Theorem 1. Based on the proposed barrier transformation (6) and (7), the transformation system (9) obtained from the system (1) satisfies the following properties: (1) $F(s)$ is Lipschitz with $F(0) = 0$, and satisfies $\|F(s)\| \le \lambda_f \|s\|$, where $\lambda_f$ is a constant; (2) $G_1(s)$, $G_2(s)$ are bounded, and there exist constants $\lambda_{1g}$, $\lambda_{2g}$ such that $\|G_1(s)\| \le \lambda_{1g}$, $\|G_2(s)\| \le \lambda_{2g}$. The transformation system (9) is zero-state observable.

Proof of Theorem 1. (1) Based on Equation (8), we can obtain
where , $F_i(0) = f_i(0) = 0$. Based on Assumption 1, we know that, as long as $x \in C$, the transformation system state $s$ is bounded, that is, $T_i(s)$ is bounded. We can derive where $\lambda_\zeta$ represents the upper bound of $T_i(s)$. Based on the assumptions about the system (1), we can obtain where Based on the property of the barrier function, we can deduce that $s_1$ and $s_2$ are bounded as long as where $k_{L3}$ is the Lipschitz constant of $F(s)$. Based on the Lipschitz condition [38], $F(s)$ is Lipschitz continuous. Based on the boundedness of $T_i(s)$ and the assumptions about system (1), every term in $F_i(s)$ is bounded for $x \in C$. Therefore, $F(s)$ is also bounded, and there is a constant $\lambda_f$ such that $\|F(s)\| \le \lambda_f \|s\|$.
(2) Based on the boundedness of $T_i(s)$ and Equation (8), we can obtain that Given the initial system state $x_0$, the initial state of the transformed system (9) can be obtained from Equation (6), which proves the zero-state observability of the transformed system (9). This completes the proof.
Based on the transformation system, the nominal system of (9) can be defined as The performance index function related to the design of $u_1$ can be defined as where The performance index function related to the design of $u_2$ is defined as where ) is the nonquadratic ) is the nonquadratic penalty function of $u_2$, and hold for any admissible control policies $u_1$ and $u_2$.
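The explicit nonquadratic penalty functions are not reproduced above. A form that is widely used in constrained-input ADP for a scalar input bounded by $\lambda$ is $U(u) = 2\int_0^u \lambda \tanh^{-1}(v/\lambda)\, r \, dv$, which integrates to $2\lambda r u \tanh^{-1}(u/\lambda) + \lambda^2 r \ln(1 - u^2/\lambda^2)$. Treating that form, the weight `r`, and the function names as assumptions, the sketch below checks the closed form against a midpoint-rule integration:

```python
import math


def penalty_closed_form(u: float, lam: float, r: float = 1.0) -> float:
    """Assumed nonquadratic penalty for a scalar input with |u| < lam."""
    if abs(u) >= lam:
        raise ValueError("input must satisfy |u| < lam")
    ratio = u / lam
    return 2.0 * lam * r * u * math.atanh(ratio) + lam * lam * r * math.log(1.0 - ratio * ratio)


def penalty_numeric(u: float, lam: float, r: float = 1.0, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of 2 * integral_0^u lam * atanh(v / lam) * r dv."""
    h = u / steps
    total = 0.0
    for k in range(steps):
        v_mid = (k + 0.5) * h
        total += 2.0 * lam * r * math.atanh(v_mid / lam) * h
    return total


if __name__ == "__main__":
    print(penalty_closed_form(2.0, 3.0), penalty_numeric(2.0, 3.0))
```

For small inputs this penalty behaves like the quadratic cost $r u^2$, and it grows steeply as $|u|$ approaches the bound, which is what discourages the optimizer from saturating the input.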
Based on the performance index functions (15) and (16), the Hamiltonian functions associated with the control inputs $u_1$ and $u_2$ are defined as We define the optimal performance index functions of $u_1$, $u_2$ as Considering the nominal system (14) and Formulas (15) and (16), the constrained optimal control strategies $u_1^*$ and $u_2^*$ can be obtained according to the stationarity condition of optimization: where $V_1^*(s)$ and $V_2^*(s)$ are obtained by solving the following coupled HJB equations: 0) = 0, and there exist two bounded functions $\Gamma_1(s)$, $\Gamma_2(s)$ satisfying $\Gamma_1(s) \ge 0$, $\Gamma_2(s) \ge 0$, and two control laws $u_1$, $u_2$, such that where Then, the transformation system (9) can achieve asymptotic stability under the control laws $u_1$ and $u_2$.
Proof of Lemma 1. We can use the chain rule to obtain According to Formula (26), we can obtain $\dot{V}_1(s(t)) < 0$ for any $s \ne 0$. We can derive that $V_1(\cdot)$ is a Lyapunov function for the transformation system (9), which proves that the transformation system is asymptotically stable. As long as $V_1(\cdot)$ satisfies the condition of Formula (26), it is concluded that the control law $u_1$ can realize the asymptotic stability of the transformation system. Similarly, we can prove that the control law $u_2$ can realize the asymptotic stability of the transformation system.
Lemma 2. Under Assumption 1, if the constrained optimal control problem of the transformation system (9) can be solved by the constrained optimal control laws $u_1$, $u_2$, then the system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$ provided that the initial state $x_0$ of the system (1) satisfies the time-varying safety constraints.
Proof of Lemma 2. Based on Lemma 1, one can obtain $\dot{V}_1(s(t)) \le 0$ and $\dot{V}_2(s(t)) \le 0$, such that According to the properties of the barrier function in Assumption 1, we can derive that the performance index functions $V_1(s(0))$ and $V_2(s(0))$ are finite when the initial value $x_0$ of the safety-critical system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$, and $V_1(\cdot)$, $V_2(\cdot)$ satisfy the condition of Formula (26). That is, the performance index functions $V_1(s(t))$ and $V_2(s(t))$ are finite. Therefore, based on Assumption 1, we obtain This completes the proof.
According to Lemmas 1 and 2, the constrained optimal control laws (22) and (23) can make the safety-critical system (1) with the uncertain disturbances and time-varying safety constraints asymptotically stable based on the proposed barrier transformation and disturbance-related term. Based on (22) and (23), we only need to use the proposed coupled HJB Equations (24) and (25) to obtain the optimal performance index functions, and then obtain the constrained optimal control solution. However, Equations (24) and (25) are often difficult or impossible to solve due to their inherently nonlinear nature. In view of this problem, an approximate structure based on NNs is proposed to learn the solutions of the coupled HJB equations online.
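The control laws (22) and (23) are not reproduced above. When the input penalty is the integral of an inverse hyperbolic tangent, a common assumption in constrained-input ADP, the stationarity condition gives a saturated law of the form $u_j = -\lambda_j \tanh\big(\tfrac{1}{2\lambda_j r_j} G_j^T(s) \nabla V_j(s)\big)$. The sketch below illustrates that form for a single scalar input; the function name, the weight `r`, and the column vector `g_col` are hypothetical.

```python
import math
from typing import Sequence


def constrained_control(grad_v: Sequence[float], g_col: Sequence[float],
                        lam: float, r: float = 1.0) -> float:
    """Assumed saturated control law for one scalar input bounded by lam.

    grad_v: gradient of the value function with respect to the state.
    g_col:  input-gain column G_j(s) evaluated at the same state.
    The tanh keeps the returned input within [-lam, lam].
    """
    inner = sum(g * v for g, v in zip(g_col, grad_v))
    return -lam * math.tanh(inner / (2.0 * lam * r))


if __name__ == "__main__":
    # A very large value gradient drives the input to its bound, not beyond.
    print(constrained_control([1e6, 0.0], [1.0, 1.0], lam=3.0))
    # A small gradient gives approximately the unconstrained law -inner / (2 r).
    print(constrained_control([1e-3, 0.0], [1.0, 1.0], lam=3.0))
```

The design choice is that saturation is built into the law itself, so no separate clipping step is needed after the value function gradient is estimated.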

Approximate Optimal Solution of Coupled Hamilton-Jacobi-Bellman Equations
In this section, an online approximation method is proposed by constructing a single critic network. Based on the universal approximation property of NNs, the optimal performance index functions (20) and (21) and their partial derivatives can be approximated as follows: where the neural network activation function, $\nabla\varphi_j(s)$ represents the partial derivative of $\varphi_j(s)$, $L$ represents the number of hidden-layer neurons, $\varepsilon_j(s)$ represents the NN approximation error, and $\nabla\varepsilon_j(s)$ represents the partial derivative of $\varepsilon_j(s)$.
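Assumption 2. It is assumed that the ideal weights $W_j$ are bounded by constants, i.e., $\|W_j\| \le \lambda_{W_j}$, the neural network approximation residuals satisfy $\|\varepsilon_j\| \le \lambda_{\varepsilon_j}$, $\|\nabla\varepsilon_j\| \le \lambda_{d\varepsilon_j}$, and the neural network activation functions satisfy $\|\varphi_j\| \le \lambda_{\varphi_j}$, $\|\nabla\varphi_j\| \le \lambda_{d\varphi_j}$.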
Based on Formula (30), the Bellman approximation errors of the neural network approximation can be expressed as
Remark 3. The Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$ converge to zero as the number of hidden neurons $L \to \infty$. When $L$ is a fixed constant, the Bellman approximation errors are bounded, i.e., $\varepsilon_{Bj}(s) < \varepsilon_{Bjh}$. In the later proof, we will consider the influence of the Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$.
Since the ideal weights $W_1^*$ and $W_2^*$ are unknown, we use the estimates of the ideal weights to construct the critic neural network: According to Formulas (22), (23) and (32), the approximate optimal control strategies are Substituting (32)-(34) into (18) and (19), the approximate Hamiltonian functions can be obtained The estimates of the ideal weights need to be adjusted so that $\hat{W}_1$ and $\hat{W}_2$ minimize the squared residual error $E = e_1^T e_1/2 + e_2^T e_2/2$. In general, an online adaptive learning algorithm requires a persistence of excitation (PE) condition to achieve convergence. In order to satisfy this condition, we redefine the residual squared error as , where $e_{1l}$, $e_{2l}$ represent the past data with $t_l < t$. We choose the normalized gradient descent algorithm as the tuning laws of the estimates to minimize the residual squared error, where $\alpha_1 > 0$ and $\alpha_2 > 0$ are learning rates that determine the convergence speed of the estimates, σ ) are all obtained by storing the past data.
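The tuning laws (37) and (38) are not reproduced above. As a hedged illustration of a normalized gradient descent update that reuses stored data, assume the Hamiltonian residual of one critic is linear in its weights, $e = \hat{W}^T \sigma + r$, with regressor $\sigma$ and utility $r$, and that each sample contributes $-\alpha\, \sigma e / (1 + \sigma^T \sigma)^2$ to $\dot{\hat{W}}$. The names `residual`, `critic_step`, and the explicit Euler step `dt` are hypothetical choices for this sketch.

```python
from typing import List, Sequence, Tuple

# One stored sample: (regressor sigma, utility r), both assumed known at that time.
Sample = Tuple[Sequence[float], float]


def residual(w: Sequence[float], sigma: Sequence[float], utility: float) -> float:
    """Assumed Hamiltonian residual, linear in the critic weights."""
    return sum(wi * si for wi, si in zip(w, sigma)) + utility


def critic_step(w: Sequence[float], current: Sample, history: Sequence[Sample],
                alpha: float, dt: float) -> List[float]:
    """One Euler step of normalized gradient descent on current plus stored data."""
    grad = [0.0] * len(w)
    for sigma, utility in [current, *history]:
        e = residual(w, sigma, utility)
        norm = (1.0 + sum(si * si for si in sigma)) ** 2
        for i, si in enumerate(sigma):
            grad[i] += si * e / norm
    return [wi - alpha * dt * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    # Two samples whose residuals vanish at w = [1, -2]; neither one alone
    # determines both weights, but together they do.
    samples: List[Sample] = [([1.0, 0.0], -1.0), ([0.0, 1.0], 2.0)]
    w = [0.0, 0.0]
    for _ in range(2000):
        w = critic_step(w, samples[0], [samples[1]], alpha=10.0, dt=0.01)
    print(w)
```

The example shows why stored data helps: the current regressor excites only one direction of the weight space, and the stored sample supplies the missing direction, so the estimate can converge without injecting a persistently exciting signal for all time.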
The weight estimation errors $\tilde{W}_1$ and $\tilde{W}_2$ can be defined as Based on (37)-(39), we have Combined with the previous content, the structure diagram of the proposed multi-input safety-critical system is shown in Figure 1.
Proof of Theorem 2. See Appendix A.

Remark 4. According to the result of Theorem 2, the neural network weight errors are UUB. According to Formulas (33), (34), and (39), we can easily derive that, as That is, the control strategy can be approximately optimal.
Remark 5. Compared with [35], this work considers a more complex and interesting constrained control problem in which the safety constraints change with time. In addition, we establish the coupled HJB equations to obtain the constrained optimal solution, so that the system state converges while the time-varying constraints are satisfied.
Remark 6. In [34,36], the safe optimal control problem with external disturbance is considered, and a control scheme based on barrier transformation is designed. However, the external disturbances considered there are all known. In this work, the safety control problem with uncertain disturbances is further studied, and it is proved that the system state converges under the proposed control strategy.

Simulation
To demonstrate the effectiveness of the proposed method, we give two nonlinear examples with time-varying safety constraints. In both cases, we observe that the system satisfies the time-varying safety constraints.
We define the activation functions as Meanwhile, the critic weight parameters are denoted as The critic parameters after 100 s converge to the value of Ŵ1 = [−0.392 Figure 2 shows that the method using constant constraints satisfies the constant constraints (−1, 2.2), (−2.8, 3) during system state convergence, but cannot satisfy the time-varying constraints $(\zeta_{a1}, \zeta_{A1})$, $(\zeta_{a2}, \zeta_{A2})$. In contrast, the trajectory of the system state $x$ obtained by the proposed method converges to zero while the time-varying safety constraints are satisfied. Figure 3 gives the evolution of the critic parameters for player 1. The evolution of the critic parameters for player 2 is shown in Figure 4. It can be seen that, according to the proposed tuning laws (37) and (38), the critic weight parameters converge to their ideal values. Figure 5 shows the state trajectories of the transformation system (9).

Nonlinear System Example 2
Consider the following nonlinear system of a single-link robot arm: In addition, $x = [x_1, x_2]^T$ is the system state. One selects ], and $\delta(x) = x_1 \sin x_2$. In this example, we apply more complex time-varying safety constraints to the system state, where the upper bounds of $x_1$, $x_2$ change at 3 and 8 s, respectively, and the lower bounds of $x_1$ and $x_2$ change at 3 and 10 s, respectively. Define $\lambda_1 = 3$, $\lambda_2 = 18$ as the bounds of the control inputs. Before 75 s, the persistence of excitation condition is ensured by the probing noise.
We define the activation function as Meanwhile, we denote the critic weight parameters as The critic parameters after 100 s converge to the value of Ŵ1 = [−1.319 0.249 −0.023], Ŵ2 = [0.250 −1.113 0.658].
In Example 2, we further consider the case of input constraints. Figure 6 shows that the method using constant constraints cannot satisfy the time-varying safety constraints $(\zeta_{a1}, \zeta_{A1})$, $(\zeta_{a2}, \zeta_{A2})$ during system state convergence, while the proposed method ensures that the system state $x$ converges under the time-varying safety constraints. The constrained control inputs are shown in Figure 7. The evolution of the critic parameters is given in Figures 8 and 9. The transformation system state trajectories are shown in Figure 10.

Conclusions
For affine nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints, a new adaptive learning algorithm based on the coupled HJB equations was proposed to solve the constrained optimal control problem. In order to satisfy the time-varying safety constraints, the novel barrier function and smooth safety boundary function were used to transform the safety-critical system into a transformation system without the time-varying safety constraints. The proposed barrier function solves the time-varying safety constraint problem, which cannot be solved by the traditional constant-constraint method. The influence of uncertain disturbances on the transformation system was handled by establishing the nominal system and a disturbance-related term. In addition, two critic neural networks were used to learn the optimal solutions of the coupled HJB equations. The effectiveness of this method was verified by theoretical proof. We also tested both the nonlinear robotic arm system and a numerical nonlinear example, and the simulation results verify the effectiveness of the proposed method.

Figure 1. The structure diagram of the proposed multi-input safety-critical system.

Figure 2. Evolution of the state x(t) by using the presented method and the method in [35].

Figure 3. Evolution of the critic estimates for player 1.

Figure 4. Evolution of the critic estimates for player 2.

Figure 5. Transformed system states using the presented method.

Figure 7. Constrained control inputs of player 1 and player 2.

Figure 8. Evolution of the critic estimates for player 1.

Figure 9. Evolution of the critic estimates for player 2.

Figure 10. Transformed system states using the presented method.
Evolution of the state x(t) by using the presented method and the method in [35].