Article

A Nonlinear Finite-Time Robust Differential Game Guidance Law

School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
*
Author to whom correspondence should be addressed.
Sensors 2022, 22(17), 6650; https://doi.org/10.3390/s22176650
Submission received: 3 July 2022 / Revised: 6 August 2022 / Accepted: 16 August 2022 / Published: 2 September 2022
(This article belongs to the Section Navigation and Positioning)

Abstract

In this paper, a robust differential game guidance law is proposed for the nonlinear zero-sum system with unknown dynamics and external disturbances. First, the continuous-time nonlinear zero-sum differential game problem is transformed into solving the nonlinear Hamilton–Jacobi–Isaacs equation, a time-varying cost function is developed to reflect the fixed terminal time, and the robust guidance law is developed to compensate for the external disturbance. Then, a novel neural network identifier is designed to approximate the unknown nonlinear dynamics with online weight tuning. Subsequently, an online critic neural network approximator is presented to estimate the cost function, and time-varying activation functions are considered to deal with the fixed-final-time problem. An adaptive weight tuning law is given, where two additional terms are added to ensure the stability of the closed-loop nonlinear system and to meet the terminal cost at the fixed final time. Furthermore, the uniform ultimate boundedness of the closed-loop system and of the critic neural network weight estimation error is proven based upon the Lyapunov approach. Finally, simulation results are presented to demonstrate the effectiveness of the proposed robust differential game guidance law for nonlinear interception.

1. Introduction

In the modern battlefield, when a missile executes an interception mission, fuel consumption, rapid attitude maneuvers, and other uncertain factors may introduce nonlinear dynamic characteristics into the system. Moreover, external disturbances, which may cause the system to lose control, are important factors in practical applications. Furthermore, intercepting a maneuvering target at a fixed terminal time makes the attack more effective. Therefore, it is of great significance to study the finite-time robust differential game law in nonlinear circumstances with external disturbance.
Traditional interception guidance strategies include proportional navigation guidance (PNG) [1] and augmented proportional navigation guidance (APNG) [2,3]. In these works, the interception problem was simplified into a two-dimensional plane and the target aircraft were assumed to be non-maneuvering. With the development of modern control theory, optimal guidance laws (OGLs), sliding mode control (SMC) guidance laws, and linear quadratic differential game (LQDG) guidance laws have been investigated to solve the missile interception problem. In [4], first-order interceptor dynamics were taken into account, and an optimal guidance law (OGL) was proposed to control the impact time and impact angle. In [5], a combination of a line-of-sight (LOS) rate shaping technique and a second-order SMC was proposed. However, the target position was assumed to be known in advance. In [6], three-party pursuit and evasion guidance strategies were derived for a linear dynamic system, analytical solutions were obtained via the game algebraic Riccati equation (GARE), and LQDG guidance laws were proposed. However, it is difficult, or even impossible, to acquire the analytical solution due to the inherent nonlinearity. To circumvent the inherent nonlinearity of the system, in [7], a state-dependent Riccati equation (SDRE) guidance law was proposed by solving the nonlinear Hamilton–Jacobi–Isaacs (HJI) equation. However, the external disturbance was neglected and it was difficult to find an analytical solution for the Nash equilibrium. Recently, artificial neural networks [8] have been applied to solve the nonlinear HJI equation for an unknown system. However, the finite-horizon optimal guidance law still remains an unsolved problem.
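For reference, the classical PNG law cited above commands a lateral acceleration proportional to the closing velocity and the LOS rate. The following minimal Python sketch illustrates this textbook formula; the numerical values are illustrative assumptions, not taken from the cited works.

```python
def png_acceleration(N, V_c, los_rate):
    """Classical proportional navigation: a_c = N * V_c * sigma, where N is
    the navigation constant (typically 3-5), V_c the missile-target closing
    velocity (m/s), and sigma the line-of-sight angular rate (rad/s)."""
    return N * V_c * los_rate

# Illustrative values: N = 4, closing at 1100 m/s, LOS rotating at 0.02 rad/s
a_c = png_acceleration(4.0, 1100.0, 0.02)  # 88.0 m/s^2, roughly 9 g
```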
To solve the GARE, adaptive dynamic programming (ADP) techniques serve as powerful tools. In [9], a novel online model-free reinforcement learning algorithm was proposed to solve multiplayer non-zero-sum games, and the algebraic Riccati equation (ARE) was solved by an iterative algorithm. However, external disturbances were not considered. Unfortunately, in practical applications, external disturbances always exist. In [10], the authors proposed a data-driven value iteration (VI) algorithm to handle adaptive continuous-time (CT) linear optimal output regulation. The optimal feedback control gain was learned by an online value iteration algorithm. However, the iterative algorithm requires a significant number of iterations within a fixed sampling interval to guarantee the stability of the system. In the above references, whether using the model-free integral RL algorithm or the data-driven value iteration algorithm, all the proposed methods were based on a solvable ARE. However, in most practical applications, external disturbances and parameter uncertainties always exist, in which case the optimal controller cannot be obtained from the ARE.
For other contemporary approaches related to ADP, one can refer to [11,12]. However, these are not applicable to practical applications with unknown or inaccurate system dynamics. The neuro-dynamic programming (NDP) technique is a typical method used to solve the nonlinear HJI equation. In [13], a policy or value iteration-based NDP scheme was proposed to obtain the finite-horizon ε-optimal control for discrete-time nonlinear systems by using offline neural network (NN) training. In [14], an online algorithm based on policy iteration was proposed to attain the synchronous optimal policy over an infinite horizon for nonlinear systems with known dynamics. However, inadequate iterations within a sampling interval may lead to instability. To avoid this problem, in [15], a time-based NDP method was studied, and iteration-based optimal solutions were replaced by using the previous history of system states and cost function approximations. However, this NDP scheme was not suitable for finite-time nonlinear control; furthermore, external disturbances were not considered and only infinite-time optimal control was studied. In [16], an online concurrent learning NDP algorithm was presented to solve the two-player zero-sum game of nonlinear CT systems with unknown system dynamics, and three NN approximators were tuned to learn the value function corresponding to the optimal control strategies. However, two or more NNs led to an increase in computational complexity.
Although the above research has solved the HJB problem to a certain extent, unfortunately, little of this literature addresses the missile–target interception problem. In [17], a novel sliding mode adaptive neural network guidance law was proposed to intercept highly maneuvering targets. To handle the external disturbance caused by the target maneuvering, an RBF neural network was adopted to eliminate estimation errors without prior information about the target. Similarly, in [18], an adaptive NN-based scheme with an estimated cost function was proposed for solving the interception problem of a spacecraft with limited communication and external disturbances, but it did not consider the target's maneuverability. Moreover, in these studies, only the missile interception strategy is considered, and the target control strategy is ignored.
In this paper, the nonlinear system of missile–target engagement with external disturbances is considered, and a time-varying cost function is designed to satisfy the terminal interception time. The uncertain nonlinear two-player zero-sum game is developed via the HJI equation based on differential game theory. The main contributions of this paper include two aspects. First, to solve the external disturbance problem, unlike the work in [14], an extended robust interception guidance strategy is proposed for the missile to intercept the maneuvering target within a fixed final time. Second, two novel NNs are designed: one online NN identifier approximates the unknown nonlinear system, and the other, a critic NN, approximates the cost function without policy or value iterations, while online learning is adopted. Finally, the nonlinear finite-time robust differential game guidance law is proposed.
The advantages of the proposed method of this paper are listed as follows:
(1)
The nonlinear guidance law for missile–target engagement within a fixed interception time is studied by using the differential game theory based on neuro-dynamic programming (NDP). More importantly, a time-varying cost function is reconstructed by a critic NDP with two additional terms added to ensure the stability of the nonlinear system and to meet the fixed interception time, which implies that the missile can intercept the target at different terminal times.
(2)
In practical applications, there are always external disturbances, and a robust interception guidance law is proposed to deal with this problem. Furthermore, inspired by the work in [19], our proposed method extends the controller design to account for the target's strategy.
(3)
Unlike discrete-time methods, our proposed method operates in continuous time. Moreover, compared with the existing work [15,16], a clear advantage of our method is that a simpler critic NN structure is designed; thus, the computational burden is alleviated.
The remainder of this paper is organized as follows. The guidance problem is stated in Section 2. The robust control strategy for the nonlinear system with external disturbances is developed in Section 3, where an online NN identifier and a novel NDP-based approximator are also presented. The stability of the nonlinear system is proved in Section 4. The nonlinear model of the missile–target engagement is established in Section 5, and numerical experiments are carried out in Section 6 to evaluate the performance of the proposed robust differential game guidance strategy. Section 7 presents some conclusions.

2. Problem Formulation

In this paper, the two players are the missile and the target, which are described in detail in Section 5. For the finite-time nonlinear two-player zero-sum differential game, the objective of the missile input $\bar{u}(t)$ is to minimize the cost function, while that of the target input $\bar{w}(t)$ is to maximize the cost function within a specified time. The continuous-time (CT) uncertain nonlinear two-player zero-sum differential game is described by
$$\dot{x}(t) = f(x(t)) + g(x(t))\big(\bar{u}(t) + \bar{d}(x)\big) + k(x(t))\big(\bar{w}(t) + \bar{p}(x)\big)$$
where $x(t) \in \mathbb{R}^n$, $\bar{u}(t) \in \mathbb{R}^m$, and $\bar{w}(t) \in \mathbb{R}^q$ represent the system state vector and the control inputs of the missile and the target, respectively. $f(x) \in \mathbb{R}^n$ denotes the internal system dynamics, and $g(x) \in \mathbb{R}^{n \times m}$ and $k(x) \in \mathbb{R}^{n \times q}$ are the control coefficient matrices, where $f(x)$, $g(x)$, and $k(x)$ are locally Lipschitz. $\bar{d}(x)$ and $\bar{p}(x)$ represent external disturbances, which are bounded by known functions $d_M(x)$ and $p_M(x)$, i.e., $\|\bar{d}(x)\| \leq d_M(x)$ and $\|\bar{p}(x)\| \leq p_M(x)$. Furthermore, we assume that $d(x) = R_1^{-\frac{1}{2}}\bar{d}(x)$ and $p(x) = \gamma\,\bar{p}(x)$, where $R_1$ is a symmetric positive definite matrix and $\gamma > 0$.
The nominal system (without external disturbances) of the system (1) can be described as
$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))w(t)$$
We assume that $f + gu + kw$ is Lipschitz continuous on a set $\Omega$ and that the system (2) is controllable.
Considering external disturbances in the nonlinear system (1), for the nominal system (2), the finite-time two-player zero-sum differential game cost function is defined as
$$V(x, t_0) = \varphi(x(t_f), t_f) + \int_{t_0}^{t_f} r\big(x(t), u(t), w(t)\big)\,dt, \qquad V(x, t_f) = \varphi(x(t_f), t_f)$$
where $r(x(t), u(t), w(t)) = Q(x) + u^TR_1u - \frac{1}{2}\gamma^2w^Tw$, with $Q(x) = d_M^2(x) + R_2p_M^2(x) + x^TQ_1x$; $Q_1$ is a positive semidefinite matrix. The terminal cost, external disturbances, control efforts of the missile and the target, the system state, and the fixed terminal time are chosen as the performance evaluation indicators. $\varphi(x(t_f), t_f)$ reflects the terminal cost between the missile and the target. The term $Q(x)$, which is positive definite, reflects the external disturbances and the system state simultaneously. Moreover, $R_2$ represents the influence of the target disturbance, $R_1$ reflects the missile control effort, and $\gamma$ reflects the target control effort. The parameters $R_1$, $R_2$, and $\gamma$ are all positive definite. The goal of this paper is to find the saddle point of the cost function (3).
Remark 1.
First, unlike the infinite-time scenario, the cost function $V(x, t_0)$ is time-varying, and the terminal cost $\varphi(x(t_f), t_f)$ is needed to guarantee the finite-time scenario. Second, external disturbances are considered in the cost function by adopting a positive constant $R_2$, and the robust control problem can be addressed by designing the finite-time guidance strategy of the nominal system (2).
Assuming that $V(x,t) \in C^1$, an infinitesimal equivalent to (3) can be derived as
$$-\frac{\partial V(x,t)}{\partial t} = r(x, u, w) + \frac{\partial V^T(x,t)}{\partial x}\big(f(x) + g(x)u + k(x)w\big)$$
When $t_0 = t_f$, the terminal cost function can be expressed as
$$V(x, t_f) = \Psi(x(t_f), t_f)$$
The Hamiltonian function of the nonlinear system (2) can be defined as
$$H(x, u, w) = V_t + r(x, u, w) + V_x^T\big(f(x) + g(x)u(t) + k(x)w(t)\big)$$
where $V_t = \frac{\partial V(x,t)}{\partial t}$ and $V_x = \frac{\partial V(x,t)}{\partial x}$. It can be clearly observed that the Hamiltonian function includes a time-dependent term $V_t$.
In the Nash equilibrium theory, the saddle point with respect to the optimal control pair $(u^*(t), w^*(t))$ satisfies
$$H(x, u^*, w) \leq H(x, u^*, w^*) \leq H(x, u, w^*)$$
Recalling classical optimal game theory, both optimal controllers can be obtained from the stationarity conditions $\frac{\partial H(\cdot)}{\partial u} = 0$ and $\frac{\partial H(\cdot)}{\partial w} = 0$, which yields
$$\begin{cases}u^*(x) = -\frac{1}{2}R_1^{-1}g^T(x)V_x^*\\[2pt] w^*(x) = \gamma^{-2}k^T(x)V_x^*\end{cases}$$
where V * ( x , t ) is the optimal two-player zero-sum game cost function, which is the saddle point of the cost function, such that
$$V^*(x,t) = \max_{w(t)}\min_{u(t)}V(x,t), \qquad V^*(x, t_f) = \varphi(x(t_f), t_f)$$
By substituting optimal strategy (8) into Equation (4), the Hamilton–Jacobi–Isaacs (HJI) equation reduces to
$$V_t^* + Q(x) + V_x^{*T}f(x) - \frac{1}{4}V_x^{*T}g(x)R_1^{-1}g^T(x)V_x^* + \frac{\gamma^{-2}}{2}V_x^{*T}k(x)k^T(x)V_x^* = 0$$
Remark 2.
For the linear system case, the HJB equation reduces to the Riccati equation [19]. However, it is difficult or even impossible to attain the mathematical solution of the HJI Equation (10) when the system dynamics contain nonlinear terms. Moreover, the fixed final time $t_f$ imposed on this nonlinear system raises the inadequate-iterations problem; hence, a novel time-based online optimal guidance law design is proposed below, in which the system dynamics are identified online.
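To make the strategy pair (8) concrete, the sketch below evaluates both saddle-point controls for a given value-function gradient. It assumes the $\gamma^{-2}$ reading of $w^*$ adopted above, and all numerical values are illustrative.

```python
import numpy as np

def optimal_strategies(Vx, g, k, R1, gamma):
    """Saddle-point strategies of Eq. (8):
       u* = -1/2 R1^{-1} g(x)^T Vx*,   w* = gamma^{-2} k(x)^T Vx*."""
    u_star = -0.5 * np.linalg.solve(R1, g.T @ Vx)
    w_star = gamma ** (-2) * (k.T @ Vx)
    return u_star, w_star

# Illustrative two-state example with scalar inputs for each player
Vx = np.array([1.0, -0.4])     # assumed value-function gradient V_x*
g = np.array([[0.0], [1.0]])   # missile input channel g(x)
k = np.array([[0.0], [0.8]])   # target input channel k(x)
u_s, w_s = optimal_strategies(Vx, g, k, np.array([[5.0]]), 10.0)
```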

3. The Nonlinear Finite-Time Robust Differential Game Guidance Law

In this part, the nonlinear finite-time robust differential game guidance law is presented. First, to cope with the robust stabilization problem of the system (1) with external disturbances, a robust controller is designed. Then, the finite-time NDP-based optimal guidance strategy is developed.

3.1. Robust Controller Design of Uncertain Nonlinear Differential Games

By extending the work in [20], two feedback gains $\pi_1$ and $\frac{1}{2\pi_2}$ are applied to the optimal feedback controls (8) of the system (2) for the missile and the target, respectively. The robust optimal feedback control is as follows:
$$\begin{cases}\bar{u}^*(x) = \pi_1u^*(x) = -\frac{\pi_1}{2}R_1^{-1}g^T(x)V_x^*\\[2pt] \bar{w}^*(x) = \frac{1}{2\pi_2}w^*(x) = \frac{1}{2\pi_2}\gamma^{-2}k^T(x)V_x^*\end{cases}$$
Here, some lemmas are presented to indicate that the robust optimal control has an infinite gain margin.
Lemma 1.
For the nominal system (2), the optimal control strategy given by (11) can ensure that the closed-loop system is asymptotically stable for $\pi_1 \geq \frac{1}{2}$ and $\pi_2 \geq 1$.
Proof of Lemma 1.
The optimal cost function $V^*(x,t) = J^*(x,t)$ is selected as the Lyapunov function. In light of (3), it is easily found that $V^*(x,t)$ is positive definite. By combining (10) and (11), the derivative of $V^*(x,t)$ along the trajectory of the closed-loop system yields
$$\dot{J}^*(x,t) = V_t^* + V_x^{*T}\big(f(x) + g(x)\bar{u}(t) + k(x)\bar{w}(t)\big) = -Q(x) - \frac{1}{2}\Big(\pi_1 - \frac{1}{2}\Big)\big\|R_1^{-\frac{1}{2}}g^T(x)V_x^*\big\|^2 - \frac{1}{2}\Big(1 - \frac{1}{\pi_2}\Big)\big\|\gamma^{-1}k^T(x)V_x^*\big\|^2$$
Hence, whenever $\pi_1 \geq \frac{1}{2}$, $\pi_2 \geq 1$, and $x \neq 0$, $\dot{J}^*(x,t) < 0$. □
Theorem 1.
For the system (1), there exist two positive gains, $\pi_1^* \geq 1$ and $\pi_2^* \geq \frac{2R_2}{2R_2 - 1}$, with $R_2 > 1$, such that for any $\pi_1 > \pi_1^*$ and $\pi_2 > \pi_2^*$, the robust control (11) ensures that the closed-loop system (1) is asymptotically stable.
Proof of Theorem 1.
The optimal cost function $L(x,t) = V^*(x,t)$ is selected as the Lyapunov function, and the derivative of $V^*(x,t)$ along the trajectory of the closed-loop system (1) can be obtained as
$$\dot{L}(x,t) = V_t^* + V_x^{*T}\big(f(x) + g(x)(\bar{u}(t) + \bar{d}(x)) + k(x)(\bar{w}(t) + \bar{p}(x))\big)$$
Based on (12), (13) can be rewritten as
$$\begin{aligned}\dot{L}(x,t) ={}& -Q(x) - \frac{1}{2}\Big(\pi_1 - \frac{1}{2}\Big)\big\|V_x^{*T}g(x)R_1^{-\frac{1}{2}}\big\|^2 - \frac{1}{2}\Big(1 - \frac{1}{\pi_2}\Big)\big\|V_x^{*T}k(x)\gamma^{-1}\big\|^2 + V_x^{*T}g(x)\bar{d}(x) + V_x^{*T}k(x)\bar{p}(x)\\ \leq{}& -x^TQ_1x - \Big(\frac{1}{2}\Big(\pi_1 - \frac{1}{2}\Big)\big\|V_x^{*T}g(x)R_1^{-\frac{1}{2}}\big\|^2 + \frac{1}{2}\Big(1 - \frac{1}{\pi_2}\Big)\big\|V_x^{*T}k(x)\gamma^{-1}\big\|^2\\ &- \big\|V_x^{*T}g(x)R_1^{-\frac{1}{2}}\big\|d_M(x) - \big\|V_x^{*T}k(x)\gamma^{-1}\big\|p_M(x) + d_M^2(x) + R_2p_M^2(x)\Big)\end{aligned}$$
Let $z = \big[d_M(x),\ p_M(x),\ \|V_x^{*T}g(x)R_1^{-\frac{1}{2}}\|,\ \|V_x^{*T}k(x)\gamma^{-1}\|\big]^T$; then, (14) can be rewritten as
$$\dot{L}(x,t) \leq -x^TQ_1x - z^T\Theta z$$
where $\Theta = \begin{bmatrix} 1 & 0 & -\frac{1}{2} & 0 \\ 0 & R_2 & 0 & -\frac{1}{2} \\ -\frac{1}{2} & 0 & \frac{1}{2}\big(\pi_1 - \frac{1}{2}\big) & 0 \\ 0 & -\frac{1}{2} & 0 & \frac{1}{2}\big(1 - \frac{1}{\pi_2}\big) \end{bmatrix}$.
By the Lyapunov argument, $\Theta \geq 0$ implies $\dot{L}(x,t) \leq 0$, so the closed-loop system is asymptotically stable. It can be concluded that $\pi_1^* \geq 1$ and $\pi_2^* \geq \frac{2R_2}{2R_2 - 1}$ ensure the positive semidefiniteness of $\Theta$. Hence, when $\pi_1 > \pi_1^*$ and $\pi_2 > \pi_2^*$, the closed-loop system is asymptotically stable. □

3.2. Finite-Time NDP-Based Optimal Guidance Strategy

First, a novel online NN identifier is proposed to approximate the unknown system dynamics. Next, a critic NDP-based approximator is utilized to estimate the cost function within a fixed final time, and an online adaptive weight tuning law is proposed with additional terms to guarantee the stability of the nonlinear system. Finally, by combining the identified system and the estimated cost function, the finite-horizon optimal differential guidance strategy is derived.

3.2.1. NN Identifier

System dynamics are necessary for developing guidance laws for nonlinear two-player zero-sum differential games. However, system dynamics may be unknown in practical applications. To overcome this problem, a novel online NN identifier is designed. Based on the NN universal function approximation property, the nonlinear system can be represented as
$$f(x) = W_f^T\sigma_f(x) + \varepsilon_f, \qquad g(x) = W_g^T\sigma_g(x) + \varepsilon_g, \qquad k(x) = W_k^T\sigma_k(x) + \varepsilon_k$$
where $W_f$, $W_g$, and $W_k$ are ideal weight matrices, $\sigma_f(x) \in \mathbb{R}^N$, $\sigma_g(x) \in \mathbb{R}^N$, and $\sigma_k(x) \in \mathbb{R}^N$ denote NN activation function vectors, $N$ is the number of hidden layer neurons, and $\varepsilon_f$, $\varepsilon_g$, and $\varepsilon_k$ represent NN approximation errors.
Then, the nominal system (2) can be represented by using (16) as
$$\dot{x} = f(x) + g(x)u + k(x)w = \begin{bmatrix}W_f \\ W_g \\ W_k\end{bmatrix}^T \begin{bmatrix}\sigma_f(x) & 0 & 0\\ 0 & \sigma_g(x) & 0\\ 0 & 0 & \sigma_k(x)\end{bmatrix}\begin{bmatrix}1\\u\\w\end{bmatrix} + \varepsilon_f + \varepsilon_gu + \varepsilon_kw = W_I^T\sigma_I(x)\bar{\xi} + \varepsilon_I$$
Because the ideal NN weights are typically unknown, we define the state estimator as follows:
$$\dot{\hat{x}} = \hat{W}_I^T\sigma_I(x)\bar{\xi} + K\tilde{x}$$
where $\hat{W}_I$ represents the estimate of $W_I$, $\tilde{x} = x - \hat{x} \in \mathbb{R}^n$ denotes the state estimation error, and $K$ is a design parameter that guarantees the stability of the NN identifier.
From (17) and (18), the derivative of the state estimation error yields
$$\dot{\tilde{x}} = \dot{x} - \dot{\hat{x}} = \tilde{W}_I^T\sigma_I(x)\bar{\xi} + \varepsilon_I - K\tilde{x}$$
Inspired by [9], in order to make the approximated NN identifier weight matrix close to its ideal value, the online tuning law is given by
$$\dot{\hat{W}}_I(t) = -\alpha_1\hat{W}_I(t) + \sigma_I(x)\bar{\xi}\tilde{x}$$
where $\alpha_1 > 0$ is the learning rate of the identifier NN.
Next, defining $\tilde{W}_I = W_I - \hat{W}_I$, the identifier weight estimation error dynamics follow from (20):
$$\dot{\tilde{W}}_I(t) = \alpha_1\hat{W}_I(t) - \sigma_I(x)\bar{\xi}\tilde{x}$$
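A minimal Euler-discretized sketch of the identifier (18) together with the tuning law (20) follows. The flattened regressor `phi` (standing for $\sigma_I(x)\bar{\xi}$) and the sign conventions restored above are assumptions of this reconstruction.

```python
import numpy as np

def identifier_step(W_hat, x, x_hat, phi, K, alpha1, dt):
    """One Euler step of the NN identifier (18) with tuning law (20).
    phi is the combined regressor sigma_I(x) * xi_bar, where
    xi_bar = [1, u, w]^T stacks a bias with both control inputs."""
    x_tilde = x - x_hat                         # state estimation error (19)
    x_hat_next = x_hat + dt * (W_hat.T @ phi + K @ x_tilde)
    # The -alpha1 * W_hat term keeps the estimated weights bounded online
    W_hat_next = W_hat + dt * (-alpha1 * W_hat + np.outer(phi, x_tilde))
    return x_hat_next, W_hat_next
```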
Theorem 2.
For the NN identifier (18), let the ideal identifier NN weight $W_I$ reside in a compact set, and let the weights be tuned by the proposed law (20) with a positive tuning parameter $\alpha_1 > 0$. Then, the identification error (19) and the weight estimation error $\tilde{W}_I(t)$ are uniformly ultimately bounded (UUB) within a fixed terminal time.
Proof of Theorem 2.
Let the Lyapunov function candidate be as follows:
$$J_I = \frac{1}{2}\tilde{x}^T\tilde{x} + \frac{1}{2}\mathrm{tr}\big(\tilde{W}_I^T\tilde{W}_I\big)$$
Then,
$$\dot{J}_I = \tilde{x}^T\dot{\tilde{x}} + \mathrm{tr}\big(\tilde{W}_I^T\dot{\tilde{W}}_I\big) \leq \frac{1}{2}\|\tilde{x}\|^2 + \frac{1}{2}\|\varepsilon_I\|^2 - \lambda_{\min}(K)\|\tilde{x}\|^2 + \alpha_1\mathrm{tr}\big(\tilde{W}_I^TW_I\big) - \alpha_1\mathrm{tr}\big(\tilde{W}_I^T\tilde{W}_I\big) \leq -\Big(\lambda_{\min}(K) - \frac{1}{2}\Big)\|\tilde{x}\|^2 - \frac{\alpha_1}{2}\|\tilde{W}_I\|^2 + \varepsilon_{IM}$$
where $\varepsilon_{IM} = \frac{\alpha_1}{2}\|W_I\|^2 + \frac{1}{2}\|\varepsilon_I\|^2$; the eigenvalues of $K$ and the rate $\alpha_1$ are design parameters that guarantee the stability of the system. Therefore, when $\lambda_{\min}(K) \geq \frac{1}{2}$ and $\alpha_1 > 0$, $\dot{J}_I \leq 0$ whenever one of the following inequalities holds:
$$\|\tilde{x}\| > \sqrt{\frac{\varepsilon_{IM}}{\lambda_{\min}(K) - \frac{1}{2}}} \quad\text{or}\quad \|\tilde{W}_I\| > \sqrt{\frac{2\varepsilon_{IM}}{\alpha_1}}$$
It can be observed from (24) that the bound on $\tilde{x}$ can be decreased by increasing the minimum eigenvalue of $K$. Therefore, $\tilde{x}$ is quantified by selecting $\lambda_{\min}(K)$. According to (19) and the relationship between $\tilde{x}$ and $\tilde{W}_I$, a smaller $\tilde{x}$ forces $\tilde{W}_I$ to converge into a small bound. Therefore, $\dot{J}_I \leq 0$ outside these bounds, and it can be concluded that $\tilde{x}$ and $\tilde{W}_I$ are UUB.
This completes the proof. □

3.2.2. NDP-Based Guidance Strategy

To confront the nonlinear HJI equation, according to the universal approximation property of the neural network, the optimal cost function can be reconstructed by a critic NDP on a compact set as
$$V(x,t) = W_V^Th(x, t_f - t) + \varepsilon_V(x,t)$$
where $W_V$ is the ideal weight matrix, $h(x, t_f - t) \in \mathbb{R}^N$ denotes the NN activation function vector, $N$ is the number of hidden layer neurons, and $\varepsilon_V$ represents the NN approximation error.
Thus, the terminal cost function can be expressed as
$$V(x, t_f) = W_V^Th(x(t_f), 0) + \varepsilon_V(x, t_f)$$
Remark 3.
The critic NN activation function $h(x,t)$ and its gradient $\nabla h(x,t)$ are upper bounded, i.e., $\|h(x,t)\| \leq h_M$ and $\|\nabla h(x,t)\| \leq h_{dM}$, with $h_M$ and $h_{dM}$ positive constants. The critic NN weight $W_V$ is upper bounded, i.e., $\|W_V\| \leq W_{VM}$, with $W_{VM}$ a positive constant. The critic NN approximation error $\varepsilon_V(x)$ and its gradient $\nabla\varepsilon_V(x)$ are upper bounded, i.e., $\|\varepsilon_V(x)\| \leq \varepsilon_{VM}$ and $\|\nabla\varepsilon_V(x)\| \leq \varepsilon_{dVM}$, with $\varepsilon_{VM}$ and $\varepsilon_{dVM}$ positive constants. Obviously, the activation function is time-varying, which accommodates the fixed final time.
Next, the partial derivatives of $V(x,t)$ with respect to $x$ and $t$ can be obtained, respectively, as
$$V_x = h_x^T(x, t_f - t)W_V + \nabla_x\varepsilon_V(x,t), \qquad V_t = h_t^T(x, t_f - t)W_V + \nabla_t\varepsilon_V(x,t)$$
where $h_x(x, t_f - t) = \partial h(x, t_f - t)/\partial x$, $h_t(x, t_f - t) = \partial h(x, t_f - t)/\partial t$, $\nabla_x\varepsilon_V(x,t) = \partial\varepsilon_V(x, t_f - t)/\partial x$, and $\nabla_t\varepsilon_V(x,t) = \partial\varepsilon_V(x, t_f - t)/\partial t$.
Therefore, by substituting (27) into (8), we then obtain the differential game guidance strategy as
$$\begin{cases}u^*(x) = -\frac{1}{2}R_1^{-1}g^T(x)V_x^* = -\frac{1}{2}R_1^{-1}g^T(x)\big(h_x^T(x, t_f - t)W_V + \nabla_x\varepsilon_V(x,t)\big)\\[2pt] w^*(x) = \gamma^{-2}k^T(x)V_x^* = \gamma^{-2}k^T(x)\big(h_x^T(x, t_f - t)W_V + \nabla_x\varepsilon_V(x,t)\big)\end{cases}$$
By substituting (28) into (10), the HJI function can be rewritten as
$$H(x,u,w) = W_V^Th_t(x, t_f - t) + Q(x) + W_V^Th_x(x, t_f - t)f(x) - \frac{1}{4}W_V^Th_x(x, t_f - t)D_1(x)h_x^T(x, t_f - t)W_V + \frac{\gamma^{-2}}{2}W_V^Th_x(x, t_f - t)k(x)k^T(x)h_x^T(x, t_f - t)W_V + \varepsilon_{HJB}(x,t)$$
where $D_1(x) = g(x)R_1^{-1}g^T(x)$ and
$$\begin{aligned}\varepsilon_{HJB}(x,t) ={}& \nabla_t\varepsilon_V(x,t) + \frac{1}{2}W_V^Th_x(x, t_f - t)D_1(x)\nabla_x\varepsilon_V(x,t) - \gamma^{-2}W_V^Th_x(x, t_f - t)k(x)k^T(x)\nabla_x\varepsilon_V(x,t)\\ &- \frac{1}{2}\nabla_x\varepsilon_V^T(x,t)D_1(x)h_x^T(x, t_f - t)W_V - \frac{1}{4}\nabla_x\varepsilon_V^T(x,t)D_1(x)\nabla_x\varepsilon_V(x,t)\\ &+ \gamma^{-2}\nabla_x\varepsilon_V^T(x,t)k(x)k^T(x)h_x^T(x, t_f - t)W_V + \frac{\gamma^{-2}}{2}\nabla_x\varepsilon_V^T(x,t)k(x)k^T(x)\nabla_x\varepsilon_V(x,t)\\ &+ \frac{1}{4}W_V^Th_x(x, t_f - t)D_1(x)\nabla_x\varepsilon_V(x,t) + \nabla_x\varepsilon_V^T(x,t)f(x) - \frac{\gamma^{-2}}{2}W_V^Th_x(x, t_f - t)k(x)k^T(x)\nabla_x\varepsilon_V(x,t)\end{aligned}$$
Because the ideal NN weights are typically unknown, we define the estimated cost function $\hat{V}(x,t)$ as follows:
$$\hat{V}(x,t) = \hat{W}_V^Th(x, t_f - t)$$
The estimated terminal cost function is
$$\hat{V}(x, t_f) = \hat{W}_V^Th(\hat{x}(t_f), 0)$$
where $\hat{W}_V$ denotes the estimate of $W_V$ and $h(\hat{x}(t_f), 0)$ is the activation function evaluated at the estimated terminal state $\hat{x}(t_f)$.
Next, the partial derivatives of the estimated cost function $\hat{V}(x,t)$ with respect to $x$ and $t$ can be obtained, respectively, as
$$\hat{V}_x = h_x^T(x, t_f - t)\hat{W}_V, \qquad \hat{V}_t = h_t^T(x, t_f - t)\hat{W}_V$$
where $\hat{V}_t = \frac{\partial\hat{V}(x,t)}{\partial t}$ and $\hat{V}_x = \frac{\partial\hat{V}(x,t)}{\partial x}$.
Then, by applying (33) to (8), the estimated differential game guidance strategy can be rewritten as
$$\begin{cases}\hat{u}(x) = -\frac{1}{2}R_1^{-1}g^T(x)\hat{V}_x = -\frac{1}{2}R_1^{-1}g^T(x)h_x^T(x, t_f - t)\hat{W}_V\\[2pt] \hat{w}(x) = \gamma^{-2}k^T(x)\hat{V}_x = \gamma^{-2}k^T(x)h_x^T(x, t_f - t)\hat{W}_V\end{cases}$$
By applying (34) to (10), the estimated HJI function yields
$$\hat{H}(x,u,w) = \hat{W}_V^Th_t(x, t_f - t) + Q(x) + \hat{W}_V^Th_x(x, t_f - t)\hat{f}(x) - \frac{1}{4}\hat{W}_V^Th_x(x, t_f - t)\hat{D}_1(x)h_x^T(x, t_f - t)\hat{W}_V + \frac{\gamma^{-2}}{2}\hat{W}_V^Th_x(x, t_f - t)\hat{k}(x)\hat{k}^T(x)h_x^T(x, t_f - t)\hat{W}_V = e_c$$
In order to obtain the optimal differential game guidance strategy, we define the estimated terminal cost error as
$$e_{tf} = \Psi(x(t_f), 0) - \hat{W}_V^Th(\hat{x}(t_f), 0) = W_V^T\tilde{h}(x(t_f), 0) + \tilde{W}_V^Th(\hat{x}(t_f), 0) + \varepsilon_{tf}$$
where $\tilde{h}(x(t_f), 0) = h(x(t_f), 0) - h(\hat{x}(t_f), 0)$.
Moreover, to drive the estimated NN weight $\hat{W}_V$ toward the ideal NN weight $W_V$, and combining the time-varying nature of the cost function with the estimated terminal cost error, the total NN approximation error is defined as
$$E_{total} = \frac{1}{2}e_c^Te_c + \frac{1}{2}e_{tf}^4$$
By using the gradient descent algorithm, a novel simplified weight tuning law is proposed with additional terms to ensure the stability of the nonlinear system (1), as follows:
$$\dot{\hat{W}}_V = -\alpha_2\frac{\hat{\omega} + \frac{1}{2}\mathrm{sgn}(\hat{\omega})}{(1 + \hat{\omega}^T\hat{\omega})^2}e_c - \alpha_3\frac{\hat{\xi} + \frac{1}{2}\mathrm{sgn}(\hat{\xi})}{(1 + \hat{\xi}^T\hat{\xi})^2}e_{tf}^3 + \frac{\alpha_4}{2}h_x^T(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)\hat{k}^T(x)\big)Q(x,t)$$
where $\hat{\omega} = h_t(x, t_f - t) + h_x(x, t_f - t)\hat{f}(x) - \frac{1}{2}h_x(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)\hat{k}^T(x)\big)h_x^T(x, t_f - t)\hat{W}_V$ and $\hat{\xi} = h(\hat{x}(t_f), 0)$.
Remark 4.
It is important to mention that the first term in (38) is used to minimize the squared residual error. The second term is used to minimize the terminal cost estimation error. The last term is used to guarantee that the system states remain bounded.
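A simplified sketch of how (38) acts as normalized gradient descent on $E_{total}$ is given below. For clarity it keeps only the first two (gradient) terms and omits the $\mathrm{sgn}(\cdot)$ robustifying terms and the third stabilizing term, so it illustrates the update direction rather than the full law.

```python
import numpy as np

def critic_update(W_hat_V, omega_hat, xi_hat, e_c, e_tf, a2, a3, dt):
    """Normalized gradient descent on E_total = 1/2 e_c^2 + 1/2 e_tf^4,
    corresponding to the first two terms of the tuning law (38)."""
    m_omega = (1.0 + omega_hat @ omega_hat) ** 2   # normalization, e_c term
    m_xi = (1.0 + xi_hat @ xi_hat) ** 2            # normalization, e_tf term
    dW = -a2 * omega_hat * e_c / m_omega - a3 * xi_hat * e_tf ** 3 / m_xi
    return W_hat_V + dt * dW
```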

4. Stability Analysis

To prove the stability of the weight tuning law and of the nonlinear system under the optimal strategy, and without loss of generality, the weight estimation error of the critic NN is defined as $\tilde{W}_V = W_V - \hat{W}_V$; then, $\dot{\tilde{W}}_V = -\dot{\hat{W}}_V$. Therefore, the approximated HJI equation is
$$\begin{aligned}\hat{H}(x,u,w) ={}& -\tilde{W}_V^Th_t(x, t_f - t) - \tilde{W}_V^Th_x(x, t_f - t)\hat{f}(x) - W_V^Th_x(x, t_f - t)\tilde{f}(x)\\ &+ \frac{1}{2}\tilde{W}_V^Th_x(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)k^T(x)\big)h_x^T(x, t_f - t)\hat{W}_V\\ &+ \frac{1}{4}W_V^Th_x(x, t_f - t)\big(\tilde{D}(x) - 2\gamma^{-2}\tilde{k}^T(x)\tilde{k}(x) - 4\gamma^{-2}\tilde{k}^T(x)\hat{k}(x)\big)h_x^T(x, t_f - t)W_V + \varepsilon_{HJB}\end{aligned}$$
Assumption 1.
For the nonlinear system (1) with the cost function (3) and the optimal guidance law (34), let the value function $V(x,t)$ be continuously differentiable and serve as a Lyapunov function satisfying

$$\dot{V}(x,t) = V_t(x,t) + V_x(x,t)\big(f(x) + g(x)u^* + k(x)w^*\big) < 0$$

It then clearly holds that

$$V_t(x,t) + V_x(x,t)\big(f(x) + g(x)u^* + k(x)w^*\big) < -Q(x,t)$$
Theorem 3.
For the nonlinear system (1) with the ideal HJI Equation (10), let the update laws for the NN-based identifier and the NDP-based cost function approximator be given by (20) and (38), respectively, and let the estimated optimal guidance laws be given by (34). Then there exist positive constants $B_{Qx}$, $B_{WV}$, $B_{\tilde{x}}$, and $B_{WI}$ such that the identification error, the weight estimation errors, and the controller are UUB.
Proof of Theorem 3.
Select the Lyapunov candidate function as
$$J_V = J_{aV} + J_{bV} + J_{cV} + J_{dV}$$
where $J_{aV} = \frac{\alpha_2}{4}(\tilde{x}^T\Lambda\tilde{x})^2 + \frac{\alpha_3}{4}\big(\mathrm{tr}(\tilde{W}_I^T\Lambda\tilde{W}_I)\big)^2$, $J_{bV} = \frac{1}{2}\tilde{W}_V^T\Pi\tilde{W}_V$, $J_{cV} = \alpha_4V(x,t)$, and $J_{dV} = \frac{1}{2}\tilde{x}^T\Xi\tilde{x} + \frac{1}{2}\mathrm{tr}(\tilde{W}_I^T\Lambda\tilde{W}_I)$.
It can be observed that J V > 0 .
First, the derivative of $J_{aV}$ with respect to time is given by
$$\dot{J}_{aV} \leq \Lambda^2\alpha_2(\tilde{x}^T\tilde{x})\tilde{x}^T\dot{\tilde{x}} + \Lambda^2\alpha_3\,\mathrm{tr}\big(\tilde{W}_I^T\tilde{W}_I\big)\,\mathrm{tr}\big(\tilde{W}_I^T\dot{\tilde{W}}_I\big) \leq -\alpha_2\Big(\lambda_{\min}(K_I) - \frac{3}{2} - \frac{\alpha_3}{4\alpha_2}\Big)\Lambda^2\|\tilde{x}\|^4 + \frac{1}{4}\Lambda\|\varepsilon_I\|^2\|\tilde{u}\|^2 - \frac{\alpha_3}{2}\Big(\alpha_I - 2 - \frac{1}{\alpha_2} - \frac{\alpha_3}{4} - \alpha_3\zeta_I^4\Big)\Lambda^2\|\tilde{W}_I\|^4 + \varepsilon_{cM}$$
where $\alpha_2 > 0$, $\alpha_3 > 0$, and $\alpha_I > 4 + \alpha_2 + \frac{\alpha_3}{4} + \alpha_3\zeta_I^4$; $K_I$ satisfies $\lambda_{\min}(K_I) > \frac{3}{2} + \frac{\alpha_3}{4\alpha_2}$; and $\varepsilon_{cM} = \frac{\alpha_3\alpha_I}{4}\Lambda^2\|W_{I,M}\|^4 + \frac{\Lambda^2\alpha_2\|\varepsilon_I\|^2}{4}$.
The derivative of $J_{bV}$ satisfies
$$\begin{aligned}\dot{J}_{bV} ={}& \tilde{W}_V^T\Pi\dot{\tilde{W}}_V \leq \Pi\Big(\frac{\alpha_2\tilde{W}_V^T\hat{\omega}}{(1 + \hat{\omega}^T\hat{\omega})^2}e_c - \frac{\alpha_3\tilde{W}_V^T\hat{\xi}}{(1 + \hat{\xi}^T\hat{\xi})^2}e_{tf} - \frac{\alpha_4}{2}\tilde{W}_V^Th_x^T(x, t_f - t)\big(\hat{D}_1(x) + \hat{k}(x)k^T(x)\big)Q(x,t)\Big)\\ \leq{}& -\frac{\alpha_2\Pi}{8}\frac{\tilde{W}_V^T\hat{\omega}\hat{\omega}^T\tilde{W}_V}{(1 + \hat{\omega}^T\hat{\omega})^2} + \frac{\alpha_2W_{VM}^4\lambda_{\max}^2(R_1)h_{xM}^4\sigma_{I,M}^4}{(1 + \hat{\omega}^T\hat{\omega})^2}\|\tilde{W}_I\|^4 - \frac{\alpha_4\Pi}{2}\tilde{W}_V^Th_x^T(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)k^T(x)\big)Q(x,t) + \varepsilon_{VH}\end{aligned}$$
where
$$0 < \alpha_3 \leq \frac{3\alpha_2\big(\lambda_{\min}(\hat{\xi}\hat{\xi}^T) + \frac{1}{2}\big)\lambda_{\max}^2\Big(h_x(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)k^T(x)\big)h_x^T(x, t_f - t)\Big)}{(1 + \hat{\omega}^T\hat{\omega})^2(1 + \hat{\xi}^T\hat{\xi})^2}$$
and $\varepsilon_{VH} = \frac{\alpha_2\Pi}{2}\frac{\varepsilon_{HJB}^2}{(1 + \hat{\omega}^T\hat{\omega})^2} + 3\big(W_V^T\tilde{h}(x(t_f), 0)\tilde{h}^T(x(t_f), 0)W_V + \varepsilon_V^2(x, t_f)\big)^2 + \frac{15}{2}\Big(\frac{\alpha_3\varepsilon_V^2(x, t_f)}{(1 + \hat{\xi}^T\hat{\xi})^2} + \frac{W_V^T\tilde{h}(x(t_f), 0)\tilde{h}^T(x(t_f), 0)W_V}{(1 + \hat{\xi}^T\hat{\xi})^2}\Big)^2 + \frac{\alpha_2W_{VM}^4\lambda_{\max}^4(R_1)h_{xM}^4\varepsilon_{IH}^4}{2(1 + \hat{\omega}^T\hat{\omega})^2}$. Here, $\lambda_{\min}(R_1)$ and $\lambda_{\max}(R_1)$ are the minimum and maximum eigenvalues of the matrix $R_1$.
Next, we have
$$\dot{J}_{cV} = \alpha_4\big(V_t(x,t) + V_x(x,t)(f(x) + g(x)\hat{u} + k(x)\hat{w})\big) = \alpha_4\Big(V_t(x,t) + V_x(x,t)\Big(f(x) - \frac{1}{2}g(x)R_1^{-1}\hat{g}^T(x)h_x^T(x, t_f - t)\hat{W}_V + \gamma^{-2}k(x)\hat{k}^T(x)h_x^T(x, t_f - t)\hat{W}_V\Big)\Big)$$
Thus,
$$\begin{aligned}\dot{J}_V ={}& \dot{J}_{aV} + \dot{J}_{bV} + \dot{J}_{cV} + \dot{J}_{dV}\\ \leq{}& \alpha_4\Big(V_t(x,t) + V_x(x,t)\Big(f(x) - \frac{1}{2}g(x)R_1^{-1}\hat{g}^T(x)h_x^T(x, t_f - t)\hat{W}_V + \gamma^{-2}k(x)\hat{k}^T(x)h_x^T(x, t_f - t)\hat{W}_V\Big)\Big)\\ &+ \frac{\alpha_2W_{VM}^4\lambda_{\max}^4(R_1)h_{xM}^4\sigma_{I,M}^4}{16(1 + \hat{\omega}^T\hat{\omega})^2}\Lambda\|\tilde{W}_I\|^4 - \frac{\alpha_I}{2}\Xi\|\tilde{W}_I\|^2 + \Xi\varepsilon_{IM} - \frac{1}{2}\Big(\frac{\alpha_2\hat{\omega}^T\hat{\omega}}{(1 + \hat{\omega}^T\hat{\omega})^2} + \frac{\alpha_3\hat{\xi}^T\hat{\xi}}{(1 + \hat{\xi}^T\hat{\xi})^2}\Big)\Pi\|\tilde{W}_V\|^2\\ &- \frac{\alpha_4}{2}\tilde{W}_V^Th_x^T(x, t_f - t)\big(\hat{D}_1(x) - 2\gamma^{-2}\hat{k}(x)k^T(x)\big)Q(x,t) + \varepsilon_{VH} - \alpha_2\Big(\lambda_{\min}\Big(K_I - \frac{3}{2} - \frac{\alpha_3\zeta_I^4}{8\alpha_2}\Big)\Lambda^2\|\tilde{x}\|^4 + \varepsilon_{IM}\Big)\\ \leq{}& -\frac{\alpha_3}{4}\Big(\alpha_I - \frac{3}{2} - \alpha_2 - \alpha_3\zeta_I^4\Big)\Lambda^2\|\tilde{W}_I\|^4 - \Big(\lambda_{\min}(K_I) - \frac{1}{2}\Big)\Xi\|\tilde{x}\|^2 - \frac{\alpha_4}{5}\|Q(x,t)\|^2\\ &- \frac{5\alpha_4(W_{VM}^2h_{xM}^2 + 1)W_{IM}^2\lambda_{\max}^2(R_1)\sigma_{I,M}^4}{16}\|\tilde{W}_I\|^2 - \frac{\alpha_2}{4}\Lambda\frac{\hat{\omega}^T\hat{\omega}}{(1 + \hat{\omega}^T\hat{\omega})^2}\|\tilde{W}_V\|^2 - \frac{\alpha_2}{8}\Lambda\frac{1}{(1 + \hat{\omega}^T\hat{\omega})^2}\|\tilde{W}_V\|^2 + \varepsilon_{TC}\end{aligned}$$
where
$$0 < \alpha_4 \leq \min\Bigg(\frac{5\lambda_{\min}(1 + \hat{\omega}^T\hat{\omega}) + \frac{5}{2}}{32h_{xM}^6Q_{\min}W_{VM}^4\lambda_{\max}^2(R_1)(1 + \hat{\omega}^T\hat{\omega})^2},\ 1\Bigg)$$
$$\Xi = \frac{5\alpha_4(W_{VM}^2h_{xM}^2 + 1)W_{IM}^2\sigma_{I,M}^4\lambda_{\max}^2(R_1)}{4\alpha_I}I$$
$$\Pi = \frac{5}{8}\alpha_2W_{VM}^4\lambda_{\max}^2(R_1)h_{xM}^4I, \qquad \varepsilon_{TC} = \varepsilon_{WM} + \Xi\varepsilon_{IM} + \varepsilon_{cM} + \frac{5\alpha_4}{4}\varepsilon_{xM}^2$$
Here, $\varepsilon_{IM} = \varepsilon_{fM} + \varepsilon_{gM}u_M^* + \varepsilon_{hM}w_M^*$, where $u_M^*$ and $w_M^*$ are the upper bounds of the optimal guidance strategies $u^*$ and $w^*$.
Therefore, when any of the following conditions holds, the derivative of $J_V$ is less than zero:
$$\|Q(x,t)\| > \sqrt{\frac{5\varepsilon_{TC}}{\alpha_4}} = B_{Qx} \quad\text{or}\quad \|\tilde{W}_V\| > \sqrt{\frac{2\varepsilon_{TC}}{\alpha_4\|h_x\|^2}} = B_{WV} \quad\text{or}\quad \|\tilde{x}\| > \sqrt{\frac{\varepsilon_{TC}}{\big(\lambda_{\min}(K_I) - \frac{1}{2}\big)\Xi}} = B_{\tilde{x}} \quad\text{or}\quad \|\tilde{W}_I\| > \sqrt{\frac{16\varepsilon_{TC}}{5\alpha_4(W_{VM}^2h_x^2 + 1)W_{IM}^2\lambda_{\max}^2(R_1)\sigma_{I,M}^2}} = B_{WI}$$
This completes the proof. □
Remark 5.
The eigenvalues of $K$ and the rates $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the tuning parameters that determine the bounds $B_{Qx}$, $B_{WV}$, $B_{\tilde{x}}$, and $B_{WI}$, which quantify the boundedness of the system. In addition, from the proof, we can observe that the estimated optimal guidance laws given in (34) ensure that the system is UUB. Thus, by combining them with the robust optimal feedback control (11), the complete nonlinear finite-time robust differential game guidance law is obtained.

5. Application

A missile–target engagement scenario is considered in this section. The engagement geometry is shown in Figure 1, where the X–Y plane represents the Cartesian reference frame. The variables $V$ and $A$ denote the speeds and the normal accelerations of the missile and the target, respectively. $\alpha$ and $\beta$ denote the flight path angles of the missile and the target, respectively. The variables $r$ and $\theta$ represent the missile–target distance and the line-of-sight (LOS) angle, and the LOS angular rate $\dot{\theta}$ is denoted by $\sigma$. $u$ and $w$ are the control vectors perpendicular to the velocities of the missile and the target, respectively.
The engagement occurs in the terminal guidance phase; all participants are assumed to have constant speed, and the effect of gravity is neglected. The nonlinear kinematics of the missile and the target, in a polar coordinate system, are given by
$$V_r = \dot{r} = V_T\cos(\beta - \theta) - V_M\cos(\alpha - \theta)$$
$$\sigma = \dot{\theta} = \big(V_T\sin(\beta - \theta) - V_M\sin(\alpha - \theta)\big)/r$$
where V r represents the closing velocity.
The first-order dynamics of the missile are considered, and the motions of the missile are as follows:
$$\dot{x}_M = V_M\cos\alpha, \qquad \dot{y}_M = V_M\sin\alpha, \qquad \dot{\alpha} = \frac{a_M}{V_M}, \qquad \dot{a}_M = \frac{u_M - a_M}{\tau_M}$$
where $(x_M, y_M)$ is the position of the missile in the Cartesian reference frame, $a_M$ is the lateral acceleration of the missile, and $\tau_M$ is a time constant.
Similarly, the motion equations of the target are as follows:
$$\dot{x}_T = V_T\cos\beta, \qquad \dot{y}_T = V_T\sin\beta, \qquad \dot{\beta} = \frac{a_T}{V_T}, \qquad \dot{a}_T = \frac{u_T - a_T}{\tau_T}$$
where $(x_T, y_T)$ is the position of the target in the Cartesian reference frame, $a_T$ is the lateral acceleration of the target, and $\tau_T$ is a time constant.
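The first-order-lag motion models above translate directly into a simulation right-hand side. A hedged Python sketch is given below; the state ordering is an assumption made for illustration.

```python
import numpy as np

def engagement_dynamics(state, u_M, u_T, V_M, V_T, tau_M, tau_T):
    """Right-hand side of the planar engagement model above.
    state = [x_M, y_M, alpha, a_M, x_T, y_T, beta, a_T]."""
    x_M, y_M, alpha, a_M, x_T, y_T, beta, a_T = state
    return np.array([
        V_M * np.cos(alpha),      # missile position kinematics
        V_M * np.sin(alpha),
        a_M / V_M,                # missile flight path angle rate
        (u_M - a_M) / tau_M,      # first-order lateral acceleration lag
        V_T * np.cos(beta),       # target position kinematics
        V_T * np.sin(beta),
        a_T / V_T,
        (u_T - a_T) / tau_T,
    ])
```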
To obtain the capture zone, the guidance principle is adopted as follows.
Definition 2.
The zero effort miss distance (ZEMD) is the closest distance between the missile and the target predicted at an instant $t$, assuming that the missile and the target impose no further control and continue their scheduled maneuver strategies from the current time to the endgame. The ZEMD is computed as
$$r_{miss}(t) = \frac{r^2\sigma}{\sqrt{V_r^2 + r^2\sigma^2}}$$
In this scenario, the missile applies its optimal strategy (34) to minimize the ZEMD, while the target applies its optimal strategy (34) to maximize the ZEMD. In addition, from (57), we can observe that if $\sigma$ tends to zero, the ZEMD also tends to zero, provided the closing velocity $V_r$ is less than zero. Therefore, to ensure that the missile successfully intercepts the target, the following two conditions must hold, which will be verified in simulations:
$$\sigma \to 0, \qquad V_r < 0$$
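A small sketch of the ZEMD formula (57), with illustrative numbers, shows how the predicted miss shrinks as $\sigma$ approaches zero while $V_r$ stays negative.

```python
import numpy as np

def zemd(r, V_r, sigma):
    """Zero effort miss distance (57): r^2*sigma / sqrt(V_r^2 + r^2*sigma^2)."""
    return r**2 * sigma / np.sqrt(V_r**2 + (r * sigma)**2)

# Example: r = 2000 m, closing at V_r = -900 m/s, LOS rate 0.01 rad/s
miss = zemd(2000.0, -900.0, 0.01)   # about 44.4 m of predicted miss
```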
Based on the two conditions mentioned above, by choosing $x = [\theta\ \ \sigma]^T$ as the system state and differentiating the LOS rate equation with respect to time, the nonlinear system can be given by
$$\begin{cases}\dot{x}_1 = x_2\\[2pt] \dot{x}_2 = -\dfrac{2V_r}{r}x_2 - \dfrac{\cos(\beta - \theta)}{r}u_M + \dfrac{\cos(\alpha - \theta)}{r}w_T\end{cases}$$
Remark 6.
From (59), it can be observed that when $r$ is close to 0, the nonlinear terms $f(x)$, $g(x)$, and $k(x)$ tend to infinity. This means the model breaks down and the proposed differential game guidance laws no longer apply. Thus, a minimum interception distance $r_{\min}$ should be designed: when the displacement between the missile and the target is less than or equal to $r_{\min} = 0.5$ m, the missile is considered to have completed the interception mission. Moreover, in practical applications, the missile has a nonzero kill radius; thus, this design is reasonable.
Remark 7.
From (59), it can also be found that when $|\beta - \theta| = \pi/2$ and $|\alpha - \theta| = 0$, no matter how the proposed guidance laws change, the nonlinear system is at an unstable equilibrium. Therefore, the domain where the differential game-based guidance laws are applicable is given by
$$\varphi = \big\{x:\ |\beta - \theta| \neq \tfrac{\pi}{2},\ |\alpha - \theta| \neq 0,\ r \neq 0,\ V_r < 0\big\}$$

6. Simulation Results

In this section, several experiments are designed to verify the proposed finite-time robust differential game guidance strategy and to further illustrate the performance of the proposed NDP-based differential game guidance strategy. The initial engagement occurs in the terminal guidance phase. The speeds of the missile and the target are $V_M = 700$ m/s and $V_T = 400$ m/s, respectively. The initial position of the target is (0 m, 0 m), and that of the missile is (2500 m, 0 m). The initial flight path angles of the missile and the target are $\alpha = 70°$ and $\beta = 150°$, respectively. The time constants are $\tau_M = 0.1$ and $\tau_T = 0.1$, respectively. The weight parameters are set as $R_1 = 5$, $R_2 = 50$, $\gamma = 10$, and $t_f = 7$ s, with $Q(x) = 20(x_1^2 + x_2^2 + \tau^2)^2$, where $\tau = t_f - t$.
Furthermore, the initial weights of the identifier NN are selected randomly inside $[0, 1]$, and the identifier activation function is designed as $\sigma_I(x) = [1\ \ x_1^2\ \ x_1x_2\ \ x_2^2\ \ x_1^3\ \ x_1^2x_2\ \ x_1x_2^2\ \ x_2^3\ \ x_1^4\ \ x_1^3x_2\ \ x_1^2x_2^2\ \ x_1x_2^3\ \ x_2^4]^T$, following [18,21]; such polynomial basis functions can better approximate the nonlinear system. The initial parameters of the critic NN are $w_V = [10\ 10\ 10\ 10\ 10\ 10\ 10\ 10]^T$, and the critic NN activation function vector for estimating the cost function is designed as

$$h(x, t_f - t) = \big[x_1^2e^{\tau}\ \ x_2^2e^{\tau}\ \ x_1x_2\tau\ \ x_1^4e^{\tau}\ \ x_2^4e^{\tau}\ \ x_1^3x_2\ \ x_1^2x_2\ \ x_1x_2^2\big]^T$$
The learning rates are $\alpha_1 = 0.01$, $\alpha_2 = 0.05$, $\alpha_3 = 0.55$, and $\alpha_4 = 0.15$, respectively. The experiments are performed on a PC platform with an i7-9750H CPU, using Matlab 2020b.
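For reproducibility, the critic activation vector above can be written as a small function; the $x_2^4e^{\tau}$ reading of the fifth basis element follows the reconstruction adopted above.

```python
import numpy as np

def critic_activation(x1, x2, t, t_f):
    """Time-varying critic activation h(x, t_f - t) used in the simulations."""
    tau = t_f - t
    return np.array([
        x1**2 * np.exp(tau), x2**2 * np.exp(tau), x1 * x2 * tau,
        x1**4 * np.exp(tau), x2**4 * np.exp(tau),
        x1**3 * x2, x1**2 * x2, x1 * x2**2,
    ])
```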

6.1. Effect of the Proposed Finite-Time Differential Game Guidance Strategy without Unknown Uncertainties

In this case, the performance of the proposed finite-time differential game guidance strategy is verified without unknown uncertainties, and both the missile and the target choose the optimal guidance strategy (34). The ZEMD and the lateral accelerations of the agents are the performance evaluation indicators, with the following physical meanings: a ZEMD tending to zero indicates that the missile intercepts the target successfully, while lateral accelerations limited to the range ±100 g and changing smoothly indicate that the agents can be reasonably controlled. The simulation results for this engagement scenario are shown in Figure 2, Figure 3 and Figure 4.
Figure 2a shows the trajectories of the missile and the target under the proposed finite-time differential game guidance strategy (34). Figure 2b presents the change in the relative distance between the missile and the target, and it can be observed that the ZEMD is less than 0.5 m. By combining (a) and (b), the results reveal that the missile intercepts the target successfully.
Both the angular rate and the relative velocity are presented in Figure 3. From Figure 3a, it can be observed that the angular rate reaches zero at about 4.5 s, which drives the ZEMD to zero according to Equation (57) and Figure 2b; $\sigma$ tending to zero ensures that the missile intercepts the target. Furthermore, the negative $V_r$ in Figure 3b guarantees that the relative distance between the missile and the target tends to zero. The sharp changes in $\sigma$ and $V_r$ near the end are reasonable given the system characteristics. These observations also verify the interception conditions $\sigma \to 0$ and $V_r < 0$ stated above.
Figure 4 presents lateral acceleration curves of the target and the missile. It can be observed that both lateral accelerations are maintained within a reasonable range, which can ensure that the missile intercepts the target. However, due to the system characteristics, the lateral accelerations decrease sharply at the end of the engagement.
The convergence curves of the critic NN weights are presented in Figure 5, and it can be observed that the critic NN weights finally converge to stable values. The results reveal that the critic NN weights can guarantee the stability of the closed-loop nonlinear system.

6.2. Engagement without Unknown Uncertainties for a Maneuvering Target

To further verify the effectiveness of the proposed guidance strategy against other forms of target maneuvers, the following experiment is presented. In this experiment, the target performs a sine-wave maneuver with a magnitude of 10 g, and the missile still selects the guidance law (34). The simulation results for this engagement scenario are shown in Figure 6, Figure 7 and Figure 8.
Figure 6a shows the engagement trajectories of the missile confronting the target performing a sine-wave maneuver under the proposed guidance law (34). Figure 6b presents the change in the relative distance between the missile and the target, and it can be observed that the final ZEMD is less than 0.5 m. The results reveal that the missile intercepts the maneuvering target successfully.
Curves of the angular rate and the range rate are presented in Figure 7. The results support the same conclusion as in case 1.
Figure 8 presents the lateral acceleration curves of the target and the missile. It can be observed that both lateral accelerations are maintained within a reasonable range, which can ensure that the missile intercepts the target. Furthermore, compared with Figure 4a, due to the maneuvering target, higher acceleration demands of the missile are needed.

6.3. Engagement with Unknown Uncertainties for a Maneuvering Target

To further verify how the proposed finite-time robust guidance strategy deals with unknown uncertainties, the following experiment is presented. In this experiment, the missile selects the robust differential game guidance law (11), and the target performs a sine-wave maneuver with a magnitude of 10 g. Furthermore, external disturbances, uniformly distributed between −0.2 and 0.2, are applied to both input vectors. The simulation results for this engagement scenario are shown in Figure 9, Figure 10 and Figure 11.
Figure 9a shows the trajectories of the missile confronting the target performing a sine-wave maneuver under the proposed guidance law. Figure 9b presents the change in the relative distance between the missile and the target, and it can be observed that the miss distance is less than 0.5 m. By combining (a) and (b), the results reveal that the missile intercepts the maneuvering target successfully.
Curves of the angular rate and the range rate are presented in Figure 10. The results support the same conclusion as in case 1.
Figure 11 presents the acceleration demands and the control requirements of the target and the missile. It can be observed that the lateral accelerations are maintained within a reasonable range, which can ensure that the missile intercepts the target. Furthermore, even when the missile experiences external disturbances, the proposed robust differential game guidance strategy can successfully intercept maneuvering targets.

6.4. Comparison of the Proposed Robust Optimal Differential Game Guidance Strategy with Other Methods

To confirm the advantage of the proposed robust optimal differential game guidance law, we offer a comparative experiment. In this experiment, the target selects the proposed robust optimal differential game guidance strategy (11), while the missile chooses, in turn, the proposed robust differential game guidance strategy (11), the OGL in [4], and the conventional differential game guidance law (CDGGL) in [18]. To further illustrate the advantages of the proposed differential game guidance strategy, the control effort $J$ is defined as $J = \int_0^{t_f}u^T(\tau)u(\tau)\,d\tau$. The comparison results for this engagement scenario are shown in Figure 12.
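In discrete simulation, this effort metric can be approximated from the logged acceleration commands; a minimal trapezoidal-rule sketch (sampling details assumed) is:

```python
import numpy as np

def control_effort(u_history, dt):
    """Approximate J = integral of u(t)^T u(t) dt with the trapezoidal rule.
    u_history: array of shape (num_samples, input_dim), dt: sample period."""
    integrand = np.sum(np.asarray(u_history) ** 2, axis=1)  # u^T u per sample
    return float(np.trapz(integrand, dx=dt))
```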
The engagement trajectories and the control effort of the missile are presented in Figure 12a. It can be observed that the OGL, the CDGGL, and the proposed robust optimal differential game guidance strategy can all successfully intercept the target, with miss distances equal to 0.985, 1.218, and 0.0024 m, respectively. Furthermore, the curves of the missile control effort are presented in Figure 12b. The proposed robust optimal differential game guidance strategy requires the smallest control effort, the CDGGL in [18] the second smallest, and the OGL in [4] the largest. All control efforts are limited to the set range (100 g). More importantly, it can also be observed that, before the missile intercepts the target, the control effort of our method remains the smallest at every time instant; for example, at t = 4.8 s, the control effort of our method is minimal. Thus, a missile using our method to intercept the target saves more energy. Furthermore, the effort of the OGL in [4] is larger than that of the proposed robust optimal differential game guidance strategy, which means that the missile may fail to intercept the target when using the OGL in some acceleration-limited scenarios. In general, our proposed guidance law is superior to the OGL in [4] and the CDGGL in [18]. To further illustrate this superiority, the energy consumption and the simulation time are compared, averaged over 100 experiments. The comparison results are shown in Table 1.
From Table 1, it is easily found that less control effort is needed by our proposed method compared with the OGL in [4] and the CDGGL in [18], which implies that our proposed guidance law can reduce unnecessary energy consumption; the missile can intercept the target with minimal control. Moreover, the computation time of our proposed method is shorter than that of the OGL in [4] and the CDGGL in [18], which means that our proposed method can compute the confrontation strategy in the shortest time. In general, our proposed guidance law can intercept targets with minimal control effort and a fast response.

7. Conclusions

In this paper, a finite-time robust differential game law for the nonlinear two-player zero-sum game is proposed for unknown system dynamics with external disturbance. The robustness and optimality of the guidance strategy are proven under a time-varying cost function. To handle the fixed terminal time constraint, the HJI equation is solved by a critic NN with time-varying activation functions, and extra weight tuning terms are introduced to guarantee the stability of the closed-loop system and the minimization of the HJI approximation error. The NN identifier estimates the nonlinear system with an online tuning law, which is subsequently utilized in the finite-time guidance strategy design. The proposed scheme yields an online guidance strategy design that enjoys great practical benefits. Finally, in the missile–target engagement experiments, the missile with the proposed differential game guidance law successfully intercepts targets performing different forms of maneuvers, and all the results verify the theoretical claims.

Author Contributions

Conceptualization, A.X. and Y.C.; methodology, A.X.; software, A.X.; validation, A.X.; formal analysis, A.X.; investigation, A.X. and Y.C.; resources, Y.C.; data curation, A.X.; writing—original draft preparation, A.X.; writing—review and editing, A.X.; visualization, A.X.; supervision, Y.C.; project administration, Y.C.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China under grant (2018YFB1700100).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, Y.; Li, X.; Zhang, H.; Cai, M.; He, F. Data-Driven Method for Impact Time Control Based on Proportional Navigation Guidance. J. Guid. Control Dyn. 2022, 43, 955–966. [Google Scholar] [CrossRef]
  2. Franzini, G.; Tardioli, L.; Pollini, L.; Innocenti, M. Visibility Augmented Proportional Navigation Guidance. J. Guid. Control Dyn. 2018, 41, 987–995. [Google Scholar] [CrossRef]
  3. Ghosh, S.; Ghose, D.; Raha, S. Capturability of Augmented Pure Proportional Navigation Guidance Against Time-Varying Target Maneuver. J. Guid. Control Dyn. 2014, 37, 780–787. [Google Scholar] [CrossRef]
  4. Chen, X.; Wang, J. Optimal-control based guidance law to control both impact time and impact angle. Aerosp. Sci. Technol. 2019, 84, 454–463. [Google Scholar] [CrossRef]
  5. Harl, N.; Balakrishnan, S.N. Impact Time and Angle Guidance with Sliding Mode Control. IEEE Trans. Control Syst. Technol. 2012, 20, 1436–1449. [Google Scholar] [CrossRef]
  6. Re, R.; Rp, R. Three-Party Differential Game Theory Applied to Missile Guidance Problem. In Differential Game Theory with Applications to Missiles and Autonomous Systems Guidance; John Wiley & Sons: Hoboken, NJ, USA, 2017; p. 102. [Google Scholar]
  7. Bardhan, R.; Ghose, D. Intercepting maneuvering target with specified impact angle by modified SDRE technique. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; pp. 4613–4618. [Google Scholar]
  8. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinite control. Automatica 2007, 43, 473–481. [Google Scholar] [CrossRef]
  9. Xin, X.; Tu, Y.; Stojanovic, V.; Wang, H.; Shi, K.; He, S.; Pan, T. Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems. Appl. Math. Comput. 2022, 1, 126537. [Google Scholar] [CrossRef]
  10. Jiang, Y.; Gao, W.; Na, J.; Zhang, D.; Hämäläinen, T.T.; Stojanovic, V.; Lewis, F.L. Value iteration and adaptive optimal output regulation with assured convergence rate. Control Eng. Pract. 2022, 121, 105042. [Google Scholar] [CrossRef]
  11. Xu, Z.; Li, X.; Stojanovic, V. Exponential stability of nonlinear state-dependent delayed impulsive systems with applications. Nonlinear Anal. Hybrid Syst. 2021, 42, 101088. [Google Scholar] [CrossRef]
  12. Gao, W.; Jiang, Z.-P. Adaptive Optimal Output Regulation of Time-Delay Systems via Measurement Feedback. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 938–945. [Google Scholar] [CrossRef]
  13. Wang, F.Y.; Jin, N.; Liu, D.; Wei, Q.L. Adaptive dynamic programming for finite horizon optimal control of discrete-time nonlinear systems with ε-error bound. IEEE Trans. Neural Netw. 2010, 22, 24–36. [Google Scholar] [CrossRef]
  14. Vamvoudakis, K.G.; Lewis, F.L. Online Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration. Int. J. Robust Nonlinear Control 2012, 22, 1460–1483. [Google Scholar] [CrossRef]
  15. Dierks, T.; Jagannathan, S. Online Optimal Control of Affine Nonlinear Discrete-Time Systems with Unknown Internal Dynamics by Using Time-Based Policy Update. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1118–1129. [Google Scholar] [CrossRef] [PubMed]
  16. Yasini, S.; Sistani, M.; Karimpour, A. Approximate Dynamic Programming for Two-player Zero-sum Game Related to H∞ Control of Unknown Nonlinear Continuous-time Systems. Int. J. Control Autom. Syst. 2015, 13, 99–109. [Google Scholar] [CrossRef]
  17. Cheng, P.; Wang, H.; Stojanovic, V.; He, S.; Shi, K.; Luan, X.; Liu, F.; Sun, C. Asynchronous fault detection observer for 2-D Markov jump systems. IEEE Trans. Cybern. 2021, 1, 32021. [Google Scholar] [CrossRef] [PubMed]
  18. Sun, J.; Liu, C.; Ye, Q. Robust differential game guidance laws design for uncertain interceptor-target engagement via adaptive dynamic programming. Int. J. Control 2017, 5, 990–1004. [Google Scholar] [CrossRef]
  19. Xie, H.; Wu, B.; Liu, W. Adaptive Neural Network Model-based Event-triggered Attitude Tracking Control for Spacecraft. Int. J. Control Autom. Syst. 2021, 19, 172–185. [Google Scholar] [CrossRef]
  20. Wang, D.; Liu, D.; Li, H.; Ma, H. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 2014, 28, 167–179. [Google Scholar] [CrossRef]
  21. Xu, H. Finite-horizon near optimal design of nonlinear two-player zero-sum game in presence of completely unknown dynamics. J. Control Autom. Electr. Syst. 2015, 36, 361–370. [Google Scholar] [CrossRef]
Figure 1. Missile–target engagement geometry.
Figure 2. Engagement scenario for the proposed guidance strategy.
Figure 3. Curves of the angular rate and the relative velocity for the proposed guidance strategy.
Figure 4. Acceleration demand of the missile and the target for the proposed guidance strategy.
Figure 5. Convergence curves of critic NN weights.
Figure 6. Engagement scenario for maneuvering target.
Figure 7. Curves of the angular rate and the relative velocity for a maneuvering target.
Figure 8. Acceleration demand of the missile and the target for a maneuvering target.
Figure 9. Engagement scenario for the target’s maneuverability.
Figure 10. Curves of the angular rate and the relative velocity with unknown uncertainties.
Figure 11. Acceleration demands of the missile and the target.
Figure 12. Engagement scenario and the control effort of the missile.
Table 1. Comparison results of OGL, CDGGL, and our method.

| Method | Control Effort | Computation Time (s) | Number of Tests |
|---|---|---|---|
| OGL in [4] | 980.56 | 1.647 | 100 |
| CDGGL in [18] | 570.64 | 1.083 | 100 |
| Our method | 218.845 | 0.451 | 100 |