Optimal Cooperative Guidance Laws for Two UAVs Under Sensor Information Deficiency Constraints

This paper presents closed-form optimal cooperative guidance laws for two UAVs under information constraints that achieve a required relative approach angle. The two UAVs cooperate to optimize a common cost function under a coupled constraint on their terminal velocity vectors and an information constraint that defines the availability of sensor information. To handle the information constraint, a general two-player partially nested decentralized optimal control problem is considered in the continuous finite-horizon time domain. It is shown that, under the state-separation principle, the optimal solution of the decentralized control problem can be obtained by solving two centralized subproblems: a prediction problem for the information-deficient player and a prediction-error minimization problem for the player with full information. Based on the solution of the decentralized optimal control problem, explicit closed-form cooperative guidance laws that can be efficiently implemented on conventional guidance computers are derived. The performance of the proposed guidance laws is investigated in both centralized and decentralized cooperative scenarios with nonlinear engagement kinematics of networked two-UAV systems.


Cooperative Control of Networked Systems
Cooperative control problems in networks of multiple autonomous agents have received considerable attention in civilian and military applications. This is due to the advantages that swarms of multi-agent systems bring and the growing interest in understanding the tactical hunting behaviors of animal groups, which realize greater efficiency and operational capability. Especially for cooperative missions of multiple unmanned aerial vehicles (UAVs), cooperative control techniques can be used to improve operational performance and survivability, as well as to greatly reduce the overall effort that would previously have been required to operate multiple agents independently for attack or surveillance missions. For instance, cooperative attack techniques have been devised as countermeasures against formidable defense systems [1][2][3], and cooperative surveillance techniques have been adopted in various applications in order to broaden the time and space coverage of monitoring and detection [4][5][6].
To achieve high-level autonomy for cooperative UAVs, one of the fundamental capabilities is to approach the destination with relative geometric constraints (e.g., terminal time and angle). For instance, terminal time and angle constraints are the fundamental components that enable cooperative loitering munitions to saturate and penetrate defense systems [2]. They also play a key role in cooperative surveillance missions by enhancing the observability of the target. In decentralized systems, the agents have their own processing units and make their own decisions based on their own measurements [27]. Although the optimal solution of centralized control problems is well known, it is hard to obtain the optimal solution of decentralized problems. The well-known example of Witsenhausen [28] showed that the optimal controller of a decentralized system with feedback is generally nonlinear and computationally intractable. Ho and Chu [29] developed a class of information structures for decentralized control problems, called partially nested, for which the optimal controller is linear. A broader class of problems, called quadratically invariant, was developed by Rotkowitz et al. [30]; it includes partially nested systems and has the property that the set of closed-loop maps is convex [31]. Explicit optimal solutions of quadratically invariant problems have been obtained in various search spaces. Swigart et al. [32] attained an explicit state-space solution in the discrete finite-horizon time domain, using a spectral factorization approach and dynamic programming. Kim and Lall [33][34][35][36] attained explicit solutions in the continuous infinite-horizon time domain by defining a unifying condition that splits the decentralized optimal control problem into multiple centralized problems. However, to apply the decentralized solution to cooperative guidance laws given as polynomial functions of time-to-go, it is necessary to obtain an explicit solution in the continuous finite-horizon time domain.

Contributions of This Paper
This paper aims to obtain explicit guidance laws that control the relative approach angle of two UAVs under nested dynamical structures with sensor information constraints. The relative approach angle constraints and the information constraints are considered, respectively, to enhance the observability of networked two-UAV systems on the target and to cover failure or security issues in networked systems. The centralized and decentralized optimal guidance solutions of networked two-UAV systems are derived in the continuous finite-horizon time domain. The key difference from conventional LQR-based optimal guidance laws is that our approach considers the information deficiency constraints between the two UAVs, i.e., the first UAV's sensor measurement is available to the second UAV (via communication networks or by direct measurements), so the second UAV can use the first UAV's information for cooperation, while the first UAV is not able to use the second UAV's measurements. Please note that conventional LQR frameworks are not able to handle these information constraints. Motivated by [35], the state-separation principle is proposed, which enables separation of the decentralized control problem into multiple centralized problems. Based on the optimal solution of the decentralized control problem, the explicit closed-form cooperative guidance solutions are derived in terms of the line-of-sight angles and line-of-sight angle rates, so they can be easily implemented on typical guidance computers. To the best of the authors' knowledge, this is the first attempt to describe a closed-form cooperative guidance solution that explicitly minimizes a finite-horizon linear quadratic objective function under information constraints. Finally, the solutions are converted to a guidance form with a cost function incorporating the terminal velocity constraint.
The remainder of this paper is organized as follows: the two-UAV cooperative engagement geometry is presented first, and the optimal control problem under the nested dynamical structures with information constraints is formulated in Section 2. Section 3 shows the solution and the proofs of the decentralized control problem, followed by the derivation of the cooperative guidance laws. In Section 4, numerical simulation results of networked two-UAV systems are presented. The concluding remarks are given in Section 5.

Problem Statement
Let us consider the planar homing guidance geometry of two UAVs and a stationary (or slowly moving) target, as shown in Figure 1. The X I − Y I frame is an inertial Cartesian coordinate system fixed in space. Variables associated with the i-th UAV for i = 1, 2 and the target are denoted by subscripts i and T. Here V, γ and λ denote the velocity, flight-path angle and line-of-sight (LOS) angle, respectively. The normal acceleration of each vehicle is denoted by a, and the predetermined relative approach angle constraint of the two UAVs is denoted by θ ref . The relative distance between each UAV and the target along the Y I axis is denoted by ζ i . Other variables in Figure 1 are self-explanatory. Figure 1. Two UAVs in cooperative engagement geometry.

Kinematics Relations for Cooperative Engagement
The kinematics of each UAV for the homing problem can be expressed in vector form as follows: where the variables in bold font represent values in the X I − Y I coordinate frame. Here î x and î y are the unit vectors of the X I and Y I axes, and the velocities of each UAV and the target are assumed to be constant. Under this assumption, the constant closing velocity V c and the interception time can be calculated as where R i (t) is the nominal range-to-go of the i-th UAV at time t, and t 0 is the initial time.
The homing kinematics in Equation (1) is clearly nonlinear and must be linearized in order to derive guidance laws based on linear quadratic (LQ) optimal control theory. For this purpose, the near-collision-course assumption, which is valid for small γ t and λ i , is used. Then the linearized kinematics of the two UAVs can be expressed in state-space form as follows: with the process noise w which covers the wind gust or the target maneuver a t . Please note that this is a straightforward two-agent extension of the widely used linearization techniques for classical optimal guidance problems [37][38][39].
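Although Equation (3) is not reproduced here, the structure of such linearized models can be sketched as follows; the double-integrator form, the variable names, and the decoupled block layout are illustrative assumptions rather than the paper's exact matrices.

```python
import numpy as np

# Hedged sketch of a linearized two-UAV engagement model under the
# near-collision-course assumption: each UAV's lateral displacement zeta_i
# behaves as a double integrator driven by its normal acceleration a_i.
# The block layout (dynamically decoupled UAVs, coupled only through the
# shared cost) is an illustrative assumption, not the paper's Equation (3).
def linearized_two_uav_model():
    A1 = np.array([[0.0, 1.0],      # d(zeta)/dt = zeta_dot
                   [0.0, 0.0]])     # d(zeta_dot)/dt = a (plus noise w)
    B1 = np.array([[0.0],
                   [1.0]])
    A = np.block([[A1, np.zeros((2, 2))],
                  [np.zeros((2, 2)), A1]])
    B = np.block([[B1, np.zeros((2, 1))],
                  [np.zeros((2, 1)), B1]])
    return A, B
```

In this sketch the coupling between the UAVs enters only through the common cost on terminal velocities, not through the dynamics themselves.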

Nested Dynamical Systems with Sensor Information Deficiency Constraints
Let us consider the nested dynamical structures for two UAVs, as shown in Figure 2, which implies that the dynamical interactions are directional. Without loss of generality, we consider two interconnected linear systems with nested dynamics as follows: Notice that the two-UAV system model in Equation (3) can be described in this form. For generality, the word 'player' is used instead of 'UAV' from here on, until the optimal strategies are derived.
In the nested dynamics framework described above, x i and u i denote the state variable and control input of the i-th player, respectively. Please note that the above equations imply that the first player's state and decision affect the second player, while the second player's do not affect the first player. Also, the first player's sensor measurement is available to the second player (via communication networks or by direct measurements), so the second player can use the first player's information for its control (u 2 depends on y 2 and y 1 ), while the first player is not able to use the second player's measurements (u 1 depends on y 1 only). Here, we assume that each player's sensor measures its state directly, i.e., C = I, and therefore y i = x i for i = 1, 2. Define the random variables of the initial states, which are mutually independent, with the following known probability density functions: and the process noises are assumed to be stationary zero-mean Gaussian, characterized by the following covariance matrices: The set of available information for each player is defined as follows: which gives the control strategies as: where f i (z i ) is a function or a dynamical system that describes the controller of the i-th player.
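The nested structure and the information constraint described above can be sketched as follows; the numerical values are placeholders, and only the block-triangular sparsity and the argument lists of the controllers reflect the model.

```python
import numpy as np

# Illustrative sketch of the nested structure in Equation (4): the system
# matrix is block lower triangular, so player 1 drives player 2 but not
# vice versa. Values are placeholders; only the sparsity pattern matters.
n1, n2 = 2, 2
A = np.block([[np.eye(n1),        np.zeros((n1, n2))],   # player 1: autonomous
              [np.ones((n2, n1)), np.eye(n2)        ]])  # player 2: driven by 1

# Information constraint (with C = I, so y_i = x_i):
def u1(y1):        # player 1 may use only its own measurement
    return -y1
def u2(y1, y2):    # player 2 may use both measurements
    return -(y1 + y2)
```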
The performance index in finite-horizon quadratic form, which couples the states x 1 and x 2 , is defined as follows: where the weighting matrices H and Q are positive semidefinite and R is positive definite, as follows: Then the optimization problem describing the cooperation of the two players can be stated as follows: Problem 1. (Two-player LQR) For the following two-player nested dynamical systems model, and find the optimal strategies u 1 and u 2 that minimize the finite-horizon quadratic cost

State Separation
For the centralized linear quadratic regulation problem, it is well known that full-state feedback is the optimal strategy, i.e., every player needs access to all information. In the decentralized control problem, however, this is impracticable because of the information asymmetry; player 1 does not have measurement information on player 2. Therefore, the state-separation principle is proposed, which separates player 2's state (x 2 ) into two parts: (1) player 1's best estimate of player 2's state (x 2|1 ), and (2) the remainder (x 2 − x 2|1 = ∆x 2 , i.e., the estimation error). The accessible information with information set z i can be written as the conditional expectation E [x|z i ], as follows: • For player 1: where • For player 2: The subscript j|i indicates the state of player j estimated by player i. Notice that player 1 must estimate the state of player 2, defined by x 2|1 , and the best estimate is given by the dynamical propagation ẋ 2|1 , which player 1 can compute using only player 1's measurement information and player 2's control strategy (it is assumed that each collaborator's control strategy is known to the other). The definition of the estimation error of x 2|1 and the corresponding control input u 2|1 = f 2 (x 2|1 ) are as follows: where the estimation error dynamics is given by: Substituting (13) into the system model in Equation (4), the system can be rewritten with the newly defined separated state x dec as follows: and
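The open-loop prediction that player 1 runs for player 2's state can be sketched as follows; the matrices, the linear gain K2, and the Euler integration are illustrative assumptions.

```python
import numpy as np

# Sketch of player 1's prediction x_{2|1} of player 2's state: with no
# measurement of x2 available, the best estimate is propagated through the
# known model, driven by player 1's own state and the agreed strategy
# u_{2|1} = f2(x_{2|1}). The matrices and the gain K2 are placeholders.
def propagate_estimate(x1_traj, x2_1_init, A21, A22, B2, K2, dt):
    x2_1 = x2_1_init.copy()
    history = [x2_1.copy()]
    for x1 in x1_traj:
        u2_1 = -K2 @ x2_1                                  # known strategy
        x2_1 = x2_1 + dt * (A21 @ x1 + A22 @ x2_1 + B2 @ u2_1)
        history.append(x2_1.copy())
    return np.array(history)
```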

Uncorrelated Variance Propagation
In this subsection, it is shown that the variances of the state accessible to player 1 and of the estimation error propagate independently; this fact is used to derive two independent subproblems that give insight into the optimal strategy. The initial probability density functions in Equation (5) are assumed to be known to both players, and the initial probability density functions of x 2|1 and ∆x 2 are as follows: giving the structured covariance matrix of x dec (0) as follows: Consider a block 3 × 3 covariance matrix H structured as follows: where H 1 is block 2 × 2 (size compatible with A) and H 2 is block 1 × 1 (size compatible with A 22 ). Suppose an arbitrary feedback gain matrix C dec has the same structure as H; then P propagates with the following Lyapunov equation: where A dec and B dec are the system and input matrices in Equation (16). In addition, the autocorrelation matrix of the process noise, Q, is: hence A dec , B dec , C dec , and Q have the same structure as H, and consequently the covariance matrix P(t) also follows the same structure, as below: Therefore, {x 1 , x 2|1 } and {∆x 2 } are uncorrelated and independent if C dec follows the same structure as H.
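The structured-covariance claim can be checked numerically as follows; the matrices below are arbitrary placeholders with the stated block-diagonal pattern, not the paper's A dec, B dec, or Q.

```python
import numpy as np

# Numerical illustration of the structured-covariance claim: when the
# closed-loop matrix and the noise covariance are block diagonal in the
# ({x1, x_2|1}, {dx2}) partition, the Lyapunov propagation
#   Pdot = Acl P + P Acl^T + Q
# keeps the cross-covariance block identically zero.
def propagate_cov(Acl, Q, P0, dt, steps):
    P = P0.copy()
    for _ in range(steps):
        P = P + dt * (Acl @ P + P @ Acl.T + Q)
    return P

Acl = np.diag([-1.0, -2.0, -3.0])   # block diag: {x1, x_2|1} and {dx2}
Qn  = np.diag([0.1, 0.1, 0.2])
P0  = np.diag([1.0, 1.0, 0.5])
P   = propagate_cov(Acl, Qn, P0, dt=1e-3, steps=1000)
# The cross block of P stays zero: {x1, x_2|1} and {dx2} remain uncorrelated.
```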

Cost Separation
The cost function in Equation (9) can be rewritten by substituting the state with the separated state in Equation (15), which yields where the state variables and the control variables are given by , and H dec , Q dec , R dec are given by: which allows Problem 1 to be rewritten using the following state-separated model.

Problem 2.
(Decentralized two-player LQR) Consider the separated system model and find the optimal strategies u dec = u T where u 1 (t) and u 2|1 (t) are functions of y 1 (τ) for 0 ≤ τ ≤ t, and ∆u 2 (t) is a function of y 1 (τ) and y 2 (τ) for 0 ≤ τ ≤ t.
Based on the fact that {x 1 , x 2|1 } and {∆x 2 } are independent, Equation (24) can be separated into three parts following the structures of H.
• Part 1 : • Part 2 : • Part 3 : The costs J 1 and J 2 represent the performance indices of the strategies on {x 1 , x 2|1 } and {∆x 2 }, respectively, while J 3 describes the cross-coupling of {x 1 , x 2|1 } and {∆x 2 }. Suppose the weighting matrices H dec and Q dec are symmetric; then J 3 can be simplified as where u 2|1 and ∆u 2 are linear in x 2|1 and ∆x 2 , respectively. Since {x 1 , x 2|1 } and {∆x 2 } are independent random variables with the initial probability density functions given in Equations (5) and (19), and their variances propagate with the Lyapunov equation in Equation (22), the expected values of the linear combinations of {x 1 , x 2|1 } and {∆x 2 } can be derived as follows: and Therefore, J 1 and J 2 are functions of independent variables and consequently J 3 = 0. Then the minimum of J is given by the sum of the minima of J 1 and J 2 as follows: Now the optimal solution of Problem 2 can be obtained by solving two independent subproblems, one solved by each player: the leader and the follower.

Subproblem 1. (Leader problem) Consider the separated model for the leader
where the accessible information for the leader is limited to Find the optimal strategies u 1 and u 2|1 that minimize the cost where u 1 (t) and u 2|1 (t) are functions of y 1 (τ) for 0 ≤ τ ≤ t.

Subproblem 2. (Follower problem) Consider the separated model for the follower
where the follower has access to all information Find the optimal strategy ∆u 2 that minimizes the cost where ∆u 2 (t) is a function of y 1 (τ) and y 2 (τ) for 0 ≤ τ ≤ t.

Main Results
In this section, the explicit optimal control strategy for Problem 1 is obtained. Then, based on this solution, the centralized and decentralized optimal two-agent cooperative guidance laws are derived.

Decentralized Two-Player Optimal Controller
Lemma 1. Consider Subproblem 1 and suppose that there exists a stabilizing solution X of the Riccati equation Ẋ with the terminal condition where A, B are the system and input matrices in Equation (17) and H is the terminal weighting matrix defined in Equation (26).
Then the optimal strategies for Subproblem 1 are as follows: and x 2|1 evolves with the following dynamics:

Lemma 2. Consider Subproblem 2 and suppose that there exists a stabilizing solution of the corresponding Riccati equation with the terminal condition Then the optimal strategy for Subproblem 2 is as follows:

Proof. (for both Lemma 1 and Lemma 2) For Subproblems 1 and 2, all the necessary information is accessible to each player as follows: therefore, each subproblem is a centralized problem with full information, for which the optimal strategy is the well-known full-state feedback obtained by solving the Riccati equations in Equations (33) and (37). For more technical details, including solution uniqueness issues, one may refer to [33] or [35].

Theorem 1. The optimal strategies for Problem 1 are where K and L are obtained from Equations (36) and (40), and the state prediction x 2|1 is obtained from

Proof. The proof follows from Lemma 1, Lemma 2 and the definitions of ∆x 2 and ∆u 2 as follows: which achieves the minimum cost given in Equation (32).
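The finite-horizon Riccati machinery underlying the lemmas can be sketched numerically as follows; the backward Euler integration and the example matrices are illustrative assumptions, not the paper's A dec, B dec.

```python
import numpy as np

# Minimal sketch of finite-horizon LQR: integrate the Riccati ODE
#   -Xdot = A^T X + X A - X B R^-1 B^T X + Q
# backwards from the terminal weight X(tf) = H, and read off the feedback
# gain K(t) = R^-1 B^T X(t).
def finite_horizon_lqr_gains(A, B, Q, R, H, tf, steps):
    dt = tf / steps
    X = H.copy()
    Rinv = np.linalg.inv(R)
    gains = []
    for _ in range(steps):
        Xdot = A.T @ X + X @ A - X @ B @ Rinv @ B.T @ X + Q
        X = X + dt * Xdot               # Euler step, backwards in time
        gains.append(Rinv @ B.T @ X)
    return gains[::-1]                  # reorder so gains[0] is K(0)

# Example: double integrator with a terminal-state weight only.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
gains = finite_horizon_lqr_gains(A, B, np.zeros((2, 2)), np.eye(1),
                                 100.0 * np.eye(2), tf=1.0, steps=1000)
```

The resulting gains are time-varying and grow as the terminal time approaches, which is the behavior the time-to-go-scheduled guidance laws in the next section inherit.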

Optimal Cooperative Guidance Laws for Two UAVs Relative Approach Angle Control
In this section, the main results in Theorem 1 are used to derive the optimal cooperative guidance solutions for two UAVs under information constraints and relative approach angle constraints. The solutions are expressed in terms of line-of-sight parameters, so that they can be easily understood by the aerospace guidance community and efficiently implemented on conventional guidance computers.

Decentralized Solution with Information Deficiency Constraints
Based on Theorem 1, the decentralized solution of the leader problem can be obtained from Equation (62) as follows: and the follower problem is the well-known optimal rendezvous problem, whose solution is given as: which can be rewritten in terms of the line-of-sight angles as follows: and a 2 (t) = a 2|1 (t) + ∆a 2 (t) Please note that the decentralized optimal strategy for agent 1 can be computed using τ 1 and λ 1 only, while the optimal strategy for agent 2 requires the measurement information from both UAVs.
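As a concrete reference point, the classical energy-optimal rendezvous feedback has the time-to-go-scheduled form below; interpreting it as acting on the follower's deviation from the leader's prediction is our reading of ∆a 2, and the paper's exact gains in LOS variables may differ due to the approach-angle weighting.

```python
# Classical energy-optimal rendezvous feedback (standard result): drives both
# the position error dy and the velocity error dv to zero at the final time.
# Here dy, dv stand for UAV 2's deviation from the leader's prediction
# (an illustrative interpretation, not the paper's exact expressions).
def rendezvous_correction(dy, dv, t_go):
    return -6.0 * dy / t_go**2 - 4.0 * dv / t_go
```

Closing the loop with this command on a double integrator drives both errors to zero as the time-to-go expires, mirroring how the correction command vanishes at the terminal time in the simulations.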

Potential Issues on Practical Implementation
The cooperative guidance laws proposed in this paper consider a matched time horizon for both UAVs. However, the time horizons can differ between the agents because of different launch conditions, uncertainties in the dynamical model, or external disturbances. To manage this issue, the cost function in Equation (45) can be extended for different time horizons as follows: where the leader is assumed to intercept the target earlier than the follower does: Before the time-to-go of the follower reaches the initial time-to-go of the leader, V 1 is assumed to be 0, which yields α = 0. Therefore, for 0 ≤ t ≤ t f ,2 − t f ,1 , the follower is guided with the proportional navigation (PN) guidance law in the centralized case and with u 2|1 in the decentralized case, computed as follows: For t ≥ t f ,2 − t f ,1 , the proposed guidance laws are used, where the leader predicts V c,2|1 , λ 2|1 and λ̇ 2|1 , which depend on the time-to-go of the follower, using the dynamical propagation in Equation (12), whereas the follower obtains the necessary information via communication networks or by direct measurements.
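The follower's pre-cooperation phase uses plain proportional navigation, which can be sketched as follows; the navigation gain N = 3 is a typical textbook value assumed here, not one specified by the paper.

```python
# Proportional navigation command: acceleration proportional to the closing
# velocity times the line-of-sight rate. N = 3 is an assumed typical gain.
def pn_command(N, V_c, lam_dot):
    return N * V_c * lam_dot
```

For example, with a closing velocity of 300 m/s and a LOS rate of 0.01 rad/s, a gain of 3 commands roughly 9 m/s^2 of lateral acceleration.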

Numerical Experiments
In this section, the performance of the proposed guidance strategies is investigated on networked two-UAV systems. We consider the planar nonlinear kinematics for the UAVs as follows: where the positions of the i-th UAV in the X I − Y I frame from Figure 1 are denoted by x i , y i in this section, and the other notation follows Section 2. The process noise w i can be interpreted as a disturbance that includes the wind gust acting on the i-th UAV or the target maneuver. Please note that this model has been used and proven useful in a wide range of literature on classical optimal guidance problems [1,3,16,17,35,37].
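A minimal integration step for such constant-speed planar kinematics can be sketched as follows; the Euler scheme and the signature are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the planar nonlinear engagement kinematics used in the
# simulations: constant speed V, with the flight-path angle steered by the
# normal acceleration command plus a disturbance sample w.
def step_uav(x, y, gamma, V, a_cmd, w, dt):
    x     += dt * V * np.cos(gamma)
    y     += dt * V * np.sin(gamma)
    gamma += dt * (a_cmd + w) / V    # flight-path-angle rate
    return x, y, gamma
```

In the stochastic scenario, w would be drawn each step from the disturbance distribution (a zero-mean Gaussian in the paper's setup).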
In this example, the two UAVs are launched from different locations with an identical launch angle of 10 deg and are guided to approach a stationary target with approach angles separated by 30 deg. Please note that no explicit approach angle command was given to either UAV. For each UAV, the maximum available guidance command is assumed to be 3 g, where g denotes the gravitational acceleration. The initial states of the UAVs and the target are listed in Table 1.
Both the centralized and decentralized cooperation scenarios are considered here. In the centralized cooperation scenario, the deterministic UAV dynamics in Equation (70), i.e., w i = 0, is considered for reference analysis. The two UAVs are assumed to share perfect information on τ i , λ i and λ̇ i for i = 1, 2. In the decentralized cooperation scenario, the directional information constraint is considered, i.e., the first UAV's sensor measurement is available to the second UAV while the second's measurement is not available to the first, and the stochastic dynamics is considered, where w i ∼ N (0, 0.5 2 ) disturbs the dynamics and trajectories. Our main interest is in whether the proposed decentralized cooperation scheme lets the two UAVs approach the target with the required relative approach angle while compensating for the perturbations caused by the random disturbances w i . Figure 3 presents the trajectories obtained by the centralized guidance strategy. It is apparent that the guidance strategy enforces the required relative approach angle of 30 deg with a small error of 0.15 deg while both UAVs precisely approach the target. Although both UAVs are launched in the same direction, UAV 1 approaches the target from above while UAV 2 approaches from below to satisfy the relative approach angle constraint. The impact times of the two UAVs are 10.12 s and 11.95 s, respectively. The detailed results are listed in Table 2. Figure 4 shows the acceleration commands during the engagement. The guidance command for UAV 1 decreases to a negative value, which allows UAV 1 to approach the target from above and satisfy the relative angle constraint. The guidance command for UAV 2 increases from a negative value to a positive one so that it approaches the target from below and satisfies the relative approach angle constraint.
Please note that UAV 2 is guided with the PN guidance law for the first 1.95 s, during which a matched time-to-go with UAV 1 does not exist in this engagement scenario. After that, UAV 2 cooperates with UAV 1 over the matched time horizon.

Decentralized Cooperation under Information Deficiency
Figures 5 and 6 present the decentralized cooperation scenario. In this case, UAV 1 estimates the state of UAV 2 by dynamic propagation, and UAV 2 issues a correction command in order to minimize the error in UAV 1's estimate. Figure 5 presents the trajectories obtained by the decentralized guidance strategy, where the thin black line (UAV 2|1) represents UAV 1's best estimate of UAV 2's position. It is evident that UAV 2's position converges to that of UAV 2|1, which allows the relative approach angle to be maintained at a level similar to the centralized cooperation. The detailed results are listed in Table 3. Figure 6 shows the acceleration commands during the engagement, where the thin black line (UAV 2|1) represents UAV 1's best estimate of UAV 2's guidance command. UAV 2's guidance command remains slightly less than that of UAV 2|1 until UAV 2's correction command increases sufficiently, and it then becomes greater than the estimated value in order to make up for the trajectory perturbations caused by the disturbance. The guidance command of UAV 2 converges to that of UAV 2|1 at the terminal time, which indicates that the correction command converges to zero. Figure 7 shows the error in UAV 1's estimate of UAV 2's line-of-sight angle and line-of-sight rate. The last plot in Figure 7 shows the corresponding correction command computed by UAV 2. The estimation errors increase under the influence of the disturbance, but they quickly converge to zero, which indicates that the decentralized guidance strategy tracks the centralized guidance strategy.
Please note that the fluctuating guidance commands observed as the time-to-go approaches zero are common in most classical and practical guidance laws [1,3,16,17,35,37]. Figure 6. Decentralized cooperation: guidance commands. The thin black line (UAV 2|1) represents UAV 1's best estimate of UAV 2's guidance command, and the dotted blue line (UAV 2) represents the actual guidance command of UAV 2 as the sum a 2 = a 2|1 + ∆a 2 .

Conclusions
In this study, optimal cooperative guidance laws for two UAVs under sensor information deficiency constraints and relative approach angle constraints are proposed. A general decentralized optimal control problem is formulated with nested dynamics and an information structure in which the communication between the UAVs is directional. The optimal control problem is solved by adopting the state-separation principle, which separates the decentralized optimal control problem into two centralized optimal control subproblems. The solution of the first subproblem accounts for the information deficiency of the leader, which must act on its accessible information alone. The solution of the second subproblem accounts for the additional effort of the follower, which tries to minimize the prediction error of the leader. The optimal cooperative guidance solutions are derived in terms of the line-of-sight angles and line-of-sight angle rates, so that they can be easily understood by the guidance community and easily implemented on conventional guidance computers.
Based on the proposed optimal cooperative guidance solution, the centralized and decentralized cooperative guidance laws are derived in closed form. The two UAVs cooperate to optimize a common objective function that couples their vertical velocity components at the terminal states. The performance of the proposed guidance strategies is investigated with nonlinear kinematics in both centralized and decentralized cooperation setups. In the centralized cooperation setup, a deterministic scenario is considered, while a stochastic scenario is considered for the decentralized cooperation setup. The simulation results show that the guidance strategy enforces the required relative approach angle constraints and that the decentralized guidance strategy converges to the centralized one as the follower supports the prediction of the leader.
Future research directions may include extensions to general n-UAV cooperative guidance solutions or other cooperative mission objectives. Other realistic constraints such as collision avoidance, communication range, communication delay, and so on, should also be taken into account for practical implementation of the proposed approach.

Conflicts of Interest:
The authors declare no conflict of interest.