Article

Cooperative Decisions of a Multi-Agent System for the Target-Pursuit Problem in Manned–Unmanned Environment

1 China North Artificial Intelligence & Innovation Research Institute, Beijing 100072, China
2 Collective Intelligence & Collaboration Laboratory (CIC Lab), Beijing 100072, China
3 College of Artificial Intelligence, Nankai University, Tianjin 300071, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(17), 3630; https://doi.org/10.3390/electronics12173630
Submission received: 7 July 2023 / Revised: 13 August 2023 / Accepted: 23 August 2023 / Published: 28 August 2023
(This article belongs to the Special Issue New Technologies and Applications of Human-Robot Intelligence)

Abstract: With the development of intelligent technology, multi-agent systems have been widely applied in military and civilian fields. Compared to a single platform, multi-agent systems can complete more dangerous, difficult, and heavy tasks. However, due to the limited autonomy of unmanned platforms and the regulatory needs of personnel, multi-agent systems that cooperate with manned platforms to perform tasks have been more widely promoted at this stage of development. This paper addresses a differential game method for the cooperative decision-making of a multi-agent system cooperating with a manned platform on the target-pursuit problem. The manned platform pursues the target along a certain trajectory, and its state can be obtained by the multi-agent system. Firstly, for the case where the target moves along a fixed trajectory, the target-pursuit problem in a manned–unmanned environment is formulated as a game based on a communication graph among the agents. Secondly, strategies for all agents are proposed that maintain their group cohesion; a set of coupled differential equations is solved to implement the strategy calculation. Compared to purely unmanned systems, the strategies combine the advantages of the manned platform and add a reference term, which allows team cohesion to be achieved relatively quickly. Furthermore, a brief analysis is given for scenarios where the target is in other states of motion or adopts other strategies. Finally, comparative simulations verify the effectiveness and synergy of the strategy.

1. Introduction

Multi-agent systems have attracted wide attention with the development of unmanned systems and intelligent technology [1,2]. Due to their advantages, such as autonomy, loose coupling, high fault tolerance, and scalability, multi-agent systems are widely used in fields such as formation control [3], intelligent logistics [4], and collaborative search [5], among others. The cooperative decision-making of multi-agent systems focuses on the interactive behaviors among agents as well as the information exchanged between agents and the environment; the agents ultimately cooperate or compete with each other to accomplish tasks. In recent years, game theory has become a hot research topic because of its ability to model strategic interactions among agents, which can involve both cooperation and competition [6,7,8].
Due to the limitations in autonomy and intelligence of current unmanned systems, in many complex real-world scenarios, tasks are often executed through the collaboration of manned platforms and multiple unmanned platforms [9,10]. This collaborative approach allows for the full utilization of the strengths of both manned and unmanned platforms, addressing challenges that unmanned systems alone may struggle to tackle. Moreover, the participation of manned platforms can enhance task execution efficiency, success rate, and safety, thereby reducing people’s concerns about responsibility, decision-making, safety, and other regulatory aspects in multi-agent systems to some extent [11]. Therefore, as an emerging area of study, manned–unmanned collaboration is rapidly developing and being applied.
As a typical topic in the collaborative field, target-pursuit problems have received significant attention in recent decades due to their widespread applications, such as drone planning [12], territorial defense [13], and search and rescue [14]. The pursuit problem was first proposed in [15], where agents move in a grid to capture randomly moving targets. A cooperative algorithm for a multi-agent system to pursue static targets was proposed in [16]. In [17], a real-time approach for the target-pursuit problem of a multi-agent system was studied, wherein two coordination strategies, blocking escape directions (BES) and using alternative proposals (UAL), were adopted. For moving targets, an opportunistic framework was proposed for multi-robot pursuit in a dynamic environment with partial observability [18]. The consensus pursuit problem for multi-agent systems was discussed in [19]; the authors designed a distributed multi-flocking approach based on local information, with which agents could choose the target adaptively in combination with the proposed circle formation control law. Inspired by the division of roles in wolf hunting, new strategies for multi-robot collaborative tracking of targets were proposed in [20], which improved the convergence performance of the algorithm and shortened the hunting time significantly.
Furthermore, dynamic game theory is often used for target-pursuit problems because of its advantages in modeling dynamic interaction and decision-making, and because the decision-making process of the game players is itself a process of cooperation or competition within the team [21]. The game-theoretic approach in [22] addressed the target-tracking problem by using cost functions in an obstacle-free environment, with a PD-like fuzzy controller tuning the cost-function weights. In [23], the pursuit of invisible targets was discussed within the framework of a game, and some results relating minor-closed properties to pathwidth were obtained. For the case of multiple pursuers, three kinds of pursuit strategies, cooperative, non-cooperative, and greedy, were proposed in [24], and sufficient conditions for achieving uniform exponential stability were given. Furthermore, a multiple-pursuer strategy in the presence of parameter uncertainty was given in [25], where the acquisition probability was determined using the Monte Carlo method. The distributed strategy designed by combining differential game theory with collaborative control theory can handle target-pursuit situations under switching communication topologies [26]. In [27], evolutionary game theory was introduced to solve the pursuit problem with multiple targets: an innovative three-level decomposition method decomposed the previously complex multiplayer game into multiple small-scale games, and a multi-agent Q-learning method based on evolutionary game models was proposed.
The studies mentioned above only consider target-pursuit problems handled by unmanned platforms or multi-agent systems alone. In practice, however, unmanned platforms can so far rarely perform complex tasks independently; the multi-agent system often needs to cooperate with manned platforms or accept commands from them. There are some results considering manned–unmanned cooperation, mostly with qualitative decision-making on the manned platform and quantitative decision-making on the unmanned platforms [28,29]. Cooperative operations of manned/unmanned aerial vehicle hybrid formations in antisubmarine warfare were analyzed in [30]. To compensate for the lack of intelligence in unmanned systems, the composition and key challenges of the MAV-UAV collaborative combat system were explored in [31]. These studies mostly focused on top-level planning from a global perspective, without considering coordination from the perspective of the unmanned platforms, and none of the above literature focuses on target-pursuit problems.
In response to the above discussion, this paper proposes a cooperative decision-making method for a multi-agent system cooperating with a manned platform using differential game theory. The key idea is to model the target-pursuit problem as a game in which each agent is a player and each player can exchange states and strategies with its neighbors. The agents make decisions continuously during the dynamic game and cooperate with the manned platform to complete the target-pursuit task. More specifically, the contributions of this paper include the following:
  • A novel formulation of the differential game as a target-pursuit strategy in a manned–unmanned environment that can be applied in various manned–unmanned cooperative systems.
  • We propose a hierarchical decision-making approach for target-pursuit scenarios. The proposed method enables the multi-agent system to cooperate with the manned platform in a reasonable manner, ultimately achieving team cohesion while capturing the target. Furthermore, we discuss scenarios where the target is in different forms of motion: the method can be applied to static targets and targets moving along fixed trajectories, and it can be extended to situations where the target adopts escape strategies.
  • Simulations show that the proposed method can successfully pursue the target in a manned–unmanned environment, and the effects of each parameter are analyzed through comparative simulations.
The rest of this paper is organized as follows: Section 2 presents the required preliminaries and the formulation of the target-pursuit problem. In Section 3, the strategies for agents are designed using linear quadratic differential game theory, and simulations are presented to verify the effectiveness of the proposed approach. The discussion and outlook on the results are presented in Section 4. Section 5 concludes the paper.
Notation 1.
The notation used in this paper is fairly standard except where otherwise stated. $\mathbb{R}$ denotes the field of real numbers, $\mathbb{R}^n$ the set of $n$-dimensional real column vectors, and $\mathbb{R}^{n\times m}$ the set of all $n\times m$ real matrices. $I$ denotes the identity matrix of compatible dimension and $0_{m\times n}$ the $m$-by-$n$ zero matrix. $A^T$ denotes the transpose and $A^{-1}$ the inverse of a matrix $A$. For a matrix $Q\in\mathbb{R}^{n\times n}$, the notation $Q\succ 0$ means that $Q$ is positive definite and $Q\succeq 0$ that it is positive semidefinite. In symmetric block matrices, $\star$ denotes a term induced by symmetry. $\mathrm{blkdiag}\{A_1,\dots,A_n\}$ represents the block diagonal matrix whose main diagonal consists of the square matrices $A_1,\dots,A_n$. $\|x\|$ denotes the Euclidean norm of a vector $x\in\mathbb{R}^n$, $\|x\|=(x^Tx)^{1/2}$. $A\otimes B$ is the Kronecker product of matrices $A$ and $B$.

2. Problem Formulation

We consider the target-pursuit problem of a multi-agent system in this paper. The multi-agent system cooperates with a manned platform to pursue a target while maintaining its cohesive state. We describe the target-pursuit problem in the form of a game, which terminates when the target is successfully captured. Without loss of generality, agents are represented as nodes in a network, and the communication relationships among agents are represented by the edges of the network. We therefore first introduce some basic graph-theoretic notions to describe the multi-agent system and its internal relationships.

2.1. Communication Graph

The internal relationships of the multi-agent system and the communication among agents can be represented by a directed graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=\{1,2,\dots,l\}$ is the finite nonempty set of vertices and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. The vertices correspond to the agents and the edges represent the interconnections between pairs of agents. An edge $(i,j)$ denotes that agent $i$ can obtain information from agent $j$ with weight $\omega_{ij}>0$. The set of all neighbors of agent $i$ in the graph $\mathcal{G}$ is denoted by $N(i)$. If there is a path from vertex $i$ to vertex $j$ in $\mathcal{G}$, then $i$ and $j$ are said to be connected; if every pair of vertices is connected, the graph $\mathcal{G}$ is said to be connected.
For the directed graph $\mathcal{G}$, the incidence matrix $D$ describes the relationship between vertices and edges. $D$ is a $\{0,\pm 1\}$ matrix whose $uv$-th element equals $1$ if node $u$ is the head of edge $v$, $-1$ if node $u$ is the tail of edge $v$, and $0$ if $u$ is not a vertex of $v$. For a weighted graph, each edge has its own weight coefficient, collected in the diagonal weight matrix $W=\mathrm{diag}(\omega_{ij})$. For agent $i$, its weight matrix $W_i$ is related to its neighbors. The Laplacian matrix of the directed graph $\mathcal{G}$ can be defined as $L=DWD^T$, which contains important properties of the graph.
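As a small illustration (our own sketch; the graph and edge weights below are hypothetical, not taken from the paper), the following Python snippet builds an incidence matrix $D$, a weight matrix $W$, and the Laplacian $L=DWD^T$ for a directed graph with three nodes and four edges:

```python
# A hypothetical 3-node, 4-edge directed graph (illustration only).
import numpy as np

# Each column of D is one edge: +1 at the head node, -1 at the tail node.
D = np.array([
    [ 1, -1,  0,  1],
    [-1,  1,  1,  0],
    [ 0,  0, -1, -1],
])

# W = diag(w_ij) collects the (assumed) positive edge weights.
W = np.diag([1.0, 1.0, 0.5, 2.0])

L = D @ W @ D.T   # Laplacian; its rows sum to zero
print(L)
```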
Assumption 1.
The graph G does not have self-loops and only contains ordered pairs of distinct vertices. Graph G is a connected graph.
Assumption 2.
The manned platform can observe the state information of the target. All nodes of the multi-agent system can observe the state information of the manned platform, and some nodes can observe the state information of the target.

2.2. Game Model of Target-Pursuit Problem

Consider a scenario where a manned platform and a group of $l$ agents cooperate to pursue a target. The manned platform pursues the target along a certain trajectory, and the multi-agent system makes decisions based on observations of the manned platform's state. The cooperative decision of the multi-agent system can be modeled as a class of linear quadratic differential games with $l$ players.
The cooperative decision-making process between the manned platform and the multi-agent system can be seen as a hierarchical game. For a given target, the manned platform takes the initiative in making decisions due to its superior capabilities; the multi-agent system observes the state of the manned platform and makes decisions accordingly to collaborate with it in completing the task.
Each agent acts as a player in the game, and its state equation is as follows:
$$\dot{x}_i = A_i x_i + b_i u_i,\quad i=1,2,\dots,l, \qquad (1)$$
where $i$ is the index of the player, $x_i(t)\in\mathbb{R}^{2n}$ is the state vector, and $u_i(t)\in\mathbb{R}^n$ is the control vector of player $i$, with
$$A_i=\begin{bmatrix}0_{n\times n}&I_n\\0_{n\times n}&0_{n\times n}\end{bmatrix},\qquad b_i=\begin{bmatrix}0_{n\times n}\\I_n\end{bmatrix}.$$
Consider the stacked vector $z=[x_1^T,x_2^T,\dots,x_l^T]^T\in\mathbb{R}^{2nl}$; according to (1), the system dynamics are given as follows:
$$\dot{z}=Az+\sum_{i=1}^{l}B_iu_i, \qquad (2)$$
where $A=\mathrm{blkdiag}\{A_1,A_2,\dots,A_l\}$ and $B_i=[0_{1\times(i-1)},\,1,\,0_{1\times(l-i)}]^T\otimes b_i\in\mathbb{R}^{2nl\times n}$.
Suppose that the target moves along a fixed trajectory, with state equation
$$\dot{x}_t=A_tx_t, \qquad (3)$$
where $x_t(t)\in\mathbb{R}^{2n}$ is the state vector of the target. Suppose that the manned platform moves along a certain trajectory to pursue the target, with state equation
$$\dot{x}_m=A_mx_m, \qquad (4)$$
where $x_m(t)\in\mathbb{R}^{2n}$ is the state vector of the manned platform.
Consider the vector $\bar z=[x_1^T,x_2^T,\dots,x_l^T,x_m^T,x_t^T]^T\in\mathbb{R}^{2n(l+2)}$; the augmented system dynamics are given as follows:
$$\dot{\bar z}=\bar A\bar z+\sum_{i=1}^{l}\bar B_iu_i, \qquad (5)$$
where
$$\bar A=\begin{bmatrix}A&0_{2nl\times 2n}&0_{2nl\times 2n}\\0_{2n\times 2nl}&A_m&0_{2n\times 2n}\\0_{2n\times 2nl}&0_{2n\times 2n}&A_t\end{bmatrix},\qquad \bar B_i=\begin{bmatrix}B_i\\0_{2n\times n}\\0_{2n\times n}\end{bmatrix}.$$
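The following sketch illustrates how the stacked and augmented matrices of Equations (2) and (5) could be assembled with Kronecker products and block diagonals (an assumed construction with $n=2$, $l=3$, double-integrator agents, and placeholder $A_m$, $A_t$; the helper names are ours, not from the paper):

```python
# Assumed setup: n = 2, l = 3 double-integrator agents (illustration only).
import numpy as np
from scipy.linalg import block_diag

n, l = 2, 3
Ai = np.block([[np.zeros((n, n)), np.eye(n)],
               [np.zeros((n, n)), np.zeros((n, n))]])  # double integrator
bi = np.vstack([np.zeros((n, n)), np.eye(n)])

A = block_diag(*[Ai] * l)                              # blkdiag{A_1, ..., A_l}

def B(i):
    """B_i = e_i (Kronecker) b_i, with e_i the i-th basis vector of R^l (0-based)."""
    e = np.zeros((l, 1))
    e[i] = 1.0
    return np.kron(e, bi)                              # shape (2nl, n)

# Augmented system (5): the manned platform and target are uncontrolled,
# so B_bar_i simply pads B_i with zero blocks. A_m, A_t are placeholders.
Am = At = Ai
A_bar = block_diag(A, Am, At)
B_bar = lambda i: np.vstack([B(i), np.zeros((4 * n, n))])
```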
Then, we design individual cost functions for each agent according to its role in the system. Define the set of agents that can obtain the target's state information as $\mathcal{P}$, and the set of the remaining agents as $\mathcal{F}$. For an agent $i\in\mathcal{P}$, the goal is to maintain a cohesive state with its neighbors $N(i)$ while cooperating with the manned platform to pursue the target as efficiently as possible. Hence, the weighted distance that agent $i$ needs to optimize is
$$\sum_{p\in N(i)}\omega_{ip}\|x_i-x_p\|^2+\|x_i-x_m\|^2+\|x_i-x_t\|^2, \qquad (6)$$
where the first term is the weighted sum of the distances between agent $i$ and its neighbors $N(i)$, which encodes system cohesion; the second term is the distance between agent $i$ and the manned platform; and the third term is the distance between agent $i$ and the target.
The weighted distance (6) can be transformed into the following form:
$$\bar z^T\bar L_i\bar z+\bar z^T\bar K_{i2}\bar z+\bar z^T\bar K_{i3}\bar z, \qquad (7)$$
where
$$\bar L_i=\begin{bmatrix}L_i\otimes I_{2n}&0_{2nl\times 2n}&0_{2nl\times 2n}\\0_{2n\times 2nl}&0_{2n\times 2n}&0_{2n\times 2n}\\0_{2n\times 2nl}&0_{2n\times 2n}&0_{2n\times 2n}\end{bmatrix},\qquad L_i=DW_iD^T,$$
$W_i$ is the weight matrix of agent $i$; $\bar K_{i2}=\tilde K_{i2}\otimes I_{2n}$ with $\tilde K_{i2}\in\mathbb{R}^{(l+2)\times(l+2)}$, $\tilde K_{i2}(i,i)=1$, $\tilde K_{i2}(i,l+1)=-1$, $\tilde K_{i2}(l+1,i)=-1$, $\tilde K_{i2}(l+1,l+1)=1$, and all remaining elements zero; and $\bar K_{i3}=\tilde K_{i3}\otimes I_{2n}$ with $\tilde K_{i3}\in\mathbb{R}^{(l+2)\times(l+2)}$, $\tilde K_{i3}(i,i)=1$, $\tilde K_{i3}(i,l+2)=-1$, $\tilde K_{i3}(l+2,i)=-1$, $\tilde K_{i3}(l+2,l+2)=1$, and all remaining elements zero.
Each agent minimizes its own cost function during the game. Let $u_{-i}$ denote the weighted sum of the strategies of all neighbors of agent $i$, which attempts to minimize the cost function of agent $i$; it takes the form
$$u_{-i}=\alpha_i\sum_{j\in N(i)}u_j. \qquad (8)$$
Design $\bar B_{-i}=\alpha_i\sum_{j\in N(i)}\bar B_j$ and $\bar Q_i=\bar\tau_{i1}\bar L_i+\bar\tau_{i2}\bar K_{i2}+\bar\tau_{i3}\bar K_{i3}$, where $\bar\tau_{i1}$, $\bar\tau_{i2}$, and $\bar\tau_{i3}$ are weight coefficients: $\bar\tau_{i1}$ represents the importance that agent $i$ places on achieving cohesive states, $\bar\tau_{i2}$ the importance it places on collaborating with the manned platform to accomplish the task, and $\bar\tau_{i3}$ the importance it places on pursuing the target.
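One possible way to assemble $\bar Q_i$ from these building blocks is sketched below (our illustration under the state ordering $\bar z=[x_1^T,\dots,x_l^T,x_m^T,x_t^T]^T$; the function names and 0-based indices are assumptions, not the authors' code):

```python
# Illustration of Q_bar_i = tau1*L_bar_i + tau2*K_bar_i2 + tau3*K_bar_i3
# under the ordering z_bar = [x_1, ..., x_l, x_m, x_t] (0-based indices).
import numpy as np

def K_tilde(i, j, l):
    """(l+2)x(l+2) quadratic-form matrix of ||x_i - x_j||^2."""
    K = np.zeros((l + 2, l + 2))
    K[i, i] = K[j, j] = 1.0
    K[i, j] = K[j, i] = -1.0
    return K

def Q_bar(i, L_i, taus, l, n):
    tau1, tau2, tau3 = taus           # agent i's weight coefficients
    L_pad = np.zeros((l + 2, l + 2))
    L_pad[:l, :l] = L_i               # cohesion block L_i, zero-padded
    I2n = np.eye(2 * n)
    return (tau1 * np.kron(L_pad, I2n)
            + tau2 * np.kron(K_tilde(i, l, l), I2n)      # manned platform at slot l
            + tau3 * np.kron(K_tilde(i, l + 1, l), I2n)) # target at slot l+1
```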
Remark 1.
For agents that cannot observe the target's information (the set $\mathcal{F}$), the weighted distance does not include the third term; hence, we set $\bar\tau_{i3}=0$. The strategy of this type of agent is to maintain a cohesive state with its neighbors and the manned platform as much as possible. In practical scenarios, this type of agent is often a lower-level platform.
The cost function for agent i to be minimized can be expressed as follows:
$$\bar J_i=\int_0^{t_f}\big(\bar z^T\bar Q_i\bar z+u_i^T\bar R_iu_i+u_{-i}^T\bar R_{-i}u_{-i}\big)\,dt+\bar z^T(t_f)\bar Q_{if}\bar z(t_f),\quad i=1,2,\dots,l, \qquad (9)$$
where $t_f$ is the terminal moment, $\bar Q_i=\bar Q_i^T$, $\bar Q_{if}=\bar Q_{if}^T$, $\bar R_i\succ 0$, and $\bar R_{-i}\succ 0$. The purpose of this article is to solve for a strategy $u_i^*$ that satisfies the following condition:
$$\bar J_i(u_i^*,u_{-i}^*)\le\bar J_i(u_i,u_{-i}^*), \qquad (10)$$
where $i\in\mathcal{V}=\{1,2,\dots,l\}$.

3. Results

3.1. Solution to the Target-Pursuit Problem

In this subsection, we give strategies for the multi-agent system that cooperates with a manned platform to execute the target-pursuit task when the target moves along a fixed trajectory. We address the problem with a linear quadratic differential game method. According to the game model of Equations (5) and (9), the following theorem provides the pursuit strategies of the agents while they maintain a cohesive state with the manned platform and their neighbors.
Theorem 1.
Consider the system (5), the state Equations (1) of the agents, and the cost functions (9); then, the strategy of agent $i$,
$$u_i^*=-R_i^{-1}B_i^T\bar P_{i11}z-R_i^{-1}B_i^Tb_i-R_i^{-1}B_i^Tc_i, \qquad (11)$$
satisfies the inequality (10).
In Equation (11), $\bar P_{i11}=\bar P_{i11}^T\succeq 0$, $b_i$, and $c_i$ are solutions of the following coupled differential equations on $[0,t_f]$:
$$-\dot{\bar P}_{i11}=Q_i+\bar P_{i11}\Big(A-\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)+\Big(A-\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i11}-\bar P_{i11}S_i\bar P_{i11}-\bar P_{i11}S_{-i}\bar P_{i11}, \qquad (12)$$
$$\bar P_{i11}(t_f)=Q_{if}, \qquad (13)$$
$$\dot b_i=\Big(-A^T+\bar P_{i11}S_i+\bar P_{i11}S_{-i}+\sum_{p=1,p\neq i}^{l}\bar P_{p11}S_p\Big)b_i-\bar Q_{i12}x_m+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_pb_p, \qquad (14)$$
$$b_i(t_f)=\bar Q_{i12f}x_m(t_f), \qquad (15)$$
$$\dot c_i=\Big(-A^T+\bar P_{i11}S_i+\bar P_{i11}S_{-i}+\sum_{p=1,p\neq i}^{l}\bar P_{p11}S_p\Big)c_i-\bar Q_{i13}x_t+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_pc_p, \qquad (16)$$
$$c_i(t_f)=\bar Q_{i13f}x_t(t_f), \qquad (17)$$
where $S_i=B_iR_i^{-1}B_i^T$ and $S_{-i}=B_{-i}R_{-i}^{-1}B_{-i}^T$.
Proof of Theorem 1.
Define the value function $V_i(t,\bar z)=\bar J_i^*$; according to [32], we can obtain
$$-\frac{\partial V_i(t,\bar z)}{\partial t}=\min_{u_i,\,u_{-i}}\Big\{\Big(\frac{\partial V_i(t,\bar z)}{\partial\bar z}\Big)^T\dot{\bar z}+\bar z^T\bar Q_i\bar z+u_i^T\bar R_iu_i+u_{-i}^T\bar R_{-i}u_{-i}\Big\}, \qquad (18)$$
$$V_i(t_f,\bar z)=\bar z^T(t_f)\bar Q_{if}\bar z(t_f). \qquad (19)$$
Minimizing the right-hand side of Equation (18) gives
$$\Big(\frac{\partial V_i(t,\bar z)}{\partial\bar z}\Big)^T\bar B_i+2u_i^{*T}\bar R_i=0, \qquad (20)$$
$$\Big(\frac{\partial V_i(t,\bar z)}{\partial\bar z}\Big)^T\bar B_{-i}+2u_{-i}^{*T}\bar R_{-i}=0; \qquad (21)$$
then, the following equations hold:
$$u_i^*=-\frac{1}{2}\bar R_i^{-1}\bar B_i^T\frac{\partial V_i(t,\bar z)}{\partial\bar z}, \qquad (22)$$
$$u_{-i}^*=-\frac{1}{2}\bar R_{-i}^{-1}\bar B_{-i}^T\frac{\partial V_i(t,\bar z)}{\partial\bar z}. \qquad (23)$$
Define $\bar S_i=\bar B_i\bar R_i^{-1}\bar B_i^T$ and $\bar S_{-i}=\bar B_{-i}\bar R_{-i}^{-1}\bar B_{-i}^T$. Applying Equations (22) and (23) to (18), we easily obtain
$$-\frac{\partial V_i}{\partial t}=\Big(\frac{\partial V_i}{\partial\bar z}\Big)^T\Big(\bar A\bar z-\frac{1}{2}\bar S_i\frac{\partial V_i}{\partial\bar z}-\frac{1}{2}\bar S_{-i}\frac{\partial V_i}{\partial\bar z}+\sum_{p=1,p\neq i}^{l}\bar B_pu_p\Big)+\bar z^T\bar Q_i\bar z+\frac{1}{4}\Big(\frac{\partial V_i}{\partial\bar z}\Big)^T\bar S_i\frac{\partial V_i}{\partial\bar z}+\frac{1}{4}\Big(\frac{\partial V_i}{\partial\bar z}\Big)^T\bar S_{-i}\frac{\partial V_i}{\partial\bar z}. \qquad (24)$$
To solve the above partial differential equation, let $V_i(t,\bar z)=\bar z^T\bar P_i\bar z$; then we have
$$\frac{\partial V_i(t,\bar z)}{\partial\bar z}=2\bar P_i\bar z, \qquad (25)$$
$$\frac{\partial V_i(t,\bar z)}{\partial t}=\bar z^T\dot{\bar P}_i\bar z. \qquad (26)$$
Applying Equation (25) to (20), we obtain $u_i^*=-\bar R_i^{-1}\bar B_i^T\bar P_i\bar z$; similarly, $u_{-i}^*=-\bar R_{-i}^{-1}\bar B_{-i}^T\bar P_i\bar z$. Substituting $u_i^*$, $u_{-i}^*$, and Equations (24)–(26) leads to the conclusion that
$$-\bar z^T\dot{\bar P}_i\bar z=2\bar z^T\bar P_i\Big(\bar A\bar z-\bar S_i\bar P_i\bar z-\bar S_{-i}\bar P_i\bar z-\sum_{p=1,p\neq i}^{l}\bar S_p\bar P_p\bar z\Big)+\bar z^T\bar Q_i\bar z+\bar z^T\bar P_i\bar S_i\bar P_i\bar z+\bar z^T\bar P_i\bar S_{-i}\bar P_i\bar z. \qquad (27)$$
Then, it is straightforward to see that
$$0=\bar z^T\Big[\dot{\bar P}_i+\bar Q_i-\bar P_i\bar S_i\bar P_i-\bar P_i\bar S_{-i}\bar P_i+\bar P_i\Big(\bar A-\sum_{p=1,p\neq i}^{l}\bar S_p\bar P_p\Big)+\Big(\bar A-\sum_{p=1,p\neq i}^{l}\bar S_p\bar P_p\Big)^T\bar P_i\Big]\bar z; \qquad (28)$$
hence, the following equations hold:
$$-\dot{\bar P}_i=\bar Q_i+\bar P_i\Big(\bar A-\sum_{p=1,p\neq i}^{l}\bar S_p\bar P_p\Big)+\Big(\bar A-\sum_{p=1,p\neq i}^{l}\bar S_p\bar P_p\Big)^T\bar P_i-\bar P_i\bar S_i\bar P_i-\bar P_i\bar S_{-i}\bar P_i, \qquad (29)$$
$$\bar P_i(t_f)=\bar Q_{if}. \qquad (30)$$
The time derivative of $V_i(t,\bar z)$ along the system trajectories is given by
$$\dot V_i=-\bar z^T\bar Q_i\bar z-u_i^T\bar R_iu_i+(u_i-u_i^*)^T\bar R_i(u_i-u_i^*)-u_{-i}^T\bar R_{-i}u_{-i}+(u_{-i}-u_{-i}^*)^T\bar R_{-i}(u_{-i}-u_{-i}^*). \qquad (31)$$
Integrating both sides of Equation (31) over $[0,t_f]$, we have
$$\bar J_i=\int_0^{t_f}\big((u_i-u_i^*)^T\bar R_i(u_i-u_i^*)+(u_{-i}-u_{-i}^*)^T\bar R_{-i}(u_{-i}-u_{-i}^*)\big)\,dt+V_i(0,\bar z(0)). \qquad (32)$$
The above equation shows that the control strategy $u_i^*$ minimizes the cost function of agent $i$.
Partitioning the matrix $\bar P_i\in\mathbb{R}^{2n(l+2)\times 2n(l+2)}$ yields
$$\bar P_i=\begin{bmatrix}\bar P_{i11}&\bar P_{i12}&\bar P_{i13}\\\star&\bar P_{i22}&\bar P_{i23}\\\star&\star&\bar P_{i33}\end{bmatrix}, \qquad (33)$$
where $\bar P_{i11}\in\mathbb{R}^{2nl\times 2nl}$, $\bar P_{i12},\bar P_{i13}\in\mathbb{R}^{2nl\times 2n}$, and $\bar P_{i22}\in\mathbb{R}^{2n\times 2n}$.
Similarly, the matrices $\bar Q_i$, $\bar S_i$, $\bar S_{-i}$, and $\bar A$ can be partitioned according to this rule. Let $Q_i=\tau_{i1}\hat L_i+\tau_{i2}\hat K_{i2}+\tau_{i3}\hat K_{i3}$, where $\tau_{i1}$, $\tau_{i2}$, and $\tau_{i3}$ are weight coefficients, $\hat L_i=L_i\otimes I_{2n}$, $\hat K_{i2}=K_{i2}\otimes I_{2n}$, and $\hat K_{i3}=K_{i3}\otimes I_{2n}$; $K_{i2}$ and $K_{i3}$ are $l\times l$ matrices with $K_{i2}(i,i)=1$, $K_{i3}(i,i)=1$, and all remaining elements zero. Therefore,
$$\bar Q_{i11}=Q_i,\quad \bar A_{11}=A,\quad \bar A_{22}=A_m,\quad \bar A_{33}=A_t,\quad \bar S_{i,11}=S_i=B_iR_i^{-1}B_i^T,\quad \bar S_{-i,11}=S_{-i}=B_{-i}R_{-i}^{-1}B_{-i}^T,$$
$$\bar Q_{i12}=-\bar\tau_{i2}\big([0_{1\times(i-1)},1,0_{1\times(l-i)}]^T\otimes I_{2n}\big),\qquad \bar Q_{i13}=-\bar\tau_{i3}\big([0_{1\times(i-1)},1,0_{1\times(l-i)}]^T\otimes I_{2n}\big).$$
According to the above definition of the block matrices, the Riccati differential Equations (29) and (30) can be decomposed; specifically, we obtain
$$-\dot{\bar P}_{i11}=Q_i+\bar P_{i11}\Big(A-\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)+\Big(A-\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i11}-\bar P_{i11}S_i\bar P_{i11}-\bar P_{i11}S_{-i}\bar P_{i11}, \qquad (34)$$
$$\bar P_{i11}(t_f)=Q_{if}, \qquad (35)$$
$$-\dot{\bar P}_{i12}=\bar Q_{i12}+\bar P_{i12}A_m+A^T\bar P_{i12}-\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_p\bar P_{p12}-\Big(\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i12}-\bar P_{i11}S_i\bar P_{i12}-\bar P_{i11}S_{-i}\bar P_{i12}, \qquad (36)$$
$$\bar P_{i12}(t_f)=\bar Q_{i12f}, \qquad (37)$$
$$-\dot{\bar P}_{i13}=\bar Q_{i13}+\bar P_{i13}A_t+A^T\bar P_{i13}-\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_p\bar P_{p13}-\Big(\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i13}-\bar P_{i11}S_i\bar P_{i13}-\bar P_{i11}S_{-i}\bar P_{i13}, \qquad (38)$$
$$\bar P_{i13}(t_f)=\bar Q_{i13f}. \qquad (39)$$
Obviously, Equations (34) and (35) are exactly the coupled differential Equations (12) and (13) of Theorem 1.
Since $\bar z=[z^T,x_m^T,x_t^T]^T$, the control strategy $u_i^*=-\bar R_i^{-1}\bar B_i^T\bar P_i\bar z$ can be expressed as
$$u_i^*=-R_i^{-1}B_i^T\bar P_{i11}z-R_i^{-1}B_i^T\bar P_{i12}x_m-R_i^{-1}B_i^T\bar P_{i13}x_t, \qquad (40)$$
where the first term $-R_i^{-1}B_i^T\bar P_{i11}z$ on the right-hand side depends on the states of the neighboring agents, the second term $-R_i^{-1}B_i^T\bar P_{i12}x_m$ is related to the state of the manned platform, and the third term $-R_i^{-1}B_i^T\bar P_{i13}x_t$ is related to the state of the target.
Let $b_i=\bar P_{i12}x_m$, $i=1,2,\dots,l$. Taking the derivative of $b_i$, we have
$$\begin{aligned}\dot b_i&=\dot{\bar P}_{i12}x_m+\bar P_{i12}\dot x_m\\&=\bar P_{i12}A_mx_m-\big(\bar Q_{i12}-\bar P_{i11}S_i\bar P_{i12}-\bar P_{i11}S_{-i}\bar P_{i12}+\bar P_{i12}A_m+A^T\bar P_{i12}\big)x_m+\Big(\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i12}x_m+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_p\bar P_{p12}x_m\\&=\Big(-A^T+\bar P_{i11}S_i+\bar P_{i11}S_{-i}+\sum_{p=1,p\neq i}^{l}\bar P_{p11}S_p\Big)b_i-\bar Q_{i12}x_m+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_pb_p,\end{aligned}$$
and $b_i(t_f)=\bar P_{i12}(t_f)x_m(t_f)=\bar Q_{i12f}x_m(t_f)$; namely, the coupled differential Equations (14) and (15) are valid.
Let $c_i=\bar P_{i13}x_t$, $i=1,2,\dots,l$. Taking the derivative of $c_i$, we have
$$\begin{aligned}\dot c_i&=\dot{\bar P}_{i13}x_t+\bar P_{i13}\dot x_t\\&=\bar P_{i13}A_tx_t-\big(\bar Q_{i13}-\bar P_{i11}S_i\bar P_{i13}-\bar P_{i11}S_{-i}\bar P_{i13}+\bar P_{i13}A_t+A^T\bar P_{i13}\big)x_t+\Big(\sum_{p=1,p\neq i}^{l}S_p\bar P_{p11}\Big)^T\bar P_{i13}x_t+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_p\bar P_{p13}x_t\\&=\Big(-A^T+\bar P_{i11}S_i+\bar P_{i11}S_{-i}+\sum_{p=1,p\neq i}^{l}\bar P_{p11}S_p\Big)c_i-\bar Q_{i13}x_t+\bar P_{i11}\sum_{p=1,p\neq i}^{l}S_pc_p,\end{aligned}$$
and $c_i(t_f)=\bar P_{i13}(t_f)x_t(t_f)=\bar Q_{i13f}x_t(t_f)$; therefore, the coupled differential Equations (16) and (17) hold. □
Remark 2.
Once the initial state of each agent, the communication topology, and the weight coefficients are given, we can calculate $\bar P_{i11}$, $b_i$, and $c_i$ at different times by iterating the coupled differential equations backwards from their terminal values. The strategy $u_i^*$ is then obtained by substituting these solutions into (11). Furthermore, we can solve for the trajectory of each agent through the state Equation (1). A minimal numerical sketch of this backward iteration is given after this remark.
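The following is our illustration of Remark 2 rather than the authors' implementation: a plain backward-Euler scheme is assumed, and all argument names are hypothetical.

```python
# Backward-Euler sketch of the coupled equations (12)-(17) (illustration only).
import numpy as np

def solve_coupled(A, S, S_minus, Q, Qf, Q12, Q12f, Q13, Q13f,
                  x_m, x_t, tf=10.0, dt=0.05):
    """S[i], S_minus[i] are S_i, S_{-i}; Q12[i], Q13[i] are the coupling blocks;
    x_m(t), x_t(t) return the observed platform/target states (all assumed)."""
    l = len(S)
    steps = round(tf / dt)
    P = [Qf[i].copy() for i in range(l)]          # P_i11(tf) = Q_if
    b = [Q12f[i] @ x_m(tf) for i in range(l)]     # b_i(tf) = Q_i12f x_m(tf)
    c = [Q13f[i] @ x_t(tf) for i in range(l)]     # c_i(tf) = Q_i13f x_t(tf)
    hist = []
    for k in range(steps, 0, -1):
        t = k * dt
        hist.append((t, [m.copy() for m in P],
                     [v.copy() for v in b], [v.copy() for v in c]))
        Pn, bn, cn = [], [], []
        for i in range(l):
            Acl = A - sum(S[p] @ P[p] for p in range(l) if p != i)
            Pdot = -(Q[i] + P[i] @ Acl + Acl.T @ P[i]
                     - P[i] @ S[i] @ P[i] - P[i] @ S_minus[i] @ P[i])
            M = (-A.T + P[i] @ (S[i] + S_minus[i])
                 + sum(P[p] @ S[p] for p in range(l) if p != i))
            bdot = M @ b[i] - Q12[i] @ x_m(t) + P[i] @ sum(
                S[p] @ b[p] for p in range(l) if p != i)
            cdot = M @ c[i] - Q13[i] @ x_t(t) + P[i] @ sum(
                S[p] @ c[p] for p in range(l) if p != i)
            Pn.append(P[i] - dt * Pdot)   # step backwards from t to t - dt
            bn.append(b[i] - dt * bdot)
            cn.append(c[i] - dt * cdot)
        P, b, c = Pn, bn, cn
    hist.append((0.0, P, b, c))
    return hist[::-1]                     # ordered from t = 0 to t = tf
```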
Remark 3.
The theorem in this paper can also be extended to the case where the target is stationary. Without loss of generality, suppose that the target is at the origin, namely $x_t=[0,\dots,0]^T\in\mathbb{R}^{2n}$; then, the last term $-R_i^{-1}B_i^Tc_i$ in the control input (11) equals $0$, and the strategy $u_i^*$ only includes the first two terms. On the other hand, if the stationary target is not at the origin, $x_t$ is a constant vector, and the strategy can be solved similarly.
Remark 4.
It can be seen from Theorem 1 that solving for the strategy of each agent does not require the specific form of the state equations of the manned platform and the target. We provided $A_m$ and $A_t$ in the previous section only for convenience in expressing the system. In fact, models of the manned platform and the target often cannot be obtained in practical scenarios; we can only observe their states through sensors. Therefore, the strategy-solving method in this paper relies only on the states $x_m$ and $x_t$. This also corresponds to the situation where the multi-agent system cooperates with the manned platform to perform tasks in an actual manned–unmanned environment.

3.2. Simulation

3.2.1. Simulations under Different Target Trajectories

This subsection presents the results of the simulation examples. Assume there is a team of three agents ($l=3$) moving in a two-dimensional plane (i.e., $n=2$). Each agent adopts a double-integrator dynamic model. To facilitate computation, $\bar R_i$ and $\bar R_{-i}$ in the cost functions are both taken as identity matrices of the corresponding dimensions. Consider the target $x_t$ and the manned platform $x_m$, and assume that agent 1 and agent 2 can obtain the target's information while agent 3 cannot observe the target. The communication graph among the agents is shown in Figure 1. The corresponding incidence matrix is
$$D=\begin{bmatrix}1&-1&0&1\\-1&1&1&0\\0&0&-1&-1\end{bmatrix}.$$
(1) A Target Moving along a Fixed Trajectory
Firstly, assume that the target moves in a straight line with state $x_t=[0.5t,\,5,\,1,\,0]^T$, and that the manned platform pursues the target with a certain strategy of its own. The agents make decisions based on the states of the target and the manned platform. Let the capture radius be 0.2 m, the sample time 0.05 s, and the terminal time $t_f=10$ s. The weight coefficients are
$\alpha_i=1$,
$\bar\tau_{11}=1,\ \bar\tau_{12}=10,\ \bar\tau_{13}=5,\ \bar\tau_{11f}=1,\ \bar\tau_{12f}=10,\ \bar\tau_{13f}=5$,
$\bar\tau_{21}=2,\ \bar\tau_{22}=9,\ \bar\tau_{23}=6,\ \bar\tau_{21f}=2,\ \bar\tau_{22f}=9,\ \bar\tau_{23f}=6$,
$\bar\tau_{31}=3,\ \bar\tau_{32}=12,\ \bar\tau_{33}=0,\ \bar\tau_{31f}=3,\ \bar\tau_{32f}=12,\ \bar\tau_{33f}=0$.
Notice that agent 3 cannot observe the target; therefore, $\bar\tau_{33}=0$ and $\bar\tau_{33f}=0$. Set the initial state of each agent to $x_1(0)=[2,0.2,0,0]^T$, $x_2(0)=[4.7,1,0,0]^T$, $x_3(0)=[0.2,5.5,0,0]^T$. According to the strategies $u_i^*$ in Theorem 1, the motion trajectory of each agent can be obtained (as shown in Figure 2a). It can be seen that the manned platform and the agents captured the target successfully. Figure 2b shows the relative distance between the target and each agent. At the terminal time $t_f$, the distances between each agent and the target are 0.0995 m, 0.1077 m, and 0.1012 m, all smaller than the capture radius.
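For reference, a forward rollout applying the strategy (11) could be sketched as follows (again our illustration, reusing the hypothetical `solve_coupled` helper sketched after Remark 2; the scenario values are placeholders, not the exact experiment setup):

```python
# Forward rollout under strategy (11) (illustration only; `hist` comes from
# the hypothetical solve_coupled sketch, Bs[i] = B_i, R[i] = R_i).
import numpy as np

def rollout(hist, A, Bs, R, z0, dt=0.05):
    z = z0.copy()
    traj = [z.copy()]
    for _, P, b, c in hist[:-1]:
        # u_i^* = -R_i^{-1} B_i^T (P_i11 z + b_i + c_i), Equation (11)
        u = [-np.linalg.solve(R[i], Bs[i].T @ (P[i] @ z + b[i] + c[i]))
             for i in range(len(Bs))]
        zdot = A @ z + sum(Bs[i] @ u[i] for i in range(len(Bs)))
        z = z + dt * zdot                 # forward Euler step
        traj.append(z.copy())
    return np.array(traj)

# Hypothetical straight-line target, in the spirit of the first scenario:
x_t = lambda t: np.array([0.5 * t, 5.0, 0.5, 0.0])
```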
Secondly, assume that the target moves along a sine trajectory with state $x_t=[t,\,\sin(0.8t),\,1,\,0.8\cos(0.8t)]^T$, and that the manned platform pursues the target with a certain strategy. The weight coefficients are
$\alpha_i=1$,
$\bar\tau_{11}=1,\ \bar\tau_{12}=10,\ \bar\tau_{13}=5,\ \bar\tau_{11f}=1,\ \bar\tau_{12f}=10,\ \bar\tau_{13f}=5$,
$\bar\tau_{21}=0.1,\ \bar\tau_{22}=9,\ \bar\tau_{23}=6,\ \bar\tau_{21f}=1,\ \bar\tau_{22f}=9,\ \bar\tau_{23f}=6$,
$\bar\tau_{31}=3,\ \bar\tau_{32}=12,\ \bar\tau_{33}=0,\ \bar\tau_{31f}=3,\ \bar\tau_{32f}=12,\ \bar\tau_{33f}=0$.
Set the initial state of each agent to $x_1(0)=[1,2,0,0]^T$, $x_2(0)=[1,2,0,0]^T$, $x_3(0)=[0.2,5.5,0,0]^T$. The simulation results are shown in Figure 3a,b. The distances between each agent and the target are 0.1174 m, 0.1058 m, and 0.1556 m, and the target was captured successfully.
Thirdly, assume that the target moves along a straight line with state $x_t=[0.5t,\,5,\,0,\,0]^T$, and that the manned platform pursues the target at a faster speed on the same straight line. The weight coefficients are
$\alpha_1=0.99,\ \alpha_2=1.11,\ \alpha_3=0.005$,
$\bar\tau_{11}=1,\ \bar\tau_{12}=0.99,\ \bar\tau_{13}=21,\ \bar\tau_{11f}=1,\ \bar\tau_{12f}=0.99,\ \bar\tau_{13f}=21$,
$\bar\tau_{21}=0.5,\ \bar\tau_{22}=0,\ \bar\tau_{23}=22,\ \bar\tau_{21f}=0.5,\ \bar\tau_{22f}=0,\ \bar\tau_{23f}=22$,
$\bar\tau_{31}=10,\ \bar\tau_{32}=7.35,\ \bar\tau_{33}=0,\ \bar\tau_{31f}=10,\ \bar\tau_{32f}=7.35,\ \bar\tau_{33f}=0$.
Set the initial state of each agent to $x_1(0)=[1.5,6.5,0,0]^T$, $x_2(0)=[0.5,2,0,0]^T$, $x_3(0)=[1.2,1,0,0]^T$. The simulation results are shown in Figure 4a,b. The distances between each agent and the target are 0.1993 m, 0.0251 m, and 0.0965 m, and the target was captured successfully.
(2) Stationary Target
Assume that the target is stationary, $x_t=[0,0,0,0]^T$, and that the manned platform pursues the target with a certain strategy. The weight coefficients are
$\alpha_i=1$,
$\bar\tau_{11}=1,\ \bar\tau_{12}=10,\ \bar\tau_{13}=5,\ \bar\tau_{11f}=1,\ \bar\tau_{12f}=10,\ \bar\tau_{13f}=5$,
$\bar\tau_{21}=0.1,\ \bar\tau_{22}=9,\ \bar\tau_{23}=6,\ \bar\tau_{21f}=1,\ \bar\tau_{22f}=9,\ \bar\tau_{23f}=6$,
$\bar\tau_{31}=3,\ \bar\tau_{32}=12,\ \bar\tau_{33}=0,\ \bar\tau_{31f}=3,\ \bar\tau_{32f}=12,\ \bar\tau_{33f}=0$.
Set the initial state of each agent to $x_1(0)=[2,3.2,0,0]^T$, $x_2(0)=[4.7,2,0,0]^T$, $x_3(0)=[3,5.5,0,0]^T$. The simulation results are shown in Figure 5a,b. The distances between each agent and the target are $0.0682\times 10^{-3}$ m, $0.1479\times 10^{-3}$ m, and $0.2865\times 10^{-3}$ m, and the target was captured successfully.

3.2.2. The Influence of Different Weight Coefficients on Strategies

Furthermore, consider the case of 5 agents ($l=5$), whose incidence matrix is given below; assume that agent 3 and agent 5 cannot observe the target ($\bar\tau_{33}=0$, $\bar\tau_{53}=0$). In the initial configuration, set the weights $\bar\tau_{i1}=1$, $\bar\tau_{i2}=1$, $\bar\tau_{i3}=1$ ($i\neq 3,5$). We adjust the weight coefficients $\bar\tau_{i1}$, $\bar\tau_{i2}$, and $\bar\tau_{i3}$ separately to study the impact of different weights on the simulation results, obtaining the trajectory (Figure 6a–f) of each agent and the distance (Figure 7a–f) between each agent and the target.
$$D=\begin{bmatrix}
1&1&0&0&1&0&0&0&0&1\\
-1&0&1&1&0&0&0&0&0&0\\
0&-1&-1&0&-1&1&1&1&0&0\\
0&0&0&-1&0&-1&0&-1&1&0\\
0&0&0&0&0&0&-1&0&-1&-1
\end{bmatrix}.$$
From Figure 6a–c and Figure 7a–c, it can be seen that the larger the weight coefficient $\bar\tau_{i1}$, the higher the degree of cohesiveness among the agents, and the more the agents tend to shorten the distance to their neighbors. From Figure 6a,d,e and Figure 7a,d,e, it can be seen that increasing the weight coefficient $\bar\tau_{i2}$ means that each agent places more emphasis on cooperating with the manned platform, leading the agents to narrow their distance to the manned platform during decision-making; the entire manned–unmanned system then reaches the cohesive state in a shorter time. From Figure 6a,f and Figure 7a,f, it can be seen that increasing the weight coefficient $\bar\tau_{i3}$ makes each agent more inclined to pursue the target and reduces the degree of team cohesiveness. Because agents 3 and 5 cannot observe the target, their trajectories change relatively little, while agents 1, 2, and 4 clearly reach and maintain a close distance to the target earlier.

3.2.3. Comparison with the Target-Pursuit Problem in the Multi-Agent System

Most known studies on target-pursuit problems consider only the multi-agent system itself, without coordination with a manned platform. Therefore, we compare such a case with the method in this paper; the simulation results of the multi-agent system pursuing the target on its own are shown in Figure 8. Comparing with Figure 6 and Figure 7, it can be seen that without a manned platform each agent still tends to maintain team cohesion while pursuing the target, but there is no manned-platform trajectory to follow. In manned–unmanned scenarios, the trajectory of the manned platform serves as a reference trajectory for the agents: agents that cannot observe the target also have a motion reference, so the manned–unmanned team tends to converge faster. By contrast, in scenarios without a manned platform, the team takes longer to reach the cohesive state, and the agents likewise need more time to come within the target's capture distance.

4. Discussion

In practical scenarios, the driving decisions of the manned platform are made by human drivers, whose strategies are complex and variable; the numerical examples in this paper therefore adopt different simulated trajectories. However, the multi-agent system makes decisions by observing the state of the manned platform $x_m$, so the simulations are effective in verifying Theorem 1.
The theorem in this paper can also be extended to the case where the target adopts an escape strategy. The corresponding scenario is one where the target can also observe the pursuers' states and adopts a strategy such as $u_t=-R_t^{-1}B_t^T\bar P_t\tilde z$ to move away from them. The problem then becomes a standard pursuit–evasion game, and the strategies take the form $u_i^*=-R_i^{-1}B_i^T\tilde P_{i11}\tilde z-R_i^{-1}B_i^T\tilde b_i$ according to [24], where $\tilde z=[x_1^T,x_2^T,\dots,x_l^T,x_t^T]^T$. At this point, the iterative calculation of $\tilde P_{i11}$ requires information about $A_t$. If we cannot obtain the target's model, we can use reinforcement learning to approximate the solution [33]. For situations where the target adopts adversarial strategies, it is more feasible to use reinforcement learning or other approximate approaches, since the target's model is difficult to obtain; this is one of our future research directions. For future work, we will consider the following in more detail:
  • The environment in this paper is relatively ideal; we will consider the target-pursuit problem when obstacles exist. At the same time, we will consider the situation of multiple targets, so that the strategy can adapt to more practical scenarios.
  • For the situation where the target adopts an escape strategy, we will conduct further analysis, introduce reinforcement learning methods in cases where the model is difficult to obtain, and balance the relationship between computational cost and algorithm performance.
  • The cooperation between multi-agent systems and the manned platform is passive in this paper. In the future, we will introduce trajectory prediction methods for the manned platform, so that agents can make active decisions based on a certain degree of predictive information, thereby improving the efficiency of team task execution.

5. Conclusions

In this paper, we consider a manned–unmanned cooperative decision-making problem in a target-pursuit setting. For the target, the manned platform and the multi-agent system adopt a hierarchical decision-making approach: the manned platform makes decisions first, and the agents then observe the state of the manned platform and make decisions while exchanging information with their neighbors. The target-pursuit problem is formulated as a linear quadratic differential game over the directed communication graph, and the strategies are given by iteratively solving a system of coupled differential equations. To support decision-making in different pursuit scenarios, we explored the feasibility of strategies that can adapt to multiple types of targets, and we demonstrated the feasibility of the proposed method through simulation experiments. From the simulation results, the following can be seen: (1) The strategies enable the agents to cooperate with the manned platform to capture a target successfully. (2) The strategies achieve the pursuit of both static and dynamic targets. (3) Different weight coefficients affect the algorithm's convergence speed, the degree of team cohesiveness, and the target-pursuit time. (4) Compared to a purely unmanned system performing target-pursuit tasks, the manned platform's trajectory provides a reference for the agents, enabling the manned–unmanned collaborative system to complete tasks more efficiently.
However, this approach is proposed in a relatively ideal environment, and the simulations did not consider more complex platform kinematic and dynamic models. Therefore, the proposed approach needs to be further validated experimentally in different manned–unmanned collaborative systems, while gradually increasing the complexity of the scenarios to ensure the robustness of the strategies.

Author Contributions

Conceptualization, L.H. and W.S.; methodology, L.H. and W.S.; software, L.H. and X.Y.; validation, T.Y., Z.T. and X.A.; formal analysis, L.H. and W.S.; investigation, T.Y. and Z.T.; resources, X.Y.; writing—original draft preparation, L.H.; visualization, L.H.; project administration, L.H. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, X.; Su, H.; Wang, X.; Chen, G. An Overview of Coordinated Control for Multi-agent Systems Subject to Input Saturation. Perspect. Sci. 2016, 7, 133–139. [Google Scholar] [CrossRef]
  2. Ge, S.; Liu, X.; Goh, C.; Xu, L. Formation Tracking Control of Multiagents in Constrained Space. IEEE Trans. Control Syst. Technol. 2016, 24, 992–1003. [Google Scholar] [CrossRef]
  3. Wu, H.; Karkoub, M.; Hwang, C. Mixed Fuzzy Slidingmode Tracking with Backstepping Formation control for Multinonholonomic Mobile Robots subject to Uncertainties. J. Intell. Robot. Syst. 2015, 79, 73–86. [Google Scholar] [CrossRef]
  4. Kawa, A.; Pawlewski, P.; Golinska, P.; Hajdul, M. Cooperative Purchasing of Logistics Services among Manufacturing Companies Based on Semantic Web and Multi-agent System. In Proceedings of the International Conference on Practical Applications of Agents and Multiagent Systems, Salamanca, Spain, 26–28 April 2010. [Google Scholar]
  5. Zuiani, F.; Vasile, M. Multi-agent Collaborative Search with Tchebycheff Decomposition and Monotonic basin Hopping Steps. In Bioinspired Optimization Methods & Their Applications; BIOMA: Bohinj, Slovenia, 2012. [Google Scholar]
  6. Marden, J.; Shamma, J. Game Theory and Distributed Control. In Handbook of Game Theory with Economic Applications; Elsevier: Amsterdam, The Netherlands, 2015; Volume 4, pp. 861–899. [Google Scholar]
  7. Barreiro-Gomez, J. Distributed Formation Control using Population Games. In The Role of Population Games in the Design of Optimization-Based Controllers; Springer: Berlin/Heidelberg, Germany, 2019; pp. 97–110. [Google Scholar]
  8. Qin, J.; Ban, X.; Li, X. Evolutionary Dynamics of Multiagent Formation. In Proceedings of the Chinese Control and Decision Conference, Guilin, China, 17–19 June 2009; pp. 3557–3561. [Google Scholar]
  9. Mcgrew, T.M. Army Aviation Addressing Battlefield Anomalies in Real Time with the Teaming and Collaboration of Manned and Unmanned Aircraft. Geochem. Geophys. Geosyst. 2009. [Google Scholar] [CrossRef]
  10. Dubois, T.; Host, C.; Butt, J.; Clemens, J.A.; Blanton, B. Manned/unmanned Collaborative Mission Concepts with Tiltrotor Aircraft in Support of Maritime Patrol and Search-and-rescue Operations. In Proceedings of the Australian International Aerospace Congress, Melbourne, Australia, 25–28 February 2013. [Google Scholar]
  11. Das, A.; Kol, P.; Lundberg, C.; Doelling, K.; Sevil, H.; Lewis, F. A Rapid Situational Awareness Development Framework for Heterogeneous Manned-Unmanned Teams. In Proceedings of the National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 417–424. [Google Scholar]
  12. Zhou, H.; Wang, X.; Shan, Y.-Z.; Zhao, Y.; Cui, N. Synergistic Path Planning for Multiple Vehicles Based on an Improved Particle Swarm Optimization Method. Acta Autom. Sin. 2020, 46, 2670–2676. [Google Scholar]
  13. Garcia, E.; Casbeer, D.; Moll, A.; Pachter, M. Multiple Pursuer Multiple Evader Differential Games. IEEE Trans. Autom. Control 2020, 66, 2345–2350. [Google Scholar] [CrossRef]
  14. Nourbakhsh, I.; Sycara, K.; Koes, M.; Yong, M.; Lewis, M.; Burion, S. Human-Robot Teaming for Search and Rescue. Pervasive Comput. 2005, 4, 72–79. [Google Scholar] [CrossRef]
  15. Benda, M.; Jagannathan, V.; Dodhiawalla, R. On Optimal Cooperation of Knowledge Sources: An Empirical Investigation; Technical Report BCS-G2010-28; The Boeing Company: Seattle, WA, USA, 1986. [Google Scholar]
  16. Kitamura, Y.; Teranishi, K.; Tatsumi, S. Organizational Strategies for Multiagent Real-time Search. In Proceedings of the International Conference on Multi-agent Systems, Kyoto, Japan, 9–13 December 1996; pp. 150–156. [Google Scholar]
  17. Undeger, C.; Polat, F. Multi-agent Real-time Pursuit. Auton. Agents Multi-Agent Syst. 2010, 21, 69–107. [Google Scholar] [CrossRef]
  18. Keshmiri, S.; Payandeh, S. Toward Opportunistic Collaboration in Target Pursuit Problems. In Proceedings of the Autonomous and Intelligent Systems-Second International Conference, Burnaby, BC, Canada, 22–24 June 2011. [Google Scholar]
  19. Pei, H.; Chen, S.; Lai, Q. Multi-target Consensus Circle Pursuit for Multi-agent Systems via a Distributed Multi-flocking Method. Int. J. Syst. Sci. 2015, 47, 3741–3748. [Google Scholar] [CrossRef]
  20. Hamed, O.; Hamlich, M. Improvised Multi-robot Cooperation Strategy for Hunting a Dynamic Target. In Proceedings of the 2020 International Symposium on Advanced Electrical and Communication Technologies (ISAECT), Marrakech, Morocco, 25–27 November 2020; p. 168691. [Google Scholar]
  21. Liu, K.; Zheng, X.; Lin, Y.; Han, L.; Xia, Y. Design of Optimal Strategies for the Pursuit-evasion Problem based on Differential Game. Acta Autom. Sin. 2021, 47, 15. [Google Scholar]
  22. István, H.; Skrzypczyk, K. Robot Team Coordination for Target Tracking using Fuzzy Logic Controller in Game Theoretic Framework. Robot. Auton. Syst. 2009, 57, 75–86. [Google Scholar]
  23. Bhattacharyya, A. Forbidden Minors for a Pursuit Game on Graphs. Sr. Proj. Spring 2013, 408. [Google Scholar]
  24. Talebi, S.; Simaan, M.; Qu, Z. Cooperative, Non-cooperative and Greedy pursuers Strategies in Multiplayer Pursuit-evasion Games. In Proceedings of the IEEE Conference on Control Technology and Applications, Mauna Lani, HI, USA, 27–30 August 2017; pp. 2049–2056. [Google Scholar]
  25. Talebi, S.; Simaan, M. Multi-pursuer Pursuit-evasion Games under Parameter Uncertainty: A Monte Carlo Approach. In Proceedings of the System of Systems Engineering Conference, Waikoloa, HI, USA, 18–21 June 2017; pp. 1–6. [Google Scholar]
  26. Qu, Z.; Simaan, M. A Design of Distributed Game Strategies for Networked Agents. IFAC Proc. Vol. 2009, 42, 270–275. [Google Scholar] [CrossRef]
  27. Liu, R.; Cai, Z. A Novel Approach based on Evolutionary Game Theoretic Model for Multi-player Pursuit Evasion. In Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering, Changchun, China, 24–26 August 2010; pp. 107–110. [Google Scholar]
  28. Fu, L.; Liu, J.; Meng, G.; Wang, D. Survey Of Manned/Unmanned Air Combat Decision Technology. In Proceedings of the Chinese Control and Decision Conference, Qingdao, China, 23–25 May 2015. [Google Scholar]
  29. Liu, Y.; Zhang, A. Cooperative Task Assignment Method of Manned/unmanned Aerial Vehicle Formation. Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Syst. Eng. Electron. 2010, 32, 584–588. [Google Scholar]
  30. Xu, L.; Pan, X.; Wu, M. Analysis on Manned/Unmanned Aerial Vehicle Cooperative Operation in Antisubmarine Warfare. Chin. J. Ship Res. 2018, 13, 154–159. [Google Scholar]
  31. Liu, J.; Yuan, S.; Qi, Y.; Ye, W. Research on the Key Technology of the Cooperative Combat System of Manned Vehicle and Unmanned Aerial Vehicle. Ship Electron. Eng. 2012, 32, 1–3. [Google Scholar]
  32. Engwerda, J. LQ Dynamic Optimization and Differential Games; John Wiley & Sons: Hoboken, NJ, USA, 2005. [Google Scholar]
  33. Li, M.; Qin, J.; Ma, Q.; Zheng, W.; Kang, Y. Hierarchical Optimal Synchronization for Linear Systems via Reinforcement Learning: A Stackelberg-Nash Game Perspective. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 1600–1611. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The communication graph.
Figure 2. Simulation of the target moving in a straight line and being pursued by a manned platform with a certain strategy. (a) Trajectories of agents and the manned platform. (b) Distances between the target and each agent.
Figure 3. Simulation of the target moving along a sine trajectory and being pursued by a manned platform with a certain strategy. (a) Trajectories of agents and the manned platform. (b) Distances between the target and each agent.
Figure 4. Simulation of the target moving along a straight line and being pursued by a manned platform at a faster speed on the same straight line. (a) Trajectories of agents and the manned platform. (b) Distances between the target and each agent.
Figure 5. Simulation of a stationary target being pursued by a manned platform with a certain strategy. (a) Trajectories of agents and the manned platform. (b) Distances between the target and each agent.
Figure 6. The trajectories of agents under different weight coefficients.
Figure 7. The distances between the target and each agent under different weight coefficients.
Figure 8. Simulation of the multi-agent system pursuing the target. (a) Trajectories of agents. (b) Distances between the target and each agent.

