A Multi-UAV Distributed Collaborative Search Algorithm Based on Maximum Entropy Mechanism

Cui, Siyuan; Li, Hao; Fan, Xiangyu; Ni, Lei; Hou, Jiahang

doi:10.3390/drones9080592

by Siyuan Cui ¹, Hao Li ¹,*, Xiangyu Fan ², Lei Ni ¹,* and Jiahang Hou ¹

¹ Early Warning Academy, Wuhan 430010, China

² Department of Bomber and Transport Aircraft Pilots Conversion, Air Force Harbin Flying College, Harbin 150088, China

* Authors to whom correspondence should be addressed.

Drones 2025, 9(8), 592; https://doi.org/10.3390/drones9080592

Submission received: 8 July 2025 / Revised: 13 August 2025 / Accepted: 20 August 2025 / Published: 21 August 2025

Abstract

This paper addresses the core issues of slow coverage rate growth and high repeated detection rates in multi-UAV cooperative search operations within unknown areas. A distributed cooperative search algorithm based on the maximum entropy mechanism is proposed to resolve these challenges. It innovatively integrates the entropy gradient decision framework with DMPC-OODA (Distributed Model Predictive Control-Observe, Orient, Decide, Act) rolling optimization: environmental uncertainty is quantified through an exponential decay entropy model to drive UAVs to migrate toward high-entropy regions; element-wise product operations are employed to efficiently update environmental maps; and a dynamic weight function is designed to adaptively adjust the weights of coverage gain and entropy gain, thereby balancing “rapid coverage” and “accurate exploration”. Through multiple independent repeated experiments, the algorithm demonstrates significant improvements in coverage efficiency—by 6.95%, 12.22%, and 59.49%, respectively—compared with the Search Intent Interaction (SII) mode, non-entropy mode, and random mode, which effectively enhances resource utilization.

Keywords: multi-UAV; maximum entropy mechanism; cooperative search; distributed model predictive control

1. Introduction

Driven by continuous theoretical innovation and expanding applications, unmanned aerial vehicle (UAV) technology has become one of the most transformative technological achievements of the 21st century [1,2,3,4]. By virtue of their high maneuverability and low cost, UAVs are now widely used in the military, civil, and commercial sectors [5,6,7]. However, as application demands deepen and mission environments become increasingly complex and variable, a single UAV can no longer handle some tasks. In practice, single-UAV operation is giving way to the cooperative operation of UAV swarms, which breaks through the performance bottlenecks of a single UAV and has become a key direction in the development of UAV technology. Among the many application scenarios of UAV swarms, cooperative search has become an important testbed for the maturity of swarm cooperation technology because it demands both timely environmental perception and efficient mission execution. Such missions are widely applied in post-earthquake investigation, real-time monitoring of forest fires, and information gathering in military operations [8,9,10,11,12,13,14].

Cooperative search has been studied extensively, with abundant results. Existing methods fall into two major categories, centralized and distributed cooperative search, according to their task architectures and decision-making methods. Centralized cooperative search relies on a central node for global decision-making; representative methods include formation search [15,16] and region division [17,18,19]. Although centralized control performs well in simple mission environments or with small numbers of UAVs, it adapts poorly to multi-UAV cooperative search in complex environments or with large numbers of UAVs because the computational demand surges.

Distributed cooperative search, with its high architectural flexibility and strong fault tolerance, has gradually become the mainstream, with main methods including distributed model prediction and distributed search graphs. In terms of distributed model prediction, Refs. [20,21] realized the optimal decision-making of a single UAV through distributed model predictive control (DMPC) combined with genetic algorithm (GA). By predicting the flight dynamics of other UAVs to avoid collisions, they could complete trajectory planning and cooperative search tasks for multiple UAVs in complex mission environments, effectively reducing the solution scale of the UAV system. In Ref. [13], considering the impact of limited communication and electromagnetic interference on battlefield situation awareness, the authors divided the mission phase into normal communication and impaired communication. This method uses the Voronoi diagram to generate real-time search areas for all UAVs, and combines UAV search control decision planning to improve the search efficiency and safety within the search area.

In terms of distributed search graphs, the authors in [22] proposed a distributed cooperative search algorithm based on the pheromone decision-making mechanism. Considering UAV communication constraints, this method mapped the environmental map to a pheromone map to establish a mission environment model. UAVs updated the pheromone map during the search process, realized pheromone map fusion in the mission area through the communication network, and guided UAVs to carry out the next search actions based on the grid pheromone concentration of local and global information. This method could achieve effective coverage search in the mission area. In Ref. [23], the authors constructed a distributed UAV area coverage model continuously evolving based on optimal solutions to achieve optimal deployment. This model used the central Voronoi diagram to decompose the mission area, and realized the probability map information update and fusion mechanism in different scenarios by defining the sensor performance function and distribution density function, which enhanced the anti-interference capability of UAVs against environmental changes and improved the search model.

In recent years, although novel search strategies based on reinforcement learning [24,25,26] and deep reinforcement learning [27,28] have emerged, they are constrained by inherent limitations. These include a strong dependence on large-scale simulation data, difficulties in quantifying environmental uncertainty caused by the black-box nature of the strategies, and high repeated detection rates resulting from weak dynamic adaptation capabilities between communication topology and cooperative strategies in large-scale swarm scenarios. As a result, they face transferability challenges and real-time computing bottlenecks in unknown environments without prior information.

Although the above research has advanced multi-UAV cooperative search, problems of low coverage rate and high repeated detection rate remain [29,30]. To address them, this paper proposes a distributed cooperative search algorithm for multiple UAVs based on the maximum entropy mechanism, which combines the entropy decay model with the DMPC-OODA decision loop. It drives UAVs toward high-uncertainty regions through the entropy gradient and uses a dynamically weighted multi-objective function to balance “rapid coverage” against “precise exploration”, thereby raising the coverage rate and lowering the repeated detection rate relative to the cooperative search methods above. The main contributions of this paper are as follows:

(1): A search mechanism based on dynamic entropy guides UAVs to explore unknown areas. An entropy matrix is constructed to quantify the exploration value of the mission area, and a dynamic entropy model is established in the environment update function: the entropy value of explored areas decays exponentially, while unexplored areas maintain high entropy values. By using entropy to reflect regional uncertainty, UAVs are guided to high-entropy regions, enabling efficient multi-UAV search.
(2): A cooperative framework based on DMPC is combined with a receding horizon optimization model to quickly implement the Observe-Orient-Decide-Act (OODA) closed-loop mechanism. Through a dynamic prediction model, each UAV solves for the optimal heading angle based on existing environmental information, realizing local information interaction in the Observe phase, search direction plan adjustment in the Orient phase, decision selection in the Decide phase, and final action execution in the Act phase.
(3): A multi-objective optimization function based on dynamic weights is designed to address multi-UAV cooperative obstacle avoidance and coverage balancing. Through adaptive weight adjustment, a dynamic balance is achieved between rapid coverage in the initial stage and fine search in the later stage, and Adaptive Differential Evolution (ADE) is used to solve for the optimal heading angle sequence in real time, improving search intelligence.

2. System Model

2.1. Problem Description

Multi-UAV cooperative search problems are mainly divided into two categories: the first type relies on known environmental information and uses prior data to quickly achieve key target positioning or focused area monitoring; the second type targets unknown environments, where full-area coverage is achieved through real-time information interaction within the mission area [31,32,33]. This paper focuses on the second type of problem, where multiple UAVs equipped with communication devices and optical sensors conduct searches in the mission area, as shown in Figure 1. The blue circle in the figure represents the area that the drone can search and cover at its current position, and the black dashed line represents the drone’s previous flight path.

In complex operational scenarios lacking prior environmental information, the UAVs entering an unknown area need to collect environmental information using their onboard optical sensing equipment and to fuse and update the multi-source data in real time through a communication network, providing reliable data support and a decision-making basis for subsequent operations. Therefore, this paper focuses on how to construct an efficient online area coverage planning method that enables multiple UAVs to cover the target area in the shortest possible time.

2.2. UAV Mission Space Model

During the execution of search tasks, UAVs search and cover the mission area using optical sensors from high altitude. To simplify the analysis, the mission area is modeled as a two-dimensional map. Suppose the mission area $D$ is an $L_{x} \times L_{y}$ rectangular region, which is discretized into $M \times N$ equal-area grids at a fixed interval of $\Delta d$. A Cartesian coordinate system is established to digitally represent any grid position in the mission area:

$$p_{i, j} = \left\{ (x, y) \,\middle|\, x = \left\lceil \frac{L_{x}}{\Delta d} \right\rceil \in [1, M],\; y = \left\lceil \frac{L_{y}}{\Delta d} \right\rceil \in [1, N] \right\} \tag{1}$$

where $\lceil \cdot \rceil$ denotes rounding up. The coverage state of each grid is quantified by assigning binary values to its state: when a grid is within the UAV’s search range, its state value is 0, marked as covered; when a grid is not included in the detection range, its state value is 1, marked as uncovered. To realize the digital mapping of the coverage state of the mission area, the spatial distribution of the above grid states is structurally stored in the environment matrix $E$. The grid state expression and environment matrix are as follows:

$$p_{i, j}(t) = \begin{cases} 0, & (i, j) \in D_{c}(t) \\ 1, & (i, j) \in D_{nc}(t) \end{cases} \qquad E(t) = [p_{i, j}(t)]_{M \times N} \tag{2}$$

where $p_{i, j}(t)$ represents the coverage state of grid $(i, j)$ at time $t$, $D_{c}(t)$ represents the set of detected grids, and $D_{nc}(t)$ represents the set of undetected grids. At the initial stage of the mission, i.e., $t = 0$, the initial state value of each grid in the environment matrix is 1, indicating that the entire mission area is uncovered, as shown in Figure 2. As the search task progresses, the coverage distribution map will be updated in real time according to the value of $p_{i, j}(t)$. The updated environment matrix and situation information are shared through the communication network, supporting each UAV to make real-time decisions based on the global coverage state, so as to improve the cooperative search efficiency of multiple UAVs.
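The environment matrix of Formula (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`make_environment`, `cell_index`, `mark_covered`) are hypothetical, and `cell_index` reads Formula (1) as mapping a continuous position to its grid cell by rounding up, which is one possible interpretation rather than the authors' stated procedure.

```python
import math

def make_environment(m, n):
    """Environment matrix E(0): every cell starts at 1 (uncovered)."""
    return [[1 for _ in range(n)] for _ in range(m)]

def cell_index(x, y, delta_d):
    """Map a position to a 1-based grid index by rounding up.
    Assumption: x, y > 0 are coordinates inside the mission area."""
    return math.ceil(x / delta_d), math.ceil(y / delta_d)

def mark_covered(env, i, j):
    """Set the 1-based cell (i, j) to 0 (covered), as in Formula (2)."""
    env[i - 1][j - 1] = 0

env = make_environment(4, 5)            # M = 4, N = 5 (illustrative sizes)
i, j = cell_index(2.3, 0.7, delta_d=1.0)
mark_covered(env, i, j)
uncovered = sum(map(sum, env))          # number of cells still at state 1
```

Storing covered cells as 0 is what later allows map updates and fusion to be written as element-wise products.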

2.3. UAV Model

During the execution of search tasks, the selection of flight altitude involves multiple performance indicators: a higher flight altitude will increase air resistance and shorten endurance time, while also reducing the spatial resolution of search images due to the characteristics of optical imaging systems; a lower flight altitude will increase the risk of UAVs being shot down. To reduce system complexity and focus on core research issues, this paper reasonably simplifies the UAV motion model: it is assumed that UAVs fly at a constant altitude above the mission area and are regarded as mass points moving at a constant speed in a two-dimensional space [34,35]. The corresponding state equation is as follows:

$$\begin{bmatrix} x_{i}(t + 1) \\ y_{i}(t + 1) \\ \phi_{i}(t + 1) \end{bmatrix} = \begin{bmatrix} x_{i}(t) \\ y_{i}(t) \\ \phi_{i}(t) \end{bmatrix} + \begin{bmatrix} v \Delta t \cos \phi_{i}(t) \\ v \Delta t \sin \phi_{i}(t) \\ \Delta\phi_{i}(t) \end{bmatrix} \tag{3}$$

where $[x_{i}(t), y_{i}(t)]$ is the position of $\mathrm{UAV}_{i}$ in the mission area at time $t$; $\phi_{i}$ is the heading angle; the heading deflection angle $\Delta\phi_{i}$ is the control input, satisfying $\Delta\phi_{i} \in [-\phi_{\max}, \phi_{\max}]$, where $\phi_{\max}$ is the maximum turning angle limited by maneuvering performance; $v$ is the level flight speed of the UAV; and $\Delta t$ is the time step. Denote the state variable of $\mathrm{UAV}_{i}$ at time $t$ as $\omega_{i}(t) = [x_{i}(t), y_{i}(t), \phi_{i}(t)]^{T}$; then the state variable of $\mathrm{UAV}_{i}$ at time $t + 1$ is as follows:

$$\omega_{i}(t + 1) = f(\omega_{i}(t), \Delta\phi_{i}(t)) \tag{4}$$

where $f(\cdot)$ is the state transition function, determined by Formula (3). Then the predicted state model of $\mathrm{UAV}_{i}$ at time $t + j$ in this system is as follows:

$$\begin{aligned} \tilde{\omega}_{i}(t + j \mid t) &= f(\tilde{\omega}_{i}(t + j - 1 \mid t), \Delta\phi_{i}(t + j - 1 \mid t)) \\ i &\in \{1, 2, \dots, n\}; \quad j \in \{1, 2, \dots, T\} \end{aligned} \tag{5}$$

where n is the number of UAVs, and T is the prediction period. By optimizing the control input of this prediction model, UAVs are guided to unexplored areas, so as to improve the overall search coverage efficiency of the system.
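A minimal sketch of the motion model in Formulas (3) to (5) follows. The function names `step` and `rollout` and the numeric values are assumptions made for illustration; the turn is clipped to the admissible interval before it is applied.

```python
import math

def step(state, d_phi, v, dt, phi_max):
    """One application of Formula (3): move along the current heading,
    then add the (clipped) heading change."""
    x, y, phi = state
    d_phi = max(-phi_max, min(phi_max, d_phi))
    return (x + v * dt * math.cos(phi),
            y + v * dt * math.sin(phi),
            phi + d_phi)

def rollout(state, d_phis, v, dt, phi_max):
    """Predicted states for t+1 .. t+T from a control sequence, as in Formula (5)."""
    states = []
    for d_phi in d_phis:
        state = step(state, d_phi, v, dt, phi_max)
        states.append(state)
    return states

# Illustrative values: start at the origin heading along +x, turn left once.
path = rollout((0.0, 0.0, 0.0), [math.pi / 2, 0.0], v=1.0, dt=1.0, phi_max=math.pi)
```

Because the position update in Formula (3) uses the heading at time t, a turn chosen at time t only changes the position from time t + 2 onward, which is one reason a multi-step prediction horizon is useful.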

The visualization of this prediction model is shown in Figure 3. The blue circle represents the sensor detection range with a radius $R$, centered at the position of $\mathrm{UAV}_{i}$. As shown in Figure 4, a circumscribed square is formed around this circular area and gridded into $K \times K$ grids. Each grid is assigned a value $\rho_{a, b}$ to represent the grid detection state. Combined with Formula (2), the grid states $\rho_{a, b}$ and the detection matrix $S_{K \times K}$ of $\mathrm{UAV}_{i}$ are expressed as follows:

$$\rho_{a, b} = \begin{cases} 0, & (a, b) \in \Omega_{c} \\ 1, & (a, b) \in \Omega_{nc} \end{cases} \qquad S_{K \times K} = \begin{bmatrix} \rho_{K1} & \dots & \rho_{KK} \\ \vdots & & \vdots \\ \rho_{11} & \dots & \rho_{1K} \end{bmatrix} = \begin{bmatrix} 0 & \dots & 0 \\ \vdots & & \vdots \\ 0 & \dots & 0 \end{bmatrix} \tag{6}$$

where $\Omega_{c}$ represents the sensor detectable area (see the blue area in Figure 4), and $\Omega_{nc}$ represents the area outside the sensor detection range (see the red area in Figure 4).
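One way to build such a detection matrix is sketched below. It assumes an odd $K$ with the UAV at the centre cell and cell centres spaced $\Delta d$ apart; `detection_matrix` is a hypothetical helper, not a routine from the paper.

```python
def detection_matrix(k, radius, delta_d=1.0):
    """K x K mask: 0 for cells inside the sensor circle, 1 outside.
    Assumption: K is odd and the UAV sits at the centre cell."""
    c = k // 2
    mask = []
    for a in range(k):
        row = []
        for b in range(k):
            dist2 = ((a - c) * delta_d) ** 2 + ((b - c) * delta_d) ** 2
            row.append(0 if dist2 <= radius ** 2 else 1)
        mask.append(row)
    return mask

S = detection_matrix(5, radius=2.0)
outside = sum(map(sum, S))   # cells of the square that lie outside the circle
```

The corner cells of the circumscribed square fall outside the circle and keep the value 1, so multiplying by this mask leaves them unchanged.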

2.4. Maximum Entropy Model

To address the limitations of traditional exhaustive search strategies, a maximum entropy mechanism is introduced as a guide to construct an environmental characterization model [36,37,38,39]. This model is based on the concept of entropy in information theory, quantifying the uncertainty of the mission area. As a key indicator for measuring uncertainty, entropy is positively correlated with the degree of uncertainty. In this paper, an environmental entropy field is constructed to map UAV perception information, assigning maximum entropy values to unexplored areas. As the cooperative detection mission of multiple UAVs proceeds, the entropy value of the mission area exhibits a gradient decay characteristic with the number of detections. Its mathematical essence is manifested in the construction of a two-dimensional discrete entropy matrix:

$$H(t) = [h_{i, j}(t)]_{M \times N} \tag{7}$$

where $h_{i, j}(t) \in [0, 1]$ represents the information entropy value of grid $(i, j)$ at time $t$. In the initial state, the uncertainty of the unexplored area is the largest, set as $h_{i, j}(0) = 1$. The entropy value of grids in the mission area follows the detection decay rule. When $\mathrm{UAV}_{i}$ detects grid $(i, j)$ at time $t$, the entropy value of this grid decays exponentially:

$$h_{i, j}(t + 1) = h_{i, j}(t) \cdot e^{-\alpha \cdot N_{k}} \tag{8}$$

where $\alpha$ is the entropy gain coefficient, and $N_{k}$ is the number of detections for this grid. This nonlinear decay characteristic ensures that the uncertainty of detected areas is significantly reduced, thereby enabling the entropy value to converge rapidly, as shown in Figure 5. Compared with traditional methods, the maximum entropy model can drive UAVs to explore high-entropy areas based on uncertainty quantification results by constructing a gradient entropy search potential field.
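The decay rule of Formula (8) can be traced numerically. In this sketch the value of `alpha` is hypothetical, and the rule is applied once per detection with $N_{k}$ taken as the running detection count of the cell, which is one reading of the formula.

```python
import math

def decay_entropy(h, n_detections, alpha):
    """Formula (8): h(t+1) = h(t) * exp(-alpha * N_k)."""
    return h * math.exp(-alpha * n_detections)

alpha = 0.5          # hypothetical entropy gain coefficient
h = 1.0              # unexplored cell: maximum entropy
history = [h]
for n_k in (1, 2, 3):            # first, second, third detection of the cell
    h = decay_entropy(h, n_k, alpha)
    history.append(h)
```

Under this reading the entropy after three detections is $e^{-\alpha(1+2+3)}$, so repeated visits to the same cell quickly stop offering any entropy gain.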

3. Cooperative Decision-Making Optimization Framework Based on Maximum Entropy Mechanism

3.1. Update and Fusion of Environmental Maps

A reasonable dynamic environment characterization model enables multiple UAVs to efficiently complete cooperative search tasks. In traditional search modes, environmental maps are updated by checking one grid at a time whether it has been searched and assigning its value accordingly. To avoid the surge in computation caused by this cell-by-cell traversal and assignment, this paper adopts element-wise multiplication [40,41,42] to update grids in the environmental map, which simplifies the map update, significantly reduces its computational load, and improves the execution efficiency of multi-UAV cooperative search tasks.

An environmental submatrix $E_{i}^{x_{0}, y_{0}}(t)$ with dimension $K \times K$, centered at $\mathrm{UAV}_{i}$, is extracted from the environmental matrix $E(t)$. Since this environmental submatrix has the same dimension as the detection matrix $S_{K \times K}$, element-wise multiplication of the two can be performed to update the environmental submatrix, as shown in Formula (9):

$$E_{i}^{x_{0}, y_{0}}(t + 1) = E_{i}^{x_{0}, y_{0}}(t) \odot S_{K \times K} \tag{9}$$

where “$\odot$” denotes element-wise multiplication. The grid cells within the UAV’s search range are updated in the environmental submatrix $E_{i}^{x_{0}, y_{0}}(t)$, and the corresponding sub-block of the environmental matrix $E(t + 1)$ is replaced with the updated environmental submatrix $E_{i}^{x_{0}, y_{0}}(t + 1)$ to realize the update of the environmental map, as shown in Figure 6.
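The sub-block update of Formula (9) might look as follows. Plain nested lists are used to keep the sketch dependency-free, so the product is written as a loop; with an array library the same update would be a single vectorised multiplication. The function name and the assumption that the window lies fully inside the map are illustrative.

```python
def update_environment(env, mask, row0, col0):
    """Formula (9): multiply the K x K window of env whose top-left cell is
    (row0, col0) by the detection mask, in place.
    Assumption: the window lies fully inside the map (no boundary clipping)."""
    k = len(mask)
    for a in range(k):
        for b in range(k):
            env[row0 + a][col0 + b] *= mask[a][b]
    return env

env = [[1] * 5 for _ in range(5)]
mask = [[1, 0, 1],          # 0 = inside the sensor footprint
        [0, 0, 0],
        [1, 0, 1]]
update_environment(env, mask, 1, 1)   # UAV centred on cell (2, 2)
update_environment(env, mask, 1, 2)   # UAV moved one cell to the right
uncovered = sum(map(sum, env))
```

A cell that is already 0 stays 0 whatever the mask holds, so overlapping footprints never "uncover" a cell and no per-cell branching is needed.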

After each UAV completes the update of its local environmental map, the environmental maps are broadcast through the network to achieve fusion and sharing. At time $t$, the local environmental map of $\mathrm{UAV}_{i}$ is $E_{i}(t)$, and the environmental maps received from other UAVs are $E_{j \neq i}(t)$. Element-wise multiplication is used again to realize the fusion of environmental map information among UAVs, as shown in Formula (10):

$$E_{i}(t) = E_{i}(t) \odot \prod_{j \neq i}^{n} E_{j}(t) \tag{10}$$

The entropy information of each grid in the mission area is fused synchronously with the environmental map information, as shown in Formula (11):

$$H(t) = \bigoplus_{i = 1}^{n} H_{i}(t) \tag{11}$$

where the “$\oplus$” operator represents the principle of taking the maximum entropy value, i.e., when different UAVs have conflicting entropy values for the same grid, the maximum value is retained to ensure that the grid retains a higher degree of uncertainty. This mechanism ensures that multiple UAVs can share environmental cognition, avoiding repeated detection while maintaining continuous search in high-uncertainty areas.
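The two fusion rules, Formulas (10) and (11), can be sketched side by side. The function names and the small example maps are hypothetical.

```python
def fuse_environment(maps):
    """Formula (10): element-wise product; a cell is covered (0)
    as soon as any UAV reports it covered."""
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[1] * cols for _ in range(rows)]
    for env in maps:
        for i in range(rows):
            for j in range(cols):
                fused[i][j] *= env[i][j]
    return fused

def fuse_entropy(entropies):
    """Formula (11): element-wise maximum across the UAVs' entropy maps."""
    rows, cols = len(entropies[0]), len(entropies[0][0])
    return [[max(h[i][j] for h in entropies) for j in range(cols)]
            for i in range(rows)]

E1 = [[0, 1], [1, 1]]
E2 = [[1, 1], [1, 0]]
H1 = [[0.4, 1.0], [1.0, 0.3]]
H2 = [[0.6, 1.0], [1.0, 0.2]]
E = fuse_environment([E1, E2])
H = fuse_entropy([H1, H2])
```

The coverage map is fused optimistically (any detection counts), while the entropy map is fused conservatively (the largest remaining uncertainty is kept), which matches the description above.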

Since the mission area is mostly a complex environment, communication interruptions or data packet loss may occur. If $\mathrm{UAV}_{i}$ experiences a communication interruption at time $t$, each UAV can still update its local environmental map even though it cannot complete the fusion of environmental map information. When the communication link is restored at time $t + k$, the fused environmental map information obtained through element-wise multiplication already includes the environmental information lost from time $t$ to $t + k$, as shown in Figure 7.

Although communication interruption from time t to t + k will affect the decision-making benefits of each UAV, subsequent decisions will not be affected once communication is restored, which, to a certain extent, reduces the negative impact of communication interruptions and data packet loss on the entire search task [43].
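The recovery property described above follows from the product being commutative and from covered cells staying at 0. A small check, with hypothetical coverage sequences for two UAVs, compares fusing after every step with fusing once after an outage:

```python
def fuse(a, b):
    """Element-wise product of two coverage maps."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cover(env, cells):
    """Return a copy of env with the given cells set to 0 (covered)."""
    out = [row[:] for row in env]
    for i, j in cells:
        out[i][j] = 0
    return out

blank = [[1] * 3 for _ in range(3)]
# Cells each UAV covers at steps t, t+1, t+2 (hypothetical).
uav1_steps = [[(0, 0)], [(0, 1)], [(0, 2)]]
uav2_steps = [[(2, 2)], [(2, 1)], [(2, 0)]]

# Case 1: link up, maps fused after every step.
m1, m2 = blank, blank
for c1, c2 in zip(uav1_steps, uav2_steps):
    m1, m2 = cover(m1, c1), cover(m2, c2)
    m1 = m2 = fuse(m1, m2)
always_connected = m1

# Case 2: link down for all three steps, one fusion on recovery.
m1, m2 = blank, blank
for c1, c2 in zip(uav1_steps, uav2_steps):
    m1, m2 = cover(m1, c1), cover(m2, c2)
after_outage = fuse(m1, m2)
```

Both cases end with the same map; what the outage costs is the quality of the decisions taken while the maps were out of date, not the final map state.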

3.2. Search Benefit Function

The goal of multi-UAV cooperative search is to minimize environmental uncertainty in the mission area. During the search process, each UAV evaluates each predicted route using a search reward function based on the current environmental map and selects the optimal route to achieve rapid coverage of the mission area [44]. Based on distributed model predictive control, this paper mainly considers the following constraints:

(1): Coverage increment: Each UAV in the cluster will prioritize the direction with the largest increment in coverage area when adjusting its flight angle, so as to complete the search task in a shorter time. The coverage rate of the mission area at time $t$ is given by $\eta(t)$:

$$\eta(t) = \frac{MN - \sum_{i = 1}^{M} \sum_{j = 1}^{N} p_{i, j}(t)}{MN} \tag{12}$$

Then, the coverage increment from time $t$ to $t + 1$ is as shown in Formula (13):

$$I_{A}(t) = \eta(t + 1) - \eta(t) \tag{13}$$

where $\eta(t)$ denotes the regional coverage rate corresponding to time $t$; $I_{A}(t)$ represents the coverage rate increment of the next moment relative to the current moment, which is intended to guide the UAVs to search in the direction where the coverage rate increment is larger.

(2): Entropy gain: Entropy reflects the uncertainty in the mission area rather than a simple coverage state [45,46,47]. The search potential field generated by the gradient characteristics of the entropy field drives UAVs to search in high-entropy areas. The corresponding entropy gain is as shown in Formula (14):

$$I_{B}(t) = \sum_{(i, j)} h_{i, j}(t) \cdot p_{i, j}(t) \tag{14}$$

where $h_{i, j}(t)$ denotes the entropy value of the corresponding grid cell, which is dynamically updated according to Equation (8); $I_{B}(t)$ represents the entropy gain of the set of newly detected grid cells within the prediction horizon, serving as a core indicator for accurate exploration. It is aimed at guiding UAVs to prioritize searching regions with a high degree of uncertainty, reducing repeated searches in already detected areas, and thereby lowering the repeated detection rate.

As reward functions, coverage gain and entropy gain can generate a significant synergistic effect, achieving an optimization effect of “1 + 1 > 2”. Coverage area gain aims to maximize the proportion of covered grids, which can quickly expand the search range; entropy gain quantifies the uncertainty of grids and guides UAVs to prioritize exploring high-entropy areas. The two are dynamically fused through weight coefficients: in the early stage of exploration, when the coverage rate is low, a relatively higher weight is assigned to coverage area gain to accelerate the expansion of the search range; as the coverage rate increases, the weight of entropy gain is gradually increased to promote UAVs to search in high-uncertainty areas. This dynamic fusion effectively avoids the local optimization problem that may be caused by single-objective optimization, achieving a dynamic balance between “rapid coverage” and “precision exploration”. In addition, the entropy matrix decays synchronously during coverage updates, reducing the entropy of covered areas while keeping uncovered areas with high entropy, prompting UAVs to naturally tend to high-entropy areas while expanding the coverage range, thus achieving dual optimization of coverage efficiency and information gain.

(3): Yaw angle constraint: The yaw angle is the main control variable in the entire search task, but a larger yaw angle is accompanied by greater energy consumption, which affects endurance time. Therefore, this yaw angle constraint is designed to reduce the negative impact caused by excessively large turning angles, as shown in Formula (15):

$$I_{C}(t) = -\frac{|\Delta\phi_{i}(t)|}{\phi_{\max}} \tag{15}$$

where $\Delta\phi_{i}(t)$ denotes the yaw angle variation at the current moment; $\phi_{\max}$ represents the maximum yaw angle of the UAV; $I_{C}(t)$ is the yaw angle constraint term, aimed at restricting the UAV from sharp turns and extending its endurance time.

(4): Boundary constraint: When $\mathrm{UAV}_{i}$ explores near the boundary, the effective search area of the UAV will be reduced. A virtual potential field is designed to provide boundary repulsion to prevent this situation [48]. The magnitude of the boundary repulsion is inversely proportional to the distance from the UAV to the boundary; the closer the distance, the greater the repulsion, as shown in Formula (16) and Figure 8:

$$I_{D}(t) = -\left[ \left( \frac{1}{x_{i}(t)} + \frac{1}{L_{x} - x_{i}(t)} \right) + \left( \frac{1}{y_{i}(t)} + \frac{1}{L_{y} - y_{i}(t)} \right) \right] \tag{16}$$

In the formula, $x_{i}(t)$ and $y_{i}(t)$ denote the position coordinates of the UAV at the current moment; $L_{x}$ and $L_{y}$ represent the length and width of the mission area; $I_{D}(t)$ is the boundary constraint term, which keeps the UAV away from the boundary, where its effective detection range would shrink. This concentrates search resources within the mission area and thereby improves the overall coverage efficiency.
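The four terms in Formulas (12) to (16) are simple enough to write out directly. The sketch below uses hypothetical function names and toy inputs; `entropy_gain` takes the candidate cell set as an explicit argument, since the summation range in Formula (14) is the set of cells under consideration.

```python
def coverage_rate(env):
    """Formula (12): share of cells whose state is 0 (covered)."""
    total = len(env) * len(env[0])
    return (total - sum(map(sum, env))) / total

def coverage_increment(env_now, env_next):
    """Formula (13): eta(t+1) - eta(t)."""
    return coverage_rate(env_next) - coverage_rate(env_now)

def entropy_gain(entropy, env, cells):
    """Formula (14): sum of h * p, so covered cells (p = 0) contribute nothing."""
    return sum(entropy[i][j] * env[i][j] for i, j in cells)

def yaw_penalty(d_phi, phi_max):
    """Formula (15): larger turns are penalised more."""
    return -abs(d_phi) / phi_max

def boundary_penalty(x, y, lx, ly):
    """Formula (16); assumes 0 < x < lx and 0 < y < ly."""
    return -((1 / x + 1 / (lx - x)) + (1 / y + 1 / (ly - y)))

inc = coverage_increment([[1, 1], [1, 1]], [[0, 0], [1, 1]])
gain = entropy_gain([[1.0, 0.5], [0.2, 1.0]], [[1, 0], [1, 1]],
                    cells=[(0, 0), (0, 1), (1, 0)])
centre = boundary_penalty(5, 5, 10, 10)
near_edge = boundary_penalty(1, 5, 10, 10)
```

The boundary term is least negative at the centre of the area and falls steeply near an edge, which is the repulsion effect described in item (4).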

To sum up, the search benefit function for $\mathrm{UAV}_{i}$ in selecting a predicted path can be obtained as follows:

$$I(\omega_{i}(t), p_{i, j}(t)) = \lambda_{A} I_{A} + \lambda_{B} I_{B} + \lambda_{C} I_{C} + \lambda_{D} I_{D} \tag{17}$$

where $\lambda_{A}, \lambda_{B}, \lambda_{C}, \lambda_{D}$ are the weights of the constraint terms and satisfy $\lambda_{A} + \lambda_{B} + \lambda_{C} + \lambda_{D} = 1$. $\mathrm{UAV}_{i}$ maximizes the search benefit function $I$ through optimization, thereby solving for the optimal control input $\Delta\phi_{i}(t)$ at time $t$. To improve search efficiency, this method can be extended to K-step prediction, where the search benefits of K steps are summed to obtain the K-step cumulative search benefit function:

$$I^{k} = \sum_{k = 1}^{K} \beta^{k - 1} I(\omega_{i}(t + k - 1), p_{i, j}(t + k - 1)) \tag{18}$$

where $\beta$ is the calculation error factor for multi-step prediction, satisfying $0 < \beta < 1$. It reflects the multi-step prediction error while highlighting the weight distribution between immediate search rewards and long-term search rewards, emphasizing the direct value of immediate search rewards. By using an optimization algorithm to maximize the cumulative search benefit function $I^{k}$, the optimal control input $\Delta\phi_{i}^{k}(t)$ of $\mathrm{UAV}_{i}$ at step k can be obtained.
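Formulas (17) and (18) combine into a discounted weighted sum, sketched below. The weight schedule in `dynamic_weights` is purely hypothetical: the text states that the coverage weight dominates early and the entropy weight grows with the coverage rate, but gives no closed form here, so a linear schedule is assumed for illustration. All numeric inputs are made up.

```python
def dynamic_weights(coverage, lam_c=0.1, lam_d=0.1):
    """Hypothetical schedule: the coverage weight falls and the entropy weight
    rises linearly with the current coverage rate; the four weights sum to 1."""
    free = 1.0 - lam_c - lam_d
    lam_a = free * (1.0 - coverage)
    lam_b = free * coverage
    return lam_a, lam_b, lam_c, lam_d

def step_benefit(terms, weights):
    """Formula (17): weighted sum of (I_A, I_B, I_C, I_D)."""
    return sum(w * t for w, t in zip(weights, terms))

def cumulative_benefit(step_terms, weights, beta):
    """Formula (18): discounted sum over the predicted steps, 0 < beta < 1.
    enumerate starts at 0, which gives the exponent k - 1 for step k."""
    return sum(beta ** k * step_benefit(terms, weights)
               for k, terms in enumerate(step_terms))

weights = dynamic_weights(coverage=0.25)
# (I_A, I_B, I_C, I_D) for two predicted steps (made-up numbers).
step_terms = [(0.1, 2.0, -0.5, -0.8), (0.05, 1.0, 0.0, -0.8)]
total = cumulative_benefit(step_terms, weights, beta=0.9)
```

With these inputs the first step contributes 0.33 and the second 0.9 × 0.15, so near-term benefit counts for more than later benefit, as the discount factor intends.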

3.3. Decision Optimization and Solution of DMPC-OODA Based on Maximum Entropy Mechanism

To address the problem of cooperative search for multiple UAVs in mission areas without prior information, this paper proposes a cooperative decision-making method that combines DMPC with the OODA decision loop [49,50,51,52] based on the maximum entropy mechanism. This method uses the fast “Observe-Orient-Decide-Act” closed-loop mechanism of the OODA decision loop to restructure the timing of the DMPC rolling optimization process, aiming to improve the search efficiency and cooperative performance of multiple UAVs in complex unknown environments.

DMPC is a control method that decomposes centralized model predictive control into local MPCs of multiple subsystems. Each subsystem only needs to handle its own optimization problem without relying on a central node, thereby reducing communication load and system computation, and accelerating the decision-making speed of the system. On this basis, combined with the OODA decision loop, the real-time perception and rapid response capabilities of UAVs to environmental changes are further enhanced, enabling them to quickly adjust search strategies in dynamic environments. This mechanism dynamically guides the search behavior of UAVs through the search benefit function and adopts a four-level hierarchical architecture to achieve rapid coverage of unknown areas.

Observe: Based on its own state and the existing coverage map of the mission area, $\mathrm{UAV}_{i}$ uses DMPC to solve the initial optimal control input and the corresponding environmental matrix relying on Formula (19), as shown in Figure 9:

$$\Delta\tilde{\phi}_{i}(t) = \underset{\Delta\phi_{i}(t)}{\arg\max}\; I \tag{19}$$

where $\Delta\tilde{\phi}_{i}(t)$ denotes the predicted initial optimal control input. Following the solution flowchart, this formula calculates the maximum value of the search benefit function based on its own environmental perception information to obtain the initial optimal control input, and accordingly derives the corresponding predicted environmental matrix.

Orient: $\mathrm{UAV}_{i}$ shares its initial flight direction and predicted environmental matrix with the other UAVs through the communication network, and uses the element-wise product to fuse its own environmental information with that of the other UAVs, thereby updating the environmental information within the mission area, as shown in Figure 10.

Decide: The obtained fused environmental coverage map contains the flight intentions of other UAVs, which serves as the basis for cooperative search of multiple UAVs. The initial optimal control input is optimized according to this fused environmental coverage map to determine the final control input $\Delta\phi_{i}$. Similarly, according to Formula (19), the k-step optimal control input of $\mathrm{UAV}_{i}$ can be solved as follows:

$$\Delta\phi_{i}(t) = \underset{\Delta\phi_{i}(t)}{\arg\max}\; I^{k} \tag{20}$$

where $\Delta\phi_{i}(t)$ represents the final control input for step k, which is obtained based on environmental information after fusing the environmental coverage map.

Act: $\mathrm{UAV}_{i}$ adjusts the flight heading angle according to $\Delta\phi_{i}$ determined in the Decide phase to advance the search progress of the mission area.

The multi-UAV cooperative search process can be regarded as the cooperative interaction of multiple OODA decision loops, and the cooperation is mainly reflected in the steps of fusing the environmental coverage map and determining the final decision. As shown in Figure 11, after $\mathrm{UAV}_{i}$ obtains the preliminary control input at time $t$, it fuses and shares the predicted environmental matrix, so that each UAV can grasp the environmental perception information of the entire cluster. In the Decide phase, UAVs are no longer limited to their own environmental perception information, but decide the next search direction based on the fused environmental information of the cluster. In particular, before the final decision is determined, the initial decision “intentions” are exchanged within the cluster, and each UAV decides based on the overall search intention of the cluster, which realizes cluster negotiation and decision-making. The pseudocode of the multi-UAV cooperative search process is shown in Algorithm 1:

Algorithm 1 Solution of DMPC-OODA based on maximum entropy mechanism

Initialize task parameters: UAV state $\omega_i(t)$, environmental matrix $E(t)$, grid state $p_{i,j}(t)$, entropy matrix $H(t)$
for t = 1 : $t_{\max}$
    for i = 1 : $N_{\max}$
        Calculate the predicted UAV state $\tilde{\omega}_i(t+1)$ and the preliminary control input $\Delta\tilde{\phi}_i(t)$ according to Equation (4) and Equation (20)
        Calculate the environment matrix $\tilde{E}_i(t+1)$ and the entropy matrix $\tilde{H}_i(t+1)$ according to Equation (2) and Equation (8)
    end for
    Share $\tilde{E}_i(t+1)$ and $\tilde{\omega}_i(t+1)$ over the communication network
    Receive $\tilde{E}_j(t+1)$ and $\tilde{\omega}_j(t+1)$ from the other UAVs
    for i = 1 : $N_{\max}$
        Update $\tilde{E}(t+1)$ according to Equation (10)
        Based on $\tilde{E}(t+1)$, calculate the final decision $\Delta\phi_i(t)$
        Update $E_i(t+1)$
    end for
    Update the global $\omega_i(t+1)$, $E(t+1)$, $p_{i,j}(t+1)$, $H(t+1)$
    if $\eta(t)$ > set threshold
        break
    end if
end for
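The map-fusion step of Algorithm 1 can be sketched as follows. Equation (10) is not reproduced in this section, so the encoding is an assumption: each UAV holds a binary map in which 1 marks a cell that is not yet covered and 0 marks a cell that is covered or planned to be covered. Under that encoding, an element-wise (Hadamard) product leaves a cell at 1 only if no UAV has covered it. The function name `fuse_maps` and the toy maps are hypothetical.

```python
from functools import reduce

def fuse_maps(maps):
    """Fuse equally sized coverage maps by element-wise (Hadamard) product.

    Assumed encoding: 1 = cell not yet covered, 0 = cell covered or planned
    to be covered. A cell stays 1 only if no UAV has covered it.
    """
    def hadamard(a, b):
        return [[p * q for p, q in zip(row_a, row_b)]
                for row_a, row_b in zip(a, b)]
    return reduce(hadamard, maps)

# Toy 2 x 3 maps: the UAV's own prediction and one received from a neighbor.
own = [[0, 0, 1],
       [1, 1, 1]]
received = [[1, 1, 1],
            [1, 0, 0]]
fused = fuse_maps([own, received])
```

Because the product is commutative and associative, the fused map does not depend on the order in which neighbor maps arrive, which suits a distributed exchange.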

The control input $\Delta\phi_i(t)$ contains k unknown variables, and the optimization problem is nonlinear. Considering the advantages of swarm intelligence algorithms in solving optimization problems, especially nonlinear ones [53,54,55], the ADE algorithm is adopted for local optimization of the subsystems; its details are not elaborated here.
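Since the ADE details are not given, the following is only a stand-in sketch: a plain differential evolution loop (DE/rand/1/bin with fixed scale factor and crossover rate, not the adaptive variant) that searches for k = 5 heading increments within the ±π/3 limit of Table 1. The objective `toy_reward` is a hypothetical placeholder for $I^{k}$, and all function names and hyperparameters are assumptions.

```python
import math
import random

K = 5                  # prediction horizon from Table 1
PHI_MAX = math.pi / 3  # steering limit from Table 1 (rad)

def differential_evolution(reward, dim=K, pop_size=20, generations=60,
                           f=0.6, cr=0.9, seed=0):
    """Maximize reward(x) over x in [-PHI_MAX, PHI_MAX]^dim with DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-PHI_MAX, PHI_MAX) for _ in range(dim)]
           for _ in range(pop_size)]
    fit = [reward(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = max(-PHI_MAX, min(PHI_MAX, v))  # clip to steering limit
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = reward(trial)
            if trial_fit >= fit[i]:  # greedy selection keeps the better vector
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def toy_reward(d_phis):
    """Placeholder for I^k: prefers a total turn of pi/2 made with small increments."""
    total_turn = sum(d_phis)
    return -(total_turn - math.pi / 2) ** 2 - 0.01 * sum(p * p for p in d_phis)

best_inputs, best_value = differential_evolution(toy_reward)
```

In a receding-horizon scheme only the first element of `best_inputs` would typically be applied before the problem is solved again at the next step; whether the paper does exactly this is not stated in this section.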

4. Simulation Experiments

To verify the effectiveness of the proposed method, all experiments were conducted in MATLAB 2021a on an Intel Core i7-14650HX CPU with 16 GB RAM.

In this simulation experiment, the symbols and values of the parameters for the environmental model, UAV motion, and reward function weights are listed in Table 1. The mission environment is an unobstructed flat area. To demonstrate the advantages of the proposed algorithm, comparative simulations are conducted using random search, the DMPC algorithm without an entropy mechanism, the Search Intention Interaction (SII) method from reference [32], and the proposed algorithm. The change in the mission area coverage rate over time is shown in Figure 12.

Figure 12 contains the coverage rate curves of the four strategies: the maximum entropy mechanism, the non-entropy mechanism, the SII method, and the random algorithm. The coverage rate of the maximum entropy mechanism rises continuously over time and is the highest within the same test time, which indicates a clear advantage in this multi-UAV cooperative search experiment. Although the coverage rates of the other three methods also increase over time, they grow more slowly, and their final coverage rates are all lower than that of the maximum entropy mechanism.

To give a more intuitive view of the multi-UAV cooperative search under each mode, the trajectory diagrams at different stages are shown in Figures 13–15, where the black areas represent uncovered regions. As the step count increases from 60 to 180, the differences in the coverage paths of the UAVs under the four mechanisms become increasingly pronounced. Guided by information entropy, the maximum entropy mechanism makes UAVs more inclined to search unknown regions with high information entropy; their trajectories are evenly distributed within the mission area and gradually advance toward uncovered areas, effectively filling the coverage gaps. In contrast, the trajectory distribution under the non-entropy mechanism is slightly disorganized, repeated coverage gradually increases, and some edge areas remain uncovered for a long time, which reflects the shortcomings of a mode that lacks information entropy guidance. In the random mode, the flight yaw angles of the UAVs are randomly generated, so the trajectories are irregular, the covered areas are scattered, and many regions remain uncovered. Under the SII mode, the intention interaction mechanism enables the UAVs to cover the mission area in an orderly manner in the early and middle stages, but in the later stage the lack of directional guidance slows the growth of the coverage rate, and some areas remain uncovered.

In combination with actual mission requirements, when performing search tasks in unknown areas without prior information, setting a coverage threshold is a key indicator for evaluating mission progress and completion. In the subsequent experiment, the coverage threshold is set to 90%, which provides a quantitative standard for mission termination and ensures the rational allocation and utilization of resources, avoiding a large amount of resource waste caused by continuous ineffective exploration in the later stage of the search. Meanwhile, to further explore the guiding mechanism of information entropy in the multi-UAV cooperative search process, the experiment further analyzes the repeated detection rate, an important indicator for measuring search efficiency, which can accurately reflect the repeated search situations during the multi-UAV search process.
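The threshold test can be made concrete with a small sketch. It assumes a boolean coverage grid with the Table 1 dimensions and a circular detection footprint evaluated at cell centers; the detection model used in the paper may differ, and the helper names `mark_detected` and `coverage_rate` are hypothetical.

```python
GRID = 400        # cells per side (Table 1: 400 x 400)
CELL = 20.0       # grid step size from Table 1 (m)
RADIUS = 200.0    # detection radius from Table 1 (m)
THRESHOLD = 0.90  # coverage threshold used in the experiments

def mark_detected(covered, x, y):
    """Mark every cell whose center lies within RADIUS of the UAV at (x, y)."""
    reach = int(RADIUS // CELL) + 1
    ci, cj = int(x // CELL), int(y // CELL)
    for i in range(max(0, ci - reach), min(GRID, ci + reach + 1)):
        for j in range(max(0, cj - reach), min(GRID, cj + reach + 1)):
            cx, cy = (i + 0.5) * CELL, (j + 0.5) * CELL
            if (cx - x) ** 2 + (cy - y) ** 2 <= RADIUS ** 2:
                covered[i][j] = True

def coverage_rate(covered):
    """Fraction of cells detected at least once."""
    return sum(map(sum, covered)) / (GRID * GRID)

covered = [[False] * GRID for _ in range(GRID)]
mark_detected(covered, 4000.0, 4000.0)   # one detection at the area center
eta = coverage_rate(covered)
mission_done = eta > THRESHOLD           # termination test from Algorithm 1
```

A single footprint covers only a small fraction of the 8000 m × 8000 m area, which is why the threshold test only becomes relevant late in the mission.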

From the quantitative analysis in Figure 16, it can be seen that the coverage growth rate of the maximum entropy mechanism shows significant advantages, achieving 90% area coverage at 25 min, while the other three methods reach the same coverage level at 29 min, 30 min, and approximately 60 min, respectively. This data difference confirms the significant superiority of the maximum entropy mechanism in terms of time efficiency. By guiding the UAVs’ search direction through information entropy, it can complete the area coverage task in a shorter time.

Figure 17 compares the repeated detection rates of the four modes up to the point at which the coverage rate reaches 90%. In the early stage of the experiment, all modes show a surge in repeated detection rate. This occurs because the distance a single UAV travels per time step is smaller than its detection range, so the detection areas at consecutive moments overlap (see Figure 3 for the overlapping phenomenon). Because the detected area is initially a small proportion of the mission area, the repeated detection rate caused by this factor rises rapidly; as the detected proportion grows, its influence weakens quickly. This phenomenon is an objective limitation shared by all modes, so it does not affect the fairness of the comparison or the reliability of the performance difference analysis. Combining the chart data with the characteristics of each mechanism shows that the maximum entropy mode, which uses information entropy as a measure of uncertainty, directs UAVs to prioritize high-entropy unknown regions. Its repeated detection rate grows significantly more slowly than that of the other modes throughout the process, which suppresses repeated search while rapidly advancing the search task. In contrast, the other three methods not only take longer to complete the task but also maintain a high repeated detection rate, resulting in substantial idle and wasted resources.
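The early overlap effect can be estimated with a short calculation. With the Table 1 values, a UAV travels v·∆t = 300 m per step, which is shorter than the 400 m diameter of its detection footprint. The sketch below assumes a circular footprint sampled once per step and uses the standard lens-area formula for two equal circles; the function name `disc_overlap_fraction` is hypothetical.

```python
import math

def disc_overlap_fraction(radius, distance):
    """Fraction of a disc's area shared with an equal disc `distance` away."""
    if distance >= 2 * radius:
        return 0.0
    lens = (2 * radius ** 2 * math.acos(distance / (2 * radius))
            - 0.5 * distance * math.sqrt(4 * radius ** 2 - distance ** 2))
    return lens / (math.pi * radius ** 2)

step_length = 30.0 * 10.0  # v * dt from Table 1 (m)
overlap = disc_overlap_fraction(200.0, step_length)
```

Under these assumptions roughly 14% of each new footprint was already seen at the previous step, even for a UAV flying in a straight line, which is consistent with the unavoidable initial rise described above.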

The above macro-indicator analysis based on the repeated detection rate has revealed significant differences in search efficiency and resource utilization among different modes. Subsequent in-depth analysis at the micro level will be conducted through the flight path pheromone rainbow diagram and the comparison diagram of the number of repeatedly detected grids.

The flight path pheromone rainbow map intuitively demonstrates the path planning characteristics during task execution by presenting the spatial movement trajectories of UAVs and the cumulative degree of pheromones retained by the flight paths. It compares the detection status of different regions through the pheromones retained in the flight paths. Here, pheromone originates from the bionic concept of biological group cooperation; for example, ants achieve group coordination by secreting chemical substances to mark paths. In this experiment, pheromone is abstracted as the accumulation of UAV flight traces, whose concentration is related to the detection frequency and duration of a region, and can quickly reflect the detection intensity.

According to the comparison of pheromone distribution characteristics in Figure 18, the pheromone field of the maximum entropy mode presents a coherent and uniform gradient structure, with no high-pheromone areas caused by repeated exploration of the same region. In contrast, due to the lack of effective information guidance, the other three methods result in a large number of ineffective explorations in the mission area, with some clusters of high pheromones. Thus, the maximum entropy mode not only breaks through the “local optimal trap” exposed by the local pheromone clusters in the non-entropy mode but also avoids the “exploration redundancy” caused by the discrete and disordered pheromone distribution in the random mode.

In Figure 19, the comparison chart of the number of repeatedly detected grids quantifies the specific distribution and quantity of repeated detections from a grid-level perspective: the horizontal axis denotes the number of times the same grid is repeatedly explored, and the vertical axis represents the corresponding number of grids. The more repeated explorations there are, the more significant the associated cost loss. The number of grids in the mission area is 160,000, calculated from Table 1.
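A histogram of this kind can be derived from a per-cell detection counter, which is also one way to read the pheromone abstraction described earlier. The sketch below assumes such a counter grid exists; the function name `repeat_histogram` and the toy data are hypothetical and are not taken from the experiments.

```python
from collections import Counter

def repeat_histogram(visit_counts):
    """Map 'times a cell was detected' -> 'number of such cells' for counts >= 2."""
    flat = (c for row in visit_counts for c in row)
    return dict(sorted(Counter(c for c in flat if c >= 2).items()))

# Toy 3 x 4 visit-count grid (hypothetical data).
visits = [[0, 1, 2, 2],
          [1, 3, 2, 1],
          [4, 1, 0, 2]]
hist = repeat_histogram(visits)
```

The keys correspond to the horizontal axis of the chart and the values to the vertical axis; a narrow key range with rapidly shrinking values indicates little redundant search.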

A comparison of the four strategies shows that, in the maximum entropy mode, the number of repeated explorations per grid is confined to 2–6, and the number of corresponding grids decays exponentially as the number of repetitions increases, giving a “narrow interval, rapid decay” repetition distribution. This performance benefits from the dynamic balance between “rapid coverage” and “accurate exploration” achieved by information entropy, which effectively suppresses redundant searches and demonstrates the advantages of low repetition loss and high detection efficiency.

The interval distribution of the non-entropy mode and SII mode ranges from 2 to 9 times, with some grids being explored multiple times, showing a situation of “regional repeated stagnation”, which results in the inefficient allocation of exploration resources. The random mode has a wider interval range, covering 2–18 times. Unordered searching leads to excessive exploration of some grids, seriously wasting search resources and presenting the drawback of “global repeated proliferation”.

To investigate the impact of chance on the experimental results and verify the stability of the maximum entropy mechanism in cooperative search, this study conducted 30 independent simulation experiments for each of the four mechanisms, uniformly setting a mission area coverage rate of 90% as the completion threshold. The relevant experimental data and charts are as follows.

A dual-axis box plot is used to compare the number of simulation steps required to complete the task across the independent experiments, giving an intuitive view of the distribution of the data. In Figure 20, the blue data points correspond to the blue axis, and the yellow data points correspond to the yellow axis. The box for the maximum entropy mode is the lowest and most concentrated, with a median and interquartile range significantly smaller than those of the other modes. This indicates that the maximum entropy mode not only improves search efficiency but also significantly enhances system stability. In contrast, the data distributions of the other three methods are relatively dispersed with a large span of steps, reflecting high cost and strong uncertainty.

The data in Figure 21 indicate that the repetition rate of the maximum entropy mode is significantly lower than that of the other three modes, with a smaller fluctuation range. This shows that the mode effectively avoids repeated searches and is highly stable, confirming its role in constraining repeated exploration and improving search efficiency. In contrast, the other three methods show high repetition rates and large fluctuations, indicating that they lack a directional guidance mechanism and tend to fall into repeated exploration.

Following the simulation visualization analysis above, a multi-indicator quantitative comparison based on the mean, standard deviation, extreme values, and median was established to further quantify the differences in search efficiency and stability among the four modes, focusing on the two core indicators of simulation steps and repetition rate. The results are shown in Table 2. In terms of simulation steps, the maximum entropy mode performs best on all indicators. Its mean step count is 6.95%, 12.22%, and 59.49% lower than that of the SII mode, non-entropy mode, and random mode, respectively, confirming its high efficiency in completing tasks at low time cost. The narrow extreme value interval and low standard deviation demonstrate the stability of the maximum entropy mode. In contrast, the mean values of the other three methods are higher and their extreme value intervals are significantly wider, reflecting the efficiency delays and resource consumption caused by local convergence and disordered search. In terms of the repetition rate, the maximum entropy mode also achieves the lowest mean, minimum, maximum, and median and the smallest standard deviation, highlighting its ability to allocate search resources accurately and suppress repeated exploration. The high mean values and wide extreme value intervals of the other three methods expose the resource waste defects of “regional repeated stagnation” and “global repeated proliferation”.
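The relative differences can be recomputed from the rounded means in Table 2, as in the sketch below. The function name `relative_reduction` is hypothetical, and the calculation assumes that each percentage is defined relative to the baseline mode's mean.

```python
# Mean values copied from Table 2.
mean_steps = {"Entropy": 158, "SII": 168, "No entropy": 180, "Random": 390}
mean_repetition = {"Entropy": 50.43, "SII": 54.59,
                   "No entropy": 58.60, "Random": 86.04}

def relative_reduction(table, baseline, proposed="Entropy"):
    """Percentage by which the proposed mode undercuts a baseline mode."""
    return 100.0 * (table[baseline] - table[proposed]) / table[baseline]

step_gain = {mode: relative_reduction(mean_steps, mode)
             for mode in mean_steps if mode != "Entropy"}
repetition_gain = {mode: relative_reduction(mean_repetition, mode)
                   for mode in mean_repetition if mode != "Entropy"}
```

With these rounded inputs, the step reductions relative to the non-entropy and random modes reproduce the quoted 12.22% and 59.49%. The reduction relative to SII comes out near 5.95% rather than the quoted 6.95%, a gap that is too large to be explained by rounding of the tabulated means, so the quoted figure may rest on data not shown in the table.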

To systematically verify the generalization performance and scalability of the maximum entropy model, especially its adaptability in large-scale mission scenarios, this study expands the mission area to 15 km × 15 km and increases the number of UAVs to 10. A large-scale mission environment places higher demands on the system’s resource management capabilities and also tests its stability and robustness. The expanded experiment therefore examines whether the maximum entropy mode can meet the needs of large-scale search tasks, and the relevant experimental data are shown in Figure 22, Figure 23 and Figure 24.

The evolution curves of the coverage rate and repeated detection rate in Figure 22 show a rapid increase in coverage and a relatively gentle rise in repeated detection. This indicates that the maximum entropy mode can cover a large mission area in a short time while suppressing the repeated detection that wastes search resources, supporting its efficiency and resource utilization in large-scale scenarios. The trajectory map in Figure 23 and the pheromone distribution map in Figure 24 jointly show that, in the maximum entropy mode, multiple UAVs plan their flight paths reasonably and improve search efficiency through information entropy guidance. Taken together, the threshold time, the repeated detection rate, the UAV trajectories, and the pheromone distribution along the flight paths indicate that the maximum entropy mode remains efficient and adaptable in large-scale mission scenarios, with clear advantages in search efficiency and resource optimization.

5. Summary

To address the challenges of slow coverage rate growth and high repeated detection rates in multi-UAV cooperative search in unknown areas, this paper proposes a distributed cooperative search algorithm based on the maximum entropy mechanism. The core innovations of this algorithm are as follows:

(1): Dynamic Exponentially Decaying Entropy Model: An entropy model is constructed to quantify environmental uncertainty, effectively guiding UAVs toward high-entropy (high-uncertainty) regions.
(2): Integration of DMPC and OODA Decision Loop: DMPC is innovatively integrated with the OODA Decision Loop and achieves rapid updating and fusion of environmental maps through element-by-element multiplication operations, resulting in a highly efficient rolling time-domain co-optimization framework.
(3): Adaptive Dynamic Weight Function: A dynamic weight function is designed to balance coverage gain and entropy gain, facilitating a smooth transition of strategies from “rapid coverage” to “accurate exploration”.

Simulation experiments and multiple independent validations demonstrate that the proposed maximum entropy mechanism significantly improves search efficiency and resource utilization. Compared with the Search Intent Interaction (SII) mode, non-entropy mode, and random mode, the proposed algorithm increases time efficiency in reaching the set coverage threshold by 6.95%, 12.22%, and 59.49%, respectively, while reducing the repeated detection rate by 7.62%, 13.94%, and 41.38%, respectively. These results fully verify the significant advantages of the entropy gradient-guided cooperative strategy in overcoming local optima, suppressing redundant searches, improving overall coverage efficiency, and demonstrating engineering robustness. Future work will further investigate path planning in complex obstacle environments and explore the impact of communication delays on cooperative efficiency.

Author Contributions

Conceptualization, S.C. and H.L.; methodology, S.C. and H.L.; software, S.C. and X.F.; validation, S.C. and J.H.; formal analysis, S.C.; resources, H.L. and X.F.; writing—original draft preparation, S.C.; writing—review and editing, S.C. and L.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (No.61502522), the National Social Science Foundation of China (No. 2022-SKJJ-B-056; No. 2025-SKJJ-T-036), the Natural Science Foundation of Hubei Province (No. 2025AFB419; No. 2023AFB1028), and the Air Force Equipment Research Project (No. KJ2024C008).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, H.; Xin, B.; Dou, L.-H.; Chen, J.; Hirota, K. A Review of Cooperative Path Planning of an Unmanned Aerial Vehicle Group. Front. Inf. Technol. Electron. Eng. 2020, 21, 1671–1694.
2. Zhou, Y.; Rao, B.; Wang, W. UAV Swarm Intelligence: Recent Advances and Future Trends. IEEE Access 2020, 8, 183856–183878.
3. Li, H.; Sun, H.; Zhou, R.; Zhang, H. Hybrid TDOA/FDOA and track optimization of UAV swarm based on A-optimality. Syst. Engin. Electron. 2023, 34, 149–159.
4. Sheng, L.; Shi, M.H.; Qi, Y.C.; Li, H.; Pang, M.J. Dynamic offense and defense of UAV swarm based on situation evolution game. Syst. Engin. Electron. 2023, 45, 2332–2342.
5. Fan, X.; Li, H.; Chen, Y.; Dong, D. A Path-Planning Method for UAV Swarm under Multiple Environmental Threats. Drones 2024, 8, 171.
6. Fan, X.; Li, H.; Chen, Y.; Dong, D. UAV Swarm Search Path Planning Method Based on Probability of Containment. Drones 2024, 8, 132.
7. Cheng, Z.; Yang, J.; Sun, J.; Zhao, L. Trajectory Planning of Unmanned Aerial Vehicles in Complex Environments Based on Intelligent Algorithm. Drones 2025, 9, 468.
8. Hao, L.; Xiangyu, F.; Manhong, S. Research on the cooperative passive location of moving targets based on improved particle swarm optimization. Drones 2023, 7, 264.
9. Wang, W.; Bai, P.; Li, H.; Liang, X. Optimal configuration and path planning for UAV swarms using a novel localization approach. Appl. Sci. 2018, 8, 1001.
10. Zhang, N.; Zhang, B.; Zhang, Q.; Gao, C.; Feng, J.; Yue, L. Large-area coverage path planning method based on vehicle–UAV collaboration. Appl. Sci. 2025, 15, 1247.
11. Lun, Y.; Wang, H.; Wu, J.; Liu, Y.; Wang, Y. Target Search in Dynamic Environments With Multiple Solar-Powered UAVs. IEEE Trans. Veh. Technol. 2022, 71, 9309–9321.
12. Fei, B.; Bao, W.; Zhu, X.; Liu, D.; Men, T.; Xiao, Z. Autonomous Cooperative Search Model for Multi-UAV With Limited Communication Network. IEEE Internet Things J. 2022, 9, 19346–19361.
13. Chai, S.; Yang, Z.; Huang, J.; Li, X.; Zhao, Y.; Zhou, D. Cooperative UAV search strategy based on DMPC-AACO algorithm in restricted communication scenarios. Def. Technol. 2024, 31, 295–311.
14. Xu, S.; Zhou, Z.; Li, J.; Wang, L.; Zhang, X.; Gao, H. Communication-Constrained UAVs’ Coverage Search Method in Uncertain Scenarios. IEEE Sens. J. 2024, 24, 17092–17101.
15. Zeng, G.Q.; Bai, Z.; Lin, W.; Ding, W.R. Cooperative Search Method for Ground Moving Targets Using Multiple UAVs. Syst. Eng. Electron. 2018, 40, 1498–1505.
16. Li, Y. Research on Cooperative Formation Control of Multi-UAVs Based on Distributed Search and Capture Algorithm. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2020.
17. Luo, L. Research on the Strategy of Cooperative Search for Moving Targets by UAV Swarm Based on Search Benefit Balance. Master’s Thesis, Beijing University of Posts and Telecommunications, Beijing, China, 2024.
18. Xie, P.Z.; Wei, C. Research on Multi-UAV Scanning Line Search Method Based on Unilateral Region Segmentation. Aero Weapon. 2020, 27, 67–72.
19. Chen, X.L.; Peng, X.H.; Kong, J.T. Cooperative Search and Experimental Verification of Multi-UAVs Under Limited Information Conditions. Control Eng. 2025, 1–8.
20. Niu, Z.; Jia, X.; Yao, W. Communication-Free MPC-Based Neighbors Trajectory Prediction for Distributed Multi-UAV Motion Planning. IEEE Access 2022, 10, 13481–13489.
21. Du, J.Y.; Zhang, F.M.; Mao, H.B.; Liu, H.W.; Mao, Y.Y. Game Theory Model and Fast Solution Method for Multi-UAV Cooperative Search. J. Shanghai Jiaotong Univ. 2013, 47, 667–673.
22. Wu, A.; Yang, R.N.; Liang, X.L.; Hou, Y.Q. Cooperative Search Algorithm for UAV Swarm Based on Pheromone Decision. J. Beijing Univ. Aeronaut. Astronaut. 2021, 47, 814–827.
23. Li, C. Research on Autonomous Cooperative Search Using UAV Swarms. Master’s Thesis, Zhejiang University, Hangzhou, China, 2019; pp. 1–8.
24. Wang, W.; Chen, Y.; Zhang, Y.; Chen, Y.; Du, Y. Collaborative Search Algorithm for Multi-UAVs Under Interference Conditions: A Multi-Agent Deep Reinforcement Learning Approach. Drones 2025, 9, 445.
25. Zhao, Y.; Lu, S.; Wang, C.; Liu, Y.; Ding, Y.; Wang, H. Integrated Reinforcement Learning Framework for UAV Swarm Two-Stage Cooperative Multitarget Detection Tasks. IEEE Internet Things J. 2025, 12, 9435–9448.
26. Dong, P.; Liu, J.; Tao, H.; Ruby, R.; Jian, M.; Luo, H. An Optimized Scheduling Scheme for UAV-USV Cooperative Search via Multi-Agent Reinforcement Learning Approach. In Proceedings of the 2024 20th International Conference on Mobility, Sensing and Networking (MSN), Harbin, China, 20–22 December 2024; pp. 172–179.
27. Zou, L.; Tan, Y. Collaborative Search Planning of UAV Swarms Based on Deep Reinforcement Learning. In Proceedings of the 2023 5th International Conference on Intelligent Control, Measurement and Signal Processing (ICMSP), Chengdu, China, 19–21 May 2023; pp. 1184–1187.
28. Guo, K.; Luo, H.; Tao, H.; Ruby, R.; Qin, Z.; Liu, K. Multi-UAVs Collaborative Search Scheme in Marine Environments using Deep Reinforcement Learning. In Proceedings of the 2023 Eleventh International Conference on Advanced Cloud and Big Data (CBD), Danzhou, China, 18–19 December 2023; pp. 39–44.
29. Ni, J.; Tang, G.; Mo, Z.; Cao, W.; Yang, S.X. An Improved Potential Game Theory Based Method for Multi-UAV Cooperative Search. IEEE Access 2020, 8, 47787–47796.
30. Ma, T.; Jiang, J.; Liu, X.; Liu, R.; Sun, H. Target Search of UAV Swarm Based on Improved Wolf Pack Algorithm. In Proceedings of the 2023 6th International Symposium on Autonomous Systems (ISAS), Nanjing, China, 23–25 June 2023; pp. 1–6.
31. Hou, Y.Q.; Liang, X.L.; He, L.L.; Liu, L. Cooperative Area Search Algorithm for UAV Swarm in Unknown Environment. J. Beijing Univ. Aeronaut. Astronaut. 2019, 45, 347–356.
32. Wang, N.; Li, Z.; Liang, X.L.; Hou, Y.; Wu, A. Cooperative Search Algorithm for UAV Swarm Based on Search Intention Interaction. J. Beijing Univ. Aeronaut. Astronaut. 2022, 48, 454–463.
33. Zheng, H.J. Investigation on the UAV Path Planning Problem of Area Reconnaissance. Master’s Thesis, National University of Defense Technology, Changsha, China, 2011; pp. 5–8.
34. Liu, C.; Gao, X.G.; Fu, X.W. Distributed Cooperative Target Search Algorithm for Multi-UAVs With Controllable Revisit Mechanism Based on Digital Pheromone. Syst. Eng. Electron. 2017, 39, 1999–2010.
35. Shen, D.; Wei, R.X.; Qi, X.M.; Guan, X. Receding Horizon Decision Method Based on MTPM and DPM for Multi-UAVs Cooperative Large Area Target Search. Acta Autom. Sin. 2014, 40, 1391–1403.
36. Qin, Y.; Zhao, F.Y.; Li, X.Z.; Wei, D.; Chen, S.; Li, Z. Prediction of Flight Status of Logistics UAVs Based on an Information Entropy Radial Basis Function Neural Network. Sensors 2021, 21, 3651.
37. Zhang, Y.H.; Yang, Z.H.; Liu, T.; Zhang, Y.; Zhang, N. An Information-Entropy-Based Hierarchical Serialization Allocation Method for UAV Tracking in 6G Networks. Wirel. Commun. Mob. Comput. 2022, 2022, 1–15.
38. Xuan, Y.L.; Jiang, G.D.; Sun, T.; Tan, L.; Wang, L.F. Template-Guided Frequency Attention and Adaptive Cross-Entropy Loss for UAV Visual Tracking. Chin. J. Aeronaut. 2023, 36, 299–312.
39. Ju, R.; Zhang, H.; Han, W.; Ge, S.S. Maximum Entropy Searching. CAAI Trans. Intell. Technol. 2019, 4, 1–8.
40. Singh, C.; Sharma, A. Modified Online Newton Step Based on Element Wise Multiplication. Comput. Intell. 2020, 36, 1010–1025.
41. Ozel, M.; Varol, D. On the Lower Bounds for the Minimum Eigenvalue of the Hadamard Product of an M-Matrix and Its Inverse. Linear Algebra Appl. 2021, 28, 439–443.
42. Zou, R.; Guo, C.C.; Chen, X.C.; Li, F.; Yi, D.; Wu, Y. Genetic Algorithm Optimised Hadamard Product Method for Inconsistency Judgement Matrix Adjustment in AHP and Automatic Analysis System Development. Expert Syst. Appl. 2023, 211, 118689.
43. Tian, W.Y.; Liu, L.; Wang, Q.S. Fast Search Method for Moving Targets Using Multiple UAVs Cooperative Searching. In Proceedings of the 2022 International Conference on Unmanned Aircraft Systems (ICUAS), Dubrovnik, Croatia, 21–24 June 2022.
44. Wu, W.C.; Huang, C.Q.; Song, L.; Tang, S.Q.; Bai, R.C. Cooperative Search and Path Planning of Multi-Unmanned Air Vehicles in Uncertain Environment. Acta Armamentarii 2011, 32, 1337–1342.
45. Zhang, S.Q.; Yang, H.T. Quantitative Description of Uncertainty and Entropic Uncertainty Relation. Acta Phys. Sin. 2023, 72.
46. Deng, Z.Y.; Yuan, H.; Wang, D.; Yuan, H.; Yang, J.; Ye, L. Experimental Investigation of Entropic Uncertainty Relations and Coherence Uncertainty Relations. Phys. Rev. A 2020, 101, 032101.
47. Floerchinger, S.; Haas, T.; Hoeber, B. Relative Entropic Uncertainty Relation. Phys. Rev. A 2021, 103, 062209.
48. Fu, Q.X.; Liang, X.L.; Zhang, J.Q. Design and Implementation of Autonomous Flight Unmanned Aircraft System Geo-Fence Algorithm. J. Xi’an Jiaotong Univ. 2019, 53, 167–175.
49. Bryant, D. Rethinking OODA: Toward a Modern Cognitive Framework of Command Decision Making. Mil. Psychol. 2006, 18, 183–206.
50. Huang, Y.Y. Modeling and Simulation Method of the Emergency Response Systems Based on OODA. Knowl.-Based Syst. 2015, 89, 527–540.
51. Wu, W.L.; Zhou, X.S.; Shen, B. Comprehensive Evaluation of the Intelligence Levels for Unmanned Swarms Based on the Collective OODA Loop and Group Extension Cloud Model. Connect. Sci. 2022, 34, 630–651.
52. Revay, M.; Liska, M. OODA Loop in Command & Control Systems. In Proceedings of the 2017 Communication and Information Technologies (KIT), Vysoke Tatry, Slovakia, 4–6 October 2017; pp. 127–130.
53. Huang, J.; Zhou, N.; Cao, M. Adaptive Fuzzy Behavioral Control of Second-Order Autonomous Agents With Prioritized Missions: Theory and Experiments. IEEE Trans. Ind. Electron. 2019, 66, 9612–9622.
54. Brezocnik, L.; Fister, I.; Podgorelec, V. Swarm Intelligence Algorithms for Feature Selection: A Review. Appl. Sci. 2018, 8, 1521.
55. Chen, X.F.; Ye, C.M. Improved Teaching-Learning Optimization Algorithm Based on Nonlinear Convergence Factor and Benchmarking Management. J. Shanghai Univ. Sci. Technol. 2022, 44, 508–518.

Figure 1. Schematic diagram of multi-UAV cooperative search.

Figure 2. Gridding of mission space.

Figure 3. Route map of the prediction model.

Figure 4. Grid map of detection area.

Figure 5. Entropy convergence diagram.

Figure 6. Environmental matrix update flowchart.

Figure 7. Recovery mechanisms for environmental map fusion after communication outages.

Figure 8. Boundary virtual potential field constraints.

Figure 9. Flow chart for solving control inputs for each drone.

Figure 10. Integration mechanism of task area environment coverage map.

Figure 11. DMPC-OODA collaborative decision-making closed-loop process diagram.

Figure 12. Coverage rate change diagram.

Figure 13. Environmental coverage map of each mode at step = 60.

Figure 14. Environmental coverage map of each mode at step = 120.

Figure 15. Environmental coverage map of each mode at step = 180.

Figure 16. Comparison chart of threshold duration reached.

Figure 17. Comparison of repetitive detection rates.

Figure 18. Flight path pheromone rainbow diagram.

Figure 19. Comparison diagram of the number of repeatedly detected grids.

Figure 20. Dual-Axis box plot.

Figure 21. Repetition rate distribution diagram.

Figure 22. Repetitive detection rate and threshold time graph for large-scale task scenarios.

Figure 23. Flight path diagram for large-scale mission scenarios.

Figure 24. Rainbow diagram of flight path pheromone in large-scale mission scenarios.

Table 1. Parameter settings.

Parameters	Variable Symbol	Value
Environmental length	L_x	8000 m
Environmental width	L_y	8000 m
Grid step size	∆d	20 m
Number of grids	M × N	400 × 400
Detection radius	R	200 m
UAV speed	v	30 m/s
Simulation time step	∆t	10 s
Maximum steering angle	ϕ_max	π/3
Number of prediction steps	k	5
Coverage area weight	λ_A	0.5/0.3
Entropy gain weight	λ_B	0.3/0.5
Yaw angle constraint weight	λ_C	0.1
Boundary constraint weight	λ_D	0.1
Starting point A	-	[200, 0]
Starting point B	-	[2200, 0]
Starting point C	-	[4200, 0]
Starting point D	-	[6200, 0]
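
The settings in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the first values listed for λ_A and λ_B (0.5 and 0.3) form one weight set and the second values (0.3 and 0.5) form the other.

```python
import math

# Values copied from Table 1; the variable names are illustrative.
L_X, L_Y = 8000.0, 8000.0   # environment length and width, m
GRID_STEP = 20.0            # ∆d, m
DETECT_RADIUS = 200.0       # R, m
SPEED = 30.0                # v, m/s
DT = 10.0                   # ∆t, s
HORIZON = 5                 # k, prediction steps

# Assumption: "0.5/0.3" and "0.3/0.5" pair up position by position.
WEIGHT_SETS = [
    {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1},
    {"A": 0.3, "B": 0.5, "C": 0.1, "D": 0.1},
]

# Number of grid cells along each axis (M × N).
grid_shape = (round(L_X / GRID_STEP), round(L_Y / GRID_STEP))

# Distance flown in one simulation step, in metres.
step_length = SPEED * DT

# Detection radius expressed in grid cells.
radius_in_cells = DETECT_RADIUS / GRID_STEP

# Upper bound on the straight-line distance covered within the horizon.
horizon_reach = HORIZON * step_length

# Sum of the four weights in each set.
weight_sums = [sum(w.values()) for w in WEIGHT_SETS]

print(grid_shape, step_length, radius_in_cells, horizon_reach, weight_sums)
```

Under these assumptions the grid count reproduces the 400 × 400 entry, each UAV advances 300 m per step (more than its 200 m detection radius), and both weight sets sum to 1.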

Table 2. Comparison of values of independent experiments.

Metric	Mode	Average	SD	Min	Max	Median
Simulation steps	Entropy	158	5.59	146	168	158.5
	No entropy	180	12.72	165	224	178.5
	Random	390	51.34	308	501	394
	SII	168	9.20	156	200	165.5
Repetition rate (%)	Entropy	50.43	2.60	44.14	54.71	50.44
	No entropy	58.60	3.88	53.29	71.48	58.03
	Random	86.04	2.65	81.27	90.98	85.46
	SII	54.59	3.60	50.71	64.43	53.58
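
The relative gains quoted in the abstract can be approximately reproduced from the averages in Table 2. The sketch below assumes that the "Simulation steps" rows record the steps needed to reach the coverage threshold and that the improvement is defined as the relative reduction in mean steps against each baseline; neither the function name nor this definition is taken from the paper.

```python
# Mean simulation steps per mode, copied from Table 2 (rounded values).
mean_steps = {"Entropy": 158, "No entropy": 180, "Random": 390, "SII": 168}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Reduction of the entropy mode against every other mode, in percent.
reductions = {
    mode: round(relative_reduction(steps, mean_steps["Entropy"]), 2)
    for mode, steps in mean_steps.items()
    if mode != "Entropy"
}

for mode, value in reductions.items():
    print(f"Entropy vs {mode}: {value:.2f}% fewer steps")
```

With the rounded means this gives 12.22% against the non-entropy mode and 59.49% against the random mode, in line with the abstract. For the SII mode it gives 5.95%, slightly below the reported 6.95%; the difference may come from unrounded averages, which the table does not list.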

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
