Adaptive Force Ratio Allocation for Multi-UAV Cooperative Multi-Target Encirclement

Liu, Qiting; Li, Meixuan; Zhu, Xianqiang

doi:10.3390/drones10060406

by Qiting Liu, Meixuan Li and Xianqiang Zhu *

National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(6), 406; https://doi.org/10.3390/drones10060406

Submission received: 10 April 2026 / Revised: 17 May 2026 / Accepted: 19 May 2026 / Published: 25 May 2026

(This article belongs to the Special Issue Advanced Optimization Strategies for UAV Mission Planning and Operation)

Highlights

What are the main findings?

The proposed Experience Library enables capability-aware force-ratio selection (i.e., non-uniform swarm sizing across targets), showing that allocation ratios have a decisive impact on mission completion: appropriate force distribution yields smooth and monotonic performance improvements across budgets, whereas Uniform Allocation can trigger threshold-like failures under heterogeneity.
Under dynamic disturbances and budget cuts, the method exhibits principled triage: allocation surges to the most demanding feasible targets and is withdrawn from low-return or infeasible ones.

What are the implications of the main findings?

Capability-aware allocation improves mission resilience by maximizing global success rather than per-target persistence, delivering better continuity across changing conditions.
The Experience Library’s lightweight inference enables rapid re-optimization, supporting low-latency allocation computation with efficient budget usage and fewer redundant deployments.

Abstract

The size of an unmanned aerial vehicle (UAV) swarm and the capabilities of each individual in the swarm are critical determinants of mission effectiveness in multi-target encirclement. This paper proposes a framework for strategic force allocation during mission execution. A lightweight estimator, the Experience Library, is designed to predict the required scale of UAV teams based on target difficulty indicators in the tested settings, including the number of hunted targets and their mobility. Owing to the computational efficiency of the Experience Library, the proposed framework can rapidly re-optimize UAV allocation when target difficulty changes or the available UAV budget is reduced, thereby enabling adaptive swarm resizing and resource redistribution under dynamic disturbances and constraints. System performance under different force allocations is evaluated along two dimensions: mission success rate and time-to-target. The aim is to keep missions lightweight and avoid redundant resource consumption.

Keywords:

cooperative encirclement; dynamic force allocation; experience library; adaptive resource reallocation; multi-target pursuit

1. Introduction

Multi-target encirclement is a fundamental cooperative mission in multi-agent systems, where a team of unmanned aerial vehicles (UAVs) must surround and constrain one or multiple evasive targets to maintain containment, tracking, or interception capability. A typical multi-UAV target encirclement scenario is depicted in Figure 1. This problem arises in applications such as maritime interdiction, urban surveillance, and emergency response [1,2]. Compared with nominal single-target settings, practical deployments are adversarial, resource-limited, and dynamically evolving: target mobility changes over time, mission difficulty is heterogeneous across targets, and the available UAV budget may be insufficient or fluctuate during execution.

Existing encirclement research has made important progress in control design and intelligent decision-making. Control-theoretic approaches provide stability guarantees under dynamics and disturbances [1,3], while learning-based methods improve adaptability in high-dimensional and uncertain environments [2,4]. However, most prior work has primarily focused on motion control and swarm coordination once assignments are given [5,6], while the upper-layer resource allocation problem remains less explored in heterogeneous and adversarial multi-target settings.

From a task-allocation perspective, existing methods mainly fall into three categories. The first category is constraint-based assignment and optimization, such as integer programming or deterministic resource assignment, which usually assumes known task utilities and static assignment constraints [7,8]. The second category is auction-based or consensus-based cooperative allocation, where agents negotiate assignments through local bids or distributed agreement, emphasizing scalability and decentralized coordination [9]. The third category is greedy marginal-gain allocation, which incrementally assigns resources according to the current local reward improvement [10]. Although effective in some settings, these methods are not tailored to multi-target encirclement missions in which the capture utility is strongly nonlinear, depends on target-specific force ratio, and exhibits a minimum-team-size threshold.

This leads to a key practical and methodological question: under dynamic and adversarial conditions, how many UAVs are required to achieve effective multi-target encirclement, and how should they be allocated across targets of different difficulty? This question is closely related to the force ratio problem, namely, the quantitative relationship between opposing resources and achievable mission effectiveness. Unlike classical static allocation problems, however, force ratio in multi-target encirclement is not a fixed scalar. It is jointly shaped by target speed, local interaction geometry, and the number of UAVs assigned to each target, and therefore must be treated as a mission-dependent effectiveness variable.

Motivated by this gap, this paper investigates an Adaptive Force Ratio Allocation framework for heterogeneous multi-target encirclement. The core idea is not to assign UAVs according to fixed geometric rules or instantaneous local utilities, but to explicitly model the mapping from target capability and assigned UAV number to capture effectiveness, and then perform global resource allocation under a fixed total budget. Specifically, we construct an Experience Library to estimate target-wise success-rate curves under different speed ratios and UAV counts, and use these curves to support budget-constrained allocation and online reallocation.

The main contributions are summarized as follows:

We formulate a force-ratio-aware effectiveness modeling framework for heterogeneous multi-target encirclement, explicitly characterizing how capture success varies with target speed ratio and assigned UAV number.
We propose an Experience Library-based allocation mechanism that transforms multi-target UAV dispatch into a budget-constrained global optimization problem, enabling capability-aware allocation and fast online re-optimization when target conditions or available resources change.
We provide comparative experiments against Uniform Allocation and greedy marginal-gain baselines, together with adaptive non-uniform sampling and trajectory-level analysis, to show that the proposed method achieves more robust and consistent mission effectiveness under heterogeneous target difficulty.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 analyzes single-target mission effectiveness, and Section 4 presents the multi-target modeling and allocation framework based on the Experience Library. Section 5 reports effectiveness evaluation and comparison experiments. Section 6 concludes the paper and discusses future directions on online force-ratio adaptation.

2. Related Works

This paper studies Adaptive Force Ratio Allocation for Multi-Target Encirclement. Since the proposed method lies at the intersection of task allocation and mission-effectiveness modeling, related studies are reviewed from two aspects: (i) adaptive resource allocation methods for multi-UAV systems, and (ii) mission-effectiveness analysis for pursuit–encirclement under adversarial and uncertain conditions.

2.1. Adaptive Resource Allocation in Multi-UAV Systems

Adaptive resource allocation has been extensively studied in multi-agent and multi-UAV systems. Existing methods can be broadly grouped into three categories: optimization-based allocation, consensus/auction-based cooperative allocation, and learning-based adaptive allocation.

Optimization-based methods formulate task allocation as a constrained assignment or scheduling problem and solve it by combinatorial optimization, integer programming, or heuristic search [7,8]. These methods are effective when task utilities and constraints are clearly specified, but they usually rely on predefined static cost functions and are less suitable when the marginal effectiveness of additional resources is strongly nonlinear or target-dependent.

Consensus-based and auction-based approaches, represented by the Consensus-Based Bundle Algorithm (CBBA) family and related distributed negotiation frameworks, provide scalable and decentralized solutions for conflict-free assignment [11,12,13]. Subsequent works extended these mechanisms to dynamic mission reassignment, time-critical scheduling, and communication-constrained environments [14,15,16,17,18]. Their main advantage lies in distributed coordination efficiency. However, they typically emphasize local agreement, bidding consistency, or scheduling feasibility, rather than explicitly modeling the thresholded mission-effectiveness change caused by different pursuer-to-target force ratios.

Learning-based allocation, especially Multi-Agent Reinforcement Learning (MARL), has become an important direction for adaptive decision-making in uncertain and non-stationary environments [19,20,21]. MARL is attractive when explicit system models are unavailable and decentralized policies are desired. Nevertheless, such methods generally require extensive training data, are sensitive to task distribution shifts, and often provide limited interpretability in resource-limited encirclement scenarios where a clear relation between target capability and required UAV number is needed.

Compared with these studies, our work focuses on a different problem structure. We do not address generic utility maximization or distributed negotiation alone. Instead, we consider a heterogeneous multi-target encirclement setting in which mission effectiveness is governed by force-ratio-dependent capture success and exhibits a minimum-team-size threshold. The proposed method therefore models target-wise success curves through an Experience Library and performs budget-constrained global allocation accordingly. In this sense, the main distinction from adaptive allocation literature is that our allocation variable is not only task priority or local reward, but the estimated mission success contribution under different target-specific UAV counts.

2.2. Mission Effectiveness Analysis for Encirclement and Pursuit–Evasion

Mission effectiveness in UAV-swarm encirclement and pursuit–evasion has usually been evaluated from three aspects: mission-level outcomes such as capture success and completion time, control-level quality such as convergence and enclosure maintenance, and system-level operability such as communication feasibility and distributed coordination stability [22,23,24,25,26,27]. Existing studies have improved these aspects through model-based control and planning, learning-based adaptation under uncertainty, and distributed mission coordination mechanisms [28,29,30].

However, most of these works evaluate performance under fixed or weakly varying settings. Systematic analysis of how mission success changes as the force ratio evolves remains limited, especially in heterogeneous multi-target scenarios where some targets require substantially more UAV resources than others. In such cases, performance degradation is often dominated not by controller instability, but by resource–demand mismatch across targets.

Related resilience studies have investigated failure recovery, communication degradation, system reconfiguration, and mission continuity in multi-UAV systems [31,32,33,34]. These studies provide important foundations for robust multi-agent operation, but they generally do not couple three key factors into one allocation–analysis pipeline: dynamic force-ratio effects, heterogeneous target capability, and effectiveness-aware resource redistribution.

Therefore, the existing literature still lacks a framework that jointly answers two questions: how mission effectiveness changes under varying force-ratio conditions, and how limited UAV resources should be allocated to maximize capture performance in heterogeneous multi-target encirclement. Our work addresses this gap by combining force-ratio-oriented effectiveness evaluation with Experience-Library-based allocation and online re-optimization.

2.3. Position of This Work

To clarify the methodological position of this paper, we summarize its distinction from existing studies as follows. First, unlike traditional constrained assignment methods, the proposed approach does not assume a fixed handcrafted utility for each task; instead, it uses empirically estimated success-rate curves to capture the nonlinear relationship between target capability, assigned UAV number, and capture effectiveness. Second, unlike consensus- or auction-based cooperative allocation, our method is not designed around distributed bid agreement, but around mission-level global optimization under a fixed total UAV budget. Third, unlike greedy marginal-gain strategies, which choose the current locally best increment, our method performs global resource allocation over the entire target set and is therefore better suited to handling budget coupling induced by a shared resource constraint. Finally, unlike MARL-based adaptive allocation, the proposed framework does not depend on large-scale policy training, and it offers a more transparent mechanism for explaining why a target should receive more or fewer UAVs under a given force-ratio condition.

For this reason, the contribution of this paper is not simply an engineering integration of existing encirclement and allocation modules. Its methodological contribution lies in introducing a force-ratio-aware effectiveness modeling and allocation framework for heterogeneous multi-target encirclement, where empirical success estimation, budget-constrained optimization, and online reallocation are integrated into a unified and interpretable decision pipeline.

3. Single-Target Mission Effectiveness Analysis

3.1. System Description

To show the influence of swarm size on mission effectiveness, we first establish a kinematic model and a potential field-based control framework for single-target encirclement. For theoretical exposition, we consider a system consisting of N UAVs and a single dynamic target in a two-dimensional planar workspace $W \subseteq \mathbb{R}^{2}$. The simulator used to generate the Experience Library further adopts a bounded square workspace with reflective boundaries and circular obstacles, as detailed in Section 3.2. The obstacle term in the control law is included to match this simulation environment. Extending the framework to full three-dimensional encirclement and more realistic obstacle-rich environments is left for future work. Let the position and velocity of the i-th UAV be denoted by $p_{i}, v_{i} \in \mathbb{R}^{2}$ ($i = 1, \dots, N$), and the target’s state by $p_{T}, v_{T} \in \mathbb{R}^{2}$.

The motion of all agents follows second-order kinematics with saturation constraints on speed and acceleration, which can be expressed as follows:

$$\begin{cases} \dot{p} = v \\ \dot{v} = \mathrm{sat}_{a_{\max}}(u) \\ v \leftarrow \mathrm{sat}_{v_{\max}}(v) \end{cases} \tag{1}$$

where $u$ represents the control input (force), and $\mathrm{sat}_{val}(\cdot)$ limits the magnitude of a vector to $val$.

Our primary objective is not to devise novel control strategies, but to elucidate the impact of varying swarm sizes on mission effectiveness. Consequently, we employ the widely established Artificial Potential Field (APF) method as a simple and transparent baseline controller [35,36]. The resulting Experience Library success landscape should be interpreted as conditional on the chosen simulator and controller settings. The control input $u_{i}$ for the i-th UAV comprises an attractive force towards the target, a repulsive force for inter-agent collision avoidance, and an obstacle avoidance term, which is formulated as

$$u_{i} = k_{att} (p_{T} - p_{i}) + \sum_{j \neq i} f_{sep}(p_{i}, p_{j}) + f_{obs}(p_{i}, O). \tag{2}$$

Conversely, the non-cooperative target employs an active evasion strategy driven by the repulsive forces from all detecting UAVs, given by

$$u_{T} = \sum_{i = 1}^{N} \frac{p_{T} - p_{i}}{\| p_{T} - p_{i} \|^{2} + \epsilon}. \tag{3}$$

A successful encirclement mission is defined by two simultaneous conditions:

Proximity: At least $N_{req} \geq 4$ UAVs must enter the capture radius $R_{enc}$ of the target: $\| p_{i} - p_{T} \| < R_{enc}$.
Geometric Encirclement: The angular distribution of the encircling UAVs relative to the target must prevent escape. Let $\theta_{i}$ be the bearing of the i-th UAV relative to the target. The mission is successful if the maximum angular gap satisfies the following constraint

$$\max_{j} (\theta_{j + 1} - \theta_{j}) < \pi, \tag{4}$$

where the angles are sorted as $\theta_{1} \leq \dots \leq \theta_{k}$ and $\theta_{k + 1} = \theta_{1} + 2\pi$.
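
The two capture conditions can be checked with a few lines of code. The following is a minimal sketch rather than the authors' implementation: the function name `is_encircled` and its argument layout are illustrative assumptions, the default radius is the $R_{enc}$ value given in Section 3.2, and `n_req = 4` is taken as the smallest value allowed by the proximity condition.

```python
import math

def is_encircled(uav_positions, target, r_enc=200.0, n_req=4):
    """Return True if both capture conditions hold for 2-D positions (x, y)."""
    tx, ty = target
    # Proximity: keep only the UAVs strictly inside the capture radius,
    # and record their bearings as seen from the target.
    bearings = sorted(
        math.atan2(y - ty, x - tx)
        for x, y in uav_positions
        if math.hypot(x - tx, y - ty) < r_enc
    )
    if len(bearings) < n_req:
        return False
    # Geometric encirclement: gaps between consecutive sorted bearings,
    # plus the wrap-around gap from the last bearing back to the first.
    gaps = [b - a for a, b in zip(bearings, bearings[1:])]
    gaps.append(bearings[0] + 2.0 * math.pi - bearings[-1])
    return max(gaps) < math.pi
```

Four UAVs placed on the four sides of the target pass the check, while four UAVs clustered on one side fail it because the wrap-around gap exceeds $\pi$.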

3.2. Simulation Pipeline and Parameterization

All results are obtained from a discrete-time simulator implementing the second-order kinematics in (1) with explicit saturation on acceleration and speed. The simulation step is $\Delta t = 0.5\ \mathrm{s}$ and the horizon is $N_{\max} = 1000$ steps (i.e., $T_{\max} = 500\ \mathrm{s}$). The workspace is a square $[0, L] \times [0, L]$ with $L = 5000\ \mathrm{m}$; boundary handling is reflective (positions are clamped and the crossed velocity component is reflected).
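
One simulation step of this kind can be sketched as follows. This is an illustration under stated assumptions, not the simulator used in the paper: the names `saturate` and `step` are hypothetical, the update order (velocity first, then position) is an assumed integration scheme that the text does not specify, and the default limits are the UAV values listed later in this section.

```python
import math

def saturate(vec, limit):
    """Scale a 2-D vector down so that its magnitude does not exceed `limit`."""
    norm = math.hypot(vec[0], vec[1])
    if norm <= limit or norm == 0.0:
        return vec
    return (vec[0] * limit / norm, vec[1] * limit / norm)

def step(p, v, u, dt=0.5, a_max=50.0, v_max=30.0, side=5000.0):
    """Advance one agent by one time step of Eq. (1) inside a square workspace."""
    a = saturate(u, a_max)                       # acceleration saturation
    v = saturate((v[0] + a[0] * dt, v[1] + a[1] * dt), v_max)  # speed saturation
    p = [p[0] + v[0] * dt, p[1] + v[1] * dt]
    v = list(v)
    for axis in range(2):
        if p[axis] < 0.0 or p[axis] > side:
            p[axis] = min(max(p[axis], 0.0), side)  # clamp the position
            v[axis] = -v[axis]                      # reflect the crossed component
    return tuple(p), tuple(v)
```

For example, an agent at x = 4995 m moving at 30 m/s towards the boundary ends the step at x = 5000 m with its x-velocity reversed.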

For UAV i, the control input in (2) is the sum of attraction to the assigned target, inter-UAV separation/spacing, and obstacle repulsion. The attraction term is

$$u_{i}^{att} = k_{att} (p^{T} - p_{i}), \quad k_{att} = 2.0. \tag{5}$$

Inter-UAV interaction uses a near-field safety repulsion within $2 D_{safe}$ and a looser spacing term within the encirclement radius $R_{enc}$,

$$u_{i}^{sep} = \sum_{j \neq i} \begin{cases} k_{sep} \dfrac{p_{i} - p_{j}}{\| p_{i} - p_{j} \|^{2} + \epsilon_{sep}}, & \| p_{i} - p_{j} \| < 2 D_{safe}, \\ k_{form} \dfrac{p_{i} - p_{j}}{\| p_{i} - p_{j} \| + \epsilon_{form}}, & \| p_{i} - p_{j} \| < R_{enc}, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

with $D_{safe} = 10\ \mathrm{m}$, $R_{enc} = 200\ \mathrm{m}$, $k_{sep} = 500$, $k_{form} = 50$, $\epsilon_{sep} = 0.1$, and $\epsilon_{form} = 1$. Obstacles are modeled as circles and generate repulsion within a detection margin:

$$u_{i}^{obs} = \sum_{o \in O} \mathbb{1}\{\| p_{i} - p_{o} \| < r_{o} + R_{\det}\} \, k_{obs} \frac{p_{i} - p_{o}}{(\| p_{i} - p_{o} \| - r_{o} + \epsilon_{obs})^{2}}, \tag{7}$$

where $k_{obs} = 1000$, $R_{\det} = 50\ \mathrm{m}$, and $\epsilon_{obs} = 0.1$.
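
The three terms in (5) to (7) combine into a single control input per UAV. The sketch below shows one way to evaluate it, using the parameter values stated above as defaults. The function name `uav_control` and the tuple-based data layout are assumptions made for illustration. The cases of (6) are read top to bottom, so the spacing term applies only when the near-field condition does not.

```python
import math

def uav_control(p_i, p_target, others, obstacles,
                k_att=2.0, k_sep=500.0, k_form=50.0, k_obs=1000.0,
                d_safe=10.0, r_enc=200.0, r_det=50.0,
                eps_sep=0.1, eps_form=1.0, eps_obs=0.1):
    """Control input of one UAV: attraction + separation/spacing + obstacles.

    p_i, p_target: (x, y); others: positions of the other UAVs;
    obstacles: iterable of (x, y, radius).
    """
    # Attraction towards the assigned target, Eq. (5).
    ux = k_att * (p_target[0] - p_i[0])
    uy = k_att * (p_target[1] - p_i[1])

    # Inter-UAV term, Eq. (6): near-field repulsion, then looser spacing.
    for qx, qy in others:
        dx, dy = p_i[0] - qx, p_i[1] - qy
        d = math.hypot(dx, dy)
        if d < 2.0 * d_safe:
            gain = k_sep / (d * d + eps_sep)
        elif d < r_enc:
            gain = k_form / (d + eps_form)
        else:
            gain = 0.0
        ux += gain * dx
        uy += gain * dy

    # Obstacle repulsion inside the detection margin, Eq. (7).
    for ox, oy, r_o in obstacles:
        dx, dy = p_i[0] - ox, p_i[1] - oy
        d = math.hypot(dx, dy)
        if d < r_o + r_det:
            gain = k_obs / (d - r_o + eps_obs) ** 2
            ux += gain * dx
            uy += gain * dy
    return ux, uy
```

The returned vector would then be passed through the acceleration saturation of (1) before integration.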

The target applies an evasion policy driven by repulsion from nearby UAVs within $R_{comm} = 500\ \mathrm{m}$, plus obstacle repulsion:

$$\begin{aligned} u^{T} = {} & k_{T} \sum_{i : \| p^{T} - p_{i} \| < R_{comm}} \frac{p^{T} - p_{i}}{\| p^{T} - p_{i} \|^{2} + \epsilon_{T}} \\ & + k_{T}^{obs} \sum_{o \in O} \mathbb{1}\{\| p^{T} - p_{o} \| < r_{o} + 100\} \frac{p^{T} - p_{o}}{(\| p^{T} - p_{o} \| - r_{o} + \epsilon_{obs})^{2}}, \end{aligned} \tag{8}$$

with $\epsilon_{T} = 0.01$, $k_{T} = 1000$, and $k_{T}^{obs} = 5$. If no UAV is detected, the target executes a small random walk sampled from $\mathcal{N}(0, 10^{2} I)$.

UAV maximum speed is $v_{U} = 30\ \mathrm{m/s}$, and target maximum speed is varied via the speed ratio $\lambda = v_{U} / v_{\max}^{T}$. The acceleration saturation is $a_{\max} = 50\ \mathrm{m/s^{2}}$. An episode is successful if the capture conditions in Section 3.1 are satisfied before $T_{\max}$; otherwise it is counted as a failure (timeout). A safety distance $D_{safe} = 10\ \mathrm{m}$ is enforced; if an agent enters an obstacle’s inflated radius $(r_{o} + D_{safe})$, a reflective update is applied to prevent interpenetration.

Each episode contains $N_{obs} = 15$ circular obstacles with radius $r_{o} = 120\ \mathrm{m}$. Obstacle centers are sampled uniformly and rejected if they are within $(r_{o} + 100)$ meters of the initial target position or the UAV base position. The initial target position is sampled from $[0.1 L, 0.9 L]^{2}$. The UAV swarm is initialized around a base point sampled from $[0.1 L, 0.9 L]^{2}$ subject to $\| p_{base} - p_{0}^{T} \| > 1000\ \mathrm{m}$; individual UAV initial positions are $p_{i}(0) = p_{base} + \mathcal{N}(0, 50^{2} I)$ and clamped to $[0, L]^{2}$.
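
The episode initialization described above amounts to a few rejection-sampling loops. The following sketch is one possible reading of it, with assumptions flagged: the function name `sample_scenario` is hypothetical, obstacle centers are assumed to be drawn over the whole workspace $[0, L]^{2}$ (the text only says "uniformly"), and the sampling order (target, then base, then obstacles, then UAVs) is an assumption.

```python
import math
import random

def sample_scenario(rng, n_uav, side=5000.0, n_obs=15, r_o=120.0):
    """Draw target, base point, obstacle centres and UAV start positions."""
    lo, hi = 0.1 * side, 0.9 * side
    target = (rng.uniform(lo, hi), rng.uniform(lo, hi))

    # Base point: resample until it is more than 1000 m from the target.
    while True:
        base = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        if math.dist(base, target) > 1000.0:
            break

    # Obstacles: reject centres within (r_o + 100) m of the target or the base.
    keep_out = r_o + 100.0
    obstacles = []
    while len(obstacles) < n_obs:
        c = (rng.uniform(0.0, side), rng.uniform(0.0, side))
        if math.dist(c, target) > keep_out and math.dist(c, base) > keep_out:
            obstacles.append(c)

    # UAVs: Gaussian scatter (sigma = 50 m) around the base, clamped to [0, L]^2.
    def clamp(x):
        return min(max(x, 0.0), side)

    uavs = [(clamp(base[0] + rng.gauss(0.0, 50.0)),
             clamp(base[1] + rng.gauss(0.0, 50.0))) for _ in range(n_uav)]
    return target, base, obstacles, uavs
```

Passing a seeded generator such as `random.Random(0)` makes each Monte Carlo episode reproducible.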

3.3. Mission Effectiveness Analysis Under Dynamic Speed Ratios

We define the Dynamic Force Ratio $\lambda$ as the ratio of the UAV’s maximum speed to the target’s maximum speed ($\lambda = v_{\max}^{U} / v_{\max}^{T}$). To quantify system resilience against superior targets, we conduct Monte Carlo simulations varying $\lambda$ from 0.5 to 2.0 in increments of 0.1. For each $(\lambda, N)$ pair, we run $M_{EL} = 50$ independent trials under the simulation settings in Section 3.2. Figure 2 illustrates the relationship between mission effectiveness (success rate) and $\lambda$ for varying swarm sizes ($N \in \{4, 5, 6, 7\}$).

The experimental results (with $N \in \{4, 5, 6, 7\}$) indicate three practical operating regimes:

1. Highly Effective Regime ($\lambda \gtrsim 1.1$): When the UAVs are at least as fast as the target, all configurations maintain high mission effectiveness. In this region, increasing swarm size mainly improves the robustness margin rather than fundamentally changing the outcome. A saturation regime emerges around $\lambda \approx 1.6$, beyond which further increases in the UAV/target speed ratio yield only marginal improvements.
2. Sensitivity/Transition Regime ($0.7 < \lambda \lesssim 1.1$): Around speed parity, success rate becomes strongly dependent on swarm size. Smaller teams (e.g., $N = 4$) show a sharp effectiveness drop, while larger teams ($N = 6, 7$) degrade more gradually and preserve substantially higher success probability through stronger cooperative enclosure.
3. Resource-Limited Regime ($\lambda \lesssim 0.7$): When the target is substantially faster than the UAVs, effectiveness declines across all settings, but the decline is not uniform: $N = 7$ still retains non-trivial capture capability, whereas $N = 4$ approaches near-failure. This highlights that additional agents remain beneficial, yet gains become increasingly costly and scenario-dependent.

These observations reinforce the need for capability-aware resource allocation in multi-target missions. Once $\lambda < 1$, resilience is governed not only by how many UAVs are deployed, but also by how well limited UAV resources are matched to target difficulty levels.

4. Multi-Target Mission Effectiveness with Experience Library

In this section, we analyze multi-target mission effectiveness using an Experience Library-based framework. The flowchart of the proposed algorithm is illustrated in Figure 3. We present adaptive non-uniform sampling and Experience-Library-constrained allocation, then compare mission-level success trends and trajectory-level behaviors to reveal how robustness evolves under heterogeneous targets and limited UAV resources.

4.1. Adaptive Non-Uniform Sampling Strategy

To efficiently characterize the relationship between the number of UAVs N and the mission success probability $P_{success}$ while maintaining computational tractability, we employ an adaptive non-uniform sampling approach. This strategy is particularly well-suited to multi-parameter systems in which the primary analytical focus is on a single variable, the swarm size N, while other environmental or operational parameters are held fixed during the initial phase of analysis.

Let the full system parameter vector be denoted by $\theta = (N, \lambda, R_{s}, \dots)^{\top}$, where $N \in \mathbb{Z}_{+}$ represents the number of cooperating UAVs, $\lambda = v_{U} / v_{target}$ denotes the speed ratio between the UAV and the evading target, and $R_{s}$ is the sensing radius that governs the perceptual range of each agent. In the preliminary exploration stage, non-critical parameters are fixed to representative nominal values; specifically, to isolate the dominant influence of team size on capture effectiveness and to control the cost of library construction, we set $\lambda = 1.5$ and $R_{s} = 20\ \mathrm{m}$, thereby reducing the problem to the estimation of a univariate function

$$P_{success} = f(N; \theta_{\setminus N}), \tag{9}$$

where $\theta_{\setminus N}$ collectively denotes all parameters other than N, which remain constant throughout this evaluation.

Given that the function $f(N)$ typically exhibits high sensitivity (i.e., a large derivative magnitude) in the regime of small N, and gradually saturates as N increases (so that $\partial f / \partial N \to 0$ for sufficiently large N), we adopt a non-uniform sampling design over the domain $N \in [N_{min}, N_{max}]$ with $N_{min} = 3$ and $N_{max} = 30$. Here, the lower bound $N_{min} = 3$ is introduced only to capture the pre-threshold failure region of the success-rate curve. An initial coarse set of sampling points is defined as

$$\mathcal{N}^{(0)} = \{3, 6, 10, 15, 20, 25, 30\}. \tag{10}$$

For each candidate $N_{i} \in \mathcal{N}^{(k)}$ at iteration k, we execute $M_{EL}$ independent Monte Carlo trials of the capture mission under identical conditions. Unless otherwise stated, we use $M_{EL} = 50$ for Experience Library construction. Let $S_{i}$ denote the number of successful trials out of $M_{EL}$. The empirical success probability and its standard error are then estimated as

$$\hat{P}_{success}(N_{i}) = \frac{S_{i}}{M_{EL}}, \tag{11}$$

$$\widehat{SE}(\hat{P}_{success}(N_{i})) = \sqrt{\frac{\hat{P}_{success}(N_{i}) (1 - \hat{P}_{success}(N_{i}))}{M_{EL}}}. \tag{12}$$

For each sampled configuration (speed-ratio descriptor $\lambda$ and candidate UAV team size N), we run $M_{EL}$ independent Monte Carlo rollouts under the simulator settings in Section 3.2. Each rollout returns a binary success indicator and (if successful) the encirclement completion time. The Experience Library stores the empirical success probability $\hat{P}_{success}(\lambda, N) = S / M_{EL}$ and the mean encirclement time over successful rollouts, together with the binomial standard error

$$SE = \sqrt{\hat{P}_{success} (1 - \hat{P}_{success}) / M_{EL}}. \tag{13}$$
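
A library entry of this form can be computed directly from the rollout outcomes. The sketch below is illustrative only: the function name `summarize_rollouts`, the `(success, completion_time)` tuple layout and the dictionary keys are assumptions, not the authors' data format.

```python
import math

def summarize_rollouts(rollouts):
    """Summarize Monte Carlo rollouts for one (speed ratio, team size) pair.

    rollouts: list of (success, completion_time) tuples, where
    completion_time is None for failed rollouts.
    """
    m = len(rollouts)
    times = [t for ok, t in rollouts if ok]
    p_hat = len(times) / m                       # empirical success probability
    se = math.sqrt(p_hat * (1.0 - p_hat) / m)    # binomial standard error
    mean_time = sum(times) / len(times) if times else None
    return {"p_success": p_hat, "se": se, "mean_time": mean_time}
```

With 40 successes out of 50 rollouts, this gives a success probability of 0.8 and a standard error of about 0.057.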

To query unsampled N, we build a monotone success curve $\hat{P}_{success}(\lambda, \cdot)$ over the integer domain by shape-preserving interpolation between sampled anchor points (piecewise linear/monotone scheme). For a query $(\lambda^{*}, N^{*})$, we return the interpolated value when $N^{*}$ is between anchors and clamp to the nearest boundary value outside the sampled domain. For speed ratios not explicitly sampled, we linearly interpolate between the two nearest ratios in the library (and clamp outside the ratio range).
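
The query rule can be sketched as two nested clamped linear interpolations, first over N and then over the speed ratio. This is a simplified illustration: the nested-dictionary layout `library[lam][n]` and the function names are assumptions, the piecewise-linear variant is used, and the anchor values are assumed to be monotone in N already (no monotonicity repair is performed here).

```python
from bisect import bisect_right

def interp_clamped(xs, ys, x):
    """Piecewise-linear interpolation on sorted xs, clamped outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)                 # xs[j-1] <= x < xs[j]
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return (1.0 - w) * ys[j - 1] + w * ys[j]

def query(library, lam, n):
    """Estimate the success probability at (lam, n) from sampled anchors."""
    lams = sorted(library)
    per_ratio = []
    for l in lams:
        ns = sorted(library[l])
        # Interpolate over team size at each sampled speed ratio.
        per_ratio.append(interp_clamped(ns, [library[l][k] for k in ns], n))
    # Interpolate over speed ratio; only the two nearest ratios contribute.
    return interp_clamped(lams, per_ratio, lam)
```

Because each query is a handful of lookups and multiplications, repeated queries during reallocation stay cheap.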

When uncertainty-aware allocation is required, we use a conservative lower-confidence estimate $\tilde{P}(\lambda, N) = \max(0, \hat{P}(\lambda, N) - z \cdot SE(\lambda, N))$ with $z = 1.96$ and use $\tilde{P}$ in the DP objective (constraints unchanged). Unless otherwise stated, the reported experimental results in this paper use the nominal estimate $\hat{P}(\lambda, N)$, and the uncertainty-aware variant is optional for conservative planning.
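
As a worked illustration with hypothetical counts (40 successes in $M_{EL} = 50$ rollouts, not a result reported in the paper), the nominal and conservative estimates differ as follows:

```latex
\hat{P} = \frac{40}{50} = 0.80, \qquad
SE = \sqrt{\frac{0.80 \times 0.20}{50}} \approx 0.057, \qquad
\tilde{P} = \max\bigl(0,\ 0.80 - 1.96 \times 0.057\bigr) \approx 0.69 .
```

With only 50 rollouts per entry, the conservative estimate is therefore about 0.11 lower than the nominal one in this example, which can shift UAVs towards targets whose curves sit near the success threshold.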

An adaptive refinement step is subsequently applied: if any interval $[N_{a}, N_{b}] \subset \mathcal{N}^{(k)}$ satisfies

$$\left| \frac{\hat{P}_{success}(N_{b}) - \hat{P}_{success}(N_{a})}{N_{b} - N_{a}} \right| > \tau, \tag{14}$$

where $\tau > 0$ is a pre-specified gradient threshold that identifies regions of rapid performance change, additional sampling points, such as midpoints or locations weighted by local curvature, are inserted into that interval to form the updated set $\mathcal{N}^{(k + 1)}$. This iterative procedure continues until the estimated curve stabilizes or a maximum simulation budget is exhausted. We set $\tau = 0.015$ based on a pilot sweep balancing sampling cost and curve-approximation error. For $\tau \in \{0.01, 0.015, 0.02, 0.03\}$, the final allocation remains unchanged in the representative heterogeneous setting; we report this as a preliminary robustness check rather than a full systematic sensitivity analysis. A more complete sensitivity study should additionally report, for each $\tau$, the sampled-point count, approximation error, final allocation vector, and resulting mission performance.
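
One refinement pass with the midpoint option can be sketched as follows. The function name `refine` is hypothetical, the success values in the usage note are invented for illustration, and the curvature-weighted placement mentioned above is not implemented here.

```python
def refine(points, p_hat, tau=0.015):
    """Return the next sampling set after one midpoint-refinement pass.

    points: sampled team sizes; p_hat: dict mapping each sampled team size
    to its estimated success probability.
    """
    pts = sorted(points)
    refined = set(pts)
    for a, b in zip(pts, pts[1:]):
        slope = abs(p_hat[b] - p_hat[a]) / (b - a)
        # Insert an integer midpoint only where the curve changes quickly
        # and there is still an unsampled integer between a and b.
        if slope > tau and b - a > 1:
            refined.add((a + b) // 2)
    return sorted(refined)
```

Starting from the coarse set {3, 6, 10, 15, 20, 25, 30} with a hypothetical curve that rises steeply below N = 10 and is nearly flat above it, a single pass adds N = 4 and N = 8 and leaves the flat region untouched.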

In contrast to exhaustive grid search, which suffers from combinatorial complexity scaling with the product of discretization levels across all parameters, the proposed adaptive scheme concentrates computational effort in regions of high information gain. More specifically, in the multi-target encirclement scenario, sampling is concentrated in intervals exhibiting large gradients or high estimation uncertainty, thereby achieving an accurate approximation of $f(N)$ with significantly fewer simulation runs.

4.2. Experience-Library-Constrained Knapsack Formulation

To make the resource allocation mechanism explicit, we model the multi-target assignment under the Experience Library as a multiple-choice knapsack problem (MCKP).

Assume there are n targets, each with a speed-ratio descriptor $\lambda_{i}$. For target i, assigning k UAVs yields a library-inferred success probability, given by

$$p_{i}(k) = \hat{P}_{success}(\lambda_{i}, k), \tag{15}$$

where $\hat{P}_{success}(\cdot)$ is obtained from the Experience Library interpolation/inference model.

For each target $i \in \{1, \dots, n\}$, we define a discrete option set $K_{i}$ consisting of candidate UAV allocations. In this paper, the candidate set is shared across targets and defined as $K_{i} = \{0\} \cup \{4, 5, \dots, K_{max}\}$, where $k = 0$ denotes not assigning UAVs to target i and $K_{max} = 15$ in our implementation. Selecting option $k \in K_{i}$ assigns exactly k UAVs to target i and yields an estimated success reward $p_{i}(k)$ from the Experience Library; the binary variable $x_{i, k} \in \{0, 1\}$ indicates whether option k is chosen for target i. Each target must choose exactly one option from its candidate set $K_{i}$ (including $k = 0$ if no UAV is assigned), i.e.,

$$\sum_{k \in K_{i}} x_{i, k} = 1, \quad \forall i = 1, \dots, n. \tag{16}$$

The total UAV budget is constrained by

$$\sum_{i = 1}^{n} \sum_{k \in K_{i}} k \, x_{i, k} \leq N_{tot}. \tag{17}$$

The optimization objective is

$$\max_{x_{i, k}} \sum_{i = 1}^{n} \sum_{k \in K_{i}} p_{i}(k) \, x_{i, k}. \tag{18}$$

This objective is the expected number of captured targets, expressed as

$$E[C] = \sum_{i = 1}^{n} \Pr(\text{target } i \text{ captured}) \approx \sum_{i = 1}^{n} p_{i}(k_{i}), \tag{19}$$

where $k_{i}$ denotes the UAV count selected for target i, and therefore the expected mission success rate is

$$E[R_{success}] = \frac{1}{n} E[C]. \tag{20}$$

We formulate the allocation objective to match what the Experience Library can estimate reliably, i.e., target-wise success probabilities as functions of the speed ratio and the assigned UAV count. This design yields a DP-based solver that is computationally efficient and supports low-latency reallocation computation under disturbances.

Hence, the planner performs expected-value maximization rather than direct maximization of the joint event probability $\Pr(\cap_{i} \text{capture}_{i})$. The term “mission success rate” refers to the normalized expected number of captured targets, i.e., the expected capture ratio $E[C] / n$, rather than the joint probability that all targets are captured. The latter is generally harder to estimate and optimize, while the former is tractable and aligns with the dynamic-programming implementation used in this work.
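
A small numerical example makes the distinction concrete. The probabilities below are hypothetical, and the joint value additionally assumes independent capture events, which the paper does not claim:

```latex
n = 2, \quad p_{1}(k_{1}) = 0.9, \quad p_{2}(k_{2}) = 0.5
\;\Longrightarrow\;
E[C] = 0.9 + 0.5 = 1.4, \qquad
\frac{E[C]}{n} = 0.7, \qquad
\Pr(\text{both captured}) = 0.9 \times 0.5 = 0.45 \ \text{(under independence)} .
```

The expected capture ratio (0.7) is thus noticeably higher than the probability of capturing every target (0.45), so the two quantities should not be read interchangeably.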

It is important to distinguish two types of coupling in the multi-target setting. The current formulation couples targets only through the shared UAV budget constraint in (17), i.e., assigning more UAVs to one target reduces the remaining budget for others. By contrast, physical or probabilistic cross-target coupling (e.g., inter-swarm collisions, communication interference, target cooperation, or correlated success/failure events) is not explicitly modeled in the Experience Library inference or in the additive objective. Therefore, statements about “cross-target coupling” in this paper refer to budget coupling rather than explicit interaction coupling in the environment.

The framework can be extended to incorporate weighted or risk-averse objectives when mission planners require minimum service levels for high-value targets. For example, a weighted objective

max \sum_{i} w_{i} p_{i} (k_{i})

can prioritize critical targets, and additional constraints such as

p_{i} (k_{i}) \geq η_{i}

can enforce target-specific service guarantees. Risk-averse variants that penalize uncertainty (e.g., confidence-adjusted

p_{i}

or chance-constrained formulations) can also be incorporated while maintaining the same DP-based optimization structure.
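
One way to write the weighted, service-constrained variant in the selection-variable notation of (17) and (18) is sketched below. This is an illustrative formulation rather than one evaluated in this work; it assumes the selection variables keep whatever per-target selection constraint the base formulation imposes, and that targets without a service requirement use a threshold of zero.

```latex
\max_{x_{i,k}} \; \sum_{i=1}^{n} \sum_{k \in K_{i}} w_{i} \, p_{i}(k) \, x_{i,k}
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \sum_{k \in K_{i}} k \, x_{i,k} \leq N_{tot},
\qquad
x_{i,k} = 0 \;\; \text{whenever} \;\; p_{i}(k) < \eta_{i}.
```

The weight only rescales each option's reward and the threshold only removes options from the candidate set, so the problem remains a multiple-choice knapsack. The price is that the problem can become infeasible when the budget cannot cover every positive threshold.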

Let

d p^{(t)} (w)

denote the maximum cumulative reward after processing t targets with budget w. Then

d p^{(t + 1)} (w) = max_{k \in K_{t + 1}, k \leq w} \{d p^{(t)} (w - k) + p_{t + 1} (k)\},

(21)

which is consistent with the MCKP structure.
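
A minimal Python sketch of recursion (21) is given below. It is not the released implementation: the function name `solve_allocation`, the dictionary representation of the success curves, and the toy values are assumptions for illustration, and the sketch assumes that any target may be left unassigned (zero UAVs, zero reward).

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

def solve_allocation(curves: List[Dict[int, float]], budget: int) -> Tuple[float, List[int]]:
    """Multiple-choice knapsack DP.

    curves[i] maps each candidate UAV count k to the success probability p_i(k).
    Returns the best expected number of captures and the chosen k_i per target.
    """
    n = len(curves)
    # dp[t][w]: best cumulative reward after processing t targets with budget w.
    dp = [[0.0] * (budget + 1)] + [[NEG_INF] * (budget + 1) for _ in range(n)]
    choice = [[0] * (budget + 1) for _ in range(n)]
    for t, curve in enumerate(curves):
        options = {0: 0.0, **curve}  # assumption: a target may receive no UAVs
        for w in range(budget + 1):
            for k, p in options.items():
                if k <= w and dp[t][w - k] + p > dp[t + 1][w]:
                    dp[t + 1][w] = dp[t][w - k] + p
                    choice[t][w] = k
    # Trace the stored choices back from the full budget.
    alloc = [0] * n
    w = budget
    for t in range(n - 1, -1, -1):
        alloc[t] = choice[t][w]
        w -= alloc[t]
    return dp[n][budget], alloc

# Hypothetical success curves for an easy, a medium, and a hard target.
toy_curves = [{4: 0.90, 5: 0.95}, {4: 0.60, 6: 0.85}, {4: 0.20, 8: 0.70}]
print(solve_allocation(toy_curves, 14))
print(solve_allocation(toy_curves, 10))
```

With these toy curves, a budget of 14 serves all three targets with allocation [4, 6, 4], whereas a budget of 10 yields [4, 6, 0] and drops the hard target. The three nested loops over targets, budget levels, and options correspond to the time complexity stated below.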

The online reallocation step consists of querying the Experience Library to obtain

{\hat{P}}_{success, i} (k)

and solving the DP-based knapsack once to update

\{k_{i}\}_{i = 1}^{n}

. Let n be the number of targets,

N_{tot}

the total UAV budget, and

| K |

the number of candidate UAV-count options per target. The DP solver has time complexity

O (n N_{tot} | K |)

and memory complexity

O (n N_{tot})

. Since

| K |

is a small constant in our implementation, the reallocation computation is lightweight and suitable for online re-optimization.

5. Experimental Study

In this section, we investigate the effectiveness and adaptability of the proposed Experience Library for heterogeneous multi-target missions. We first present a fixed-budget characterization of the multi-target Experience Library to reveal how mission success and encirclement efficiency vary with target-side complexity. We then study the adaptive resource reallocation behavior of the proposed method under dynamic disturbances and UAV budget reduction. The results show that the Experience Library not only captures a structured resilience landscape over the scenario space, but also supports autonomous re-optimization of UAV resources in response to changing mission conditions.

Table 1 summarizes the complete simulation/algorithm parameterization and scenario-generation rules.

5.1. Resilience and Performance Characterization of Multi-Target Experience Library

To evaluate the global coverage and robustness characteristics of the Experience Library in a multi-target setting, we visualize the sampled scenario space in Figure 4. In this experiment, the total UAV budget is fixed at

N_{UAV} = 50

for all sampled scenarios, so that performance variation is attributed to target-side complexity (number of targets and speed ratio) rather than resource-scale changes. Each bubble corresponds to one sampled scenario from the adaptive non-uniform process; the horizontal axis is the number of targets

N_{t} \in \{10, 11, \dots, 17\}

, the vertical axis is the mean UAV/target speed ratio

\bar{λ} \in \{0.6, 0.75, 0.9, 1.05, 1.2, 1.35\}

, bubble color denotes average success rate, and bubble size denotes average encirclement time. The color scale starts near 0.60 because the sampled scenarios in our Experience Library dataset do not contain success rates below this level.

Let one sampled scenario be denoted by

x_{i} = (N_{t, i}, {\bar{λ}}_{i}),

(22)

where

N_{t, i}

is the number of targets and

{\bar{λ}}_{i}

is the mean speed ratio. For each point, the library stores

E_{i} = ({\hat{P}}_{success, i}, {\hat{T}}_{encircle, i}),

(23)

where

{\hat{P}}_{success, i}

is the empirical mission success rate and

{\hat{T}}_{encircle, i}

is the empirical average encirclement time.
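
The scenario-to-outcome mapping in (22) and (23) can be pictured as a small keyed store. The sketch below is only an assumed representation: the class and function names, the stored values, and the nearest-neighbor lookup rule are hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Entry:
    p_success: float   # empirical mission success rate
    t_encircle: float  # empirical average encirclement time, in seconds

# Key: (number of targets, mean speed ratio). Values are placeholders.
library: Dict[Tuple[int, float], Entry] = {
    (10, 1.20): Entry(0.95, 120.0),
    (12, 0.75): Entry(0.70, 310.0),
    (12, 0.90): Entry(0.80, 250.0),
}

def nearest_entry(lib: Dict[Tuple[int, float], Entry],
                  n_targets: int, mean_ratio: float) -> Entry:
    """Return the entry of the closest sampled scenario (assumed lookup rule):
    match the target count first, then the mean speed ratio."""
    key = min(lib, key=lambda x: (abs(x[0] - n_targets), abs(x[1] - mean_ratio)))
    return lib[key]

print(nearest_entry(library, 12, 0.8))
```

An interpolating lookup would be an equally plausible choice; the point is only that inference reduces to a table query, which is what keeps re-optimization cheap.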

The resulting map reveals three consistent tendencies. First, mission effectiveness degrades as the mean speed ratio decreases, especially near the lower-right region where targets are both numerous and difficult. Second, high-success regions (green/yellow bubbles) concentrate in higher-

λ

and lower-complexity areas, indicating that the Experience Library captures a structured operating envelope rather than isolated points. Third, larger bubbles are mostly distributed in harder regions, suggesting that successful completion in high-difficulty scenarios requires longer coordination time even when capture remains feasible.

Overall, this section provides a compact scenario-level characterization of multi-target resilience under a fixed UAV budget and serves as a basis for comparing allocation strategies under heterogeneous difficulty profiles.

Figure 4. Multi-target Experience Library visualization with fixed UAV budget (

N_{UAV} = 50

). Each bubble denotes a sampled scenario; color represents the average success rate, and bubble size indicates the average encirclement time.

5.2. Adaptive Resource Reallocation

To investigate the internal decision-making process of the proposed Experience Library (EL) method, we analyze its allocation behavior under dynamic environmental conditions. The experiment visualizes how the UAV allocation matrix evolves in response to sudden disturbances and resource constraints, demonstrating the system’s capability for autonomous “triage” and strategic re-optimization.

5.2.1. Experimental Setup

The experiment simulates a continuous mission with

n = 10

targets, divided into three sequential phases (

T_{A}, T_{B}, T_{C}

) to introduce discrete state changes:

Phase A (Normal State): A swarm of $N = 60$ UAVs engages targets with mixed difficulty levels. Targets $T_{0} - T_{2}$ are Easy ( $λ = 1.6$ ), $T_{3} - T_{6}$ are Medium ( $λ = 1.0$ ), and $T_{7} - T_{9}$ are Hard ( $λ = 0.7$ ).
Phase B (Disturbance): A sudden threat escalation occurs. The speed ratios of $T_{0}, T_{1}, T_{2}$ simultaneously decrease from $1.6$ to $0.7$ (Very Hard), drastically raising the resource demand.
Phase C (Constraint): The mission encounters a severe resource shortage. The total number of available UAVs is reduced by 25% ( $N \to 45$ ), forcing the system to operate under strict constraints.
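
The three phases can be encoded as a simple schedule, as in the sketch below. The `Phase` class and its field names are hypothetical, and the sketch assumes that the disturbed speed ratios of Phase B persist into Phase C, which the description above implies but does not state explicitly.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    name: str
    budget: int                 # total number of available UAVs
    speed_ratios: List[float]   # speed ratio for targets T_0 ... T_9

# Phase A: three easy, four medium, and three hard targets.
base = [1.6] * 3 + [1.0] * 4 + [0.7] * 3
# Phase B: the speed ratios of T_0, T_1, T_2 drop from 1.6 to 0.7.
disturbed = [0.7] * 3 + base[3:]

phases = [
    Phase("A", 60, base),
    Phase("B", 60, disturbed),
    Phase("C", 45, disturbed),  # 25% budget cut; ratios assumed unchanged from B
]

for ph in phases:
    print(ph.name, ph.budget, ph.speed_ratios)
```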

5.2.2. Global Resource Flow Analysis

Figure 5 presents the global allocation heatmap across the three phases. In Phase A, the EL method distributes resources proportionally to target difficulty, ensuring baseline coverage for most targets. The zero allocation for

T_{9}

is an intentional triage decision rather than an inconsistency: under the fixed budget, the optimizer prefers to concentrate UAVs on targets with higher expected marginal return and temporarily leave one hard target uncovered. In Phase B, the system detects a velocity surge in the first three targets. The heatmap reveals a distinct selective engagement strategy: the system concentrates significant resources on

T_{0}

(increasing from five to nine UAVs) to guarantee its capture, while strategically reducing allocation to

T_{1}

and

T_{2}

to zero. This “triage” behavior prevents a system-wide failure that would occur if limited resources were spread thinly across all three high-speed targets. In Phase C, faced with a budget cut, the algorithm prioritizes the “Medium” difficulty group (

T_{3} - T_{6}

), which offers the highest marginal success probability per UAV. Consequently, resources are withdrawn from the most “expensive” targets (including

T_{0}

), demonstrating that the method maximizes global mission success rate rather than maintaining individual target persistence.

5.2.3. Micro-Level Evolution (Target $T_{0}$ )

To intuitively illustrate this adaptive response, Figure 6 details the evolution of the spatial allocation specifically for Target

T_{0}

.

Normal State: In the initial state, $T_{0}$ is an easy target, and a small, efficient formation of five UAVs is assigned to orbit it.
Disturbance Response: When the speed ratio of $T_{0}$ drops from $λ = 1.6$ to $λ = 0.7$ , the EL algorithm dynamically recalculates the required force. Recognizing the increased difficulty, the number of allocated UAVs increases to nine, forming a dense encirclement to counter the target’s higher relative maneuverability.
Dynamic Adjustment: In Phase C, as the global budget tightens, the cost to capture $T_{0}$ (nine UAVs) becomes prohibitively high relative to its contribution to the global success score. The system autonomously decides to abandon $T_{0}$ (zero allocation of UAV resources) to redirect those valuable assets to safeguard the capture of multiple medium-difficulty targets.

Figure 6. Evolution of allocation strategy for Target

T_{0}

. The visualization highlights the transition from a standard engagement (Phase A) to a heavy-resource surge (Phase B), and finally to strategic abandonment under constraint (Phase C).

5.3. Comparison Experiment

To evaluate the proposed Experience Library strategy under heterogeneous target conditions, we conduct a comparative experiment with three allocation strategies: Experience Library, Greedy Marginal Gain, and Uniform Allocation. The target set contains mixed-difficulty agents (easy/medium/hard), and the total UAV budget is varied to characterize effectiveness transitions under limited resources.

The three allocation strategies are defined as follows. The proposed Experience Library strategy first queries the target-wise success-rate curves from the Experience Library and then solves a budget-constrained global allocation problem to maximize the expected number of captured targets under a fixed total UAV budget. The greedy marginal-gain baseline starts from zero allocation for all targets and repeatedly assigns UAVs to the target that yields the largest marginal success improvement per additional UAV. Since successful encirclement requires at least four UAVs, an unassigned target is activated by first allocating four UAVs, and then additional UAVs are added one by one according to the current marginal gain. The Uniform Allocation baseline is the rule-based baseline used in this paper: given total budget N and n targets, each target first receives

⌊ N / n ⌋

UAVs, and the remaining

N mod n

UAVs are then distributed one at a time across targets, so that per-target allocations differ by at most one. This baseline is non-adaptive and ignores target difficulty.
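
The two baselines can be sketched as follows. This is an interpretation of the description above rather than the authors' code: the order in which leftover UAVs are handed out, the normalization of the activation gain by the four-UAV step, and the toy success function are all assumptions.

```python
from typing import Callable, List

def uniform_allocation(budget: int, n: int) -> List[int]:
    """Equal share per target; leftovers go to the first targets (assumed order)."""
    base, remainder = divmod(budget, n)
    return [base + (1 if i < remainder else 0) for i in range(n)]

def greedy_allocation(success: Callable[[int, int], float], n: int,
                      budget: int, k_min: int = 4) -> List[int]:
    """Repeatedly take the step with the largest success gain per added UAV.

    An unassigned target is activated with k_min UAVs at once; an active
    target grows by one UAV at a time.
    """
    alloc = [0] * n
    remaining = budget
    while remaining > 0:
        best = None  # (gain per UAV, target index, step size)
        for i in range(n):
            step = k_min if alloc[i] == 0 else 1
            if step > remaining:
                continue
            gain = (success(i, alloc[i] + step) - success(i, alloc[i])) / step
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, i, step)
        if best is None:  # no affordable step improves success
            break
        _, i, step = best
        alloc[i] += step
        remaining -= step
    return alloc

def toy_success(i: int, k: int) -> float:
    """Hypothetical saturating success curve for two targets."""
    caps = [0.9, 0.6]
    return 0.0 if k < 4 else caps[i] * (1 - 0.5 ** (k - 3))

print(uniform_allocation(50, 15))
print(greedy_allocation(toy_success, 2, 10))
```

For N = 50 and n = 15, the uniform rule gives four UAVs to five targets and three UAVs to the other ten, which is below the four-UAV minimum mentioned above.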

5.3.1. Experimental Protocol

We evaluate the three strategies under identical heterogeneous-target settings. A trial returns the mission success rate defined as the fraction of targets captured within the time horizon

T_{max}

. For the trajectory-level comparison in Figure 7, all methods are tested in the same fixed heterogeneous scenario so that the difference comes only from the allocation strategy. For the statistical comparison in Figure 8, to avoid circularity we evaluate allocations using an independent simulator-based success table: we estimate

p (λ, k)

from Monte Carlo rollouts of the APF-based simulator for each speed-ratio level and UAV count option, and then compute the mission success rate by sampling per-target capture outcomes based on these simulator-estimated probabilities. This statistical comparison therefore validates the proposed allocation framework under the paper’s target-wise, budget-coupled modeling assumptions; it does not constitute simultaneous end-to-end multi-swarm rollouts in a shared environment, and does not capture cross-target physical/probabilistic interactions (e.g., inter-swarm collisions, communication interference, or correlated outcomes). For each tested total UAV budget N, we perform

M_{cmp} = 20

independent trials and report the mean mission success rate across trials; error bars indicate 95% bootstrap confidence intervals.
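
The statistical protocol amounts to drawing a capture outcome per target and averaging over trials. A minimal sketch is shown below, assuming independent Bernoulli outcomes per target; the function names, the seed, and the example probabilities are illustrative and do not come from the simulator-estimated table.

```python
import random
from typing import List, Tuple

def run_trial(probs: List[float], rng: random.Random) -> float:
    """One trial: sample a capture outcome per target, return the fraction captured."""
    captured = sum(1 for p in probs if rng.random() < p)
    return captured / len(probs)

def mean_success_rate(probs: List[float], n_trials: int = 20,
                      seed: int = 0) -> Tuple[float, List[float]]:
    """Mean mission success rate over independent trials, plus per-trial rates."""
    rng = random.Random(seed)
    rates = [run_trial(probs, rng) for _ in range(n_trials)]
    return sum(rates) / n_trials, rates

# Hypothetical per-target success probabilities for one allocation.
example_probs = [0.95] * 5 + [0.85] * 5 + [0.40] * 5
mean_rate, trial_rates = mean_success_rate(example_probs)
print(f"mean success rate over {len(trial_rates)} trials: {mean_rate:.3f}")
```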

The heterogeneous target set contains

n = 15

targets, including five easy, five medium, and five hard targets. Their speed ratios are sampled from three Gaussian clusters:

λ \sim N (1.6, 0.02^{2})

(easy),

λ \sim N (1.0, 0.02^{2})

(medium), and

λ \sim N (0.7, 0.02^{2})

(hard), and then shuffled to avoid ordering bias. For reproducibility, the random seed is fixed to 42 when generating this target set.
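
A target set of this form could be generated as sketched below. The generator, the function name, and the label strings are assumptions: seeding Python's `random` module with 42 makes this sketch repeatable but is not expected to reproduce the exact values used in the experiments.

```python
import random
from typing import List, Tuple

def make_target_set(seed: int = 42, per_group: int = 5,
                    sigma: float = 0.02) -> List[Tuple[str, float]]:
    """Sample speed ratios from three Gaussian clusters, then shuffle."""
    rng = random.Random(seed)
    means = {"easy": 1.6, "medium": 1.0, "hard": 0.7}
    targets = [(label, rng.gauss(mu, sigma))
               for label, mu in means.items()
               for _ in range(per_group)]
    rng.shuffle(targets)  # avoid ordering bias
    return targets

target_set = make_target_set()
for label, ratio in target_set:
    print(f"{label:>6s}  speed ratio = {ratio:.3f}")
```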

In the quantitative comparison, we evaluate all methods on an identical discrete budget grid

N \in \{30, 35, 40, 45, 50, 55, 60, 65, 70, 75\}

, with a denser coverage around the transition region. The reported error bars are 95% bootstrap confidence intervals computed from the

M_{cmp} = 20

trial outcomes using 1000 bootstrap resamples, and are reported for all methods at all budget points. We interpret a difference of 0.05 in mission success rate as practically meaningful in this setting, since it corresponds to about

0.05 \times 15 = 0.75

additional captured targets in expectation in the

n = 15

-target scenario.
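
For reference, a percentile bootstrap over 20 trial outcomes with 1000 resamples can be sketched as follows. The percentile method, the nearest-rank indexing, and the placeholder trial values are assumptions; the text above does not specify which bootstrap variant was used.

```python
import random
from typing import List, Tuple

def bootstrap_ci(samples: List[float], n_resamples: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(samples, k=n)) / n for _ in range(n_resamples))
    lo_idx = round((1 - level) / 2 * (n_resamples - 1))
    hi_idx = round((1 + level) / 2 * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]

# Placeholder outcomes of 20 trials (fractions of 15 targets captured).
trial_outcomes = [k / 15 for k in (9, 10, 11, 10, 12, 9, 10, 11, 10, 10,
                                   11, 12, 9, 10, 11, 10, 10, 11, 9, 10)]
low, high = bootstrap_ci(trial_outcomes)
print(f"95% bootstrap CI for the mean success rate: [{low:.3f}, {high:.3f}]")
```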

5.3.2. Runtime and Scalability of Online Reallocation

We report the wall-clock latency of a single reallocation (Experience Library query + one DP solve) under different numbers of targets n and total UAV budgets

N_{tot}

. Each setting is repeated 200 times and we report the median and the 95th percentile runtime. The results in Table 2 show that the reallocation latency remains at the millisecond level and scales approximately linearly with n and

N_{tot}

, supporting fast online re-optimization. We emphasize that this runtime analysis covers only the allocation module; end-to-end real-time deployment also depends on sensing, communication, controller execution, replanning frequency, and multi-agent coordination latency, which are not benchmarked in this manuscript.
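
A latency measurement of this kind can be sketched as below. The harness is an assumption about how such numbers are typically collected, and the workload is a stand-in for one Experience Library query plus one DP solve; the helper name `benchmark` is hypothetical.

```python
import statistics
import time
from typing import Callable, Tuple

def benchmark(fn: Callable[[], object], repeats: int = 200) -> Tuple[float, float]:
    """Return (median, 95th-percentile) wall-clock time of fn() in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    median = statistics.median(samples)
    # Nearest-rank percentile: rank = ceil(0.95 * repeats), in integer arithmetic.
    p95 = samples[(95 * repeats + 99) // 100 - 1]
    return median, p95

def dummy_reallocation() -> int:
    """Placeholder workload standing in for one reallocation call."""
    return sum(i * i for i in range(10_000))

median_s, p95_s = benchmark(dummy_reallocation)
print(f"median = {median_s * 1e3:.3f} ms, p95 = {p95_s * 1e3:.3f} ms")
```

Reporting the 95th percentile alongside the median guards against occasional slow calls that a mean would hide.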

5.3.3. Trajectory-Level Interpretation

To explain the performance gap at the trajectory level, we further compare rollout trajectories in the same heterogeneous scenario with

N = 50

UAVs, as shown in Figure 7. Gray lines denote UAV trajectories, and blue lines denote the trajectories of captured targets. The Experience Library strategy produces a more spatially distributed pursuit pattern and covers more target regions within the same resource budget. The greedy marginal-gain baseline achieves a comparable global trend but still tends to prioritize locally profitable assignments, leading to less balanced spatial coverage. In contrast, the Uniform Allocation baseline allocates UAVs uniformly across targets regardless of target difficulty, which results in weaker support for hard targets and more localized engagement. These trajectory patterns are consistent with the quantitative comparison: under heterogeneous target difficulty, mission resilience depends not only on the fleet size but also on whether the allocation policy matches resources to target-specific capture demand.

Figure 7. Trajectory comparison in the same heterogeneous scenario with 50 UAVs. Gray lines represent UAV trajectories, and blue lines represent the trajectories of captured targets. From left to right: Experience Library, Greedy Marginal Gain, and Uniform Allocation.

5.3.4. Quantitative Comparison

Figure 8 reports the mission success rate under heterogeneous targets for different total UAV budgets on the denser grid

N = 30 : 5 : 75

. The proposed Experience Library strategy remains the best-performing method across the tested budget range and shows a smoother improvement trend from low to high budgets. The greedy marginal-gain baseline provides a stronger comparison than the Uniform Allocation rule, but it remains below the proposed method because it optimizes only the current local gain and does not perform global allocation over the full target set. The Uniform Allocation baseline exhibits a clear threshold-like transition: its success rate stays near zero in the low-budget regime and increases sharply only after the budget becomes sufficient, indicating that equal-share allocation is prone to severe under-allocation for difficult targets. Overall, Figure 8 supports the conclusion that capability-aware global allocation yields more stable performance in the critical transition region. Error bars indicate 95% bootstrap confidence intervals over

M = 20

Monte Carlo trials.

Figure 8. Mission success rate comparison under heterogeneous targets on the denser budget grid

N = 30 : 5 : 75

. Error bars indicate 95% bootstrap confidence intervals over

M = 20

Monte Carlo trials.

5.4. Discussion

The above experimental study provides complementary evidence for the proposed Experience Library from three perspectives: (i) scenario-space characterization under a fixed UAV budget, (ii) adaptivity under dynamic disturbances and constraints, and (iii) comparative performance against rule-based and greedy baselines under heterogeneous targets.

Scenario-space characterization. The Experience Library map (Figure 4) offers a compact view of how mission outcomes vary with target-side complexity when the UAV budget is fixed (

N_{UAV} = 50

). Two systematic trends are observed. First, decreasing the mean speed ratio generally reduces mission success, especially when the number of targets is also large, indicating that target-side mobility dominance is a primary limiting factor. Second, the bubble-size distribution suggests that even in successful cases, encirclement time grows in high-difficulty regions, which implies that resilience is not solely about feasibility (success/failure) but also about efficiency (how quickly coordination converges). This map therefore functions as a structured operating envelope, summarizing where the system is both effective and time-efficient.

Adaptivity. The adaptive reallocation experiment (Figure 5 and Figure 6) demonstrates how the proposed method responds to abrupt environment changes without re-training. When target difficulty increases (Phase B) or the total UAV budget is reduced (Phase C), the allocation matrix is re-optimized in a capability-aware manner. Importantly, the heatmap indicates that the method does not attempt to preserve uniform per-target effort; instead, it reallocates resources to maximize the global mission score. This leads to a triage-like behavior: feasible high-utility targets receive allocation surges, while infeasible or low-return targets may be temporarily dropped. The target-level visualization for

T_{0}

further clarifies that the same target can shift from normal engagement to intensified pursuit and eventually to abandonment when its capture becomes globally inefficient under tighter budgets. This behavior aligns with the objective of avoiding redundant consumption and improving resilience continuity under changing conditions.

Comparative implications under heterogeneity. The comparison experiment (Figure 8) highlights that allocation quality is a key determinant of performance in heterogeneous missions. The Uniform Allocation baseline exhibits a threshold-like transition: under low budgets it fails to provide sufficient force for difficult targets, and the overall mission success remains near zero until the budget enters a sufficient-resource regime. In contrast, the Experience Library strategy improves more smoothly with increasing budget, suggesting that the learned capability curves and the dynamic programming-based assignment reduce severe under-allocation.

Takeaway. Overall, these results indicate that the Experience Library plays two roles: it serves as a resilience model (mapping tested target-side difficulty factors to expected success/time), and it enables fast adaptive re-optimization of UAV resources in dynamic settings. Together, these properties support the paper’s goal of task lightweighting, where the swarm size and force ratio can be adjusted to match mission difficulty, mitigating unnecessary deployment while maintaining robust performance.

6. Conclusions

Multi-target UAV encirclement is inherently adversarial and resource-constrained. In practical deployment, determining how many UAVs should be assigned to heterogeneous targets is a key issue, yet this problem remains difficult due to coupled target dynamics, uncertain capture outcomes, and non-uniform difficulty across scenarios. In this work, we addressed this challenge from a resilience-oriented perspective by introducing an Experience Library-based allocation framework. The code and data supporting the findings of this study are publicly available at https://gitee.com/lqt20/multi-uav (accessed on 11 May 2026).

Our experiments reveal clear and actionable patterns. First, the adaptive non-uniform sampling strategy efficiently characterizes high-sensitivity regions of the success landscape while reducing unnecessary simulation cost. Second, under heterogeneous targets, the Experience Library policy provides a more stable low-to-mid budget performance profile, while the Uniform Allocation baseline exhibits threshold-like behavior with abrupt transitions. Third, trajectory-level evidence shows that capability-aware allocation leads to broader and more balanced capture coverage, whereas uniform assignment tends to produce localized engagement and lower robustness in constrained regimes. To operationalize these observations, we formulate resource assignment as an Experience-Library-constrained MCKP optimization, maximizing expected capture effectiveness under a fixed UAV budget.

These findings indicate that resilient multi-target encirclement is governed not only by fleet size, but also by how well resources are matched to target capability. A few limitations remain. The current formulation relies on target-wise marginal success estimates and an additive expected-capture objective, which improves computational tractability but does not explicitly model cross-target dependence, coupled uncertainty, or tail-risk events; such effects may become more significant in dense obstacle fields and other strongly coupled scenarios. The Experience Library also depends on the coverage of sampled scenarios and may lose accuracy under out-of-distribution conditions. In addition, real-world online latency is influenced not only by the DP solve but also by perception, communication, and control constraints. Future work will explore dynamic library updates, uncertainty-aware probability estimation, generative-data–physical-modeling integration, and real-time reallocation under communication degradation, target behavior shifts, and agent failures.

Author Contributions

Conceptualization, M.L.; methodology, Q.L.; software, Q.L.; validation, Q.L. and M.L.; formal analysis, Q.L.; investigation, Q.L.; resources, X.Z.; data curation, Q.L.; writing—original draft preparation, Q.L.; writing—review and editing, M.L. and X.Z.; visualization, Q.L.; supervision, X.Z.; project administration, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by Xianqiang Zhu.

Data Availability Statement

The data presented in this study are openly available in a public repository at https://gitee.com/lqt20/multi-uav (accessed on 11 May 2026).

DURC Statement

Current research is limited to the field of unmanned aerial vehicle mission planning, which has innovative academic value and engineering application significance in intelligent cooperative decision-making, and does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving multi-UAV cooperative control and mission assignment and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws about DURC. The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figure 1. Multi-UAV target encirclement scenario.

Figure 2. Mission effectiveness versus UAV/target speed ratio.

Figure 3. The flow chart of the Experience Library-based multi-target UAV resource allocation algorithm.

Figure 5. Heatmap of the UAV allocation matrix evolution. The cell values indicate the number of allocated UAVs, and values in parentheses denote the current UAV/target speed ratio. Darker colors represent higher resource concentration.

Table 1. Reproducibility summary: simulator settings, scenario generation, and experimental protocol.

Item	Setting/Rule
Time step	$Δ t = 0.5$ s
Horizon	$N_{max} = 1000$ steps ($T_{max} = 500$ s)
Workspace	$[0, 5000] \times [0, 5000]$ m, reflective boundary handling
UAV max speed	$v_{U} = 30$ m/s
Target max speed	$v_{max}^{T} = v_{U} / λ$ (speed ratio $λ = v_{U} / v_{max}^{T}$)
Acceleration limit	$a_{max} = 50$ m/s$^{2}$
Encirclement radius	$R_{enc} = 200$ m
Interaction radius	$R_{comm} = 500$ m
Safety distance	$D_{safe} = 10$ m
Obstacles	$N_{obs} = 15$ circles, radius $r_{o} = 120$ m
Obstacle sampling	Uniform in workspace; reject near initial target/base (margin $r_{o} + 100$ m)
Initial target position	Uniform in $[0.1 L, 0.9 L]^{2}$
UAV base position	Uniform in $[0.1 L, 0.9 L]^{2}$, with $\| p_{base} - p_{0}^{T} \| > 1000$ m
UAV initial jitter	$p_{i}(0) = p_{base} + \mathcal{N}(0, 50^{2} I)$, then clamp to workspace
APF gains	$k_{att} = 2.0$, $k_{sep} = 500$, $k_{form} = 50$, $k_{obs} = 1000$
APF epsilons	$ϵ_{sep} = 0.1$, $ϵ_{form} = 1$, $ϵ_{obs} = 0.1$
Obstacle detection margin	$R_{det} = 50$ m
Target evasion	$k_{T} = 1000$, $k_{T}^{obs} = 5$, $ϵ_{T} = 0.01$; random walk if no UAV is detected
Capture success	Capture criteria in Section 3.1; timeout counts as failure
Monte Carlo trials	Experience Library construction: $M_{EL} = 50$ per sampled $(λ, N)$; comparison curves: $M_{cmp} = 20$ per budget point
Comparison grid	$N \in \{30, 35, 40, 45, 50, 55, 60, 65, 70, 75\}$
Random seeds	Heterogeneous target-set seed = 42; fixed trajectory scenario seed = 2026; Monte Carlo trials use independent RNG streams
Adaptive sampling	Gradient threshold $τ = 0.015$; anchor budgets $N \in \{30, 45, 60, 75\}$
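
The scenario-generation rules in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the names `SimConfig` and `sample_scenario` are hypothetical, rejection sampling is only one assumed way of enforcing the base-to-target distance constraint, $L$ is taken to be the 5000 m workspace side, and obstacles, dynamics, and the APF controller are omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Subset of the Table 1 settings (field names are illustrative)."""
    dt: float = 0.5                 # time step, s
    n_max: int = 1000               # horizon, steps
    side: float = 5000.0            # workspace side length, m (assumed to be L)
    v_uav: float = 30.0             # UAV max speed, m/s
    min_base_dist: float = 1000.0   # required base-to-target distance, m
    jitter_std: float = 50.0        # std of the UAV initial position jitter, m

    def target_max_speed(self, speed_ratio: float) -> float:
        # Table 1: v_max^T = v_U / lambda
        return self.v_uav / speed_ratio


def _uniform_inner(cfg: SimConfig, rng: random.Random):
    """Draw a point uniformly from [0.1 L, 0.9 L]^2."""
    lo, hi = 0.1 * cfg.side, 0.9 * cfg.side
    return (rng.uniform(lo, hi), rng.uniform(lo, hi))


def sample_scenario(cfg: SimConfig, n_uavs: int, rng: random.Random):
    """Sample target start, UAV base, and jittered UAV start positions."""
    target = _uniform_inner(cfg, rng)
    # Assumption: redraw the base until it is far enough from the target.
    while True:
        base = _uniform_inner(cfg, rng)
        if math.hypot(base[0] - target[0], base[1] - target[1]) > cfg.min_base_dist:
            break

    def clamp(x: float) -> float:
        return min(max(x, 0.0), cfg.side)

    # p_i(0) = p_base + N(0, 50^2 I), then clamp to the workspace.
    uavs = [
        (clamp(base[0] + rng.gauss(0.0, cfg.jitter_std)),
         clamp(base[1] + rng.gauss(0.0, cfg.jitter_std)))
        for _ in range(n_uavs)
    ]
    return target, base, uavs
```

For example, `sample_scenario(SimConfig(), 8, random.Random(42))` returns one target position, one base position, and eight UAV start positions inside the workspace. The seed here is only for repeatability of the sketch and is not claimed to reproduce the paper's scenarios.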

Table 2. Online reallocation latency (Experience Library query + DP solve). Each entry reports median/p95 (ms) over 200 runs.

$n$ (targets)	$N_{tot}$ (UAVs)	Runtime, median/p95 (ms)
10	50	0.76/1.34
15	50	1.22/2.35
20	50	1.61/2.55
30	50	2.49/4.23
15	100	2.34/3.98
20	100	3.28/5.58
30	100	5.08/7.52
30	150	7.56/12.38
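
To make the timed pipeline concrete, the following is a minimal sketch of an allocation step of the kind measured in Table 2: a success-probability lookup followed by a dynamic program over the UAV budget. It is not the authors' implementation. The function names, the made-up saturating curve in `toy_library` that stands in for an Experience Library query, and the objective (maximize the sum of predicted per-target success probabilities) are all assumptions chosen for illustration, and pure-Python timings should not be expected to match the values reported above.

```python
import math
import statistics
import time


def toy_library(speed_ratio, n_uavs):
    """Hypothetical stand-in for an Experience Library query.

    Returns a made-up success probability that grows with team size
    and with the UAV/target speed ratio.
    """
    if n_uavs == 0:
        return 0.0
    return 1.0 - math.exp(-0.15 * speed_ratio * n_uavs)


def allocate(speed_ratios, budget, query=toy_library):
    """Grouped-knapsack DP: pick one team size per target so that the sizes
    sum to at most `budget` and the total predicted success is maximal."""
    best = [0.0] * (budget + 1)   # best[b]: optimum over targets seen so far with <= b UAVs
    choices = []                  # choices[i][b]: team size chosen for target i at budget b
    for lam in speed_ratios:
        gain = [query(lam, k) for k in range(budget + 1)]
        new_best = [0.0] * (budget + 1)
        pick = [0] * (budget + 1)
        for b in range(budget + 1):
            for k in range(b + 1):
                value = best[b - k] + gain[k]
                if value > new_best[b]:
                    new_best[b], pick[b] = value, k
        best = new_best
        choices.append(pick)
    # Backtrack from the full budget to recover the per-target team sizes.
    alloc = [0] * len(speed_ratios)
    b = budget
    for i in range(len(speed_ratios) - 1, -1, -1):
        alloc[i] = choices[i][b]
        b -= alloc[i]
    return alloc, best[budget]


def time_allocation(speed_ratios, budget, runs=200):
    """Return (median, p95) wall-clock time in ms over `runs` repetitions."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        allocate(speed_ratios, budget)
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    rank = (95 * len(samples) + 99) // 100   # nearest-rank 95th percentile
    return statistics.median(samples), samples[rank - 1]
```

Calling `allocate([1.2, 2.0, 3.0], 12)` returns a list of three team sizes and the corresponding total predicted success under the toy curve; `time_allocation` mirrors the median/p95 reporting convention of Table 2. The DP runs in $O(n N_{tot}^{2})$ time, which is consistent in trend with the growth of runtime in both $n$ and $N_{tot}$ seen in the table.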



