Article

Research on Integrated Decision-Control Cooperative Target Assignment for Cross-Domain Unmanned Systems Based on a Bi-Level Optimization Framework

1
Air Traffic Control and Navigation School, Air Force Engineering University, Xi’an 710038, China
2
Shaanxi Key Laboratory of Meta-Synthesis for Electronic and Information System, Air Force Engineering University, Xi’an 710038, China
3
National Key Laboratory of Digital Intelligence Empowerment for Aeronautical Equipment, Air Force Engineering University, Xi’an 710038, China
*
Author to whom correspondence should be addressed.
Drones 2026, 10(3), 193; https://doi.org/10.3390/drones10030193
Submission received: 10 January 2026 / Revised: 1 March 2026 / Accepted: 4 March 2026 / Published: 10 March 2026

Highlights

What are the main findings?
  • A bi-level optimization framework is proposed that tightly integrates task assignment with optimal control, ensuring kinematic feasibility and smooth trajectory generation for heterogeneous unmanned platforms.
  • The proposed method significantly reduces the maximum mission completion time compared to traditional Euclidean-distance-based assignment approaches, as validated through simulations.
What are the implications of the main findings?
  • This study provides an integrated “decision-control” paradigm that bridges the gap between high-level planning and low-level execution, enhancing the practicality and performance of cross-domain unmanned swarm operations.
  • The framework supports future research in dynamic task reassignment, robust multi-objective optimization, and distributed solving architectures for large-scale unmanned systems.

Abstract

To address prevalent challenges in current cooperative task assignment methods for cross-domain unmanned swarms, such as the disconnection between decision-making and execution and the inadequate incorporation of platform kinematic constraints, this study introduces an integrated decision-control cooperative task assignment approach based on a bi-level optimization framework. The proposed framework formulates a bi-level programming model that tightly couples upper-level task assignment with lower-level optimal control. The upper-level model aims to minimize the maximum task completion time by optimizing the assignment and visitation sequences of diverse target types across heterogeneous unmanned platforms. The lower-level model, given the task sequences from the upper level, addresses a minimum-time optimal control problem based on a comprehensive nonlinear kinematic model. This approach enables precise computation of task execution times, which are subsequently fed back to the decision-making layer, thereby establishing a closed-loop optimization mechanism. To solve this complex model efficiently, the lower level employs a differential flatness transformation to eliminate trigonometric functions in the kinematic equations and discretizes the continuous-time optimal control problem into a nonlinear programming problem via the Radau pseudospectral method. For the upper-level combinatorial optimization, an improved genetic algorithm is developed, integrating hybrid encoding, dual-archive elitism preservation, adaptive crossover and mutation strategies, and periodic local search.
Simulation results demonstrate that, compared with traditional Euclidean-distance-based assignment methods, the proposed approach generates kinematically feasible and smooth trajectories while thoroughly accounting for the kinematic constraints of heterogeneous platforms, thereby demonstrating its effectiveness and superiority in improving the comprehensive mission performance of cross-domain unmanned swarms.

1. Introduction

1.1. Motivation

In recent years, unmanned systems have been extensively utilized in complex operations such as military reconnaissance, coordinated strikes, and disaster response, due to their significant advantages including cost-effectiveness, high mobility, and the capacity to function without endangering human lives. In expansive or heterogeneous environments, the capabilities of individual platforms are often constrained. Therefore, multiple unmanned systems are required to be organized into swarms to cooperatively execute missions, which has stimulated considerable research interest in cross-domain collaboration and swarm intelligence [1,2,3,4,5,6]. Within this context, the cooperative task assignment for multiple unmanned systems represents a significant challenge in swarm intelligent decision-making. This process involves allocating suitable task sequences to heterogeneous platforms within the swarm to maximize overall system effectiveness. The quality of the solutions to this problem critically influences the efficiency of mission execution and the success rate of the swarm.
Conventional task assignment methodologies commonly represent unmanned systems as point masses capable of instantaneous directional changes and uniform velocity transitions between task locations, and optimize based on this idealized abstraction [7,8]. This simplification presents a fundamental limitation when applied to platforms exhibiting complex kinematic constraints, such as fixed-wing unmanned aerial vehicles that are restricted by minimum turning radius and continuous acceleration requirements. Consequently, the generated “optimal” geometric trajectories are often dynamically infeasible. Enforcing adherence to such trajectories can induce platform instability or loss of control. Moreover, the application of local trajectory smoothing subsequent to task assignment complicates the assurance of global optimality and may undermine the integrity of the initial assignment strategy. This disconnect between high-level planning and low-level control constitutes a significant impediment to the practical deployment of task assignment frameworks on real-world platforms.
Therefore, how to organically incorporate the kinematic constraints of platforms into the task assignment decision-making process, establishing an integrated cooperative planning paradigm in which “an assigned plan is inherently executable”, constitutes a critical issue that requires immediate attention to improve the overall effectiveness of unmanned swarms operating in dynamic and complex environments. Investigations into integrated “decision-control” task assignment methodologies that explicitly consider kinematic constraints possess substantial theoretical significance and practical applicability.

1.2. Literature Review

Research on cooperative task assignment has predominantly advanced through two primary approaches: (1) task assignment based on combinatorial optimization techniques, and (2) sequential or integrated optimization frameworks that combine task assignment with trajectory planning.
The predominant methodology frequently simplifies the problem by representing it through classical mathematical frameworks, including the Multiple Traveling Salesman Problem (MTSP) [7], the Bin Packing Problem [9], and the Vehicle Routing Problem (VRP) [10]. Solutions to these models are typically obtained via mixed-integer programming techniques or heuristic algorithms [8,11]. Approaches within this category generally conceptualize unmanned systems as point masses capable of instantaneous directional changes and constant-speed travel between designated task locations, with optimization objectives focused on minimizing either the total path length or the overall mission duration. Owing to their relative model simplicity and the availability of well-established solution algorithms, these methods have been extensively employed in preliminary research stages. For example, literature [12] conducted a joint optimization of waypoint assignment, communication scheduling, and UAV trajectory planning within a patrol inspection context, aiming to balance task completion time and reduce energy consumption. In [13], the many-to-many target assignment problem for UAV swarms operating in three-dimensional environments was addressed by incorporating damage and time costs as evaluation criteria, utilizing a bio-inspired swarm intelligence optimization algorithm. Similarly, literature [14] introduced a two-stage greedy auction algorithm tailored for large-scale cooperative strike missions involving UAV swarms, accounting for parameters such as distance, angle, interception rate, and recognition rate. The study in [15] developed a multi-objective optimization framework for heterogeneous UAV cooperative multi-task assignment, targeting minimization of total flight distance and task completion time. 
Additionally, literature [16] formulated the multi-UAV task assignment problem by considering flight distance, task revenue, and temporal constraints, and proposed a hybrid particle swarm optimization algorithm to derive solutions. Beyond military applications, optimization models grounded in task decomposition and load balancing have also been applied in civilian sectors, including emergency remote sensing [17] and cooperative observation in space-aeronautics domains.
However, a critical limitation of these approaches is their disregard for the intrinsic kinematic constraints of platforms such as fixed-wing UAVs, which often results in the generation of “optimal” paths that are dynamically infeasible.
To integrate kinematic constraints, most studies employ a sequential “assign-then-plan” framework. This approach first determines the task sequence based on a simplified model and then applies local trajectory smoothing or optimization tailored to each platform. For example, literature [18] combined an improved particle swarm optimization algorithm with a genetic algorithm to address the task assignment and trajectory planning problem for multiple UAVs operating in a marine environment. Literature [19] proposed a Graph Attention Task Allocator that synthesizes information from neighboring agents within a multi-robot system to achieve globally optimal target localization for heterogeneous robots. Literature [20] tackled heterogeneous multi-UAV cooperative mission planning in complex three-dimensional mountainous terrain by integrating a Life-cycle Swarm Optimization algorithm with a Rapidly-exploring Random Tree algorithm. Literature [21] reformulated the multi-UAV cooperative search problem into single-UAV coverage path planning within subdivided regions, employing “Z”-pattern trajectories and Dubins curves for path generation.
Despite advancements in trajectory generation, the fundamental open-loop decoupling approach remains largely unaltered. A notable inconsistency persists between the cost metrics employed in high-level assignment decisions, such as Euclidean distance, and the actual execution times associated with low-level trajectories. Furthermore, feasibility information derived at the trajectory level is not effectively communicated back to the decision-making layer, thereby impeding assurance of global optimality in the final solution.
Recent studies have begun to investigate the joint optimization of task assignment and path planning. For instance, literature [22] developed a cost function incorporating actual path cost based on Dubins curves, facilitating the simultaneous optimization of multi-UAV multi-target task assignment and trajectory planning. Literature [23] discretized UAV heading angles within a three-dimensional Dubins model, reformulating the path planning and task assignment problems into a unified discrete graph framework. These works reflect a research trend shifting from “decoupled” to “coupled” methodologies. Nonetheless, these approaches frequently depend on parameterized path templates or simplified motion models, thereby falling short of achieving a globally time-optimal joint optimization grounded in comprehensive nonlinear dynamics.
The aforementioned review reveals two fundamental and interconnected limitations inherent in existing methodologies. First, the decoupling of decision-making and execution persists as a core issue. Whether in classical combinatorial optimization or sequential “assign-then-plan” frameworks, the trajectory planning layer passively adapts to predetermined task sequences, precluding any guarantee of global optimality and potentially undermining the original assignment strategy when kinematic constraints are subsequently enforced. Second, the neglect of platform-specific kinematic constraints remains prevalent. The predominant point-mass abstraction, while computationally convenient, ignores critical dynamics such as minimum turning radius and continuous acceleration limits, rendering geometrically “optimal” assignments dynamically infeasible in practice. These limitations stem from the absence of a unified optimization framework that embeds kinematic feasibility directly into the high-level decision process. Consequently, establishing an integrated “decision-control” paradigm—wherein task assignments are inherently executable and kinematic considerations are prospectively evaluated at the planning stage—constitutes an essential advancement for enhancing the operational efficacy of cross-domain unmanned swarms. This study addresses this gap by proposing a bi-level optimization framework that tightly couples upper-level task assignment with lower-level optimal control, establishing a closed-loop mechanism where precise trajectory-level performance feeds back to guide assignment decisions.

1.3. Proposed Approach

To address the aforementioned research gaps, this paper introduces a novel integrated “decision-control” framework for task assignment. The central concept involves the development of a tightly coupled bi-level optimization model. The upper level addresses the primary task assignment problem, whose decision variables are the task sequences for the unmanned systems. The objective at this level is to minimize the maximum completion time required across all agents to complete their assigned sequences. Embedded within the upper level, the lower level solves a minimum-time optimal control problem for each specified task sequence, utilizing a nonlinear kinematic model. This lower-level problem aims to accurately compute the true minimum time required to execute the given sequence. This hierarchical structure represents a fundamental departure from conventional sequential methodologies. Rather than a simple concatenation of two independent stages, it forms a closed-loop feedback system wherein upper-level decisions inform lower-level computations, and the outcomes of these computations provide the exclusive reliable foundation for upper-level decision-making. The principal contributions of this paper are as follows:
(1)
A tightly integrated bi-level optimization framework combining task assignment with optimal control is developed.
The upper level is the task assignment layer, which aims to minimize the maximum mission completion time by determining the assignment and visitation sequence of multiple targets across heterogeneous platforms. The lower level is the trajectory optimization layer. For each task sequence generated by the upper level, it formulates and solves a minimum-time optimal control problem based on a nonlinear kinematic model. The two levels operate within a closed-loop system through the precise feedback of target execution durations, thereby embedding kinematic feasibility directly into the assignment decisions. This integrated approach guarantees, at the modeling stage, both the global optimality and practical implementability of the resulting operational plan.
(2)
Effective solution methodologies for the proposed bi-level model are developed.
At the lower level, addressing a strongly nonlinear and high-dimensional optimal control problem, the differential flatness property is employed to remove trigonometric terms from the kinematic equations. Subsequently, the Radau pseudospectral method is applied to discretize the problem into a nonlinear programming formulation, which is then efficiently solved using a well-established solver. At the upper level, which involves a complex combinatorial optimization problem, an enhanced genetic algorithm is devised. This algorithm integrates features such as hybrid encoding, dual-archive elitism preservation for feasible and infeasible solutions, adaptive crossover and mutation strategies, and periodic local search procedures. These components collectively facilitate a balanced trade-off between global exploration and local exploitation, thereby improving both the efficiency and robustness of the solution process.
(3)
The effectiveness and superiority of the proposed approach are validated through systematic simulation experiments.
First, a comparison with conventional assignment methods based on Euclidean distance verifies that the integrated model markedly reduces the actual mission completion time when kinematic constraints are incorporated. Subsequently, simulations are performed within a heterogeneous multi-target engagement context involving unmanned aerial vehicles (UAVs), unmanned surface vehicles (USVs), and unmanned underwater vehicles (UUVs). The results indicate that the proposed method can autonomously generate smooth trajectories that adhere to the kinematic properties of each platform type while achieving an equitable distribution of task loads. These results demonstrate the method’s practicality, scalability, and superiority in enhancing the overall operational effectiveness of the swarm.

2. Integrated Decision-Control Target Assignment Model for Cross-Domain Unmanned Swarms

This section introduces a mathematical formulation for the integrated decision-control target assignment problem. It provides a concise overview of the objective function and constraints, thereby establishing the framework of the integrated decision-control target assignment (IDCTA) model.

2.1. Problem Formulation

2.1.1. Targets

Assume there are N enemy targets situated within the operational region. These targets are classified into K distinct classes according to their spatial locations and characteristics, such as surface targets and underwater targets. The set of target indices is denoted as follows:
$$\mathcal{T} = \{1, 2, \ldots, N\}$$
The target set is given by
$$\mathbb{T} = \{T_n \mid n \in \mathcal{T}\}$$
where $T_n$ represents the information set of the $n$-th target, specifically defined as
$$T_n = \{P_n, \chi_n\}$$
Here, $P_n$ denotes the position of target $n$, and $\chi_n \in \{1, 2, \ldots, K\}$ is the classification label indicating the specific type of the target.

2.1.2. Unmanned Systems

In the context of operational combat scenarios, unmanned systems are deployed to engage designated targets. Various categories of unmanned systems exhibit distinct operational capabilities and are suitable for engaging different types of targets. This paper focuses on three categories of unmanned systems: unmanned aerial vehicles (UAVs), unmanned surface vehicles (USVs), and unmanned underwater vehicles (UUVs). Let the total quantity of unmanned systems be denoted by $M$, comprising $M_a$ UAVs, $M_s$ USVs, and $M_u$ UUVs. The corresponding index sets for these systems are defined as follows:
$$\mathcal{U}_a = \{1, 2, \ldots, M_a\},$$
$$\mathcal{U}_s = \{M_a + 1, M_a + 2, \ldots, M_a + M_s\},$$
$$\mathcal{U}_u = \{M_a + M_s + 1, M_a + M_s + 2, \ldots, M_a + M_s + M_u\},$$
$$\mathcal{U} = \mathcal{U}_a \cup \mathcal{U}_s \cup \mathcal{U}_u$$
The set of all unmanned systems is denoted by:
$$\mathbb{U} = \{U_m \mid m \in \mathcal{U}\}$$
Each unmanned system possesses unique engagement capabilities, enabling it to undertake different types of engagement targets. The set of target types that a given unmanned system is capable of engaging is characterized by its capability set:
$$C_m = \{c_{m,1}, c_{m,2}, \ldots, c_{m,K}\}, \quad m \in \mathcal{U},$$
where $c_{m,k}$ is a non-negative integer representing the maximum number of type-$k$ targets that unmanned system $m$ can engage.
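The target and platform definitions above map naturally onto small record types. The following is a minimal sketch under stated assumptions: the class names `Target` and `UnmannedSystem`, the `domain` tag, and the sample coordinates are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Target:
    index: int                      # n, an element of the target index set
    position: Tuple[float, float]   # P_n
    type_label: int                 # chi_n in {1, ..., K}


@dataclass
class UnmannedSystem:
    index: int                 # m, an element of the platform index set
    domain: str                # "UAV", "USV" or "UUV" (illustrative tag only)
    capacity: Dict[int, int]   # c_{m,k}: max number of type-k targets

    def can_engage(self, target: Target) -> bool:
        """A platform can engage a target only if c_{m, chi_n} is positive."""
        return self.capacity.get(target.type_label, 0) > 0


# Hypothetical scenario: type 1 = surface target, type 2 = underwater target.
targets = [Target(1, (10.0, 5.0), 1), Target(2, (-3.0, 8.0), 2)]
uav = UnmannedSystem(1, "UAV", {1: 2, 2: 0})
uuv = UnmannedSystem(2, "UUV", {1: 0, 2: 1})
```

Storing $c_{m,k}$ as a dictionary keyed by target type keeps the capability set sparse: a missing key is treated as zero capacity.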

2.2. Objective Function and Constraints

This section establishes a bi-level optimization model to address the integrated decision-control task assignment problem. The upper level corresponds to the task assignment layer, wherein the decision variables encompass the task assignment schemes and execution sequences of unmanned systems. The objective at this level is to minimize the maximum value of the minimum execution time required for all unmanned systems to complete their assigned task sequences. The lower level constitutes the trajectory optimization layer, which determines the minimum-time trajectories that satisfy kinematic constraints for any given task sequence provided by the upper layer. These two levels are intricately interconnected through the task execution times, resulting in a nested optimization problem.

2.2.1. Upper-Level Target Assignment Model

The central focus of the upper-level model is to identify the optimal task assignment scheme and execution sequence subject to multiple constraints, with the objective of minimizing the maximum value of the minimum execution time across all unmanned systems.
(1)
Decision Variables and Objective Function
First, define the following decision variable:
$$q_{m,n} = \begin{cases} 1, & \text{if the } m\text{-th unmanned system attacks target } n \\ 0, & \text{otherwise} \end{cases}$$
where $m \in \mathcal{U}$, $n \in \mathcal{T}$.
Based on the decision variable $q_{m,n}$, the execution sequence for the $m$-th unmanned system is defined as a mapping:
$$\pi_m : \{1, \ldots, N_m\} \to \{\bar{n} \mid q_{m,\bar{n}} = 1\}, \quad m \in \mathcal{U}$$
where $N_m$ represents the number of tasks performed by unmanned system $m$, and $\pi_m(i)$ denotes the index of the $i$-th target it engages.
The decision variables $q_{m,n}$ are collected into a decision vector $q$:
$$q = [q_{1,1}, \ldots, q_{1,N}, \ldots, q_{M,N}]^{\mathsf{T}}$$
In practical operations, the timeliness of mission execution constitutes a vital performance indicator. Therefore, an optimal task assignment scheme should complete the engagement of all designated targets within the minimal possible duration. Let $t_m$ be the minimum time required for unmanned system $m$ to engage its assigned targets following the prescribed sequence $\pi_m$. This duration $t_m$ is not predetermined but is derived by solving the lower-level minimum-time optimal control problem, and fundamentally depends on the assignment scheme $q_{m,n}$ as well as the sequence $\pi_m$:
$$t_m = \phi_m(q_{m,n}, \pi_m)$$
Therefore, the optimization objective of the upper-level task assignment model is formulated as minimizing the maximum value of the minimum execution times among all unmanned systems, specifically,
$$\min \max_{m \in \mathcal{U}} t_m$$
(2)
Constraints
The upper-level model is required to adhere to the following constraints:
(a)
Capability Matching Constraint
In the process of target assignment, considering the variability in engagement capabilities among unmanned systems and the differing target types, each unmanned system may only be assigned to target types for which it possesses the requisite engagement capability. This constraint can be formally represented as
$$q_{m,n} \le c_{m,\chi_n}, \quad \forall m \in \mathcal{U},\ n \in \mathcal{T}$$
where $c_{m,\chi_n}$ denotes the capability indicator specifying whether unmanned system $m$ is capable of engaging target $n$.
(b)
Engagement Capacity Constraint
The aggregate quantity of targets of a particular type assigned to any unmanned system shall not exceed the maximum engagement capacity of that system for the given target type, that is,
$$\sum_{n \in \mathcal{T}_k} q_{m,n} \le c_{m,k}, \quad \forall m \in \mathcal{U},\ k \in \{1, 2, \ldots, K\}$$
where $\mathcal{T}_k = \{n \in \mathcal{T} \mid \chi_n = k\}$ represents the subset comprising all targets of type $k$. This constraint guarantees that the task assignment scheme complies with the practical capacity limitations of the unmanned systems.
(c)
Target Completion Constraint
Each target is required to be assigned precisely one time, meaning that
$$\sum_{m \in \mathcal{U}} q_{m,n} = 1, \quad \forall n \in \mathcal{T}$$
(d)
Constraint on Maximum Operating Duration
The cumulative time allocated to engagement tasks for each unmanned system shall not surpass its designated maximum safe operating duration. Therefore, the following condition must be satisfied:
$$t_m \le t_{m,\max}, \quad \forall m \in \mathcal{U}$$
where $t_{m,\max}$ denotes the maximum allowable operating duration for unmanned system $m$. Exceeding this duration may lead to energy depletion, performance degradation, or mission failure.
Based on the established objective function (10) and constraints (15)–(17), the upper-level target assignment model is expressed as the following optimization problem:
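$$
\begin{aligned}
& \min \max_{m \in \mathcal{U}} t_m \\
\text{s.t.} \quad & q_{m,n} \le c_{m,\chi_n}, \quad \forall m \in \mathcal{U},\ n \in \mathcal{T} \\
& \sum_{n \in \mathcal{T}_k} q_{m,n} \le c_{m,k}, \quad \forall m \in \mathcal{U},\ k \in \{1, 2, \ldots, K\} \\
& \sum_{m \in \mathcal{U}} q_{m,n} = 1, \quad \forall n \in \mathcal{T} \\
& t_m \le t_{m,\max}, \quad \forall m \in \mathcal{U}
\end{aligned}
$$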
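The four upper-level constraints can be checked mechanically for any candidate assignment. The sketch below is illustrative only: the function names, the dictionary-based data layout, and the sample numbers are assumptions, and the execution times $t_m$ are passed in as given values rather than computed by the lower-level model.

```python
from collections import Counter


def check_assignment(sequences, target_type, capacity, times, t_max):
    """Return a list of violated upper-level constraints (empty if feasible).

    sequences:   {m: [n1, n2, ...]}  visiting order pi_m for each platform
    target_type: {n: k}              type label chi_n of each target
    capacity:    {m: {k: c_mk}}      capability set C_m
    times:       {m: t_m}            execution times (assumed given here)
    t_max:       {m: t_m_max}        maximum operating durations
    """
    violations = []
    # (a) capability matching and (b) engagement capacity
    for m, seq in sequences.items():
        per_type = Counter(target_type[n] for n in seq)
        for k, count in per_type.items():
            c_mk = capacity[m].get(k, 0)
            if c_mk == 0:
                violations.append(("capability", m, k))
            elif count > c_mk:
                violations.append(("capacity", m, k))
    # (c) every target is assigned exactly once
    visits = Counter(n for seq in sequences.values() for n in seq)
    for n in target_type:
        if visits[n] != 1:
            violations.append(("completion", n))
    # (d) maximum operating duration
    for m, t in times.items():
        if t > t_max[m]:
            violations.append(("duration", m))
    return violations


def makespan(times):
    """Upper-level objective: the largest execution time over all platforms."""
    return max(times.values())


# Hypothetical example with two platforms and three targets.
target_type = {1: 1, 2: 2, 3: 1}
capacity = {1: {1: 2}, 2: {2: 1}}
t_max = {1: 60.0, 2: 60.0}
good = check_assignment({1: [1, 3], 2: [2]}, target_type, capacity,
                        {1: 40.0, 2: 25.0}, t_max)
bad = check_assignment({1: [1, 2, 3], 2: []}, target_type, capacity,
                       {1: 55.0, 2: 0.0}, t_max)
```

Here `good` is empty, while `bad` reports that platform 1 has no capability against the type-2 target.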

2.2.2. Lower-Level Optimal Control Model

The upper-level target assignment framework necessitates the determination of the minimum time t m for the unmanned system m to complete engagement with its designated targets following the prescribed task sequence. This section establishes an optimal control model aimed at computing this minimum time.
(1)
System Model and Performance Metrics
First, the kinematic model of the unmanned system is described as follows:
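$$
\begin{aligned}
\mathrm{d}x_m &= v_m \cos\theta_m \,\mathrm{d}t \\
\mathrm{d}y_m &= v_m \sin\theta_m \,\mathrm{d}t \\
\mathrm{d}v_m &= a_m \,\mathrm{d}t \\
\mathrm{d}\theta_m &= \rho_m \,\mathrm{d}t
\end{aligned}
$$
where $m \in \mathcal{U}$, $p_m = [x_m, y_m]^{\mathsf{T}}$ denotes the position coordinates of unmanned system $m$, $v_m$ is its speed, $\theta_m$ is its heading angle, and $a_m$ and $\rho_m$ denote the acceleration and angular velocity, respectively.
Define the state vector of the unmanned system as $X_m = [x_m, y_m, v_m, \theta_m]^{\mathsf{T}}$ and the control vector as $u_m = [a_m, \rho_m]^{\mathsf{T}}$. The system dynamics can then be expressed as
$$\mathrm{d}X_m = f(X_m, u_m)\,\mathrm{d}t$$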
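To make the kinematic model concrete, the sketch below integrates it forward in time with a fixed-step explicit Euler scheme. This is only an illustration of the dynamics: the paper solves the optimal control problem with a pseudospectral method, not by forward simulation, and the function names and numerical values here are assumptions.

```python
import math


def f(state, control):
    """Right-hand side of the kinematic model: state = (x, y, v, theta),
    control = (a, rho)."""
    x, y, v, theta = state
    a, rho = control
    return (v * math.cos(theta), v * math.sin(theta), a, rho)


def simulate(state, control_fn, t_final, dt=1e-3):
    """Explicit Euler integration; control_fn(t, state) returns (a, rho)."""
    t = 0.0
    for _ in range(int(round(t_final / dt))):
        derivative = f(state, control_fn(t, state))
        state = tuple(s + dt * d for s, d in zip(state, derivative))
        t += dt
    return state


# Constant speed, zero turn rate: the platform moves in a straight line.
straight = simulate((0.0, 0.0, 10.0, 0.0), lambda t, s: (0.0, 0.0), 2.0)

# Constant speed v and turn rate rho: a circle of radius v / rho, closed
# after one period 2 * pi / rho (up to the integration error).
circle = simulate((0.0, 0.0, 10.0, 0.0), lambda t, s: (0.0, 0.5),
                  2.0 * math.pi / 0.5)
```

The circular case shows why a bound on the angular velocity, together with a minimum speed, implies a minimum turning radius that a point-mass model ignores.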
Given the task sequence for unmanned system m:
$$\pi_m = (\pi_m(1), \ldots, \pi_m(N_m))$$
the total mission completion time required for executing this sequence is denoted by $t_{m,f}$.
The performance index of the lower-level model is the total time required to complete the entire sequence of tasks, which is to be minimized:
$$\min \int_0^{t_{m,f}} \mathrm{d}t$$
(2)
Constraints
The unmanned system is required to adhere to the following constraints throughout its motion:
(a)
System Dynamics Constraint
The unmanned system m complies with the dynamic equation:
$$\mathrm{d}X_m = f(X_m, u_m)\,\mathrm{d}t$$
(b)
Initial and Terminal Conditions
The unmanned system commences operation from a designated initial position and subsequently returns to this same position upon completion of the target engagement sequence. Consequently, it is governed by the following initial and terminal conditions:
$$X_m(0) = X_{m,0}, \quad X_m(t_{m,f}) = X_{m,0}$$
(c)
Waypoint Constraints
In order to fulfill the engagement objectives, the unmanned system is required to sequentially arrive at each designated target location:
$$[x_m(t_{m,i}), y_m(t_{m,i})]^{\mathsf{T}} = P_{\pi_m(i)}, \quad i = 1, 2, \ldots, N_m$$
where $t_0 \le t_{m,i} \le t_{m,f}$ denotes the time at which unmanned system $m$ arrives at the $i$-th target.
(d)
Constraints on Motion Performance
Throughout operation, the motion parameters of the unmanned system, including velocity, acceleration, and angular velocity, are restricted by its inherent physical performance capabilities. For example, fixed-wing UAVs are required to sustain a minimum airspeed during flight to avoid aerodynamic stall. Consequently, the following constraints are established:
$$\omega_{m,\min} < \omega_m(t) < \omega_{m,\max},$$
$$v_{m,\min} < v_m(t) < v_{m,\max},$$
$$a_{m,\min} < \dot{v}_m(t) < a_{m,\max}$$
where $\omega_m(t) = \dot{\theta}_m(t)$ is the heading angular velocity (the control $\rho_m$ in the kinematic model), and $v_{m,\max}$, $v_{m,\min}$, $a_{m,\max}$, $a_{m,\min}$, $\omega_{m,\max}$ and $\omega_{m,\min}$ represent the upper and lower bounds on the velocity, acceleration, and angular velocity of unmanned system $m$, respectively.
Based on the established performance index (22) and constraints (24)–(29), the lower-level minimum-time optimal control model is formulated as follows:
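$$
\begin{aligned}
& \min \int_0^{t_{m,f}} \mathrm{d}t \\
\text{s.t.} \quad & \mathrm{d}X_m = f(X_m, u_m)\,\mathrm{d}t \\
& X_m(0) = X_{m,0}, \quad X_m(t_{m,f}) = X_{m,0} \\
& [x_m(t_{m,i}), y_m(t_{m,i})]^{\mathsf{T}} = P_{\pi_m(i)}, \quad i = 1, 2, \ldots, N_m \\
& \omega_{m,\min} < \omega_m(t) < \omega_{m,\max}, \\
& v_{m,\min} < v_m(t) < v_{m,\max}, \\
& a_{m,\min} < \dot{v}_m(t) < a_{m,\max}
\end{aligned}
$$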

2.2.3. Integrated Decision-Control Bi-Level Target Assignment Model

Drawing upon the upper-level target assignment model and lower-level optimal control model established in Section 2.2.1 and Section 2.2.2, the optimization problem can be formally expressed as follows:
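$$
\begin{aligned}
\text{(upper-level model)} \quad & \min_{q,\,\pi} \max_{m \in \mathcal{U}} t_m \\
\text{s.t.} \quad & q_{m,n} \le c_{m,\chi_n}, \quad \forall m \in \mathcal{U},\ n \in \mathcal{T} \\
& \sum_{n \in \mathcal{T}_k} q_{m,n} \le c_{m,k}, \quad \forall m \in \mathcal{U},\ k \in \{1, 2, \ldots, K\} \\
& \sum_{m \in \mathcal{U}} q_{m,n} = 1, \quad \forall n \in \mathcal{T} \\
& t_m \le t_{m,\max}, \quad \forall m \in \mathcal{U} \\
& t_m = \phi_m(q, \pi_m), \quad \forall m \in \mathcal{U} \\
\text{(lower-level model)} \quad & \phi_m(q, \pi_m) = \min_{X_m,\,u_m,\,t_{m,f}} \int_0^{t_{m,f}} \mathrm{d}t \\
\text{s.t.} \quad & \mathrm{d}X_m = f(X_m, u_m)\,\mathrm{d}t \\
& X_m(0) = X_{m,0}, \quad X_m(t_{m,f}) = X_{m,0} \\
& [x_m(t_{m,i}), y_m(t_{m,i})]^{\mathsf{T}} = P_{\pi_m(i)}, \quad i = 1, 2, \ldots, N_m \\
& \omega_{m,\min} < \omega_m(t) < \omega_{m,\max}, \\
& v_{m,\min} < v_m(t) < v_{m,\max}, \\
& a_{m,\min} < \dot{v}_m(t) < a_{m,\max}
\end{aligned}
$$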
This problem exemplifies a standard bi-level programming framework. At the upper level, the model addresses task assignment from a global standpoint, with the objective of balancing the workload across all unmanned systems and minimizing the maximum of the minimum execution times required for the unmanned systems to complete their assigned task sequences. At the lower level, for each individual unmanned system, the model determines the minimum-time trajectory that satisfies dynamic constraints, given the target sequence specified by the upper level. These two levels are intricately interconnected through the execution time, thereby establishing a closed-loop “decision-control” framework: the upper level’s assignment decisions determine the assignment scheme $q$ and execution sequence $\pi_m$ for each unmanned system, while the lower level computes the minimum execution time $t_m$ based on this sequence. This execution time acts as a vital feedback signal returned to the upper level, which subsequently utilizes $t_m$ to evaluate and optimize the target assignment strategy.
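The nested structure of the bi-level model can be sketched for a toy instance as follows. Two loud assumptions apply: the upper level is solved here by exhaustive enumeration rather than the paper's improved genetic algorithm, and `surrogate_lower_level` is a hypothetical stand-in for $\phi_m$ (closed-tour length divided by a speed, plus a fixed per-waypoint penalty), not the minimum-time optimal control solution. Capability and duration constraints are omitted for brevity.

```python
import itertools
import math


def surrogate_lower_level(start, waypoints, v_max, turn_penalty):
    """Hypothetical stand-in for phi_m: time of the closed tour
    start -> waypoints -> start, plus a fixed penalty per waypoint."""
    if not waypoints:
        return 0.0
    pts = [start] + list(waypoints) + [start]
    length = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
    return length / v_max + turn_penalty * len(waypoints)


def solve_bilevel(starts, v_max, turn_penalty, targets):
    """Enumerate every assignment q and sequence pi_m; return the best
    (makespan, {m: sequence}) pair. Only viable for very small instances."""
    platforms = sorted(starts)
    names = sorted(targets)
    best_time, best_seqs = math.inf, None
    # Upper level: each target is owned by exactly one platform.
    for owners in itertools.product(platforms, repeat=len(names)):
        times, seqs = {}, {}
        for m in platforms:
            assigned = [n for n, o in zip(names, owners) if o == m]
            t_best, s_best = math.inf, ()
            # For a fixed assignment, each platform's best order is independent.
            for perm in itertools.permutations(assigned):
                t = surrogate_lower_level(starts[m],
                                          [targets[n] for n in perm],
                                          v_max[m], turn_penalty[m])
                if t < t_best:
                    t_best, s_best = t, perm
            times[m], seqs[m] = t_best, s_best
        worst = max(times.values())   # feedback to the upper level
        if worst < best_time:
            best_time, best_seqs = worst, seqs
    return best_time, best_seqs


# Hypothetical instance: two platforms, two targets on a line.
best_time, best_seqs = solve_bilevel(
    starts={1: (0.0, 0.0), 2: (10.0, 0.0)},
    v_max={1: 1.0, 2: 1.0},
    turn_penalty={1: 0.0, 2: 0.0},
    targets={1: (1.0, 0.0), 2: (9.0, 0.0)},
)
```

The key design point carried over from the model is that the upper level never estimates cost itself: every candidate is scored only through the value returned by the lower-level oracle.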

3. Transformation of the Optimal Control Model

Given that the optimal control model established in Section 2.2.3 involves strongly nonlinear kinematic equations containing trigonometric functions and multiple complex constraints, directly solving this continuous-time optimal control problem poses significant challenges. To address this issue, the present paper adopts a transformation strategy that combines differential flatness with the Radau pseudospectral method, motivated by the following considerations.
First, differential flatness enables mapping the state variables with nonlinear trigonometric relationships, such as velocity and heading angle, onto a flat output space, thereby converting the system dynamics into a linear form. This transformation not only eliminates the nonlinear coupling introduced by trigonometric functions but also substantially simplifies the model structure, providing a solid foundation for subsequent numerical discretization.
Second, compared with conventional direct collocation methods such as Euler or trapezoidal schemes, the Radau pseudospectral method offers higher approximation accuracy and faster convergence. By employing high-order Lagrange interpolation of state and control variables at Legendre–Gauss–Radau points, it accurately approximates continuous-time trajectories with a relatively small number of discrete nodes. This is particularly advantageous for optimal control problems involving complex path constraints and performance indices. Furthermore, the resulting algebraic constraint formulation can be readily cast as a nonlinear programming problem and efficiently solved using mature numerical solvers such as IPOPT.
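As an illustration of the collocation grid mentioned above, the sketch below computes Legendre–Gauss–Radau (LGR) nodes and quadrature weights on $[-1, 1)$ in pure Python. It relies on two standard facts: the $N$ LGR nodes are the roots of $P_{N-1}(\tau) + P_N(\tau)$, and the weights are $2/N^2$ at $\tau = -1$ and $(1 - \tau_i) / (N^2 P_{N-1}(\tau_i)^2)$ elsewhere. The grid-scan-plus-bisection root finder and the function names are assumptions chosen for readability and are adequate only for modest $N$; this is not the paper's implementation.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def lgr_nodes_and_weights(n_nodes, grid_per_node=400):
    """Return the LGR nodes (including -1) and their quadrature weights."""
    N = n_nodes

    def h(x):
        # (P_{N-1} + P_N) / (1 + x): divides out the known root at x = -1.
        return (legendre(N - 1, x) + legendre(N, x)) / (1.0 + x)

    nodes = [-1.0]
    M = grid_per_node * N
    a = -1.0 + 2.0 / M
    fa = h(a)
    for j in range(2, M + 1):
        b = -1.0 + 2.0 * j / M
        fb = h(b)
        if fa * fb < 0.0:            # sign change: bracketed interior root
            lo, hi, flo = a, b, fa
            for _ in range(60):      # bisection to machine precision
                mid = 0.5 * (lo + hi)
                fm = h(mid)
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    weights = [2.0 / N**2] + [
        (1.0 - x) / (N**2 * legendre(N - 1, x) ** 2) for x in nodes[1:]
    ]
    return nodes, weights


nodes, weights = lgr_nodes_and_weights(3)
```

With the affine map $t = t_{m,f}(\tau + 1)/2$, the time integral in the performance index becomes $(t_{m,f}/2)\sum_i w_i$, and since the weights sum to 2 the quadrature reproduces $t_{m,f}$ exactly.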
In summary, differential flatness provides a theoretical tool for model simplification, while the Radau pseudospectral method ensures accuracy and efficiency in numerical discretization. Their combination enables the originally complex nonlinear optimal control problem to be transformed precisely and stably into a tractable mathematical programming form, thereby offering reliable support for the subsequent closed-loop optimization.

3.1. Reformulation of Kinematic Equations via Differential Flatness

First, the state vector and control input of the system are redefined as
$$\tilde{X}_m = \left[x_m,\ y_m,\ v_{m,x},\ v_{m,y}\right]^{T},$$
$$\tilde{u}_m = \left[a_{m,x},\ a_{m,y}\right]^{T}.$$
Here, the position vector $\mathbf{p}_m = [x_m, y_m]^{T}$ denotes the coordinates of unmanned system $m$, and the velocity vector $\mathbf{v}_m = [v_{m,x}, v_{m,y}]^{T}$ collects its velocity components along the x- and y-axes. The control input $\tilde{u}_m = [a_{m,x}, a_{m,y}]^{T}$ corresponds to the acceleration components in these directions. Based on this reformulated representation, the original speed $v_m$ and heading angle $\theta_m$ in the kinematic model can be equivalently expressed in terms of the new state variables as
$$v_m = \sqrt{v_{m,x}^{2} + v_{m,y}^{2}}$$
$$\theta_m = \arctan\frac{v_{m,y}}{v_{m,x}}$$
Differentiating these expressions with respect to time yields the acceleration $\dot{v}_m$ and the heading angular velocity $\dot{\theta}_m$ of unmanned system $m$:
$$\dot{v}_m = \frac{v_{m,x} a_{m,x} + v_{m,y} a_{m,y}}{\sqrt{v_{m,x}^{2} + v_{m,y}^{2}}}$$
$$\dot{\theta}_m = \frac{v_{m,x} a_{m,y} - v_{m,y} a_{m,x}}{v_{m,x}^{2} + v_{m,y}^{2}}$$
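The mapping from the Cartesian velocity and acceleration components back to speed, heading, and their rates can be sketched as follows. This is only an illustrative helper: the function name `flat_to_polar` is hypothetical, and `atan2` is used as the four-quadrant form of the arctangent, which is an assumption about how the heading is resolved.

```python
import math


def flat_to_polar(vx, vy, ax, ay):
    """Recover (speed, heading, speed rate, heading rate) from Cartesian components.

    Hypothetical helper illustrating the relations between the flat variables
    (vx, vy, ax, ay) and the original speed/heading description.
    """
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        raise ValueError("heading is undefined at zero speed")
    speed = math.sqrt(speed_sq)
    heading = math.atan2(vy, vx)  # assumption: four-quadrant arctan(vy / vx)
    speed_rate = (vx * ax + vy * ay) / speed
    heading_rate = (vx * ay - vy * ax) / speed_sq
    return speed, heading, speed_rate, heading_rate
```

For example, a platform moving along the x-axis at unit speed with a purely lateral acceleration of 2 has zero speed rate and a heading rate of 2.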
Based on the previously established equivalences among the variables, the original nonlinear kinematic model of the unmanned system (20) can be reformulated into the following linear representation:
$$\begin{cases} \dot{x}_m = v_{m,x}, & \dot{v}_{m,x} = a_{m,x} \\ \dot{y}_m = v_{m,y}, & \dot{v}_{m,y} = a_{m,y} \end{cases}$$
which is denoted compactly as
$$\frac{d\tilde{X}_m}{dt} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \tilde{X}_m + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \tilde{u}_m \triangleq A_m \tilde{X}_m + B_m \tilde{u}_m$$
Accordingly, the constraints on velocity, acceleration, and heading angular velocity in the original system are reformulated in terms of the newly defined state and control variables as follows:
$$v_{m,\min} \le \sqrt{v_{m,x}^{2} + v_{m,y}^{2}} \le v_{m,\max}$$
$$a_{m,\min} \le \frac{v_{m,x} a_{m,x} + v_{m,y} a_{m,y}}{\sqrt{v_{m,x}^{2} + v_{m,y}^{2}}} \le a_{m,\max}$$
$$\omega_{m,\min} \le \frac{v_{m,x} a_{m,y} - v_{m,y} a_{m,x}}{v_{m,x}^{2} + v_{m,y}^{2}} \le \omega_{m,\max}$$
To enable subsequent numerical analysis, these constraints are reformulated as polynomial expressions free of square roots and fractions:
$$v_{m,x}^{2} + v_{m,y}^{2} \le v_{m,\max}^{2}$$
$$v_{m,x}^{2} + v_{m,y}^{2} \ge v_{m,\min}^{2}$$
$$\left(v_{m,x} a_{m,x} + v_{m,y} a_{m,y}\right)^{2} \le a_{m,\max}^{2} \left(v_{m,x}^{2} + v_{m,y}^{2}\right)$$
$$v_{m,x} a_{m,y} - v_{m,y} a_{m,x} \le \omega_{m,\max} \left(v_{m,x}^{2} + v_{m,y}^{2}\right)$$
$$v_{m,x} a_{m,y} - v_{m,y} a_{m,x} \ge \omega_{m,\min} \left(v_{m,x}^{2} + v_{m,y}^{2}\right)$$
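A pointwise check of these reformulated limits might look like the sketch below. The function name and argument order are hypothetical. The squared acceleration condition bounds the magnitude of the tangential acceleration, which matches the two-sided bound only under the assumption that $a_{m,\min} = -a_{m,\max}$.

```python
def satisfies_limits(vx, vy, ax, ay, v_min, v_max, a_max, w_min, w_max):
    """Return True if one (velocity, acceleration) sample meets all limits.

    Hypothetical helper; uses the square-root-free form of the speed,
    tangential-acceleration, and heading-rate constraints.
    """
    speed_sq = vx * vx + vy * vy
    tangential = vx * ax + vy * ay   # speed * (rate of change of speed)
    cross = vx * ay - vy * ax        # speed^2 * (heading rate)
    speed_ok = v_min ** 2 <= speed_sq <= v_max ** 2
    accel_ok = tangential ** 2 <= a_max ** 2 * speed_sq
    turn_ok = w_min * speed_sq <= cross <= w_max * speed_sq
    return speed_ok and accel_ok and turn_ok
```

Because no division by the speed occurs, the check stays well defined even when the speed is close to zero.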
The differential flatness transformation removes the trigonometric functions present in the original model, thereby substantially simplifying the system equations. Nonetheless, the resulting model remains a continuous-time optimal control problem. To enhance its numerical tractability, the following section discretizes the reformulated model using the Radau pseudospectral method (RPM).

3.2. Formulation of the Optimal Control Problem Using the Pseudospectral Method

Consider the target sequence for unmanned system $m$ to be defined as follows:
$$\pi_m = \left\{\pi_m^{1}, \ldots, \pi_m^{N_m}\right\}$$
and denote the coordinates of the $i$-th target point in this sequence as $P_{\pi_m^{i}}$.
In the context of trajectory optimization, the entire trajectory of unmanned system $m$ is divided into $N_m + 1$ segments based on its target sequence. The state vector and control vector associated with the $i$-th segment are respectively defined as follows:
$$\tilde{X}_{m,i} = \left[x_{m,i},\ y_{m,i},\ v_{m,x,i},\ v_{m,y,i}\right]^{T},$$
$$\tilde{u}_{m,i} = \left[a_{m,x,i},\ a_{m,y,i}\right]^{T}, \quad i = 1, 2, \ldots, N_m$$

3.2.1. Time Domain Normalization

To facilitate numerical computation, each physical time segment is transformed onto a standardized domain. Let $t_i$ represent the instant at which the unmanned system arrives at the $i$-th target within its target sequence. The physical time interval $[t_{i-1}, t_i]$ is mapped onto the normalized time interval $[-1, 1]$ through a linear transformation:
$$\tau = \frac{2\left(t - t_{i-1}\right)}{t_i - t_{i-1}} - 1, \quad t \in [t_{i-1}, t_i], \quad \tau \in [-1, 1]$$
Under this transformation, the state equation becomes:
$$\frac{d\tilde{X}_{m,i}}{d\tau} = \frac{t_i - t_{i-1}}{2} \left(A_m \tilde{X}_{m,i} + B_m \tilde{u}_{m,i}\right)$$
where the boundary times are given by $t_0 = 0$ and $t_{N_m+1} = t_f$, representing the initial time and the final completion time, respectively.
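As a small illustration of this affine time map, the sketch below converts between physical and normalized time on one segment and returns the scaling factor that multiplies the state equation. The function names are hypothetical.

```python
def to_normalized(t, t_prev, t_next):
    """Map physical time t in [t_prev, t_next] to tau in [-1, 1]."""
    return 2.0 * (t - t_prev) / (t_next - t_prev) - 1.0


def to_physical(tau, t_prev, t_next):
    """Inverse map from tau in [-1, 1] back to physical time."""
    return t_prev + 0.5 * (tau + 1.0) * (t_next - t_prev)


def time_scale(t_prev, t_next):
    """Factor dt/dtau that multiplies the right-hand side of the state equation."""
    return 0.5 * (t_next - t_prev)
```

The segment endpoints map to $-1$ and $1$, and the midpoint maps to $0$, so every segment shares the same collocation grid regardless of its physical duration.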

3.2.2. Approximation of State and Control Variables

To avoid the Runge phenomenon commonly encountered in function approximation on equally spaced nodes, the Radau pseudospectral method is employed. This approach selects $K$ Legendre–Gauss–Radau (LGR) collocation points within the half-open interval $[-1, 1)$, which are ordered as follows:
$$-1 = \tau_0 < \cdots < \tau_{K-1} < 1$$
Here, each collocation point $\tau_k$ is a root of the polynomial $P_K(\tau) + P_{K-1}(\tau)$, where $P_K(\tau)$ represents the Legendre polynomial of degree $K$:
$$P_K(x) = \frac{1}{2^{K} K!} \frac{d^{K}}{dx^{K}} \left[\left(x^{2} - 1\right)^{K}\right]$$
At these LGR collocation points $\tau_k$, the state and control variables are discretely sampled:
$$\tilde{X}_{m,i,k} = \tilde{X}_{m,i}(\tau_k), \quad \tilde{u}_{m,i,k} = \tilde{u}_{m,i}(\tau_k), \quad k = 0, 1, 2, \ldots, K-1$$
The state variable $\tilde{X}_{m,i}(\tau)$ is approximated by the Lagrange interpolating polynomial through these $K$ samples, which has degree at most $K-1$:
$$\tilde{X}_{m,i}(\tau) \approx \left[\tilde{X}_{m,i,0},\ \tilde{X}_{m,i,1},\ \ldots,\ \tilde{X}_{m,i,K-1}\right] l(\tau)$$
where $l(\tau) = \left[l_0(\tau), l_1(\tau), \ldots, l_{K-1}(\tau)\right]^{T}$ is the basis function vector, and each element $l_j(\tau)$ is a Lagrange basis polynomial defined by
$$l_j(\tau) = \prod_{\substack{k=0 \\ k \ne j}}^{K-1} \frac{\tau - \tau_k}{\tau_j - \tau_k} = \frac{(\tau - \tau_0) \cdots (\tau - \tau_{j-1})(\tau - \tau_{j+1}) \cdots (\tau - \tau_{K-1})}{(\tau_j - \tau_0) \cdots (\tau_j - \tau_{j-1})(\tau_j - \tau_{j+1}) \cdots (\tau_j - \tau_{K-1})}, \quad j = 0, 1, 2, \ldots, K-1$$
This collection of basis functions satisfies the cardinal (Kronecker delta) property, that is,
$$l_j(\tau_k) = \begin{cases} 1, & j = k \\ 0, & j \ne k \end{cases}$$
Similarly, the control variable $\tilde{u}_{m,i}(\tau)$ is approximated by the Lagrange interpolating polynomial through its $K$ samples:
$$\tilde{u}_{m,i}(\tau) \approx \left[\tilde{u}_{m,i,0},\ \tilde{u}_{m,i,1},\ \ldots,\ \tilde{u}_{m,i,K-1}\right] l(\tau)$$
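The LGR points and the Lagrange basis can be computed numerically along the following lines. This is a minimal sketch using NumPy's Legendre-series root finder; the function names `lgr_nodes` and `lagrange_basis` are hypothetical, and the direct product formula is chosen for readability rather than numerical efficiency.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def lgr_nodes(K):
    """Return the K roots of P_K + P_{K-1} in ascending order (the first is -1)."""
    coeffs = np.zeros(K + 1)
    coeffs[K] = 1.0       # coefficient of P_K
    coeffs[K - 1] = 1.0   # coefficient of P_{K-1}
    return np.sort(np.real(leg.legroots(coeffs)))


def lagrange_basis(nodes, tau):
    """Evaluate the basis vector [l_0(tau), ..., l_{K-1}(tau)] by the product formula."""
    K = len(nodes)
    values = np.ones(K)
    for j in range(K):
        for k in range(K):
            if k != j:
                values[j] *= (tau - nodes[k]) / (nodes[j] - nodes[k])
    return values
```

For $K = 3$ the polynomial $P_3 + P_2$ factors as $(x + 1)(5x^2 - 2x - 1)/2$, so the nodes are $-1$ and $(1 \pm \sqrt{6})/5$, which gives a convenient sanity check.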
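Theorem 1. 
Let the state variable $x(\tau)$ possess a continuous $K$-th derivative on the interval $[-1, 1]$, and let $p(\tau)$ be the Lagrange interpolation polynomial of $x(\tau)$ at the $K$ Radau points $\{\tau_0, \tau_1, \ldots, \tau_{K-1}\}$, where the points are the roots of $P_K(\tau) + P_{K-1}(\tau)$ and $P_K(\tau)$ denotes the Legendre polynomial of degree $K$. Then, for every $\tau \in [-1, 1]$, there exists $\xi \in (-1, 1)$ such that the interpolation error is given by
$$x(\tau) - p(\tau) = \frac{2^{K} K!}{(2K)!}\, x^{(K)}(\xi) \left[P_K(\tau) + P_{K-1}(\tau)\right]$$
with the corresponding error bound:
$$\left|x(\tau) - p(\tau)\right| \le \frac{2^{K+1} K!}{(2K)!} \left|x^{(K)}(\xi)\right|$$
Proof. 
The state variable $x(\tau)$ is approximated on $[-1, 1]$ by the Lagrange interpolation polynomial $p(\tau)$ using the $K$ distinct nodes $\{\tau_0, \tau_1, \ldots, \tau_{K-1}\}$. The standard Lagrange interpolation error formula yields:
$$x(\tau) - p(\tau) = \frac{x^{(K)}(\xi)}{K!} \prod_{k=0}^{K-1} \left(\tau - \tau_k\right)$$
where $\xi \in (-1, 1)$ and $x^{(K)}$ denotes the $K$-th derivative of $x$.
The Radau points $\tau_k$ are defined as the roots of $P_K(\tau) + P_{K-1}(\tau)$, where $P_K(\tau)$ is the Legendre polynomial of degree $K$. Legendre polynomials are defined on $[-1, 1]$ and satisfy
$$P_K(1) = 1, \quad P_K(-1) = (-1)^{K}$$
Define the nodal polynomial:
$$\gamma(\tau) = \prod_{k=0}^{K-1} \left(\tau - \tau_k\right)$$
Since the $\tau_k$ are the $K$ roots of the degree-$K$ polynomial $P_K(\tau) + P_{K-1}(\tau)$, it follows that
$$\gamma(\tau) = \alpha \left[P_K(\tau) + P_{K-1}(\tau)\right]$$
where $\alpha$ is a constant coefficient.
The leading coefficient of the Legendre polynomial $P_K(\tau)$ is
$$b_K = \frac{(2K)!}{2^{K} (K!)^{2}}$$
Because $P_{K-1}(\tau)$ has degree $K-1$, the leading coefficient of $P_K(\tau) + P_{K-1}(\tau)$ is also $b_K$. Since $\gamma(\tau)$ is monic (leading coefficient 1), it follows that
$$\gamma(\tau) = \frac{1}{b_K} \left[P_K(\tau) + P_{K-1}(\tau)\right]$$
Substituting into the Lagrange error formula yields
$$x(\tau) - p(\tau) = \frac{x^{(K)}(\xi)}{K!} \cdot \frac{2^{K} (K!)^{2}}{(2K)!} \left[P_K(\tau) + P_{K-1}(\tau)\right] = \frac{2^{K} K!}{(2K)!}\, x^{(K)}(\xi) \left[P_K(\tau) + P_{K-1}(\tau)\right]$$
Since $\left|P_K(\tau)\right| \le 1$ on $[-1, 1]$ for every degree $K$, the error bound follows directly:
$$\left|x(\tau) - p(\tau)\right| \le \frac{2^{K} K!}{(2K)!} \left|x^{(K)}(\xi)\right| \left|P_K(\tau) + P_{K-1}(\tau)\right| \le \frac{2^{K} K!}{(2K)!} \left|x^{(K)}(\xi)\right| \cdot 2 = \frac{2^{K+1} K!}{(2K)!} \left|x^{(K)}(\xi)\right|$$
The factor $2^{K+1} K! / (2K)!$ decreases faster than geometrically in $K$, so the interpolation error decays exponentially as $K$ increases, provided that the derivatives $x^{(K)}$ remain suitably bounded.
This completes the proof of the theorem. □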
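The bound of Theorem 1 can be checked numerically on a smooth test function. The sketch below is an illustration only: the choice of $x(\tau) = e^{\tau}$ (whose derivatives are all bounded by $e$ on $[-1, 1]$), the sampling grid, and the function names are assumptions made for the example, and `numpy.polyfit` is used simply as a convenient way to obtain the interpolant through $K$ points.

```python
import math

import numpy as np
from numpy.polynomial import legendre as leg


def lgr_nodes(K):
    """Roots of P_K + P_{K-1}, ascending (hypothetical helper)."""
    coeffs = np.zeros(K + 1)
    coeffs[K] = 1.0
    coeffs[K - 1] = 1.0
    return np.sort(np.real(leg.legroots(coeffs)))


def max_interp_error(f, K, samples=2001):
    """Largest sampled error of the degree-(K-1) interpolant of f at the LGR nodes."""
    nodes = lgr_nodes(K)
    coeffs = np.polyfit(nodes, f(nodes), K - 1)  # exact fit through K points
    grid = np.linspace(-1.0, 1.0, samples)
    return float(np.max(np.abs(f(grid) - np.polyval(coeffs, grid))))


def theorem_bound(K, deriv_max):
    """Right-hand side of the bound, with |x^(K)| replaced by its maximum."""
    return 2.0 ** (K + 1) * math.factorial(K) / math.factorial(2 * K) * deriv_max


for K in range(2, 8):
    err = max_interp_error(np.exp, K)
    bound = theorem_bound(K, math.e)
    print(f"K={K}: error={err:.3e}, bound={bound:.3e}")
```

Both columns shrink rapidly as $K$ grows, which is the behavior the theorem describes for functions with bounded derivatives.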

3.2.3. Transformation of the State Equation

The derivative of the state variable with respect to $\tau$ at the collocation point $\tau_k$ is computed by differentiating the interpolating polynomial in Equation (56):
$$\frac{d\tilde{X}_{m,i}(\tau_k)}{d\tau} \approx \left[\tilde{X}_{m,i,0},\ \tilde{X}_{m,i,1},\ \ldots,\ \tilde{X}_{m,i,K-1}\right] \dot{l}(\tau_k) \triangleq \left[\tilde{X}_{m,i,0},\ \tilde{X}_{m,i,1},\ \ldots,\ \tilde{X}_{m,i,K-1}\right] d_{\tau_k}, \quad k = 0, 1, \ldots, K-1$$
where $d_{\tau_k} = \dot{l}(\tau_k)$ is a $K \times 1$ differentiation vector whose entries are the derivatives of the basis polynomials at $\tau_k$.
Simultaneously, evaluating the normalized state Equation (52) at the collocation point $\tau_k$, with the segment indexed here so that it spans $[t_i, t_{i+1}]$, gives
$$\frac{d\tilde{X}_{m,i}(\tau_k)}{d\tau} = \frac{t_{i+1} - t_i}{2} \left(A_m \tilde{X}_{m,i}(\tau_k) + B_m \tilde{u}_{m,i}(\tau_k)\right)$$
By equating the two expressions, the continuous-time kinematic differential equation of the unmanned system is transformed, at the collocation points, into the following algebraic constraint equation:
$$\left[\tilde{X}_{m,i,0},\ \tilde{X}_{m,i,1},\ \ldots,\ \tilde{X}_{m,i,K-1}\right] d_{\tau_k} = \frac{t_{i+1} - t_i}{2} \left(A_m \tilde{X}_{m,i,k} + B_m \tilde{u}_{m,i,k}\right)$$
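One way to assemble the differentiation vectors and evaluate the collocation residual is sketched below. The barycentric formula for the entries $\dot{l}_j(\tau_k)$ is a standard construction but is an implementation choice not stated in the text. The function names are hypothetical, and the matrices follow the double-integrator form with state ordering $[x, y, v_x, v_y]$.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def lgr_nodes(K):
    """Roots of P_K + P_{K-1}, ascending (hypothetical helper)."""
    coeffs = np.zeros(K + 1)
    coeffs[K] = 1.0
    coeffs[K - 1] = 1.0
    return np.sort(np.real(leg.legroots(coeffs)))


def differentiation_matrix(nodes):
    """Matrix D with D[k, j] = derivative of l_j at tau_k (row k is d_{tau_k})."""
    diff = nodes[:, None] - nodes[None, :]     # diff[k, j] = tau_k - tau_j
    np.fill_diagonal(diff, 1.0)
    w = 1.0 / np.prod(diff, axis=1)            # barycentric weights
    D = (w[None, :] / w[:, None]) / diff       # off-diagonal entries
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))        # each row must sum to zero
    return D


A = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
B = np.array([[0, 0], [0, 0], [1, 0], [0, 1]], dtype=float)


def collocation_residual(X, U, D, duration):
    """Residual of the algebraic constraint at all nodes.

    X is 4 x K (state samples), U is 2 x K (control samples),
    duration is the physical length of the segment.
    """
    return X @ D.T - 0.5 * duration * (A @ X + B @ U)
```

A constant-acceleration trajectory is a polynomial of degree two in $\tau$, so with $K \ge 3$ nodes the residual should vanish up to rounding error. An NLP solver would drive this residual to zero as an equality constraint.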

3.2.4. Formulation of Boundary and Interior Point Conditions

In the Radau pseudospectral method, the terminal state is obtained via Gauss–Radau quadrature:
$$\tilde{X}_{m,i,K} = \tilde{X}_{m,i,0} + \frac{t_{i+1} - t_i}{2} \sum_{k=0}^{K-1} \omega_k \left(A_m \tilde{X}_{m,i,k} + B_m \tilde{u}_{m,i,k}\right)$$
where $\omega_k$ represents the Gauss–Radau quadrature weight corresponding to the collocation point $\tau_k$, calculated as
$$\omega_k = \int_{-1}^{1} l_k(\tau)\, d\tau, \quad k = 0, 1, \ldots, K-1$$
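The quadrature weights can be obtained directly from their definition by integrating each Lagrange basis polynomial. The sketch below does this with NumPy's polynomial helpers. The function names are hypothetical, and the explicit polynomial expansion is meant for small $K$, where it is numerically harmless.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def lgr_nodes(K):
    """Roots of P_K + P_{K-1}, ascending (hypothetical helper)."""
    coeffs = np.zeros(K + 1)
    coeffs[K] = 1.0
    coeffs[K - 1] = 1.0
    return np.sort(np.real(leg.legroots(coeffs)))


def lgr_weights(nodes):
    """Weights w_k = integral of l_k over [-1, 1]."""
    K = len(nodes)
    weights = np.empty(K)
    for k in range(K):
        others = np.delete(nodes, k)
        numer = np.atleast_1d(np.poly(others))  # monic polynomial vanishing at the other nodes
        denom = np.prod(nodes[k] - others)
        antideriv = np.polyint(numer / denom)
        weights[k] = np.polyval(antideriv, 1.0) - np.polyval(antideriv, -1.0)
    return weights
```

Two quick consistency checks are that the weights sum to 2 (the length of the interval) and that the rule integrates polynomials up to degree $2K - 2$ exactly, which is the known precision of Gauss–Radau quadrature.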
Within this numerical integration framework, the boundary and interior point conditions are formulated as follows. The initial condition is
$$\tilde{X}_{m,0,0} = \left[x_{m,0},\ y_{m,0},\ v_{m,0} \cos\theta_{m,0},\ v_{m,0} \sin\theta_{m,0}\right]^{T}$$
According to the terminal condition, the unmanned system must return to its initial departure position, that is,
$$x_{m,N_m+1,K} = x_{m,0}, \quad y_{m,N_m+1,K} = y_{m,0}$$
Regarding the interior point conditions, the terminal position of the $i$-th trajectory segment must coincide with the coordinates of the $i$-th target point in the target sequence, that is,
$$\left[x_{m,i,K},\ y_{m,i,K}\right]^{T} = P_{\pi_m^{i}}$$
Furthermore, to ensure trajectory continuity, the state of the unmanned system must be continuous at each target point, enforced by
$$\tilde{X}_{m,i,K} = \tilde{X}_{m,i+1,0}$$

3.2.5. Discretization of Unmanned System Performance Constraints

The performance constraints for the unmanned system are given by Equations (43)–(47). By discretizing these constraints at the LGR collocation points, they are transformed into the following algebraic form:
$$v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2} \le v_{m,\max}^{2}$$
$$v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2} \ge v_{m,\min}^{2}$$
$$\left(v_{m,x,i,k}\, a_{m,x,i,k} + v_{m,y,i,k}\, a_{m,y,i,k}\right)^{2} \le a_{m,\max}^{2} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right)$$
$$v_{m,x,i,k}\, a_{m,y,i,k} - v_{m,y,i,k}\, a_{m,x,i,k} \ge \omega_{m,\min} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right)$$
$$v_{m,x,i,k}\, a_{m,y,i,k} - v_{m,y,i,k}\, a_{m,x,i,k} \le \omega_{m,\max} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right)$$

3.2.6. Formulation of the Nonlinear Programming Problem

Through the aforementioned discretization procedures, the time-optimal control problem for unmanned system $m$ to execute its designated target sequence is transformed into the following nonlinear programming (NLP) problem:
$$\begin{aligned}
\min\ & t_{N_m+1} \\
\text{s.t.}\ & \left[\tilde{X}_{m,i,0},\ \tilde{X}_{m,i,1},\ \ldots,\ \tilde{X}_{m,i,K-1}\right] d_{\tau_k} = \frac{t_{i+1} - t_i}{2} \left(A_m \tilde{X}_{m,i,k} + B_m \tilde{u}_{m,i,k}\right) \\
& \tilde{X}_{m,0,0} = \left[x_{m,0},\ y_{m,0},\ v_{m,0} \cos\theta_{m,0},\ v_{m,0} \sin\theta_{m,0}\right]^{T} \\
& x_{m,N_m+1,K} = x_{m,0}, \quad y_{m,N_m+1,K} = y_{m,0} \\
& \left[x_{m,i,K},\ y_{m,i,K}\right]^{T} = P_{\pi_m^{i}} \\
& \tilde{X}_{m,i,K} = \tilde{X}_{m,i+1,0} \\
& v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2} \le v_{m,\max}^{2} \\
& v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2} \ge v_{m,\min}^{2} \\
& \left(v_{m,x,i,k}\, a_{m,x,i,k} + v_{m,y,i,k}\, a_{m,y,i,k}\right)^{2} \le a_{m,\max}^{2} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right) \\
& v_{m,x,i,k}\, a_{m,y,i,k} - v_{m,y,i,k}\, a_{m,x,i,k} \ge \omega_{m,\min} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right) \\
& v_{m,x,i,k}\, a_{m,y,i,k} - v_{m,y,i,k}\, a_{m,x,i,k} \le \omega_{m,\max} \left(v_{m,x,i,k}^{2} + v_{m,y,i,k}^{2}\right) \\
& i = 0, 1, \ldots, N_m, \quad k = 0, 1, \ldots, K-1
\end{aligned}$$
By integrating the differential flatness property with the Radau pseudospectral method, the original nonlinear optimal control problem, which involved complex trigonometric functions, has been successfully converted into a nonlinear programming problem with polynomial constraints. This transformation significantly reduces the computational complexity of the problem, thereby establishing a solid foundation for applying mature and efficient optimization algorithms to obtain the numerical solution.
Remark 1 
(Computational Complexity Analysis). The variable scale of the aforementioned nonlinear programming problem directly determines the computational complexity of the algorithm. For a single unmanned system m, its trajectory is divided into N m + 1 segments, with each segment discretized using K LGR collocation points. The state variables X ˜ m , i for each segment contain four components (position and velocity), and the control variables u ˜ m , i contain two components (acceleration). Additionally, each segment introduces an intra-segment time variable t i t i 1 and the global final time t m , f , Therefore, the total number of variables for a single unmanned system is approximately 4 N m + 1 K + 2 N m + 1 K + N m + 1 + 1 . Assuming N m = N N M M , the total number of variables for the entire swarm can be approximated as
M × 4 N + 2 N + 1 + 1 × K
This scale explains the inherent computational cost of the algorithm. Given this computational demand, the proposed method is more suitable for offline pre-planning or mission pre-allocation rather than online real-time replanning. Future work may explore parallel computing, surrogate models, or distributed architectures to enhance efficiency, laying the foundation for real-time applications.
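The per-system variable count stated in the remark, $4(N_m+1)K + 2(N_m+1)K + (N_m+1) + 1$, is easy to tabulate for a given scenario. The sketch below does so and sums over the swarm without assuming equal target counts. The function names and the example numbers are hypothetical.

```python
def variables_per_system(num_targets, K):
    """Decision variables for one system: states, controls, durations, final time."""
    segments = num_targets + 1
    states = 4 * segments * K
    controls = 2 * segments * K
    durations = segments
    final_time = 1
    return states + controls + durations + final_time


def variables_for_swarm(targets_per_system, K):
    """Total over all systems, given the number of targets assigned to each."""
    return sum(variables_per_system(n, K) for n in targets_per_system)


# Hypothetical example: three systems with 3, 4, and 3 targets, K = 10 nodes.
print(variables_for_swarm([3, 4, 3], 10))
```

Since the count is dominated by the $6(N_m+1)K$ term, it grows linearly in both the number of collocation points and the number of targets per system.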

4. Algorithm Design and Effectiveness Analysis

Based on the established model, this section designs a solution algorithm for the bi-level optimization problem. The lower-level model is solved using the IPOPT solver. For the upper-level task assignment, since the minimum execution time of each unmanned system is implicitly determined by the task assignment scheme p and the target sequence π m without an explicit analytical form, it constitutes a “black-box” optimization problem where traditional gradient-based methods are not applicable. To address this, the cooperative task assignment problem for the unmanned swarm is formulated as a multiple traveling salesman problem incorporating both capability and kinematic constraints. The following solution algorithm is accordingly proposed.

4.1. Algorithm Design

4.1.1. Encoding Design

To solve the aforementioned upper-level optimization model, a hybrid encoding scheme integrating permutation-based encoding with delimiters is proposed. The chromosome structure is illustrated in Figure 1 and consists of two segments: the first $N$ genes represent a permutation of all targets, guaranteeing that each target is visited exactly once; the remaining $M-1$ genes represent a sequence of delimiters. Each delimiter is an integer selected from the set $\{1, 2, \ldots, N\}$ that indicates a cut position within the target permutation. These delimiters are listed in ascending order to ensure a valid and unambiguous partition. Specifically, the $M-1$ delimiters divide the complete target permutation into $M$ contiguous subsequences, where the $m$-th subsequence corresponds to the ordered list of targets assigned to the $m$-th unmanned system. Following this encoding rule, a complete chromosome can be decomposed into $M$ sub-chromosomes, each representing the target engagement sequence for one unmanned system.
An illustrative example is provided in Figure 2: for a case with three unmanned systems and 10 targets, the target permutation $[5, 3, 7, 6, 2, 9, 1, 10, 4, 8]$ combined with the delimiters $[3, 7]$ yields three subsequences: $[5, 3, 7]$ for Unmanned System 1, $[6, 2, 9, 1]$ for Unmanned System 2, and $[10, 4, 8]$ for Unmanned System 3. This encoding scheme ensures feasibility and facilitates subsequent genetic operations.
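Decoding such a chromosome amounts to slicing the permutation at the delimiter positions. A minimal sketch is given below, with the hypothetical function name `decode` and the assumption that each delimiter marks the end position (1-based) of a subsequence, as in the example above.

```python
def decode(permutation, delimiters):
    """Split a target permutation into per-system subsequences.

    delimiters are ascending cut positions; delimiter s means the current
    subsequence ends after the s-th gene of the permutation.
    """
    cuts = [0, *delimiters, len(permutation)]
    return [permutation[a:b] for a, b in zip(cuts, cuts[1:])]


routes = decode([5, 3, 7, 6, 2, 9, 1, 10, 4, 8], [3, 7])
print(routes)  # [[5, 3, 7], [6, 2, 9, 1], [10, 4, 8]]
```

With $M - 1$ delimiters the function always returns $M$ subsequences, and concatenating them recovers the original permutation.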

4.1.2. Initial Population Generation

The quality of the initial population significantly influences the convergence speed and final solution quality of the algorithm. An insufficient number of feasible initial solutions can lead to increased ineffective search efforts and slower convergence. Therefore, an effective method for generating the initial population is designed in this paper, aiming to enhance both the quality and feasibility of the initial solutions. The detailed procedure is described as follows:
Step 1:
Randomly generate a permutation of the target visiting sequence, ensuring each target appears exactly once.
Step 2:
Randomly generate $M-1$ distinct integers from the interval $[2, N-1]$ as delimiters. These delimiters are arranged in ascending order to form the delimiter sequence. Here, the delimiter $s_m$ indicates the end position of the target subsequence assigned to the $m$-th unmanned system.
Step 3:
For each initial individual, verify whether it satisfies the target-type matching constraints. If an individual violates any constraint, a repair operation is performed. This operation adjusts the positions of the delimiters or swaps targets within the sequence to ensure that all type-matching constraints are satisfied. This repair mechanism ensures that the majority of individuals in the initial population comply with the capability matching constraints, thereby effectively reducing ineffective searches during the evolutionary process.
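Steps 1 and 2 can be sketched as follows. The repair operation of Step 3 is omitted because it depends on the target-type matching constraints, which are defined elsewhere. The function name and the use of a seeded `random.Random` instance are assumptions made for the example.

```python
import random


def random_individual(num_targets, num_systems, rng):
    """Generate one chromosome: a random permutation plus sorted delimiters.

    Targets are labeled 1..num_targets; delimiters are num_systems - 1
    distinct integers drawn from [2, num_targets - 1].
    """
    permutation = list(range(1, num_targets + 1))
    rng.shuffle(permutation)
    delimiters = sorted(rng.sample(range(2, num_targets), num_systems - 1))
    return permutation, delimiters


rng = random.Random(0)  # fixed seed only to make the example reproducible
permutation, delimiters = random_individual(10, 3, rng)
print(permutation, delimiters)
```

Sampling without replacement guarantees distinct delimiters, so sorting them yields a strictly ascending sequence.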

4.1.3. Feasible and Infeasible Reserve Sets

In constrained optimization problems, the interplay between feasible and infeasible sets considerably affects algorithm performance. To make effective use of the information explored during the search, this paper introduces two distinct archives: the feasible reserve set ( Γ F ) and the infeasible reserve set ( Γ I ). These sets are maintained to store, respectively, the feasible and infeasible solutions encountered throughout the iterative search, thereby providing guidance for subsequent optimization steps. The capacities of the two archives are predefined as N F and N I .
First, the constraint violation of a solution is defined as
$$V(q, \pi) = \sum_{l=1}^{L} \max\left(0,\ g_l(q, \pi)\right)$$
where $L$ denotes the total number of constraints.
Based on the constraint violation measure, the feasible solution set $\Omega_F$ and the infeasible solution set $\Omega_I$ are defined as:
$$\Omega_F = \left\{(q, \pi) \mid V(q, \pi) = 0\right\}$$
$$\Omega_I = \left\{(q, \pi) \mid V(q, \pi) > 0\right\}$$
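The violation measure and the resulting feasibility test can be written in a few lines. The sketch assumes the usual convention that a constraint $g_l \le 0$ is satisfied, which is what the $\max(0, \cdot)$ form implies. The function names are hypothetical.

```python
def constraint_violation(g_values):
    """Total violation: sum of the positive parts of the constraint values."""
    return sum(max(0.0, g) for g in g_values)


def is_feasible(g_values):
    """A solution is feasible exactly when its total violation is zero."""
    return constraint_violation(g_values) == 0


print(constraint_violation([-1.0, 0.0, 2.5, 0.5]))  # 3.0
```

Satisfied constraints contribute nothing, so the measure reflects only how far the violated constraints are from being met.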
The design of the reserve set is based on three principles: the elitism principle, which aims to prevent the loss of high-quality feasible assignment schemes during evolution; the boundary exploration principle, which focuses on retaining infeasible assignment schemes that are close to the feasible region boundary, thereby guiding the search toward new feasible areas; and the diversity maintenance principle, which avoids premature convergence by preserving a spatially diversified distribution of the assignment schemes.
The motivation for maintaining an infeasible reserve set stems from the observation in constrained optimization that the global optimum often lies on or near the boundary of the feasible region. Purely feasible solutions may be located far from this boundary, and discarding all infeasible individuals can lead to the loss of valuable directional information. By retaining infeasible solutions with small constraint violations and good objective values, the algorithm can exploit these boundary solutions to guide the population toward promising regions of the search space. This is conceptually similar to the ε -constrained method and stochastic ranking, where controlled acceptance of infeasible solutions enhances exploration and helps escape local optima.
The coordination between the two archives is achieved through a carefully designed update mechanism and a combined parent selection process. In each generation, after evaluating the population, both archives are updated according to the procedures described below. Subsequently, the parent population for the next generation is formed by merging individuals from both Γ F and Γ I , along with other selected individuals. This allows the infeasible reserve set to continuously inject boundary information into the evolutionary process, thereby maintaining a healthy balance between feasibility and optimality, and enhancing global search capability.
(1)
Update Mechanism for the Feasible Reserve Set
The feasible reserve set Γ F is designed to retain high-quality feasible assignment schemes encountered during the search process, following the elitism principle. First, a fitness function is defined as the normalized objective value:
$$\hat{h}(\pi) = \frac{h(\pi) - h_{\min}}{h_{\max} - h_{\min}}$$
where $h(\pi) = \max_{m \in U} t_m$ represents the upper-level objective value for a given task sequence $\pi$, and $h_{\min}$ and $h_{\max}$ are the minimum and maximum $h$ values in the current population, respectively. This normalization ensures that fitness values lie in a comparable range.
Based on the fitness function, a composite quality metric is introduced to evaluate assignment schemes:
$$Q_F(\pi) = \left(\hat{h}(\pi),\ D(\pi, \Gamma_F)\right)$$
where $D(\pi, \Gamma_F)$ denotes the minimum distance between the task assignment scheme $\pi$ and the solutions already stored in the reserve set. The distance quantifies the diversity among solutions and is computed as
D ( π i , π j ) = 1 E ( π i ) E ( π j ) M
A distance value of D = 0 indicates that the two assignment schemes are identical, while a larger value reflects greater dissimilarity.
When a new set of feasible solutions Ω F new is obtained, the feasible reserve set is updated through the following steps:
Step 1:
Combine the current feasible reserve set Γ F with the newly generated feasible solution set Ω F new to form a candidate pool P cand F .
Step 2:
Remove duplicate assignment schemes within $P_{\mathrm{cand}}^{F}$. Specifically, for all solution pairs whose distance is $D = 0$, retain only one instance of each identical scheme and discard the redundancies.
Step 3:
Employ tournament selection on $P_{\mathrm{cand}}^{F}$ to choose $(1 - \lambda_F) N_F$ individuals with the best fitness values $\hat{h}(\pi)$. Add these selected individuals to the new reserve set $\Gamma_F$.
Step 4:
From the remaining individuals in $P_{\mathrm{cand}}^{F}$, select $\lambda_F N_F$ individuals using roulette wheel selection. The selection probability for each individual is proportional to a composite quality score, calculated as
$$Q(\pi) = \gamma_F\, \hat{h}(\pi) + (1 - \gamma_F)\, D(\pi, \Gamma_F)$$
Add these individuals to $\Gamma_F$. The resulting $\Gamma_F$ then becomes the updated feasible reserve set.
(2)
Update Mechanism for the Infeasible Reserve Set
The infeasible reserve set Γ I is maintained to store assignment schemes that violate constraints but exhibit relatively promising objective function values or only minor constraint violations. Its primary role is to guide the search toward the boundary of the feasible region, thereby helping the algorithm explore a broader solution space. The quality of an infeasible assignment scheme is evaluated through the following composite metric:
$$Q_I(\pi) = \left(\hat{h}(\pi),\ \hat{V}(\pi),\ D(\pi, \Gamma_F \cup \Gamma_I)\right)$$
where $\hat{V}(\pi)$ denotes the normalized constraint violation degree.
When a new set of infeasible assignment schemes Ω I new is generated, the infeasible reserve set is updated through the following procedure:
Step 1:
Merge the current infeasible reserve set Γ I with the newly obtained set Ω I new to form a candidate pool P cand I .
Step 2:
Within $P_{\mathrm{cand}}^{I}$, identify and remove duplicate assignment schemes. Specifically, for any pair of schemes whose distance is $D = 0$ (that is, identical schemes), retain only one copy and discard the rest.
Step 3:
Filter the assignment schemes in $P_{\mathrm{cand}}^{I}$ by keeping only those whose fitness $\hat{h}(\pi)$ is better than the average fitness of the assignment schemes currently stored in the feasible reserve set $\Gamma_F$. If the number of remaining assignment schemes exceeds the capacity $N_I$ of the infeasible reserve set, select the $N_I$ individuals with the smallest constraint violation $\hat{V}(\pi)$ to form the updated infeasible reserve set $\Gamma_I$.
Finally, the updated feasible reserve set Γ F and the infeasible reserve set Γ I are combined to constitute the parent population for the subsequent evolutionary operations.
By integrating these two reserve sets, the algorithm dynamically balances the exploitation of feasible elite solutions and the exploration of promising infeasible regions. This dual-archive strategy has been shown to improve both convergence speed and solution robustness in constrained optimization, as it prevents the population from prematurely converging to suboptimal feasible regions and encourages the discovery of the true global optimum.
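The filtering step of the infeasible reserve update (keep candidates that beat the average feasible fitness, then truncate by violation) can be sketched as below. Several details are assumptions made for illustration: candidates are represented as `(scheme, fitness, violation)` tuples, a lower fitness value is treated as better because the objective is a completion time, duplicates are assumed to have been removed already, and all candidates are kept when the feasible set is empty.

```python
def update_infeasible_reserve(candidates, feasible_fitness, capacity):
    """Filter and truncate the infeasible candidate pool.

    candidates: list of (scheme, fitness, violation) tuples (hypothetical layout).
    feasible_fitness: fitness values of the schemes in the feasible reserve set.
    capacity: maximum size of the infeasible reserve set.
    """
    if feasible_fitness:
        threshold = sum(feasible_fitness) / len(feasible_fitness)
        kept = [c for c in candidates if c[1] < threshold]
    else:
        kept = list(candidates)  # assumption: no feasible reference yet
    kept.sort(key=lambda c: c[2])  # smallest constraint violation first
    return kept[:capacity]


pool = [("a", 0.2, 0.5), ("b", 0.9, 0.1), ("c", 0.3, 0.2), ("d", 0.1, 0.9)]
print(update_infeasible_reserve(pool, [0.4, 0.6], 2))
```

In this toy pool, scheme "b" is dropped despite its small violation because its fitness is worse than the feasible average, which mirrors the intent of retaining only promising near-boundary solutions.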

4.1.4. Hybrid Crossover Operation

The crossover operation serves as a key mechanism in genetic algorithms for maintaining population diversity and recombining advantageous genetic material. This paper proposes a two-layer crossover strategy, which operates separately on the target visiting sequence and the delimiter sequence.
(1)
Crossover for the Target Sequence
The Order Crossover (OX) operator is applied to the target visiting sequence. This operator preserves the relative order and adjacency of targets from the parent sequences. The procedure is as follows: First, randomly select two crossover points $r_1$ and $r_2$ satisfying $1 \le r_1 < r_2 \le N$. Next, copy the segment between positions $r_1$ and $r_2$ from Parent 1 directly into the corresponding positions of the offspring. Finally, fill the remaining empty positions in the offspring, in order, with the targets from Parent 2 that do not already appear in the copied segment. This yields a complete new target sequence.
(2)
Crossover for the Delimiter Sequence
For the delimiter sequence, an arithmetic crossover based on random weights is applied. Specifically, a random weight $\lambda$ is generated from a uniform distribution over $[0, 1]$. The delimiter sequence of the offspring is then obtained by a convex combination of the two parent sequences:
$$s_{\mathrm{child}} = \left[\lambda \cdot s_{\mathrm{parent1}} + (1 - \lambda) \cdot s_{\mathrm{parent2}}\right]$$
where $[\cdot]$ denotes rounding to the nearest integer. After crossover, the resulting delimiters are checked for boundary validity and sorted in ascending order to ensure a feasible encoding.
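The two crossover layers can be sketched as follows. The target-sequence operator fills the empty positions from left to right, which is one reading of "in order" (classical OX variants often start filling after the second cut point). The delimiter operator rounds half up and clips to a permissible range supplied by the caller. The function names, the clipping bounds, and the handling of duplicate delimiters (left untouched here) are assumptions.

```python
import math


def order_crossover(parent1, parent2, r1, r2):
    """OX on the target sequence; r1 < r2 are 1-based inclusive cut positions."""
    n = len(parent1)
    child = [None] * n
    child[r1 - 1:r2] = parent1[r1 - 1:r2]          # copy the segment from Parent 1
    copied = set(parent1[r1 - 1:r2])
    fill = iter(g for g in parent2 if g not in copied)
    for pos in range(n):                           # fill the gaps in Parent 2 order
        if child[pos] is None:
            child[pos] = next(fill)
    return child


def delimiter_crossover(s1, s2, lam, lo, hi):
    """Arithmetic crossover on delimiters with rounding, clipping, and sorting."""
    blended = [math.floor(lam * a + (1.0 - lam) * b + 0.5) for a, b in zip(s1, s2)]
    clipped = [min(max(s, lo), hi) for s in blended]
    return sorted(clipped)


print(order_crossover([1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4], 3, 5))
print(delimiter_crossover([3, 7], [5, 8], 0.5, 2, 9))
```

Because the copied segment and the fill values are disjoint and together cover all targets, the offspring is always a valid permutation.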

4.1.5. Adaptive Mutation Strategy

The mutation operation provides a further key mechanism for maintaining population diversity and helping the algorithm escape local optima. Corresponding mutation strategies are designed for the target visiting sequence and the delimiter sequence separately.
(1)
Mutation for the Target Sequence
Four mutation operators are defined for the target sequence:
  • Swap Mutation: Randomly select two distinct positions in the sequence and exchange the targets located at these positions.
  • Inversion Mutation: Randomly select a contiguous subsequence and reverse the order of targets within it.
  • Insertion Mutation: Randomly select a target, remove it from its current position, and insert it into another randomly chosen position in the sequence.
  • Scramble Mutation: Randomly choose a contiguous subsequence and randomly permute the order of targets inside it.
During each mutation step, one of the above operators is selected at random according to a predefined probability distribution and applied to the individual.
(2)
Mutation for the Delimiter Sequence
For the delimiter sequence, a mutation strategy based on probabilistic perturbation is employed. In each generation, a chosen delimiter is perturbed with probability $p_m(t)$ by applying a small, discrete offset:
$$s_{\mathrm{new}} = s_{\mathrm{old}} + \Delta s, \quad \Delta s \in \{-3, -2, -1, 1, 2, 3\}$$
where $\Delta s$ is an integer perturbation step selected uniformly at random. Following mutation, the validity of the new delimiter sequence must be checked: all delimiters must lie within the permissible range $[2, N-1]$ and must remain in strictly ascending order. If the mutated sequence violates any of these conditions, the perturbation is either rejected or reapplied until a valid sequence is obtained.
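The mutation operators for both parts of the chromosome can be sketched as below. The operator-selection weights, the retry limit `max_tries`, and the choice to perturb a single randomly chosen delimiter are assumptions; the text only states that a predefined distribution is used and that invalid perturbations are rejected or reapplied.

```python
import random


def swap(seq, rng):
    s = list(seq)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s


def inversion(seq, rng):
    s = list(seq)
    i, j = sorted(rng.sample(range(len(s)), 2))
    s[i:j + 1] = s[i:j + 1][::-1]
    return s


def insertion(seq, rng):
    s = list(seq)
    i = rng.randrange(len(s))
    gene = s.pop(i)
    j = rng.choice([p for p in range(len(s) + 1) if p != i])  # a different position
    s.insert(j, gene)
    return s


def scramble(seq, rng):
    s = list(seq)
    i, j = sorted(rng.sample(range(len(s)), 2))
    middle = s[i:j + 1]
    rng.shuffle(middle)
    s[i:j + 1] = middle
    return s


OPERATORS = (swap, inversion, insertion, scramble)
OFFSETS = (-3, -2, -1, 1, 2, 3)


def mutate_targets(seq, rng, weights=(0.25, 0.25, 0.25, 0.25)):
    """Apply one operator drawn from a (hypothetical) probability distribution."""
    operator = rng.choices(OPERATORS, weights=weights, k=1)[0]
    return operator(seq, rng)


def is_valid(delims, lo, hi):
    in_range = all(lo <= s <= hi for s in delims)
    ascending = all(a < b for a, b in zip(delims, delims[1:]))
    return in_range and ascending


def mutate_delimiters(delims, num_targets, p_m, rng, max_tries=20):
    """Perturb one delimiter with probability p_m; retry, then reject, if invalid."""
    if not delims or rng.random() >= p_m:
        return list(delims)
    for _ in range(max_tries):
        trial = list(delims)
        trial[rng.randrange(len(trial))] += rng.choice(OFFSETS)
        if is_valid(trial, 2, num_targets - 1):
            return trial
    return list(delims)
```

All four target operators return a permutation of the input, so the first part of the chromosome never needs repair after mutation.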

4.1.6. Elitism and Local Search

(1)
Elitism Strategy
The elitism strategy is a fundamental mechanism for ensuring convergence of the algorithm. In each generation, the $N_{\mathrm{elite}}$ individuals with the highest fitness are preserved and directly carried over to the next generation without undergoing crossover or mutation. This prevents the loss of high-quality genetic material due to random genetic operations.
Furthermore, this work maintains a global elite solution set that records the historically best $N_{\mathrm{elite}}$ solutions found during the entire evolutionary process. This set not only provides a basis for analyzing algorithm performance but can also serve as a reference for potential restart strategies, thereby enhancing the algorithm's ability to escape local optima.
(2)
Periodic Local Search
To enhance the algorithm’s local refinement ability, a periodic local search strategy is employed. Every G 1 generations, a two-opt local search is applied on a subset of high-fitness individuals in the current population. The procedure is described below:
Step 1:
Choose several top-ranked assignment schemes from the population based on fitness as initial solutions for the local search.
Step 2:
For each selected individual, randomly pick two distinct positions i and j ( i < j ) and reverse the subsequence between them in the target visiting sequence.
Step 3:
Compute the fitness of the newly generated sequence. If an improvement is observed, accept the change by replacing the original sequence; otherwise, retain the original.
Step 4:
Repeat Steps 2 and 3 until a predefined maximum number of local-search iterations is reached.
The two-opt operation effectively removes crossing edges within a path, which is particularly beneficial for improving solution quality in Traveling-Salesman-Problem-like tasks. By periodically integrating this local search, the algorithm achieves a dynamic balance between global exploration and local optimization, thereby improving overall solution efficiency and accuracy.
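The local search loop in Steps 2 to 4 can be sketched as follows for a single individual. In the actual method the fitness would come from the lower-level trajectory optimization; here a generic `cost` callable (lower is better) stands in for it, and the path-length cost in the usage example is purely a toy assumption.

```python
import random


def two_opt_search(sequence, cost, iterations, rng):
    """Randomized two-opt: reverse a random subsequence and keep improvements."""
    best = list(sequence)
    best_cost = cost(best)
    for _ in range(iterations):
        i, j = sorted(rng.sample(range(len(best)), 2))
        trial = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        trial_cost = cost(trial)
        if trial_cost < best_cost:      # accept only strict improvements
            best, best_cost = trial, trial_cost
    return best, best_cost


def toy_path_length(seq):
    """Toy cost: total distance when visiting points on a line in the given order."""
    return sum(abs(a - b) for a, b in zip(seq, seq[1:]))


rng = random.Random(0)
route, length = two_opt_search([0, 3, 2, 1, 4], toy_path_length, 200, rng)
print(route, length)
```

Because only improving reversals are accepted, the returned cost can never exceed that of the starting sequence, at the price of possibly stopping in a local optimum.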

4.1.7. Adaptive Parameter Control and Restart Strategy

Effective genetic algorithms must balance exploration and exploitation. Exploration, largely driven by mutation, aims to discover new regions of the solution space, while exploitation, led primarily by crossover, focuses on refining searches around known high-quality solutions. The performance of the algorithm critically depends on parameters such as the crossover and mutation probabilities. Therefore, an adaptive parameter control mechanism is designed based on the monitoring of population state. Denote the current iteration count as $G_2$, and introduce the following two metrics to characterize the algorithm's current state.
First, the convergence state is assessed by computing the average improvement of the best fitness over the most recent $\hat{G}_2$ generations:
$$\Delta f_{\mathrm{best}}(G_2) = \frac{1}{\hat{G}_2} \sum_{k = G_2 - \hat{G}_2}^{G_2} \left(f_{\mathrm{best}}^{k} - f_{\mathrm{best}}^{G_2}\right)$$
where $f_{\mathrm{best}}^{k}$ represents the best fitness value found at generation $k$. A negative value of $\Delta f_{\mathrm{best}}(G_2)$ indicates that solution quality has improved in recent iterations.
Second, population diversity is quantified by the average pairwise distance between individuals:
$$D_{\mathrm{pop}}(G_2) = \frac{1}{N_{\mathrm{pop}} \left(N_{\mathrm{pop}} - 1\right)} \sum_{i=1}^{N_{\mathrm{pop}} - 1} \sum_{j=i+1}^{N_{\mathrm{pop}}} D(\pi_i, \pi_j)$$
where $D(\pi_i, \pi_j)$ denotes the distance measure between individuals $\pi_i$ and $\pi_j$.
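Both monitoring metrics can be computed from a history of best fitness values and the current population, as sketched below. The convergence measure is read here as the mean of $f_{\mathrm{best}}^{k} - f_{\mathrm{best}}^{G_2}$ over the recent window, normalized by the window length, and the diversity measure uses the $1/(N_{\mathrm{pop}}(N_{\mathrm{pop}}-1))$ normalization. The position-wise mismatch distance in the example is a stand-in assumption, not the paper's distance $D$.

```python
def convergence_metric(best_history, window):
    """Average of (f_best at generation k) - (current f_best) over the last window."""
    if len(best_history) < window + 1:
        raise ValueError("need at least window + 1 recorded generations")
    current = best_history[-1]
    recent = best_history[-(window + 1):]
    return sum(f - current for f in recent) / window


def population_diversity(population, distance):
    """Sum of pairwise distances over i < j, divided by N_pop * (N_pop - 1)."""
    n = len(population)
    total = sum(
        distance(population[i], population[j])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1))


def mismatch_distance(a, b):
    """Stand-in distance: fraction of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


print(convergence_metric([1.0, 2.0, 3.0, 5.0], 2))  # -2.5
print(population_diversity([[1, 2, 3], [1, 2, 3], [3, 2, 1]], mismatch_distance))
```

A value of zero for the convergence metric means the best fitness has not changed over the window, which is the situation the stagnation test below is meant to detect.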
Based on the convergence and diversity metrics introduced above, the following adaptive parameter adjustment strategy is formulated:
(1)
Stalled Convergence and Low Diversity: If $\left|\Delta f_{\mathrm{best}}(G_2)\right| < \delta_{f,\min}$ and $D_{\mathrm{pop}}(G_2) < \delta_D$, the search is likely stagnating in a local optimum with reduced diversity. In this case, the mutation probability $p_m$ is increased and the crossover probability $p_c$ is decreased to encourage broader exploration.
(2)
Satisfactory Convergence Progress: If $\Delta f_{\mathrm{best}}(G_2) < -\delta_{f,\max}$, the population is of good quality and refinement is ongoing. Here, the mutation probability $p_m$ is decreased and the crossover probability $p_c$ is increased to promote the exploitation and recombination of promising solution structures, preventing excessive disruption from mutation.
When the algorithm is detected to be trapped in a local optimum (e.g., the best fitness shows no significant improvement over multiple consecutive generations), a restart strategy is triggered. This strategy preserves the elite solutions to retain the search progress while reinitializing the remainder of the population with certain probability. This injects new genetic material and helps the algorithm escape stagnation.
Through this adaptive control and restart mechanism, the search strategy is adjusted intelligently based on the real-time state, dynamically balancing exploration and exploitation. This approach effectively alleviates premature convergence and improves the likelihood of locating the global optimum.

4.1.8. Algorithm Flow

Based on the aforementioned designs and enhancements to the genetic algorithm, the complete procedure for solving the upper-level optimization model is summarized as follows:
Step 1:
Initialization. Set the algorithm parameters: population size $N_{\mathrm{pop}}$, maximum generations $G_{\max}$, crossover probability $p_c$, and mutation probability $p_m$. Generate an initial population that satisfies the target-type matching constraints.
Step 2:
Evaluation and Reserve Set Construction. Compute the fitness and constraint violation for all individuals in the current population. Following the update mechanisms designed in this paper, construct and maintain the feasible reserve set $\Gamma_F$ and the infeasible reserve set $\Gamma_I$.
Step 3:
Crossover Operation. Select individuals from the parent population and perform crossover: apply order crossover to the target sequence and arithmetic crossover to the delimiter sequence, thereby generating new offspring.
Step 4:
Mutation Operation. For each offspring, apply mutation with probability $p_m$: on the target sequence, randomly execute one operator among swap, inversion, insertion, and scramble; on the delimiter sequence, apply a small perturbation mutation.
Step 5:
Environmental Selection and Next-Generation Formation. Combine the parent and offspring individuals, evaluate their fitness, and form the next-generation population through elitism and fitness-based selection.
Step 6:
Periodic Local Search. Every $G_1$ generations, perform a two-opt local search on the higher-fitness individuals in the population to refine their target sequences.
Step 7:
Adaptive Parameter Adjustment. Every $G_2$ generations, dynamically adjust the crossover probability $p_c$ and mutation probability $p_m$ according to the recent convergence measure $\Delta f_{\mathrm{best}}$ and the population diversity $D_{\mathrm{pop}}$.
Step 8:
Restart Strategy. Every $G_3$ generations, check whether the algorithm is stagnating in a local optimum. If so, preserve the elite individuals and reinitialize a portion of the remaining population to introduce new search directions.
Step 9:
Termination Check. If the current generation count reaches $G_{\max}$, terminate the algorithm and output the best solution from the feasible reserve set as the final assignment scheme; otherwise, return to Step 2 and continue.
The flowchart of the proposed algorithm is shown in Figure 3.
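The permutation operators named in Steps 3, 4, and 6 can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it covers only the target sequence (the delimiter sequence and its arithmetic crossover are omitted because the encoding details are not given in this passage), the order crossover uses the common wrap-around fill variant, and the two-opt routine evaluates an open path under a caller-supplied cost. All function names are hypothetical.

```python
import math
import random


def order_crossover(p1, p2, i, j):
    """Order crossover (OX): copy p1[i:j], fill the rest in p2's order starting at j."""
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    donors = [p2[(j + k) % n] for k in range(n) if p2[(j + k) % n] not in kept]
    slots = [(j + k) % n for k in range(n) if child[(j + k) % n] is None]
    for pos, gene in zip(slots, donors):
        child[pos] = gene
    return child


def mutate(seq, rng=random):
    """Apply one randomly chosen operator: swap, inversion, insertion, or scramble."""
    s = list(seq)
    i, j = sorted(rng.sample(range(len(s)), 2))
    op = rng.choice(["swap", "inversion", "insertion", "scramble"])
    if op == "swap":
        s[i], s[j] = s[j], s[i]
    elif op == "inversion":
        s[i:j + 1] = s[i:j + 1][::-1]
    elif op == "insertion":
        s.insert(i, s.pop(j))
    else:
        segment = s[i:j + 1]
        rng.shuffle(segment)
        s[i:j + 1] = segment
    return s


def path_length(seq, points):
    """Length of the open path visiting `points` in the order given by `seq`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(seq, seq[1:]))


def two_opt(seq, cost):
    """Two-opt local search: reverse segments while doing so lowers the cost."""
    best, best_cost = list(seq), cost(seq)
    improved = True
    while improved:
        improved = False
        for a in range(len(best) - 1):
            for b in range(a + 2, len(best) + 1):
                candidate = best[:a] + best[a:b][::-1] + best[b:]
                c = cost(candidate)
                if c < best_cost - 1e-12:
                    best, best_cost, improved = candidate, c, True
    return best, best_cost
```

All three operators return valid permutations, so no repair step is needed on the target sequence after crossover or mutation.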

4.2. Algorithm Effectiveness Analysis

To validate the effectiveness of the proposed improved genetic algorithm, this section conducts comparative experiments against a standard genetic algorithm. The experimental setup is as follows: the kinematic constraints of the unmanned systems are temporarily disregarded, and all systems are assumed to move at a constant speed. The scenario involves four unmanned systems and 50 randomly located targets, with each system capable of engaging at most 15 targets.
The comparative results, presented in Table 1 and Figure 4, demonstrate that the proposed IGA achieves significant overall performance advantages in solving the unmanned swarm cooperative task assignment problem. A quantitative analysis of the metrics reveals the following:
  • The average fitness obtained by the improved genetic algorithm over 300 independent runs is 446.76, notably superior to the 539.12 achieved by the standard GA, indicating higher overall solution quality.
  • The best fitness found by the improved genetic algorithm (413.02) outperforms that of the standard genetic algorithm (437.36).
  • The worst fitness recorded for the improved genetic algorithm (457.52) is also considerably lower than the worst-case value for the standard genetic algorithm (654.12).
  • Critically, the variance of the improved genetic algorithm’s fitness is 7.91, compared to a significantly higher variance of 39.18 for the standard genetic algorithm. This indicates that the performance of the improved algorithm is stable and exhibits low fluctuation across multiple runs, confirming its strong robustness.
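For reference, the relative gaps implied by the reported values (lower fitness is better) can be worked out directly from the figures quoted above; the percentages are rounded to one decimal place:

$$\frac{539.12 - 446.76}{539.12} \approx 17.1\% \ \text{(average)}, \qquad \frac{437.36 - 413.02}{437.36} \approx 5.6\% \ \text{(best)}, \qquad \frac{654.12 - 457.52}{654.12} \approx 30.1\% \ \text{(worst)}$$

The gap is smallest for the best-case run and largest for the worst-case run, which is consistent with the improved algorithm's main benefit being run-to-run stability.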
Further visualization of the path planning results reveals that the improved algorithm also demonstrates clear advantages in the spatial distribution and coordination of the solutions. As shown in Figure 4d, even in its worst-case solution, the four UAV trajectories generated by the improved algorithm are relatively balanced in spatial distribution. Each trajectory remains coherent without redundant coverage, reflecting effective task assignment and coordinated path planning.
In contrast, the worst-case solution produced by the traditional algorithm (Figure 4c) exhibits noticeable detours, path crossings, and repeated coverage, particularly in target-dense areas. This results in a significant increase in the total path length and an uneven workload distribution among the unmanned systems. These observations suggest that the traditional algorithm is more prone to becoming trapped in local optima under complex constraints and lacks sufficient global search capability.
This visual comparison supports the quantitative findings in Table 1, specifically the higher variance and the larger gap between the worst and best solutions for the traditional algorithm. Both indicate that the performance of the traditional algorithm is more sensitive to the initial population and random operations, leading to greater fluctuations in solution quality.
In summary, the proposed improved genetic algorithm demonstrates significant superiority over the traditional genetic algorithm in solving the unmanned swarm cooperative task assignment problem, in terms of solution quality, algorithmic stability, and the spatial rationality of the planned trajectories. The experimental results validate the effectiveness and advantages of the proposed algorithm from both statistical and visual perspectives, providing a theoretical foundation for its application in dynamic, large-scale practical scenarios.

5. Simulation Verification

This section verifies the proposed method through simulation experiments. First, a simplified scenario is constructed to analyze and demonstrate the innovation and advantages of the proposed integrated “decision-control” approach by comparing it with traditional task assignment approaches. Subsequently, the scenario is extended to a more complex case involving multiple unmanned systems and multiple targets to verify the practical applicability and scalability of the proposed method.
All simulation experiments were conducted on a desktop computer equipped with an Intel Core i7-14700KF processor (3.4 GHz base frequency, 20 cores, 28 threads), running the Windows 11 operating system. The algorithms were implemented and executed in MATLAB R2023b.

5.1. Analysis of Method Innovation

To validate the superiority of the integrated “decision-control” model for unmanned swarm cooperative task assignment under kinematic constraints, this subsection presents a comparative analysis against the traditional static task assignment model based on Euclidean distance. In the traditional model, assignment decisions are made exclusively based on the straight-line distance between unmanned systems and targets. This approach typically assumes that all systems travel at constant maximum speed, thereby neglecting the actual kinematic constraints and the trajectory optimization process.
This subsection considers a scenario in which two unmanned systems are tasked with engaging multiple targets. The specific locations of the targets are provided in Table 2. The kinematic parameters for the UAVs are set as follows: speed range $[10, 50]$ m/s, acceleration range $[-5, 5]$ m/s², and heading angular rate range $[-0.1, 0.1]$ rad/s.
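To make the role of these bounds concrete, the sketch below advances a planar point-mass model by one forward-Euler step while saturating the controls and the speed. It assumes the common kinematic model with state (x, y, v, ψ), acceleration and heading rate as inputs, and symmetric acceleration and heading-rate limits; the model form, the `Limits` class, and the `step` function are illustrative assumptions, not the paper's lower-level solver.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    v_min: float = 10.0      # m/s, minimum speed
    v_max: float = 50.0      # m/s, maximum speed
    a_max: float = 5.0       # m/s^2, symmetric acceleration bound (assumed)
    omega_max: float = 0.1   # rad/s, symmetric heading-rate bound (assumed)


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def step(state, a_cmd, omega_cmd, dt, lim=Limits()):
    """One forward-Euler step of x' = v cos(psi), y' = v sin(psi), v' = a, psi' = omega."""
    x, y, v, psi = state
    a = clip(a_cmd, -lim.a_max, lim.a_max)
    omega = clip(omega_cmd, -lim.omega_max, lim.omega_max)
    x_next = x + v * math.cos(psi) * dt
    y_next = y + v * math.sin(psi) * dt
    v_next = clip(v + a * dt, lim.v_min, lim.v_max)
    psi_next = psi + omega * dt
    return (x_next, y_next, v_next, psi_next)
```

For example, commanding an acceleration of 100 m/s² and a heading rate of 1 rad/s from 10 m/s yields only a 5 m/s speed gain and a 0.1 rad heading change after one second, which is why sharp waypoint sequences cost time.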
By comparing the total mission completion time and the total path cost obtained by the two methods in this scenario, the performance limitations of the traditional method, which arise from its disregard for kinematic constraints and trajectory optimization, are clearly revealed. This comparison serves to demonstrate the necessity and advancement of the integrated modeling and optimization approach proposed in this paper.

5.1.1. Results of the Traditional Task Assignment Method

The traditional task assignment method makes decisions based solely on Euclidean distance. The resulting shortest straight-line paths are shown in Figure 5a. While this assignment appears geometrically efficient, the actual flight trajectories under realistic kinematic constraints (Figure 5b) require frequent heading and speed adjustments. This leads to a significant increase in mission completion time. Specifically, the calculated mission completion times for UAV 1 and UAV 2 are 94.03 s and 100.63 s, respectively.

5.1.2. Results of the Integrated “Decision-Control” Method

The task assignment scheme and the corresponding motion trajectories obtained by the integrated model proposed in this paper are shown in Figure 6. Although the total straight-line path length of this scheme is longer than that of the traditional method (detailed comparisons are presented in Table 3), the actual mission completion times for both UAVs, while satisfying all kinematic constraints, are significantly reduced to merely 66.07 s. This contrast clearly demonstrates that for a dynamic system, minimal spatial distance does not guarantee optimal temporal efficiency, and that the impact of kinematic constraints on overall performance is crucial. By co-optimizing task assignment and trajectory generation, the integrated “decision-control” approach achieves superior performance in the time domain.

5.1.3. Comparative Analysis of Results

The key metrics of the task assignment schemes obtained by the two methods are compared in Table 3.
Analysis of the assignment schemes and trajectory morphology reveals distinct strategic differences. The traditional method adopts a symmetrical region-partitioning strategy (Figure 5a), assigning targets on the left and right sides to UAV 1 and UAV 2, respectively. While this scheme appears balanced and reasonable when kinematic constraints are ignored, the actual trajectories under constraints (Figure 5b) exhibit frequent, sharp turns for each UAV, which substantially increases the time cost.
In contrast, the integrated model employs a cross-region, interleaved assignment strategy (Figure 6). As indicated in Table 3, this strategy breaks the simple spatial partition, resulting in more spatially dispersed target sequences for each UAV and consequently generating smoother overall trajectories. Its core advantage lies in significantly reducing the cumulative turning demand imposed on any single UAV within a localized area. The essence of this strategy is the implicit consideration of kinematic feasibility at the decision-making stage, thereby achieving a more balanced redistribution of the maneuvering load across the entire swarm.
From the perspective of the underlying optimization mechanism, the traditional decoupled approach fails to account for the dynamic characteristics of the unmanned systems during the task assignment phase. Consequently, the subsequent trajectory planning phase is constrained to passively adapt to the predetermined target sequence, offering limited scope for genuine optimization.
In contrast, the integrated model proposed in this paper employs a bi-level optimization architecture. Its critical mechanism is the direct incorporation of performance feedback from the low-level motion planner into the high-level task assignment decision. This enables the assignment decision to prospectively evaluate the achievable trajectory quality under different target sequences, as evidenced by the smoother trajectories shown in Figure 6. Therefore, the model actively selects target combinations that, while potentially longer in geometric distance, are more favorable for flight maneuverability and yield better overall completion time. This embodies the core design philosophy of “assignment as planning”, where the assignment decision inherently incorporates a pre-assessment of planning feasibility and expected performance.
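The feedback loop between the two levels can be sketched as an upper-level fitness that queries a lower-level time estimate for every candidate sequence. In the sketch below, `turn_penalized_time` is a deliberately crude stand-in for the lower-level optimal control solution (constant speed plus the time needed to rotate through each heading change at the maximum rate); it is not the Radau pseudospectral solver, and all names and default values are hypothetical.

```python
import math


def turn_penalized_time(start, waypoints, speed=50.0, omega_max=0.1):
    """Stand-in lower level: straight-line travel time plus heading-change time."""
    total, pos, heading = 0.0, start, None
    for wp in waypoints:
        dx, dy = wp[0] - pos[0], wp[1] - pos[1]
        total += math.hypot(dx, dy) / speed
        new_heading = math.atan2(dy, dx)
        if heading is not None:
            # Smallest absolute heading change, wrapped to [0, pi].
            dpsi = abs((new_heading - heading + math.pi) % (2 * math.pi) - math.pi)
            total += dpsi / omega_max
        pos, heading = wp, new_heading
    return total


def makespan(assignment, starts, targets, lower_level_time=turn_penalized_time):
    """Upper-level fitness: the longest completion time over all systems."""
    return max(
        lower_level_time(starts[u], [targets[t] for t in seq])
        for u, seq in assignment.items()
    )
```

Even with this crude stand-in, a sequence with a shorter straight-line length can score worse than a longer but straighter one, which is the effect the integrated model exploits.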
The simulation results shown in Table 3 confirm that in cooperative engagement scenarios where kinematic constraints are significant, the proposed integrated “decision-control” method effectively captures the coupling relationship between the decision and control layers. Through spatiotemporal joint optimization, it significantly enhances the overall mission execution efficiency of the unmanned swarm.

5.2. Statistical Validation Under Multiple Random Scenarios

To eliminate the potential contingency in method comparison caused by specific target distributions, this subsection extends the experiments of Section 5.1 by conducting comparative tests under multiple randomly generated target locations. The experimental setup is as follows: the initial positions of the two UAVs are fixed, and 30 sets of target points, each containing 12 targets, are randomly generated within the mission area. For each set, the traditional Euclidean-distance-based task assignment method and the proposed bi-level integrated optimization method are applied, and the mission completion time (makespan) is recorded for both methods. To quantify the performance improvement of the integrated method over the traditional one, the time reduction $\Delta t = t_{\mathrm{trad}} - t_{\mathrm{int}}$ and the relative improvement percentage $\left(\Delta t / t_{\mathrm{trad}}\right) \times 100\%$ are calculated.
The detailed results for the 30 random scenarios are summarized in Table 4. Statistical analysis reveals that in 27 out of 30 experiments, the integrated method achieves a shorter mission completion time than the traditional method, while in the remaining three scenarios, the two methods yield identical times. No case is observed where the integrated method performs worse than the traditional one.
Further analysis of the improvement magnitude indicates that the average improvement percentage across all 30 scenarios is 21.31%, and the maximum improvement reaches 41.98% (Scenario 16: traditional 173.67 s → integrated 100.77 s). The standard deviation of improvement percentages is 13.58%, suggesting that the integrated method consistently enhances mission efficiency under various target distributions.
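The two improvement measures are simple to reproduce. The short sketch below applies them to the Scenario 16 values quoted above; the function name is hypothetical.

```python
def improvement(t_trad, t_int):
    """Return the time reduction (s) and the relative improvement (%)."""
    delta_t = t_trad - t_int
    return delta_t, 100.0 * delta_t / t_trad


delta_t, percent = improvement(173.67, 100.77)
print(f"{delta_t:.2f} s, {percent:.2f} %")  # prints "72.90 s, 41.98 %"
```

When the two methods return the same makespan, as in the three tied scenarios, both measures are zero.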
It is noteworthy that in Scenarios 5, 12, and 30, the target completion times obtained by the two methods are exactly equal. Analysis reveals that when the target distribution is relatively sparse and the relative positions are such that the path sequences planned by the traditional Euclidean-distance-based method inherently satisfy the kinematic feasibility conditions of the UAVs, the integrated method does not need to adjust the task assignment scheme to obtain a kinematically feasible and time-optimal trajectory. In such cases, the traditional solution itself is already kinematically feasible, and the integrated method naturally preserves this optimal solution, demonstrating its adaptive capability in determining whether adjustments are necessary.
The statistical results above provide robust evidence that the proposed integrated decision-control method consistently and significantly reduces mission completion time across different spatial distributions of targets, with statistically significant advantages and robustness.

5.3. Practicality Analysis of the Method

To evaluate the practicality of the proposed method, this section considers a cooperative engagement scenario involving a heterogeneous unmanned system swarm. The swarm comprises six unmanned systems of three different types: two unmanned aerial vehicles (UAVs), two unmanned surface vehicles (USVs), and two unmanned underwater vehicles (UUVs). They are tasked with cooperatively engaging 40 targets distributed across the operational area. The targets are classified into two types based on their operational environment: surface targets and underwater targets. Each platform type operates under unique physical constraints. Certain targets are only accessible to specific platform types. The decision layer must therefore incorporate target-platform compatibility constraints, adding complexity to the assignment problem.
The performance parameters of the unmanned systems are listed in Table 5, and the positions and essential characteristics of the targets are provided in Table 6. Notably, the UAVs must operate above a minimum airspeed to maintain flight and possess superior maximum speed and acceleration compared to the USVs and UUVs.
For clarity, surface targets are labeled as Type 1 and underwater targets as Type 2. Verifying the method’s performance in this complex, constrained scenario effectively demonstrates its practical utility for real-world heterogeneous swarm operations.

5.3.1. Analysis of Task Assignment Results

Based on the simulation using the unmanned system parameters from Table 5 and the target information from Table 6, the resulting cooperative task assignment scheme and execution sequence are summarized in Table 7.
The analysis shows that the proposed bi-level planning algorithm successfully assigned one unmanned system to each of the 40 targets, with every target assigned only once. This satisfies the fundamental requirements of completeness and uniqueness for the task assignment. Furthermore, the number of targets assigned to each unmanned system does not exceed its maximum capacity constraint. Specifically, Unmanned System 1 is assigned nine targets, while the remaining systems are assigned between four and eight targets, indicating the algorithm’s effective consideration of workload balancing. The total completion time for the swarm is dictated by the system with the longest execution time, namely Unmanned System 3 at 532.18 s. This outcome aligns with the typical bottleneck effect observed in cooperative multi-agent missions.
Figure 7 displays the motion trajectories of the six unmanned systems. Spatially, the task assignment demonstrates distinct type-matching characteristics: the two UAVs (Unmanned Systems 1 and 2) are assigned exclusively to surface targets (Type 1), the two UUVs (Unmanned Systems 5 and 6) are assigned solely to underwater targets (Type 2), and the USVs (Unmanned Systems 3 and 4) are allocated a combination of both target types. This assignment strategy fully leverages the inherent operational strengths of each platform: the high speed and agility of UAVs make them suitable for surface targets; UUVs are specialized for the underwater domain; and USVs act as flexible assets capable of engaging both surface and underwater targets.
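The properties discussed above (completeness, uniqueness, capacity, and type matching) can be checked mechanically for any candidate assignment. The sketch below is illustrative: the compatibility map is inferred from the assignment pattern described here (UAVs to Type 1, UUVs to Type 2, USVs to both) and is an assumption, as are the function and argument names.

```python
from collections import Counter

# Assumed platform-to-target-type compatibility (Type 1: surface, Type 2: underwater).
COMPATIBLE = {"UAV": {1}, "USV": {1, 2}, "UUV": {2}}


def check_assignment(assignment, platform_type, target_type, capacity):
    """Return a list of constraint violations; an empty list means feasible."""
    errors = []
    counts = Counter(t for seq in assignment.values() for t in seq)
    for t in target_type:
        if counts[t] == 0:
            errors.append(f"target {t} is unassigned")
        elif counts[t] > 1:
            errors.append(f"target {t} is assigned {counts[t]} times")
    for u, seq in assignment.items():
        if len(seq) > capacity[u]:
            errors.append(f"system {u} exceeds its capacity")
        for t in seq:
            if target_type[t] not in COMPATIBLE[platform_type[u]]:
                errors.append(f"system {u} cannot engage target {t}")
    return errors
```

A check of this kind can also supply the constraint-violation measure that separates feasible from infeasible individuals during the genetic search.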
Notably, the generated trajectories are smooth, continuous curves rather than simple straight-line segments. This confirms that the algorithm successfully integrates the kinematic constraints of the unmanned systems, such as acceleration and heading rate limits, into the trajectory optimization process. Consequently, the resulting paths are not only physically feasible but also optimized for time efficiency, reflecting the effective coupling between decision-level assignment and control-level execution in the proposed bi-level framework.

5.3.2. Analysis of Unmanned System Motion States

To further verify the compliance of each unmanned system’s motion with its dynamic constraints, Figure 8 presents the detailed time histories of velocity, acceleration, and heading angular rate.
Analysis of UAV Motion States: An analysis of the UAV states in Figure 8a,b shows that both UAVs rapidly accelerate to a speed close to their upper limit (50 m/s) during the initial phase and maintain this high speed throughout most of the subsequent cruise segments. This behavior directly reflects the algorithm’s time-minimization objective. Upon reaching cruise speed, their acceleration remains near zero, while the heading angular rate is adjusted dynamically according to waypoint transitions. This motion pattern of “high-speed cruising with heading adjustments” is fundamentally shaped by the stall constraint inherent to fixed-wing UAVs, which mandates a minimum speed of 10 m/s to maintain aerodynamic stability. This physical limitation prevents them from performing significant deceleration near targets; instead, the algorithm relies on heading angular rate adjustments to execute turns. By preserving momentum and avoiding speed loss, UAVs fully leverage their speed advantage to minimize global mission completion time. This demonstrates that the algorithm prioritizes sustained high-speed performance for platforms where speed is the primary driver of mission efficiency.
Analysis of USV Motion States: The motion states of the USVs (Figure 8c,d) generally follow a pattern similar to that of the UAVs. However, their cruise speed remains at a lower level (approximately 30 m/s), and the corresponding variations in acceleration are also smaller, which aligns with their specified performance parameters in Table 5.
Analysis of UUV Motion States: For the UUVs, illustrated in Figure 8e,f, the overall state curves are more gradual, reflecting their inherently low-speed and stable dynamic characteristics. A detailed analysis reveals that UUVs exhibit a distinct “deceleration-turn-acceleration” maneuver pattern when approaching targets, which differs notably from the strategy of UAVs. This behavior is clearly demonstrated by the synchronized, periodic dips in the velocity curve and the corresponding pulsed peaks in the heading angular rate curve. The underlying reason lies in the UUVs’ operational flexibility: unlike UAVs, they are not constrained by a minimum stall speed (their lower speed bound is 0 m/s), which enables active deceleration to facilitate sharper turns. From a kinematic perspective, this “slow-down-to-turn-sharply” approach allows for planning local trajectories with higher curvature, and consequently shorter path lengths, in the vicinity of target points. Although each deceleration incurs a temporary speed loss, the reduction in trajectory length, combined with more efficient turning, ultimately contributes to a decrease in the total travel time from the current waypoint to the next target. This illustrates the algorithm’s adaptive optimization tailored to platforms where agility can compensate for lower speed: for UUVs, shortening the local path through flexible deceleration and tighter turns becomes a more effective strategy than maintaining constant high velocity.
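The link between speed and turning sharpness follows from the relation between speed, heading rate, and turn radius in planar motion. As a worked illustration, using the heading-rate bound of 0.1 rad/s and the 10 to 50 m/s speed range quoted for the UAVs in Section 5.1 (the UUV values in Table 5 are not reproduced here):

$$R_{\min} = \frac{v}{\omega_{\max}}, \qquad R_{\min}\big|_{v = 50\ \mathrm{m/s}} = \frac{50}{0.1} = 500\ \mathrm{m}, \qquad R_{\min}\big|_{v = 10\ \mathrm{m/s}} = \frac{10}{0.1} = 100\ \mathrm{m}$$

A platform that cannot slow below 10 m/s therefore cannot turn tighter than 100 m at this heading-rate bound, whereas a platform whose speed may approach zero can make its turn radius arbitrarily small, which is the mechanism behind the deceleration-turn-acceleration pattern.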
Moreover, the state curves for all platform types show that significant changes in heading angular rate occur primarily near the target points, while remaining relatively steady during transits between waypoints, presenting an overall intermittent, pulse-like profile. This observation is fully consistent with the behavior visible in the trajectory plot (Figure 7), where heading adjustments are made just before the unmanned systems reach their assigned targets.
In summary, the distinct motion strategies observed across platforms arise from the algorithm’s ability to internalize their unique dynamic constraints within the bi-level optimization framework. UAVs, constrained by stall limits, adopt high-speed cruising to minimize inter-target transit time. UUVs, leveraging the absence of minimum speed restrictions, employ localized deceleration to enhance turning agility, compensating for their lower speed through path optimization. USVs adopt a balanced intermediate strategy suited to their hybrid operational role at the air–sea interface. These platform-adaptive behaviors collectively ensure that the swarm achieves global mission efficiency—not by enforcing a uniform motion pattern, but by orchestrating heterogeneous capabilities in a complementary manner. The results confirm that the proposed integrated “decision-control” paradigm effectively translates kinematic constraints into operationally meaningful trajectories, enhancing both individual feasibility and collective mission performance.

6. Conclusions

6.1. Contributions

This paper addresses the inherent challenges in cooperative task assignment for cross-domain unmanned swarms, specifically the decoupling of decision-making from execution and the neglect of platform-specific kinematic constraints. An integrated “decision-control” cooperative task assignment method based on a bi-level optimization framework is proposed. The main contributions are systematically articulated in the following three dimensions:
(1)
“Decision-Control” Integrated Modeling Paradigm: A tightly coupled bi-level programming model is established, wherein the upper-level task assignment optimizes based on the time cost fed back from the lower-level optimal control solution. This paradigm ensures the inherent executability of the assignment schemes, achieving closed-loop optimization between decision-making and execution.
(2)
Bi-Level Optimization Solution Framework: A dedicated solution framework is developed for the proposed complex model. At the lower level, differential flatness theory and the Radau pseudospectral method are synergistically employed to transform the continuous-time optimal control problem into a tractable nonlinear programming problem. At the upper level, an enhanced genetic algorithm is designed, integrating hybrid encoding, a dual-archive elitism preservation strategy, adaptive operators, and periodic local search to efficiently solve the combinatorial optimization problem, balancing global exploration and local refinement.
(3)
Heterogeneous Platform Cooperative Validation: Simulation results rigorously validate the effectiveness of the proposed method. Comparative studies demonstrate a significant reduction in mission makespan compared to traditional Euclidean-distance-based approaches. Furthermore, in a complex heterogeneous scenario involving UAVs, USVs, and UUVs, the method autonomously generates smooth, kinematically feasible trajectories for each platform type while achieving a balanced task load, substantiating its practicality and superiority.

6.2. Future Work

Given the computational complexity of the proposed bi-level framework, which currently renders it suitable for offline mission planning, future research will focus on enabling real-time applicability through the following specific directions:
(1)
Dynamic Online Re-planning: Investigate incremental replanning strategies that adjust only the portions of the assignment and trajectories affected by environmental changes. This will be combined with surrogate models (e.g., neural networks) to rapidly approximate the lower-level time-optimal control cost and parallel computing techniques to accelerate individual evaluations within the genetic algorithm, thereby enhancing responsiveness to unforeseen events.
(2)
Distributed Solving Architecture: Design a distributed architecture based on consensus-based auction algorithms and the Alternating Direction Method of Multipliers (ADMM). This will enable individual platforms to solve their own subproblems independently and achieve global coordination through limited communication, thereby overcoming the computational bottlenecks of centralized methods and supporting cooperative mission planning for ultra-large-scale swarms.
(3)
Robust Multi-Objective Optimization: Develop a multi-objective bi-level optimization model that considers performance criteria beyond mission time, such as energy consumption and operational risk. Surrogate models will be leveraged to aid multi-objective evolutionary search, while uncertainty quantification methods will be incorporated to enhance the robustness of solutions against parameter perturbations in dynamic environments.
Advancing research in these directions is expected to progressively enhance the real-time responsiveness, robustness, and scalability of the proposed integrated “decision-control” paradigm, laying a theoretical foundation for its deployment in practical, large-scale cross-domain unmanned swarm operations.

Author Contributions

Conceptualization, A.Z. and J.Z.; methodology, A.Z. and J.Z.; software, A.Z. and X.L.; validation, A.Z., X.L. and J.Z.; writing—original draft preparation, A.Z.; writing—review and editing, X.L., J.Z., Y.X. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, X.; Lu, X.; Chen, W.; Ge, D.; Zhu, J. Research on UAVs reconnaissance task allocation method based on communication preservation. IEEE Trans. Consum. Electron. 2024, 70, 684–695.
  2. He, C.; Dong, Y.; Wang, Z.J. Radio map assisted multi-UAV target searching. IEEE Trans. Wirel. Commun. 2022, 22, 4698–4711.
  3. Lei, T.; Luo, C.; Sellers, T.; Wang, Y.; Liu, L. Multitask allocation framework with spatial dislocation collision avoidance for multiple aerial robots. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 5129–5140.
  4. Sun, L.; Wang, J.; Wang, J.; Lin, L.; Gen, M. Efficient joint deployment of multi-UAVs for target tracking in traffic big data. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7780–7791.
  5. Chang, Z.; Deng, H.; You, L.; Min, G.; Garg, S.; Kaddoum, G. Trajectory design and resource allocation for multi-UAV networks: Deep reinforcement learning approaches. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2940–2951.
  6. Jiang, B.; Wen, G.; Zhou, J.; Zheng, D. Cross-domain cooperative technology of intelligent unmanned swarm systems: Current status and prospects. Strateg. Study Chin. Acad. Eng. 2024, 26, 117–126.
  7. Chakraborty, N.; Akella, S.; Wen, J.T. Coverage of a Planar Point Set with Multiple Robots Subject to Geometric Constraints. IEEE Trans. Autom. Sci. Eng. 2010, 7, 111–122.
  8. Liao, Y.; Chen, X.; Xia, S.; Ai, Q.; Liu, Q. Energy Minimization for UAV Swarm-Enabled Wireless Inland Ship MEC Network with Time Windows. IEEE Trans. Green Commun. Netw. 2023, 7, 594–608.
  9. Han, B.; Leblet, J.; Simon, G. Hard multidimensional multiple choice knapsack problems, an empirical study. Comput. Oper. Res. 2010, 37, 172–181.
  10. Pavone, M.; Frazzoli, E.; Bullo, F. Adaptive and Distributed Algorithms for Vehicle Routing in a Stochastic and Dynamic Environment. IEEE Trans. Autom. Control 2011, 56, 1259–1274.
  11. Ma, T.; Lu, P.; Deng, F.; Geng, K. Air–Ground Collaborative Multi-Target Detection Task Assignment and Path Planning Optimization. Drones 2024, 8, 110.
  12. Jia, K.; Yang, D.; Wang, Y.; Shui, T.; Liu, C. Energy Efficient and Balanced Task Assignment Strategy for Multi-AAV Patrol Inspection System in Mobile Edge Computing Network. IEEE Trans. Netw. Sci. Eng. 2025, 12, 210–222.
  13. Hua, X.; Wang, Z.; Yao, H.; Li, B.; Shi, C.; Zuo, J. Research on many-to-many target assignment for unmanned aerial vehicle swarm in three-dimensional scenarios. Comput. Electr. Eng. 2021, 91, 107067.
  14. Wang, G.; Wang, F.; Wang, J.; Li, M.; Gai, L.; Xu, D. Collaborative target assignment problem for large-scale UAV swarm based on two-stage greedy auction algorithm. Aerosp. Sci. Technol. 2024, 149, 109146.
  15. Wang, F.; Huang, Z.L.; Han, M.C.; Xing, L.; Wang, L. A knee point based coevolution multi-objective particle swarm optimization algorithm for heterogeneous UAV cooperative multi-task allocation. Acta Autom. Sin. 2023, 49, 399–414.
  16. Ruipeng, Z.; Yanxiang, F.; Yikang, Y. Hybrid particle swarm algorithm for multi-UAV cooperative task allocation. Acta Aeronaut. Astronaut. Sin. 2022, 43, 326011.
  17. Huang, Y.; Li, W.; Ning, J.; Li, Z. Formation control for UAV-USVs heterogeneous system with collision avoidance performance. J. Mar. Sci. Eng. 2023, 11, 2332.
  18. Yan, M.; Yuan, H.; Xu, J.; Yu, Y.; Jin, L. Task allocation and route planning of multiple UAVs in a marine environment based on an improved particle swarm optimization algorithm. EURASIP J. Adv. Signal Process. 2021, 1, 94.
  19. Peng, J.; Viswanath, H.; Bera, A. Graph-Based Decentralized Task Allocation for Multi-Robot Target Localization. IEEE Robot. Autom. Lett. 2024, 9, 10676–10683.
  20. Liu, H.; Chen, Q.; Pan, N.; Sun, Y.; Yang, Y. Three-Dimensional Mountain Complex Terrain and Heterogeneous Multi-UAV Cooperative Combat Mission Planning. IEEE Access 2020, 8, 197407–197419.
  21. Jian, D.; Fei, X.; Qifeng, C. Multi-UAV cooperative search on region division and path planning. Acta Aeronaut. Astronaut. Sin. 2020, 41, 723770.
  22. Xiao, P.; Xie, F.; Ni, H.; Zhang, M.; Tang, Z.; Li, N. Research on Collaborative Optimization Method of Multi-UAV Task Allocation and Path Planning. J. Syst. Simul. 2024, 36, 1141–1151.
  23. Wu, W.; Zhang, L.; Le, J.; Lu, Z. Integrated method for multi-UAV task assignment and trajectory planning with deadlock based on Three-dimensional dubins path. Sci. Rep. 2025, 15, 24152.
Figure 1. Algorithm encoding method.
Figure 2. Schematic diagram of the algorithm encoding method.
Figure 3. Flowchart of the proposed algorithm.
Figure 4. Results of the Algorithm Comparison. (a) Best Solution of the Genetic Algorithm; (b) Best Solution of the Improved Genetic Algorithm; (c) Worst Solution of the Genetic Algorithm; (d) Worst Solution of the Improved Genetic Algorithm.
Figure 5. Comparison of task assignment schemes without considering kinematic constraints of unmanned systems. (a) Trajectories under traditional task assignment without kinematic constraints; (b) trajectories under traditional task assignment with kinematic constraints.
Figure 6. Task assignment scheme and motion trajectories obtained by the integrated model.
Figure 7. Trajectories of the unmanned systems.
Figure 8. State Variation Profiles of Unmanned Systems. (a) State Variables of UAV 1; (b) State Variables of UAV 2; (c) State Variables of USV 3; (d) State Variables of USV 4; (e) State Variables of UUV 5; (f) State Variables of UUV 6.
Table 1. Results of the Algorithm Comparison.
|  | Improved Genetic Algorithm | Standard Genetic Algorithm |
| --- | --- | --- |
| Best fitness | 448.87, 450.81, 425.58, 438.01, 413.02, 425.58, 428.07, 435.84, 436.81, 438.80, 440.35, 442.95, 444.14, 439.16, 443.34, 441.11, 451.28, 439.68, 438.01, … | 437.36, 444.33, 458.63, 476.17, 492.87, 498.58, 545.66, 481.49, 518.87, 502.36, 561.19, 497.63, 499.49, 515.01, 542.91, 561.28, 532.19, 547.17, 578.17, … |
| Average | 446.76 | 539.12 |
| Worst | 457.52 | 654.12 |
| Best | 413.02 | 437.36 |
| Variance | 7.91 | 39.18 |
Table 2. Specific locations of the targets.
| Target | x | y | Target | x | y |
| --- | --- | --- | --- | --- | --- |
| Target 1 | −326.00 | 121.00 | Target 7 | −252.00 | 1431.00 |
| Target 2 | −471.00 | 667.00 | Target 8 | −468.00 | 824.00 |
| Target 3 | −287.00 | 590.00 | Target 9 | −252.00 | 931.00 |
| Target 4 | −6.00 | 500.00 | Target 10 | 35.00 | 998.00 |
| Target 5 | 317.00 | 613.00 | Target 11 | 310.00 | 891.00 |
| Target 6 | 442.00 | 267.00 | Target 12 | 438.00 | 1239.00 |
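The "Total Euclidean Path Length" reported for each assignment is a sum of straight-line legs between consecutive targets. The sketch below shows that calculation on the first six targets of Table 2. It is only an illustration: the launch positions of the unmanned systems are not listed in this section, so the optional `start` argument is a hypothetical parameter and the printed value covers the target-to-target legs only, not the full path totals reported in the paper.

```python
import math

# Target coordinates copied from Table 2 (targets 1-6 only, for brevity).
TARGETS = {
    1: (-326.00, 121.00),
    2: (-471.00, 667.00),
    3: (-287.00, 590.00),
    4: (-6.00, 500.00),
    5: (317.00, 613.00),
    6: (442.00, 267.00),
}


def path_length(sequence, targets, start=None):
    """Sum the straight-line leg lengths along `sequence`.

    `start` is an optional (x, y) launch point; it is an assumption of this
    sketch, since the launch positions are not given in this section.
    """
    points = [targets[i] for i in sequence]
    if start is not None:
        points.insert(0, start)
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Length of the legs between targets 1 -> 2 -> ... -> 6, without a launch leg.
inter_target = path_length([1, 2, 3, 4, 5, 6], TARGETS)
print(f"{inter_target:.2f}")
```

Passing a launch point as `start` would add the first leg from the platform to its first target.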
Table 3. Comparison of Results Between the Traditional Method and the Integrated Decision-Control Method.
| Method | Comparison Item | Unmanned System 1 | Unmanned System 2 |
| --- | --- | --- | --- |
| Traditional Task Assignment Method | Assignment Scheme | 1, 2, 3, 4, 5, 6 | 7, 8, 9, 10, 11, 12 |
| Traditional Task Assignment Method | Mission Execution Time | 94.03 s | 100.63 s |
| Traditional Task Assignment Method | Total Euclidean Path Length | 2633.65 | 2617.07 |
| Integrated Decision-Control Method | Assignment Scheme | 1, 2, 9, 10, 11, 6 | 7, 8, 3, 4, 5, 12 |
| Integrated Decision-Control Method | Mission Execution Time | 66.07 s | 66.07 s |
| Integrated Decision-Control Method | Total Euclidean Path Length | 2999.16 | 2986.10 |
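The Table 3 figures can be read as a makespan comparison: the mission is complete only when the slowest system finishes, so the longer Euclidean paths of the integrated method are acceptable as long as the maximum execution time drops. The short sketch below works through that arithmetic with the Table 3 times. Treating the completion time as the maximum over systems is an interpretation, though it matches the 100.63 s and 66.07 s listed for Scenario 1 in Table 4; the variable and function names are illustrative.

```python
# Per-system mission execution times in seconds, copied from Table 3.
traditional = {"Unmanned System 1": 94.03, "Unmanned System 2": 100.63}
integrated = {"Unmanned System 1": 66.07, "Unmanned System 2": 66.07}


def completion_time(times):
    """Mission completion time, taken here as the slowest system's time."""
    return max(times.values())


t_trad = completion_time(traditional)
t_int = completion_time(integrated)

saving = t_trad - t_int       # absolute reduction in seconds
relative = saving / t_trad    # reduction as a fraction of the traditional time

print(f"traditional: {t_trad:.2f} s, integrated: {t_int:.2f} s")
print(f"saving: {saving:.2f} s ({relative:.1%})")
```

Under this reading the integrated scheme also balances the load, since both systems finish at the same time.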
Table 4. Statistical metrics of mission completion times across 30 random scenarios.
| Scenario | Traditional: System 1 sequence | Traditional: System 2 sequence | Traditional: time (s) | Integrated: System 1 sequence | Integrated: System 2 sequence | Integrated: time (s) | Difference (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Scenario 1 | 1, 2, 3, 4, 5, 6 | 7, 8, 9, 10, 11, 12 | 100.63 | 1, 2, 9, 10, 11, 6 | 7, 8, 3, 4, 5, 12 | 66.07 | 34.56 |
| Scenario 2 | 1, 3, 4, 11, 5, 6 | 7, 8, 2, 9, 10, 12 | 113.26 | 1, 2, 9, 10, 11, 6 | 7, 8, 3, 4, 5, 12 | 73.85 | 39.41 |
| Scenario 3 | 1, 2, 3, 4, 5, 6 | 7, 8, 9, 10, 11, 12 | 91.24 | 1, 2, 9, 10, 11, 6 | 7, 8, 3, 4, 5, 12 | 71.34 | 19.90 |
| Scenario 4 | 1, 2, 3, 4, 10, 5 | 6, 7, 8, 9, 11, 12 | 91.66 | 1, 2, 7, 8, 9, 11, 5 | 6, 3, 4, 10, 12 | 71.75 | 19.91 |
| Scenario 5 | 6, 7, 8, 9, 10, 12, 11 | 3, 2, 1, 4, 5 | 105.54 | 6, 7, 8, 9, 10, 12, 11 | 3, 2, 1, 4, 5 | 105.54 | 0.00 |
| Scenario 6 | 10, 12, 11, 9, 8 | 6, 3, 1, 2, 4, 5, 7 | 150.5 | 2, 1, 5, 9, 12, 10 | 6, 7, 11, 8, 4, 3 | 120.45 | 30.05 |
| Scenario 7 | 4, 1, 2, 3, 7, 8 | 10, 12, 11, 9, 6, 5 | 159.87 | 3, 10, 12, 11, 9, 8 | 5, 6, 7, 4, 1, 2 | 106.98 | 52.89 |
| Scenario 8 | 3, 1, 2, 5, 11, 9 | 8, 10, 12, 7, 4, 6 | 170.36 | 5, 8, 10, 12, 11, 9 | 4, 2, 1, 3, 7, 6 | 100.38 | 69.98 |
| Scenario 9 | 2, 3, 1, 4, 5, 6 | 8, 7, 9, 10, 12, 11 | 167.55 | 2, 3, 8, 11, 12, 10 | 4, 1, 6, 9, 7, 5 | 111.23 | 56.32 |
| Scenario 10 | 5, 4, 2, 7, 9, 10, 11 | 12, 8, 6, 3, 1 | 147.35 | 5, 4, 2, 6, 8, 10, 11 | 1, 3, 7, 9, 12 | 111.41 | 35.94 |
| Scenario 11 | 9, 10, 12, 11, 7, 8 | 3, 1, 2, 5, 4, 6 | 183.11 | 2, 1, 3, 4, 5, 9 | 6, 7, 8, 10, 12, 11 | 116.57 | 66.54 |
| Scenario 12 | 5, 7, 8, 10, 11, 12, 9 | 3, 2, 1, 4, 6 | 95.63 | 5, 7, 8, 10, 11, 12, 9 | 3, 2, 1, 4, 6 | 95.63 | 0.00 |
| Scenario 13 | 2, 1, 4, 3, 5 | 6, 8, 7, 9, 10, 12, 11 | 124.75 | 2, 4, 8, 6, 3, 5 | 1, 7, 9, 10, 12, 11 | 103.08 | 21.67 |
| Scenario 14 | 3, 2, 1, 4, 5, 6 | 9, 11, 12, 10, 8, 7 | 156.09 | 3, 2, 5, 7, 9, 10 | 1, 4, 6, 8, 12, 11 | 109.75 | 46.34 |
| Scenario 15 | 6, 5, 4, 3, 1, 2 | 7, 8, 12, 11, 10, 9 | 185.87 | 2, 3, 5, 7, 11, 12 | 1, 4, 6, 8, 10, 9 | 111.85 | 74.02 |
| Scenario 16 | 1, 2, 3, 4, 6, 7 | 8, 9, 5, 10, 12, 11 | 173.67 | 1, 4, 8, 9, 10, 5 | 2, 3, 6, 7, 12, 11 | 100.77 | 72.90 |
| Scenario 17 | 1, 2, 3, 5, 6, 7, 9 | 4, 8, 12, 10, 11 | 129.45 | 1, 2, 5, 8, 10, 9 | 4, 3, 6, 7, 13, 11 | 105.87 | 23.58 |
| Scenario 18 | 1, 2, 3, 4, 6, 7 | 8, 9, 5, 10, 12, 11 | 173.67 | 1, 4, 8, 9, 10, 5 | 2, 3, 6, 7, 12, 11 | 100.77 | 72.90 |
| Scenario 19 | 1, 2, 3, 5, 6, 7, 9 | 4, 8, 12, 10, 11 | 129.44 | 1, 2, 5, 8, 10, 9 | 4, 3, 6, 7, 13, 11 | 105.87 | 23.57 |
| Scenario 20 | 3, 5, 6, 7, 10, 11, 12 | 1, 2, 4, 8, 9 | 117.77 | 3, 4, 7, 10, 11, 12 | 1, 2, 5, 6, 8, 9 | 110.36 | 7.41 |
| Scenario 21 | 1, 2, 3, 5, 4, 6, 8 | 7, 9, 10, 11, 12 | 116.32 | 1, 2, 5, 7, 9, 11, 8 | 3, 4, 6, 10, 12 | 101.44 | 14.88 |
| Scenario 22 | 6, 8, 9, 11, 12, 10 | 2, 1, 3, 7, 5, 4 | 113.38 | 6, 3, 1, 4, 9, 12 | 2, 5, 7, 8, 10, 11 | 112.16 | 1.22 |
| Scenario 23 | 9, 7, 8, 10, 11, 12 | 1, 2, 4, 5, 3, 6 | 118.64 | 7, 5, 3, 6, 8, 12 | 1, 2, 4, 9, 11, 10 | 108.74 | 9.90 |
| Scenario 24 | 2, 1, 3, 4, 5, 6 | 9, 7, 8, 10, 11, 12 | 127.02 | 2, 1, 3, 6, 9, 8 | 5, 4, 7, 10, 11, 12 | 108.59 | 18.43 |
| Scenario 25 | 3, 2, 1, 4, 6, 7, 5 | 8, 9, 12, 10, 11 | 197.6 | 6, 4, 1, 3, 9, 10, 11 | 2, 5, 7, 8, 12 | 126.27 | 71.33 |
| Scenario 26 | 2, 1, 3, 4, 6, 5 | 7, 8, 9, 10, 11, 12 | 103.25 | 2, 1, 3, 6, 8, 9 | 7, 4, 5, 10, 11, 12 | 101.28 | 1.97 |
| Scenario 27 | 7, 9, 8, 12, 10, 11 | 2, 1, 3, 4, 5, 6 | 140.91 | 5, 4, 3, 8, 9, 10, 11 | 2, 1, 6, 7, 12 | 116.15 | 24.76 |
| Scenario 28 | 1, 2, 3, 5, 4, 6 | 7, 9, 8, 11, 12, 10 | 123.13 | 1, 2, 3, 9, 10, 11 | 7, 5, 4, 6, 8, 12, 6 | 106.42 | 16.71 |
| Scenario 29 | 10, 9, 7, 8, 12, 11 | 1, 2, 3, 6, 5, 4 | 130.81 | 2, 1, 4, 5, 9, 11 | 3, 6, 10, 12, 8, 7 | 106.48 | 24.33 |
| Scenario 30 | 4, 1, 2, 3, 5, 6 | 7, 8, 11, 12, 10, 9 | 114.45 | 4, 1, 2, 3, 5, 6 | 7, 8, 11, 12, 10, 9 | 114.45 | 0.00 |
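To get a compact view of the 30 scenarios, the per-scenario differences in Table 4 can be summarized with a few descriptive statistics. The sketch below does this with Python's standard library. The values in `differences` are transcribed from the "Difference (s)" column in scenario order; the summary figures it prints are derived here from those entries and are not quoted from the paper.

```python
from statistics import mean

# "Difference (s)" column of Table 4, scenarios 1 to 30 in order.
differences = [
    34.56, 39.41, 19.90, 19.91, 0.00, 30.05, 52.89, 69.98, 56.32, 35.94,
    66.54, 0.00, 21.67, 46.34, 74.02, 72.90, 23.58, 72.90, 23.57, 7.41,
    14.88, 1.22, 9.90, 18.43, 71.33, 1.97, 24.76, 16.71, 24.33, 0.00,
]

summary = {
    "scenarios": len(differences),
    # Scenarios where both methods reach the same completion time.
    "unchanged": sum(1 for d in differences if d == 0.0),
    # Scenarios where the traditional method is strictly faster.
    "worse": sum(1 for d in differences if d < 0.0),
    "mean_saving_s": round(mean(differences), 2),
    "max_saving_s": max(differences),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In the transcribed data no difference is negative, so the integrated method is never slower than the traditional one in these scenarios.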
Table 5. Parameters of the unmanned systems.
| Unmanned System | Type | Speed (m/s) | Acceleration (m/s²) | Heading Angle Rate (rad/s) | Max Number of Engagements |
| --- | --- | --- | --- | --- | --- |
| Unmanned system 1 | UAV | [10, 50] | [−5, 5] | [−0.1, 0.1] | 10 |
| Unmanned system 2 | UAV | [10, 50] | [−5, 5] | [−0.1, 0.1] | 10 |
| Unmanned system 3 | USV | [0, 30] | [−3, 3] | [−0.1, 0.1] | 10 |
| Unmanned system 4 | USV | [0, 30] | [−3, 3] | [−0.1, 0.1] | 10 |
| Unmanned system 5 | UUV | [0, 20] | [−2, 2] | [−0.1, 0.1] | 10 |
| Unmanned system 6 | UUV | [0, 20] | [−2, 2] | [−0.1, 0.1] | 10 |
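Table 6. Parameters of the Targets.
| Target | Type | x | y | Target | Type | x | y |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Target 1 | 1 | −2000 | 1800 | Target 21 | 2 | 2000 | 6000 |
| Target 2 | 1 | 200 | 3000 | Target 22 | 1 | −1000 | 5500 |
| Target 3 | 2 | 0 | 4000 | Target 23 | 1 | −500 | 6500 |
| Target 4 | 1 | 500 | 1500 | Target 24 | 2 | −2500 | 6000 |
| Target 5 | 1 | 1500 | 3500 | Target 25 | 1 | 1500 | 9000 |
| Target 6 | 2 | 2000 | 2100 | Target 26 | 1 | −5000 | 2800 |
| Target 7 | 1 | 1000 | 500 | Target 27 | 2 | −5200 | 9100 |
| Target 8 | 1 | −1000 | 2900 | Target 28 | 1 | 4000 | 1000 |
| Target 9 | 1 | −2000 | 3500 | Target 29 | 1 | 4000 | 6000 |
| Target 10 | 1 | 3000 | 3000 | Target 30 | 2 | 5000 | 2500 |
| Target 11 | 2 | −3000 | 1900 | Target 31 | 1 | 5000 | 8500 |
| Target 12 | 1 | −4000 | 1000 | Target 32 | 2 | −4000 | 8000 |
| Target 13 | 1 | 0 | 5000 | Target 33 | 1 | 2800 | 4800 |
| Target 14 | 2 | −500 | 2000 | Target 34 | 1 | −4000 | 4000 |
| Target 15 | 1 | −3000 | 5200 | Target 35 | 2 | −5500 | 6200 |
| Target 16 | 1 | −3900 | 6900 | Target 36 | 2 | 5500 | 7000 |
| Target 17 | 1 | −2000 | 8000 | Target 37 | 2 | 4200 | 4000 |
| Target 18 | 2 | −500 | 8200 | Target 38 | 1 | 4000 | 9000 |
| Target 19 | 1 | 1000 | 7000 | Target 39 | 1 | −3000 | 9500 |
| Target 20 | 2 | 3000 | 8000 | Target 40 | 1 | 9000 | 9500 |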
Table 7. Task Assignment Scheme and Execution Sequence for Unmanned Systems.
| Unmanned System | Type | Engagement Sequence | Number of Targets | Total Time |
| --- | --- | --- | --- | --- |
| Unmanned System 1 | UAV | 4, 2, 8, 13, 33, 5, 10, 28, 7 | 9 | 413.536 s |
| Unmanned System 2 | UAV | 40, 23, 22, 15, 16, 17, 38, 39 | 8 | 411.524 s |
| Unmanned System 3 | USV | 34, 9, 3, 14, 1, 11, 12, 26 | 8 | 532.179 s |
| Unmanned System 4 | USV | 31, 20, 25, 18, 19, 21, 29, 36 | 8 | 520.561 s |
| Unmanned System 5 | UUV | 37, 6, 30 | 3 | 488.942 s |
| Unmanned System 6 | UUV | 35, 24, 32, 27 | 4 | 530.251 s |
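A quick consistency check on Table 7 is to confirm that every one of the 40 targets in Table 6 is engaged exactly once, that no system exceeds the engagement limit of 10 from Table 5, and to read off the overall completion time as the slowest system's total time. The following sketch performs those checks on data transcribed from the tables. The dictionary layout and variable names are illustrative choices, not part of the paper's algorithm.

```python
# Table 7, transcribed as: system id -> (type, engagement sequence, total time in s).
schedule = {
    1: ("UAV", [4, 2, 8, 13, 33, 5, 10, 28, 7], 413.536),
    2: ("UAV", [40, 23, 22, 15, 16, 17, 38, 39], 411.524),
    3: ("USV", [34, 9, 3, 14, 1, 11, 12, 26], 532.179),
    4: ("USV", [31, 20, 25, 18, 19, 21, 29, 36], 520.561),
    5: ("UUV", [37, 6, 30], 488.942),
    6: ("UUV", [35, 24, 32, 27], 530.251),
}

MAX_ENGAGEMENTS = 10  # per-system limit from Table 5
NUM_TARGETS = 40      # number of targets listed in Table 6

# Flatten all sequences and check that targets 1..40 each appear exactly once.
assigned = [t for _, seq, _ in schedule.values() for t in seq]
covers_all_once = sorted(assigned) == list(range(1, NUM_TARGETS + 1))

# Check the engagement limit for every system.
within_capacity = all(
    len(seq) <= MAX_ENGAGEMENTS for _, seq, _ in schedule.values()
)

# The mission ends when the slowest system finishes.
slowest = max(schedule, key=lambda k: schedule[k][2])
makespan = schedule[slowest][2]

print(f"all targets covered once: {covers_all_once}")
print(f"within engagement limit: {within_capacity}")
print(f"slowest system: {slowest} ({makespan} s)")
```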
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Zheng, A.; Liang, X.; Zhang, Z.; Xiao, Y.; Zhang, J. Research on Integrated Decision-Control Cooperative Target Assignment for Cross-Domain Unmanned Systems Based on a Bi-Level Optimization Framework. Drones 2026, 10, 193. https://doi.org/10.3390/drones10030193
