Article

Dubins-Aware NCO: Learning SE(2)-Equivariant Representations for Heading-Constrained UAV Routing

1 School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China
2 Department of Precision Instrument, Tsinghua University, Beijing 100084, China
* Authors to whom correspondence should be addressed.
Drones 2026, 10(1), 59; https://doi.org/10.3390/drones10010059
Submission received: 2 December 2025 / Revised: 12 January 2026 / Accepted: 13 January 2026 / Published: 14 January 2026
(This article belongs to the Special Issue Path Planning, Trajectory Tracking and Guidance for UAVs: 3rd Edition)

Highlights

What are the main findings?
  • We developed a geometry-physics-consistent framework that explicitly models Dubins SE(2) equivariance via dual-channel embedding and theoretically proven Rotary Phase Encoding.
  • The proposed model outperforms state-of-the-art neural baselines in optimality and stability while achieving inference speeds roughly three orders of magnitude faster than classical metaheuristics.
What are the implications of the main findings?
  • Ablation studies verify the complementarity of the proposed modules, confirming that explicit geometric embedding is essential for nonholonomic routing and superior to naive reward substitution.
  • The framework establishes a scalable, real-time foundation for fixed-wing UAV autonomy, demonstrating robust zero-shot generalization to unseen turning radii and complex task variants.

Abstract

The nonholonomic constraints of fixed-wing UAVs, characterized by coupled heading-curvature feasibility and asymmetric costs, fundamentally deviate from classical Euclidean routing assumptions. While standard neural combinatorial optimization (NCO) architectures could theoretically incorporate Dubins costs via reward signals, such naive adaptations lack the capacity to explicitly model the intrinsic SE(2) geometric invariance and directional asymmetry of fixed-wing motion, leading to suboptimal generalization. To bridge this gap, we propose a Dubins-Aware NCO framework. We design a dual-channel embedding to decouple asymmetric physical distances from rotation-stable geometric features. Furthermore, we introduce a Rotary Phase Encoding (RoPhE) mechanism that theoretically guarantees strict SO(2) equivariance within the attention layer. Extensive sensitivity, ablation, and cross-distribution generalization experiments are conducted on tasks spanning varying turning radii and problem variants with instance scales of 10, 20, 36, and 52 nodes. The results consistently validate the superior optimality and stability of our approach compared with state-of-the-art DRL and NCO baselines, while maintaining significant computational efficiency advantages over classical heuristics. Our results highlight the importance of explicitly embedding geometry-physics consistency, rather than relying on scalar reward signals, for real-world fixed-wing autonomous scheduling.

1. Introduction

1.1. Background and Motivation

Fixed-wing UAVs are essential for long-range missions like environmental monitoring and logistics [1,2], utilizing their superior endurance to visit numerous waypoints under strict constraints [3,4]. Unlike rotary-wing systems, fixed-wing platforms operate under nonholonomic dynamics, characterized by irreversible forward motion and bounded turns [5]. Consequently, feasible paths cannot use Euclidean metrics but must conform to Dubins primitives [6,7].
These kinematics introduce asymmetric costs and direction dependence, complicating standard routing models like the Traveling Salesman Problem (TSP) and Vehicle Routing Problem (VRP) [8]. While exact geometric [9,10] and mixed-integer approaches [11] offer fidelity, they lack scalability [12]. Alternatively, metaheuristics have been widely applied, including distributed Ant Colony Optimization [13], extended Consensus Bundle Algorithms [14], and learning-based Particle Swarm Optimization [15]. However, these methods often suffer from premature convergence and typically decouple routing from kinematic constraints, necessitating feasibility post-processing. Ultimately, neither approach fully meets the real-time scalability requirements of modern UAV operations.

1.2. Limits of Current Learning-Based Solvers

Neural Combinatorial Optimization (NCO) has emerged as a promising data-driven alternative to traditional heuristics [16,17]. By learning constructive policies, Transformer-based solvers [18,19] achieve competitive performance on Euclidean routing problems, relying on self-attention mechanisms that effectively capture translational invariance and symmetric geometric structures.
However, a critical gap remains in adapting these architectures to nonholonomic routing. Current “geometry-conscious” NCO models typically treat nodes as isotropic points in Euclidean space, implicitly assuming that edge costs are independent of approach angles. This assumption fails in the context of fixed-wing flight. Simply substituting Euclidean distances with Dubins costs is insufficient because it ignores the vector nature of the nodes: Dubins spaces are governed by the Special Euclidean Group in 2 Dimensions (SE(2)), where feasibility is strictly coupled with relative headings. Existing solvers lack the specific geometric inductive bias to process these directed states, rendering them unable to capture the inherent asymmetry and anisotropy of Dubins paths. Consequently, they struggle to generalize across varying turning radii or maintain solution quality under stringent kinematic constraints.

1.3. Objectives and Contributions

To address these challenges, we introduce Dubins-Aware Neural Combinatorial Optimization (Dubins-Aware NCO), a reinforcement learning framework built on an encoder-decoder architecture and designed to capture the kinematic characteristics of fixed-wing aircraft together with geometric structure consistent with the Special Orthogonal group in 2 dimensions (SO(2)) and SE(2). Rather than relying on the network to infer non-Euclidean structure from data, the framework embeds the physical principles of Dubins motion directly into the model's inductive bias through three key designs: a dual-channel embedding for asymmetric costs, a Rotary Phase Encoding for SO(2) equivariance, and a mixed-score attention for kinematic feasibility.
Our main contributions are as follows:
  • We examine key properties of Dubins paths and design a multi-channel embedding that jointly encodes asymmetric Dubins distances and relative SE(2) geometric features.
  • Through theoretical analysis, we show that the proposed Rotary Phase Encoding achieves strict equivariance with respect to global rotations in the plane. This property ensures that the attention dot product $q^T k$ preserves an inductive bias consistent with SO(2) equivariance.
  • We incorporate the Dubins distance matrix as an auxiliary signal in cross-fusion attention, allowing the model to prioritize trajectories that satisfy both semantic requirements and fixed-wing dynamic feasibility.
Extensive experiments on heading-constrained TSP, CVRP, and the Pickup and Delivery Problem (PDP) show that the proposed method achieves consistently superior performance across a wide range of problem scales compared with classical heuristics and state-of-the-art neural combinatorial optimization baselines. The solver also demonstrates strong generalization capability when evaluated on unseen turning radii and on extended routing formulations such as the Prize-Collecting Traveling Salesman Problem (PCTSP) and Split Delivery Vehicle Routing Problem (SDVRP). Combined with the theoretical analysis, these empirical findings indicate that explicit modeling of the SE(2) geometry and the inherent asymmetry of Dubins motion is essential for reliable UAV routing under heading constraints. This work addresses an important gap in learning-based combinatorial optimization for nonholonomic systems and establishes a principled foundation for the deployment of such methods in practical UAV planning scenarios.

2. Related Work

2.1. Dubins-Constrained UAV Routing and Classical Methods

The rapid expansion of unmanned aerial vehicle (UAV) applications in military inspection [20], emergency response [21], and urban logistics [2] has created a growing demand for large-scale, constraint-aware path planning [22]. Fixed-wing platforms are particularly suitable for long-range and high-speed missions; however, their nonholonomic motion characteristics, including strict forward-only movement, minimum turning radius, and heading continuity, significantly constrain feasible trajectories [23]. Under such kinematic restrictions, admissible paths take the form of concatenated straight and circular-arc segments, conforming to the Dubins curvature model [6]. Dubins established that the shortest path between two configurations with bounded curvature belongs to one of six optimal families (LSL, RSR, LSR, RSL, RLR, LRL) [24,25], forming the theoretical foundation of fixed-wing motion modeling [26]. The resulting asymmetric cost structure and curvature constraints break the symmetry assumptions of Euclidean metrics and make Dubins variants of TSP/VRP substantially more challenging.
Traditional approaches to Dubins-constrained routing rely on geometric constructions, analytic approximations, or problem decomposition. For example, constant-factor approximation strategies [9] and mixed-integer piecewise linear formulations [11] have been proposed to jointly handle curvature limits, variable target headings, and obstacle constraints within a unified optimization framework. Although classical solvers such as Lin-Kernighan-Helsgaun (LKH) [27] remain highly competitive in Euclidean TSP, their direct extension to Dubins geometry typically requires heavy discretization or substantial reformulation, limiting flexibility and scalability.
Population-based metaheuristics including Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) have been adapted to Dubins TSP and its neighborhood-based variants (DMTSPN) [13,28]. Enhanced GA frameworks have been used for energy-optimal Dubins routing [29], and improvements of swarm intelligence algorithms have been introduced for multi-UAV coordinated missions under Dubins constraints [30]. Fu [31] further proposed a hierarchical multi-UAV scheme combining clustering, ACO-based ordering, adaptive windows, and Dubins feedback for obstacle-rich environments. Despite their versatility, metaheuristics often suffer from high computational cost, susceptibility to local optima, and the need for feasibility post-processing in large-scale or strongly constrained scenarios.

2.2. Neural Combinatorial Optimization (NCO)

Neural combinatorial optimization, initiated by Pointer Networks [16], has demonstrated strong potential in learning constructive solvers for routing problems. Reinforcement learning-based architectures by Bello et al. [17] and the attention model (AM) of Kool et al. [18,32] have achieved competitive performance on large-scale TSP/VRP without relying on handcrafted heuristics. Matrix-style encoders (MatNet) [33] show that incorporating structured priors such as distance matrices into dual-channel row and column representations can enhance stability and generalization. Heterogeneous attention (HAM) [34] further demonstrates that explicitly encoding structural constraints (e.g., pairing and precedence in PDP) can significantly improve solution quality. Collectively, these results indicate that injecting physical or structural priors into the representation and attention mechanism is often beneficial.
Learning-based UAV planning methods span trajectory generation, obstacle avoidance, and multi-agent assignment [12,35,36], with some recent efforts exploring NCO for constraint-aware graph optimization [37]. For example, Cui et al. [38] studied a Dubins-constrained neighborhood TSP for fixed-wing reconnaissance and proposed a hierarchical DRL framework that separates high-level ordering from local Dubins path computation.
However, existing learning methods seldom incorporate fixed-wing dynamics at the representation level. Merely substituting Euclidean distances with Dubins costs affects only the reward signal, leaving the model architecture agnostic to Dubins geometry. Lacking explicit SE(2) or relative heading encoding, such models struggle to generalize across varying turning radii, problem scales, and task variants (e.g., PCTSP and SDVRP). As confirmed by recent studies [8,39], Euclidean-based schedulers fail to capture Dubins physics unless geometric constraints are injected directly into representation learning.

3. Model Architecture

We propose a Dubins-aware encoder-decoder deep reinforcement learning framework, called Dubins-Aware Neural Combinatorial Optimization (Dubins-Aware NCO), specifically designed for fixed-wing UAV scheduling under explicit heading and minimum turning radius constraints. Unlike conventional solvers that implicitly rely on Euclidean geometry, our method incorporates explicit Dubins geometric priors at the input and representation level, enabling rotational Special Euclidean Group in 2 Dimensions (SE(2)) equivariance, asymmetric distance modeling, and direction-consistent trajectory decoding.
An autoregressive decoder then sequentially constructs the trajectory. It explicitly incorporates SE(2)-aware context and applies dynamic feasibility masking to ensure Dubins-compliant transitions and loop closure at each step.

3.1. Dubins and Relative SE(2) Representation Embedding

We propose a Dual-Channel Embedding framework (Figure 1) that strategically decouples the encoding process into two complementary streams:
  • A Physical Channel, which focuses on cost magnitude, explicitly modeling the asymmetric effort required to traverse between nodes.
  • A Geometric Channel, which focuses on spatial structure, capturing relative positions and orientations to ensure SE(2) equivariance.
By processing these two dimensions in parallel, our model can jointly reason about “how expensive a move is” and “how the target is geometrically aligned,” providing a comprehensive state representation.

3.1.1. Physical Channel (Cost Encoding)

The primary goal of this channel is to inject raw kinematic costs into the model. We explicitly compute pairwise Dubins distances to construct an asymmetric distance matrix. Since we start with raw costs rather than rich node features, we adopt the initialization strategy from MatNet [33]. We initialize row nodes with zero vectors to support variable problem sizes, and column nodes with random one-hot vectors to provide instance discriminability. This setup transforms the static cost matrix into dynamic physical row and column embeddings, representing the fundamental traversal difficulty between nodes.
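As an illustration, this MatNet-style initialization can be sketched as follows; the tensor shapes, embedding width, and function name are our assumptions for exposition, not values taken from the paper:

```python
import torch

def init_matnet_embeddings(dist: torch.Tensor, d_model: int = 128):
    """Illustrative MatNet-style initialization (hypothetical helper).

    dist: (B, N, N) asymmetric Dubins cost matrix (consumed later by
    the attention layers; only its shape is needed here). Rows start
    as zero vectors, columns as random one-hot vectors (requires
    N <= d_model so the one-hot indices can be distinct).
    """
    B, N, _ = dist.shape
    row_emb = torch.zeros(B, N, d_model)   # zero init -> supports variable N
    one_hot_idx = torch.stack([torch.randperm(d_model)[:N] for _ in range(B)])
    col_emb = torch.zeros(B, N, d_model)
    col_emb.scatter_(2, one_hot_idx.unsqueeze(-1), 1.0)  # random one-hot init
    return row_emb, col_emb

row, col = init_matnet_embeddings(torch.rand(2, 10, 10))
```

With zero row embeddings the same weights apply to any instance size, while the random one-hot columns break symmetry between otherwise indistinguishable nodes.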

3.1.2. Geometric Channel (Spatial Encoding)

While the physical channel encodes traversal costs, the geometric channel is designed to capture rigorous spatial relationships consistent with the SE(2) group. Instead of using absolute coordinates, which vary with map rotation, we construct a relative representation to ensure generalization. We introduce a four-dimensional feature vector for each edge:
$$(\Delta x,\; \Delta y,\; \cos\Delta\theta,\; \sin\Delta\theta).$$
$(\Delta x, \Delta y)$ represents the relative translation between two nodes, while $(\cos\Delta\theta, \sin\Delta\theta)$ projects the relative heading angle onto the unit circle.
To effectively process these relative states, we leverage complex-valued features. The intuitive justification for this design lies in the natural isomorphism between unit complex numbers and 2D planar rotations.
This approach is theoretically grounded in the classical Dubins solver mechanism. As detailed in Appendix A, the analytical solution for a Dubins path relies on “SE(2) rigid-body normalization,” which transforms the target state into a canonical frame aligned with the source pose. By using complex multiplication, we effectively implement a differentiable version of this normalization within the neural network. This allows the agent to perceive neighbors in a local, egocentric frame (e.g., “in front” or “to the right”) rather than an arbitrary global frame, mirroring the relative distance and angles used in the analytical derivation.
Specifically, by encoding the relative position as $z_{ij}$ and the relative heading as a rotation operator $p_{ij}$, we construct the complex edge feature $f_{ij} = z_{ij} \cdot p_{ij}$:
$$z_{ij} = \Delta x_{ij} + i\,\Delta y_{ij}, \quad p_{ij} = \cos\Delta\theta_{ij} + i\,\sin\Delta\theta_{ij}, \quad f_{ij} = \left(\Delta x_{ij}\cos\Delta\theta_{ij} - \Delta y_{ij}\sin\Delta\theta_{ij}\right) + i\left(\Delta x_{ij}\sin\Delta\theta_{ij} + \Delta y_{ij}\cos\Delta\theta_{ij}\right).$$
Expanding this product yields a geometrically interpretable result, where the real and imaginary parts correspond to projections onto the longitudinal and lateral axes of the heading vector:
$$\operatorname{Re}(f_{ij}) = \Delta x_{ij}\cos\Delta\theta_{ij} - \Delta y_{ij}\sin\Delta\theta_{ij}, \qquad \operatorname{Im}(f_{ij}) = \Delta x_{ij}\sin\Delta\theta_{ij} + \Delta y_{ij}\cos\Delta\theta_{ij}.$$
Here, the real part captures the distance along the direction of motion (forward/backward), while the imaginary part captures the deviation perpendicular to the heading (left/right). These components are concatenated and projected into the embedding space. This formulation preserves the intrinsic geometric invariants required for Dubins path planning while avoiding the discontinuities often associated with raw angular data.
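The complex-product construction above takes only a few lines. The following sketch (our own vectorized version, not the paper's code) computes $(\operatorname{Re} f_{ij}, \operatorname{Im} f_{ij})$ for all ordered node pairs from raw $(x, y, \theta)$ states:

```python
import math
import torch

def complex_edge_features(pose: torch.Tensor) -> torch.Tensor:
    """Edge features f_ij = z_ij * p_ij for the geometric channel.

    pose: (N, 3) tensor of (x, y, theta) states. Returns (N, N, 2)
    holding (Re f_ij, Im f_ij): the longitudinal and lateral
    projections with respect to the relative heading.
    """
    xy, theta = pose[:, :2], pose[:, 2]
    dxy = xy[None, :, :] - xy[:, None, :]        # (Δx, Δy) for each ordered pair
    z = torch.complex(dxy[..., 0], dxy[..., 1])  # z_ij = Δx + iΔy
    dtheta = theta[None, :] - theta[:, None]     # relative heading Δθ_ij
    p = torch.complex(torch.cos(dtheta), torch.sin(dtheta))
    f = z * p                                    # complex product = rotation by Δθ
    return torch.stack([f.real, f.imag], dim=-1)

f = complex_edge_features(torch.tensor([[0.0, 0.0, 0.0],
                                        [1.0, 0.0, math.pi / 2]]))
```

For the pair above (unit forward offset, 90° relative heading), the product rotates $z = 1$ by $\pi/2$, giving a purely lateral feature.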

3.2. Cross-Semantic RoPhE Attention for SO(2) Equivariant Dubins-Aware Modeling

We introduce a Dubins-aware attention layer enhanced with cross-semantic fusion and Rotary Phase Encoding (RoPhE), designed to endow the model with intrinsic Special Orthogonal group in 2 dimensions (SO(2))-equivariant geometry and awareness of Dubins kinematic constraints at the attention computation stage (see Figure 2a). Unlike conventional Transformers that operate on Euclidean distances or absolute coordinates, we construct dual-channel row and column embeddings from the physical Dubins distance matrix and relative SE(2) geometric features, and perform bidirectional cross fusion between physical rows/columns and geometric rows/columns. This enables the model to simultaneously encode heading constraints and directional biases, which are indispensable in fixed-wing UAV path planning.
To further endow the attention mechanism with rotational equivariance, we inject RoPhE into both the query and key projections, following the theoretical formulation provided in Appendix B. This ensures that the attention score $q^T k$ no longer depends on absolute headings but solely on the relative heading difference $\Delta\theta$ between nodes, thereby remaining equivariant under any global rotation $\theta \mapsto \theta + \phi$. This property mirrors the Dubins path computation, which likewise depends only on the relative angles $\alpha$ and $\beta$ rather than their absolute values (see Appendix A).
$$q \mapsto R_\theta\, q, \qquad k \mapsto R_\beta\, k, \qquad (R_\theta q)^T (R_\beta k) = q^T R_{\beta - \theta}\, k,$$
where R θ is the canonical SO(2) rotation matrix. This theoretical property ensures that the attention mechanism is intrinsically aligned with the geometric invariance principle required by Dubins path feasibility.
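This rotation invariance can be checked numerically. The toy single-head, 2-D sketch below (our simplification; the actual encoding operates on paired feature dimensions of full-width queries and keys) rotates the query and key by their node headings and verifies that shifting both headings by a common $\phi$ leaves the score unchanged:

```python
import math
import torch

def rotate2d(v: torch.Tensor, phi: float) -> torch.Tensor:
    """Apply the canonical SO(2) rotation R_phi to a 2-D vector."""
    c, s = math.cos(phi), math.sin(phi)
    R = torch.tensor([[c, -s], [s, c]])
    return v @ R.T

def rophe_score(q, k, theta_q: float, theta_k: float):
    """Score (R_theta q)^T (R_beta k) = q^T R_{beta-theta} k: it depends
    only on the heading difference, hence is invariant to a global
    rotation theta -> theta + phi applied to both nodes."""
    return torch.dot(rotate2d(q, theta_q), rotate2d(k, theta_k))

q, k = torch.tensor([1.0, 0.5]), torch.tensor([0.3, -0.2])
s0 = rophe_score(q, k, 0.7, 1.9)
s1 = rophe_score(q, k, 0.7 + 1.234, 1.9 + 1.234)  # shift both headings by phi
```

The two scores agree to numerical precision, matching the equivariance claim above.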
From a theoretical standpoint, the SE(2) equivariance and relative-angle dependency inherent to Dubins paths closely correspond to the SO(2) representation encoded by RoPhE within the attention mechanism. The symmetry principles of the two systems correspond one-to-one, enabling RoPhE to reliably capture the directional asymmetry and curvature-induced structure of Dubins costs without requiring any additional normalization. This yields a faithful alignment with the true operational constraints of fixed-wing UAVs governed by heading-curvature coupling.
Moreover, we introduce a Distance-Augmented Mixed-Score attention (DAMS-Attn) module to jointly encode semantic and physical feasibility:
$$\mathrm{score}_{ij} = W_2\, \mathrm{ReLU}\!\left(W_1 \left[\mathrm{dot}_{ij},\, D_{ij}\right] + b_1\right) + b_2,$$
where $D_{ij}$ encodes the asymmetric Dubins motion cost. The resulting score, applied before the Softmax, allows the model to explicitly bias its attention toward trajectories that are not only semantically plausible but also dynamically executable in real UAV flight.
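A minimal sketch of such a mixed-score module is shown below; the hidden width and class name are our assumptions, and the module is untrained:

```python
import torch
import torch.nn as nn

class MixedScore(nn.Module):
    """Distance-augmented mixed score: a per-edge two-layer MLP
    score_ij = W2 ReLU(W1 [dot_ij, D_ij] + b1) + b2, applied to every
    (raw attention score, Dubins cost) pair before the softmax."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, dot: torch.Tensor, dubins: torch.Tensor) -> torch.Tensor:
        # dot, dubins: (..., N, N) raw scores and asymmetric Dubins costs
        mixed = torch.stack([dot, dubins], dim=-1)  # (..., N, N, 2)
        return self.mlp(mixed).squeeze(-1)          # (..., N, N)

scores = MixedScore()(torch.randn(4, 10, 10), torch.rand(4, 10, 10))
```

Because the MLP acts edge-wise, the output keeps the attention-matrix shape and can replace the plain dot-product logits directly.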

3.3. Trajectory-Autoregressive Decoding

After processing by the encoder equipped with Dubins SE(2) multi-channel embedding and RoPhE-based rotationally equivariant encoding, the model has obtained a heading-aware SE(2)-consistent representation that explicitly captures the asymmetry and directional bias of Dubins path costs.
We then construct an autoregressive decoding module to sequentially generate a closed and physically executable UAV trajectory. To ensure transparency and reproducibility, we formalize the decoding process as a causal sequence of masking and selection operations.
Following the attention-based sequential decision paradigm [18], at time step $t$, the decoder maintains a dynamic state comprising the partially constructed tour $\pi_{1:t-1}$ and the global geometric context. The context embedding $h^{(c)}$ is updated as:
$$h^{(c)} = \begin{cases} \left[\bar{h},\; h_{\pi_{t-1}},\; h_{\pi_1}\right], & t > 1, \\ \left[\bar{h},\; h_0,\; h_0\right], & t = 1, \end{cases}$$
where $[\cdot,\cdot,\cdot]$ denotes vector concatenation. Notably, $h_{\pi_1}$ explicitly encodes the closure requirement and departure heading, ensuring that the decoder accounts for the Dubins-consistent return-to-depot constraint from the very first step.
Next, we compute the raw compatibility score (logit) for each candidate node $j$ using the query-key mechanism:
$$\hat{u}_{cj} = C \cdot \tanh\!\left(\frac{q_c^T k_j}{\sqrt{d_k}}\right),$$
where $C$ is a scaling factor (typically 10) used to control the entropy of the policy.
Feasibility Masking: To strictly guarantee the validity of the combinatorial solution, we apply a hard feasibility mask $M_{t,j}$ to the logits. Let $V_{t-1} = \{\pi_1, \ldots, \pi_{t-1}\}$ denote the set of visited nodes at step $t$. The masked logits $u_{cj}$ are defined as:
$$u_{cj} = \hat{u}_{cj} + M_{t,j}, \quad \text{where } M_{t,j} = \begin{cases} 0, & j \notin V_{t-1}, \\ -\infty, & \text{otherwise}. \end{cases}$$
This masking logic enforces the constraint that each node is visited exactly once. While the mask ensures combinatorial validity, kinematic feasibility (i.e., avoiding transitions that require sharp turns violating $R_{\min}$) is implicitly handled by the learned logits $\hat{u}_{cj}$, which assign low probabilities to geometrically unfavorable transitions based on the Dubins-SE(2) embeddings.
Finally, a Softmax is applied to the masked logits to produce the normalized selection probability:
$$p_i = p_\theta\!\left(\pi_t = i \mid s, \pi_{1:t-1}\right) = \frac{e^{u_{ci}}}{\sum_j e^{u_{cj}}}.$$
This formulation ensures that the decoding process is strictly causal, as the mask $M_{t,j}$ depends solely on the history $\pi_{1:t-1}$, guaranteeing reproducible trajectory generation.
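One decoding step, from clipped logits through masking to the selection distribution, can be sketched compactly. The clipping constant $C = 10$ matches the text; the function name and tensor shapes are our assumptions:

```python
import math
import torch

def decode_step(q_c: torch.Tensor, K: torch.Tensor,
                visited: torch.Tensor, C: float = 10.0) -> torch.Tensor:
    """One autoregressive decoding step (sketch).

    q_c: (d,) context query; K: (N, d) node keys;
    visited: (N,) boolean mask of already-selected nodes.
    Returns the normalized selection probabilities p over nodes.
    """
    d_k = K.shape[-1]
    u_hat = C * torch.tanh(K @ q_c / math.sqrt(d_k))  # clipped compatibility logits
    u = u_hat.masked_fill(visited, float("-inf"))     # hard feasibility mask M_tj
    return torch.softmax(u, dim=-1)                   # selection distribution p

p = decode_step(torch.randn(16), torch.randn(8, 16),
                torch.tensor([True] + [False] * 7))   # node 0 already visited
```

Visited nodes receive probability exactly zero, so sampling or greedy selection from `p` can never revisit a node.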

4. Experiments

This section presents experiments evaluating the performance and generalization ability of the proposed model for UAV mission scheduling and path planning. The experimental design is structured to validate the effectiveness, robustness, and adaptability of the model under varying operational conditions.
First, we conduct sensitivity analyses to determine the optimal hyperparameter configuration, ensuring training stability and robust performance. Subsequently, the proposed model is benchmarked against classical metaheuristics, including Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Discrete Particle Swarm Optimization (DPSO), as well as state-of-the-art neural baselines such as the Attention Model (AM), MatNet, and the Heterogeneous Attention Mechanism (HAM) adapted for Pickup and Delivery Problems. The evaluation encompasses three canonical routing formulations: the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Pickup and Delivery Problem (PDP). Each formulation is extended to incorporate explicit heading constraints, simulating the kinematic characteristics of fixed-wing UAVs. We refer to these variants as Heading-Constrained TSP (HC-TSP), Heading-Constrained CVRP (HC-CVRP), and Heading-Constrained PDP (HC-PDP), respectively. Experiments are performed on instances of varying scales (10, 20, 36, and 52 nodes) to assess scalability and computational efficiency. Additionally, ablation studies are conducted to isolate the contribution of specific network components and the proposed attention mechanism.
To further assess generalization capabilities, we introduce two complementary experimental settings. The first evaluates model performance under varying minimum turning radii, reflecting the maneuverability limits of fixed-wing UAVs. The second explores zero-shot generalization to extended combinatorial formulations, testing whether the learned policy transfers effectively across distinct optimization landscapes.

4.1. Experimental Setup

During training, following prior studies, we generated all training samples (including depots and task nodes) dynamically and independently using a uniform 2D distribution within the range [ 0 , 1 ] . The Dubins distance metric was employed to compute the distance between any two nodes. Each evaluated model was trained for 200 epochs, with 1,000,000 instances per epoch. For validation, a fixed set of 100,000 randomly sampled instances was used to monitor performance. The training process was based on the REINFORCE algorithm with rollout baselines, ensuring stable convergence and reliable policy gradient estimation.
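A minimal sketch of the REINFORCE objective with a rollout baseline, as used for training here (function and variable names are ours; the greedy baseline costs are assumed to come from a frozen copy of the policy):

```python
import torch

def reinforce_rollout_loss(cost: torch.Tensor, log_prob: torch.Tensor,
                           baseline_cost: torch.Tensor) -> torch.Tensor:
    """REINFORCE with a greedy-rollout baseline (sketch).

    cost: (B,) Dubins tour lengths of tours sampled from the policy;
    log_prob: (B,) summed log-probabilities of those tours;
    baseline_cost: (B,) tour lengths from the frozen greedy baseline.
    The advantage is detached so gradients flow only through log_prob.
    """
    advantage = (cost - baseline_cost).detach()
    return (advantage * log_prob).mean()

loss = reinforce_rollout_loss(torch.rand(32), torch.randn(32), torch.rand(32))
```

Minimizing this loss lowers the probability of tours costlier than the baseline and raises the probability of cheaper ones, which is what stabilizes the policy-gradient estimate.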
All training computations were accelerated using two NVIDIA RTX 4090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) paired with an AMD EPYC 9534 64-core CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA). To ensure fair comparison, all benchmark and baseline comparisons were executed on the CPU environment. The experimental framework was implemented in Python 3.9, using PyTorch 2.6 as the core deep learning library.
The computational complexity of the proposed framework consists of two phases: pairwise Dubins distance calculation and neural network inference. Pairwise Dubins Calculation: Computing the asymmetric Dubins cost matrix requires evaluating the shortest path among six Dubins words for all N ( N 1 ) pairs. Since the analytical solution for a single pair is computed in constant time O ( 1 ) , the total complexity scales as O ( N 2 ) . While scaling quadratically, this step is fully differentiable and parallelizable. On modern GPUs, calculating a batch of 100 × 100 Dubins matrices takes negligible time (milliseconds) due to tensorized operations.
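To illustrate the $O(N^2)$ assembly, the sketch below builds the asymmetric cost matrix from a per-pair $O(1)$ closed form. For brevity it evaluates only the LSL and RSR words of the six-word Dubins solver, so each entry is an upper bound on the true Dubins distance rather than the exact value:

```python
import math

def _mod2pi(a: float) -> float:
    return a % (2.0 * math.pi)

def csc_length(s, g, rho: float) -> float:
    """Upper bound on the Dubins distance between poses s, g = (x, y, theta)
    using only the LSL (sign=+1) and RSR (sign=-1) words."""
    best = math.inf
    for sign in (+1.0, -1.0):
        # Turning-circle centers on the left (+) or right (-) of each pose.
        c0 = (s[0] - sign * rho * math.sin(s[2]), s[1] + sign * rho * math.cos(s[2]))
        c1 = (g[0] - sign * rho * math.sin(g[2]), g[1] + sign * rho * math.cos(g[2]))
        dx, dy = c1[0] - c0[0], c1[1] - c0[1]
        straight = math.hypot(dx, dy)          # external tangent segment
        phi = math.atan2(dy, dx)               # heading along the tangent
        arcs = _mod2pi(sign * (phi - s[2])) + _mod2pi(sign * (g[2] - phi))
        best = min(best, rho * arcs + straight)
    return best

def dubins_matrix(poses, rho: float = 0.3):
    """Asymmetric O(N^2) cost matrix; each entry is one O(1) evaluation."""
    n = len(poses)
    return [[0.0 if i == j else csc_length(poses[i], poses[j], rho)
             for j in range(n)] for i in range(n)]

D = dubins_matrix([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], rho=1.0)
```

For two aligned poses four units apart, the forward entry reduces to the straight-line length 4, while the reverse entry pays two full-circle arcs, illustrating the asymmetry that motivates the dual-channel design.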
The encoder utilizes self-attention mechanisms with a time complexity of O ( N 2 ) per layer. The autoregressive decoder generates the sequence in N steps, with each step performing attention over the graph, resulting in an overall inference complexity of O ( N 2 ) .
Compared to exact methods with exponential complexity ( O ( 2 N ) ) or iterative heuristics (e.g., LKH) that often require CPU-bound sequential iterations, our framework’s O ( N 2 ) complexity benefits significantly from GPU acceleration, making it highly efficient for the problem scales considered (up to N = 100 ) and viable for real-time applications.
Model hyperparameters, including optimizer type, learning rate, batch size, number of greedy rollout initializations, number of attention heads, and number of attention layers, were tuned through systematic sensitivity analysis to determine the most robust configuration for multi-UAV scheduling tasks. As the primary focus of this study is deep reinforcement learning for mission planning rather than heuristic algorithm engineering, the parameters for exact and heuristic baselines were directly adopted from the open-source library in [40]. This setup guarantees a fair and consistent comparison while minimizing variability due to manual tuning.

4.2. Sensitivity Analysis of Model Hyperparameters

In this sensitivity study, we conducted systematic architectural analyses to evaluate the robustness and sensitivity of the proposed solver by varying key hyperparameters in the Transformer backbone (the number of attention layers and attention heads). To ensure statistical reliability, each configuration was independently trained under six different random seeds and evaluated on ten randomly generated test sets. All other hyperparameters were held constant to guarantee that performance variations could be attributed solely to the investigated hyperparameter. The full experimental setup, including attention depth, number of heads, random seeds, and task scales, is summarized in Table 1.
As shown in Table 2, the results show a highly consistent trend across all runs, indicating good repeatability and statistical significance. This design balances model depth and representational capacity and provides a unified foundation for analyzing the three heading-constrained routing tasks (HC-TSP, HC-CVRP, and HC-PDP).
From the results on attention heads, we observe that the 8-head configuration delivers the best performance on small-scale instances (10 and 20 nodes), with the 16-head counterpart ranking second. However, as the task scale increases to 36 nodes and beyond, the 16-head model consistently outperforms all other settings across HC-TSP, HC-CVRP, and HC-PDP. On the largest test size (52 nodes), the 16-head configuration achieves mean scores of 10.04, 11.80, and 13.32, representing improvements of 10.28%, 6.35%, and 7.50% over the 4-head baseline, respectively. This trend indicates that a larger number of heads enhances the model's ability to capture multi-scale relational structures and long-range spatial dependencies in complex Dubins-constrained routing scenarios.
A similar observation emerges for attention depth. On small-scale problems, 4-layer and 5-layer configurations exhibit comparable performance and stability. However, once the task scale reaches medium to large size (≥36 nodes), the 5-layer model consistently yields the best mean performance across all evaluated tasks. Increasing depth further to 6 layers does not lead to performance gains and even causes slight degradation in some cases, likely due to overfitting or to the current problem scales lacking sufficient structural complexity to benefit from deeper networks.
To provide a clearer view of performance distribution and robustness, we additionally report the corresponding boxplot results in Appendix C. These visualizations not only display the median, interquartile range (IQR), and outlier behavior but also reveal the sensitivity of each configuration to performance fluctuations. The results show that the 16-head/5-layer configuration achieves the most favorable trade-off across mean performance, variance, and stability. This configuration is thus selected as the optimal backbone for all subsequent scaling and generalization experiments.

4.3. Comparative Experiment

After identifying the best-performing architecture (16 attention heads and 5 layers), we conduct a systematic comparison against three families of methods: classical population-based heuristics (GA, ACO, DPSO), neural combinatorial baselines (AM, MatNet), and the heterogeneous attention model (HAM) specifically designed for PDP. All methods use the Dubins distance as the reward signal (i.e., negative cost), and are evaluated on a fixed test set of 100,000 randomly sampled instances with identical random seeds for strict fairness. For MatNet, we follow its original formulation but replace its Euclidean distance matrix with the pairwise Dubins distance matrix to eliminate input-level structural bias.
As shown in Table 3, ACO slightly dominates on 10-node instances in terms of cost, but once the problem scale exceeds 20 nodes, all neural combinatorial optimization models consistently surpass classical heuristics, and the advantage widens dramatically with growing problem size. On the largest scale (52 nodes), our method outperforms ACO by 21.97%, 19.28%, and 13.35% on HC-TSP, HC-CVRP, and HC-PDP, respectively, while being three orders of magnitude faster in inference, making it far more suitable for real-time UAV deployment.
Compared with strong DRL baselines, our model achieves the best cost across all tasks and remains on the same order of inference latency, being only slightly slower than AM on HC-PDP. We attribute this superiority to the explicit injection of SE(2)-equivariant structure and physical priors, which eliminates the need for the model to implicitly infer motion geometry from scratch. Consequently, our approach exhibits stronger structural generalization and safety compliance under increasing motion constraints, making it more suitable for real-world fixed-wing UAV scheduling.

4.4. Ablation Experiments

To quantify the contribution of each proposed component, we conduct a series of ablation studies in which we systematically remove key modules from the full architecture: the Dubins-SE(2) embedding layer, the Cross-Semantic Fusion module, the Rotary Phase Encoding mechanism, and the DAMS-Attn. In each ablation setting, the remaining architecture is kept unchanged so that any performance variation is solely attributable to the absence of the targeted component. Experiments are performed on three heading-constrained combinatorial problems (HC-TSP, HC-CVRP, and HC-PDP) across four instance scales (10, 20, 36, and 52 nodes).
As shown in Table 4, the full model consistently outperforms all ablated variants, confirming that the combination of modules is essential for handling the coupled constraints. Critically, the specific performance degradation observed in each variant reveals the impact of the corresponding module.
The ablation results reveal how specific architectural choices drive the solver’s efficacy. First, the most substantial performance drop stems from removing the Dubins-SE(2) embedding. This degradation occurs because, in the absence of explicit kinematic priors, the network regresses to a quasi-Euclidean policy. It prioritizes spatial proximity while failing to account for the asymmetric turning costs, resulting in theoretically short but dynamically expensive paths. Second, the omission of Rotary Phase Encoding noticeably impairs generalization in larger instances. This confirms that without SO(2) equivariance, the attention mechanism relies on absolute headings rather than relative geometric configurations, making it brittle to rotational variations. Finally, the performance gap observed when disabling the Cross-Semantic Fusion and DAMS-Attn modules underscores the risk of decoupling logical routing from motion planning. Without these fusion mechanisms, the solver struggles to weigh high-value targets against their traversal penalties.
In summary, these ablation results demonstrate that the performance gains are not merely due to increased model capacity, but stem directly from the specialized architectural designs that align the learning process with the physical reality of fixed-wing flight.

4.5. Generalization Experiment

To rigorously evaluate the generalization capability of our method under unseen kinematic constraints and task semantics, we conduct experiments along two distinct dimensions. First, we vary the minimum turning radius of the UAV to assess the model's adaptability to different Dubins-style motion constraints. Second, we extend the evaluation to two out-of-distribution tasks, the Prize-Collecting Traveling Salesman Problem (PCTSP) and the Split Delivery Vehicle Routing Problem (SDVRP), to examine its robustness to task-structure shifts. All experiments are performed using exactly the same model and hyperparameter settings as in training, ensuring that the observed performance differences arise from true structural generalization rather than from any form of re-training or re-tuning.
As shown in Table 5, our method consistently outperforms the MatNet baseline across all turning radii and instance scales, achieving an average cost reduction of approximately 3.5% to 4.0%. Although the absolute cost naturally increases as the turning radius becomes larger (i.e., the maneuver constraints become tighter), the relative improvement over the baseline remains stable, with about a 3% advantage even at the most constrained and largest-scale settings (52 nodes). A similar pattern is observed for the extended tasks: the proposed method surpasses MatNet across all scales on both PCTSP and SDVRP, with particularly strong gains on mid-to-large instances (36 and 52 nodes), demonstrating robust semantic generalization beyond the training domains.
These results confirm that the embedded geometric inductive biases enable stable adaptation to shifts in both kinematic parameters and problem semantics.

4.6. Visual Analysis of Dubins Trajectories

To intuitively validate the effectiveness of the proposed method, we conduct a visualization analysis on HC-TSP instances across two scales (20 and 52 nodes). We compare the trajectories generated by MatNet and our Dubins-Aware NCO. For visual clarity and to highlight the impact of kinematic constraints, the minimum turning radius is set to $R_{\min} = 0.03$. For each scale, two representative instances are selected to demonstrate performance differences and analyze the underlying causes.
Figure 3 and Figure 4 illustrate the generated paths. In these visualizations, squares denote the depot, while circles represent task nodes annotated with their required headings. Black numbers indicate the node index, and blue numbers along the edges indicate the sequence of the route segments.

4.6.1. Analysis of Small-Scale Instances (20 Nodes)

As shown in Figure 3, the Dubins-Aware NCO demonstrates a significant advantage in total path cost. Compared to MatNet, which yields costs of 6.18946 and 5.63673 for the two instances, our model achieves 5.37563 and 5.30065, representing reductions of 13.15% and 5.96%, respectively.
Observing the trajectory topology reveals the root of this improvement. MatNet tends to plan paths based primarily on Euclidean proximity, ignoring the heading-induced turning costs. This results in “zigzag” sequences where the UAV must perform sharp, high-cost maneuvers to align with the target heading. In contrast, our model effectively balances spatial distance with turning penalties:
  • In Instance 1, although the traversal order in Figure 3b (Ours) is roughly similar to Figure 3a (MatNet), our model intelligently reverses the global direction. This strategic choice avoids the continuous, sharp turns required by the MatNet solution at nodes {7, 16, 12, 5}, resulting in a smoother overall envelope.
  • In Instance 2, the Dubins-Aware NCO (Figure 3d) selects superior entry angles for transitions such as (5 → 17), (10 → 12), and (2 → 3 → 11). MatNet, lacking perception of the turning cost, generates Euclidean-shortest edges that are kinematically expensive to execute.

4.6.2. Analysis of Large-Scale Instances (52 Nodes)

When the problem scale increases to 52 nodes (Figure 4), the limitations of the baseline become more pronounced. MatNet generates trajectories characterized by frequent looping maneuvers (e.g., full circles required to adjust headings), as it attempts to force-fit a Euclidean sequence onto nonholonomic constraints.
Conversely, Dubins-Aware NCO optimizes the visitation order to minimize these mandatory loops. For example, in Instance 3, the path segment (33 → 47) generated by our model (Figure 4b) is significantly smoother than that of MatNet (Figure 4a), which requires a complex spiral to satisfy the arrival heading. Similar improvements are observed in the (48 → 52) segment.
These results visually confirm that our model possesses high sensitivity to heading constraints, enabling it to trade off slightly longer Euclidean distances for substantially lower Dubins maneuvering costs, yielding globally efficient solutions.

5. Conclusions

This work presents a Dubins-aware neural solver for fixed-wing UAV task scheduling, bridging the gap between data-driven neural combinatorial optimization and the physical realities of nonholonomic deployment. In contrast to prior methods that rely on naive Euclidean approximations or reward-level adaptations, our framework explicitly internalizes Dubins geometry and SE(2) symmetry directly into the model architecture, ensuring that the learned policy is intrinsically rotation-equivariant and curvature-feasible.
Extensive experiments across heading-constrained TSP, CVRP, and PDP demonstrate that this geometry-physics-consistent embedding is not merely a superficial enhancement but a necessity for operational robustness. The solver consistently outperforms classical heuristics and state-of-the-art NCO baselines, exhibiting strong scalability across varying problem sizes (up to 52 nodes) and differing turning radii. By jointly optimizing logical routing and kinematic feasibility, the proposed approach establishes a principled foundation for real-world fixed-wing autonomy, offering a scalable alternative to computationally expensive exact methods.

6. Limitations and Future Directions

While the proposed framework demonstrates robust performance across a wide range of scenarios, it is essential to acknowledge its limitations and potential failure modes, which open avenues for future research.
The primary limitation arises in scenarios with extreme geometric density relative to the UAV's maneuverability. When the Euclidean distance between target nodes is significantly smaller than the turning diameter ($d_{ij} \ll 2R_{\min}$), the UAV is forced to perform complex, long-horizon looping maneuvers to adjust its heading. In these "adversarial" geometric configurations, the coupling between the visitation sequence and the precise entry angle becomes extremely strong. Our solver, while generally robust, may experience performance degradation or suboptimal convergence in such highly dense, constrained environments compared to exact trajectory planners.
Looking ahead, several meaningful extensions can further advance this paradigm:
  • Obstacle Avoidance and No-Fly Zones (NFZs): A critical extension is to incorporate non-convex environmental constraints. Real-world missions often involve restricted airspaces. Future work will explore integrating obstacle-aware attention masking or differentiable safety layers to handle No-Fly Zones explicitly within the NCO framework.
  • Multi-UAV Coordination: The inductive-bias-oriented design naturally extends to multi-agent settings. Adapting the encoder-decoder architecture to handle decentralized coordination for large-scale swarms remains a promising direction.

Author Contributions

Conceptualization, J.G. and Y.W.; methodology, J.G.; software, L.J.; validation, Y.W., J.G. and H.S.; formal analysis, H.S.; investigation, J.Z. and Y.W.; resources, Y.W.; data curation, J.G.; writing—original draft preparation, J.G. and H.S.; writing—review and editing, L.J. and H.S.; visualization, J.G.; supervision, J.Z.; project administration, L.J. and J.G.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Innovation Program for Doctoral Students of Xinjiang University under Grant [XJU2024BS091].

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Analytical Characterization of Dubins Path Properties

Dubins curves provide the analytic solution to the shortest-path problem under "forward-only, curvature-constrained" nonholonomic motion models [24,25]. For fixed-wing UAVs, the minimum turning radius can be approximated by $R_{\min} = V^2 / a_{\max}$, where $V$ is the cruising speed and $a_{\max}$ is the maximum lateral acceleration. This constraint prohibits instantaneous heading reversal, requiring all feasible trajectories to be composed of straight segments and circular arcs with radii of at least $R_{\min}$.
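The turning-radius approximation above can be evaluated directly. The sketch below is illustrative only: the function name and the sample speed/acceleration values are our own, not from the paper.

```python
import math

def min_turning_radius(v_cruise: float, a_lat_max: float) -> float:
    """Minimum turning radius R_min = V^2 / a_max for a level turn at
    cruising speed V (m/s) with maximum lateral acceleration a_max (m/s^2)."""
    return v_cruise ** 2 / a_lat_max

# Example: 25 m/s cruise with about 1.5 g of available lateral acceleration.
R = min_turning_radius(25.0, 1.5 * 9.81)
print(round(R, 1))  # radius in metres, roughly 42.5
```

Tighter turns require either a lower speed or a higher lateral-acceleration limit, which is why the same mission becomes harder as $R_{\min}$ grows.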
Dubins established that, given a start pose $P_s = (x_s,\, y_s,\, \theta_s)$, an end pose $P_t = (x_t,\, y_t,\, \theta_t)$, and a turning radius $R$, the globally optimal path must belong to one of the following six primitive families:
$$\{LSL,\; RSR,\; LSR,\; RSL,\; LRL,\; RLR\}.$$
Corresponding to the six configurations illustrated in Figure A1, the letters L, R, and S denote a left turn, a right turn, and a straight segment, respectively. Consequently, a Dubins path is essentially a closed-form concatenation of these geometric primitives.
Figure A1. A graphical representation of the six types of Dubins curves.
To compute the Dubins distance between two nodes, the classical approach first applies SE(2) rigid-body normalization, translating $P_s$ to the origin and rotating the frame such that $P_t$ lies on the positive x-axis:
$$P_s \to (0,\, 0,\, \alpha), \qquad P_t \to (d,\, 0,\, \beta),$$
where
$$\alpha = \operatorname{mod}\!\big(\theta_s - \operatorname{atan2}(y_t - y_s,\, x_t - x_s),\; 2\pi\big), \qquad d = \frac{\sqrt{(x_t - x_s)^2 + (y_t - y_s)^2}}{R}, \qquad \beta = \operatorname{mod}\!\big(\theta_t - \operatorname{atan2}(y_t - y_s,\, x_t - x_s),\; 2\pi\big).$$
The entire normalization procedure relies on translation-rotation equivariance: global translations or rotations alter neither the optimal Dubins path type nor its length. Consequently, the analytical computation depends solely on the relative distance d and relative headings α , β . The resulting six closed-form solutions, providing segment lengths ( t ,   p ,   q ) and the total path length, are summarized in Table A1.
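The normalization above can be sketched in a few lines. This is a minimal illustration of the standard reduction, with names of our own choosing; it also checks the translation-rotation equivariance claim numerically by transforming both poses with the same rigid motion.

```python
import math

def normalize_se2(ps, pt, radius):
    """Reduce a (start pose, end pose) pair to the canonical Dubins inputs
    (alpha, beta, d): translate the start to the origin, rotate so the end
    point lies on the positive x-axis, and scale distances by the radius."""
    xs, ys, ths = ps
    xt, yt, tht = pt
    phi = math.atan2(yt - ys, xt - xs)            # bearing of the chord
    d = math.hypot(xt - xs, yt - ys) / radius     # chord length in radius units
    alpha = (ths - phi) % (2 * math.pi)           # start heading relative to the chord
    beta = (tht - phi) % (2 * math.pi)            # end heading relative to the chord
    return alpha, beta, d

# Applying one global rotation (0.7 rad) and translation (10, -2) to both
# poses must leave (alpha, beta, d) unchanged.
a1 = normalize_se2((0, 0, 0.5), (3, 4, 1.2), radius=1.0)
c, s = math.cos(0.7), math.sin(0.7)
a2 = normalize_se2((10, -2, 0.5 + 0.7),
                   (10 + 3 * c - 4 * s, -2 + 3 * s + 4 * c, 1.2 + 0.7),
                   radius=1.0)
print(all(abs(x - y) < 1e-9 for x, y in zip(a1, a2)))  # True
```

Because the optimal path type and length depend only on $(\alpha, \beta, d)$, this reduction is what makes the six closed-form solutions in Table A1 sufficient.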
Table A1. Length formulas for the six types of Dubins paths (assuming unit turning radius). The three segments are denoted by t ,   p ,   q , respectively. All modulo operations are 2 π .
LSL. Equation set:
$$p\cos(\alpha + t) - \sin\alpha + \sin\beta = d, \qquad p\sin(\alpha + t) + \cos\alpha - \cos\beta = 0, \qquad \alpha + t + q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{lsl} = -\alpha + \arctan\frac{\cos\beta - \cos\alpha}{d + \sin\alpha - \sin\beta}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{lsl} = \sqrt{d^2 + 2 - 2\cos(\alpha - \beta) + 2d(\sin\alpha - \sin\beta)},$$
$$q_{lsl} = \beta - \arctan\frac{\cos\beta - \cos\alpha}{d + \sin\alpha - \sin\beta}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{lsl} = t_{lsl} + p_{lsl} + q_{lsl} = -\alpha + \beta + p_{lsl}.$$
RSR. Equation set:
$$p\cos(\alpha - t) + \sin\alpha - \sin\beta = d, \qquad p\sin(\alpha - t) - \cos\alpha + \cos\beta = 0, \qquad \alpha - t - q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{rsr} = \alpha - \arctan\frac{\cos\alpha - \cos\beta}{d - \sin\alpha + \sin\beta}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{rsr} = \sqrt{d^2 + 2 - 2\cos(\alpha - \beta) + 2d(\sin\beta - \sin\alpha)},$$
$$q_{rsr} = -\beta\ (\mathrm{mod}\ 2\pi) + \arctan\frac{\cos\alpha - \cos\beta}{d - \sin\alpha + \sin\beta}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{rsr} = t_{rsr} + p_{rsr} + q_{rsr} = \alpha - \beta + p_{rsr}.$$
LSR. Equation set:
$$p\cos(\alpha + t) + 2\sin(\alpha + t) - \sin\alpha - \sin\beta = d, \qquad p\sin(\alpha + t) - 2\cos(\alpha + t) + \cos\alpha + \cos\beta = 0, \qquad \alpha + t - q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{lsr} = -\alpha + \arctan\frac{-\cos\alpha - \cos\beta}{d + \sin\alpha + \sin\beta} + \arctan\frac{2}{p_{lsr}}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{lsr} = \sqrt{d^2 - 2 + 2\cos(\alpha - \beta) + 2d(\sin\alpha + \sin\beta)},$$
$$q_{lsr} = -\beta\ (\mathrm{mod}\ 2\pi) + \arctan\frac{-\cos\alpha - \cos\beta}{d + \sin\alpha + \sin\beta} + \arctan\frac{2}{p_{lsr}}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{lsr} = t_{lsr} + p_{lsr} + q_{lsr} = \alpha - \beta + 2t_{lsr} + p_{lsr}.$$
RSL. Equation set:
$$p\cos(\alpha - t) - 2\sin(\alpha - t) + \sin\alpha + \sin\beta = d, \qquad p\sin(\alpha - t) + 2\cos(\alpha - t) - \cos\alpha - \cos\beta = 0, \qquad \alpha - t + q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{rsl} = \alpha - \arctan\frac{\cos\alpha + \cos\beta}{d - \sin\alpha - \sin\beta} + \arctan\frac{2}{p_{rsl}}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{rsl} = \sqrt{d^2 - 2 + 2\cos(\alpha - \beta) - 2d(\sin\alpha + \sin\beta)},$$
$$q_{rsl} = \beta\ (\mathrm{mod}\ 2\pi) - \arctan\frac{\cos\alpha + \cos\beta}{d - \sin\alpha - \sin\beta} + \arctan\frac{2}{p_{rsl}}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{rsl} = t_{rsl} + p_{rsl} + q_{rsl} = -\alpha + \beta + 2t_{rsl} + p_{rsl}.$$
LRL. Equation set:
$$-2\sin(\alpha + t - p) + 2\sin(\alpha + t) = d + \sin\alpha - \sin\beta, \qquad 2\cos(\alpha + t - p) - 2\cos(\alpha + t) = \cos\beta - \cos\alpha, \qquad \alpha + t - p + q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{lrl} = -\alpha + \arctan\frac{\cos\beta - \cos\alpha}{d + \sin\alpha - \sin\beta} + \frac{p_{lrl}}{2}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{lrl} = \arccos\Big(\tfrac{1}{8}\big(6 - d^2 + 2\cos(\alpha - \beta) - 2d(\sin\alpha - \sin\beta)\big)\Big)\ \{\mathrm{mod}\ 2\pi\},$$
$$q_{lrl} = \beta\ (\mathrm{mod}\ 2\pi) - \alpha - t_{lrl} + p_{lrl}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{lrl} = t_{lrl} + p_{lrl} + q_{lrl} = -\alpha + \beta + 2p_{lrl}.$$
RLR. Equation set:
$$2\sin(\alpha - t + p) - 2\sin(\alpha - t) = d - \sin\alpha + \sin\beta, \qquad -2\cos(\alpha - t + p) + 2\cos(\alpha - t) = \cos\alpha - \cos\beta, \qquad \alpha - t + p - q = \beta \ \{\mathrm{mod}\ 2\pi\}.$$
Solution formula:
$$t_{rlr} = \alpha - \arctan\frac{\cos\alpha - \cos\beta}{d - \sin\alpha + \sin\beta} + \frac{p_{rlr}}{2}\ \{\mathrm{mod}\ 2\pi\}, \qquad p_{rlr} = \arccos\Big(\tfrac{1}{8}\big(6 - d^2 + 2\cos(\alpha - \beta) + 2d(\sin\alpha - \sin\beta)\big)\Big),$$
$$q_{rlr} = \alpha - \beta - t_{rlr} + p_{rlr}\ \{\mathrm{mod}\ 2\pi\}, \qquad L_{rlr} = t_{rlr} + p_{rlr} + q_{rlr} = \alpha - \beta + 2p_{rlr}.$$
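The straight-segment families are easy to evaluate directly from these closed forms. The sketch below implements only the LSL and RSR cases for unit turning radius as an illustration (function names are ours, not the paper's); a full solver would compute all six families on the normalized inputs $(\alpha, \beta, d)$ and take the minimum total length.

```python
import math

TWO_PI = 2 * math.pi

def dubins_lsl(alpha, beta, d):
    """LSL segment lengths (t, p, q) for unit turning radius, following the
    Table A1 closed form; returns None if the family is infeasible."""
    phi = math.atan2(math.cos(beta) - math.cos(alpha),
                     d + math.sin(alpha) - math.sin(beta))
    sq = d * d + 2 - 2 * math.cos(alpha - beta) \
        + 2 * d * (math.sin(alpha) - math.sin(beta))
    if sq < 0:
        return None
    return (-alpha + phi) % TWO_PI, math.sqrt(sq), (beta - phi) % TWO_PI

def dubins_rsr(alpha, beta, d):
    """RSR segment lengths (t, p, q) for unit turning radius."""
    phi = math.atan2(math.cos(alpha) - math.cos(beta),
                     d - math.sin(alpha) + math.sin(beta))
    sq = d * d + 2 - 2 * math.cos(alpha - beta) \
        + 2 * d * (math.sin(beta) - math.sin(alpha))
    if sq < 0:
        return None
    return (alpha - phi) % TWO_PI, math.sqrt(sq), (-beta + phi) % TWO_PI

# Sanity check: alpha = beta = 0 (both poses aligned with the chord) should
# degenerate to a pure straight segment of length d in both families.
t, p, q = dubins_lsl(0.0, 0.0, 5.0)
print(t, p, q)  # 0.0 5.0 0.0
```

The total LSL length is then $t + p + q$, matching $L_{lsl} = -\alpha + \beta + p_{lsl}$ for this aligned case.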

Appendix B. Derivation of the Rotational Equivariance of RoPhE

Consider the standard attention score between node i and node j:
$$u_{ij} = \langle q_i,\, k_j \rangle.$$
To incorporate heading information, RoPhE modulates the query and key vectors using 2D rotation matrices $R(\theta)$. The resulting attention score is given by:
$$u_{ij} = \big\langle R(\theta_i)\,\hat{q}_i,\; R(\theta_j)\,\hat{k}_j \big\rangle = \hat{q}_i^{\top}\, R(\theta_i)^{\top} R(\theta_j)\, \hat{k}_j.$$
The standard 2D rotation matrix is defined as:
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
This mapping forms a group homomorphism from the additive group of real numbers $\mathbb{R}$ to the special orthogonal group $\mathrm{SO}(2)$, satisfying the following properties:
$$R(\alpha + \beta) = R(\alpha)\,R(\beta), \qquad R(-\alpha) = R(\alpha)^{-1} = R(\alpha)^{\top}.$$
Consequently, the product of the transposed rotation matrix and the rotation matrix simplifies to:
$$R(\theta_i)^{\top} R(\theta_j) = R(-\theta_i)\, R(\theta_j) = R(\theta_j - \theta_i).$$
Substituting this back into the attention score equation yields:
$$u_{ij} = \hat{q}_i^{\top}\, R(\theta_j - \theta_i)\, \hat{k}_j.$$
This demonstrates that the RoPhE-based attention score depends solely on the relative heading difference $\Delta\theta_{ij} = \theta_j - \theta_i$ and is independent of the absolute heading.
Now, consider a global rotation where all headings are uniformly rotated by a constant angle $\phi$:
$$\theta_i \to \theta_i + \phi, \qquad \theta_j \to \theta_j + \phi.$$
The transformed attention score becomes:
$$u'_{ij} = \hat{q}_i^{\top}\, R\big((\theta_j + \phi) - (\theta_i + \phi)\big)\, \hat{k}_j = \hat{q}_i^{\top}\, R(\theta_j - \theta_i)\, \hat{k}_j = u_{ij}.$$
Since the attention scores remain invariant under global rotation ($u'_{ij} = u_{ij}$), the learned relative relationships are preserved. This verifies that the RoPhE mechanism enforces strict SO(2) equivariance within the attention layer, which underpins the SE(2) equivariance of the overall model.
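The invariance derived above is easy to confirm numerically. The sketch below is an illustrative check with our own helper names, not the paper's implementation: it builds the rotation-modulated score $\langle R(\theta_i)\hat{q}, R(\theta_j)\hat{k}\rangle$ for a single 2D channel pair and verifies that a common heading shift $\phi$ leaves the score unchanged.

```python
import math

def rot(theta):
    """2x2 rotation matrix R(theta) as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(m, v):
    """Matrix-vector product for the 2x2 case."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def rophe_score(q, k, th_i, th_j):
    """Attention score <R(th_i) q, R(th_j) k> from the RoPhE derivation."""
    qi = apply(rot(th_i), q)
    kj = apply(rot(th_j), k)
    return qi[0] * kj[0] + qi[1] * kj[1]

q, k = [0.3, -1.1], [0.8, 0.5]
s0 = rophe_score(q, k, 0.4, 1.7)
s1 = rophe_score(q, k, 0.4 + 0.9, 1.7 + 0.9)  # global rotation by phi = 0.9
print(abs(s0 - s1) < 1e-12)  # True
```

In the full model each attention head applies this block-diagonal rotation across many 2D channel pairs, so the same cancellation $R(\theta_i)^{\top}R(\theta_j) = R(\theta_j - \theta_i)$ holds channel-wise.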

Appendix C. Boxplot Analysis of Hyperparameter Sensitivity

In the box plots illustrating the attention head experiments shown in Figure A2, the distribution patterns for small-scale tasks (10 and 20 nodes) indicate that both HC-TSP and HC-PDP achieve optimal mean and median performance with 8 attention heads. This configuration is characterized by tightly concentrated interquartile ranges (IQRs) and minimal outliers, suggesting strong consistency at small problem sizes. By contrast, HC-CVRP achieves a slightly better mean with 16 heads, although its distribution is comparatively more dispersed, indicating weaker robustness. As the task size increases to 36 and 52 nodes, the 16-head configuration exhibits increasingly symmetric and Gaussian-like distributions, with the median closely aligned with the mean and noticeably fewer outliers. This behavior implies a significant reduction in sensitivity to random seed perturbations, identifying 16 heads as the most favorable configuration for large-scale instances.
Figure A2. Boxplot analysis of attention head sensitivity across different problem scales for HC-TSP, HC-CVRP, and HC-PDP.
Regarding the number of attention layers, the box plots in Figure A3 reveal that models with 4 and 5 layers exhibit comparable performance and maintain stability across small-scale tasks. However, once the task size reaches medium-to-large scales (≥36 nodes), the 5-layer configuration yields both superior mean performance and more compact boxplots with narrower IQRs and fewer outliers, indicating enhanced stability and generalization. In contrast, the 6-layer model shows increased variance and broader distributions across multiple tasks, suggesting that excessive depth introduces a risk of overfitting rather than performance gains.
In summary, the configuration of 16 attention heads and 5 attention layers strikes the optimal balance between performance and stability across task scales, demonstrating strong robustness and minimal sensitivity to random initialization. Consequently, this configuration is adopted as the default setting for all subsequent experiments.
Figure A3. Boxplot analysis of attention layer sensitivity across different problem scales for HC-TSP, HC-CVRP, and HC-PDP.

References

  1. Aggarwal, S.; Kumar, N. Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Comput. Commun. 2020, 149, 270–299. [Google Scholar] [CrossRef]
  2. Ait Saadi, A.; Soukane, A.; Meraihi, Y.; Benmessaoud Gabis, A.; Mirjalili, S.; Ramdane-Cherif, A. UAV path planning using optimization approaches: A survey. Arch. Comput. Methods Eng. 2022, 29, 4233–4284. [Google Scholar] [CrossRef]
  3. Gao, J.; Jia, L.; Kuang, M.; Shi, H.; Zhu, J. An End-to-End Solution for Large-Scale Multi-UAV Mission Path Planning. Drones 2025, 9, 418. [Google Scholar] [CrossRef]
  4. Li, H.; Dai, Y.; Qiu, Z.; Guo, Y.; Cheng, Q.; Zhang, M.; Liao, D. Fixed-wing UAVs Coverage Path Planning Based on Turning Span Selection. IEEE Internet Things J. 2025, 12, 9476–9490. [Google Scholar] [CrossRef]
  5. Zhuang, X.; Li, D.; Wang, Y.; Liu, X.; Li, H. Optimization of high-speed fixed-wing UAV penetration strategy based on deep reinforcement learning. Aerosp. Sci. Technol. 2024, 148, 109089. [Google Scholar] [CrossRef]
  6. Ding, Y.; Xin, B.; Dou, L.; Chen, J.; Chen, B.M. A Memetic Algorithm for Curvature-Constrained Path Planning of Messenger UAV in Air-Ground Coordination. IEEE Trans. Autom. Sci. Eng. 2022, 19, 3735–3749. [Google Scholar] [CrossRef]
  7. Qian, L.; Lo, Y.L.; Liu, H.H. A path planning algorithm for a crop monitoring fixed-wing unmanned aerial system. Sci. China Inf. Sci. 2024, 67, 180201. [Google Scholar] [CrossRef]
  8. Kumar, P.; Pal, K.; Govil, M.C. Comprehensive review of path planning techniques for unmanned aerial vehicles (uavs). ACM Comput. Surv. 2025, 58, 1–44. [Google Scholar] [CrossRef]
  9. Savla, K.; Frazzoli, E.; Bullo, F. Traveling salesperson problems for the Dubins vehicle. IEEE Trans. Autom. Control 2008, 53, 1378–1391. [Google Scholar] [CrossRef]
  10. Liu, C.; Lu, Y.; Xie, F.; Ji, T.; Zheng, Y. Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints. arXiv 2025, arXiv:2506.02365. [Google Scholar]
  11. Zhou, X.; Li, L.; Zhang, X.; Gao, H.; Yao, K.; Xu, X. A Unified and Quality-Guaranteed Approach for Dubins Vehicle Path Planning With Obstacle Avoidance and Curvature Constraint. IEEE Trans. Intell. Transp. Syst. 2025, 26, 15219–15235. [Google Scholar] [CrossRef]
  12. Du, Z.; Luo, C.; Min, G.; Wu, J.; Luo, C.; Pu, J.; Li, S. A Survey on Autonomous and Intelligent Swarms of Uncrewed Aerial Vehicles (UAVs). IEEE Trans. Intell. Transp. Syst. 2025, 26, 14477–14500. [Google Scholar] [CrossRef]
  13. Gao, C.; Zhen, Z.; Gong, H. A self-organized search and attack algorithm for multiple unmanned aerial vehicles. Aerosp. Sci. Technol. 2016, 54, 229–240. [Google Scholar] [CrossRef]
  14. Wu, W.; Xu, J.; Sun, Y. Integrate Assignment of Multiple Heterogeneous Unmanned Aerial Vehicles Performing Dynamic Disaster Inspection and Validation Task With Dubins Path. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 4018–4032. [Google Scholar] [CrossRef]
  15. Qi, Y.; Jiang, H.; Huang, G.; Yang, L.; Wang, F.; Xu, Y. Multi-UAV path planning considering multiple energy consumptions via an improved bee foraging learning particle swarm optimization algorithm. Sci. Rep. 2025, 15, 14755. [Google Scholar] [CrossRef] [PubMed]
  16. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9. [Google Scholar]
  17. Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940. [Google Scholar]
  18. Kool, W.; van Hoof, H.; Welling, M. Attention, Learn to Solve Routing Problems! In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  19. Hua, C.; Berto, F.; Son, J.; Kang, S.; Kwon, C.; Park, J. CAMP: Collaborative Attention Model with Profiles for Vehicle Routing Problems. In Proceedings of the 2025 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Detroit, MI, USA, 19–23 May 2025; Available online: https://github.com/ai4co/camp (accessed on 23 October 2025).
  20. Hu, M.; Liu, W.; Peng, K.; Ma, X.; Cheng, W.; Liu, J.; Li, B. Joint routing and scheduling for vehicle-assisted multidrone surveillance. IEEE Internet Things J. 2018, 6, 1781–1790. [Google Scholar] [CrossRef]
  21. Calamoneri, T.; Corò, F.; Mancini, S. Management of a post-disaster emergency scenario through unmanned aerial vehicles: Multi-depot multi-trip vehicle routing with total completion time minimization. Expert Syst. Appl. 2024, 251, 123766. [Google Scholar] [CrossRef]
  22. Pasha, J.; Elmi, Z.; Purkayastha, S.; Fathollahi-Fard, A.M.; Ge, Y.E.; Lau, Y.Y.; Dulebenets, M.A. The drone scheduling problem: A systematic state-of-the-art review. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14224–14247. [Google Scholar] [CrossRef]
  23. Gao, J.; Kuang, M.; Shi, H.; Yuan, X.; Zhu, J.; Qiao, Z. Efficient Path Planning for UAV Formation Using Dubins Paths. In Proceedings of the International Conference on Guidance, Navigation and Control, Changsha, China, 9–11 August 2024; Springer: Heidelberg/Berlin, Germany, 2024; pp. 588–597. [Google Scholar]
  24. Dubins, L.E. On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. Am. J. Math. 1957, 79, 497–516. [Google Scholar] [CrossRef]
  25. Chen, Z.; Shima, T. Shortest Dubins paths through three points. Automatica 2019, 105, 368–375. [Google Scholar] [CrossRef]
  26. Lugo-Cárdenas, I.; Flores, G.; Salazar, S.; Lozano, R. Dubins path generation for a fixed wing UAV. In Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS), Orlando, FL, USA, 27–30 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 339–346. [Google Scholar]
  27. Helsgaun, K. An Extension of the Lin-Kernighan-Helsgaun TSP Solver for Constrained Traveling Salesman and Vehicle Routing Problems; Roskilde University: Roskilde, Denmark, 2017; Volume 12, pp. 966–980. [Google Scholar]
  28. Gao, Z.; Wang, N.; Huang, J.; Xie, Y.; Zhang, Y. An Improved Genetic Algorithm for the Dubins Multiple Traveling Salesman Problem with Neighborhoods. In Proceedings of the 2024 5th International Conference on Computer Engineering and Intelligent Control (ICCEIC), Guangzhou, China, 11–13 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 199–203. [Google Scholar]
  29. Gao, C.; Ding, W.; Zhao, Z.; Chen, B.M. Energy-Optimal Trajectory-Based Traveling Salesman Problem for Multi-Rotor Unmanned Aerial Vehicles. In Proceedings of the 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore, 13–15 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 6110–6115. [Google Scholar]
  30. Li, Y.; Wen, D.; Zhang, S.; Li, L. Sequential Task Allocation of More Scalable Artificial Dragonfly Swarms Considering Dubins Trajectory. Drones 2024, 8, 596. [Google Scholar] [CrossRef]
  31. Fu, J.; Sun, G.; Liu, J.; Yao, W.; Wu, L. On Hierarchical Multi-UAV Dubins Traveling Salesman Problem Paths in a Complex Obstacle Environment. IEEE Trans. Cybern. 2023, 54, 123–135. [Google Scholar] [CrossRef]
  32. Berto, F.; Hua, C.; Park, J.; Kim, M.; Kim, H.; Son, J.; Kim, H.; Kim, J.; Park, J. RL4CO: A unified reinforcement learning for combinatorial optimization library. In Proceedings of the NeurIPS 2023 Workshop: New Frontiers in Graph Learning, New Orleans, LA, USA, 15 December 2023. [Google Scholar]
  33. Kwon, Y.D.; Choo, J.; Yoon, I.; Park, M.; Park, D.; Gwon, Y. Matrix encoding networks for neural combinatorial optimization. Adv. Neural Inf. Process. Syst. 2021, 34, 5138–5149. [Google Scholar]
  34. Li, J.; Xin, L.; Cao, Z.; Lim, A.; Song, W.; Zhang, J. Heterogeneous attentions for solving pickup and delivery problem via deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2021, 23, 2306–2315. [Google Scholar] [CrossRef]
  35. Jones, M.; Djahel, S.; Welsh, K. Path-planning for unmanned aerial vehicles with environment complexity considerations: A survey. ACM Comput. Surv. 2023, 55, 1–39. [Google Scholar] [CrossRef]
  36. Hu, Y.; Yao, Y.; Lee, W.S. A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs. Knowl.-Based Syst. 2020, 204, 106244. [Google Scholar] [CrossRef]
  37. Nayak, A.; Rathinam, S. Heuristics and learning models for dubins minmax traveling salesman problem. Sensors 2023, 23, 6432. [Google Scholar] [CrossRef]
  38. Cui, Q. Multi-target points path planning for fixed-wing unmanned aerial vehicle performing reconnaissance missions. In Proceedings of the 5th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2023), Wuhan, China, 24–26 March 2023; SPIE: Bellingham, WA, USA, 2023; Volume 12748, pp. 713–723. [Google Scholar]
  39. Shukla, P.; Shukla, S.; Singh, A.K. Trajectory-prediction techniques for unmanned aerial vehicles (UAVs): A comprehensive survey. IEEE Commun. Surv. Tutor. 2025, 27, 1867–1910. [Google Scholar] [CrossRef]
  40. yangchb. Algorithms for Solving VRP. 2020. Available online: https://github.com/yangchb/Algorithms_for_solving_VRP (accessed on 15 October 2025).
Figure 1. Dual-channel embedding module for asymmetric Dubins distances and relative SE(2) representations, extracting row/column embeddings of the physical and geometric channels, respectively.
Figure 2. Architecture of the Dubins-Aware Encoder, illustrating the Dual-Channel Embedding, Rotary Phase Encoding, and DAMS-Attention mechanisms. (a) Cross-semantic fusion module; (b) Rotary Phase Encoding (RoPhE) module.
Figure 3. Path visualization for a 20-task node network. Squares and circles represent depots and task nodes, respectively. Associated arrows signify heading constraints, with original node indices displayed above. The line segments illustrate inter-node transitions, labeled with the optimized visitation order.
Figure 4. Path visualization for a 52-task node network. Squares and circles represent depots and task nodes, respectively. Associated arrows signify heading constraints, with original node indices displayed above. The line segments illustrate inter-node transitions, labeled with the optimized visitation order.
Table 1. Parameter settings for the sensitivity analysis experiments.
| Parameter | Range | Default Value | Random Seeds | Problem Size (N) |
| Attention Layers | {3, 4, 5, 6} | 4 | {40, …, 45} | {10, 20, 36, 52} |
| Attention Heads | {4, 8, 16} | 8 | {40, …, 45} | {10, 20, 36, 52} |
Table 2. Performance comparison (Average Reward) under different attention head and layer configurations across HC-TSP, HC-CVRP, and HC-PDP tasks. Bold values indicate the best performance.
| Configuration | HC-TSP (10) | HC-TSP (20) | HC-TSP (36) | HC-TSP (52) | HC-CVRP (10) | HC-CVRP (20) | HC-CVRP (36) | HC-CVRP (52) | HC-PDP (10) | HC-PDP (20) | HC-PDP (36) | HC-PDP (52) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Heads = 4 | −3.17 | −4.83 | −8.14 | −11.19 | −4.14 | −5.72 | −9.22 | −12.60 | −4.83 | −6.74 | −11.08 | −14.40 |
| Heads = 8 | −3.14 | −4.79 | −7.97 | −10.48 | −4.07 | −5.60 | −8.99 | −11.99 | −4.73 | −6.68 | −10.22 | −13.60 |
| Heads = 16 | −3.18 | −4.77 | −7.96 | −10.04 | −4.05 | −5.59 | −8.99 | −11.80 | −4.76 | −6.67 | −10.09 | −13.32 |
| Layers = 3 | −3.27 | −4.84 | −8.17 | −11.42 | −4.32 | −5.74 | −9.32 | −12.59 | −4.97 | −6.91 | −10.51 | −13.91 |
| Layers = 4 | −3.14 | −4.79 | −7.97 | −10.48 | −4.07 | −5.60 | −8.99 | −11.99 | −4.73 | −6.68 | −10.22 | −13.60 |
| Layers = 5 | −3.14 | −4.78 | −7.75 | −10.12 | −4.06 | −5.59 | −8.72 | −11.62 | −4.75 | −6.65 | −10.08 | −13.15 |
| Layers = 6 | −3.19 | −4.83 | −7.79 | −10.83 | −4.11 | −5.63 | −8.80 | −11.69 | −4.73 | −6.71 | −10.12 | −13.43 |
Table 3. Comparative Performance of Different Methods on Heading-Constrained Path Planning Tasks. Cost denotes the average Dubins path length, and Time represents the average inference latency per instance (s: seconds, ms: milliseconds).
| Task | Method | Cost (10) | Time (10) | Cost (20) | Time (20) | Cost (36) | Time (36) | Cost (52) | Time (52) |
|---|---|---|---|---|---|---|---|---|---|
| HC-TSP | GA | 3.1321 | 1.48 s | 4.9117 | 4.83 s | 8.5137 | 10.36 s | 13.8275 | 18.82 s |
| HC-TSP | ACO | 3.0774 | 1.59 s | 4.7692 | 5.77 s | 8.1604 | 11.57 s | 12.6384 | 20.25 s |
| HC-TSP | DPSO | 3.6904 | 1.65 s | 5.3368 | 6.11 s | 9.3673 | 11.96 s | 14.349 | 21.37 s |
| HC-TSP | AM | 3.2381 | 1.03 ms | 4.8834 | 1.77 ms | 7.9681 | 4.06 ms | 10.5273 | 6.34 ms |
| HC-TSP | MatNet | 3.2048 | 1.07 ms | 4.8263 | 1.86 ms | 7.8529 | 4.11 ms | 10.0374 | 6.4 ms |
| HC-TSP | Dubins-Aware NCO | 3.1483 | 1.12 ms | 4.7346 | 1.95 ms | 7.7194 | 4.17 ms | 9.8619 | 6.46 ms |
| HC-CVRP | GA | 4.0514 | 1.67 s | 5.8151 | 5.69 s | 9.9582 | 10.76 s | 15.0839 | 19.23 s |
| HC-CVRP | ACO | 3.8927 | 1.77 s | 5.6804 | 6.56 s | 9.2394 | 11.81 s | 14.3167 | 21.31 s |
| HC-CVRP | DPSO | 4.8834 | 1.94 s | 5.9078 | 7.1 s | 10.6857 | 12.13 s | 15.8591 | 22.42 s |
| HC-CVRP | AM | 4.1934 | 1.55 ms | 5.736 | 2.68 ms | 8.9316 | 5.09 ms | 13.3762 | 8.07 ms |
| HC-CVRP | MatNet | 4.1357 | 1.63 ms | 5.8394 | 2.79 ms | 8.7996 | 5.21 ms | 12.6327 | 8.09 ms |
| HC-CVRP | Dubins-Aware NCO | 4.0537 | 1.74 ms | 5.5781 | 2.92 ms | 8.6731 | 5.33 ms | 11.5569 | 8.12 ms |
| HC-PDP | GA | 4.7637 | 1.72 s | 7.0691 | 5.73 s | 11.9362 | 11.06 s | 16.3018 | 20.25 s |
| HC-PDP | ACO | 4.6972 | 1.8 s | 6.8309 | 6.72 s | 11.2561 | 12.39 s | 14.7364 | 22.37 s |
| HC-PDP | DPSO | 4.8608 | 1.94 s | 7.9551 | 7.02 s | 13.9105 | 13.16 s | 18.1463 | 23.5 s |
| HC-PDP | AM | 4.8006 | 1.65 ms | 6.735 | 2.71 ms | 10.6202 | 5.65 ms | 14.6318 | 8.94 ms |
| HC-PDP | MatNet | 4.7605 | 1.72 ms | 6.7103 | 2.96 ms | 10.3295 | 5.7 ms | 13.9164 | 9.23 ms |
| HC-PDP | HA | 4.7763 | 2.31 ms | 6.6993 | 3.83 ms | 10.4694 | 6.21 ms | 14.2671 | 9.41 ms |
| HC-PDP | Dubins-Aware NCO | 4.7287 | 1.82 ms | 6.6277 | 3.05 ms | 9.7911 | 5.86 ms | 12.7694 | 9.05 ms |
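The Cost column above is an average Dubins path length. For reference, the sketch below computes the length of one Dubins word — the left-straight-left (LSL) case — between two SE(2) poses; a full Dubins planner (not shown here, and not the paper's implementation) would take the minimum over all six feasible words (LSL, RSR, LSR, RSL, RLR, LRL):

```python
import math

def mod2pi(a):
    """Wrap an angle into [0, 2*pi)."""
    return a % (2 * math.pi)

def dubins_lsl_length(p0, p1, rho):
    """Length of the left-straight-left Dubins word.

    p0, p1: (x, y, heading) poses; rho: minimum turning radius.
    Illustrative sketch of a single word, not a complete planner.
    """
    x0, y0, t0 = p0
    x1, y1, t1 = p1
    # Centers of the left-turn circles tangent to each pose.
    c0 = (x0 - rho * math.sin(t0), y0 + rho * math.cos(t0))
    c1 = (x1 - rho * math.sin(t1), y1 + rho * math.cos(t1))
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    d = math.hypot(dx, dy)        # straight-segment length
    phi = math.atan2(dy, dx)      # heading along the straight segment
    arc0 = mod2pi(phi - t0)       # first left arc (enter the segment)
    arc1 = mod2pi(t1 - phi)       # second left arc (leave the segment)
    return rho * (arc0 + arc1) + d

# Straight-ahead case: no turning needed, length equals Euclidean distance.
assert abs(dubins_lsl_length((0, 0, 0), (5, 0, 0), 1.0) - 5.0) < 1e-9
# Quarter left turn on a unit circle: length pi/2.
assert abs(dubins_lsl_length((0, 0, 0), (1, 1, math.pi / 2), 1.0)
           - math.pi / 2) < 1e-9
```

The two assertions cover the degenerate cases where the path reduces to a straight segment or a single arc, which makes the role of the turning radius ρ in the cost explicit: as ρ grows, the arc contributions ρ(arc0 + arc1) dominate, consistent with the rising costs reported for larger ρ in Table 5.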
Table 4. Ablation Study Assessing the Contribution of Individual Components. ‘−’ Denotes the Removal of the Specific Module from the Full Model.
| Task | Configuration | 10 | 20 | 36 | 52 |
|---|---|---|---|---|---|
| HC-TSP | −Dubins-SE(2) Embedding | 3.1785 | 4.8194 | 7.8616 | 10.1726 |
| HC-TSP | −Cross-Semantic | 3.1597 | 4.772 | 7.7937 | 9.9834 |
| HC-TSP | −Rotary Phase Encoding | 3.1631 | 4.7938 | 7.8392 | 10.0667 |
| HC-TSP | −DAMS-Attn | 3.1462 | 4.7481 | 7.7315 | 9.9022 |
| HC-TSP | Dubins-Aware NCO (Full) | 3.1483 | 4.7346 | 7.7194 | 9.8619 |
| HC-CVRP | −Dubins-SE(2) Embedding | 4.0969 | 5.6133 | 8.8449 | 11.7062 |
| HC-CVRP | −Cross-Semantic | 4.0673 | 5.5836 | 8.7047 | 11.6097 |
| HC-CVRP | −Rotary Phase Encoding | 4.1031 | 5.6297 | 8.8801 | 11.7678 |
| HC-CVRP | −DAMS-Attn | 4.0734 | 5.5909 | 8.813 | 11.6535 |
| HC-CVRP | Dubins-Aware NCO (Full) | 4.0537 | 5.5781 | 8.6731 | 11.5569 |
| HC-PDP | −Dubins-SE(2) Embedding | 4.7537 | 6.6822 | 9.896 | 12.9035 |
| HC-PDP | −Cross-Semantic | 4.7421 | 6.6493 | 9.8353 | 12.8407 |
| HC-PDP | −Rotary Phase Encoding | 4.7597 | 6.7082 | 9.9158 | 12.9171 |
| HC-PDP | −DAMS-Attn | 4.7353 | 6.6401 | 9.8261 | 12.8183 |
| HC-PDP | Dubins-Aware NCO (Full) | 4.7287 | 6.6277 | 9.7911 | 12.7694 |
Table 5. Generalization Performance Under Varying Minimum Turning Radii ( ρ ) and Extended Task Settings (PCTSP, SDVRP).
| Scenario | Method | 10 | 20 | 36 | 52 |
|---|---|---|---|---|---|
| ρ = 0.001 | MatNet | 3.2048 | 4.8263 | 7.8529 | 10.0374 |
| ρ = 0.001 | Dubins-Aware NCO | 3.1483 | 4.7346 | 7.7194 | 9.8619 |
| ρ = 0.0015 | MatNet | 3.4112 | 5.1468 | 8.3167 | 10.6901 |
| ρ = 0.0015 | Dubins-Aware NCO | 3.3492 | 4.9763 | 8.0383 | 10.2112 |
| ρ = 0.002 | MatNet | 3.6294 | 5.3872 | 8.6131 | 10.8993 |
| ρ = 0.002 | Dubins-Aware NCO | 3.5731 | 5.2608 | 8.3647 | 10.6074 |
| ρ = 0.003 | MatNet | 4.2288 | 5.9931 | 9.2376 | 11.8409 |
| ρ = 0.003 | Dubins-Aware NCO | 4.0361 | 5.7732 | 8.8186 | 11.4643 |
| PCTSP | MatNet | 3.0892 | 4.7428 | 7.7918 | 9.9362 |
| PCTSP | Dubins-Aware NCO | 2.9767 | 4.6581 | 7.6105 | 9.7263 |
| SDVRP | MatNet | 4.0421 | 5.6028 | 8.7893 | 12.2097 |
| SDVRP | Dubins-Aware NCO | 3.9564 | 5.5391 | 8.6136 | 11.4836 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Gao, J.; Wu, Y.; Jia, L.; Shi, H.; Zhu, J. Dubins-Aware NCO: Learning SE(2)-Equivariant Representations for Heading-Constrained UAV Routing. Drones 2026, 10, 59. https://doi.org/10.3390/drones10010059

