Intent-Aware CNN–Informer for Long-Horizon Trajectory Prediction of Cross-Domain Unmanned Aerial Vehicles in Constrained Environments

Liu, Yichen; Zhou, Chijun; Shao, Lei; He, Yangchao; Wang, Xueqian; Ye, Jikun

doi:10.3390/drones10060444

Open Access Article


by Yichen Liu ¹, Chijun Zhou ²,*, Lei Shao ², Yangchao He ¹, Xueqian Wang ¹ and Jikun Ye ²

¹ Graduate School, Air Force Engineering University, Xi’an 710051, China

² Air Defense and Antimissile School, Air Force Engineering University, Xi’an 710051, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(6), 444; https://doi.org/10.3390/drones10060444 (registering DOI)

Submission received: 1 May 2026 / Revised: 27 May 2026 / Accepted: 3 June 2026 / Published: 6 June 2026

(This article belongs to the Section Artificial Intelligence in Drones (AID))


Highlights

What are the main findings?

An intent-aware CNN–Informer framework is proposed for long-horizon UAV trajectory prediction in constrained environments, combining physically interpretable DBL control parameters with continuous intent features describing no-fly zone avoidance and destination-oriented motion.
The proposed method achieves the best prediction performance among SSD-LSTM, Transformer, iTransformer, DLinear, and Informer baselines, reducing the average prediction error by 17.2% compared with Informer and substantially improving terminal and maximum prediction accuracy.

What are the implications of the main findings?

The results demonstrate that incorporating vehicle dynamics, hidden control effects, and mission-related intent into deep sequence models can significantly enhance the reliability of UAV trajectory prediction under partial observability and constrained flight conditions.
The proposed framework provides a transferable methodology for behavior-aware forecasting, guidance support, and autonomous decision-making in complex UAV and drone operations involving restricted regions, maneuvering constraints, and mission-oriented flight.

Abstract

Long-horizon trajectory prediction for unmanned aerial vehicles (UAVs) operating in constrained environments remains challenging because of strongly nonlinear dynamics, hidden control effects, and evolving destination-oriented behavior. This challenge is particularly pronounced for highly maneuverable cross-domain unmanned aerial vehicles (CDUAVs), whose glide trajectories are strongly coupled with control and environmental constraints. To address this problem, this paper proposes an intent-aware CNN–Informer framework for accurate long-horizon trajectory prediction. First, a control-affine reformulation of the vehicle dynamics is used to construct physically interpretable DBL control parameters, which reduce the learning difficulty associated with hidden control effects. Second, three continuous intent features (tangential no-fly zone avoidance distance, heading error angle, and relative closing velocity) are introduced to encode destination tendency and avoidance requirements. These features are fused with historical trajectory states and fed into a hybrid CNN–Informer network, where the CNN extracts local maneuver patterns and the Informer captures long-range temporal dependencies. Experiments on a constrained trajectory dataset demonstrate that the proposed method achieves the best performance among all compared models, including SSD-LSTM, Transformer, iTransformer, DLinear, and Informer. Compared with Informer, the proposed approach reduces the average prediction error by 17.2% and significantly reduces the terminal and maximum prediction errors. These results indicate that the proposed framework provides an effective and physically interpretable solution for long-horizon UAV trajectory prediction in constrained flight scenarios, with potential extensions to behavior-aware forecasting and guidance support in autonomous aerial systems.

Keywords:

cross-domain unmanned aerial vehicle (CDUAV); drone trajectory prediction; autonomous aerial systems; intent-aware prediction; constrained flight; CNN–Informer

1. Introduction

Unmanned aerial vehicles (UAVs) and autonomous aerial systems are increasingly required to operate in complex mission environments involving long-range flight, restricted regions, and adaptive maneuvering. In such scenarios, reliable trajectory prediction is important for flight-state assessment, trajectory monitoring, guidance support, and situation awareness. Accurate long-horizon forecasting is especially valuable when aerial vehicles must continuously adjust their motion in response to mission objectives and environmental constraints.

Among these systems, cross-domain unmanned aerial vehicles (CDUAVs) represent a particularly challenging class of UAVs because they can perform long-range, unpowered, and highly maneuverable flight in near-space environments [1,2,3]. In this paper, CDUAV refers specifically to a class of high-speed unmanned vehicles capable of sustained unpowered hypersonic gliding in the near-space regime, with substantial cross-range maneuverability. The term “cross-domain” denotes a flight envelope that spans the high-dynamic-pressure maneuvering regime within the atmosphere and the rarefied near-space gliding regime. After obtaining suitable initial velocity and altitude conditions, such vehicles rely primarily on aerodynamic forces to sustain gliding motion, and their trajectories exhibit strong nonlinearity, broad maneuvering envelopes, and sensitivity to environmental and terminal constraints [4,5,6]. Compared with conventional ballistic flight, gliding flight allows more flexible lateral and longitudinal maneuvering, which makes future trajectory evolution substantially more difficult to estimate accurately [7,8]. Therefore, reliable medium- and long-horizon trajectory prediction is important for flight-state assessment, trajectory monitoring, and decision support in constrained airspace environments [9].

During the glide phase, the equations of motion are highly nonlinear, and the aerodynamic forces are strongly coupled with control variables such as angle of attack and bank angle, making it difficult to establish an accurate predictive dynamic model [10,11]. In addition, the vehicle may continuously adjust its flight path according to terminal objectives and environmental constraints, which introduces considerable uncertainty into trajectory evolution. In practical observation scenarios, only limited motion-state information can usually be obtained from external sensing systems, whereas internal variables related to control actions and motion tendency are not directly observable [12,13]. As a result, achieving accurate long-horizon trajectory prediction under partial observability is still difficult.

Early studies mainly relied on model-based extrapolation and recursive filtering. Numerical integration methods based on three-degree-of-freedom or six-degree-of-freedom motion models can propagate the current state forward in time to generate predicted trajectories. In addition, recursive estimation methods such as the extended Kalman filter, unscented Kalman filter, and particle filter have been widely used in maneuvering vehicle tracking and prediction tasks [14,15,16,17,18]. These methods can provide satisfactory performance when the motion model is sufficiently accurate or the maneuver intensity is limited. However, in highly nonlinear and strongly maneuvering scenarios, prediction accuracy often deteriorates rapidly because simplified dynamic assumptions and process-noise models cannot adequately characterize real trajectory evolution [19]. Although some studies improve performance through online estimation of aerodynamic parameters [20], such approaches still depend strongly on the accuracy of the assumed dynamic model and may remain sensitive to time-varying parameters and structural mismatch.

With the development of machine learning, data-driven methods have provided a new route for trajectory prediction in complex nonlinear systems. Recurrent neural networks and their variants, especially long short-term memory and gated recurrent unit models, have shown strong capability in sequence modeling tasks [21]. These models have also been applied to vehicle trajectory prediction problems and have achieved better performance than some conventional filtering and extrapolation methods in learning temporal patterns from historical trajectory data [22,23,24,25]. However, because recurrent architectures process sequential data step by step, their ability to model long-range temporal dependencies is limited, and prediction errors tend to accumulate as the forecasting horizon increases [26].

More recently, Transformer-based architectures have shown considerable potential for long-sequence forecasting because self-attention can directly model dependencies across distant time steps [27]. Informer improves computational efficiency through the ProbSparse self-attention mechanism and has achieved promising results in long-term time-series forecasting [28]. Related models such as FEDformer, iTransformer, and DLinear further enhance temporal modeling from different perspectives, including decomposition, variable interaction, and linear trend extraction [29,30,31]. Transformer-based methods have also been introduced into trajectory prediction tasks and have demonstrated competitive performance in modeling long-range dependencies [32,33]. Nevertheless, when directly applied to CDUAV prediction, existing deep learning methods still have several limitations.

First, most current approaches treat trajectory prediction mainly as a generic time-series forecasting problem and directly use historical state sequences for end-to-end prediction. Such strategies do not fully exploit the intrinsic relationship between control-related variables and state evolution embedded in the vehicle dynamics, which limits the model’s ability to learn physically meaningful motion patterns efficiently [34]. In the problem considered here, aerodynamic forces are strongly coupled with the control variables, and the original nonlinear equations of motion are not directly convenient for learning. Therefore, a physically informed representation is needed to reduce the difficulty of mapping hidden control effects to future state evolution.

Second, the long-horizon trajectory of a CDUAV is influenced not only by its current state but also by its evolving motion tendency with respect to terminal objectives and constrained regions. In the scenarios considered in this study, trajectory evolution is jointly shaped by terminal-oriented motion and no-fly zone avoidance requirements. Although some existing studies have considered multi-intent fusion or maneuver-state recognition in gliding-vehicle analysis [35,36,37,38], intent information is often represented as a discrete label or auxiliary recognition result rather than as a continuous computable feature directly incorporated into the prediction process. As a result, the model may fit local state variations while failing to capture the latent factors that govern large maneuvers during critical flight phases.

Third, although Transformer-based models are effective at modeling global dependencies, their ability to capture local maneuvering details is relatively limited. In contrast, convolutional neural networks have clear advantages in extracting local temporal patterns and neighborhood correlations [39]. For CDUAV trajectory prediction, both aspects are important: local patterns reflect rapid control-related maneuver changes, whereas long-range dependencies reflect the cumulative influence of motion tendency and environmental constraints over time. Therefore, it is necessary to develop a prediction framework that can integrate local feature extraction with global temporal modeling.

To address the above issues, this paper proposes an intent-aware CNN–Informer framework for medium- and long-horizon trajectory prediction of unmanned aerial vehicles in constrained environments, with CDUAV considered as a representative high-maneuverability case. First, based on aerodynamic control relationships, the vehicle dynamics are reformulated into a control-affine form, and three-dimensional DBL control parameters are constructed by decoupling the lift–drag coefficients from the bank angle. Second, three-dimensional continuous intent features, namely tangential no-fly zone avoidance distance, heading error angle, and relative closing velocity, are constructed according to the relative geometry among the vehicle, intended destination, and no-fly zones. Third, a CNN–Informer hybrid network is designed to combine local maneuver-pattern extraction with long-range dependency modeling. In addition, the constructed intent-related features are incorporated into the prediction process together with historical state and control information to improve multi-step forecasting performance.

The main contributions of this study are summarized as follows:

(1): A control-affine dynamic representation based on three-dimensional DBL control parameters is developed for cross-domain unmanned aerial vehicles, which reduces the learning difficulty associated with the mapping from hidden control effects to state evolution.
(2): A continuous intent-feature construction method is proposed using tangential no-fly zone avoidance distance, heading error angle, and relative closing velocity, enabling terminal-oriented motion tendency and constrained-region avoidance requirements to be directly incorporated into the prediction model in a computable form.
(3): A CNN–Informer hybrid architecture incorporating intent features is established for trajectory prediction, allowing the model to jointly capture local maneuvering patterns and long-range temporal dependencies.
(4): Ablation and comparative experiments on a simulated trajectory dataset with multiple terminal points and multiple no-fly zones verify the effectiveness of the proposed method against several representative baseline models.

The remainder of this paper is organized as follows. Section 2 introduces the motion modeling process, trajectory generation method, DBL control-parameter construction, and intent-feature extraction. Section 3 presents dataset construction, feature analysis, and the proposed CNN–Informer prediction mechanism. Section 4 reports the experimental setup and the results of ablation and comparative studies. Section 5 discusses the main findings and limitations of the study. Section 6 concludes the paper.

2. Motion Modeling and Feature Analysis of the CDUAV

2.1. Optimized Trajectory Generation

During the glide phase, the CDUAV is mainly driven by the Earth’s gravity and aerodynamic forces. Neglecting the Earth’s rotation, the nonlinear motion model of the vehicle in the VTC coordinate system is given by Equation (1):

\begin{cases} \dot{r} = v \sin θ \\ \dot{ϕ} = \frac{v \cos θ \sin ψ}{r \cos φ} \\ \dot{φ} = \frac{v \cos θ \cos ψ}{r} \\ \dot{v} = - \frac{D}{m} - g \sin θ \\ \dot{θ} = \frac{L \cos σ}{m v} + \frac{\cos θ}{v} \left(\frac{v^{2}}{r} - g\right) \\ \dot{ψ} = \frac{L \sin σ}{m v \cos θ} + \frac{v \cos θ \sin ψ \tan φ}{r} \end{cases}

(1)

where r denotes the distance from the vehicle to the center of the Earth; m denotes the mass of the vehicle; ϕ and φ denote the longitude and latitude of the vehicle; v denotes the velocity of the vehicle; θ and ψ denote the flight-path angle and heading angle of the vehicle; σ denotes the bank angle of the vehicle; and g denotes the gravitational acceleration. L and D represent the lift and drag acting on the vehicle:

\begin{cases} L = 0.5 C_{L} S ρ v^{2} \\ D = 0.5 C_{D} S ρ v^{2} \end{cases}

(2)

where C_{L} denotes the lift coefficient and C_{D} the drag coefficient, both of which depend on the angle of attack α and the Mach number Ma; S denotes the reference area of the vehicle; and ρ denotes the atmospheric density at the flight altitude.

Since the motion model in Equation (1) is strongly nonlinear with respect to the control variables, namely the angle of attack α and the bank angle σ, Equation (1) is reconstructed into a control-affine system based on the drag polar relation:

C_{D}(α, Ma) = C_{D_{0}}(Ma) + K(Ma) \, C_{L}(α, Ma)^{2}

(3)

where the zero-lift drag coefficient C_{D_{0}} and the induced drag coefficient K can be obtained by interpolating aerodynamic data. The lift and drag coefficients corresponding to the maximum lift-to-drag ratio, \hat{C}_{L} and \hat{C}_{D}, can then be derived as follows:

\begin{cases} \hat{C}_{L}(Ma) = \sqrt{C_{D_{0}}(Ma) / K(Ma)} \\ \hat{C}_{D}(Ma) = 2 C_{D_{0}}(Ma) \end{cases}

(4)

Define the normalized coefficient η as

η = C_{L}(α, Ma) / \hat{C}_{L}(Ma)

(5)

Then, the lift and drag coefficients can be rewritten as

\begin{cases} C_{L}(α, Ma) = η \hat{C}_{L}(Ma) \\ C_{D}(α, Ma) = (1 + η^{2}) \hat{C}_{D}(Ma) / 2 \end{cases}

(6)
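The step from the drag polar in Equation (3) to Equations (4) and (6) can be written out explicitly (Mach-number arguments suppressed). Maximizing C_{L}/C_{D} is equivalent to minimizing C_{D}/C_{L}, and the stationary point below is a minimum because the second derivative, 2C_{D_{0}}/C_{L}^{3}, is positive for C_{L} > 0:

```latex
\begin{aligned}
\frac{C_{D}}{C_{L}} &= \frac{C_{D_{0}}}{C_{L}} + K C_{L},
\qquad
\frac{\mathrm{d}}{\mathrm{d} C_{L}}\left(\frac{C_{D}}{C_{L}}\right) = -\frac{C_{D_{0}}}{C_{L}^{2}} + K = 0
\;\Rightarrow\; \hat{C}_{L} = \sqrt{C_{D_{0}} / K}, \\
\hat{C}_{D} &= C_{D_{0}} + K \hat{C}_{L}^{2} = C_{D_{0}} + C_{D_{0}} = 2 C_{D_{0}}, \\
C_{D} &= C_{D_{0}} + K \left(\eta \hat{C}_{L}\right)^{2}
       = C_{D_{0}} \left(1 + \eta^{2}\right)
       = \frac{\left(1 + \eta^{2}\right) \hat{C}_{D}}{2}.
\end{aligned}
```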

Accordingly, the lift and drag expressions can be transformed into

\begin{cases} L = η \hat{L}(Ma) \\ D = \frac{(1 + η^{2}) \hat{D}(Ma)}{2} \end{cases}

(7)

where \hat{L} and \hat{D} denote the lift and drag corresponding to the coefficients \hat{C}_{L} and \hat{C}_{D} associated with the maximum lift-to-drag ratio. At this point, the lift and drag in Equation (1) can be represented by the new control variables, namely the normalized coefficient η and the bank angle σ. However, the system is still not in control-affine form, because η enters the drag quadratically and multiplies cos σ and sin σ in the lift terms. Therefore, the affine variables u_{1}, u_{2}, u_{3} are introduced:

\begin{cases} u_{1} = η \cos σ \\ u_{2} = η \sin σ \\ u_{3} = η^{2} \end{cases}

(8)

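Substituting the affine variables u = [u_{1}, u_{2}, u_{3}]^{T} into Equation (1), with the state x = [r, ϕ, φ, v, θ, ψ]^{T}, converts the motion model into the following control-affine system, written in normalized units (g = 1/r^{2}, with \hat{L} and \hat{D} taken per unit mass):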

\dot{x} = f (x) + B (x) u

(9)

f(x) = \begin{bmatrix} v \sin θ \\ v \cos θ \sin ψ / (r \cos φ) \\ v \cos θ \cos ψ / r \\ -0.5 \hat{D} - \sin θ / r^{2} \\ \cos θ \, (v^{2} r - 1) / (v r^{2}) \\ v \cos θ \sin ψ \tan φ / r \end{bmatrix}

(10)

B(x) = \begin{bmatrix} \mathbf{0}_{3 \times 1} & \mathbf{0}_{3 \times 1} & \mathbf{0}_{3 \times 1} \\ 0 & 0 & -0.5 \hat{D} \\ \hat{L} / v & 0 & 0 \\ 0 & \hat{L} / (v \cos θ) & 0 \end{bmatrix}

(11)
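As a sanity check of Equations (8) to (11), the short Python sketch below evaluates the original right-hand side of Equation (1) and the control-affine form f(x) + B(x)u at an arbitrary state and confirms that they agree. It assumes normalized units (m = 1, g = 1/r^{2}), which is what the sin θ / r^{2} term in Equation (10) implies, and treats \hat{L} and \hat{D} as given per-unit-mass numbers at the current state; the function names and numerical values are illustrative, not taken from the paper.

```python
import numpy as np

# Assumption: normalized units (m = 1, g = 1/r**2); L_hat and D_hat are the
# per-unit-mass max-L/D lift and drag at the current state, treated as numbers.

def original_rhs(x, eta, sigma, L_hat, D_hat):
    """Right-hand side of Equation (1) with L and D taken from Equation (7)."""
    r, lon, lat, v, theta, psi = x
    g = 1.0 / r**2
    lift = eta * L_hat
    drag = 0.5 * (1.0 + eta**2) * D_hat
    return np.array([
        v * np.sin(theta),
        v * np.cos(theta) * np.sin(psi) / (r * np.cos(lat)),
        v * np.cos(theta) * np.cos(psi) / r,
        -drag - g * np.sin(theta),
        lift * np.cos(sigma) / v + np.cos(theta) / v * (v**2 / r - g),
        lift * np.sin(sigma) / (v * np.cos(theta))
        + v * np.cos(theta) * np.sin(psi) * np.tan(lat) / r,
    ])


def drift(x, D_hat):
    """f(x) from Equation (10)."""
    r, lon, lat, v, theta, psi = x
    return np.array([
        v * np.sin(theta),
        v * np.cos(theta) * np.sin(psi) / (r * np.cos(lat)),
        v * np.cos(theta) * np.cos(psi) / r,
        -0.5 * D_hat - np.sin(theta) / r**2,
        np.cos(theta) * (v**2 * r - 1.0) / (v * r**2),
        v * np.cos(theta) * np.sin(psi) * np.tan(lat) / r,
    ])


def control_matrix(x, L_hat, D_hat):
    """B(x) from Equation (11); the first three rows are zero."""
    r, lon, lat, v, theta, psi = x
    B = np.zeros((6, 3))
    B[3, 2] = -0.5 * D_hat
    B[4, 0] = L_hat / v
    B[5, 1] = L_hat / (v * np.cos(theta))
    return B


def affine_controls(eta, sigma):
    """u = [eta cos(sigma), eta sin(sigma), eta^2] from Equation (8)."""
    return np.array([eta * np.cos(sigma), eta * np.sin(sigma), eta**2])


# Illustrative (made-up) normalized state and coefficients.
x = np.array([1.011, 0.05, 0.02, 0.75, -0.01, 1.05])
eta, sigma, L_hat, D_hat = 0.8, np.deg2rad(30.0), 0.9, 0.3
lhs = original_rhs(x, eta, sigma, L_hat, D_hat)
rhs = drift(x, D_hat) + control_matrix(x, L_hat, D_hat) @ affine_controls(eta, sigma)
print(np.allclose(lhs, rhs))  # True
```

The check holds for any state and any (η, σ), because the two forms differ only in how the η-dependent terms are grouped.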

Subsequently, Equation (9) is linearized by first-order Taylor expansion around the previous optimized trajectory. The detailed derivation is omitted here. The focus is placed on the construction of the constraints and objective function in the trajectory planning process.
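The linearization step is omitted in the text. As a generic illustration only, the sketch below performs a first-order Taylor expansion of \dot{x} = F(x, u) about a reference pair (x^{*}, u^{*}) using central finite differences, returning A, B_{m}, and c such that F(x, u) ≈ Ax + B_{m}u + c near the reference. Because B(x) depends on the state, the term B(x)u also contributes to A. The function name `linearize` and the toy dynamics are hypothetical; a practical sequential convex programming implementation would more likely use analytic Jacobians followed by a discretization step that is not shown here.

```python
import numpy as np

def linearize(F, x_ref, u_ref, eps=1e-6):
    """First-order Taylor expansion of xdot = F(x, u) about (x_ref, u_ref).

    Returns (A, Bm, c) with F(x, u) ~= A @ x + Bm @ u + c near the reference.
    Jacobians are approximated with central finite differences.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    f0 = np.asarray(F(x_ref, u_ref), dtype=float)
    A = np.zeros((f0.size, x_ref.size))
    Bm = np.zeros((f0.size, u_ref.size))
    for i in range(x_ref.size):
        dx = np.zeros_like(x_ref)
        dx[i] = eps
        A[:, i] = (F(x_ref + dx, u_ref) - F(x_ref - dx, u_ref)) / (2.0 * eps)
    for j in range(u_ref.size):
        du = np.zeros_like(u_ref)
        du[j] = eps
        Bm[:, j] = (F(x_ref, u_ref + du) - F(x_ref, u_ref - du)) / (2.0 * eps)
    c = f0 - A @ x_ref - Bm @ u_ref   # affine offset so the model is exact at the reference
    return A, Bm, c


def pendulum_like(x, u):
    """Hypothetical toy control-affine system with a state-dependent input gain."""
    return np.array([x[1], -np.sin(x[0]) + x[0] * u[0]])


x_ref, u_ref = np.array([0.3, 0.0]), np.array([0.5])
A, Bm, c = linearize(pendulum_like, x_ref, u_ref)
```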

In general, during flight, the CDUAV should satisfy constraints on the dynamic pressure q, overload n, and heat flux density \dot{Q}, which can be expressed as

\begin{cases} q = 0.5 ρ v^{2} \leq q_{\max} \\ n = \sqrt{L^{2} + D^{2}} \leq n_{\max} \\ \dot{Q} = C ρ^{0.5} v^{3.15} \leq \dot{Q}_{\max} \end{cases}

(12)

where q_{\max}, n_{\max}, and \dot{Q}_{\max} denote the upper limits of dynamic pressure, overload, and heat flux density, respectively, and C denotes the aerodynamic heating coefficient.
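A minimal sketch of evaluating the three process constraints of Equation (12) at one trajectory point, in SI units, with the limits listed in Section 3.1 as defaults. Two assumptions are made: the overload is computed as the load factor \sqrt{L^{2} + D^{2}}/(m g_{0}), since Equation (12) presumably uses lift and drag already normalized by vehicle weight, and the heating coefficient C is a free input because its value is not given in the text. The function name and the numbers in the usage line are made up for illustration.

```python
import numpy as np

G0 = 9.80665  # standard gravity, m/s^2

def path_constraints(rho, v, lift, drag, mass, C_heat,
                     q_max=100e3, n_max=4.0, qdot_max=4e6):
    """Evaluate Equation (12) at one point (SI units: kg/m^3, m/s, N, kg).

    Assumption: overload = sqrt(L^2 + D^2) / (m * g0).
    C_heat is vehicle-specific; the text gives no value.
    """
    q = 0.5 * rho * v**2                      # dynamic pressure, Pa
    n = np.hypot(lift, drag) / (mass * G0)    # aerodynamic load factor
    qdot = C_heat * np.sqrt(rho) * v**3.15    # heat flux, W/m^2
    return {
        "q": q,
        "n": n,
        "qdot": qdot,
        "ok": bool(q <= q_max and n <= n_max and qdot <= qdot_max),
    }


# Illustrative point only (not from the paper).
check = path_constraints(rho=7.3e-5, v=6000.0, lift=2.0e4, drag=8.0e3,
                         mass=1000.0, C_heat=1e-4)
```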

The affine variables should satisfy

u_{1}^{2} + u_{2}^{2} = u_{3}

(13)
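Equation (13) follows directly from Equation (8), since u_{1}^{2} + u_{2}^{2} = η^{2}(\cos^{2}σ + \sin^{2}σ) = u_{3}. The sketch below shows the forward map and its inverse (η = \sqrt{u_{3}}, σ = atan2(u_{2}, u_{1})), assuming η ≥ 0; the function names are hypothetical. Note that the equality in Equation (13) is nonconvex; a common treatment in the convex-optimization guidance literature is to relax it to u_{1}^{2} + u_{2}^{2} ≤ u_{3}, but the text does not state which form is used here.

```python
import numpy as np

def to_affine(eta, sigma):
    """Equation (8): (eta, sigma) -> u = [u1, u2, u3]."""
    return np.array([eta * np.cos(sigma), eta * np.sin(sigma), eta**2])


def from_affine(u):
    """Recover (eta, sigma) from u, assuming u satisfies Equation (13) and eta >= 0."""
    u1, u2, u3 = u
    eta = np.sqrt(u3)
    sigma = np.arctan2(u2, u1)   # quadrant-correct bank angle
    return eta, sigma


def affine_residual(u):
    """Left side minus right side of Equation (13); zero for an admissible u."""
    u1, u2, u3 = u
    return u1**2 + u2**2 - u3
```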

A CDUAV usually performs long-range flight above the Earth’s surface. During flight, it may need to avoid restricted areas or high-risk regions imposed by environmental or operational constraints. In this paper, these areas are modeled as infinitely high cylindrical no-fly zones in three-dimensional space. Therefore, the following no-fly zone avoidance constraint is incorporated into the planning process:

\sqrt{(ϕ - ϕ_{n})^{2} + (φ - φ_{n})^{2}} \geq r_{n}

(14)

where ϕ_{n} and φ_{n} denote the center longitude and latitude of the no-fly zone, and r_{n} denotes its radius.

To obtain an accurate terminal position while preserving sufficient vehicle energy, the objective function is formulated as

J = - c_{1} v_{f} + c_{2} \sqrt{{(ϕ - ϕ_{f})}^{2} + {(φ - φ_{f})}^{2}} + c_{3} |r - r_{f}|

(15)

where v_{f} denotes the terminal velocity of the trajectory; ϕ, φ, and r are evaluated at the terminal time; ϕ_{f} and φ_{f} denote the desired terminal longitude and latitude, and r_{f} the desired terminal radial position; and c_{1}, c_{2}, c_{3} denote the tuning coefficients. The goal is to obtain the trajectory that minimizes the objective function value J.
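For concreteness, a direct transcription of Equation (15) is sketched below, with the tuning coefficients from Section 3.1 as defaults. The function and argument names are hypothetical, and the units of the terminal quantities must match whatever scaling the coefficients were tuned for, which the text does not specify.

```python
import numpy as np

def objective(v_end, lon_end, lat_end, r_end, lon_f, lat_f, r_f,
              c1=10.0, c2=1.0, c3=0.1):
    """Equation (15) evaluated at the terminal point of a candidate trajectory.

    Rewards a high terminal velocity (energy) and penalizes the horizontal
    miss distance and the altitude miss.
    """
    miss = np.hypot(lon_end - lon_f, lat_end - lat_f)
    return -c1 * v_end + c2 * miss + c3 * abs(r_end - r_f)
```

With equal miss distances, the trajectory that arrives faster has the lower (better) J.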

2.2. Selection of Control Parameters

According to Newton’s second law, the aerodynamic acceleration is the ratio of the aerodynamic force acting on the vehicle to the vehicle mass. On this basis, the aerodynamic drag acceleration a_{v}, aerodynamic climb acceleration a_{c}, and aerodynamic turning acceleration a_{t} of the CDUAV are defined in the VTC coordinate system. These quantities correspond to the unknown terms in \dot{v}, \dot{θ}, and \dot{ψ} of the motion model. Substituting the expressions for the lift L and drag D yields

\begin{cases} a_{v} = \frac{C_{D} S ρ v^{2}}{2 m} \\ a_{c} = \frac{C_{L} S ρ v^{2} \cos σ}{2 m} \\ a_{t} = \frac{C_{L} S ρ v^{2} \sin σ}{2 m} \end{cases}

(16)

The above parameters decompose the lift into the turning-force and climb-force directions; that is, the lift coefficient C_{L} and the bank angle σ are lumped together, which makes it difficult to see how each control quantity varies on its own. To address this issue, this paper decouples the lift–drag coefficients from the bank angle and constructs the parameters K_{D} and K_{L}:

\begin{cases} K_{D} = \frac{a_{v}}{2 q} = \frac{C_{D} S}{2 m} \\ K_{L} = \frac{a_{c}}{2 q \cos σ} = \frac{C_{L} S}{2 m} \end{cases}

(17)
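The sketch below evaluates Equations (16) and (17) for one set of coefficients and checks the relations a_{v} = 2qK_{D}, a_{c} = 2qK_{L}\cos σ, and a_{t} = 2qK_{L}\sin σ, which show how the bank angle separates cleanly from K_{L}. The function names and all numerical values are made up for illustration.

```python
import numpy as np

def aero_accelerations(C_L, C_D, sigma, rho, v, S, m):
    """Equation (16): drag, climb and turning accelerations."""
    q = 0.5 * rho * v**2                      # dynamic pressure
    a_v = C_D * S * q / m
    a_c = C_L * S * q * np.cos(sigma) / m
    a_t = C_L * S * q * np.sin(sigma) / m
    return a_v, a_c, a_t


def dbl_parameters(C_L, C_D, S, m):
    """Equation (17): K_D and K_L depend only on the coefficients, S and m."""
    return C_D * S / (2.0 * m), C_L * S / (2.0 * m)


# Illustrative values only (not from the paper).
C_L, C_D, sigma, rho, v, S, m = 0.4, 0.15, 0.3, 7.0e-5, 5800.0, 0.5, 1000.0
a_v, a_c, a_t = aero_accelerations(C_L, C_D, sigma, rho, v, S, m)
K_D, K_L = dbl_parameters(C_L, C_D, S, m)
q = 0.5 * rho * v**2
```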

Together with the bank angle σ, K_{D} and K_{L} constitute the control parameters of the vehicle, hereinafter referred to as the DBL parameters. Essentially, K_{D} and K_{L} express the drag acceleration and the available lift acceleration of the vehicle normalized by twice the dynamic pressure, since a_{v} = 2qK_{D} and \sqrt{a_{c}^{2} + a_{t}^{2}} = 2qK_{L}. They provide an efficient, decoupled, and abstract representation of the conventional control variables while retaining physical interpretability. Here, the term “physical interpretability” specifically refers to the fact that the DBL parameters convert directly into aerodynamic accelerations and retain an analytical relationship with the aerodynamic forces; it does not imply a direct mapping to specific actuator commands. Substituting them into the model transforms the equations of motion into

\begin{cases} \dot{r} = v \sin θ \\ \dot{ϕ} = \frac{v \cos θ \sin ψ}{r \cos φ} \\ \dot{φ} = \frac{v \cos θ \cos ψ}{r} \\ \dot{v} = - K_{D} ρ v^{2} - g \sin θ \\ \dot{θ} = K_{L} ρ v \cos σ + \frac{\cos θ}{v} \left(\frac{v^{2}}{r} - g\right) \\ \dot{ψ} = \frac{K_{L} ρ v \sin σ}{\cos θ} + \frac{v \cos θ \sin ψ \tan φ}{r} \end{cases}

(18)
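To illustrate how Equation (18) can be propagated once the DBL parameters are known, the sketch below integrates it with the classical fourth-order Runge–Kutta method while holding K_{D}, σ, and K_{L} constant. The exponential atmosphere (sea-level density 1.225 kg/m^{3}, scale height 7.2 km), the inverse-square gravity model, the constant-control assumption, and the function names are all simplifications chosen for this example, not the models used in the paper.

```python
import numpy as np

MU = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R_E = 6378137.0       # Earth radius, m (spherical assumption)

def density(r, rho0=1.225, scale_height=7200.0):
    """Hypothetical exponential atmosphere; replace with the model actually used."""
    return rho0 * np.exp(-(r - R_E) / scale_height)


def dbl_rhs(x, K_D, sigma, K_L):
    """Equation (18); x = [r, lon, lat, v, theta, psi] in m, rad, m/s."""
    r, lon, lat, v, theta, psi = x
    g = MU / r**2
    rho = density(r)
    return np.array([
        v * np.sin(theta),
        v * np.cos(theta) * np.sin(psi) / (r * np.cos(lat)),
        v * np.cos(theta) * np.cos(psi) / r,
        -K_D * rho * v**2 - g * np.sin(theta),
        K_L * rho * v * np.cos(sigma) + np.cos(theta) / v * (v**2 / r - g),
        K_L * rho * v * np.sin(sigma) / np.cos(theta)
        + v * np.cos(theta) * np.sin(psi) * np.tan(lat) / r,
    ])


def propagate(x0, K_D, sigma, K_L, dt, steps):
    """Classical RK4 with the DBL parameters held constant over the horizon."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        k1 = dbl_rhs(x, K_D, sigma, K_L)
        k2 = dbl_rhs(x + 0.5 * dt * k1, K_D, sigma, K_L)
        k3 = dbl_rhs(x + 0.5 * dt * k2, K_D, sigma, K_L)
        k4 = dbl_rhs(x + dt * k3, K_D, sigma, K_L)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x.copy())
    return np.array(traj)
```

Because K_{D} and K_{L} multiply ρv^{2} and ρv directly, a predictor that outputs DBL sequences can be rolled out through this integrator without recovering α or C_{L} explicitly.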

To obtain the discrete-time motion model of the vehicle, appropriate sensing devices and filtering algorithms are employed during the tracking stage. In this paper, the current statistical (CS) model and the unscented Kalman filter (UKF) are used to obtain the estimated state vector of the vehicle in the east-north-up coordinate system. CS-UKF is selected for three reasons: the CS model suits the rapidly varying DBL parameters because it treats the maneuvering acceleration as a colored-noise process; the unscented transform avoids the linearization errors that an extended Kalman filter (EKF) would incur on the nonlinear spherical observation model; and it is computationally lighter than a particle filter while remaining numerically stable. The estimated state vector is

{\tilde{x}}_{k} = [x_{k}, y_{k}, z_{k}, {\dot{x}}_{k}, {\dot{y}}_{k}, {\dot{z}}_{k}, a_{v, k}, a_{t, k}, a_{c, k}]

(19)

where [x_{k}, y_{k}, z_{k}] denotes the position vector of the vehicle; [\dot{x}_{k}, \dot{y}_{k}, \dot{z}_{k}] denotes its velocity vector; [a_{v,k}, a_{t,k}, a_{c,k}] denotes its aerodynamic acceleration vector in the semi-velocity coordinate system; and k denotes the tracking time instant.
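The CS model used in the tracking stage is commonly written per axis as a position-velocity-acceleration model in which the acceleration is a first-order Markov process around a current mean acceleration. The sketch below gives the standard discrete-time transition matrix F and mean-acceleration input vector U for one axis; the function name is hypothetical, the adaptive process-noise covariance and the UKF measurement update are not shown, and the maneuver frequency α is left to the user.

```python
import numpy as np

def cs_transition(T, alpha):
    """One-axis current statistical (CS) model over sampling period T.

    State per axis: [position, velocity, acceleration]; alpha is the maneuver
    frequency (reciprocal of the acceleration time constant).
    Returns (F, U) such that x_{k+1} = F @ x_k + U * a_mean + w_k.
    """
    e = np.exp(-alpha * T)
    F = np.array([
        [1.0, T, (alpha * T - 1.0 + e) / alpha**2],
        [0.0, 1.0, (1.0 - e) / alpha],
        [0.0, 0.0, e],
    ])
    U = np.array([
        (-T + alpha * T**2 / 2.0 + (1.0 - e) / alpha) / alpha,
        T - (1.0 - e) / alpha,
        1.0 - e,
    ])
    return F, U
```

A useful property for checking an implementation: if the current acceleration equals the mean acceleration, one step of F and U reproduces constant-acceleration kinematics exactly.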

The vehicle state vector required by the proposed model is represented in the VTC coordinate system as follows:

x_{k} = [r_{k}, ϕ_{k}, φ_{k}, v_{k}, θ_{k}, ψ_{k}, K_{D,k}, σ_{k}, K_{L,k}]

(20)
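Converting the filter output of Equation (19) into the DBL entries of Equation (20) amounts to inverting Equations (16) and (17). A minimal sketch follows, assuming C_{L} ≥ 0 (so that the total lift acceleration is \sqrt{a_{c}^{2} + a_{t}^{2}}) and that ρ is obtained from the estimated altitude by some atmosphere model; the function name is hypothetical. Using atan2 and the total lift acceleration avoids the division by \cos σ in Equation (17), which becomes ill-conditioned near σ = ±90°.

```python
import numpy as np

def dbl_from_accelerations(a_v, a_t, a_c, rho, v):
    """Invert Equations (16)-(17): aerodynamic accelerations -> (K_D, sigma, K_L).

    Argument order follows Equation (19): [a_v, a_t, a_c].
    Assumes C_L >= 0, so the total lift acceleration is hypot(a_c, a_t).
    """
    q = 0.5 * rho * v**2
    K_D = a_v / (2.0 * q)
    sigma = np.arctan2(a_t, a_c)             # bank angle from the lift split
    K_L = np.hypot(a_c, a_t) / (2.0 * q)     # finite even when cos(sigma) -> 0
    return K_D, sigma, K_L
```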

2.3. Extraction of Intent Features

In practical constrained-flight scenarios, the trajectory of a CDUAV is determined not only by its physical characteristics but also by mission-related factors such as intended destinations and no-fly zones. Therefore, trajectory prediction should jointly consider the relative geometry among the vehicle, the no-fly zones, and the intended destinations. This paper analyzes the relative positional and angular relationships among the three, and then extracts intent features that can directly reflect the factors affecting the vehicle trajectory.

The vehicle is expected to approach its intended destination during flight, but it may need to alter its maneuvering mode to avoid no-fly zones, thereby affecting the trajectory. A no-fly zone influences the trajectory through its location and coverage range. In the simplified model used in this paper, a no-fly zone is described by its center longitude and latitude and its radius, as shown in Figure 1.

If the line connecting the vehicle and the intended destination passes through the interior of a no-fly zone, that no-fly zone will affect the flight trajectory. To unify the influence of the no-fly zone location and coverage range into a single variable, this paper proposes the tangential no-fly zone avoidance distance d_{n}. Let ψ_{n} be the angle between the flight-velocity direction and the line connecting the vehicle with the center of the no-fly zone, and let (ϕ_{n1}, φ_{n1}) be the longitude and latitude of the no-fly zone center nearest to the vehicle. The tangential no-fly zone avoidance distance d_{n} is then expressed as

d_{n} = R_{e} \sqrt{(ϕ_{n1} - ϕ)^{2} + (φ_{n1} - φ)^{2}} \, \sin ψ_{n}

(21)

where R_{e} denotes the radius of the Earth and the angular differences are expressed in radians. Here R_{e} = 6378.137 km, the WGS-84 equatorial radius, used under a spherical-Earth assumption; ellipsoidal corrections are not considered in this work. If the tangential no-fly zone avoidance distance satisfies d_{n} < r_{n} (with r_{n} expressed in the same length units as d_{n}), the vehicle needs to maneuver to avoid the no-fly zone at the current time; if d_{n} > r_{n}, maintaining the current velocity direction will not cause the vehicle to pass through the no-fly zone, and no additional maneuver is required.
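A minimal sketch of Equation (21) and the avoidance test follows. It reuses the flat longitude/latitude metric of Equation (21), takes ψ_{n} as the signed angle from the velocity direction to the zone center (bearing minus heading, both measured clockwise from north), and converts the 2° radius of Section 3.1 into kilometers before the comparison. Comparing |d_{n}| rather than d_{n}, ignoring whether the zone lies ahead of or behind the vehicle, and the function names are assumptions of this sketch.

```python
import numpy as np

R_E_KM = 6378.137  # Earth radius used in Equation (21), km

def tangential_avoidance_distance(lon, lat, psi, zone_lon, zone_lat):
    """Equation (21); all angles in degrees, psi clockwise from north.

    Assumption: psi_n = bearing to zone center - heading, with the bearing
    computed in the same flat metric (east = d_lon, north = d_lat).
    """
    d_lon = zone_lon - lon
    d_lat = zone_lat - lat
    dist_rad = np.radians(np.hypot(d_lon, d_lat))
    bearing = np.arctan2(d_lon, d_lat)        # clockwise from north, radians
    psi_n = bearing - np.radians(psi)
    return R_E_KM * dist_rad * np.sin(psi_n)  # signed cross-track distance, km


def needs_avoidance(d_n, zone_radius_deg):
    """Compare |d_n| with the zone radius converted from degrees to km."""
    return bool(abs(d_n) < R_E_KM * np.radians(zone_radius_deg))
```

Geometrically, d_{n} is the cross-track offset of the zone center from the current velocity line, so the test asks whether that line cuts the circle.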

The influence of the no-fly zone on the trajectory is thus captured by the tangential no-fly zone avoidance distance d_{n}. Since the trajectory is ultimately oriented toward the intended destination, the influence of the destination on trajectory evolution cannot be neglected. The angle between geographic north and the line connecting the vehicle with the intended destination is defined as the line-of-sight angle ψ_{los}. The difference between the vehicle heading angle ψ and the line-of-sight angle ψ_{los} is defined as the heading error angle Δψ, which is expressed as

Δψ = ψ - ψ_{los}

(22)

The heading error angle Δψ reflects the degree to which the vehicle deviates from the intended destination. For the predominantly eastward headings considered in this study, a positive Δψ means that the velocity direction is biased toward the south (the larger Δψ, the stronger the bias), and the vehicle should maneuver northward; a negative Δψ with a larger absolute value means that the velocity direction is more biased toward the north, and the vehicle should maneuver southward. In addition to the degree of deviation, the relative closing velocity between the vehicle and the intended destination reflects the efficiency of approach. The relative closing velocity v_{c} is defined strictly as the kinematic line-of-sight velocity component, i.e., the projection of the velocity vector onto the line of sight, without any energy-state interpretation; the term “closing” refers solely to this kinematic meaning. It is expressed as

v_{c} = v \cos(Δψ)

(23)

A larger v_{c} indicates that the current heading is more closely aligned with the line-of-sight direction to the intended destination, which favors a rapid approach. When Δψ = 0, v_{c} = v, indicating that the vehicle is flying directly toward the intended destination and the closing velocity reaches its maximum. In contrast, when Δψ approaches ±90°, v_{c} approaches zero, indicating that the vehicle is mainly flying laterally and is barely approaching the intended destination. A negative value, v_{c} < 0 (i.e., |Δψ| > 90°), indicates that the vehicle is moving away from the intended destination.
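The heading error angle and closing velocity of Equations (22) and (23) can be computed as sketched below. The text does not say how ψ_{los} is evaluated, so this sketch assumes the initial great-circle bearing from the vehicle to the destination; the heading convention (clockwise from north) follows Equation (1), the angle difference is wrapped to [−π, π) so that the sign of Δψ is meaningful, and the function names are hypothetical.

```python
import numpy as np

def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


def los_angle(lon, lat, lon_t, lat_t):
    """Initial great-circle bearing to the destination, clockwise from north (rad).

    Assumption: the text does not specify how psi_los is computed.
    """
    lam1, phi1, lam2, phi2 = map(np.radians, (lon, lat, lon_t, lat_t))
    dlam = lam2 - lam1
    y = np.sin(dlam) * np.cos(phi2)
    x = np.cos(phi1) * np.sin(phi2) - np.sin(phi1) * np.cos(phi2) * np.cos(dlam)
    return np.arctan2(y, x)


def heading_error_and_closing_speed(lon, lat, psi_deg, v, lon_t, lat_t):
    """Equations (22)-(23): heading error (rad) and v_c = v cos(dpsi)."""
    dpsi = wrap_angle(np.radians(psi_deg) - los_angle(lon, lat, lon_t, lat_t))
    return dpsi, v * np.cos(dpsi)
```

For a vehicle at (0°, 0°) heading due east, a destination due east gives Δψ ≈ 0 and v_{c} ≈ v, while a destination due north gives Δψ ≈ 90° and v_{c} ≈ 0.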

Under the simplifying assumptions of cylindrical no-fly zones and point destinations, these three features span the minimal complete geometric description of the vehicle’s intent state and no additional intent variables are needed. Their optimality in more complex scenarios should be validated through feature-learning methods in future work.

In practical scenarios, the observer usually cannot directly determine the true intended destination of the vehicle and can only infer its motion intent from the current trajectory, the no-fly zone distribution, and the candidate destination set. Therefore, the destination-oriented intent is modeled as a discrete latent variable g_{k} \in ς = \{g^{(1)}, g^{(2)}, \dots, g^{(M)}\}, where ς denotes the complete candidate destination set determined from prior information and environmental settings. For any candidate destination g^{(m)}, the corresponding continuous intent features can be computed as

ξ_{k}^{(m)} = [d_{n,k}^{(m)}, Δψ_{k}^{(m)}, v_{c,k}^{(m)}]^{T}

(24)

Furthermore, the intent consistency score of each candidate destination is defined as

s_{k}^{(m)} = - \partial_{1} χ_{d} (d_{n, k}^{(m)}) - \partial_{2} |Δ ψ_{k}^{(m)}| + \partial_{3} χ_{v} (v_{c, k}^{(m)})

(25)

where \partial_{1}, \partial_{2}, \partial_{3} > 0 are weighting coefficients, and χ_{d}(\cdot) and χ_{v}(\cdot) denote the normalized mappings for avoidance pressure and approach efficiency, respectively. An observer-side approximation of the intent posterior at time k can then be constructed as

π_{k}^{(m)} = p(g_{k} = g^{(m)} \mid ζ_{k}) = \frac{\exp(s_{k}^{(m)} / τ)}{\sum_{j=1}^{M} \exp(s_{k}^{(j)} / τ)}

(26)

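where ζ_{k} denotes the observation history up to time k, and τ > 0 denotes a tuning (temperature) parameter that controls how sharply the posterior concentrates on the highest-scoring candidate.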

Based on the above posterior, the weighted intent features can be obtained as

{\bar{ξ}}_{k} = \sum_{m = 1}^{M} π_{k}^{(m)} ξ_{k}^{(m)}

(27)

These weighted features are then used as continuous inputs to the deep neural network. This construction has two implications. First, it transforms destination-category discrimination into a differentiable, continuous, and computable intent representation, thereby avoiding the information loss caused by discrete labels. Second, when the posterior probability of a certain candidate destination dominates, \bar{ξ}_{k} degenerates into the deterministic intent feature corresponding to that destination. The single-destination setting adopted in the current experiments of this paper can be viewed as a special case of the above general framework under point estimation.
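The intent-weighting pipeline of Equations (25) to (27) is sketched below for M candidate destinations. The text does not specify the mappings χ_{d} and χ_{v}, the weights ∂_{1}, ∂_{2}, ∂_{3}, or τ, so the choices here are hypothetical: χ_{d} is a clipped linear avoidance pressure that equals 1 when the flight line passes through the zone center and 0 once |d_{n}| ≥ r_{n}, and χ_{v} is v_{c}/v. The softmax subtracts the maximum score before exponentiating, which leaves Equation (26) unchanged but avoids overflow.

```python
import numpy as np

def chi_d(d_n, r_n):
    """Hypothetical avoidance-pressure mapping in [0, 1]."""
    return np.clip(1.0 - np.abs(d_n) / r_n, 0.0, 1.0)


def chi_v(v_c, v):
    """Hypothetical approach-efficiency mapping in [-1, 1]."""
    return v_c / v


def intent_posterior(xi, v, r_n, weights=(1.0, 1.0, 1.0), tau=0.5):
    """Equations (25)-(27) for M candidates.

    xi: array of shape (M, 3) with rows [d_n, dpsi (rad), v_c].
    weights: stand-ins for the coefficients written as partial_1..3 in the text.
    Returns (pi, xi_bar): posterior over candidates and weighted features.
    """
    xi = np.asarray(xi, dtype=float)
    w1, w2, w3 = weights
    s = (-w1 * chi_d(xi[:, 0], r_n)
         - w2 * np.abs(xi[:, 1])
         + w3 * chi_v(xi[:, 2], v))          # Equation (25)
    z = s / tau
    z = z - z.max()                          # stabilize the softmax
    pi = np.exp(z) / np.exp(z).sum()         # Equation (26)
    xi_bar = pi @ xi                         # Equation (27)
    return pi, xi_bar
```

Lowering τ pushes the posterior toward a one-hot vector, which reproduces the deterministic single-destination case described above.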

By extracting the above intent features, the no-fly zone avoidance requirement and the destination-approach tendency are unified into computable geometric and kinematic quantities, thereby providing physically meaningful features for subsequent trajectory prediction.

3. Deep Learning-Based Trajectory Prediction Mechanism for CDUAV

3.1. Construction of the Trajectory Dataset

The trajectory dataset is generated using the sequential convex optimization method described in Section 2.1. The initial state of the trajectory is selected as [r_{0}, ϕ_{0}, φ_{0}, v_{0}, θ_{0}, ψ_{0}] = [70 km, 0°, 0°, 6000 m/s, 0°, 60°]. The process constraints are set as {\dot{Q}}_{\max} = 4 \times 10^{6} W/m^{2}, q_{\max} = 100 kPa, and n_{\max} = 4. Three guidance locations are selected for trajectory generation: [r_{f1}, ϕ_{f1}, φ_{f1}] = [30 km, 23°, 13°], [r_{f2}, ϕ_{f2}, φ_{f2}] = [30 km, 30°, 20°], and [r_{f3}, ϕ_{f3}, φ_{f3}] = [30 km, 31°, 17°]. The no-fly zone center coordinates are set as [ϕ_{n1}, φ_{n1}] = [20°, 10°], [ϕ_{n2}, φ_{n2}] = [24°, 17°], and [ϕ_{n3}, φ_{n3}] = [32°, 14°], and the radius of each no-fly zone is set to 2°. The tuning coefficients are set as c_{1} = 10, c_{2} = 1, and c_{3} = 0.1.

The vehicle equations of motion restrict the set of feasible solutions, so only a limited number of trajectories can be optimized from the single initial condition above. The initial conditions are therefore varied: r_{0} is set within 70–72 km, v_{0} within 5500–6200 m/s, and ψ_{0} within 55°–85°. Traversing these ranges of altitude, velocity, and heading angle yields a total of 658 optimized trajectories. The sequential convex optimization enforces the no-fly zone constraints as hard constraints at every iteration, and geometric inspection of the resulting trajectory library confirmed full compliance. The resulting flight trajectories are shown in Figure 2.
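The geometric compliance check mentioned above can be sketched as follows. The sketch makes several assumptions:

- ϕ is longitude and φ is latitude, both in degrees.
- Each no-fly zone is a circle with a 2° central-angle radius around the listed center.
- A trajectory is sampled as a list of (ϕ, φ) points.

The haversine great-circle distance and the function names are illustrative choices and are not necessarily the procedure the authors used.

```python
import math
from typing import Iterable, Sequence, Tuple

LonLat = Tuple[float, float]  # (phi = longitude, varphi = latitude) in degrees (assumed)

def central_angle_deg(a: LonLat, b: LonLat) -> float:
    """Great-circle angle between two points on a sphere (haversine formula)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def violations(track: Iterable[LonLat], centers: Sequence[LonLat],
               radius_deg: float = 2.0) -> list:
    """Return (sample index, zone index) pairs where the track enters a zone."""
    return [(i, j) for i, p in enumerate(track)
            for j, c in enumerate(centers)
            if central_angle_deg(p, c) < radius_deg]

NO_FLY_CENTERS = [(20.0, 10.0), (24.0, 17.0), (32.0, 14.0)]

# The first sample lies 1.5 deg from zone 1; the second clears all zones.
print(violations([(20.0, 11.5), (26.0, 12.0)], NO_FLY_CENTERS))  # [(0, 0)]
```

A compliant trajectory library is one for which `violations` returns an empty list for every trajectory.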

3.2. Feature Analysis

In Section 2, this paper proposed 6-dimensional state features, 3-dimensional control-parameter features, and 3-dimensional intent features, which jointly constitute the input space of the trajectory prediction model. These features were not chosen arbitrarily. They reflect the motion characteristics, control mechanism, and destination-oriented behavior of the CDUAV, and each therefore has a clear physical meaning.

The future evolution of any dynamical system depends directly on its current state. The 6-dimensional state features r, ϕ, φ, v, θ, ψ are therefore the most fundamental set describing the vehicle's motion and provide the initial conditions for trajectory prediction. They are the most direct choice of state variables for CDUAV trajectory prediction: they are the state variables of the classical three-degree-of-freedom equations of motion and together fully specify the vehicle's point-mass state.

The 3-dimensional DBL control parameters are derived from the drag profile method. Existing research on CDUAV trajectory prediction reports that, among the parameterizations considered there, they give the best prediction performance.

The 3-dimensional intent features proposed in this paper, by contrast, are derived from mission-level constraints such as no-fly zone avoidance and destination guidance. They are a low-dimensional abstraction of prior mission knowledge and are intended as a compact geometric description of the vehicle's intent state.

The three feature categories thus correspond to three complementary semantic levels: where the vehicle is (state), how it is flying (control), and where it intends to go (intent).

To illustrate how the features vary along a trajectory and how they correlate with one another, this paper selects one trajectory with initial state [r_{0}, ϕ_{0}, φ_{0}, v_{0}, θ_{0}, ψ_{0}] = [70 km, 0°, 0°, 6000 m/s, 0°, 76°] and intended destination [r_{f2}, ϕ_{f2}, φ_{f2}] = [30 km, 30°, 20°], as shown in Figure 3.

The analyzed segment starts at 200 s of flight and lasts 400 s. The resulting variation in the 12-dimensional features over time is shown in Figure 4.

The flight trajectory shows that the vehicle's initial heading deviates considerably from the direction of the intended destination. In addition, the line connecting the vehicle and the intended destination passes through the interior of the first no-fly zone, so the vehicle performs a large maneuver to avoid that zone.

Figure 4 shows that the DBL control parameters vary significantly during the avoidance maneuver. After the no-fly zone has been avoided, the bank angle σ shows a downward trend, indicating that the vehicle must keep maneuvering to turn its heading rapidly toward the intended destination.

The tangential no-fly zone avoidance distance d_{n} increases rapidly at about 180 s and then remains unchanged. Once the vehicle has avoided the first no-fly zone, the line connecting the vehicle and the intended destination no longer passes through the remaining two zones, so they no longer affect d_{n}.

Apart from a temporary increase during no-fly zone avoidance, the heading error angle Δψ decreases overall, reflecting that the vehicle progressively adjusts its trajectory toward the intended destination.

Because the avoidance maneuver is required, the relative closing velocity v_{c} is relatively small in the early stage of flight, indicating low efficiency in approaching the intended destination. Although the flight velocity decreases continuously, v_{c} increases as the vehicle gets closer to the intended destination, because the shrinking heading error directs more of the velocity toward the destination.

The 6-dimensional state features vary less markedly, which supports combining them with the control-parameter and intent features for trajectory prediction.

Pearson correlation analysis is performed on the 12-dimensional features to assess the proposed feature set from the perspective of feature correlation. The resulting correlation matrix is shown in Figure 5.

Figure 5 shows that the trajectory state parameters are strongly coupled.

- **Longitude and latitude.** Their correlation coefficient reaches 0.9868, an almost perfect positive correlation. This suggests that within the recorded 400 s the vehicle's ground track is approximately a straight line, consistent with an energy-optimal glide trajectory.
- **Velocity and heading angle.** The correlation coefficient between velocity v and heading angle ψ reaches 0.9977. Both quantities evolve almost monotonically over this segment, so this value should be read as strong co-variation of the magnitude and direction of the velocity vector along the glide, rather than as evidence of a causal link between the two.
- **Flight-path angle.** The correlation coefficients between θ and all other features are below 0.216, the largest being with the heading error angle Δψ, which indicates nearly independent behavior. This suggests that θ, which governs motion in the vertical plane, evolves largely independently of the horizontal-plane motion parameters. This is consistent with decoupled longitudinal and lateral control of the vehicle.

Among the DBL control parameters, K_{D} and K_{L} show a very strong positive correlation (0.9791). Since drag and lift usually vary together through the angle of attack, this value confirms how tight that coupling is.

The maximum absolute correlation coefficient between σ and the other 11 features is only 0.3676, and its correlations with the remaining trajectory features are all below 0.17. This weak correlation indicates that σ, as the lateral control channel, is relatively decoupled from the longitudinal and heading motions.

The correlation coefficients between d_{n} and ϕ, φ, and v are 0.7858, 0.7844, and −0.7983, respectively. Thus d_{n} reflects the relative geometric relationship between the vehicle and the no-fly zones without being collinear with any single position parameter, and it retains independent avoidance information.

The correlation coefficients between Δψ and v, and between Δψ and the heading angle ψ, are 0.984 and 0.9865, respectively. Δψ therefore reflects the deviation between the current heading and the direction toward the intended destination. Through its strong correlation with velocity, it also carries information related to the vehicle's energy state.

The correlation coefficient between v_{c} and Δψ is −0.9638. This is consistent with the theoretical expectation that a larger heading deviation yields a smaller effective closing velocity.

The above analysis leads to three conclusions beyond the 6-dimensional state features:

- The DBL parameters K_{D} and K_{L} are highly correlated, reflecting their common dependence on the longitudinal channel through the angle of attack.
- Their weak correlation with σ indicates that σ carries largely independent lateral-maneuver information.
- The intent features d_{n}, Δψ, and v_{c} correspond to no-fly zone avoidance, heading correction, and destination approach, respectively. Together they describe the relative situation among the vehicle, the no-fly zones, and the intended destination.

The feature construction proposed in this section defines the input space of the trajectory prediction network and provides theoretically consistent, physically meaningful inputs for the subsequent intent-aware trajectory prediction. The Pearson correlation analysis is presented here as a descriptive summary of static linear associations among the features. It is not intended as a statistical test of temporal independence.
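The Pearson coefficients in Figure 5 can be computed from the feature series with a few lines of plain Python. The sketch below uses made-up series to show why the descriptive caveat matters. Two series that merely share a monotone trend over the window produce |r| close to 1, even though nothing links them beyond elapsed time. Here the series are a decelerating speed and a slowly turning heading with a small oscillation.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative only: speed decays linearly while heading drifts with a wiggle.
t = range(400)
speed = [6000.0 - 2.5 * s for s in t]
heading = [60.0 + 0.04 * s + 0.5 * math.sin(0.3 * s) for s in t]
print(round(pearson(speed, heading), 3))  # strongly negative, |r| near 1
```

For a full 12-feature matrix, the same function can be applied to every pair of columns.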

From a modeling perspective, when only historical states are used for prediction, the observer captures a mixed state-transition process driven by both hidden control inputs and latent motion intents. As a result, its conditional distribution often exhibits multimodal characteristics. The DBL parameters are essentially surrogate representations of the instantaneous control effect of the vehicle, while the intent features are surrogate representations of the destination-approach requirement and the no-fly zone avoidance requirement. By jointly forming an enhanced information state with the state variables, the original problem can be approximately rewritten as

p (s_{k + 1}| s_{1 : k}, u_{1 : k}, i_{1 : k}) \approx p (s_{k + 1}| s_{k}, u_{k}, i_{k})

(28)

That is, the enhanced information state is used to recover the one-step conditional sufficiency of the system. The deep network adopted in this paper learns this conditional mapping, which is essentially the learning of the state-transition kernel jointly induced by the vehicle dynamics, control strategy, and motion intent.

3.3. CNN–Informer Inference Mechanism Incorporating Vehicle Intent Features

This paper proposes a CNN–Informer hybrid deep learning model so that a single trajectory prediction framework can jointly model local physical maneuvers and global motion tendencies. Theoretically, the trajectory prediction problem is recast as learning a state-transition function in a high-dimensional feature space. The core task is to establish the mapping from the historical observation sequence X_{1 : T} = [x_{1}, x_{2}, \dots, x_{T}] to the future state sequence Y_{T + 1 : T + τ} = [y_{T + 1}, y_{T + 2}, \dots, y_{T + τ}]. Here τ denotes the prediction horizon, which is distinct from the temperature parameter in Equation (26).

3.3.1. Local Spatiotemporal Feature Enhancement Based on Convolutional Neural Networks

The original input features are 12-dimensional:

- the 6-dimensional vehicle state features r, ϕ, φ, v, θ, ψ;
- the 3-dimensional DBL control-parameter features K_{D}, K_{L}, σ;
- the 3-dimensional intent features d_{n}, Δψ, v_{c}.

These features exhibit strong local correlations along the time axis. To extract such local spatiotemporal patterns, a one-dimensional convolutional neural network is used as the first layer of the model. Its operation is defined as

H_{t}^{(l)} = δ (W^{(l)} * H_{t - k : t + k}^{(l - 1)} + b^{(l)})

(29)

where H^{(0)} = X is the input sequence, * denotes the one-dimensional convolution operation, (2k + 1) denotes the kernel size, W^{(l)} and b^{(l)} are the convolution parameters of the l-th layer, and δ denotes the ReLU activation function. Through two convolution layers with kernel sizes of 3 and 5, the model progressively abstracts the local spatiotemporal features F_{\mathrm{CNN}}. The convolutional front-end is trained end to end with the rest of the network rather than with a separate objective. Its role is to capture state changes between adjacent time steps, explicitly encoding the short-term inertial characteristics of vehicle motion.
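Equation (29) can be sketched in plain Python using zero padding, which keeps the sequence length T unchanged. As in deep learning libraries, the operation is implemented as cross-correlation, with no kernel flip. The following are illustrative assumptions:

- the weight layout;
- the hand-chosen filters;
- the two-channel toy input.

In the actual model the input has 12 channels and the weights are learned.

```python
from typing import List

Seq = List[List[float]]           # [time step][channel]
Kernel = List[List[List[float]]]  # [out channel][tap][in channel]

def conv1d_relu(h: Seq, weight: Kernel, bias: List[float]) -> Seq:
    """One layer of Eq. (29): ReLU(W * H_{t-k:t+k} + b) with zero padding."""
    T, c_in = len(h), len(h[0])
    ksize = len(weight[0])            # kernel size 2k + 1
    k = ksize // 2
    out: Seq = []
    for t in range(T):
        row = []
        for w_o, b_o in zip(weight, bias):
            acc = b_o
            for j in range(ksize):
                src = t - k + j       # window t-k .. t+k
                if 0 <= src < T:      # positions outside the sequence count as zero
                    acc += sum(w_o[j][c] * h[src][c] for c in range(c_in))
            row.append(max(0.0, acc))  # delta = ReLU
        out.append(row)
    return out

# Toy input: channel 0 is a ramp, channel 1 is constant.
x = [[float(t), 1.0] for t in range(6)]
# Layer 1 (kernel size 3): a central-difference filter and a moving average.
w1 = [[[-0.5, 0.0], [0.0, 0.0], [0.5, 0.0]],
      [[0.0, 1 / 3], [0.0, 1 / 3], [0.0, 1 / 3]]]
h1 = conv1d_relu(x, w1, [0.0, 0.0])
# Layer 2 (kernel size 5): one output channel summing both inputs over 5 taps.
w2 = [[[0.2, 0.2]] * 5]
h2 = conv1d_relu(h1, w2, [0.0])
print(h1[2], len(h2), len(h2[0]))
```

In the interior of the sequence, the difference filter recovers the ramp's unit slope, which is the kind of adjacent-step change the front-end is meant to expose. A framework layer such as PyTorch's `nn.Conv1d` with `padding=k` performs the same computation with learned weights.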

3.3.2. Residual Feature Fusion and High-Dimensional Space Embedding

A residual-style skip connection, implemented by concatenation rather than addition, is introduced to construct the fused features. It serves three purposes: preserving the explicit physical meaning of the original features, promoting stable gradient propagation, and mitigating the optimization difficulties that may arise in deeply stacked attention layers. The fused features are

{\tilde{X}}_{t} = \mathrm{Concat} (x_{t}, F_{\mathrm{CNN}, t})

(30)

This construction combines an identity path with the nonlinear transformation of the CNN. Because the original-feature branch acts as an identity mapping, gradients can be back-propagated directly to the input, which alleviates gradient attenuation in deep networks. Meanwhile, concatenation avoids mutual interference among feature channels and keeps the information in each dimension separate. The fused features {\tilde{X}}_{t} therefore contain both the directly interpretable physical states and intent signals and the abstract spatiotemporal patterns learned by the CNN. Together they form the complete input to the subsequent sequence-modeling module.
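Equation (30), followed by a projection into the model dimension, can be sketched as follows. The concatenation leaves the 12 original features untouched and appends the CNN channels after them. The linear map to d_model is a hypothetical stand-in for the embedding layer through which the fused vector enters the Informer. The sizes and weights are toy values.

```python
from typing import List, Sequence

def fuse(x_t: Sequence[float], f_cnn_t: Sequence[float]) -> List[float]:
    """Eq. (30): skip connection by concatenation, [x_t ; F_CNN,t]."""
    return list(x_t) + list(f_cnn_t)

def embed(v: Sequence[float], weight: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Hypothetical linear embedding of the fused vector into d_model dimensions."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]

x_t = [0.1] * 12          # 12 normalized input features at step t
f_t = [0.5, 0.0, 0.3]     # toy CNN output with 3 channels
fused = fuse(x_t, f_t)    # 15 values: originals first, CNN channels after
d_model = 4               # toy size; Section 4.2.1 uses a hidden dimension of 512
W = [[1.0 if i == j else 0.0 for j in range(len(fused))] for i in range(d_model)]
z = embed(fused, W, [0.0] * d_model)
print(len(fused), z)
```

Because the original features occupy fixed positions in the fused vector, the downstream layers can always access them unchanged. This is the property the text relies on for interpretability.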

3.3.3. Global Intent Encoder–Decoder Based on ProbSparse Self-Attention

To capture the long-range dependencies associated with global motion tendencies such as no-fly zone avoidance and destination approach, this paper adopts the Informer encoder, whose core is the ProbSparse self-attention mechanism. ProbSparse attention scores each query by how far its attention distribution over the keys deviates from uniform, using a max-minus-mean sparsity measurement. It then applies full attention only to the top-u most informative queries, reducing the time and memory complexity from O(L^{2}) to O(L log L).
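The query-selection step of ProbSparse attention can be sketched as below. Each query is scored with the max-minus-mean measurement M(q_{i}, K) = max_{j}(q_{i}·k_{j}/√d) − (1/L_{K}) Σ_{j} q_{i}·k_{j}/√d. The top-u queries then receive full softmax attention, while the remaining "lazy" queries output the mean of V, as in the Informer encoder. For clarity, this sketch computes M exactly over all keys. The original Informer estimates it from a random subset of keys, which is what yields the O(L log L) cost. The vectors and the value of u are toy values.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]

def _scores(q: Sequence[float], keys: Sequence[Sequence[float]]) -> Vec:
    """Scaled dot products q . k_j / sqrt(d) for all keys."""
    d = len(q)
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]

def sparsity(q: Sequence[float], keys: Sequence[Sequence[float]]) -> float:
    """Max-minus-mean measurement M(q, K): large for 'peaky' queries."""
    s = _scores(q, keys)
    return max(s) - sum(s) / len(s)

def probsparse_attention(queries, keys, values, u: int) -> Tuple[List[Vec], List[int]]:
    """Full attention for the top-u queries, mean of V for the lazy ones."""
    m = [sparsity(q, keys) for q in queries]
    active = sorted(sorted(range(len(queries)), key=lambda i: m[i], reverse=True)[:u])
    mean_v = [sum(col) / len(values) for col in zip(*values)]
    out = [list(mean_v) for _ in queries]
    for i in active:
        s = _scores(queries[i], keys)
        peak = max(s)
        w = [math.exp(x - peak) for x in s]   # stable softmax numerators
        z = sum(w)
        out[i] = [sum(wj * v[c] for wj, v in zip(w, values)) / z
                  for c in range(len(values[0]))]
    return out, active

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
queries = [[5.0, 0.0], [0.0, 0.0], [5.0, -5.0]]
out, active = probsparse_attention(queries, keys, values, u=1)
print(active, out)
```

The all-zero query attends uniformly, so its measurement is zero and it is treated as lazy. The third query concentrates its attention on one key and is selected.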

The encoder receives the historical sequence composed of the above 12-dimensional features and uses multiple ProbSparse self-attention layers to capture both intra-feature and cross-time-step dependencies. In particular, the intent features link physical motion to destination-oriented behavior. They interact with the state and control parameters in the attention mechanism, enabling the model to perceive the vehicle's avoidance requirement and destination-approach tendency in the current situation. The encoder finally outputs a hidden representation C \in \mathbb{R}^{T \times d_{\mathrm{model}}} that fuses the global temporal context with intent semantics, integrating the dynamic state, control logic, and real-time motion intent of the vehicle. A schematic of the encoder is shown in Figure 6.

As shown in Figure 7, the decoder takes the context representation output by the encoder as input. Through the cross-attention mechanism, it aligns with and exploits the encoded historical information to generate the three-dimensional control parameters (K_{D}, K_{L}, σ) at future time steps. The predicted trajectory is then obtained by integrating the predicted control parameters through the equations of motion. Intent features are also introduced as auxiliary inputs so that the model can better capture the destination-oriented tendency and the avoidance requirements. These features are derived from the observed state sequence and from the geometric relationships among the vehicle, the intended destination, and the no-fly zones.

The cascade of CNN and Informer in this paper is not a simple stacking of network structures. It corresponds to two information structures at different time scales in unmanned glide trajectory generation.

At short time scales, the local maneuvers of the vehicle are dominated by rapid adjustments of the control variables and their instantaneous effects on velocity and flight-path angle. These maneuvers exhibit clear local stationarity and neighborhood correlation, so they are well suited to extraction by a one-dimensional convolutional network with a limited receptive field.

At long time scales, no-fly zone avoidance and destination approach depend on long-horizon trajectory evolution and are typical long-range dependency problems. They are therefore well suited to encoding by a sparse self-attention mechanism.

Accordingly, the CNN front-end extracts local maneuvering patterns, while the Informer encoder models global intent dependencies. Functionally, the two provide hierarchical modeling of short-term dynamics and long-term trajectory constraints, respectively.

Through the integration of local convolutional feature extraction, residual feature fusion, sparse global attention encoding, and intent-aware decoding, the proposed CNN–Informer model forms a hierarchical deep learning architecture for long-horizon trajectory prediction. This design combines sequence modeling with physically meaningful feature construction, thereby providing an interpretable framework for trajectory prediction under partial observability in constrained environments. The overall algorithm framework is shown in Figure 8. The core novelty of this work does not lie in the CNN–Transformer cascade topology per se, which has been previously explored in trajectory prediction. Rather, it lies in pairing a hierarchical temporal architecture with a control-affine, physics-grounded input space (DBL parameters) and a continuous-valued intent representation. The methodological contribution is the co-designed integration of these three elements. The “predict-then-integrate” design preserves consistency with the equations of motion but introduces monotonic error accumulation along the prediction horizon. Direct position regression would mitigate accumulation but at the cost of dynamical fidelity. The trade-off is intrinsic to physics-informed predictors of this type.
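The error-accumulation trade-off of the predict-then-integrate design can be illustrated with a deliberately simple model. The sketch does not use the paper's three-degree-of-freedom equations or the DBL parameterization. Instead, it assumes hypothetical planar kinematics with constant speed, in which a scalar control input sets the heading rate. A small constant bias is added to the "predicted" control sequence and integrated forward, and the resulting position error is tracked along the horizon.

```python
import math
from typing import List, Tuple

def integrate(controls: List[float], v: float = 5.0, dt: float = 1.0,
              gain: float = 0.02) -> List[Tuple[float, float]]:
    """Toy kinematics (hypothetical): heading rate = gain * control, constant speed v."""
    x, y, psi = 0.0, 0.0, 0.0
    path = []
    for u in controls:
        psi += gain * u * dt
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        path.append((x, y))
    return path

true_u = [math.sin(0.05 * t) for t in range(120)]   # "true" control history
pred_u = [u + 0.05 for u in true_u]                 # prediction with a small constant bias
err = [math.dist(a, b) for a, b in zip(integrate(true_u), integrate(pred_u))]
print(round(err[0], 4), round(err[-1], 2), max(err) == err[-1])
```

In this toy case every step adds a displacement error pointing in nearly the same direction. The error therefore grows monotonically, and its maximum coincides with its terminal value. With real dynamics and learned controls, the growth is only approximately monotonic.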

4. Simulation Results and Validation

4.1. Construction of Training Samples

The trajectories in the library constructed in Section 3.1 are tracked starting at 200 s of flight. This start point ensures that prediction begins after the trajectories have entered the stable gliding phase, so the initial dynamics are well conditioned. The CS-UKF algorithm tracks each trajectory continuously over 400 s. Sliding-window segmentation with a stride of 1 s is then applied to the tracked trajectories, producing a large number of 200 s segments, each of which is used as a historical input sequence. The number of optimized base trajectories is limited by the underlying vehicle dynamics and planning constraints, but sliding-window segmentation yields sufficiently diverse training samples for sequence learning.

The 658 original trajectories are partitioned at the trajectory level into training, validation, and test subsets in approximately a 7:2:1 ratio (461/132/65 trajectories), corresponding to 24,700/6700/1500 sliding-window samples. All sliding windows from a given original trajectory are assigned exclusively to one subset to prevent sample-level leakage. Sliding windows derived from the same trajectory are temporally overlapping; the figure of 24,700 reflects the augmented training set rather than statistically independent samples. The sliding-window scheme is a standard data augmentation technique in sequence learning. Trajectory-level partitioning ensures that the test evaluation remains valid despite the within-trajectory overlap.
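The window extraction and trajectory-level partitioning described above can be sketched as follows. The following are assumptions for illustration:

- the helper names;
- the random seed;
- the prediction-horizon argument.

The text specifies only a 200 s history window and a 1 s stride. Rounding the 7:2:1 ratio on 658 trajectories reproduces the 461/132/65 split reported above.

```python
import random
from typing import Dict, List, Sequence

def sliding_windows(series: Sequence, hist: int, horizon: int, stride: int = 1):
    """(history, target) pairs; consecutive windows overlap by hist - stride steps."""
    last_start = len(series) - hist - horizon
    return [(series[s:s + hist], series[s + hist:s + hist + horizon])
            for s in range(0, last_start + 1, stride)]

def split_by_trajectory(traj_ids: Sequence[int], ratios=(0.7, 0.2),
                        seed: int = 0) -> Dict[str, List[int]]:
    """Assign whole trajectories (never individual windows) to exactly one subset."""
    ids = list(traj_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    return {"train": ids[:n_train],
            "val": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:]}

splits = split_by_trajectory(range(658))
print({name: len(ids) for name, ids in splits.items()})  # {'train': 461, 'val': 132, 'test': 65}
```

Windows are generated only after this split, separately for each subset. This ordering is what prevents overlapping windows from the same trajectory from leaking across subsets.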

To eliminate the influence of differences in dimension and magnitude among features on model training, all 12-dimensional input features are normalized. The normalization extrema are statistically obtained from the training set, and the same normalization parameters are used for the validation and test sets. The feature ranges in the training set cover the typical distribution observed in the test set, and no significant out-of-range samples were observed during evaluation.
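The normalization above takes its extrema from the training set only and reuses them for the validation and test sets, which corresponds to per-feature min–max scaling. A minimal sketch follows. It assumes a [0, 1] target range, which the text does not state, and adds a guard for constant features.

```python
from typing import List, Sequence, Tuple

def fit_minmax(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature extrema computed on the training set only."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(rows, lo, hi):
    """Scale with training extrema; values outside the training range map outside [0, 1]."""
    return [[(x - a) / (b - a) if b > a else 0.0 for x, a, b in zip(r, lo, hi)]
            for r in rows]

lo, hi = fit_minmax([[0.0, 10.0], [2.0, 30.0]])
print(apply_minmax([[1.0, 20.0], [3.0, 10.0]], lo, hi))  # [[0.5, 0.5], [1.5, 0.0]]
```

The second row illustrates an out-of-range sample: it maps to 1.5. Checking for such values on the test set is one way to confirm the coverage claim above.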

4.2. Model Training

4.2.1. Model Parameter Settings

The proposed CNN–Informer model consists of four parts:

1. a convolutional feature extraction module;
2. a residual fusion layer;
3. a ProbSparse self-attention encoder;
4. a cross-attention decoder.

The convolutional feature extraction module contains two residual convolution blocks, each composed of two one-dimensional convolutions with kernel sizes of 3 and 5, respectively. A squeeze-and-excitation (SE) channel-attention module follows each residual block to adaptively reweight the feature channels. The encoder consists of three ProbSparse self-attention layers with 8 attention heads, a hidden dimension of 512, and a feed-forward dimension of 2048. The decoder uses two cross-attention layers with the same structural parameters as the encoder. The global dropout rate is 0.05, and GELU is used as the activation function.
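The SE (squeeze-and-excitation) module reweights feature channels in three steps:

1. Squeeze: average each channel over time.
2. Excite: pass the channel summary through a small bottleneck with ReLU and a sigmoid.
3. Scale: multiply each channel by the resulting gate.

The bias-free sketch below is in plain Python. The weights and the bottleneck size are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def se_block(h: Sequence[Sequence[float]], w1: Sequence[Sequence[float]],
             w2: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """h: [time][channel]; w1: [hidden][channel]; w2: [channel][hidden]."""
    T, C = len(h), len(h[0])
    z = [sum(h[t][c] for t in range(T)) / T for c in range(C)]          # squeeze
    s = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]  # FC + ReLU
    gate = [1.0 / (1.0 + math.exp(-sum(w * sc for w, sc in zip(row, s))))
            for row in w2]                                              # FC + sigmoid
    return [[h[t][c] * gate[c] for c in range(C)] for t in range(T)], gate

h = [[1.0, 4.0], [3.0, 0.0]]   # 2 time steps, 2 channels
w1 = [[1.0, 0.0]]              # bottleneck of size 1 (reduction ratio 2)
w2 = [[2.0], [-2.0]]           # channel 0 is boosted, channel 1 suppressed
scaled, gate = se_block(h, w1, w2)
print([round(g, 3) for g in gate])
```

Because the gates lie in (0, 1), the block can only attenuate channels relative to one another. The residual blocks around it keep the original signal available.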

The model is trained using the PyTorch 1.13.0 deep learning framework. The Adam optimizer is adopted, and the initial learning rate is set to 0.01. The learning rate is dynamically adjusted: when the validation loss does not decrease for several epochs, the learning rate is reduced to half of its current value. To avoid overfitting, an early stopping strategy is employed. Training is terminated when the validation loss does not improve for 40 consecutive epochs, and the model parameters corresponding to the minimum validation loss are restored. A patience of 40 was chosen to accommodate occasional plateau-and-jump dynamics observed in Transformer-class training under composite losses. In practice, the optimal validation loss was typically reached 15–25 epochs before patience was triggered, and parameters were rolled back to that optimum.
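The learning-rate and stopping rules above can be sketched as a small controller driven by the validation loss. The early-stopping patience of 40 and the halving factor come from the text. The learning-rate patience of 10 epochs is a hypothetical stand-in for "several epochs". In PyTorch, the halving rule is commonly realized with `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=...)`, whose bad-epoch counting and improvement threshold differ slightly from this sketch.

```python
from dataclasses import dataclass

@dataclass
class PlateauController:
    lr: float = 0.01          # initial learning rate from the text
    lr_patience: int = 10     # hypothetical; the text says only "several epochs"
    stop_patience: int = 40   # early-stopping patience from the text
    best: float = float("inf")
    best_epoch: int = -1
    since_best: int = 0
    since_lr_cut: int = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Update after one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.best_epoch = val_loss, epoch   # checkpoint to restore later
            self.since_best = 0
            self.since_lr_cut = 0
        else:
            self.since_best += 1
            self.since_lr_cut += 1
            if self.since_lr_cut >= self.lr_patience:
                self.lr *= 0.5                             # halve the learning rate
                self.since_lr_cut = 0
        return self.since_best >= self.stop_patience

# Illustrative loss curve: improvement for 30 epochs, then a plateau.
ctrl = PlateauController()
losses = [1.0 - 0.01 * e for e in range(30)] + [0.8] * 60
stop_epoch = None
for epoch, loss in enumerate(losses):
    if ctrl.step(epoch, loss):
        stop_epoch = epoch
        break
print(stop_epoch, ctrl.best_epoch, ctrl.lr)
```

At stop time, the parameters saved at `best_epoch` are the ones restored, as described in the text.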

4.2.2. Performance Evaluation Metrics

To directly reflect the prediction performance, three metrics are introduced: the average prediction error E_{A}, the terminal prediction error E_{F}, and the maximum prediction error E_{M}. In the Cartesian coordinate system, let the prediction duration be t_{p}, and define the instantaneous position error at time t as ε(t), calculated as

ε (t) = \sqrt{{(\hat{x} (t) - x (t))}^{2} + {(\hat{y} (t) - y (t))}^{2} + {(\hat{z} (t) - z (t))}^{2}}

(31)

where \hat{x} (t), \hat{y} (t), \hat{z} (t) denote the predicted values in the x, y, z directions at time t, and x (t), y (t), z (t) denote the corresponding true values. The evaluation metrics are then calculated as follows:

E_{A} = \frac{\sum_{t = 1}^{t_{p}} ε (t)}{t_{p}}

(32)

E_{F} = ε (t_{p}) = \sqrt{{(\hat{x} (t_{p}) - x (t_{p}))}^{2} + {(\hat{y} (t_{p}) - y (t_{p}))}^{2} + {(\hat{z} (t_{p}) - z (t_{p}))}^{2}}

(33)

E_{M} = \max_{1 \leq t \leq t_{p}} ε (t)

(34)

The three metrics capture complementary aspects of accuracy:

- The average prediction error E_{A} reflects the overall deviation between the predicted and true trajectories.
- The terminal prediction error E_{F} measures the accuracy at the end of the prediction interval.
- The maximum prediction error E_{M} reflects the worst-case error over the entire prediction process.

All three metrics are averaged over the 1500 test samples. Windows drawn from the same test trajectory overlap in time, so these samples are not statistically independent, and the averages summarize performance over the 65 held-out test trajectories.
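Equations (31) to (34) translate directly into code. The sketch makes two assumptions about the bookkeeping, neither of which is stated in the text:

- Positions are (x, y, z) tuples sampled at 1 s steps, so the sum in Eq. (32) runs over t_{p} samples.
- Each table entry is a plain mean of the per-window metrics over the test set.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def position_errors(pred: Sequence[Point], true: Sequence[Point]) -> List[float]:
    """Eq. (31): Euclidean position error eps(t) at each step t = 1..t_p."""
    return [math.dist(p, q) for p, q in zip(pred, true)]

def trajectory_metrics(pred: Sequence[Point],
                       true: Sequence[Point]) -> Tuple[float, float, float]:
    eps = position_errors(pred, true)
    e_a = sum(eps) / len(eps)   # Eq. (32): mean error over the horizon
    e_f = eps[-1]               # Eq. (33): error at t = t_p
    e_m = max(eps)              # Eq. (34): worst-case error
    return e_a, e_f, e_m

def average_over_samples(samples) -> Tuple[float, ...]:
    """Mean of (E_A, E_F, E_M) across (pred, true) test windows."""
    per_sample = [trajectory_metrics(p, t) for p, t in samples]
    return tuple(sum(col) / len(per_sample) for col in zip(*per_sample))

true = [(float(t), 0.0, 0.0) for t in range(1, 5)]
pred = [(float(t), 0.0, float(t)) for t in range(1, 5)]
print(trajectory_metrics(pred, true))
```

When the error grows monotonically along the horizon, as in this example, E_{F} and E_{M} coincide.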

4.3. Error Analysis and Comparison

To verify the effectiveness of the proposed algorithm, this paper conducts three groups of experiments:

- **Ablation experiments** analyze how the physical-information modules and the network structure affect prediction performance.
- **Comparative experiments** evaluate the proposed algorithm against other advanced intelligent prediction methods.
- **Noise disturbance experiments** evaluate the robustness of the proposed model under realistic engineering conditions.

All experiments use the same test dataset, and prediction performance is assessed quantitatively with the evaluation metrics defined in Section 4.2.2.

4.3.1. Ablation Experiments

To verify the contributions of the physical-information modules and the network structure to prediction performance, Informer is selected as the baseline model. Prediction accuracy is compared under different input information, with and without the CNN front-end. Three input configurations are used:

- the 3-dimensional control parameters (ip3);
- the 3-dimensional control parameters plus the 6-dimensional state parameters (ip9);
- the full 12-dimensional historical data (ip12).

In each case the model predicts the variations in the drag parameter K_{D}, the bank angle σ, and the lift parameter K_{L}. The simulation results for a randomly selected test trajectory are shown in Figure 9.

Figure 9 shows that the drag parameter K_{D} and the lift parameter K_{L} are small in magnitude (on the order of 10^{-4}), and their true curves fluctuate strongly relative to that magnitude. By contrast, the curves predicted by the Informer network are relatively smooth. This smoothing has two main causes:

1. The attention mechanism of Informer tends to emphasize low-frequency components and is insufficiently sensitive to high-frequency, small-amplitude fluctuations.
2. MSE and MAE losses drive the prediction toward the conditional mean and the conditional median of the target, respectively, which suppresses sharp local fluctuations in favor of the average trend.

Nevertheless, in terms of overall trend fitting, the CNN–Informer network extracts local sequence features more effectively owing to its convolutional front-end, and its predictions follow the true curve more closely.
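The second cause can be checked with a small example. For a fixed set of targets, the constant prediction that minimizes squared error is their mean, and the one that minimizes absolute error is their median. The illustrative window below contains a hypothetical bank-angle reversal: six samples at +60° and four at −60°. The MSE-optimal constant, 12°, lies between the two branches and matches neither, which is the averaging behavior that smooths sharp reversals.

```python
from statistics import mean, median

def best_constant(targets, loss, grid):
    """Brute-force the constant prediction that minimizes the summed loss."""
    return min(grid, key=lambda c: sum(loss(c, y) for y in targets))

targets = [60.0] * 6 + [-60.0] * 4   # hypothetical bank angles (deg) across a reversal
grid = range(-90, 91)                # candidate constants, 1 deg apart
c_mse = best_constant(targets, lambda c, y: (c - y) ** 2, grid)
c_mae = best_constant(targets, lambda c, y: abs(c - y), grid)
print(c_mse, mean(targets), c_mae, median(targets))  # 12 12.0 60 60.0
```

A sequence model trained with these losses behaves analogously when the timing of a reversal is uncertain. It hedges between the pre-reversal and post-reversal values, which appears as smoothing or lag.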

For the bank angle σ, the CNN–Informer with intent-feature input clearly performs better than the other models. It still achieves high-accuracy prediction even when the bank angle undergoes an abrupt reversal. However, owing to the smoothing effect of the loss function, the predicted bank angle lags slightly (by approximately 2–3 time steps) during sharp reversals. This lag is a common characteristic of first-moment (mean) regression and does not affect the overall trend or the relative performance ranking against the baselines.

In terms of input dimensionality, adding the 6-dimensional state parameters to the 3-dimensional control parameters does not noticeably improve prediction accuracy. After the intent features are introduced, however, the predicted curves move closer to the true values in both the overall trend and the key change points, indicating that intent features play an important role in the model's predictive capability. Overall, the CNN–Informer model incorporating intent features clearly improves DBL control-parameter prediction relative to the other models, which demonstrates the effectiveness of the proposed algorithm at the parameter level. The prediction performance is analyzed below at the level of the three-dimensional trajectory.

Figure 10 shows that, for the final three-dimensional trajectory prediction, CNN–Informer–ip12 performs markedly better than the other three methods and its trend stays essentially consistent with the true trajectory. Among the three conventional Informer variants, prediction performance improves as the input dimension increases: ip12 outperforms ip9, and ip3 performs the worst. The average evaluation metrics of the four methods over the 1500 test samples are listed in Table 1 for quantitative analysis.

Table 1 shows that all error metrics of the Informer model decrease as the input is expanded from control parameters only to state parameters and intent features, that is, from ip3 to ip12. Fusing state information and intent features therefore benefits prediction accuracy.

Under the ip12 setting, the average prediction error E_{A} of CNN–Informer is 7.561 km, a reduction of 17.2% compared with the baseline Informer model. The terminal prediction error E_{F} and the maximum prediction error E_{M} are both reduced by about 20%, indicating that the convolutional structure improves the model's ability to fit local abrupt changes. Taken together, the qualitative and quantitative ablation results show that the CNN–Informer model integrating intent features and the convolutional structure achieves the best prediction performance among all schemes.

The ablation is organized at the semantic-group level to isolate the contribution of each information source. Per-feature ablation in correlated input spaces is susceptible to compensation effects and has limited interpretive power. A more systematic feature-attribution analysis using model-agnostic methods is left for future work.
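The reported percentage follows the usual definition of relative reduction. As a consistency check on the rounded figures quoted above, the sketch backs out the baseline E_{A} implied by 7.561 km and 17.2%, which comes to roughly 9.13 km. This check is not a substitute for the values in Table 1. The helper names are illustrative.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline

def implied_baseline(proposed: float, reduction_pct: float) -> float:
    """Baseline error implied by a proposed error and a percentage reduction."""
    return proposed / (1.0 - reduction_pct / 100.0)

informer_ea = implied_baseline(7.561, 17.2)   # implied by the rounded figures only
print(round(informer_ea, 2), round(relative_reduction(informer_ea, 7.561), 1))
```

Because the percentage is rounded to one decimal place, the implied baseline is accurate only to about ±0.01 km.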

4.3.2. Comparative Experiments

To evaluate the performance of the proposed CNN–Informer model, this paper compares it with four representative deep learning methods that have been widely used in trajectory prediction or long-sequence forecasting tasks, namely SSD-LSTM [20], Transformer-based prediction [29], iTransformer [27], and DLinear [28].

All five models are retrained, validated, and tested with the same 12-dimensional input features and the same training, validation, and test splits. Each baseline uses the core hyperparameters recommended in its original work. The learning rate, batch size, and optimizer are kept identical to those of the proposed model to ensure comparability on the training side. The performance gaps can therefore be attributed mainly to architectural differences in temporal modeling rather than to differences in input or training resources.

Because an early stopping mechanism is adopted, the number of training epochs differs among the models. As shown in Figure 11, compared with Transformer, iTransformer, DLinear, and SSD-LSTM, the CNN–Informer proposed in this paper exhibits better convergence characteristics. It reaches a lower training loss within fewer training epochs and maintains a lower validation loss throughout the training process.

The lower training loss indicates that the model architecture has stronger representation capability and can more effectively fit the latent mapping relationship between the input and output features. This is attributed to the synergistic effect of the convolutional front-end and the ProbSparse self-attention mechanism proposed in this paper: the convolutional front-end can extract local temporal patterns, while the ProbSparse self-attention mechanism is effective at capturing long-term dependency relationships. The lower validation loss indicates that the model has good generalization ability on unseen data, with limited signs of overfitting. Owing to global self-attention, Transformer can leverage all time steps from the first epoch, leading to faster initial loss reduction. CNN–Informer requires the convolutional front-end to first establish local pattern representations, hence a slower initial descent. The hierarchical architecture, however, ultimately yields a richer feature space, which is reflected in the superior steady-state validation loss.

Considering both the final training/validation losses and the convergence behavior, CNN–Informer starts with a slower initial descent than Transformer but converges to lower training and validation losses than the other four models. This suggests that CNN–Informer has higher learning efficiency and stronger generalization capability than the compared models.

As a physical baseline, this paper also includes a trajectory predicted by a traditional function-fitting method. This trajectory is compared with the neural-network predictions described above to quantify the improvement in predictive performance.

To illustrate the prediction performance of the above models intuitively, one trajectory is randomly selected from the test set, and its predictions are shown in Figure 12. For this trajectory, the three-dimensional view shows that CNN–Informer achieves the highest prediction accuracy, followed by Transformer, DLinear, and iTransformer, whereas SSD-LSTM performs the worst. The trajectory predicted by CNN–Informer is also smoother than those of the other models, and its trend is closest to that of the true trajectory. In particular, during the maneuvering-turn phase of the vehicle, the other four methods all exhibit obvious drift. CNN–Informer, by contrast, shows the smallest prediction error and follows the true trajectory variation most closely, indicating its good adaptability to vehicle trajectory prediction. For a stable estimate of overall predictive performance, Table 2 lists the evaluation metrics averaged (arithmetic mean) over 1500 independent test trajectories.

As shown in Table 2 and Figure 13, the maximum error $E_{M}$ is close to the terminal error $E_{F}$. This is the expected behavior of the predict-and-integrate approach: the error grows nearly monotonically along the horizon, so the worst-case point typically occurs near the terminal time. This pattern also indicates that the error growth is gradual rather than driven by abrupt mid-horizon mispredictions.
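The relation between $E_{M}$ and $E_{F}$ can be illustrated with a minimal sketch. The sketch assumes, as a hypothetical reading of the metric definitions, that $E_{A}$, $E_{F}$, and $E_{M}$ are the mean, final, and maximum Euclidean position errors along the prediction horizon. The function name and the toy trajectories are illustrative. When the error grows monotonically, $E_{M}$ coincides with $E_{F}$; a mid-horizon spike separates them.

```python
import numpy as np


def trajectory_errors(pred, true):
    """Return (E_A, E_F, E_M) for one trajectory.

    Assumed definitions (hypothetical): mean, final, and maximum point-wise
    Euclidean position error over the horizon. pred and true have shape (N, 3).
    """
    err = np.linalg.norm(pred - true, axis=1)  # position error at each predicted step
    return float(err.mean()), float(err[-1]), float(err.max())


N = 5
true = np.zeros((N, 3))

# Monotonic drift along x: errors 1, 2, 3, 4, 5 km, so the worst point is the terminal one
monotonic = true.copy()
monotonic[:, 0] = np.arange(1, N + 1)

# Mid-horizon spike: errors 1, 6, 2, 3, 4 km, so E_M exceeds E_F
spiky = true.copy()
spiky[:, 0] = [1.0, 6.0, 2.0, 3.0, 4.0]

print(trajectory_errors(monotonic, true))
print(trajectory_errors(spiky, true))
```

Under these assumptions, dataset-level values such as those in Table 2 would be the arithmetic means of the per-trajectory metrics over the test set.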

An analysis based on the physical baseline reveals that, in this complex flight scenario, the function-fitting method is largely ineffective. Its $E_{A}$ of 36.161 km deviates markedly from those of the neural-network methods, because function fitting cannot capture the complex variations in the DBL parameters. CNN–Informer obtains the minimum errors on all three metrics. Compared with SSD-LSTM, the improvement exceeds 50% on every metric. Even compared with the second-best model, Transformer, the average prediction error $E_{A}$ is reduced by 18.2% and the terminal prediction error $E_{F}$ by 18.8%. These improvements demonstrate that the proposed algorithm achieves good prediction performance when handling complex vehicle maneuvers.

In addition, the ratio between $E_{F}$ and $E_{A}$ reveals the sensitivity of each model to error accumulation. For SSD-LSTM, for example, $E_{F}$ is nearly three times $E_{A}$ ($E_{F}/E_{A} \approx 2.85$), indicating severe error propagation over the prediction horizon. By contrast, CNN–Informer exhibits the smallest ratio ($E_{F}/E_{A} = 2.66$), slightly below that of Transformer (2.68). This indicates that it alleviates temporal error accumulation somewhat better than the baselines. This is consistent with the qualitative observations in Figure 12: even when the vehicle performs maneuvering turns, CNN–Informer maintains high trajectory fidelity.
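The percentages and ratios quoted above can be reproduced from Table 2 with a few lines of arithmetic. The sketch below does only that and introduces no new data. The dictionary and function names are illustrative.

```python
# (E_A, E_F, E_M) in km, copied from Table 2
TABLE2 = {
    "SSD-LSTM": (15.222, 43.368, 43.406),
    "Transformer": (9.249, 24.742, 24.796),
    "iTransformer": (10.028, 27.406, 27.672),
    "DLinear": (10.412, 29.637, 29.640),
    "CNN-Informer": (7.562, 20.088, 20.182),
}


def reduction_pct(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


ours = TABLE2["CNN-Informer"]
for name, (e_a, e_f, e_m) in TABLE2.items():
    ratio = e_f / e_a  # E_F / E_A: sensitivity to error accumulation
    if name == "CNN-Informer":
        print(f"{name:13s} E_F/E_A = {ratio:.2f}")
    else:
        gains = [reduction_pct(b, p) for b, p in zip((e_a, e_f, e_m), ours)]
        print(f"{name:13s} E_F/E_A = {ratio:.2f}  reduction (E_A, E_F, E_M) = "
              + ", ".join(f"{g:.1f}%" for g in gains))
```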

4.3.3. Noise Disturbance Experiments

To evaluate the robustness of the proposed model under realistic engineering conditions, two complementary perturbation experiments are designed. The “Destination Perturbation Experiment” injects relative noise into the intended destination coordinates. It examines the sensitivity of the prediction framework to uncertainty in the destination prior, an unavoidable issue in practice, where the true destination must be inferred from limited observations. The “State Observation Noise Experiment” injects observation noise directly into the 6-dimensional state inputs that serve as the integration starting point. It quantifies the sensitivity of the end-to-end predictor to sensor measurement errors. Perturbations are introduced only at the testing stage, while the training data are kept clean. This setup corresponds to the engineering-relevant train–test distribution-shift robustness test.

Destination Perturbation Experiment

In the Destination Perturbation Experiment, Gaussian perturbations with relative amplitude $\sigma_{\mathrm{rel}}$ are added to the intended destination coordinates used in computing the intent features. The scanning interval is set as $\sigma_{\mathrm{rel}} \in [0, 1000]$ to fully observe the tolerance boundary. The results show that for $\sigma_{\mathrm{rel}} \leq 1.0$, the model exhibits a complete robustness plateau, with $E_{A}$ degradation below 0.1% relative to the noise-free baseline. When $\sigma_{\mathrm{rel}}$ enters the extreme range of 10–1000, $E_{A}$ no longer increases linearly but approaches saturation, with an upper bound of approximately 18 km. This indicates that even when the destination prior is severely biased, the prediction error remains bounded rather than diverging. The proposed framework therefore has strong tolerance to inaccurate destination information. The results are shown in Figure 14.
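The destination perturbation can be sketched as follows. The exact noise model is not specified above, so this sketch assumes a multiplicative form in which each destination coordinate is scaled by $(1 + \sigma_{\mathrm{rel}}\varepsilon)$ with $\varepsilon \sim \mathcal{N}(0, 1)$. This form, the function name, and the example coordinates are hypothetical. The perturbation is applied only at test time.

```python
import numpy as np


def perturb_destination(dest, sigma_rel, rng):
    """Relative Gaussian perturbation of destination coordinates.

    Assumed (hypothetical) form: each coordinate is scaled by (1 + sigma_rel * eps), eps ~ N(0, 1).
    Applied at test time only; the training data stay clean.
    """
    dest = np.asarray(dest, dtype=float)
    return dest * (1.0 + sigma_rel * rng.standard_normal(dest.shape))


rng = np.random.default_rng(0)
dest = np.array([1200.0, 350.0, 30.0])  # illustrative coordinates, not taken from the paper
for sigma_rel in (0.0, 0.1, 1.0, 10.0, 100.0, 1000.0):  # points within the scanned interval [0, 1000]
    noisy = perturb_destination(dest, sigma_rel, rng)
    # In the full pipeline, the intent features would be recomputed from `noisy` before inference
    print(sigma_rel, np.round(noisy, 1))
```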

State Observation Noise Experiment

In the State Observation Noise Experiment, relative observation noise is injected into the state components in the normalized space, with $\sigma_{r} \in [0, 0.02]$. After de-normalization, this interval roughly corresponds to a position accuracy of 0–500 m and a velocity accuracy of 0–30 m/s. This range covers typical operating conditions from high-precision cooperative-target localization to medium- and long-range early-warning radar tracking. The results are summarized in Table 3.
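The state-noise injection can be sketched as additive Gaussian noise in the normalized space. This additive form is an assumption. The physical scales below are not reported values. They are inferred from the stated correspondence ($\sigma_{r} = 0.02$ giving about 500 m and 30 m/s) under the further assumption of a linear normalization. The function names and the example state vector are hypothetical.

```python
import numpy as np

# Normalization scales implied by the stated correspondence (sigma_r = 0.02 -> ~500 m, ~30 m/s),
# assuming a linear normalization x_norm = (x - offset) / scale; these values are inferred, not reported.
POS_SCALE_M = 500.0 / 0.02    # about 25 km
VEL_SCALE_MPS = 30.0 / 0.02   # about 1500 m/s


def add_observation_noise(state_norm, sigma_r, rng):
    """Additive Gaussian noise on the normalized 6-D state (assumed form, test time only)."""
    state_norm = np.asarray(state_norm, dtype=float)
    return state_norm + sigma_r * rng.standard_normal(state_norm.shape)


def physical_noise_std(sigma_r, scale):
    """Noise standard deviation in physical units after de-normalization."""
    return sigma_r * scale


rng = np.random.default_rng(0)
state_norm = np.array([0.4, 0.5, 0.6, 0.3, 0.2, 0.1])  # hypothetical normalized 6-D state vector
for sigma_r in (0.0, 0.005, 0.01, 0.015, 0.02):  # the sigma_r grid of Table 3
    noisy = add_observation_noise(state_norm, sigma_r, rng)  # would serve as the integration starting point
    print(f"sigma_r={sigma_r:.3f}: position std ~ {physical_noise_std(sigma_r, POS_SCALE_M):.0f} m, "
          f"velocity std ~ {physical_noise_std(sigma_r, VEL_SCALE_MPS):.1f} m/s")
```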

As $\sigma_{r}$ increases from 0 to 0.02, $E_{A}$ degrades smoothly from 7.56 km to 12.62 km, a relative increase of 66.9%. $E_{F}$ and $E_{M}$ show much smaller relative increases, of about 12% and 15%, respectively. All three metrics degrade monotonically and smoothly, growing somewhat faster at larger noise levels but without abrupt jumps. The prediction success rate is maintained at 100% throughout the scan. This indicates that the integration-based predictor remains numerically stable and does not exponentially amplify the contaminated initial value.

A further inspection of the $E_{F}/E_{A}$ ratio reveals a steady decrease from 2.66 at the noise-free baseline to 1.78 at $\sigma_{r} = 0.02$. This suggests that the additional error introduced by observation noise is concentrated in the earlier part of the horizon, which weighs more heavily in $E_{A}$ than in $E_{F}$. Because $E_{F}$ increases by only about 12% over this range, the terminal error remains dominated by the model-prediction accumulation term along the prediction horizon. Within the engineering-relevant range, the contaminated initial state has not yet become the major source of error. This is consistent with the theoretical expectation for integration-based trajectory prediction. It also indirectly confirms that the proposed model does not abnormally amplify initial-value errors under moderate observation perturbations.
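The relative increases, the $E_{F}/E_{A}$ ratios, and the shape of the degradation curve can be checked directly against Table 3. The following arithmetic sketch uses only the tabulated values. The variable names are illustrative.

```python
# sigma_r -> (E_A, E_F, E_M) in km, copied from Table 3
TABLE3 = {
    0.000: (7.562, 20.088, 20.182),
    0.005: (8.343, 20.284, 20.398),
    0.010: (9.559, 20.757, 21.011),
    0.015: (11.020, 21.506, 21.992),
    0.020: (12.624, 22.487, 23.197),
}

sigmas = sorted(TABLE3)
clean, noisiest = TABLE3[sigmas[0]], TABLE3[sigmas[-1]]

# Relative increase of each metric between sigma_r = 0 and sigma_r = 0.02, in percent
increase_pct = [100.0 * (n - c) / c for c, n in zip(clean, noisiest)]

# E_F / E_A ratio at every noise level
ratios = [TABLE3[s][1] / TABLE3[s][0] for s in sigmas]

# Step-to-step increments of E_A (growing increments indicate mildly convex, not linear, growth)
ea_increments = [TABLE3[b][0] - TABLE3[a][0] for a, b in zip(sigmas, sigmas[1:])]

print("increase (E_A, E_F, E_M):", [f"{p:.1f}%" for p in increase_pct])
print("E_F/E_A:", [f"{r:.2f}" for r in ratios])
print("E_A increments (km):", [f"{d:.3f}" for d in ea_increments])
```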

Comprehensive Analysis

The two experiments jointly demonstrate the robustness of the proposed model from complementary perspectives. The Destination Perturbation Experiment shows that the model is tolerant to inaccuracies in the destination prior: the prediction error saturates at approximately 18 km even under extreme contamination. This alleviates the practical concern that destination uncertainty might lead to unbounded error growth. The State Observation Noise Experiment shows that, within the engineering-relevant observation-noise range, the error degradation is gradual, controllable, and monotonic, with the prediction success rate remaining at 100%. Neither type of perturbation triggers runaway amplification or numerical divergence. The proposed framework therefore delivers stable and predictable prediction performance under both destination-prior uncertainty and sensor observation errors, meeting the practical noise-adaptability requirement of long-horizon trajectory prediction tasks.

4.4. Inference Efficiency and Real-Time Analysis

The computational efficiency and real-time performance of the proposed algorithm are analyzed below. The inference-time benchmark results are listed in Table 4.

The benchmark results show that the proposed method achieves sub-millisecond per-step inference and millisecond-level full-trajectory generation on a desktop-class GPU. Specifically, per-step inference takes only 30.5 μs, and predicting the entire 150-step future trajectory takes only 4.58 ms. This yields a real-time margin of $\eta = t_{\mathrm{traj}} / t_{\mathrm{real}} \approx 3.1 \times 10^{-5}$, far below the engineering threshold $\eta < 1$. Consequently, within each 1 s sampling cycle, the method can in principle perform about $3.3 \times 10^{4}$ single-step inferences, or roughly 218 complete 150-step rolling predictions. Even when migrated to airborne embedded platforms, a complete trajectory inference is expected to finish within 50–250 ms, which would still meet the second-level real-time requirement of trajectory prediction. Under batched inference with a batch size of 64, the throughput reaches 6801 trajectories/s. A single GPU can therefore handle parallel trajectory prediction for thousands of targets, a capability well suited to ground-based multi-target situational awareness. Moreover, the standard deviation of the single-trajectory inference time is only 0.72 ms, indicating low latency jitter and high engineering predictability. These results are consistent with the theoretical complexity advantage of the Informer encoder. ProbSparse attention reduces the self-attention cost from $O(L^{2})$ to $O(L \log L)$, and each distillation layer halves the sequence length. From an engineering perspective, the results support the conclusion that the proposed approach offers both high accuracy and strong real-time deployability.
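The real-time figures quoted above follow from Table 4 by simple arithmetic. The sketch below reproduces them, including the distinction between single-step inferences and complete 150-step predictions per 1 s cycle. It uses only the reported values and measures nothing. The "targets per cycle" line assumes, hypothetically, that each target is re-predicted once per sampling cycle.

```python
# Values from Table 4 (desktop-class GPU, batch size 1 unless noted)
PER_STEP_MS = 0.0305           # per-step inference time
T_TRAJ_MS = 4.58               # single 150-step trajectory
N_STEPS = 150
DT_S = 1.0                     # sampling interval per step
THROUGHPUT_BATCH64 = 6801.35   # trajectories per second at batch size 64

t_real_s = N_STEPS * DT_S                      # real-time span covered by one prediction: 150 s
eta = (T_TRAJ_MS / 1000.0) / t_real_s          # real-time margin; must be < 1
steps_per_cycle = DT_S * 1000.0 / PER_STEP_MS  # single-step inferences per 1 s cycle
trajs_per_cycle = DT_S * 1000.0 / T_TRAJ_MS    # full 150-step predictions per 1 s cycle (batch size 1)
targets_per_cycle = THROUGHPUT_BATCH64 * DT_S  # targets re-predicted each cycle when batched (assumption)

print(f"eta = {eta:.2e}")
print(f"single-step inferences per cycle ~ {steps_per_cycle:.2e}")
print(f"full trajectories per cycle ~ {trajs_per_cycle:.0f}")
print(f"batched targets per cycle ~ {targets_per_cycle:.0f}")
print(f"150 x per-step time = {N_STEPS * PER_STEP_MS:.3f} ms (vs measured {T_TRAJ_MS} ms)")
```

The last line shows that the measured single-trajectory time is essentially 150 sequential per-step calls. This explains why $3.3 \times 10^{4}$ corresponds to single-step inferences, not complete trajectories, per 1 s cycle.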

5. Discussion

While the proposed framework demonstrates clear advantages over the baseline methods, several limitations should be acknowledged in order to delineate the scope of validity of the present results and to motivate future work.

(1): The trajectory library used in this study contains 658 offline-optimized base trajectories, which are then expanded into approximately 24,700 sliding-window samples for sequence learning. Although the proposed CNN–Informer contains roughly $1.5 \times 10^{7}$ trainable parameters, the close consistency observed between the validation and test losses, together with the use of early stopping, dropout, and sliding-window data augmentation, indicates that severe trajectory-level memorization is unlikely. Nevertheless, we acknowledge that the sample-to-parameter ratio remains modest and that training on substantially larger and more diverse datasets is an important next step. Moreover, the initial-condition ranges adopted in this work are bounded by the connected feasible set of the underlying trajectory-optimization problem and do not span the full CDUAV operational envelope. The conclusions should therefore be interpreted strictly within this training distribution, and envelope-extension validation is left as future work. In addition, all experiments are conducted on a single CDUAV aerodynamic configuration; transferability to vehicles with different lift-to-drag ratios or different platform classes requires further empirical validation, which we plan to undertake in subsequent studies.
(2): The DBL control-parameter construction relies on nominal values of the zero-lift drag coefficient $C_{D_{0}}$ and the induced-drag coefficient $K$ . In real flight, aerodynamic coefficients are inevitably uncertain, and such uncertainty may propagate into the predictor through the control-affine transformation. A natural and important extension is to explicitly inject parametric noise into $C_{D_{0}}$ and $K$ during training, so as to obtain a robustness profile of the predictor with respect to ±20% coefficient perturbations; this will be investigated in future work. Similarly, the no-fly zones are modeled as infinitely tall vertical cylinders, which is a reasonable simplification for the unified high-altitude gliding band considered here, but is not directly applicable to low- and medium-altitude UAV operations involving altitude-dependent or finite-height restrictions. Generalization to finite-height polyhedral or irregularly shaped restricted zones requires extending the geometric projection used to compute the tangential avoidance distance and is part of planned future work.
(3): The current model is trained with symmetric regression losses, consistent with the standard prediction-accuracy formulation in the trajectory-forecasting literature. Such symmetric losses treat all directions of error equally and are appropriate for the open-loop prediction setting addressed in this paper. However, when the predictor is to be embedded in a closed-loop collision-avoidance pipeline, the cost of an error that brings the predicted trajectory closer to a no-fly zone is no longer symmetric to that of an error in the opposite direction. In such safety-critical contexts, asymmetric or no-fly zone-aware loss formulations should be preferred, and we identify this as an explicit future research direction. Furthermore, the present model produces deterministic point predictions and does not quantify predictive uncertainty. Extending the framework to output probabilistic distributions—e.g., through deep ensembles, Monte-Carlo dropout, or distributional decoders—would enable principled confidence intervals for long-horizon forecasts and is a key component of our follow-up work.
(4): Two further assumptions limit the operational realism of the present study. First, the no-fly zone configuration is assumed to be known a priori as batch information, which is consistent with the offline prediction setting addressed here but does not directly accommodate scenarios in which restricted zones are discovered sequentially during flight. Extending the framework to streaming no-fly zone discovery and incremental intent-feature update is a natural next step. Second, this paper focuses primarily on the methodological core of trajectory prediction and does not address operational integration with broader autonomous navigation and airspace-management systems. In practice, the predicted trajectories and intent features produced by the proposed framework could naturally feed into UTM-style architectures, supporting downstream functions such as conflict detection, sector-load forecasting, and multi-vehicle coordination. A more systematic investigation of such integration—including interface definitions, latency budgets, and cooperative-prediction protocols—is left for future work.

In summary, although the present study was validated on a representative CDUAV scenario, the proposed framework is not intended as a fully platform-agnostic solution. The limitations identified above—covering data scale and envelope, aerodynamic and geometric assumptions, loss design and uncertainty quantification, and operational integration—jointly define a coherent roadmap for extending the framework toward broader UAV trajectory-prediction tasks in constrained airspace, autonomous navigation, and mission-aware aerial behavior forecasting.

6. Conclusions

To address the problem of high-accuracy trajectory prediction for unmanned aerial vehicles in complex constrained environments, this paper proposed an intent-aware CNN–Informer framework and validated it on a representative cross-domain unmanned aerial vehicle (CDUAV) scenario. The proposed method systematically improves existing data-driven trajectory prediction approaches from three aspects, namely physically interpretable control representation, continuous intent-feature fusion, and coordinated modeling of local and global temporal dependencies. The main conclusions are summarized as follows.

(1): Based on control-affine system theory, a DBL control-parameter system was constructed by decoupling the lift–drag coefficients from the bank angle, and the nonlinear motion equations of the vehicle were transformed into a control-affine form. This parameter system reflects the drag acceleration and potential lift-acceleration capability generated by a unit-mass vehicle under unit dynamic pressure, thereby simplifying the mapping from hidden control effects to state evolution. Pearson correlation analysis further verified the rationality of the proposed decoupling strategy.
(2): By analyzing the relative geometry among the vehicle, no-fly zones, and intended destinations, three continuous intent features—tangential no-fly zone avoidance distance, heading error angle, and relative closing velocity—were constructed. These features transform destination-oriented behavior and constrained-region avoidance requirements into a quantitative form that can be directly incorporated into the deep learning model. The ablation results confirmed that intent features contribute positively to prediction accuracy, particularly during maneuvering phases involving large heading adjustment and no-fly zone avoidance.
(3): A CNN–Informer hybrid deep learning architecture was developed to combine local maneuver-pattern extraction with long-range temporal dependency modeling. By integrating state variables, DBL control parameters, and intent-aware features, the proposed framework achieved the best performance among all compared methods. On the constructed dataset, the proposed model reduced the average prediction error by about 17% compared with Informer (from 9.155 km to 7.561 km in Table 1). It also achieved clear improvements in terminal and maximum prediction errors. Comparative experiments with SSD-LSTM, Transformer, iTransformer, and DLinear further demonstrated the superiority of the proposed framework in handling complex maneuvering scenarios.

Several issues deserve further investigation. First, the extraction of intent features currently depends on prior knowledge of no-fly zone locations and intended destinations, whereas in practical applications such prior information may be uncertain. Second, the trajectory dataset used in this paper was generated offline based on sequential convex optimization, and more realistic environmental interactions and online guidance effects remain to be incorporated. Third, the current model produces deterministic predictions and does not explicitly characterize uncertainty. Future work will therefore consider probabilistic intent inference, uncertainty-aware trajectory forecasting, and broader validation in more realistic UAV operating scenarios.

Overall, although this study was validated on a CDUAV case, the proposed framework may also provide useful methodological support for broader UAV trajectory prediction tasks in constrained airspace, autonomous navigation, and mission-aware aerial behavior forecasting.

Author Contributions

Conceptualization, Y.L.; methodology, Y.L. and Y.H.; validation, C.Z., L.S. and Y.L.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L. and X.W.; data curation, Y.L. and Y.H.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and Y.H.; visualization, Y.L.; supervision, Y.L.; project administration, J.Y. and L.S.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62173339.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, F.; Lu, L.; Zhang, Z.; Xie, Y.; Chen, J. Intelligent Trajectory Prediction Algorithm for Hypersonic Vehicle Based on Sparse Associative Structure Model. Drones 2024, 8, 505.
Li, J.; Feng, X.; He, Y.; Shao, L. A Coverage-Based Cooperative Detection Method for CDUAV: Insights from Prediction Error Pipeline Modeling. Drones 2025, 9, 397.
Yeo, H.; Seo, S.H.; Kim, C.; Kim, K.H.; Park, H.; Kim, J.G. Development of a rapid analysis program for the prediction of aerothermodynamics in high-speed vehicles. Aerosp. Sci. Technol. 2025, 164, 110415.
Jieqing, C.; Ruisheng, S.; Yu, L. Cooperative game penetration guidance for multiple hypersonic vehicles under safety critical framework. Chin. J. Aeronaut. 2024, 37, 247–255.
Boretti, A. Hydrogen hypersonic combined cycle propulsion: Advancements, challenges, and applications. Int. J. Hydrogen Energy 2023, 55, 394–399.
Arani, A.H.; Fernando, X.; Alhussein, O.; Zhu, Y. Deep Reinforcement Learning for Resource Sharing and UAV Trajectory Optimization in Multi-Operator UAV-Assisted Wireless Networks. IEEE Trans. Veh. Technol. 2026, 1–16.
Luo, Y.; Wang, J.; Jiang, J.; Liang, H. Reentry trajectory planning for hypersonic vehicles via an improved sequential convex programming method. Aerosp. Sci. Technol. 2024, 149, 109130.
Li, J.; He, Y.; Shao, L.; Feng, X. Reentry glide vehicle trajectory prediction method via multidimensional intention fusion. Aerosp. Sci. Technol. 2025, 159, 109960.
Li, Z.; Wang, Y.; Zheng, W. Adaptively tracking hypersonic gliding vehicles. Aerosp. Sci. Technol. 2024, 147, 109035.
Cai, Y.; Zhuang, X. Hypersonic glide vehicle trajectory prediction based on frequency enhanced channel attention and light sampling-oriented MLP network. Def. Technol. 2025, 46, 199–212.
Wang, Z.; Grant, M.J. Constrained Trajectory Optimization for Planetary Entry via Sequential Convex Programming. J. Guid. Control. Dyn. 2017, 40, 2603–2615.
Zhang, H.; Li, K.; Liang, Y.; Liu, S. Adaptive Target Tracking Method for Hypersonic Gliding Vehicle’s Glide Phase. IFAC-PapersOnLine 2025, 59, 2266–2271.
Huang, J.; Zhang, H.; Tang, G.; Bao, W. Robust UKF-based filtering for tracking a maneuvering hypersonic glide vehicle. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2021, 236, 2162–2178.
Arulampalam, M.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188.
Deng, M.; Li, S.; Jiang, X.; Li, X. Vehicle Trajectory Prediction Method Based on “Current” Statistical Model and Cubature Kalman Filter. Electronics 2023, 12, 2464.
Cebeira, A.A.; Vicente, M.A. Adaptive IMM-UKF for Airborne Tracking. Aerospace 2023, 10, 698.
Li, X.R.; Jilkov, V. Survey of maneuvering target tracking. Part V: Multiple-model methods. IEEE Trans. Aerosp. Electron. Syst. 2005, 41, 1255–1321.
Canolla, A.; Jamoom, M.B.; Pervan, B. Interactive multiple model sensor analysis for Unmanned Aircraft Systems (UAS) Detect and Avoid (DAA). In Proceedings of the 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS), Monterey, CA, USA, 23–26 April 2018; IEEE: New York, NY, USA, 2018; pp. 757–766.
Tian, W.; Fang, L.; Li, W.; Ni, N.; Wang, R.; Hu, C.; Liu, H.; Luo, W. Deep-Learning-Based Multiple Model Tracking Method for Targets with Complex Maneuvering Motion. Remote. Sens. 2022, 14, 3276.
Sun, W.; He, Y.; Barlow, J.; Ren, C.; Tam, C.Y.; Yuan, C.; Fung, J.C.H.; Ng, E. Categorical evaluation of methods for estimating aerodynamic parameters for vertical wind speed profiles over built-up areas: A systematic review. J. Wind. Eng. Ind. Aerodyn. 2026, 274, 106457.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Zhang, Y.; Zhenyun, S.; Hongbing, J.; Zhenzhen, S. Online multi-target intelligent tracking using a deep long-short term memory network. Chin. J. Aeronaut. 2023, 36, 313–329.
Zeng, W.; Quan, Z.; Zhao, Z.; Xie, C.; Lu, X. A Deep Learning Approach for Aircraft Trajectory Prediction in Terminal Airspace. IEEE Access 2020, 8, 151250–151266.
Razaq, A.; Shah, B.; Khan, G.; Alfandi, O.; Ullah, A.; Halim, Z.; Rahman, A.U. Improving paraphrase generation using supervised neural-based statistical machine translation framework. Neural Comput. Appl. 2023, 37, 7705–7719.
Lixun, H.; Cunqian, F.; Xiaowei, H.; Sisan, H.; Xuguang, X. Ballistic target recognition based on multiple data representations and deep-learning algorithms. Chin. J. Aeronaut. 2024, 37, 167–181.
Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125.
Rahali, A.; Akhloufi, M.A. End-to-End Transformer-Based Models in Textual-Based NLP. AI 2023, 4, 54–110.
Zhang, Y.; Wu, H.; Xian, J.; Mei, X.; Zhang, K.; Zhang, Q.; Wang, F. A Multihead ProbSparse Self-Attention Mechanism-Based High-Precision and High-Robustness Reconstruction Model for Missing Ocean Data. IEEE Sens. J. 2025, 25, 13374–13385.
Yang, X.; Li, H.; Huang, X.; Feng, X. FEDAF: Frequency enhanced decomposed attention free transformer for long time series forecasting. Neural Comput. Appl. 2024, 36, 16271–16288.
Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625.
Zhang, Z.; Chang, Y.; Dong, Q.; Wang, L. Trajectory-Guided Driving Behavior Prediction for Autonomous Driving. IEEE Trans. Intell. Veh. 2024, 10, 4546–4556.
Zhang, K.; Feng, X.; Wu, L.; He, Z. Trajectory Prediction for Autonomous Driving Using Spatial-Temporal Graph Attention Transformer. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22343–22353.
Xue, H.; Wang, S.; Xia, M.; Guo, S. G-Trans: A hierarchical approach to vessel trajectory prediction with GRU-based transformer. Ocean Eng. 2024, 300, 117431.
Ren, J.; Wu, X.; Liu, Y.; Ni, F.; Bo, Y.; Jiang, C. Long-Term Trajectory Prediction of Hypersonic Glide Vehicle Based on Physics-Informed Transformer. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 9551–9561.
Li, J.; Guo, J.; Tang, S. Trajectory prediction based on multi-model and multi-intent fusion for hypersonic gliding targets. J. Astronaut. 2024, 45, 167–180.
Wang, X.; Jing, Z.; Tuo, H.; Leung, H. Air Target Intention Recognition via Bidirectional Long Short-Term Memory Networks and Hierarchical Maneuver Feature Extraction. J. Aerosp. Inf. Syst. 2025, 22, 842–852.
Zhang, J.; Xiong, J.; Li, L.; Xi, Q.; Chen, X.; Li, F. Motion State Recognition and Trajectory Prediction of Hypersonic Glide Vehicle Based on Deep Learning. IEEE Access 2022, 10, 21095–21108.
Xiao, Z.; Yuan, S.; Xu, G.; Zeng, X.; Hu, H.; He, J.; Yang, Y. AV-DTEC: Self-Supervised Audio–Visual Fusion for Drone 3-D Trajectory Estimation and Classification. IEEE Sens. J. 2026, 26, 15912–15924.
Li, T.; Zhang, Z.; Zhu, M.; Cui, Z.; Wei, D. Combining transformer global and local feature extraction for object detection. Complex Intell. Syst. 2024, 10, 4897–4920.

Figure 1. Schematic diagram of intent features.

Figure 2. All flight trajectories.

Figure 3. Representative flight trajectory: (a) three-dimensional view of the trajectory; (b) two-dimensional view of the trajectory.

Figure 4. Curve showing changes in the features of the tracking section.

Figure 5. Analysis of feature correlations.

Figure 6. Encoder structure.

Figure 7. Decoder structure.

Figure 8. Algorithm framework.

Figure 9. DBL parameter prediction curve.

Figure 10. Predicted trajectories for different inputs.

Figure 11. Comparison of model loss curves.

Figure 12. Predicted trajectories of compared models.

Figure 13. $E_{A}$ growth curve of compared models.

Figure 14. Prediction error in the Destination Perturbation Experiment.

Table 1. Evaluation metrics for different inputs.

Input	$E_{A}$ (km)	$E_{F}$ (km)	$E_{M}$ (km)
Informer–ip3	10.289	28.403	28.466
Informer–ip9	9.879	26.675	26.913
Informer–ip12	9.155	25.075	25.091
CNN–Informer–ip12	7.561	20.116	20.209

Table 2. Evaluation metrics of compared models.

Models	$E_{A}$ (km)	$E_{F}$ (km)	$E_{M}$ (km)
SSD-LSTM	15.222	43.368	43.406
Transformer	9.249	24.742	24.796
iTransformer	10.028	27.406	27.672
DLinear	10.412	29.637	29.640
CNN–Informer	7.562	20.088	20.182
Physics baseline	36.161	138.312	138.356

Table 3. Prediction error in the State Observation Noise Experiment.

$σ_{r}$	$E_{A}$ (km)	$E_{F}$ (km)	$E_{M}$ (km)
0.000	7.562	20.088	20.182
0.005	8.343	20.284	20.398
0.010	9.559	20.757	21.011
0.015	11.020	21.506	21.992
0.020	12.624	22.487	23.197

Table 4. Inference time benchmark results.

Metric	Value	Remarks
Per-step inference time	0.0305 ± 0.0048 ms	Batch size = 1
Single-trajectory inference time	4.58 ± 0.72 ms	N = 150
Throughput	6801.35 (trajectories/s)	Batch size = 64
Real-time margin	3.1 × 10⁻⁵	$\Delta t$ = 1.0 s/step, $t_{\mathrm{real}}$ = 150.0 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Y.; Zhou, C.; Shao, L.; He, Y.; Wang, X.; Ye, J. Intent-Aware CNN–Informer for Long-Horizon Trajectory Prediction of Cross-Domain Unmanned Aerial Vehicles in Constrained Environments. Drones 2026, 10, 444. https://doi.org/10.3390/drones10060444
