Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network

Dai, Encheng; Cai, Guangbin; Fan, Yonghua; Xu, Hui; Wei, Hao; Li, Xin

doi:10.3390/aerospace13030283

Open AccessArticle

Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network

by

Encheng Dai

¹,

Guangbin Cai

^1,*

,

Yonghua Fan

²,

Hui Xu

¹,

Hao Wei

¹ and

Xin Li

¹

College of Missile Engineering, Rocket Force University of Engineering, Xi’an 710025, China

²

School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China

^*

Author to whom correspondence should be addressed.

Aerospace 2026, 13(3), 283; https://doi.org/10.3390/aerospace13030283

Submission received: 4 February 2026 / Revised: 6 March 2026 / Accepted: 13 March 2026 / Published: 17 March 2026

(This article belongs to the Section Aeronautics)

Download

Browse Figures

Versions Notes

Abstract

Trajectory optimization of hypersonic vehicles face challenges from complex aerodynamic environments and multiple constraints, where traditional offline optimization methods struggle to meet real-time requirements. This study proposes a novel online trajectory optimization framework for hypersonic vehicles that integrates a multi-strategy improved whale optimization algorithm (MWOA) with an attention-mechanism Long Short-Term Memory (AM-LSTM) network. First, an offline trajectory dataset under aerodynamic uncertainties is generated using sequential second-order cone programming (SOCP). Subsequently, a multi-head attention mechanism is incorporated into the LSTM network to effectively capture sequential dependencies within the trajectory data. To automate the hyperparameter tuning of the AM-LSTM architecture, a multi-strategy improved whale optimization algorithm is developed, which incorporates circle chaotic mapping for population initialization, a nonlinear convergence factor to balance global and local search, and a dynamic golden-sine mutation strategy to enhance optimization robustness. The trained MWOA-AM-LSTM hybrid model is then employed for real-time trajectory generation. Numerical simulation results demonstrate that the proposed framework achieves superior terminal accuracy under aerodynamic perturbations, validating its effectiveness and robustness for hypersonic vehicle reentry trajectory optimization.

Keywords:

hypersonic vehicle; improved whale optimization algorithm; attention-LSTM network; real-time trajectory optimization; hyperparameter optimization

1. Introduction

Hypersonic vehicles represent a class of aircraft operating in the near-space region with flight velocity exceeding Mach 5, characterized by high maneuverability and extended operational range [1,2]. Due to the complex flight environment and numerous constraints, trajectory optimization is critical to ensuring the successful flight of hypersonic vehicles under multiple terminal and path constraints [3,4]. However, reference trajectories in engineering practice are typically generated through offline optimization methods, which struggle to accommodate diverse uncertainties during flight operations. Consequently, real-time reentry trajectory optimization remains a significant technical challenge.

Trajectory optimization is fundamentally a nonlinear optimal control problem, primarily addressed through indirect and direct methods. The indirect method, leveraging Pontryagin’s Minimum Principle, transforms the problem into a Hamiltonian boundary value problem, thereby offering theoretical guarantees of global optimality [5,6,7]. In contrast, the direct method discretizes the continuous problem into a nonlinear programming (NLP) formulation, which is then solved numerically. This approach has been successfully demonstrated in diverse applications, including atmospheric reentry [8,9], rocket landing [10,11], and planetary missions [12,13,14]. Despite their respective strengths, both methods face certain challenges in trajectory optimization. The indirect method grapples with problem size that grows exponentially under complex constraints, while the direct method often incurs prohibitive computational costs for large-scale NLPs.

In recent years, with the advancement of computational capabilities, deep learning has demonstrated tremendous application potential in the field of trajectory optimization [15,16,17]. Leveraging the powerful nonlinear mapping capabilities of neural networks, researchers have effectively reduced online computational burdens through offline training on historical trajectory datasets, thereby providing a promising alternative for real-time trajectory optimization in hypersonic vehicles [18,19,20,21]. Work [22] developed a dual-phase hypersonic reentry planner, incorporating offline fuzzy-optimized trajectory planning and online DNN execution. This strategy achieves real-time trajectory execution while maintaining solution feasibility under complex flight constraints, demonstrating superior computational efficiency compared to conventional optimization-based methods. Work [23] proposed an offline-trained Deep Neural Network (DNN) controller for hypersonic flight that learns state-to-optimal-action mappings from homotopy-generated trajectory data, enabling real-time near-optimal control with stable convergence. Work [24] proposed a DNN-constrained trajectory generator coupled with adaptive reinforcement learning control for air-breathing hypersonic vehicles, which achieves precise tracking control while satisfying path constraints. Although the aforementioned strategies achieve real-time computation through offline training, the static feedforward architecture of its DNN struggles to effectively capture the inherent temporal dependencies in trajectory data.

In this article, Long Short-Term Memory (LSTM) networks combined with multi-head attention mechanisms are proposed as the core architecture for trajectory planning. As a representative variant of recurrent neural networks, LSTM demonstrates exceptional capability in capturing complex temporal features within sequential data, exhibiting significant potential for predictive applications [25,26]. This specialized recurrent structure has proven effective in modeling data with strong nonlinearities and temporal dependencies. Furthermore, the multi-head attention mechanism improves prediction accuracy of LSTM by adaptively weighting critical information in a sequence [27,28]. The outstanding predictive capabilities of LSTM networks integrated with multi-head attention mechanisms have been documented across multiple disciplines, including anomaly detection [29,30], medical diagnosis [31,32], hydrology and water resources [33,34], and civil engineering applications [35,36].

Despite the proven efficacy of LSTM and multi-head attention mechanisms in temporal modeling, their hyperparameter configuration is still limited by a heavy reliance on empirical knowledge and inefficient tuning processes [37,38]. Such manual tuning paradigms fail to unlock the full potential of neural networks, proving inadequate for the rigorous demands of complex trajectory optimization tasks. While multiple approaches exist for hyperparameter optimization, the adoption of swarm intelligence algorithms represents a particularly promising solution [39,40]. Their principal advantage stems from replacing traditional gradient calculations with probabilistic search to find the global optimum solution in high-dimensional spaces.

This paper presents an online trajectory planning framework for hypersonic vehicles based on a multi-strategy improved whale optimization algorithm and an attention-enhanced Long Short-Term Memory (MWOA–AM-LSTM) model. The framework is designed to enable real-time onboard trajectory generation in complex reentry aerodynamic environments by learning an expert state–command mapping from offline solutions, while maintaining comparable solution quality with substantially reduced online computational cost. Specifically, the main contributions are:

(1): Online learning-based trajectory generation guided by an offline expert database. We propose an integrated MWOA–AM-LSTM framework for hypersonic vehicle reentry trajectory planning, where sequential second-order cone programming (SOCP) is used offline to generate a reference trajectory–command dataset under bounded aerodynamic uncertainties. The AM-LSTM is trained in a supervised manner to approximate the expert state–command mapping—i.e., to infer the next-step bank-angle command from a short history of flight states—thereby enabling real-time online rollout with comparable performance to the SOCP-generated references. The resulting trajectory is propagated via numerical integration under admissible control bounds, allowing constraint-related quantities to be monitored during online execution and improving practical robustness in disturbed aerodynamic conditions.
(2): Automated and robust hyperparameter tuning for AM-LSTM via an improved WOA. To avoid manual and empirical hyperparameter selection, we develop a multi-strategy improved whale optimization algorithm to automatically tune the AM-LSTM architecture. By incorporating circle chaotic mapping for diversified initialization, a nonlinear convergence factor for balancing exploration and exploitation, and a dynamic golden-sine mutation strategy to mitigate premature convergence, the proposed MWOA enhances the efficiency and robustness of hyperparameter search in high-dimensional spaces, thereby improving the reliability of the learned mapping for real-time deployment.

The article is structured as follows: Section 2 formulates the reentry trajectory optimization problem for hypersonic vehicles. Section 3 details a novel trajectory planning methodology based on the MWOA-AM-LSTM framework. Section 4 analyzes and evaluates simulation results. Section 5 provides concluding remarks.

2. Entry Problem Formulation

2.1. Trajectory Dynamics

This subsection presents the normalized dynamics for hypersonic vehicles through a dimensionless dynamics. By employing an energy-like variable e as the state variable and rigorously incorporating Earth’s rotational dynamics, the derived dimensionless trajectory dynamics are expressed as follows [19]:

\{\begin{matrix} \frac{d r}{d e} = \frac{sin γ}{D} \\ \frac{d θ}{d e} = \frac{cos γ sin ψ}{r D cos ϕ} \\ \frac{d ϕ}{d e} = \frac{cos γ cos ψ}{r D} \\ \frac{d γ}{d e} = \frac{1}{V^{2} D} (L cos σ + (V^{2} - \frac{1}{r}) \frac{cos γ}{r} + V Ω_{1}) \\ \frac{d ψ}{d e} = \frac{1}{V^{2} D} (\frac{L sin σ}{cos γ} + \frac{V^{2} cos γ sin ψ tan ϕ}{r} + V Ω_{2}) \end{matrix}

(1)

where

\begin{matrix} Ω_{1} = & [2 V Ω sin ψ cos ϕ + Ω^{2} r {cos}^{2} ϕ cos γ \\ + Ω^{2} r cos ϕ sin ϕ cos ψ sin γ] / V \end{matrix}

(2)

\begin{matrix} Ω_{2} = & [- 2 V Ω (cos ψ sin γ cos ϕ - cos γ sin ϕ) \\ + Ω^{2} r cos ϕ sin ϕ sin ψ] / V cos γ \end{matrix}

(3)

where r represents the normalized radial distance from Earth’s center to the hypersonic vehicle, scaled by Earth’s reference radius

R_{0}

.

θ

and

ϕ

correspond to geographic longitude and latitude;

γ

signifies the flight path angle, while

ψ

designates the heading angle.

σ

is the bank angle. The dimensionless velocity V is scaled by

\sqrt{g_{0} R_{0}}

and the dimensionless Earth rotation rate

Ω

is scaled by

\sqrt{g_{0} / R_{0}}

.

g_{0}

is the gravitational acceleration at

R_{0}

, where

R_{0} = 6371

km. Dimensionless Lift acceleration L and dimensionless drag acceleration D are characterized by expressions detailed as follows:

\{\begin{matrix} L = 0.5 R_{0} ρ V^{2} S C_{L} / m \\ D = 0.5 R_{0} ρ V^{2} S C_{D} / m \end{matrix}

(4)

where

C_{L}

is the lift coefficient and

C_{D}

is drag coefficient. S, m denote the reference area and mass, respectively.

ρ

is atmospheric density, which is usually approximated as a local exponential function of altitude. Following [3], its expression is given by

ρ = ρ_{0} exp (- \frac{r - 1}{H})

(5)

where H is dimensionless density scale altitude,

H = H_{s} / R_{0}

.

H_{s} = 7110

m is the density scale height and treated as a constant.

The dimensionless energy in Equation (1) is formulated as:

e = \frac{1}{r} - \frac{V^{2}}{2} \approx 1 - \frac{V^{2}}{2}

(6)

Convert it to the derivative form with respect to dimensionless time

τ

:

\frac{d e}{d τ} = D V > 0, τ = \frac{t}{\sqrt{R_{0} / g_{0}}}

(7)

For reentry trajectory optimization, the state and control vectors are formally defined as follows:

\{\begin{matrix} x = {[r (e), θ (e), ϕ (e), γ (e), ψ (e)]}^{T} \\ u = {[σ (e)]}^{T} \end{matrix}

(8)

2.2. Reentry Constraints

Hypersonic vehicles are subjected to both terminal and path constraints throughout the reentry phase to ensure structural integrity and mission success.

Initial state constrain

x (e_{0})

and terminal state constrain

x (e_{f})

are expressed as:

\{\begin{matrix} x (e_{0}) = {[r_{0}, θ_{0}, ϕ_{0}, γ_{0}, ψ_{0}]}^{T} \\ x (e_{f}) = {[r_{f}, θ_{f}, ϕ_{f}, γ_{f}, ψ_{f}]}^{T} \end{matrix}

(9)

Heating rate

\dot{Q}

, dynamic pressure q, and normal load n represent common path constraints, formulated as follows:

\{\begin{matrix} \dot{Q} & = k_{Q} \sqrt{ρ} {(\sqrt{g_{0} R_{0}} V)}^{3.15} \leq {\dot{Q}}_{\max} \\ q & = \frac{1}{2} ρ {(\sqrt{g_{0} R_{0}} V)}^{2} \leq q_{\max} \\ n & = \sqrt{L^{2} + D^{2}} \leq n_{\max} \end{matrix}

(10)

where

k_{Q} = 9.4369 \times 10^{- 5}

.

The control variable is chosen as

σ

which should be limited as follows:

σ_{\min} \leq σ \leq σ_{\max}

(11)

The angle-of-attack profile is prescribed as a velocity-dependent piecewise linear function which is given as follows [21]:

α (V) = \{\begin{matrix} α_{\max}, & V_{1} < V \leq V_{0}, \\ \frac{α_{g} - α_{\max}}{V_{2} - V_{1}} (V - V_{1}) + α_{\max}, & V_{2} < V \leq V_{1}, \\ α_{g}, & V_{f} < V \leq V_{2}, \end{matrix}

(12)

where

V_{0}

and

V_{f}

denote the entry and terminal velocities, respectively;

α_{\max}

is the maximum angle of attack;

α_{g}

is the angle of attack corresponding to the maximum lift-to-drag ratio; and

V_{1}

and

V_{2}

are predefined velocity thresholds.

The dynamics Equation (1) can be expressed as an equality constraint:

\frac{d x}{d e} = f (x (e), u (e), e), e \in [e_{0}, e_{f}]

(13)

2.3. Trajectory Optimization Problem

The performance index for the optimization problem is selected as the minimum time:

J = \int_{τ_{0}}^{τ_{f}} 1 d τ = \int_{e_{0}}^{e_{f}} \frac{1}{D V} d e

(14)

Combining the dimensionless dynamics Equation (1), reentry constraints Equations (9)–(13), and performance index Equation (14), the trajectory optimization problem is formally expressed as a nonlinear optimal control formulation:

\begin{matrix} P_{0} : \min J = \int_{e_{0}}^{e_{f}} \frac{1}{D V} d e \\ \begin{matrix} s . t . & \{\begin{matrix} x (e_{0}) = x_{0} \\ x (e_{f}) = x_{f} \\ \dot{Q} = k_{Q} \sqrt{ρ} {(\sqrt{g_{0} R_{0}} V)}^{3.15} \leq {\dot{Q}}_{\max} \\ q = 0.5 ρ {(\sqrt{g_{0} R_{0}} V)}^{2} \leq q_{\max} \\ n = \sqrt{L^{2} + D^{2}} \leq n_{\max} \\ σ_{\min} \leq σ \leq σ_{\max} \\ \frac{d x}{d e} = f (x (e), u (e), e), e \in [e_{0}, e_{f}] \end{matrix} \end{matrix} \end{matrix}

(15)

The optimal control problem predominantly features nonconvex constraints, precluding direct solution via sequential SOCP. Consequently, convexification and linearization treatments are applied to relevant constraint functions, thereby transforming the problem into a computationally tractable convex program.

Following the convexification and discretization procedures, the final convex programming problem

P_{f}

is obtained. The specific process can be found in Appendix A.

\begin{matrix} P_{f} : \min \begin{matrix}  \end{matrix} R^{T} z \\ \begin{matrix} s . t . & \{\begin{matrix} M Z = F \\ |x_{i} - x_{i}^{k}| ⩽ δ \\ u_{1}^{2} + u_{2}^{2} ⩽ 1 \\ \cos σ_{\max} ⩽ {(u_{1})}_{i} ⩽ cos σ_{\min} \\ R_{i} ⩾ \bar{L} \\ |θ (e_{f}) - θ_{f}| ⩽ ς, |ϕ (e_{f}) - ϕ_{f}| ⩽ η \\ r (e_{f}) = r_{f}, γ (e_{f}) = γ_{f}, ψ (e_{f}) = ψ_{f} \end{matrix} \end{matrix} \end{matrix}

(16)

3. Online Trajectory Planning Framework Based on MWOA-AM-LSTM Network

3.1. Principles of Whale Optimization Algorithm

WOA constitutes a metaheuristic optimization methodology rooted in swarm intelligence principles, modeled after the distinctive bubble-net predation tactics of humpback whales. The algorithm is designed to balance global exploration and local exploitation in complex optimization through the emulation of three distinct behavioral patterns exhibited by whales: prey encircling, bubble-net foraging, and random search.

3.1.1. Encircling Prey

It has been demonstrated that humpback whales possess the cognitive ability to identify the whereabouts of their prey and subsequently encircle it. Since the optimal position in the search space remains unknown, the WOA algorithm thus presumes the current best candidate solution’s location as the target prey position. Following the determination of the target prey position, the remaining search agents coordinate encirclement maneuvers through positional updates governed by the following mathematical formulation:

\{\begin{matrix} X_{m} (t + 1) = X_{m}^{*} (t) - A_{m} \times D_{m} \\ A_{m} = 2 a \times r d - a \\ D_{m} = |C_{m} \times X_{c}^{*} (t) - X_{c} (t)| \\ C_{m} = 2 \times r d \end{matrix}

(17)

where t is defined as the current number of iterations,

A_{m}

and

C_{m}

represents the coefficient vectors,

X_{m}

is the current position of the solution,

X_{m}^{*}

is the location of the current optimal solution. The control coefficient

a

undergoes linear reduction from 2 to 0. rd is a random number from 0 to 1.

3.1.2. Bubble-Net Attacking Method

The bubble-net attacking strategy of humpback whales is computationally modeled through two synergistic mechanisms: prey encircling via shrinking encircling mechanism and spiral trajectory updating for position refinement. The Shrinking encircling mechanism is controlled through the coefficient

a

. As the coefficient A is bounded within

[- a, a]

, its fluctuation amplitude contracts proportionally with the reduction of a. The spiral update position mechanism employs a spiral function to connect the whale and its prey, replicating the helical bubble-net feeding behavior of a humpback whale. Its formulation is as follows:

\{\begin{matrix} X_{m} (t + 1) = X_{m}^{*} (t) + D_{m}^{*} \times e^{b l} \times cos (2 π l) \\ D_{m}^{*} = |X_{m}^{*} (t) - X_{m} (t)| \end{matrix}

(18)

where

D_{m}^{*}

quantifies the distance between the i-th search agent and the incumbent optimal solution. Parameter b determines the curvature topology of the logarithmic spiral trajectory, while l represents a random variable uniformly distributed over

[- 1, 1]

.

Assuming that the two mechanisms are executed with equal probability, the position update operator is written in the following piecewise form to account for both the shrink-encircling and spiral-feeding behaviors:

X_{m} (t + 1) = \{\begin{matrix} X_{m}^{*} (t) - A_{m} \times D_{m}, & if p < 0.5 \\ X_{m}^{*} (t) + cos (2 π l) \times e^{b l} \times D_{m}^{*}, & if p \geq 0.5 \end{matrix}

(19)

where p is a random number in the interval

[0, 1]

.

3.1.3. Search for Prey

Beyond the bubble-net attacking strategy, humpback whales also perform random prey exploration within the algorithm. Again, this is accomplished by changing the value of A. In instances where the absolute value of A exceeds 1, the whale will deviate from the trajectory of the target prey. Distinct from the Bubble-net attacking phase, this mode designates a random selected position vector as the update reference instead of the current global optimum. This exploration-oriented phase demonstrably augments the global search capacity, as formalized:

\{\begin{matrix} X_{m} (t + 1) = X_{r a n d} - A_{m} \times D_{m} \\ D_{m} = | C_{m} \times X_{r a n d} - X_{m} (t) | \end{matrix}

(20)

where

x_{r a n d}

is the location of a random individual in the whale population.

3.2. Multi-Strategy Improved WOA

The Whale Optimization Algorithm (WOA) is an efficient swarm-intelligence method that has demonstrated strong potential for complex optimization problems due to its concise mathematical formulation and robust global search capability. However, the canonical WOA still faces inherent limitations when tackling high-dimensional, nonlinear, multimodal, and dynamic optimization tasks. These limitations typically manifest as reduced population diversity, limited convergence accuracy, and a tendency to become trapped in local optima. Such issues are particularly pronounced in LSTM hyperparameter optimization within deep-learning frameworks, where WOA can be highly sensitive to parameter settings and may suffer from low search efficiency. To address these challenges, we propose a multi-strategy Improved WOA (IWOA) to enhance optimization performance. The pseudo-code is provided in Algorithm 1.

3.2.1. Circle Chaotic Mapping

The conventional WOA employs random initialization for the purpose of generating populations. However, this stochastic initialization strategy may result in an uneven distribution of search agents across the solution space, thereby compromising the algorithm’s global search space coverage capability.To address this limitation, we propose enhancing the population initialization strategy through the application of a Circle chaotic map. The mathematical formulation of the Circle chaotic mapping is defined as:

X^{i + 1} = \mod (X^{i} + 0.2 - (\frac{1}{4 π}) sin (2 π X^{i}), 1)

(21)

where

X^{i}

,

X^{i + 1}

are denoted as the current individual as well as the subsequently generated individuals.

Algorithm 1 Multi-Strategy Improved Whale Optimization Algorithm (MWOA)

Require: Objective function f, dimension D, population size N, maximum iterations

T_{\max}

Require: Lower bound

lb

, upper bound

ub

, damping factor b

Ensure: Optimal solution

X_{best}

1: Initialization Phase:

2: Generate initial population using Circle chaotic map:

3:

ξ_{i}^{(k)} \leftarrow (ξ_{i}^{(k - 1)} + 0.2 - \frac{0.5}{2 π} sin (2 π ξ_{i}^{(k - 1)})) \mod 1

4:

X_{i} \leftarrow lb + ξ_{i} ⊙ (ub - lb)

5: ▹ Map to parameter space

6:

X_{best} \leftarrow \arg \min f (X_{i})

7: for

t = 1

to

T_{\max}

do

8: Update coefficient a

9: for each individual

X_{i}

do

10:

r_{1}, r_{2} \sim U (0, 1), l \sim U (- 1, 1), p \sim U (0, 1)

11: if

p < 0.5

then

12:

A \leftarrow 2 a r_{1} - a

13: if

| A | < 1

then

14: Update position using shrinking encircling by Equation (17)

15: else

16: Randomly select individual

X_{rand}

17: Update position using searching for prey by Equation (20)

18: end if

19: else

20: Update position using spiral update by Equation (18)

21: end if

22: if

rand () < 0.3 + 0.5 \cdot (t / T_{\max})

then

23: Perform golden sine mutation by Equations (24)–(26)

24: end if

25: if

f (X_{new}) < f (X_{i})

then

26:

X_{i} \leftarrow X_{new}

27: if

f (X_{new}) < f (X_{best})

then

28:

X_{best} \leftarrow X_{new}

29: end if

30: end if

31: end for

32: end for

3.2.2. Nonlinear Decay Factor

In the conventional WOA, the linear decay pattern of the convergence factor fails to effectively reconcile the algorithm’s distinct phase-dependent requirements: intensive global exploration during initial iterations and refined local exploitation in later stages. To address the limitations inherent in the linear decay mechanism of the convergence factor, this study proposes a dynamically adjusted non-linear decay factor based on an exponential function. The function is as follows:

a = 2 - \frac{2}{1 + \exp (- 10 b (\frac{2 t}{T_{\max}} - 1))}

(22)

where t represent the iteration index and

T_{m a x}

defining the terminal iteration count. The image of this function is shown in Figure 1.

3.2.3. Dynamic Gold Sine Mutation Strategy

The WOA algorithm adopts a fixed spiral update mechanism during the local development stage. This makes it susceptible to falling into local optima and lacks an effective escape mechanism. Additionally, the general mutation strategy uses a fixed probability, making it challenging for the algorithm to balance global exploration and local development during later iterations. To overcome this defect, this paper introduces the dynamic golden sine mutation strategy. This strategy is based on a dynamic probabilistic perturbation triggering mechanism within an iterative process. It fuses the characteristics of the golden ratio and the sine function, thereby enhancing the algorithm’s searching ability at different stages. It also adaptively adjusts the searching step size and direction, balancing the algorithm’s global exploration and local exploitation abilities.

The steps are as follows. After the spiral update mechanism, a dynamic probabilistic perturbation mechanism is added to determine whether the golden sinusoidal mutation is performed. The expression for the mutation probability decaying over time is as follows:

p (t) = 0.3 + 0.5 t / T_{\max}

(23)

The golden sinusoidal variation mechanism first generates two sets of coefficients based on the golden ratio. These coefficients are then used to regulate the contraction and expansion of the sinusoidal function parameters, respectively. This process is expressed mathematically as follows:

\{\begin{matrix} ζ_{1} = a \cdot τ + (1 - τ) \cdot b \\ ζ_{2} = (1 - τ) \cdot a + τ \cdot b \end{matrix}

(24)

where golden section ratio

τ = (\sqrt{5} - 1) / 2

, a and b are the lower and upper bounds of the search space, respectively. Subsequently, the current solution is perturbed using a sinusoidal function, which is combined with the global optimal solution to adjust the current individual position:

\begin{matrix} X_{mut} (t + 1) = & - r_{2} \cdot sin (r_{1}) \cdot (ζ_{1} \cdot X^{*} - ζ_{2} \cdot X (t)) \\ + X^{*} \cdot | sin (r_{1}) | \end{matrix}

(25)

where,

r_{1} \in [0, 2 π]

and

r_{2} \in [0, π]

.

After the spiral update with the golden sine mutation mechanism, a reflection boundary treatment needs to be introduced to ensure that the newly generated solution

X_{m u t} (t + 1)

satisfies the predefined feasible domain constraints. The expression is as follows:

X_{mut}^{i} = \{\begin{matrix} 2 \cdot {lb}^{i} - X_{mut}^{i}, & if X_{mut}^{i} < {lb}^{i} \\ 2 \cdot {ub}^{i} - X_{mut}^{i}, & if X_{mut}^{i} > {ub}^{i} \end{matrix}

(26)

3.3. Principle of LSTM

LSTM represents a specialized deep learning architecture for processing sequential data, distinguished from conventional RNNs by three gating mechanisms: the input gate, forget gate, and output gate. These gates, coupled with a cell state that supersedes traditional hidden units, enable LSTM to model long-range temporal dependencies effectively.

As illustrated in Figure 2, the core components operate as follows: The cell state

c_{t}

functions as the central memory conduit, preserving information across sequential timesteps. At each timestep, a candidate cell state

{\tilde{c}}_{t}

is generated via tanh activation, representing potential updates to the memory. Concurrently, three gating units regulate information flow: The forget gate

f_{t}

modulates retention of historical cell state

c_{t - 1}

, the input gate

i_{t}

controls assimilation of candidate state

{\tilde{c}}_{t}

, and the output gate

o_{t}

gates the current cell state to yield hidden state

h_{t}

, which transmits temporal dependencies as the external output. The calculation process can be expressed as follows:

\{\begin{matrix} h_{t} = o_{t} ⊙ \tanh (c_{t}) \\ c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t} \\ {\tilde{c}}_{t} = \tanh (B_{c} + ω_{c} [v_{t - 1}, x_{t}]) \\ f_{t} = σ (B_{f} + ω_{f} [v_{t - 1}, x_{t}]) \\ i_{t} = σ (B_{i} + ω_{i} [v_{t - 1}, x_{t}]) \\ o_{t} = σ (B_{o} + ω_{o} [v_{t - 1}, x_{t}]) \end{matrix}

(27)

where

ω

denotes trainable weight matrices, B represents corresponding bias terms, and

σ

signifies the sigmoid activation function. These parameters collectively govern the gating mechanisms and state transformations.

3.4. Principle of Multi-Head Attention Mechanisms

Multi-head attention employs parallelized attention mechanisms to project input data into orthogonal feature subspaces, enabling selective feature weighting for critical information enhancement. The calculation process is shown as follows:

(1): Input projection. The LSTM output sequence X through three linear layers:

Q = X W^{Q}, K = X W^{K}, V = X W^{V}

(28)

(2): Subspace projection. Each matrix is partitioned into h heads and independently projected:

Q_{i} = Q W_{i}^{Q}, K_{i} = K W_{i}^{K}, V_{i} = V W_{i}^{V}

(29)

(3): Parallel attention computation. Each head calculates the attention weights and weights the value vectors, using softmax to normalize and generate an attention weight matrix:

{head}_{i} = softmax (\frac{Q_{i} K_{i}^{T}}{\sqrt{d_{k}}}) V_{i}

(30)

(4): Output fusion. Heads are concatenated and linearly transformed:

\begin{matrix} MutiHead & = Concat ({head}_{1}, {head}_{2}, \dots, {head}_{h}) W^{O} \end{matrix}

(31)

Through the above process, the input sequence is mapped into h independent subspaces, where each head learns distinct feature patterns. The outputs of all heads are then concatenated and integrated via linear transformation, forming a global representation that captures complex dependencies between time steps, significantly enhancing the modeling capability for complex sequential relationships.

3.5. Reentry Trajectory Planning Method

While convex optimization provides theoretical solutions to trajectory optimization problems, its practical implementation is limited by computational burden. Convex optimization is employed to generate offline datasets supplying training samples for neural networks. Subsequently, the approximation capabilities of neural networks for complex nonlinear functions are leveraged to construct models mapping vehicle state variables to control commands. This study consequently proposes a hybrid framework enabling real-time trajectory planning and online command acquisition for hypersonic vehicles. The overall design scheme is shown in Figure 3.

3.5.1. Offline Dataset Generation

Deviations from expected aerodynamic parameter values occur during the entry of hypersonic vehicles into the atmosphere, which may lead to discrepancies between the actual flight path and the predefined ideal path. The deviation in aerodynamics can be modeled using a Gaussian distribution, and according to the

3 σ

rule of the Gaussian distribution, the deviation range of aerodynamic parameters (including drag and lift) is set to

\pm 20 %

. The drag deviation coefficient and lift deviation coefficient can be represented as

ξ_{D}

and

ξ_{L}

, respectively. Based on this deviation model, 500 different sets of drag and lift coefficients are generated for further analysis. The drag coefficient and lift coefficient can be expressed respectively as

(ξ_{D} + 1) C_{D}

,

(ξ_{L} + 1) C_{L}

. Subsequently, the cvx solver was used to solve this problem and 500 trajectories with aerodynamic uncertainties were obtained. The convergence condition can be satisfied after seven iterations, and the entire SOCP solution process requires 18.779 s.

3.5.2. Offline Network Training

The proposed hybrid architecture integrates multi-head attention mechanisms with LSTM networks to enhance sequential data modeling. While LSTM’s gated structure and cell state design provide inherent advantages in gradient stability and long-term dependency capture, its performance remains highly sensitive to hyperparameter configurations. Critical parameters include: Hidden layer size (determining model capacity and feature extraction), Learning rate (controlling gradient descent convergence), and number of attention heads (controlling multi-scale feature weighting). The MWOA algorithm has been developed for the purpose of optimizing the hyperparameter. The resulting hybrid model MWOA-AM-LSTM, has the capacity to be utilized for the purpose of predicting time series data.

The overall training procedure of the proposed MWOA–AM-LSTM model is summarized in Algorithm 2.

Algorithm 2 MWOA–AM-LSTM model training process

Require: Trajectory dataset

D

with aerodynamic uncertainties; MWOA configuration; AM-LSTM model structure.

Ensure: Trained MWOA–AM-LSTM hybrid model; evaluation metrics (MSE, MAE).

1: Step 1: Data preprocess.

2: Split

D

into training set

D_{train}

and test set

D_{test}

.

3: Apply min–max normalization to all features for dimensional consistency.

4: Step 2: Hyperparameter optimization.

5: Use MWOA to optimize the AM-LSTM hyperparameters: learning rate, number of hidden neurons, and number of attention heads.

6: Define the objective function as the model loss on a validation subset of

D_{train}

.

7: Obtain the optimal hyperparameter set and configure the AM-LSTM network.

8: Step 3: Model integration.

9: Combine the optimized AM-LSTM and the MWOA-based tuning strategy to form the MWOA–AM-LSTM hybrid model.

10: Train the optimized AM-LSTM on

D_{train}

to obtain the final deployed model.

11: Step 4: Control command prediction.

12: Given historical flight states

(h, θ, ϕ, γ, ψ)

, predict the future bank-angle command

σ

.

13: Step 5: Performance validation.

14: Compare predicted and reference/measured quantities on

D_{test}

.

15: Quantify model performance using MSE and MAE.

3.5.3. Online Trajectory Planning

The Runge–Kutta method is employed to simulate online trajectory planning. Commencing from the initial state, the trained neural network predicts control commands for the next time step. These commands are then numerically integrated to obtain subsequent flight states. Iteratively repeating this process enables rapid generation of the complete flight trajectory.

4. Numerical Simulation and Analysis

To comprehensively evaluate the performance of the proposed MWOA and its MWOA-driven AM-LSTM prediction framework, while validating the efficacy of the online trajectory planning methodology, the following simulation algorithms are performed: (1) Benchmark function performance comparison between MWOA and mainstream swarm intelligence algorithms; (2) Neural network prediction performance assessment; (3) Online trajectory planning verification under aerodynamic uncertainty. All numerical simulations are performed on a PC with Intel(R) Core(TM) i5-13490F CPU @ 2.50 GHz.

4.1. Testing and Analysis of MWOA Algorithm

This section evaluates the performance of the MWOA algorithm by comparing it with prominent algorithms that have gained popularity in recent years, which are listed below: GOOSE Algorithm (GO), Whale Optimization Algorithm (WOA), Harris Hawks Optimization Algorithm (HHO), Black-winged Kite Algorithm (BKA), Particle Swarm Optimization (PSO). The selected test functions include Sphere, Rastrigin, Ackley, Griewank, Zakharov, Levy, Rothyp, and Easom. Detailed mathematical definitions of each function can be found on this website: Optimization Test Functions (http://www.sfu.ca/~ssurjano/optimization.html (accessed on 12 March 2026)). The diagrams of these eight benchmark functions are presented in the Figure 4. These functions enable a comprehensive evaluation of the search and optimization capabilities of algorithms under diverse problem landscapes.

The comparative performance of MWOA against several swarm intelligence algorithms is summarized in Table 1, with all algorithms configured with a population size of 50 and a maximum of 1000 iterations, evaluated over 30 independent runs. The superior competitiveness and stability of MWOA are demonstrated by the results across all benchmark functions. The algorithm achieves perfect convergence on the Sphere, Zakharov, and Griewank functions, characterized by zero residual error. It also exhibits exceptional stability, registering a standard deviation of 0.00 across 30 runs on the highly multimodal Rastrigin and Griewank functions. On the Ackley function, MWOA attains a mean value of

2.48 \times 10^{- 16}

with zero standard deviation, outperforming the GO algorithm. The convergence behavior illustrated in Figure 5 further confirms these performance advantages. On unimodal functions such as Sphere, MWOA descends rapidly, approaching the theoretical optimum within 300 iterations. When applied to multimodal landscapes including Rastrigin and Ackley, the algorithm quickly escapes early oscillations and converges to extremely low error levels, as exemplified by its performance on Rastrigin where it reaches approximately

10^{- 10}

by 500 iterations. This behavior reflects a superior capacity to avoid local optima. The trend of high-precision convergence persists on complex functions such as Rothyp and Easom, where MWOA achieves significantly lower final fitness values and converges to the global optimum more rapidly and stably than all competing algorithms.

4.2. Neural Network Prediction Performance Experiment

Thissection verifies the effectiveness of the MWOA-AM-LSTM hybrid model by achieving adaptive optimization of AM-LSTM hyperparameters, aiming to enhance the prediction accuracy of bank angle

σ

.

The simulation object selected is the generic hypersonic aircraft CAV-H, with the initial conditions configured as specified in Table 2. To balance model capacity and computational burden, the hidden-layer neuron number is treated as an optimization variable and searched within [10, 200]. This interval is selected to span from under-parameterized to sufficiently expressive networks. Subsequently, the trajectory dataset is generated following the methodology detailed in Section 3.5.1. The dataset with the aerodynamic deviation consists of 500 trajectories, divided according to a fixed scale, with the training set containing 400 trajectories and the test set retaining 100 trajectories. Each trajectory consists of 201 discrete sampling points, each representing a six-dimensional feature vector containing the following flight states: altitude, longitude, latitude, flight path angle, heading angle and bank angle. The robustness of the neural network to biases caused by aerodynamic uncertainties is enhanced by offline network training.

Figure 5. Convergence curves of the algorithms.

Subsequently, a comparative analysis was conducted using hybrid models including MWOA-AM-LSTM, WOA-AM-LSTM, HHO-AM-LSTM, AM-LSTM, and the baseline LSTM model. The parameter boundaries for each model are detailed in Table 3, while Table 4 presents performance comparisons across these architectures. The comparative evaluation reveals distinct performance characteristics across hybrid architectures. MWOA-AM-LSTM exhibits exceptional training accuracy, achieving the lowest RMSE of 0.0535 and highest R² of 0.9964 while maintaining competitive test performance with an RMSE of 0.0641 and R² of 0.9956. Its MAE values of 0.0241 for training and 0.0261 for testing represent the most balanced error distribution among hybrid models. The AM-LSTM baseline substantially outperforms conventional LSTM, reducing training RMSE by 7.3% and testing RMSE by 4.1%, thereby validating the efficacy of the attention mechanism. These superior metrics are clearly reflected in the bank angle predictions visualized in Figure 6, which demonstrates that the MWOA-AM-LSTM model achieves superior trajectory fitting.

4.3. Online Trajectory Planning Performance Verification

In this section, the the classical fourth-order Runge–Kutta method is employed to simulate the online trajectory optimization process. The trained neural network generates bank angle commands in real-time within 7 milliseconds. Subsequently, these commands are numerically integrated using the Runge–Kutta method to rapidly construct the complete flight trajectory. Figure 7 depicts the computational efficiency distribution from 1000 Monte Carlo simulations, validating the real-time processing efficacy of the MWOA-AM-LSTM framework.

Under nominal conditions, the comparative results of online trajectories generated by different neural networks and the reference trajectory are shown in Figure 8 and Figure 9. The MWOA-AM-LSTM, HHO-AM-LSTM, WOA-AM-LSTM, AM-LSTM, and baseline LSTM trajectories are denoted by distinct line styles with markers, while the reference trajectory is represented by the black solid line. Figure 8 demonstrates that all network-generated trajectories exhibit near-consistency with the offline trajectory. However, terminal state errors vary significantly across methods. Detailed terminal error metrics are listed in Table 5, confirming that the MWOA-AM-LSTM model achieves the smallest terminal errors.

Under no-nominal conditions with aerodynamic parameter perturbations, five aerodynamic deviation cases are considered to evaluate robustness, where each case corresponds to a specific combination of lift and drag deviation coefficients. The terminal position errors between trajectories generated online by the MWOA-AM-LSTM model and reference trajectories are quantified in Table 6 and Figure 10, Figure 11 and Figure 12. To evaluate model robustness, multiple aerodynamic deviation combinations were tested. In the above scenarios, the relative errors of terminal altitude are lower than 4.52%, the relative errors of terminal longitude are lower than 1.45%, and the relative errors of terminal latitude are lower than 3.24%. This demonstrates that the proposed trajectory planning method can successfully complete online trajectory planning tasks under aerodynamic uncertainty while maintaining a certain level of accuracy and robustness.

5. Conclusions

This paper proposes an online trajectory planning framework for hypersonic vehicles based on a multi-strategy improved whale optimization algorithm and an attention-LSTM network. Considering aerodynamic uncertainties, trajectory samples are provided offline by the sequential second-order cone programming method. The hyperparameters of the AM-LSTM network, including the number of hidden layer neurons, the initial learning rate, and the number of multi-head attention heads, were first specified within predefined boundaries. Then, by minimizing the model loss function, the MWOA optimized these hyperparameters to their global optimum. Thus, the constructed MWOA-AM-LSTM model was able to generate optimal control commands online based on historical flight states and demonstrates outstanding generalization capabilities. Subsequently, it was used as a real-time trajectory generator for the hypersonic vehicle. Numerical simulations demonstrate remarkable performance of the proposed framework in computational efficiency and planning precision under both nominal and perturbated conditions.

In the future, we will investigate more complex reentry trajectory-planning scenarios and evaluate alternative deep learning backbones under the same training and onboard inference constraints.

Author Contributions

Conceptualization, E.D., G.C. and Y.F.; methodology, E.D. and G.C.; software, H.X.; validation, E.D. and H.X.; formal analysis, E.D. and G.C.; investigation, H.X., Y.F., H.W. and X.L.; resources, G.C.; data curation, E.D., H.X. and Y.F.; writing—original draft preparation, E.D.; writing—review and editing, G.C., H.X. and Y.F.; visualization, E.D., Y.F., H.W. and X.L.; supervision, G.C.; project administration, H.X.; funding acquisition, G.C. and H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Program of National Natural Science Foundation of China (grant number 62473374) and the National Natural Science Foundation of China for Young Scholars (grant number 62403487).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Convexification and Discretization of the Optimal Control Problem

Appendix A.1. Convexification of the Problem

The path constraints in the reentry constraint Equation (10) can be converted by equivalent transformation into the following linear inequality form:

\{\begin{matrix} R (e) ⩾ L_{Q} (e) = 1 + H ln \frac{k_{Q}^{2} ρ_{0} g_{0}^{3.15} R_{0}^{3.15} {(2 - 2 e)}^{3.15}}{{\dot{Q}}_{\max}^{2}} \\ R (e) ⩾ L_{q} (e) = 1 + H ln \frac{ρ_{0} g_{0} R_{0} (1 - e)}{q_{\max}} \\ R (e) ⩾ L_{n} (e) = 1 + H ln \frac{ρ_{0} R_{0} (1 - e) S \sqrt{{C_{L}}^{2} + {C_{D}}^{2}}}{n_{\max} m} \end{matrix}

(A1)

It can be rewritten as

R (e) \geq \bar{L} (e) = \max \{L_{Q}, L_{q}, L_{n}\}

(A2)

To avoid rendering the optimization problem unsolvable due to overly strict terminal constraints, the longitude and latitude constraints are replaced by introducing penalty terms in the performance metrics as

d (θ (e_{f}), ϕ (e_{f})) = c_{θ} ς + c_{ϕ} η

(A3)

where

c_{θ}

and

c_{ϕ}

are both constants, while

ς

and

η

are are two slack variables. The slack variables are defined as follow:

\{\begin{matrix} |θ (e_{f}) - θ_{f}| ⩽ ς \\ |ϕ (e_{f}) - ϕ_{f}| ⩽ η \end{matrix}

(A4)

Consequently, performance index defined in Equation (14) is reformulated as

\min J = \int_{e_{0}}^{e_{f}} (1 / D V) d e + d (θ (e_{f}), ϕ (e_{f}))

(A5)

The control variable is selected as the bank angle, and to avoid high-frequency oscillations, it is reformulated as follow:

u = {[u_{1}, u_{2}]}^{T} = {[\cos σ, sin σ]}^{T}

(A6)

Subsequently, Equation (1) is linearized for convexification.

\{x^{k} (e), u^{k} (e)\}

is the solution for the k-th iteration, which can be used to calculate the result of the next iteration. Ultimately, the following equation can be obtained:

\frac{d x}{d e} = A (x^{k}, e) x^{k} + c (x^{k}, e) + B (x^{k}, e) u^{k}

(A7)

where

A (x^{k}, e) = [\begin{matrix} - \frac{D_{r}^{k} sin γ^{k}}{{(D^{k})}^{2}} & 0 & 0 & a_{14} & 0 \\ a_{21} & 0 & a_{23} & a_{24} & \frac{cos γ^{k} cos ψ^{k}}{r^{k} D^{k} cos ϕ^{k}} \\ a_{31} & 0 & 0 & a_{34} & - \frac{cos γ^{k} sin ψ^{k}}{r^{k} D^{k}} \\ a_{41} & 0 & 0 & a_{44} & 0 \\ a_{51} & 0 & a_{53} & a_{54} & \frac{cos γ^{k} cos ψ^{k} tan ϕ^{k}}{r^{k} D^{k}} \end{matrix}]

(A8)

where

\begin{matrix} D_{r}^{k} = - \frac{R_{0} S ρ_{0} {(V^{k})}^{2} C_{D}^{k} e^{(1 - r^{k}) / H}}{2 m H}, V^{k} = \sqrt{2 (1 - e^{k})} \\ a_{14} = \frac{cos γ^{k}}{D^{k}}, a_{23} = \frac{cos γ^{k} sin ψ^{k} sin ϕ^{k}}{r^{k} D^{k} {cos}^{2} ϕ^{k}}, \\ a_{53} = \frac{cos γ^{k} sin ψ^{k}}{r^{k} D^{k} {cos}^{2} ϕ^{k}} a_{24} = - \frac{sin γ^{k} sin ψ^{k}}{r^{k} D^{k} cos ϕ^{k}}, \\ a_{34} = - \frac{sin γ^{k} cos ψ^{k}}{r^{k} D^{k}}, a_{54} = - \frac{sin γ^{k} sin ψ^{k} tan ϕ^{k}}{r^{k} D^{k}} \\ a_{44} = \frac{sin γ^{k}}{D^{k} r^{k}} (\frac{1}{r^{k} {(V^{k})}^{2}} - 1) \\ a_{21} = - (\frac{D_{r}^{k}}{r^{k} {(D^{k})}^{2}} + \frac{1}{{(r^{k})}^{2} D^{k}}) \frac{cos γ^{k} sin ψ^{k}}{cos ϕ^{k}} \\ a_{31} = - (\frac{D_{r}^{k}}{r^{k} {(D^{k})}^{2}} + \frac{1}{{(r^{k})}^{2} D^{k}}) cos γ^{k} cos ψ^{k} \\ a_{41} = \frac{D_{r}^{k} cos γ^{k}}{r^{k} {(D^{k})}^{2}} (\frac{1}{r^{k} {(V^{k})}^{2}} - 1) + \frac{cos γ^{k}}{D^{k} {(r^{k})}^{2}} (\frac{2}{r^{k} {(V^{k})}^{2}} - 1) \\ a_{51} = - (\frac{D_{r}^{k}}{r^{k} {(D^{k})}^{2}} + \frac{1}{{(r^{k})}^{2} D^{k}}) cos γ^{k} sin ψ^{k} tan ϕ^{k} \end{matrix}

(A9)

B (x, e) = [\begin{matrix} 0_{3 \times 1} & 0_{3 \times 1} \\ (L / D) / V^{2} & 0 \\ 0 & (L / D) / (V^{2} cos γ) \end{matrix}]

(A10)

To ensure the validity of successive linearization, it is necessary to add a trust region constraint expressed as

|x - x^{k}| \leq δ

(A11)

Appendix A.2. Discretization of the Problem

This article selects energy as the independent variable, discretizes the energy history

[e_{0}, e_{f}]

into N equal-interval sub-intervals, with a step size of

Δ e = (e_{f} - e_{0}) / N

. The discretized state and control sequences are

{x_{1}, x_{2}, \dots, x_{N}}

and

{u_{1}, u_{2}, \dots, u_{N}}

respectively. According to the step size, the differential expression of the state variable can be obtained as

H_{i - 1} x_{i - 1} - H_{i} x_{i} + G_{i - 1} u_{i - 1} + G_{i} u_{i} = - \frac{Δ e}{2} (c_{i - 1}^{k} + c_{i}^{k})

(A12)

where

H_{i - 1} = I + A_{i - 1}^{k} \frac{Δ e}{2}

,

H_{i} = I - A_{i}^{k} \frac{Δ e}{2}

,

G_{i - 1} = B_{i - 1}^{k} \frac{Δ e}{2}

,

G_{i} = B_{i}^{k} \frac{Δ e}{2}

, I is the unit matrix of the same dimension.

The complete optimization variable sequence consists of state variable sequence, control variable sequence, and two slack variables, expressed as

z = {[x_{0}^{T}, \dots, x_{N}^{T}, u_{0}^{T}, \dots, u_{N}^{T}, ς, η]}^{T}

(A13)

Through successive linearization and discretization, the state equation is transformed into:

M z = F

(A14)

where

M = [\begin{matrix} I & 0 & 0 & \dots & 0 & 0 \\ H_{0} & H_{1} & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \dots & H_{N - 1} & H_{N} \end{matrix} \begin{matrix} 0 & 0 & 0 & \dots & 0 & 0 & 0_{1 \times 2} \\ G_{0} & G_{1} & 0 & \dots & 0 & 0 & 0_{1 \times 2} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ & ⋮ \\ 0 & 0 & 0 & \dots & G_{N - 1} & G_{N} & 0_{1 \times 2} \end{matrix}]

(A15)

F = - \frac{Δ e}{2} [\begin{matrix} - \frac{2}{Δ e} x_{0} \\ c_{0}^{\tilde{k}} + c_{1}^{\tilde{k}} \\ ⋮ \\ c_{N - 1}^{\tilde{k}} + c_{N}^{\tilde{k}} \end{matrix}]

(A16)

The performance index is discretized using the same method, expressed as

R^{T} z

. During each iteration, R remains a constant vector, ultimately yielding the convex programming problem

P_{f}

as follows:

\begin{matrix} P_{f} : \min R^{T} z \\ \begin{matrix} s . t . & \{\begin{matrix} M Z = F \\ |x_{i} - x_{i}^{k}| ⩽ δ \\ u_{1}^{2} + u_{2}^{2} ⩽ 1 \\ \cos σ_{\max} ⩽ {(u_{1})}_{i} ⩽ cos σ_{\min} \\ R_{i} ⩾ \bar{L} \\ |θ (e_{f}) - θ_{f}| ⩽ ς, |ϕ (e_{f}) - ϕ_{f}| ⩽ η \\ r (e_{f}) = r_{f}, γ (e_{f}) = γ_{f}, ψ (e_{f}) = ψ_{f} \end{matrix} \end{matrix} \end{matrix}

(A17)

References

Huang, A.; Yu, J.; Liu, Y.; Hua, Y.; Dong, X.; Ren, Z. Multitask-constrained reentry trajectory planning for hypersonic gliding vehicle. Aerosp. Sci. Technol. 2024, 155, 109636. [Google Scholar] [CrossRef]
Liu, X.; Li, X.; Zhang, H.; Huang, H.; Wu, Y. Entry Guidance for Hypersonic Glide Vehicles via Two-Phase hp-Adaptive Sequential Convex Programming. Aerospace 2025, 12, 539. [Google Scholar] [CrossRef]
Jorris, T.R.; Cobb, R.G. Three-dimensional trajectory optimization satisfying waypoint and no-fly zone constraints. J. Guid. Control Dyn. 2009, 32, 551–572. [Google Scholar] [CrossRef]
Xu, H.; Cai, G.; Mu, C.; Li, X.; Wei, H. Segmented predictor-corrector reentry guidance based on an analytical profile. J. Zhejiang-Univ.-Sci. A 2025, 26, 50–65. [Google Scholar] [CrossRef]
Grant, M.J.; Braun, R.D. Rapid indirect trajectory optimization for conceptual design of hypersonic missions. J. Spacecr. Rocket. 2015, 52, 177–182. [Google Scholar] [CrossRef]
Mall, K.; Taheri, E. Three-degree-of-freedom hypersonic reentry trajectory optimization using an advanced indirect method. J. Spacecr. Rocket. 2022, 59, 1463–1474. [Google Scholar] [CrossRef]
Saranathan, H.; Grant, M.J. Incorporation of rigid body dynamics into indirect hypersonic trajectory optimization. J. Spacecr. Rocket. 2024, 61, 393–406. [Google Scholar] [CrossRef]
Liu, X.; Shen, Z.; Lu, P. Entry trajectory optimization by second-order cone programming. J. Guid. Control Dyn. 2016, 39, 227–241. [Google Scholar] [CrossRef]
Luo, Y.; Wang, J.; Jiang, J.; Liang, H. Reentry trajectory planning for hypersonic vehicles via an improved sequential convex programming method. Aerosp. Sci. Technol. 2024, 149, 109130. [Google Scholar] [CrossRef]
Liu, X. Fuel-optimal rocket landing with aerodynamic controls. J. Guid. Control Dyn. 2019, 42, 65–77. [Google Scholar] [CrossRef]
Ma, Y.; Pan, B.; Tang, J. Reduced space sequential convex programming for rapid trajectory optimization. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 9060–9072. [Google Scholar] [CrossRef]
Wang, Z.; Grant, M.J. Constrained trajectory optimization for planetary entry via sequential convex programming. J. Guid. Control Dyn. 2017, 40, 2603–2615. [Google Scholar] [CrossRef]
Song, Z.-Y.; Wang, C.; Theil, S.; Seelbinder, D.; Sagliano, M.; Liu, X.-F.; Shao, Z.-J. Survey of autonomous guidance methods for powered planetary landing. Front. Inf. Technol. Electron. Eng. 2020, 21, 652–674. [Google Scholar] [CrossRef]
Cui, P.; Zhao, D.; Zhu, S. Obstacle avoidance guidance for planetary landing using convex trajectory and adaptive curvature regulation. Acta Astronaut. 2022, 199, 313–326. [Google Scholar] [CrossRef]
Bao, C.; Li, X.; Xu, W.; Tang, G.; Yao, W. Coordinated Reentry Guidance with A* and Deep Reinforcement Learning for Hypersonic Morphing Vehicles Under Multiple No-Fly Zones. Aerospace 2025, 12, 591. [Google Scholar] [CrossRef]
Deng, J.; Zhang, H.; Zhang, Y.; Hua, M.; Sun, Y. A Method for UAV Path Planning Based on G-MAPONet Reinforcement Learning. Drones 2025, 9, 871. [Google Scholar] [CrossRef]
Yang, T.; Shi, X.; Zeng, Q.; Yang, Y.; Xu, C.; Liu, H. Optimization methods in fully cooperative scenarios: A review of multiagent reinforcement learning. Front. Inf. Technol. Electron. Eng. 2025, 26, 479–509. [Google Scholar] [CrossRef]
Chai, R.; Tsourdos, A.; Savvaris, A.; Chai, S.; Xia, Y.; Chen, C.L.P. Six-DOF Spacecraft optimal trajectory planning and real-time attitude control: A deep neural network-based approach. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 5005–5013. [Google Scholar] [CrossRef]
Dai, P.; Feng, D.; Feng, W.; Cui, J.; Zhang, L. Entry trajectory optimization for hypersonic vehicles based on convex programming and neural network. Aerosp. Sci. Technol. 2023, 137, 108259. [Google Scholar] [CrossRef]
Tang, D.; Gong, S. Trajectory optimization of rocket recovery based on neural network and genetic algorithm. Adv. Space Res. 2023, 72, 3344–3356. [Google Scholar] [CrossRef]
Xu, H.; Cai, G.; Zhang, S.; Yang, X.; Hou, M. Hypersonic reentry trajectory optimization by using improved sparrow search algorithm and control parametrization method. Adv. Space Res. 2022, 69, 2512–2524. [Google Scholar] [CrossRef]
Chai, R.; Tsourdos, A.; Savvaris, A.; Xia, Y.; Chai, S. Real-time reentry trajectory planning of hypersonic vehicles: A two-step strategy incorporating fuzzy multiobjective transcription and deep neural network. IEEE Trans. Ind. Electron. 2019, 67, 6904–6915. [Google Scholar] [CrossRef]
Shi, Y.; Wang, Z. Onboard generation of optimal trajectories for hypersonic vehicles using deep learning. J. Spacecr. Rocket. 2021, 58, 400–414. [Google Scholar] [CrossRef]
Wang, G.; An, H.; Wang, Y.; Xia, H.; Ma, G. Intelligent control of air-breathing hypersonic vehicles subject to path and angle-of-attack constraints. Acta Astronaut. 2022, 198, 606–616. [Google Scholar] [CrossRef]
Tian, R.; Li, X.; Ma, Z.; Liu, Y.; Wang, J.; Wang, C. LDformer: A parallel neural network model for long-term power forecasting. Front. Inf. Technol. Electron. Eng. 2023, 24, 1287–1301. [Google Scholar] [CrossRef]
Presekal, A.; Stefanov, A.; Semertzis, I.; Palensky, P. Spatio-Temporal Advanced Persistent Threat Detection and Correlation for Cyber-Physical Power Systems Using Enhanced GC-LSTM. IEEE Trans. Smart Grid 2025, 16, 1654–1666. [Google Scholar] [CrossRef]
Qin, X.; Hu, W.; Xiao, C.; He, C.; Pei, S.; Zhang, X. Attention-based efficient robot grasp detection network. Front. Inf. Technol. Electron. Eng. 2023, 24, 1430–1444. [Google Scholar] [CrossRef]
Chen, X.; Yang, H.; Yu, C.; Yang, X. Optimised CNN-LSTM-Attention deep learning network model based on RIME algorithm for predicting HFSWR gravity wave parameters during Typhoon Muifa (2022). Adv. Space Res. 2025, 75, 4613–4639. [Google Scholar] [CrossRef]
Zhang, L.; Shi, Y.; Wang, D. A Real-Time Lightweight Perceptron for Cloud–Edge Collaborative Predictive Maintenance of Online Service Systems. IEEE Internet Things J. 2025, 12, 12640–12657. [Google Scholar] [CrossRef]
Kommanduri, R.; Ghorai, M. DAST-Net: Dense visual attention augmented spatio-temporal network for unsupervised video anomaly detection. Neurocomputing 2024, 579, 127444. [Google Scholar] [CrossRef]
Wu, Y.; Sun, X.; Spasojevic, I.; Kumar, V. Deep Learning for Optimization of Trajectories for Quadrotors. IEEE Robot. Autom. Lett. 2024, 9, 2479–2486. [Google Scholar] [CrossRef]
Rukhsar, S.; Tiwari, A.K. ARNN: Attentive recurrent neural network for multi-channel EEG signals to identify epileptic seizures. Neurocomputing 2025, 620, 129203. [Google Scholar] [CrossRef]
Tripathy, K.P.; Mishra, A.K. Deep learning in hydrology and water resources disciplines: Concepts, methods, applications, and research directions. J. Hydrol. 2024, 628, 130458. [Google Scholar] [CrossRef]
Zhou, R.; Zhang, Y.; Wang, Q.; Jin, A.; Shi, W. A hybrid self-adaptive DWT-WaveNet-LSTM deep learning architecture for karst spring forecasting. J. Hydrol. 2024, 634, 131128. [Google Scholar] [CrossRef]
Yu, S.; Zhang, Z.; Wang, S.; Huang, X.; Lei, Q. A performance-based hybrid deep learning model for predicting TBM advance rate using attention-ResNet-LSTM. J. Rock Mech. Geotech. Eng. 2024, 16, 65–80. [Google Scholar] [CrossRef]
Sun, J.; Fang, Y.; Luo, H.; Yao, Z.; Xiang, L.; Wang, J.; Wang, Y.; Jiang, Y. Hybrid deep learning approach for rock tunnel deformation prediction based on spatio-temporal patterns. Undergr. Space 2025, 20, 100–118. [Google Scholar] [CrossRef]
Yao, Z.; Wang, Z.; Wang, D.; Wu, J.; Chen, L. An ensemble CNN-LSTM and GRU adaptive weighting model based improved sparrow search algorithm for predicting runoff using historical meteorological and runoff data as input. J. Hydrol. 2023, 625, 129977. [Google Scholar] [CrossRef]
Guo, Y.; Yang, D.; Zhang, Y.; Wang, L.; Wang, K. Online estimation of SOH for lithium-ion battery based on SSA-Elman neural network. Prot. Control Mod. Power Syst. 2022, 7, 1–17. [Google Scholar] [CrossRef]
Peng, S.; Zhu, J.; Wu, T.; Yuan, C.; Cang, J.; Zhang, K.; Pecht, M. Prediction of wind and PV power by fusing the multi-stage feature extraction and a PSO-BiLSTM model. Energy 2024, 298, 131345. [Google Scholar] [CrossRef]
Zhang, M.; Tan, S.; Zhang, C.; Chen, E. Machine learning in modelling the urban thermal field variance index and assessing the impacts of urban land expansion on seasonal thermal environment. Sustain. Cities Soc. 2024, 106, 105345. [Google Scholar] [CrossRef]

Figure 1. Nonlinear decay function.

Figure 2. Structure of LSTM network.

Figure 3. Overall scheme of reentry trajectory planning: (A) offline trajectory optimization and dataset generation; (B) real-time control framework based on the trained network; (C) overall workflow of the proposed method; (D) structure of the trained network with LSTM layers and multi-head attention; (E) generated three-dimensional reentry trajectory.

Figure 4. Diagrams of the benchmark functions.

Figure 6. Fitting results of different neural networks.

Figure 7. Histogram of Monte Carlo tests of processing time consumption.

Figure 8. Comparison of trajectories of different models under nominal conditions.

Figure 9. Comparison of flight-path angle and heading angle under nominal conditions: (a) comparison of flight-path angle profiles; (b) comparison of heading angle profiles.

Figure 10. Comparisons of altitude under different aerodynamic deviations generated by the MWOA-AM-LSTM model.

Figure 11. Comparisons of longitude under different aerodynamic deviations generated by the MWOA-AM-LSTM model.

Figure 12. Comparisons of latitude under different aerodynamic deviations generated by the MWOA-AM-LSTM model.

Table 1. Performance comparison of optimization algorithms on benchmark functions.

Function	Criteria	Algorithm
Function	Criteria	MWOA	GO	WOA	HHO	BKA	PSO
Sphere	Mean	0.00	0.00	$7.41 \times 10^{- 320}$	$1.90 \times 10^{- 266}$	$1.37 \times 10^{2}$	$1.10 \times 10^{0}$
Sphere	Std	0.00	$3.35 \times 10^{- 154}$	0.00	0.00	$1.61 \times 10^{1}$	$6.09 \times 10^{- 1}$
Rastrigin	Mean	0.00	0.00	$8.50 \times 10^{0}$	0.00	$3.37 \times 10^{2}$	$9.50 \times 10^{1}$
Rastrigin	Std	0.00	0.00	$2.68 \times 10^{1}$	0.00	$1.82 \times 10^{1}$	$2.45 \times 10^{1}$
Ackley	Mean	$2.48 \times 10^{- 16}$	$2.22 \times 10^{- 15}$	$2.46 \times 10^{- 15}$	$4.44 \times 10^{- 16}$	$2.03 \times 10^{1}$	$7.73 \times 10^{0}$
Ackley	Std	0.00	$1.81 \times 10^{- 15}$	$2.02 \times 10^{- 15}$	0.00	$1.95 \times 10^{- 1}$	$1.14 \times 10^{0}$
Griewank	Mean	0.00	0.00	$2.51 \times 10^{- 4}$	0.00	$4.72 \times 10^{2}$	$4.78 \times 10^{0}$
Griewank	Std	0.00	0.00	$1.37 \times 10^{- 3}$	0.00	$5.54 \times 10^{1}$	$2.09 \times 10^{0}$
Zakharov	Mean	0.00	0.00	$6.35 \times 10^{2}$	$7.03 \times 10^{- 135}$	$9.39 \times 10^{2}$	$9.30 \times 10^{1}$
Zakharov	Std	0.00	0.00	$1.60 \times 10^{2}$	$3.85 \times 10^{- 134}$	$1.79 \times 10^{2}$	$6.72 \times 10^{1}$
Levy	Mean	$3.22 \times 10^{- 3}$	$8.18 \times 10^{- 1}$	$6.16 \times 10^{0}$	$1.03 \times 10^{0}$	$1.06 \times 10^{2}$	$7.01 \times 10^{0}$
Levy	Std	$1.67 \times 10^{- 2}$	$7.22 \times 10^{- 2}$	$1.55 \times 10^{1}$	$2.69 \times 10^{- 1}$	$1.34 \times 10^{1}$	$3.33 \times 10^{0}$
Rothyp	Mean	$2.42 \times 10^{- 2}$	$4.25 \times 10^{- 1}$	$1.02 \times 10^{- 1}$	$2.55 \times 10^{0}$	$1.72 \times 10^{4}$	$5.98 \times 10^{0}$
Rothyp	Std	$1.11 \times 10^{- 2}$	$1.19 \times 10^{- 1}$	$6.00 \times 10^{- 2}$	$2.00 \times 10^{0}$	$4.64 \times 10^{3}$	$4.33 \times 10^{0}$
Easom	Mean	$- 1.00 \times 10^{0}$	$- 9.99 \times 10^{- 1}$	$- 1.00 \times 10^{0}$	$- 1.00 \times 10^{0}$	$- 1.00 \times 10^{0}$	$- 1.00 \times 10^{0}$
Easom	Std	$3.57 \times 10^{- 17}$	$1.45 \times 10^{- 3}$	$5.45 \times 10^{- 17}$	$4.50 \times 10^{- 5}$	0.00	0.00

Table 2. The initial conditions of the simulation.

State	Altitude	Longitude	Latitude	Velocity	Flight-Path Angle	Heading Angle
$x_{0}$	120.9 km	237°	$- 25 °$	7500 m/s	$- 1.2 °$	45°
$x_{f}$	30 km	279°	28.61°	900 m/s	$- 6 °$	90°

Table 3. Hyperparameter configurations for hybrid model and baseline LSTM.

Parameter	Value/Boundary	Definition
Population Size	20	Number of feasible solutions explored during optimization. Determines the search space coverage capability of the metaheuristic algorithm.
Iterations	100	Process of completing one full pass through the training dataset.
Batch Size	32	Number of samples selected from the training set to update model weights in a single parameter update step.
Hidden Layer Neurons	[10, 200]	Number of computational units in the hidden layer. For hybrid model: optimized parameter determining model capacity and feature extraction capability. For baseline model: fixed architecture with two layers (100 and 50 neurons).
Initial Learning Rate	[0.001, 0.02]	Base step size for parameter updates. For hybrid model: optimized parameter balancing convergence speed and stability. For baseline: fixed at 0.001.
Multi-head Attention	[4, 16]	Parallel attention heads capturing feature relationships in different subspaces. For hybrid model: optimization parameter. For baseline: fixed at 8 heads.

Table 4. Model Performance Metrics.

Hybrid Model	Data Type	RMSE	MAE	$R^{2}$
MWOA-AM-LSTM	Training dataset	0.0535	0.0241	0.9964
MWOA-AM-LSTM	Test dataset	0.0641	0.0261	0.9956
HHO-AM-LSTM	Training dataset	0.0672	0.0282	0.9942
HHO-AM-LSTM	Test dataset	0.0784	0.0316	0.9933
WOA-AM-LSTM	Training dataset	0.0898	0.0341	0.9921
WOA-AM-LSTM	Test dataset	0.0920	0.0357	0.9911
AM-LSTM	Training dataset	0.1123	0.0405	0.9852
AM-LSTM	Test dataset	0.1174	0.0432	0.9846
LSTM	Training dataset	0.1211	0.0421	0.9838
LSTM	Test dataset	0.1224	0.0479	0.9833

Table 5. Terminal errors of different models under nominal conditions.

Model	Relative Error (%)
Model	Terminal Altitude	Terminal Longitude	Terminal Latitude
MWOA-AM-LSTM	0.1969	0.2043	1.47
HHO-AM-LSTM	0.3208	0.2545	2.19
WOA-AM-LSTM	0.4790	0.3441	3.06
AM-LSTM	0.7218	0.4229	3.27
LSTM	0.7874	0.5448	3.68

Table 6. The impact of lift and drag deviation coefficients on terminal position errors.

Case	Coefficient Deviation		Relative Error (%)
Case	Lift ( $ξ_{L}$ )	Drag ( $ξ_{D}$ )	Altitude	Longitude	Latitude
case1	0	0	0.1969	0.2043	1.58
case2	+20%	+20%	4.5243	1.4524	3.24
case3	−15%	+15%	2.2728	0.8521	2.03
case4	+10%	+20%	3.2486	1.1243	2.42
case5	+10%	−10%	0.4325	0.2452	1.78

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dai, E.; Cai, G.; Fan, Y.; Xu, H.; Wei, H.; Li, X. Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network. Aerospace 2026, 13, 283. https://doi.org/10.3390/aerospace13030283

AMA Style

Dai E, Cai G, Fan Y, Xu H, Wei H, Li X. Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network. Aerospace. 2026; 13(3):283. https://doi.org/10.3390/aerospace13030283

Chicago/Turabian Style

Dai, Encheng, Guangbin Cai, Yonghua Fan, Hui Xu, Hao Wei, and Xin Li. 2026. "Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network" Aerospace 13, no. 3: 283. https://doi.org/10.3390/aerospace13030283

APA Style

Dai, E., Cai, G., Fan, Y., Xu, H., Wei, H., & Li, X. (2026). Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network. Aerospace, 13(3), 283. https://doi.org/10.3390/aerospace13030283

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Reentry Trajectory Optimization of Hypersonic Vehicle Based on Multi-Strategy Improved WOA Optimized Attention-LSTM Network

Abstract

1. Introduction

2. Entry Problem Formulation

2.1. Trajectory Dynamics

2.2. Reentry Constraints

2.3. Trajectory Optimization Problem

3. Online Trajectory Planning Framework Based on MWOA-AM-LSTM Network

3.1. Principles of Whale Optimization Algorithm

3.1.1. Encircling Prey

3.1.2. Bubble-Net Attacking Method

3.1.3. Search for Prey

3.2. Multi-Strategy Improved WOA

3.2.1. Circle Chaotic Mapping

3.2.2. Nonlinear Decay Factor

3.2.3. Dynamic Gold Sine Mutation Strategy

3.3. Principle of LSTM

3.4. Principle of Multi-Head Attention Mechanisms

3.5. Reentry Trajectory Planning Method

3.5.1. Offline Dataset Generation

3.5.2. Offline Network Training

3.5.3. Online Trajectory Planning

4. Numerical Simulation and Analysis

4.1. Testing and Analysis of MWOA Algorithm

4.2. Neural Network Prediction Performance Experiment

4.3. Online Trajectory Planning Performance Verification

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Convexification and Discretization of the Optimal Control Problem

Appendix A.1. Convexification of the Problem

Appendix A.2. Discretization of the Problem

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI