1. Introduction
Hypersonic vehicles represent a class of aircraft operating in the near-space region at flight velocities exceeding Mach 5, characterized by high maneuverability and extended operational range [1,2]. Due to the complex flight environment and numerous constraints, trajectory optimization is critical to ensuring the successful flight of hypersonic vehicles under multiple terminal and path constraints [3,4]. However, reference trajectories in engineering practice are typically generated through offline optimization methods, which struggle to accommodate diverse uncertainties during flight operations. Consequently, real-time reentry trajectory optimization remains a significant technical challenge.
Trajectory optimization is fundamentally a nonlinear optimal control problem, primarily addressed through indirect and direct methods. The indirect method, leveraging Pontryagin's Minimum Principle, transforms the problem into a Hamiltonian boundary value problem, thereby offering theoretical optimality guarantees [5,6,7]. In contrast, the direct method discretizes the continuous problem into a nonlinear programming (NLP) formulation, which is then solved numerically. This approach has been successfully demonstrated in diverse applications, including atmospheric reentry [8,9], rocket landing [10,11], and planetary missions [12,13,14]. Despite their respective strengths, both methods face challenges in trajectory optimization: the indirect method grapples with problem size that grows exponentially under complex constraints, while the direct method often incurs prohibitive computational costs for large-scale NLPs.
In recent years, with the advancement of computational capabilities, deep learning has demonstrated tremendous application potential in the field of trajectory optimization [15,16,17]. Leveraging the powerful nonlinear mapping capabilities of neural networks, researchers have effectively reduced online computational burdens through offline training on historical trajectory datasets, thereby providing a promising alternative for real-time trajectory optimization in hypersonic vehicles [18,19,20,21]. Work [22] developed a dual-phase hypersonic reentry planner, incorporating offline fuzzy-optimized trajectory planning and online DNN execution. This strategy achieves real-time trajectory execution while maintaining solution feasibility under complex flight constraints, demonstrating superior computational efficiency compared to conventional optimization-based methods. Work [23] proposed an offline-trained Deep Neural Network (DNN) controller for hypersonic flight that learns state-to-optimal-action mappings from homotopy-generated trajectory data, enabling real-time near-optimal control with stable convergence. Work [24] proposed a DNN-constrained trajectory generator coupled with adaptive reinforcement learning control for air-breathing hypersonic vehicles, which achieves precise tracking control while satisfying path constraints. Although the aforementioned strategies achieve real-time computation through offline training, the static feedforward architecture of their DNNs struggles to effectively capture the inherent temporal dependencies in trajectory data.
In this article, Long Short-Term Memory (LSTM) networks combined with multi-head attention mechanisms are proposed as the core architecture for trajectory planning. As a representative variant of recurrent neural networks, LSTM demonstrates exceptional capability in capturing complex temporal features within sequential data, exhibiting significant potential for predictive applications [25,26]. This specialized recurrent structure has proven effective in modeling data with strong nonlinearities and temporal dependencies. Furthermore, the multi-head attention mechanism improves the prediction accuracy of LSTM by adaptively weighting critical information in a sequence [27,28]. The outstanding predictive capabilities of LSTM networks integrated with multi-head attention mechanisms have been documented across multiple disciplines, including anomaly detection [29,30], medical diagnosis [31,32], hydrology and water resources [33,34], and civil engineering applications [35,36].
Despite the proven efficacy of LSTM and multi-head attention mechanisms in temporal modeling, their hyperparameter configuration is still limited by a heavy reliance on empirical knowledge and inefficient tuning processes [37,38]. Such manual tuning paradigms fail to unlock the full potential of neural networks, proving inadequate for the rigorous demands of complex trajectory optimization tasks. While multiple approaches exist for hyperparameter optimization, the adoption of swarm intelligence algorithms represents a particularly promising solution [39,40]. Their principal advantage stems from replacing traditional gradient calculations with probabilistic search to find the global optimum in high-dimensional spaces.
This paper presents an online trajectory planning framework for hypersonic vehicles based on a multi-strategy improved whale optimization algorithm and an attention-enhanced Long Short-Term Memory (MWOA–AM-LSTM) model. The framework is designed to enable real-time onboard trajectory generation in complex reentry aerodynamic environments by learning an expert state–command mapping from offline solutions, while maintaining comparable solution quality with substantially reduced online computational cost. Specifically, the main contributions are:
- (1)
Online learning-based trajectory generation guided by an offline expert database. We propose an integrated MWOA–AM-LSTM framework for hypersonic vehicle reentry trajectory planning, where sequential second-order cone programming (SOCP) is used offline to generate a reference trajectory–command dataset under bounded aerodynamic uncertainties. The AM-LSTM is trained in a supervised manner to approximate the expert state–command mapping—i.e., to infer the next-step bank-angle command from a short history of flight states—thereby enabling real-time online rollout with comparable performance to the SOCP-generated references. The resulting trajectory is propagated via numerical integration under admissible control bounds, allowing constraint-related quantities to be monitored during online execution and improving practical robustness in disturbed aerodynamic conditions.
- (2)
Automated and robust hyperparameter tuning for AM-LSTM via an improved WOA. To avoid manual and empirical hyperparameter selection, we develop a multi-strategy improved whale optimization algorithm to automatically tune the AM-LSTM architecture. By incorporating circle chaotic mapping for diversified initialization, a nonlinear convergence factor for balancing exploration and exploitation, and a dynamic golden-sine mutation strategy to mitigate premature convergence, the proposed MWOA enhances the efficiency and robustness of hyperparameter search in high-dimensional spaces, thereby improving the reliability of the learned mapping for real-time deployment.
The article is structured as follows:
Section 2 formulates the reentry trajectory optimization problem for hypersonic vehicles.
Section 3 details a novel trajectory planning methodology based on the MWOA-AM-LSTM framework.
Section 4 analyzes and evaluates simulation results.
Section 5 provides concluding remarks.
3. Online Trajectory Planning Framework Based on MWOA-AM-LSTM Network
3.1. Principles of Whale Optimization Algorithm
WOA constitutes a metaheuristic optimization methodology rooted in swarm intelligence principles, modeled after the distinctive bubble-net predation tactics of humpback whales. The algorithm is designed to balance global exploration and local exploitation in complex optimization through the emulation of three distinct behavioral patterns exhibited by whales: prey encircling, bubble-net foraging, and random search.
3.1.1. Encircling Prey
It has been demonstrated that humpback whales possess the cognitive ability to identify the whereabouts of their prey and subsequently encircle it. Since the optimal position in the search space is unknown a priori, WOA presumes the current best candidate solution to be the target prey. Once the target prey position is determined, the remaining search agents coordinate encirclement maneuvers through positional updates governed by the following formulation:

$$\mathbf{D} = \left| \mathbf{C} \cdot \mathbf{X}^{*}(t) - \mathbf{X}(t) \right| \tag{15}$$

$$\mathbf{A} = 2a \cdot rd - a, \quad \mathbf{C} = 2 \cdot rd \tag{16}$$

$$\mathbf{X}(t+1) = \mathbf{X}^{*}(t) - \mathbf{A} \cdot \mathbf{D} \tag{17}$$

where $t$ is the current iteration number, $\mathbf{A}$ and $\mathbf{C}$ are the coefficient vectors, $\mathbf{X}(t)$ is the current position of the solution, and $\mathbf{X}^{*}(t)$ is the location of the current optimal solution. The control coefficient $a$ undergoes linear reduction from 2 to 0, and $rd$ is a random number drawn uniformly from $[0, 1]$.
3.1.2. Bubble-Net Attacking Method
The bubble-net attacking strategy of humpback whales is computationally modeled through two synergistic mechanisms: a shrinking encircling mechanism and a spiral position update. The shrinking encircling mechanism is controlled through the coefficient vector $\mathbf{A}$: since $\mathbf{A}$ is bounded within $[-a, a]$, its fluctuation amplitude contracts proportionally with the reduction of $a$. The spiral position update employs a spiral function to connect the whale and its prey, replicating the helical bubble-net feeding behavior of the humpback whale:

$$\mathbf{X}(t+1) = \mathbf{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \mathbf{X}^{*}(t) \tag{18}$$

where $\mathbf{D}' = \left| \mathbf{X}^{*}(t) - \mathbf{X}(t) \right|$ quantifies the distance between the $i$-th search agent and the incumbent optimal solution, parameter $b$ determines the curvature of the logarithmic spiral trajectory, and $l$ is a random variable uniformly distributed over $[-1, 1]$.

Assuming that the two mechanisms are executed with equal probability, the position update operator is written in the following piecewise form to account for both the shrink-encircling and spiral-feeding behaviors:

$$\mathbf{X}(t+1) = \begin{cases} \mathbf{X}^{*}(t) - \mathbf{A} \cdot \mathbf{D}, & p < 0.5 \\ \mathbf{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \mathbf{X}^{*}(t), & p \ge 0.5 \end{cases} \tag{19}$$

where $p$ is a random number in the interval $[0, 1]$.
3.1.3. Search for Prey
Beyond the bubble-net attacking strategy, humpback whales also perform random prey exploration, which the algorithm again accomplishes by varying the value of $\mathbf{A}$. When $|\mathbf{A}| > 1$, the whale deviates from the trajectory of the target prey. Distinct from the bubble-net attacking phase, this mode designates a randomly selected position vector, rather than the current global optimum, as the update reference. This exploration-oriented phase demonstrably augments the global search capacity, as formalized:

$$\mathbf{D} = \left| \mathbf{C} \cdot \mathbf{X}_{rand} - \mathbf{X}(t) \right|, \quad \mathbf{X}(t+1) = \mathbf{X}_{rand} - \mathbf{A} \cdot \mathbf{D} \tag{20}$$

where $\mathbf{X}_{rand}$ is the position of a randomly selected individual in the whale population.
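As a concrete illustration, the three update modes above can be combined into a single per-iteration step. The following Python sketch shows the canonical WOA position update; the variable names and the quadratic test objective are illustrative only and not taken from the paper:

```python
import numpy as np

def woa_step(X, X_best, a, rng):
    """One WOA position update for the whole population.

    X      : (N, D) current positions
    X_best : (D,) incumbent best solution
    a      : control coefficient, decreasing from 2 to 0
    """
    N, D = X.shape
    X_new = np.empty_like(X)
    b = 1.0  # logarithmic-spiral constant
    for i in range(N):
        p = rng.random()
        A = 2 * a * rng.random(D) - a   # coefficient vector A
        C = 2 * rng.random(D)           # coefficient vector C
        if p < 0.5:
            if np.all(np.abs(A) < 1):   # exploitation: shrinking encircling
                Dist = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * Dist
            else:                        # exploration: move toward a random whale
                X_rand = X[rng.integers(N)]
                Dist = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * Dist
        else:                            # spiral update around the best solution
            l = rng.uniform(-1, 1)
            Dist = np.abs(X_best - X[i])
            X_new[i] = Dist * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    return X_new

# toy usage: minimize the sphere function
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(30, 2))
best = X[np.argmin((X**2).sum(axis=1))]
for t in range(200):
    a = 2 * (1 - t / 200)               # canonical linear decay of a
    X = woa_step(X, best, a, rng)
    cand = X[np.argmin((X**2).sum(axis=1))]
    if (cand**2).sum() < (best**2).sum():
        best = cand
```

The exploitation branch pulls agents toward the incumbent best, while the `X_rand` branch preserves exploration whenever $|\mathbf{A}| \ge 1$.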
3.2. Multi-Strategy Improved WOA
The Whale Optimization Algorithm (WOA) is an efficient swarm-intelligence method that has demonstrated strong potential for complex optimization problems due to its concise mathematical formulation and robust global search capability. However, the canonical WOA still faces inherent limitations when tackling high-dimensional, nonlinear, multimodal, and dynamic optimization tasks. These limitations typically manifest as reduced population diversity, limited convergence accuracy, and a tendency to become trapped in local optima. Such issues are particularly pronounced in LSTM hyperparameter optimization within deep-learning frameworks, where WOA can be highly sensitive to parameter settings and may suffer from low search efficiency. To address these challenges, we propose a multi-strategy improved WOA (MWOA) to enhance optimization performance. The pseudo-code is provided in Algorithm 1.
3.2.1. Circle Chaotic Mapping
The conventional WOA employs random initialization to generate the population. However, this stochastic initialization strategy may result in an uneven distribution of search agents across the solution space, thereby compromising the algorithm's coverage of the global search space. To address this limitation, we enhance the population initialization strategy through a Circle chaotic map, defined as:

$$x_{k+1} = \mathrm{mod}\left( x_k + 0.2 - \frac{0.5}{2\pi} \sin(2\pi x_k),\ 1 \right) \tag{21}$$

where $x_k$ and $x_{k+1}$ denote the current individual and the subsequently generated individual, respectively.
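A minimal sketch of Circle-chaotic population initialization, assuming the standard map parameters (0.2 and 0.5); the hyperparameter bounds shown are illustrative, not values from the paper:

```python
import numpy as np

def circle_init(N, D, lb, ub, seed=0):
    """Population initialization via the Circle chaotic map.

    Successive chaotic values are mapped affinely into [lb, ub],
    giving a more even spread than plain uniform sampling.
    """
    rng = np.random.default_rng(seed)
    z = rng.random(D)                      # random chaotic seed per dimension
    pop = np.empty((N, D))
    for i in range(N):
        # Circle map: z <- mod(z + 0.2 - (0.5 / (2*pi)) * sin(2*pi*z), 1)
        z = np.mod(z + 0.2 - (0.5 / (2 * np.pi)) * np.sin(2 * np.pi * z), 1.0)
        pop[i] = lb + z * (ub - lb)        # map chaotic value to parameter space
    return pop

# illustrative bounds, e.g. [learning rate, hidden units, attention heads]
pop = circle_init(N=30, D=3,
                  lb=np.array([1e-4, 16, 1]),
                  ub=np.array([1e-2, 256, 8]))
```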
| Algorithm 1 Multi-Strategy Improved Whale Optimization Algorithm (MWOA) |
| Require: Objective function f, dimension D, population size N, maximum iterations $T_{max}$ |
| Require: Lower bound $lb$, upper bound $ub$, damping factor b |
| Ensure: Optimal solution $\mathbf{X}^{*}$ |
| 1: Initialization Phase: |
| 2: Generate initial population using Circle chaotic map: |
| 3: $z_1 \sim U(0, 1)$ |
| 4: $z_{i+1} = \mathrm{mod}\left(z_i + 0.2 - \frac{0.5}{2\pi}\sin(2\pi z_i),\ 1\right)$ |
| 5: $\mathbf{X}_i = lb + z_i \cdot (ub - lb)$ ▹ Map to parameter space |
| 6: Evaluate $f(\mathbf{X}_i)$ for all individuals and record the best solution $\mathbf{X}^{*}$ |
| 7: for $t = 1$ to $T_{max}$ do |
| 8: Update the nonlinear convergence factor a |
| 9: for each individual do |
| 10: Generate random numbers $p$, $l$, and $rd$ |
| 11: if $p < 0.5$ then |
| 12: Compute coefficient vectors $\mathbf{A}$ and $\mathbf{C}$ by Equation (16) |
| 13: if $|\mathbf{A}| < 1$ then |
| 14: Update position using shrinking encircling by Equation (17) |
| 15: else |
| 16: Randomly select individual $\mathbf{X}_{rand}$ |
| 17: Update position using searching for prey by Equation (20) |
| 18: end if |
| 19: else |
| 20: Update position using spiral update by Equation (18) |
| 21: end if |
| 22: if $rand < P_m(t)$ then |
| 23: Perform golden sine mutation by Equations (24)–(26) |
| 24: end if |
| 25: if $f(\mathbf{X}_i(t+1)) < f(\mathbf{X}_i(t))$ then |
| 26: $\mathbf{X}_i \leftarrow \mathbf{X}_i(t+1)$ |
| 27: if $f(\mathbf{X}_i) < f(\mathbf{X}^{*})$ then |
| 28: $\mathbf{X}^{*} \leftarrow \mathbf{X}_i$ |
| 29: end if |
| 30: end if |
| 31: end for |
| 32: end for |
3.2.2. Nonlinear Decay Factor
In the conventional WOA, the linear decay pattern of the convergence factor fails to reconcile the algorithm's distinct phase-dependent requirements: intensive global exploration during initial iterations and refined local exploitation in later stages. To address the limitations inherent in the linear decay mechanism, this study proposes a dynamically adjusted nonlinear decay factor based on an exponential function:

$$a(t) = 2\left(1 - \frac{e^{t/T_{max}} - 1}{e - 1}\right) \tag{22}$$

where $t$ represents the iteration index and $T_{max}$ denotes the terminal iteration count. The curve of this function is shown in Figure 1.
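To illustrate the intended effect, the following sketch compares the canonical linear decay with one plausible exponential-type nonlinear decay; the exact curve shown in the paper's Figure 1 may differ:

```python
import numpy as np

def decay_linear(t, T):
    """Canonical WOA decay: a falls linearly from 2 to 0."""
    return 2.0 * (1.0 - t / T)

def decay_nonlinear(t, T):
    """Exponential-type decay from 2 to 0: a stays large longer
    (extended exploration), then drops quickly near the end
    (intensified exploitation). Illustrative form only."""
    return 2.0 * (1.0 - (np.exp(t / T) - 1.0) / (np.e - 1.0))

T = 100
a_lin = decay_linear(np.arange(T + 1), T)
a_non = decay_nonlinear(np.arange(T + 1), T)
```

Both schedules share the endpoints $a(0)=2$ and $a(T_{max})=0$; the nonlinear curve stays above the linear one at mid-run, which is what delays the switch to exploitation.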
3.2.3. Dynamic Gold Sine Mutation Strategy
WOA adopts a fixed spiral update mechanism during the local exploitation stage, which makes it susceptible to falling into local optima and leaves it without an effective escape mechanism. Additionally, a general mutation strategy with a fixed probability makes it difficult for the algorithm to balance global exploration and local exploitation during later iterations. To overcome these defects, this paper introduces a dynamic golden sine mutation strategy, built on a dynamic probabilistic perturbation triggering mechanism within the iterative process. The strategy fuses the characteristics of the golden ratio and the sine function, thereby enhancing the algorithm's search ability at different stages; it also adaptively adjusts the search step size and direction, balancing the algorithm's global exploration and local exploitation abilities.
The steps are as follows. After the spiral update mechanism, a dynamic probabilistic perturbation mechanism determines whether the golden sine mutation is performed. The mutation probability decays over time as:

$$P_m(t) = P_{max} - \left(P_{max} - P_{min}\right) \cdot \frac{t}{T_{max}} \tag{23}$$

where $P_{max}$ and $P_{min}$ denote the initial and final mutation probabilities, respectively.
The golden sine mutation mechanism first generates two coefficients based on the golden ratio, which regulate the contraction and expansion of the sinusoidal function parameters, respectively:

$$x_1 = a \cdot (1 - \tau) + b \cdot \tau \tag{24}$$

$$x_2 = a \cdot \tau + b \cdot (1 - \tau) \tag{25}$$

where the golden section ratio $\tau = \frac{\sqrt{5} - 1}{2}$, and $a$ and $b$ are the lower and upper bounds of the search space, respectively. Subsequently, the current solution is perturbed using a sinusoidal function, which is combined with the global optimal solution to adjust the current individual position:

$$\mathbf{X}_{new} = \mathbf{X}(t) \cdot \left|\sin(r_1)\right| + r_2 \cdot \sin(r_1) \cdot \left| x_1 \cdot \mathbf{X}^{*}(t) - x_2 \cdot \mathbf{X}(t) \right| \tag{26}$$

where $r_1 \in [0, 2\pi]$ and $r_2 \in [0, \pi]$.
After the spiral update with the golden sine mutation mechanism, a reflection boundary treatment is introduced to ensure that the newly generated solution $\mathbf{X}_{new}$ satisfies the predefined feasible domain constraints:

$$\mathbf{X}_{new} = \begin{cases} 2 \cdot lb - \mathbf{X}_{new}, & \mathbf{X}_{new} < lb \\ 2 \cdot ub - \mathbf{X}_{new}, & \mathbf{X}_{new} > ub \end{cases} \tag{27}$$
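The mutation and boundary-handling steps can be sketched as follows; the probability schedule and search-space bounds are illustrative assumptions, not values from the paper:

```python
import numpy as np

TAU = (np.sqrt(5) - 1) / 2          # golden section ratio, ~0.618

def golden_sine_mutation(x, x_best, lb, ub, t, T, rng):
    """Dynamic golden-sine mutation with reflection boundary handling.

    Triggered with a probability that decays over the iterations; the
    specific schedule and bounds here are illustrative.
    """
    p_m = 0.5 * (1.0 - t / T)        # mutation probability decaying over time
    if rng.random() >= p_m:
        return x                     # perturbation not triggered this step
    # golden-section coefficients built from the search-space bounds
    x1 = lb * (1 - TAU) + ub * TAU
    x2 = lb * TAU + ub * (1 - TAU)
    r1 = rng.uniform(0, 2 * np.pi)
    r2 = rng.uniform(0, np.pi)
    x_new = x * np.abs(np.sin(r1)) + r2 * np.sin(r1) * np.abs(x1 * x_best - x2 * x)
    # reflection boundary: fold violating components back inside [lb, ub]
    low, high = x_new < lb, x_new > ub
    x_new[low] = 2 * lb[low] - x_new[low]
    x_new[high] = 2 * ub[high] - x_new[high]
    return np.clip(x_new, lb, ub)    # guard against reflecting past the far bound

rng = np.random.default_rng(1)
lb, ub = np.zeros(3), np.ones(3)
x = golden_sine_mutation(rng.random(3), rng.random(3), lb, ub, t=10, T=100, rng=rng)
```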
3.3. Principle of LSTM
LSTM represents a specialized deep learning architecture for processing sequential data, distinguished from conventional RNNs by three gating mechanisms: the input gate, forget gate, and output gate. These gates, coupled with a cell state that supersedes traditional hidden units, enable LSTM to model long-range temporal dependencies effectively.
As illustrated in Figure 2, the core components operate as follows. The cell state $C_t$ functions as the central memory conduit, preserving information across sequential timesteps. At each timestep, a candidate cell state $\tilde{C}_t$ is generated via tanh activation, representing potential updates to the memory. Concurrently, three gating units regulate information flow: the forget gate $f_t$ modulates retention of the historical cell state $C_{t-1}$, the input gate $i_t$ controls assimilation of the candidate state $\tilde{C}_t$, and the output gate $o_t$ gates the current cell state to yield the hidden state $h_t$, which transmits temporal dependencies as the external output. The calculation process can be expressed as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \odot \tanh(C_t)$$

where $W$ denotes the trainable weight matrices, $b$ represents the corresponding bias terms, and $\sigma$ signifies the sigmoid activation function. These parameters collectively govern the gating mechanisms and state transformations.
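The gate equations above can be traced with a minimal NumPy implementation of a single LSTM timestep; the dimensions below are toy values, not the network used in this work:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Single LSTM timestep following the standard gate equations.

    W : dict of weight matrices acting on the concatenation [h_prev; x_t]
    b : dict of bias vectors
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c = f * c_prev + i * c_tilde               # new cell state
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    h = o * np.tanh(c)                         # new hidden state
    return h, c

# toy dimensions: 4 input features, 8 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
W = {k: rng.normal(0, 0.1, (n_h, n_h + n_in)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(10, n_in)):        # unroll over a 10-step sequence
    h, c = lstm_step(x_t, h, c, W, b)
```

Since $h_t = o_t \odot \tanh(C_t)$ with $o_t \in (0,1)$, every component of the hidden state is bounded in $(-1, 1)$.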
3.4. Principle of Multi-Head Attention Mechanisms
Multi-head attention employs parallelized attention mechanisms to project input data into orthogonal feature subspaces, enabling selective feature weighting for critical information enhancement. The calculation process is shown as follows:
- (1)
Input projection. The LSTM output sequence $X$ is passed through three linear layers:

$$Q = X W^{Q}, \quad K = X W^{K}, \quad V = X W^{V}$$

- (2)
Subspace projection. Each matrix is partitioned along the feature dimension into $h$ heads, each of dimension $d_k = d_{model}/h$:

$$Q = [Q_1, \dots, Q_h], \quad K = [K_1, \dots, K_h], \quad V = [V_1, \dots, V_h]$$

- (3)
Parallel attention computation. Each head calculates scaled dot-product attention weights, normalized via softmax, and weights the value vectors:

$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i$$

- (4)
Output fusion. The heads are concatenated and linearly transformed:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}$$
Through the above process, the input sequence is mapped into h independent subspaces, where each head learns distinct feature patterns. The outputs of all heads are then concatenated and integrated via linear transformation, forming a global representation that captures complex dependencies between time steps, significantly enhancing the modeling capability for complex sequential relationships.
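The four steps can be sketched in NumPy as follows; the sequence length and model dimension are illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Scaled dot-product multi-head self-attention over a sequence.

    X : (L, d_model) sequence (e.g. LSTM outputs); d_model divisible by h.
    """
    L, d_model = X.shape
    d_k = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # (1) input projection
    # (2) partition each matrix into h heads: shape (h, L, d_k)
    split = lambda M: M.reshape(L, h, d_k).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # (3) per-head scaled dot-product attention
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_k)  # (h, L, L)
    heads = softmax(scores) @ Vh                        # (h, L, d_k)
    # (4) concatenate heads and fuse with the output projection
    concat = heads.transpose(1, 0, 2).reshape(L, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
L, d_model, h = 6, 16, 4
X = rng.normal(size=(L, d_model))
Wq, Wk, Wv, Wo = (rng.normal(0, 0.1, (d_model, d_model)) for _ in range(4))
Y = multi_head_attention(X, Wq, Wk, Wv, Wo, h)
```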
3.5. Reentry Trajectory Planning Method
While convex optimization provides theoretically grounded solutions to trajectory optimization problems, its practical online implementation is limited by computational burden. In this work, convex optimization is therefore employed offline to generate datasets supplying training samples for neural networks. Subsequently, the approximation capabilities of neural networks for complex nonlinear functions are leveraged to construct models mapping vehicle state variables to control commands. This study consequently proposes a hybrid framework enabling real-time trajectory planning and online command acquisition for hypersonic vehicles. The overall design scheme is shown in Figure 3.
3.5.1. Offline Dataset Generation
Deviations from the expected aerodynamic parameter values occur as hypersonic vehicles enter the atmosphere, which may lead to discrepancies between the actual flight path and the predefined ideal path. The aerodynamic deviations can be modeled with a Gaussian distribution, and the deviation range of the aerodynamic parameters (including drag and lift) is set according to the 3σ rule of the Gaussian distribution. Sampling the drag and lift deviation coefficients from this model yields 500 different sets of perturbed drag and lift coefficients, each obtained by scaling the nominal coefficient with its sampled deviation. Subsequently, the CVX solver was used to solve the problem for each set, and 500 trajectories with aerodynamic uncertainties were obtained. The convergence condition is satisfied after seven iterations, and the entire SOCP solution process requires 18.779 s.
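A hedged sketch of the deviation-sampling step, assuming a zero-mean Gaussian model truncated at 3σ; the standard deviation and nominal coefficients below are placeholders, not the paper's values:

```python
import numpy as np

def sample_aero_deviations(n_samples=500, sigma=0.10, seed=0):
    """Draw Gaussian drag/lift deviation coefficients, truncated at 3-sigma.

    sigma is an assumed standard deviation (here 10% of the nominal
    coefficient); the paper's exact value may differ.
    """
    rng = np.random.default_rng(seed)
    dCD = np.clip(rng.normal(0.0, sigma, n_samples), -3 * sigma, 3 * sigma)
    dCL = np.clip(rng.normal(0.0, sigma, n_samples), -3 * sigma, 3 * sigma)
    return dCD, dCL

dCD, dCL = sample_aero_deviations()
# perturbed coefficients scale placeholder nominal values CD0, CL0
CD0, CL0 = 0.5, 1.2
CD = CD0 * (1.0 + dCD)       # one perturbed drag coefficient per trajectory
CL = CL0 * (1.0 + dCL)
```

Each of the 500 coefficient pairs would then be fed to the SOCP solver to produce one reference trajectory.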
3.5.2. Offline Network Training
The proposed hybrid architecture integrates multi-head attention mechanisms with LSTM networks to enhance sequential data modeling. While LSTM's gated structure and cell-state design provide inherent advantages in gradient stability and long-term dependency capture, its performance remains highly sensitive to hyperparameter configuration. Critical parameters include the hidden layer size (determining model capacity and feature extraction), the learning rate (controlling gradient-descent convergence), and the number of attention heads (controlling multi-scale feature weighting). The MWOA is employed to optimize these hyperparameters automatically. The resulting hybrid model, MWOA-AM-LSTM, can then be used to predict time-series data.
The overall training procedure of the proposed MWOA–AM-LSTM model is summarized in Algorithm 2.
| Algorithm 2 MWOA–AM-LSTM model training process |
| Require: Trajectory dataset with aerodynamic uncertainties; MWOA configuration; AM-LSTM model structure. |
| Ensure: Trained MWOA–AM-LSTM hybrid model; evaluation metrics (MSE, MAE). |
| 1: Step 1: Data preprocess. |
| 2: Split into training set and test set . |
| 3: Apply min–max normalization to all features for dimensional consistency. |
| 4: Step 2: Hyperparameter optimization. |
| 5: Use MWOA to optimize the AM-LSTM hyperparameters: learning rate, number of hidden neurons, and number of attention heads. |
| 6: Define the objective function as the model loss on a validation subset of . |
| 7: Obtain the optimal hyperparameter set and configure the AM-LSTM network. |
| 8: Step 3: Model integration. |
| 9: Combine the optimized AM-LSTM and the MWOA-based tuning strategy to form the MWOA–AM-LSTM hybrid model. |
| 10: Train the optimized AM-LSTM on to obtain the final deployed model. |
| 11: Step 4: Control command prediction. |
| 12: Given historical flight states , predict the future bank-angle command . |
| 13: Step 5: Performance validation. |
| 14: Compare predicted and reference/measured quantities on . |
| 15: Quantify model performance using MSE and MAE. |
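As a sketch of Step 2 of the procedure above, the following shows how an MWOA search agent could be decoded into AM-LSTM hyperparameters and scored; the bounds, rounding scheme, and stub objective are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Search-space bounds for [learning rate, hidden units, attention heads];
# the concrete ranges are illustrative.
LB = np.array([1e-4, 16, 1])
UB = np.array([1e-2, 256, 8])

def decode(agent):
    """Map a continuous MWOA position vector to AM-LSTM hyperparameters."""
    lr = float(agent[0])
    hidden = int(round(agent[1]))
    heads = int(round(agent[2]))
    hidden -= hidden % heads        # hidden size must be divisible by head count
    return lr, max(hidden, heads), heads

def fitness(agent, train_and_validate):
    """MWOA objective: validation loss of an AM-LSTM trained with the
    decoded hyperparameters. `train_and_validate` is user-supplied and
    returns a scalar loss (stubbed below)."""
    lr, hidden, heads = decode(np.clip(agent, LB, UB))
    return train_and_validate(lr, hidden, heads)

# stub standing in for an actual short training run on the validation subset
stub = lambda lr, hidden, heads: (np.log10(lr) + 3) ** 2 + abs(hidden - 128) / 128
loss = fitness(np.array([1e-3, 130.0, 4.0]), stub)
```

In a full run, MWOA would iterate `fitness` over the population, with each evaluation triggering a short AM-LSTM training on the validation subset.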
3.5.3. Online Trajectory Planning
The Runge–Kutta method is employed to simulate online trajectory planning. Commencing from the initial state, the trained neural network predicts control commands for the next time step. These commands are then numerically integrated to obtain subsequent flight states. Iteratively repeating this process enables rapid generation of the complete flight trajectory.
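This closed-loop rollout can be sketched as follows, with a stub policy and stub dynamics standing in for the trained network and the reentry equations of motion (both are illustrative assumptions):

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """Classical fourth-order Runge-Kutta step with zero-order-hold control."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def rollout(x0, policy, dynamics, dt, n_steps, history=5):
    """Closed-loop trajectory generation: the trained network predicts the
    next control command from a short window of past states, which is then
    integrated forward to obtain the next flight state."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        window = np.stack(states[-history:])      # most recent flight states
        u = policy(window)                        # e.g. bank-angle command
        states.append(rk4_step(dynamics, states[-1], u, dt))
    return np.stack(states)

# stub dynamics (damped linear system) and a constant-command policy
A_sys = np.array([[0.0, 1.0], [-1.0, -0.5]])
traj = rollout(x0=[1.0, 0.0],
               policy=lambda w: 0.0,
               dynamics=lambda x, u: A_sys @ x,
               dt=0.1, n_steps=50)
```

Repeating predict-then-integrate in this way yields the complete trajectory without solving any optimization problem online.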
5. Conclusions
This paper proposes an online trajectory planning framework for hypersonic vehicles based on a multi-strategy improved whale optimization algorithm and an attention-enhanced LSTM network. Considering aerodynamic uncertainties, trajectory samples are generated offline by the sequential second-order cone programming method. The hyperparameters of the AM-LSTM network, including the number of hidden layer neurons, the initial learning rate, and the number of attention heads, are first specified within predefined boundaries; by minimizing the model loss function, the MWOA then drives these hyperparameters toward their global optimum. The constructed MWOA-AM-LSTM model generates optimal control commands online from historical flight states and demonstrates strong generalization capability, allowing it to serve as a real-time trajectory generator for the hypersonic vehicle. Numerical simulations demonstrate the remarkable performance of the proposed framework in computational efficiency and planning precision under both nominal and perturbed conditions.
In the future, we will investigate more complex reentry trajectory-planning scenarios and evaluate alternative deep learning backbones under the same training and onboard inference constraints.