An Online Generation Method of Terminal-Area Trajectories for Wave-Rider Using Deep Neural Networks

: This paper presents a deep neural network-based online trajectory generation method for the aerodynamic characteristic description and terminal-area energy management of wave-rider aircrafts. First, the ﬂ ight dynamics equations in the energy domain are linearized and discretized to generate numerous aircraft trajectory samples with sequential convex optimization (SCO) meth-ods. Then, an optimization objective function is designed to promote the smoothness of the control variables and improve the trajectory similarity. Compared to the nonlinear programming (NLP), the proposed trajectory sample generation method is more suitable for the training of deep neural networks (DNNs). Finally, deep neural networks are formulated and trained for the control variables and state variables, using the generated obtained trajectory samples, so that the reference trajectories can be obtained online during the energy management process of the wave-rider’s terminal phase. Numerical simulations validate the high accuracy of the trajectories generated with the deep neural network. Meanwhile, this proposed method enables smaller storage usage, which is highly suitable for integration into on-board ﬂ ight control systems.


Background
The concept of wave-rider was first proposed by Nonweiler [1] in 1959. A wave-rider is generally designed for hypersonic flight conditions. During the hypersonic flight, an aircraft rides on top of shock waves, creating a high-pressure flow field on the lower surface, which generates significant lift. The body of the aircraft is typically flat, resulting in low drag. Therefore, a wave-rider aircraft generally has a high lift-to-drag ratio at the design point for hypersonic flight. However, for the off-design points, such as in subsonic conditions, the aerodynamic performance will deteriorate with a lower lift-to-drag ratio. Figure 1 shows a typical wave-rider aircraft.
The research on the design method of wave-riders has made significant progress. In order to solve the problems of excessive negative dihedral angles and insufficient volume ratios, the theory of a cone-guided wave-rider is proposed. To further improve the control of the shock wave shape, the theory of a tangential wave-rider is proposed. Subsequently, based on the tangential theory, some scholars domestically and internationally have obtained a series of new tangential-type wave-rider configurations with more complex configurations and specific purposes [2], such as the variable Mach number wave-rider, double-sweep wave-rider and dual/multistage wave-rider. The wave-rider aircraft requires energy management techniques to adjust its flight state during the terminal area before approaching and landing. The United States' space shuttle was the first unit which employed the energy management techniques in the end phase and successfully conducted 135 unpowered gliding approaches and landings. In 2011, the space shuttle "Atlantis" completed the final flight. Following the space shuttle, the United States' X-37B utilized terminal-area energy management techniques and successfully completed multiple landing missions at the Vandenberg Air Force Base in California and Kennedy Space Center [3,4].
The terminal-area energy management segment of the spacecraft is situated between the reentry phase and landing phase. It typically starts from an altitude of H = 26 km and Mach = 2.5, and it ends at an altitude of 1.5-3 km [5]. During the reentry phase, the spacecraft experiences significant uncertainties in aerodynamic forces and flight conditions due to traversing a large airspace and speed range. As a result, the carrier vehicle may deviate from the nominal trajectory, leading to significant errors of position, heading, and energy at the end of the reentry phase. After the reentry phase, the energy management segment aims to regulate the energy of the aircraft, guiding it along an appropriate trajectory for descent. This ensures that the spacecraft achieves the required altitude and velocity windows for the approach and landing phase while aligning the flight heading with the centerline of the runway, ensuring the safety of the approach and landing process. The trajectory adjustment is more difficult for wave-rider aircraft compared to space shuttles. A detailed comparative analysis is presented in Section 2.1. Therefore, wave-rider aircraft require more precise reference trajectories in the terminal area to reduce the additional trajectory adjustment required by deviations.
The most typical method for designing reference trajectories is the altitude-dynamic pressure profile method [6,7]. However, this method often leads to oscillations or even abrupt changes of the attack angle during the descent trajectory [7]. This is because the traditional altitude-dynamic pressure profile method relies on empirical assumptions, whereas the altitude-dynamic pressure profile is typically designed as a constant or linear profile. The reference trajectory is then obtained by using inverse calculation with the designed profile. However, the assumed profile cannot guarantee the existence of a solution, and the solution process often encounters situations where no solution can be found. Even when a solution exists, it is challenging to ensure the smoothness of the attack angle.
Deep neural networks have gradually been used for the analysis or generation of motion trajectories. For example, in [8], a deep convolutional neural network was employed to classify the trajectories of ships. Reference [9] used a deep neural network to obtain trajectories of high-speed aircraft, and the training samples for the network were obtained through a nonlinear programming method. However, extensive trajectory planning practices have shown that there still exist two unresolved issues with nonlinear programming methods. Firstly, the complex aerodynamic force model of the aircraft brings difficulty of algorithm convergence. Therefore, nonlinear programming methods are more suitable for solving trajectories of space vehicles without aerodynamic forces in the exosphere [10,11], or trajectories with well-behaved aerodynamic models, such as the reentry trajectories of hypersonic aircraft [12][13][14][15][16][17]. However, in the terminal-area energy management segment, the aircraft passes through the transonic regime where the lift-to-drag ratio shows significant changes. That is, nonlinear programming methods converge slowly and require longer computational time in this transonic regime. Secondly, trajectories obtained with nonlinear programming methods may lead to oscillations of control and state variables. The oscillations in trajectories can reduce the fitting accuracy of deep neural networks to the samples. In conclusion, there have been challenges such as long computational time, low efficiency, and low fitting accuracy when using nonlinear programming methods to generate a large number of terminal-area energy management trajectory samples.
Motivated by the above observations, this paper proposes a deep neural network-based online trajectory generation method for wave-rider aircraft in the terminal-area energy management. The main contributions of this work can be summarized as follows.
(a) The trajectory samples are generated based on sequential convex optimization (SCO). Compared to the most commonly used nonlinear programming (NLP) methods, sequential convex optimization (SCO) methods can converge quickly and obtain a large number of trajectory samples with less computation time and higher efficiency, especially when dealing with complex aerodynamic models. Moreover, we incorporate a term in the optimized objective function to suppress oscillations of attack angle. This ensures that the control variables obtained from the sequential convex optimization are smoother, ensuring stronger shape similarity to trajectory samples. Consequently, blending sequential convex optimization (SCO) with deep neural networks (DNNs) enables a more effective sample generation scheme.
(b) We establish deep neural networks separately for the control variable of angle of attack and the state variables of flight path angle, altitude, and velocity. With this method, each deep neural network has only one output: either the control variable of angle of attack or a specific state variable. This simplifies the network structure, reduces the network complexity, and improves the fitting accuracy of network. Moreover, the number of hidden layers in each deep neural network is designed according to the complexity of the sample trajectories. This approach effectively avoids excessive hidden layers, which would waste storage space and potentially lead to overfitting. As a result, this method requires less storage space, lower computational requirements, and thus provides highly accurate trajectories which are suitable for use in onboard computers.
The rest of this paper is organized as follows. Section 1 provides a detailed introduction to the sequential convex optimization (SCO) method used for generating trajectory samples. Section 2 generates trajectory samples using the nonlinear programming (NLP) in comparison with samples generated using sequential convex optimization (SCO) through a comparative analysis. Section 3 establishes the deep neural networks (DNNs) and presents the training method. Section 4 introduces the online trajectory generation method. Finally, Section 5 provides the concluding remarks.

Sequential Convex Optimization (SCO) Method
When the optimization objective function and inequality constraint functions of an optimization problem are convex with linear equality constraint functions, the optimization problem is referred to as a convex optimization problem. This problem can converge and be solved quickly. It is insensitive to initial guesses, and it typically obtains the global optimal solution within 10 to 100 steps. Therefore, they are suitable for online applications. This method has explicit function type restrictions on the objective and constraint functions and needs to be transformed into a convex optimization problem before solving [18]. Such convex optimization is usually solved using primal-dual interior point methods [19][20][21][22][23][24].
Flight dynamics equations have highly nonlinear equality constraints, but convex optimization methods require the equality constraints to be linear functions. Thus, linearization approximation is usually applied to the flight dynamics equations, only keeping the linear parts as equality constraints. The linearization of the dynamics equations is performed within the neighborhood of the optimized trajectory obtained in the previous optimization iteration. Solving a trajectory optimization problem requires conducting multiple rounds of convex optimization sequentially, which are named sequential convex optimization [25][26][27][28][29][30][31][32][33][34][35][36][37]. In this work, considering the need for efficient and fast optimization to generate a large number of trajectory samples for training deep neural networks, the sequential convex optimization method is employed.

Wave-Rider Aircraft Aerodynamics and Modeling
In terms of aerodynamic performance, compared to the space shuttles, wave-rider aircraft has a higher lift-to-drag ratio during the hypersonic flight regime, but their lift-to-drag ratio is similar to the space shuttle at low speeds. Therefore, in terms of body lift and drag characteristics, a wave-rider aircraft has similar energy management capabilities to the space shuttle. However, in terms of aerodynamic layout, there are more differences. On one hand, the space shuttle is equipped with dedicated speed brakes, which provide strong speed control capabilities. On the other hand, the large-sized split rudder can also contribute to the deceleration. In contrast, a wave-rider aircraft has a flatter body shape and a smaller tail section, making it challenging to install a servo system for speed brakes. Moreover, the weight of the speed brake servo system would shift the center of mass rearward and increase the static instability of the wave-riding aircraft. The vertical tail of the wave-rider aircraft is much smaller in size compared to the space shuttle, resulting in weaker speed control capabilities. These aerodynamic limitations make the trajectory adjustment more difficult for wave-rider aircraft compared to space shuttles.
In this paper, a type of typical wave-rider aircraft is chosen as the research object, and this wave-rider aircraft is designed for flight testing. The shape of the wave-rider is mainly designed based on the shock-fitting method, and the volume ratio is improved [38,39]. The designed flight condition of the wave-rider is less than 35 km, and the Mach number is less than 4. The designed range is less than 300 km. Furthermore, the aircraft is unpowered and is delivered to the desired flight condition by a booster rocket. A typical flight profile of the aircraft is shown in Figure 2.  The aerodynamic performance evaluation was conducted using the numerical simulation software developed independently. The main algorithm of the software is based on reference [40]. The non-structured Cartesian grid was selected, and the grid was refined near the surface of the wave-rider body with a grid count of approximately 6.2 million. The spatial discretization was implemented using the SD-SLAU numerical scheme, and the time discretization was carried out using the LU-SGS implicit method. The variable reconstruction employed the MUSCL method. The lift coefficient, drag coefficient, and lift-to-drag ratio of this aircraft are shown in Figure 4a-c respectively, where the unit of the angle of attack is degree.

The Longitudinal Dynamic Equations in the Energy Domain
Trajectory planning is performed in the longitudinal plane. A two-degree-of-freedom point mass dynamic model is adopted in the launching coordinate system. The optimization model is dimensionless. The dimensionless dynamic model is shown in Equations (1) (5), with which Equations (6) and (7) are obtained. By combining Equations (1)-(4) and (7), the dimensionless dynamic equations in the energy domain can be obtained as Equations (8)-(10).

Equation Constraints
Equations (8)-(10) can be rewritten as where x = [̄,̄,γ] T , and u = α. The optimal trajectory obtained from the previous optimization round is denoted as (x k , u k ), where k represents the optimization round. (x k+1 , u k+1 ) represents the trajectory to be optimized in the current round. In Equation (12), the right-hand side term f(x, u) in Equation (11) is the Taylor expanded along the trajectory (x k , u k ), and the left-hand side term in Equation (11) is discretized in the energy domain.
Here, j denotes the index of the discrete points, Δe = (ef − e0)/N, where N is the number of discrete energy segments. e0 represents the initial energy, and ef represents the terminal energy. vj represents a virtual control term with unrestricted magnitude. vj ensures the feasibility of Equation (12) throughout the optimization process and should be minimized. Ej is the coefficient matrix for vj, which allows full controllability of x k and is typically chosen as the identity matrix. Matrices A, B, and C are defined as shown in Equation (13). The specific calculation methods for matrices A and B are shown in Equations (14) to (30).  cos (19) In Equations (22) and (23) During the optimization process, the lift and drag coefficients, as well as their derivatives with respect to Mach number and angle of attack, are directly calculated using the original aerodynamic data table. There is no need for data fitting, and this ensures better adaptation to aerodynamic characteristics with sudden changes in the transonic region and thus enables higher trajectory accuracy. The initial states of the trajectory should satisfy the equality constraint as shown in the following equation:

Path Constraints
To maintain the convergence of the sequential convex optimization, the current optimized trajectory should not deviate far from the trajectory obtained in the previous iteration. Therefore, it is necessary to satisfy the inequality constraint (32). Here, ε can be chosen as a constant parameter based on experience.
During the flight, the rate of change of the angle of attack should be kept below a certain threshold to maintain the stable flight attitude. The constraint on the angle of attack rate can be described as Equation (33). Here, Δt = Δ (R0/g0) 0.5 . From Equations (7) and (33) In Equation (34), the calculation of Δe is given as (35)- (37).
In addition to limiting the angle of attack rate, the magnitude of the angle of attack should also be restricted. Excessive negative or positive angles of attack can cause the failure of aircraft control. The angle of attack constraint is shown in the following equation: During the flight, the aircraft should not descend below a certain altitude: ̄ ≥̄ , = 1,2, . . . , + 1 The trajectory of the aircraft during the flight can be designed below a certain limit value , as shown in the following equation: ≤ , = 1,2, . . . , + 1 (40)

Optimization Objective Function
The optimization objective is represented as a weighted sum of five terms, as shown in Equation (41). The first three terms aim to make the final state variables ̅ , and γ of the optimal trajectory as close as possible to their design values. These terms are included in the objective function as soft constraints to seek the closest feasible solution when the desired values cannot be achieved exactly. The fourth term aims to minimize the virtual control variable v as much as possible, where v = [v1, v2,...,vN]. The fifth term effectively reduces the accumulated rate of change of the control variables along the entire trajectory. When the final state variables reach the design values and the virtual control variable is small, the fifth term becomes the primary optimization objective in the objective function. It helps to suppress the oscillation of control variables. In Equation (41), , , , and are positive real numbers that can be adjusted based on the optimization results.
So far, Equations (12) to (46) formulate the complete sequential convex optimization (SCO) problem. This problem can be solved using the CVX software package.

Trajectory Samples Design
This work focuses on the trajectory samples design for a wave-rider aircraft. By determining the initial energy e0, initial velocity V0, initial flight path angle γ0, final height yf, final velocity Vf, and final flight path angle γf, and specifying different final ranges xf, a set of trajectories can be obtained. In this scenario, the initial and final states are shown in Table 1. With Table 1, a set of trajectories is obtained, as shown in Figure 5, with each trajectory consisting of 101 nodes. The red curve represents the trajectory for V0 = 300, γ0 = −2°, and xf = 55 km. As shown in Figure 5a, the curves of the x-y profile have a similar shape.
Similarly, the curves of the e-y, e-v, e-gamma, and e-alpha profiles also have a similar shape, as depicted in Figure 5b-

Accuracy and Convergence Analysis of the Optimization Method
In this section, numerical methods are used to analyze the accuracy of the sequential convex optimization algorithm. Using the sequential convex optimization algorithm with a trajectory discretization of N = 100, the optimized trajectory is obtained. The simulated trajectory is obtained by numerically integrating Equations (1) to (4), using the energy-angle of attack profile [e-α]opt of the optimized trajectory. The deviation of the optimization algorithm is defined as the difference between the states of the optimized trajectory and the simulated trajectory at the same x-coordinate as: Figure 6a-c illustrate the height deviation, flight path angle deviation, and velocity deviation of the optimized trajectory, respectively. As shown in Figure 6a-c, the optimization algorithm shows high accuracy. The height deviation is within 110 m, the trajectory inclination angle deviation is within 1.5 degrees, and the velocity deviation is within 4 m/s. These results meet the requirements for engineering applications.   Figure 7 illustrates the convergence process of the optimization objective function J. The definition of J is given in Equation (41). From Figure 7, it can be observed that the values of J for all trajectories converge within 25 iterations. After convergence, J is kept below 1 × 10 -5 .

The Nonlinear Programming (NLP) Method
To demonstrate the superiorities of the proposed trajectory generation method using sequential convex optimization (SCO), this section provides a comparative study with the nonlinear programming (NLP) approach. The NLP approach is implemented using the GPOPS software. The dynamic equality constraints, path constraints, and optimization objective function for SCO are described in Section 2. The NLP approach also has similar constraint functions and optimization objective functions but with some differences. Firstly, in the equality constraints, the NLP approach directly discretizes the nonlinear dynamic equations without linearization. Secondly, the path constraints in the NLP approach do not require Equation (32). This is because the NLP approach does not need to linearize the dynamic equations and therefore does not rely on the results of the previous trajectory optimization. The remaining path constraints are kept the same. Thirdly, the optimization objective function in the NLP approach retains the first three terms of Equation (41). The fourth term is not necessary, since the NLP approach does not linearize the trajectory. The fifth term is updated as Equation (48). The reason is that using Equation (45) in GPOPS tests leads to difficult convergence, whereas using Equation (48) achieves better convergence and smoother control variables.
The GPOPS software is used to generate an equal number of trajectories with the initial and terminal trajectory conditions shown in Table 2. Three representative trajectories are selected for comparison. The initial conditions for all trajectories are the same: energy e0 = 0.996297, V0 = 300, γ0 = −2°, and α0 = 0°. The values of xf are chosen as 35 km, 55 km, and 7 5 km. Figures 8-10 illustrate the comparison of the results obtained using sequential convex optimization (SCO) and GPOPS for these three trajectories.
From Figures 8-10, it is seen that the results obtained from SCO and GPOPS are quite similar, and their trends are consistent. However, compared to SCO, the oscillations in the angle of attack curve are more obvious for GPOPS. Additionally, for the curve with xf = 75 km obtained by GPOPS, the oscillations in the angle of attack lead to significant fluctuations in velocity. Table 2 provides a comparison of the time taken by SCO and GPOPS to solve the three trajectories. It is evident from Table 2 that SCO demonstrates superior efficiency compared to GPOPS.

Building and Training Deep Neural Networks
To ensure more storage of trajectory samples in a limited storage space for onboard computer applications, this work utilizes the deep neural networks (DNNs) to fit the trajectory samples obtained in Section 2.
Four DNNs are designed to fit the flight path angle, angle of attack, velocity, and altitude samples depicted in Figure 4b-e, respectively. Considering the complex shapes of the flight path angle curves, angle of attack curves, velocity curves, and altitude curves, the DNNs adopts multiple hidden layers to achieve accurate fitting. The number of layers is properly tuned for balanced efficiency and effectiveness.
Due to the significant fluctuations in the flight path angle curve, a DNN with 10 hidden layers is selected to fit the flight path angle curve. For the angle of attack curve and velocity curve with relatively moderate fluctuations, the DNNs with five hidden layers are chosen for fitting. As for the altitude curve, which exhibits the slowest variation, a DNN with three hidden layers is sufficient to achieve high accuracy. The first hidden layer of each network consists of 10 neurons, and the remaining hidden layers contain five neurons.
The input of the deep neural network is designed to consist of four variables: initial velocity V0, initial trajectory angle γ0, range xf, and energy e. The output is a single variable, which corresponds to the angle of attack for the angle of attack network, flight path angle for the flight path angle network, velocity for the velocity network, and height for the height network. Therefore, each training sample for the deep neural network consists of five values: the first four are inputs, and the last one is the output, as shown in the following equation: The specific configurations and training methods for the four deep neural networks are shown in Table 3. The network structures for the four networks are depicted in Figure  11. According to Table 3, the flight path angle network only requires storing the data of 306 weights, while the angle of attack and velocity networks require storing 201 weights. The altitude network only needs to store 141 weights. Therefore, the deep neural network database requires less storage space, making it suitable for online applications in on-board computers.  In Figure 12, the training samples used for each network were generated by the sequential convex optimization (SCO) method and the GPOPS software. From Figure 12, it can be observed that in all five trainings of the angle of attack network, the SCO samples outperformed the GPOPS samples. For two times of training, for the flight path angle network, the SCO samples performed better than the GPOPS samples. For all five times of training of the velocity network, the SCO samples show superiority to the GPOPS samples. For four times of training for the altitude network, the SCO samples performed better than the GPOPS samples.
Therefore, compared to the GPOPS software, using the reference trajectory samples generated by the SCO method for training the deep neural networks consistently resulted in smaller values of the performance function F. This indicates that the reference trajectory samples generated by SCO achieve better fitting performance.

Online Trajectory Generation
Online trajectory generation is performed using the deep neural networks trained with SCO samples. The angle of attack network, flight path angle network, velocity network, and altitude network are selected based on the smallest F values shown in Figure  12. The angle of attack network is selected from the first training, the flight path angle network is selected from the third training, the velocity network is selected from the second training, and the altitude network is selected from the first training. Figure 13a-d provide a comparison between the outputs of the deep neural networks and the sample outputs for a typical operating condition with e0 = 0.996297, V0 = 300, γ0 = −2°, and xf = 55km. It can be observed that the designed multi-layer deep neural networks ensure better fitting for curves with complex shapes.   Figure 14 shows the errors between the outputs of the four deep neural networks and the sample outputs under the same set of 141,804 sample inputs. From Figure 14, we can see that for each deep neural network, the majority of the outputs have relatively small errors. Approximately 99.9% of the angle of attack network output errors are kept within 0.5°; 99.1% of the flight path angle network's output errors are within 1°; 99.6% of the velocity network's output errors are within 5 m/s, and 99.6% of the altitude network's output errors are within 150 m. This indicates that the deep neural networks achieve higher fitting accuracy and have promising potentiality in engineering applications.

Summary
This paper presents an online trajectory generation framework for the terminal-area energy management of wave rider aircraft based on sequential convex optimization (SCO) and deep neural networks (DNNs). The sequential convex programming is used to generate massive training samples for the deep neural networks. The deep neural networks are constructed separately for each control and state variable to simplify the structure and improve fitting accuracy.
The superiority of sequential convex optimization (SCO) is demonstrated through comparative studies. Training samples for the deep neural networks are obtained using both the sequential convex programming (SCO) and nonlinear programming software GPOPS. The optimization criteria for both methods are designed to ensure that the terminal states of the trajectories meet the design values and to minimize oscillations in the control variable α. Compared to GPOPS software, the trajectories obtained using sequential convex optimization (SCO) provide smoother control variable α, similar shapes among trajectories for the same state variable, and smaller solution time with higher computational efficiency. These characteristics make the trajectories obtained from sequential convex optimization suitable for training the deep neural networks.
A separate deep neural network is established for each control and state variable, including the flight path angle network, angle of attack network, velocity network, and altitude network. The number of hidden layers and neurons in each layer of the deep neural networks is designed based on the complexity of the trajectory curves to achieve higher fitting accuracy with minimal storage space usage. The calculation of output errors for the networks indicates that the generation method based on deep neural networks achieves high trajectory accuracy, with low storage space requirements, so it is more suitable for online applications in onboard computers.