1. Introduction
In recent years, hypersonic vehicles have become one of the main development directions in the aerospace field. A hypersonic vehicle flies through the atmosphere at altitudes below 90 km and at speeds above Mach 5. Under extreme and variable flight conditions, such as nonlinear aerodynamic parameters and high heat loads, the dynamical system of a hypersonic vehicle is uncertain, coupled, and highly nonlinear. Accordingly, steering a hypersonic vehicle to meet particular mission requirements is a highly constrained nonlinear optimization problem.
In general, trajectory optimization of a hypersonic vehicle is the process of designing a trajectory that minimizes (or maximizes) a performance measure while satisfying a set of constraints. Many numerical methods have been proposed to transform the continuous-time optimal control problem into an approximate, finite-dimensional optimization problem that can be solved to a prescribed accuracy. Typically, the traditional methods for solving the optimal control problem fall into two types: indirect methods and direct methods [1]. The indirect methods transform the optimal control problem into a Hamiltonian boundary value problem (HBVP) using the Pontryagin minimum principle, and a numerical solution for the optimal trajectory is obtained by solving this boundary value problem. Indirect methods have been used to solve hypersonic vehicle trajectory planning problems and can provide highly accurate solutions [2,3,4,5]. However, because indirect methods are complex to implement and highly sensitive to the initial guess, direct methods, which do not require the necessary conditions for optimality, have been widely used. Namely, the direct methods discretize and parameterize the continuous optimal control problem and use numerical methods to optimize the performance index [6]. Several popular direct methods, including the collocation method [7] and the pseudo-spectral method [8,9,10,11], have been extensively used for solving a variety of trajectory optimization problems. The direct methods offer a large convergence domain and flexible applicability to complex practical problems. However, enforcing the transformed algebraic equations at every collocation point imposes a heavy computational load, which cannot meet the computational efficiency requirements of online trajectory generation.
Due to the increasing demand for real-time operation in engineering practice, substantially improving the computation speed of the algorithms has become a challenge. Many studies have focused on improving real-time trajectory optimization based on the existing numerical methods. Antony [12] developed a graphics processing unit (GPU)-accelerated indirect trajectory optimization method using the multiple shooting method and the continuation method, which maximizes the computational efficiency while taking full advantage of the parallelism of the indirect shooting method. To improve the computational efficiency of the Chebyshev pseudo-spectral method, Wang [13] used differential flatness theory to solve the trajectory problem of hypersonic vehicles by reducing the number of dynamic differential constraints, and the results showed that the solution time of a single trajectory was effectively reduced compared with the traditional pseudo-spectral methods. In recent years, convex optimization techniques have attracted great attention due to their efficient solution and convergence properties [14,15,16,17,18,19,20]. Wang [21] proposed two improved algorithms for hypersonic vehicle reentry trajectory optimization, named line-search sequential convex optimization and trust-region sequential convex optimization, and used the predictor-corrector method to find the initial 3D trajectory, which improves the convergence of the solution process. In addition, a robust trajectory optimization method combining polynomial chaos and convex optimization techniques was proposed in [22,23]. This method exploits the high accuracy of polynomial chaos algorithms for highly nonlinear dynamics problems and the high efficiency of convex optimization algorithms for optimal control problems. However, the convexification of the trajectory planning problem is still a challenge, especially for systems with highly nonlinear dynamics and constraints. As mentioned above, most studies have improved the solution efficiency through mathematical processing using convex optimization methods, pseudo-spectral methods, or indirect methods. The improved algorithms still rely on an iterative convergence framework, in which the choice of the initial guess directly affects convergence. This dependence limits the online application of these algorithms to a certain extent.
Recently, owing to their good generalization ability and fast evaluation, many machine learning methods have been proposed for onboard application to meet the requirements for high autonomy, optimality, and real-time performance [24,25,26]. Yin [27] proposed a deep neural network (DNN)-based method for low-thrust orbit transfers, which generated optimal trajectories quickly with high computational efficiency and reliability. For online trajectory planning for Moon landings, Furfaro [28] proposed a deep convolutional neural network model to predict fuel-optimal control actions using raw images taken by onboard optical cameras. Shi [29] proposed a deep learning-based approach for real-time trajectory optimization of hypersonic vehicles, and the trained DNN was demonstrated to be capable of generating optimal control commands onboard while achieving good real-time performance and stable convergence. However, only a 2D flight dynamics model was considered, which cannot fully describe the 3D trajectories of hypersonic vehicles. Moreover, the terminal states of the trajectory planning problem were fixed, so the uncertainty of the terminal states across different flight missions was ignored.
In this study, following the success of machine learning methods in the fast generation of optimal controls, a real-time DNN-based method is proposed to solve the optimal trajectory generation problem of a three-DOF (degrees of freedom) hypersonic vehicle reentry model. The proposed method generalizes well enough to satisfy the accuracy requirements and meets the demands of online real-time trajectory optimization better than the traditional trajectory optimization methods.
The contribution of this work is threefold. First, a DNN-based optimal control method is proposed that has the potential to address the long-standing challenge of solving highly nonlinear trajectory optimization problems for hypersonic vehicles while achieving good real-time performance. Second, the pseudo-spectral method is used to generate optimal trajectories for network training efficiently. Third, extensive simulation results are provided to validate the performance of different DNN-based models in learning the nonlinear relationship needed to solve the trajectory optimization problem, and the accuracy of the trained DNN models is verified through comparison with the direct approaches. Reference [29] proposed a real-time trajectory optimization method for hypersonic vehicles based on DNN models, which is potentially capable of near-optimal control with real-time performance and stable convergence. However, that method focused only on the 2D (two-dimensional) trajectory optimization problem, and the trajectory end point was fixed. The method proposed in [29] therefore cannot handle 3D trajectory optimization with random endpoints. To solve this problem, this paper proposes a 3D real-time trajectory optimization method based on the pseudo-spectral method and DNN models, where the pseudo-spectral method is used to generate large-scale 3D optimal trajectory training data, and DNN models are designed and trained to predict optimal actions according to the flight states.
The remainder of the paper is organized as follows.
Section 2 presents a continuous-time optimal control problem of a three-dimensional (3D) hypersonic flight, with nonlinear dynamics and terminal constraints, and introduces the research idea for solving the trajectory optimization problem of hypersonic vehicles.
Section 3 describes the generation of optimal trajectory samples by the Chebyshev pseudo-spectral method.
Section 4 describes the design and training of the DNNs using these optimal trajectories.
Section 5 provides the numerical simulation results to evaluate the performance of the proposed DNN-based trajectory optimization method.
Section 6 concludes the paper and presents future work directions.
2. Materials and Methods
2.1. Three-DOF Dynamic Model Development
In this paper, the trajectory of a hypersonic vehicle is described by a three-DOF reentry motion model of a rotating sphere, where the sideslip angle is zero. The position parameters, including the geocentric distance $r$, longitude $\theta$, and latitude $\phi$, are defined in the geocentric spherical fixed coordinate system. The velocity parameters include the velocity $V$, track angle $\gamma$, and course angle $\psi$. The unpowered three-DOF reentry motion equations expressed by the above-listed parameters are as follows:
where the Earth rotation acceleration is assumed to be zero, and $g$, $\sigma$, $L$, and $D$ represent the gravitational acceleration, roll (bank) angle, lift, and drag, respectively.
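For reference, one commonly used form of the unpowered three-DOF reentry equations is sketched below. It writes $r$, $\theta$, $\phi$ for the geocentric distance, longitude, and latitude, and $V$, $\gamma$, $\psi$ for the velocity, track angle, and course angle. It assumes that the Earth-rotation terms are dropped, as stated above, that $L$ and $D$ are the lift and drag accelerations (force divided by vehicle mass), that $\sigma$ is the bank angle, and that $\psi$ is measured clockwise from north. These symbols and sign conventions are assumptions and may differ from Equations (1)–(6) of this paper.

```latex
\begin{aligned}
\dot{r} &= V \sin\gamma \\
\dot{\theta} &= \frac{V \cos\gamma \sin\psi}{r \cos\phi} \\
\dot{\phi} &= \frac{V \cos\gamma \cos\psi}{r} \\
\dot{V} &= -D - g \sin\gamma \\
\dot{\gamma} &= \frac{1}{V}\left[ L \cos\sigma + \left( \frac{V^{2}}{r} - g \right) \cos\gamma \right] \\
\dot{\psi} &= \frac{1}{V}\left[ \frac{L \sin\sigma}{\cos\gamma} + \frac{V^{2}}{r} \cos\gamma \sin\psi \tan\phi \right]
\end{aligned}
```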
To improve the efficiency of the optimization process, the unpowered three-DOF reentry model is nondimensionalized. The dimensionless geocentric distance z, velocity u, and flight time τ are, respectively, defined as:
$$z = \frac{r}{R_0}, \qquad u = \frac{V}{\sqrt{g_0 R_0}}, \qquad \tau = \frac{t}{\sqrt{R_0 / g_0}}$$
where $R_0$ is the radius of the Earth, and $g_0$ is the gravitational acceleration. The dimensionless three-DOF reentry equations can be obtained by substituting the above variables into Equations (1)–(6), which yields:
The dimensionless lift and drag are, respectively, defined as follows:
where $\rho$, $m$, $S$, $C_L$, and $C_D$ represent the air density, the mass, the aerodynamic reference area, and the lift and drag coefficients of the vehicle, respectively. The control vector consists of the generalized lift coefficient and the bank angle, and the flight trajectory can be generated once the time histories of the control vector have been designed.
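The nondimensionalization can be illustrated with a short Python sketch. It assumes the scaling $z = r/R_0$, $u = V/\sqrt{g_0 R_0}$, $\tau = t/\sqrt{R_0/g_0}$, an inverse-square gravity model, lift and drag expressed as accelerations in units of $g_0$, and the non-rotating-Earth sign conventions that are common in the reentry literature. The function names are hypothetical, and the form of the equations is an assumption rather than a transcription of the paper's dimensionless model.

```python
import math

R0 = 6378e3  # Earth radius in m (value quoted in the simulation section)
G0 = 9.8     # gravitational acceleration in m/s^2 (same section)


def to_dimensionless(r, V, t):
    """Scale geocentric distance, velocity, and time (assumed scaling)."""
    return r / R0, V / math.sqrt(G0 * R0), t / math.sqrt(R0 / G0)


def dimensionless_rhs(state, lift, drag, bank):
    """Derivatives with respect to dimensionless time.

    state = (z, theta, phi, u, gamma, psi), angles in radians;
    lift and drag are accelerations divided by g0 (assumption).
    """
    z, theta, phi, u, gamma, psi = state
    dz = u * math.sin(gamma)
    dtheta = u * math.cos(gamma) * math.sin(psi) / (z * math.cos(phi))
    dphi = u * math.cos(gamma) * math.cos(psi) / z
    du = -drag - math.sin(gamma) / z**2
    dgamma = (lift * math.cos(bank)
              + (u**2 - 1.0 / z) * math.cos(gamma) / z) / u
    dpsi = (lift * math.sin(bank) / math.cos(gamma)
            + u**2 * math.cos(gamma) * math.sin(psi) * math.tan(phi) / z) / u
    return dz, dtheta, dphi, du, dgamma, dpsi
```

A quick sanity check of this form is a circular orbit at the surface radius (z = 1, u = 1, level flight, no aerodynamic forces), for which the altitude, speed, and track angle rates all vanish.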
2.2. Problem Statement
The trajectory planning problem for a typical hypersonic vehicle is considered in this paper. It can be described as an optimization problem, the core of which is to choose optimal or suboptimal control parameters such that the objective function is minimized subject to boundary constraints, path constraints, and control constraints.
It is worth pointing out that the initial and final states in this research are treated as random, which is closer to the actual flight environment. Namely, the initial conditions $(r_0, \theta_0, \phi_0, V_0, \gamma_0, \psi_0)$, which represent the initial geocentric distance, longitude, latitude, velocity, track angle, and course angle, respectively, and the final conditions $(\theta_f, \phi_f)$, which represent the terminal longitude and latitude, respectively, are given as random values within a certain range, and the problem is solved to obtain the optimal or suboptimal trajectory for each random case.
In general, several types of performance indices exist to specify different optimization objectives, such as the maximum range, minimum heat load, and minimum time. In this paper, for a mission that must reach the desired area quickly, the total flight time $t_f$ is taken as the performance index, and the objective function is given by $J = t_f$.
The process constraints mainly include the dynamic pressure constraint, heat flow constraint, and overload constraint. In view of the severe flight environment of a hypersonic vehicle, the following constraints need to be satisfied rigorously.
2.2.1. Dynamic Pressure Constraint
Dynamic pressure is the kinetic energy of the airflow per unit volume, and the aerodynamic forces and moments acting on a hypersonic vehicle are proportional to it. Considering the influence of the dynamic pressure on the lateral stability of the control system, the dynamic pressure $q$ during reentry needs to meet the following constraint:
$$q = \tfrac{1}{2} \rho V^{2} \le q_{\max}$$
2.2.2. Heat Flow Constraint
Because the stagnation point is where the vehicle is heated most severely, the heat flow at the stagnation point is generally taken as the constraint. The heat flow constraint is given by:
2.2.3. Overload Constraint
The overload constraint needs to be considered in the reentry process for the purpose of structural safety. The overload constraint is defined as follows:
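A minimal sketch of how the three path constraints might be evaluated at a single trajectory point is given below. The dynamic pressure follows the standard definition; the stagnation-point heating law (a Chapman-type correlation with coefficient `k_q` and velocity exponent 3.15) and the total load factor are assumed model forms, not the expressions used in this paper, and all function names are hypothetical.

```python
import math


def path_constraints(rho, V, lift, drag, mass, k_q, g0=9.8):
    """Return (dynamic pressure, stagnation heat flow, load factor).

    rho in kg/m^3, V in m/s, lift and drag in N, mass in kg.
    The heating correlation and its constant k_q are assumptions.
    """
    q = 0.5 * rho * V**2                        # dynamic pressure, Pa
    heat = k_q * math.sqrt(rho) * V**3.15       # assumed heating model
    load = math.hypot(lift, drag) / (mass * g0) # total load factor, in g
    return q, heat, load


def satisfied(values, limits):
    """True when every constraint value is at or below its limit."""
    return all(v <= lim for v, lim in zip(values, limits))
```

In a pseudo-spectral transcription, a check of this kind would be imposed as inequality constraints at every collocation node rather than evaluated after the fact.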
2.3. Research Ideas
In this paper, a DNN-based real-time trajectory planning method is proposed. The whole process of the DNN-based real-time trajectory optimization is shown in Figure 1. First, the Chebyshev pseudo-spectral method is used to generate the optimal state–action samples [x, a]. In this way, the generation of large-scale optimal sample data, which is time consuming, is performed offline. Next, the discrete state and action data are normalized and interpolated, and the resulting optimal samples are fed to the neural network. Finally, the network is trained to learn the nonlinear functional relationship between the state and the action, which yields a network that outputs the optimal controls for the current flight state. With the trained deep neural network, trajectory planning and control can be performed online, since evaluating the network is computationally cheap enough for real-time output.
3. Sample Data Generation Method Based on Chebyshev Pseudo-Spectral Method
3.1. Chebyshev Pseudo-Spectral Method
The basic solution steps of the Chebyshev pseudo-spectral method are as follows. First, discretize the continuous-time state and control variables at a series of CGL (Chebyshev–Gauss–Lobatto) points and construct Lagrange interpolation polynomials with these points as nodes to approximate the true state and control. Next, approximate the time derivatives of the state variables by differentiating the global interpolation polynomials, which converts the differential equation constraints into algebraic constraints. Then, evaluate any integral terms in the performance index by Clenshaw–Curtis numerical integration. In this way, the Chebyshev pseudo-spectral method transforms the optimal control problem into an NLP (nonlinear programming) problem with a set of algebraic constraints.
Time-domain transformation:
The CGL points in the Chebyshev pseudo-spectral method lie in the interval $[-1, 1]$, so the time variable $t \in [t_0, t_f]$ is transformed to $\tau \in [-1, 1]$ as follows:
$$\tau = \frac{2t - (t_f + t_0)}{t_f - t_0}$$
Calculation of discrete nodes:
In the Chebyshev pseudo-spectral method, the discrete nodes are the extremal points of the Chebyshev polynomial of order $N$, i.e., the CGL points, which are unevenly distributed in the range of $[-1, 1]$. The standard CGL points $\tau_k$ are defined as follows:
$$\tau_k = \cos\left(\frac{\pi k}{N}\right), \quad k = 0, 1, \ldots, N$$
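The node calculation and the time-domain transformation can be sketched in a few lines of Python. The sketch assumes the common ordering $\tau_k = \cos(\pi k / N)$, which runs from +1 down to −1; the function names and the 1500 s flight time used in the usage note are illustrative assumptions.

```python
import math


def cgl_nodes(N):
    """Chebyshev-Gauss-Lobatto points tau_k = cos(pi*k/N), k = 0..N."""
    return [math.cos(math.pi * k / N) for k in range(N + 1)]


def to_tau(t, t0, tf):
    """Map physical time t in [t0, tf] to tau in [-1, 1]."""
    return (2.0 * t - (tf + t0)) / (tf - t0)


def to_physical_time(tau, t0, tf):
    """Inverse map from tau in [-1, 1] back to t in [t0, tf]."""
    return 0.5 * ((tf - t0) * tau + (tf + t0))
```

For example, `[to_physical_time(tau, 0.0, 1500.0) for tau in cgl_nodes(20)]` gives the physical times of 21 collocation nodes, clustered near the start and end of the flight.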
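Approximate interpolation of state and control variables:
The Lagrange interpolation polynomial is constructed as an approximation of the state and control variables at the $(N + 1)$ discrete points. The approximate expressions of the real state $\mathbf{x}$ and control $\mathbf{u}$ are, respectively, as follows:
$$\mathbf{x}(\tau) \approx \mathbf{x}^{N}(\tau) = \sum_{j=0}^{N} \mathbf{x}(\tau_j)\, \ell_j(\tau), \qquad \mathbf{u}(\tau) \approx \mathbf{u}^{N}(\tau) = \sum_{j=0}^{N} \mathbf{u}(\tau_j)\, \ell_j(\tau)$$
The Lagrange interpolation base function is defined as:
$$\ell_j(\tau) = \prod_{i=0,\, i \ne j}^{N} \frac{\tau - \tau_i}{\tau_j - \tau_i}$$
In Equation (22), $\tau_j$ represents the CGL points. Because $\ell_j(\tau_k)$ equals 1 for $j = k$ and 0 otherwise, the state approximation at a discrete node is equal to the actual state there, and the control approximation is equal to the actual control.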
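The interpolation step can be illustrated with a small Python sketch that evaluates the Lagrange basis in its product form and reconstructs a sampled function between the nodes. The helper names are hypothetical, and the nodes are assumed to be the CGL points $\cos(\pi k / N)$.

```python
import math


def lagrange_basis(nodes, j, tau):
    """Value at tau of the j-th Lagrange basis polynomial on the nodes."""
    value = 1.0
    for i, node in enumerate(nodes):
        if i != j:
            value *= (tau - node) / (nodes[j] - node)
    return value


def interpolate(nodes, samples, tau):
    """Interpolating polynomial through (nodes[j], samples[j]) at tau."""
    return sum(s * lagrange_basis(nodes, j, tau)
               for j, s in enumerate(samples))


# Usage: a cubic sampled at N + 1 = 5 CGL points is reproduced exactly
# (up to rounding), because the interpolant has degree N = 4.
nodes = [math.cos(math.pi * k / 4) for k in range(5)]
cubic = lambda x: x**3 - 2.0 * x + 1.0
value = interpolate(nodes, [cubic(x) for x in nodes], 0.3)
```

The product form is convenient for illustration; for large $N$ a barycentric formulation is usually preferred for numerical stability.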
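Dynamic constraint processing:
Based on Equation (20), an approximate expression of the derivative of the state vector at the node $\tau_k$ is given as:
$$\dot{\mathbf{x}}(\tau_k) \approx \sum_{j=0}^{N} D_{kj}\, \mathbf{x}(\tau_j)$$
where $D_{kj}$ represents the element in row $k$ and column $j$ of the $(N+1) \times (N+1)$ differentiation matrix $\mathbf{D}$ that, for nodes ordered as $\tau_k = \cos(\pi k / N)$, is expressed as:
$$D_{kj} = \begin{cases} \dfrac{c_k}{c_j} \dfrac{(-1)^{k+j}}{\tau_k - \tau_j}, & k \ne j \\[2mm] -\dfrac{\tau_k}{2\left(1 - \tau_k^{2}\right)}, & 1 \le k = j \le N - 1 \\[2mm] \dfrac{2N^{2} + 1}{6}, & k = j = 0 \\[2mm] -\dfrac{2N^{2} + 1}{6}, & k = j = N \end{cases}$$
with $c_0 = c_N = 2$ and $c_k = 1$ otherwise.
The time derivatives of the state variables are obtained from Equation (23) at the interpolation nodes. Thus, the differential equation constraints of the original optimal control problem are converted to algebraic constraints for $k = 0, 1, \ldots, N$ as follows:
$$\sum_{j=0}^{N} D_{kj}\, \mathbf{x}(\tau_j) - \frac{t_f - t_0}{2}\, \mathbf{f}\big(\mathbf{x}(\tau_k), \mathbf{u}(\tau_k)\big) = \mathbf{0}$$
where
f represents the state equation of the system. The process constraints must likewise be satisfied strictly at the discrete nodes.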
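As an illustration of this step, the Chebyshev differentiation matrix can be assembled in pure Python as sketched below. The sketch assumes nodes ordered as $\tau_k = \cos(\pi k / N)$ and fills the diagonal from the zero-row-sum property (the derivative of a constant is zero), which is a common numerically stable alternative to the closed-form diagonal entries. Function names are hypothetical.

```python
import math


def cgl_nodes(N):
    """CGL points tau_k = cos(pi*k/N), ordered from +1 down to -1."""
    return [math.cos(math.pi * k / N) for k in range(N + 1)]


def chebyshev_diff_matrix(N):
    """(N+1) x (N+1) differentiation matrix on the CGL nodes."""
    tau = cgl_nodes(N)
    c = [2.0 if k in (0, N) else 1.0 for k in range(N + 1)]
    D = [[0.0] * (N + 1) for _ in range(N + 1)]
    for k in range(N + 1):
        for j in range(N + 1):
            if k != j:
                D[k][j] = (c[k] / c[j]) * (-1.0) ** (k + j) / (tau[k] - tau[j])
        # Diagonal entry chosen so that each row sums to zero.
        D[k][k] = -sum(D[k])
    return D


def differentiate(D, samples):
    """Approximate derivative values at the nodes: D times samples."""
    return [sum(d * s for d, s in zip(row, samples)) for row in D]
```

Applied to samples of a polynomial of degree at most $N$, `differentiate` returns the exact derivative at the nodes up to rounding. The collocation residual at node $k$ is then the $k$-th entry of this product minus $(t_f - t_0)/2$ times the state equation evaluated at that node.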
Approximate integration of performance indicators:
When the performance index contains an integral term, Clenshaw–Curtis numerical integration can be used to approximate it. For a continuous function $p(\tau)$ over the interval $[-1, 1]$, the integral can be approximated by a weighted sum of the function values at the $(N + 1)$ CGL points as follows:
$$\int_{-1}^{1} p(\tau)\, d\tau \approx \sum_{k=0}^{N} w_k\, p(\tau_k)$$
where $w_k$ denotes the Clenshaw–Curtis weight.
where
is the first-order derivative of the conformal map,
is the Clenshaw–Curtis weight, and it holds that:
In Equation (28), the double prime on the summation symbol indicates that the first and last terms of the sum are halved.
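A minimal Python sketch of the quadrature step is given below. It computes the standard Clenshaw–Curtis weights on the points $\cos(\pi k / N)$ using a well-known cosine-sum formulation; it does not include the conformal-map factor mentioned above, and the function names are hypothetical.

```python
import math


def clenshaw_curtis_weights(N):
    """Clenshaw-Curtis weights for the N + 1 points cos(pi*k/N)."""
    theta = [math.pi * k / N for k in range(N + 1)]
    w = [0.0] * (N + 1)
    v = [1.0] * (N + 1)  # only interior entries 1..N-1 are used
    if N % 2 == 0:
        w[0] = w[N] = 1.0 / (N * N - 1)
        for k in range(1, N // 2):
            for i in range(1, N):
                v[i] -= 2.0 * math.cos(2 * k * theta[i]) / (4 * k * k - 1)
        for i in range(1, N):
            v[i] -= math.cos(N * theta[i]) / (N * N - 1)
    else:
        w[0] = w[N] = 1.0 / (N * N)
        for k in range(1, (N - 1) // 2 + 1):
            for i in range(1, N):
                v[i] -= 2.0 * math.cos(2 * k * theta[i]) / (4 * k * k - 1)
    for i in range(1, N):
        w[i] = 2.0 * v[i] / N
    return w


def integrate(func, N):
    """Approximate the integral of func over [-1, 1]."""
    nodes = [math.cos(math.pi * k / N) for k in range(N + 1)]
    return sum(wk * func(x) for wk, x in zip(clenshaw_curtis_weights(N), nodes))
```

For $N = 2$ the weights reduce to Simpson's rule (1/3, 4/3, 1/3). An integral over the physical interval $[t_0, t_f]$ is obtained by multiplying the result by $(t_f - t_0)/2$.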
3.2. Training Data Generation
The pseudo-spectral method was used to generate a large number of optimal trajectories. The minimum flight time of the hypersonic vehicle was the optimization objective, and the generalized lift coefficient and bank angle were the variables to be optimized.
Optimal trajectories generated with random initial and terminal states:
Because the flight missions of hypersonic vehicles vary, the initial and terminal points cannot be determined before take-off, and the vehicle needs to generate optimal controls according to the current mission and flight environment. To address autonomous planning in uncertain flight environments, the trajectory generator must be robust enough to produce an optimal or suboptimal trajectory for uncertain initial and terminal states. In this paper, a deep neural network is developed to serve as a real-time trajectory generator with high accuracy and strong stability. A sufficient number of optimal trajectory samples is required to train the deep neural network to predict the optimal controls. Therefore, for the training data generation, the initial and terminal states of each sample trajectory are chosen randomly within a certain range, and a large number of optimal trajectories are generated using the Chebyshev pseudo-spectral method.
The generation of massive optimal trajectories:
For each optimal trajectory generated by the pseudo-spectral method, the optimal discrete sequences of control and state variables are obtained at the discrete CGL time points. To obtain more optimal state–action pairs as training samples, random initial and terminal states are set for the Chebyshev pseudo-spectral method. Because the time grids differ from one optimal trajectory to another, each optimal trajectory is interpolated in time.
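The time interpolation can be sketched as a resampling of each trajectory from its non-uniform node times onto a uniform grid. The sketch below uses piecewise-linear interpolation, which is an assumption since the interpolation scheme is not specified here; the function name is hypothetical, and the node times are expected in ascending order (CGL nodes generated as $\cos(\pi k/N)$ would need to be reversed first).

```python
def resample_uniform(times, values, dt=1.0):
    """Linearly resample (times, values) onto a grid with spacing dt.

    times must be strictly increasing; the grid starts at times[0]
    and stops at the last multiple of dt not exceeding times[-1].
    """
    n_steps = int((times[-1] - times[0]) / dt + 1e-9)
    grid = [times[0] + i * dt for i in range(n_steps + 1)]
    resampled, seg = [], 0
    for t in grid:
        # Advance to the segment [times[seg], times[seg + 1]] containing t.
        while seg < len(times) - 2 and t > times[seg + 1]:
            seg += 1
        t_a, t_b = times[seg], times[seg + 1]
        frac = (t - t_a) / (t_b - t_a)
        resampled.append(values[seg] + frac * (values[seg + 1] - values[seg]))
    return grid, resampled
```

Each state and control channel of a trajectory would be passed through such a routine with the same grid, so that every row of the training set pairs a state with the control at the same instant.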
4. Neural Network Design and Training
The DNN is proposed to predict the optimal trajectory control actions for a hypersonic vehicle based on its flight mission and current state. The proposed DNN is designed as a fully connected, feed-forward neural network with one input layer, multiple hidden layers, and one output layer. The neural network input consists of six current position state quantities, six trajectory start position state quantities, and six terminal position state quantities, so the input has 18 components. The neural network output consists of the two trajectory control variables, the generalized lift coefficient and the bank angle.
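The structure described above can be sketched as a plain-Python forward pass. The layer widths, the uniform weight initialization, and the linear output layer are illustrative assumptions; only the 18-component input, the two outputs, and the sigmoid hidden activation come from the text, and the helper names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Sigmoid hidden layers followed by a linear output layer (assumed)."""
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        x = z if is_output else [sigmoid(v) for v in z]
    return x


# Usage: 18 normalized inputs -> two controls (lift coefficient, bank angle).
network = init_mlp([18, 16, 16, 2])
controls = forward(network, [0.5] * 18)
```

A practical implementation would use a deep learning framework with vectorized layers; the point here is only the input-to-output mapping that the trained network evaluates online.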
It is worth pointing out that the input and output of the network should be normalized for effective training and fast convergence. The normalization process was as follows:
$$\bar{X} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X$ denotes the training dataset, $X_{\max}$ and $X_{\min}$ denote the maximum and minimum in $X$, respectively, and $\bar{X}$ is the normalized training dataset.
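A minimal sketch of this normalization, assuming it is a per-feature min-max scaling to the range [0, 1], is shown below. The function names are hypothetical. The inverse map is included because the network outputs must be converted back to physical controls at run time.

```python
def fit_min_max(rows):
    """Per-column minimum and maximum of a list of equal-length rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(rows, lo, hi):
    """Scale every column to [0, 1]; constant columns map to 0."""
    return [[(x - a) / (b - a) if b > a else 0.0
             for x, a, b in zip(row, lo, hi)] for row in rows]


def denormalize(rows, lo, hi):
    """Inverse of normalize for non-constant columns."""
    return [[x * (b - a) + a for x, a, b in zip(row, lo, hi)] for row in rows]
```

The minimum and maximum should be computed on the training set only and then reused unchanged for the test set and for online prediction.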
The activation function in the neural network model is the sigmoid function, which performs better than the ReLU function for this problem. The Adam optimizer was used for its high computational efficiency. The loss value was the mean squared error between the predicted and expected outputs and was calculated by:
$$\text{Loss} = \frac{1}{N_s} \sum_{i=1}^{N_s} \left( \hat{y}_i - y_i \right)^{2}$$
where $N_s$ is the total number of training samples, and $\hat{y}_i$ and $y_i$ are the predicted and true values, respectively.
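The mean squared error can be written directly from its definition, as in the short sketch below (the function name is hypothetical).

```python
def mean_squared_error(predicted, target):
    """Average of squared differences between predictions and targets."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, target)) / len(predicted)
```

With two outputs per sample, the same average would be taken over all output components of all samples.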
The flowchart of the neural network training process is shown in Figure 2.
The pseudo-code of Algorithm 1 used in this paper is shown below. The trainable parameters are the weights and biases of the neural network, and the hyperparameters are the learning rate, the total number of training batches, and the number of samples contained in a training batch; the number of training iterations per training batch is the total number of training samples divided by the batch size, rounded up. Each training batch is identified by an index. The network input contains the initial state, the final state, and the current state, and the network output comprises the generalized lift coefficient and the bank angle. The algorithm also uses the range angle, and the subfunction environment() represents the hypersonic vehicle reentry model, whose input is the current control and whose output is the state at the next moment.
Algorithm 1 Imitation learning
1: Initialize the network weights and biases
2: Set the hyperparameters
3: for each training batch do
4: for each training iteration in the batch do
5: obtain the optimal sequence of the pseudo-spectral method trajectory
6: data feature extraction and normalization
7: update network parameters using Adam algorithm
8: end for
9: Randomly generate a trajectory by the pseudo-spectral method and set up the data buffer ℜ
10: do
11:
12:
13: , to , update
14: end
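The parameter update in step 7 can be illustrated with a plain-Python sketch of one Adam step in its commonly published form. The default hyperparameters shown are the usual ones for Adam and are assumptions, since the values used in this work are not stated here; the function name is hypothetical.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for flat lists of parameters and gradients.

    m and v are the running first and second moment estimates, and
    t is the 1-based step counter used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        m_hat = m_i / (1.0 - beta1 ** t)   # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)   # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is relatively insensitive to the scaling of the loss.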
5. Simulations and Result Analysis
Experiments were conducted to verify the effectiveness and generalization ability of the proposed neural network. The model of a hypersonic gliding vehicle, the high-lift common aero vehicle (CAV-H), was used to test the effectiveness of the proposed algorithm. The mass of the CAV-H was 907 kg, and its aerodynamic reference area was 0.4839 m². The CAV-H had a high maximum lift-to-drag ratio of E* = 3.24, and the corresponding lift coefficient was 0.45. The pneumatic reference area was = 0.8. The gravitational acceleration was $g_0$ = 9.8 m/s², and the Earth radius was taken as $R_0$ = 6378 km.
The parameters of the starting and terminal points of the glide phase of the hypersonic vehicle are given in Table 1. The constraints that the trajectory optimization needs to meet are listed in Table 2.
In Table 2, $\dot{Q}_{\max}$ denotes the maximum heat flow density, $q_{\max}$ represents the maximum dynamic pressure, and $n_{\max}$ is the maximum normal overload.
5.1. Generation of the Training Data
The Chebyshev pseudo-spectral method was used to generate 5000 trajectories, and the time histories of the geocentric distance, longitude, latitude, velocity, and the control variables (generalized lift coefficient and bank angle) are shown in Figure 3 and Figure 4. The trajectory data were interpolated to obtain the states at 1-s intervals, and the 5000 trajectories were combined into a single dataset of approximately 7.5 million state samples.
5.2. Training Process of the DNN
The loss value over 10,000 training epochs is shown in Figure 5. When the neural network was trained using the sigmoid activation function, the loss value converged quickly to about 0.001. The data of 5000 trajectories were divided into a training set consisting of 4000 trajectories and a test set consisting of 1000 trajectories. In addition, the sigmoid and ReLU activation functions were compared.
The loss values for the ReLU and sigmoid activation functions are both shown in Figure 5. The loss values were larger on the test set, but the overall loss value was stable and at a relatively low level. The loss value on the test set was near 0.001 for the sigmoid function but above 0.05 for the ReLU function. Thus, the sigmoid activation function, which drove the loss to a smaller value, was chosen as the activation function for the network.
5.3. Random Single Trajectory Error Analysis
In the simulations, the initial and terminal states of the trajectory were randomly generated within a certain range, and the state sequence was used as the network input. The trained deep neural network was used to predict the values of the trajectory control variables (generalized lift coefficient and bank angle), and the predicted values were compared with the expected values obtained by the pseudo-spectral method to verify the effectiveness of the neural network. The comparison results are shown in Figure 6 and Figure 7. The predicted and expected output values coincided well during the whole flight, and the error was generally below 0.02, which verified the capability of the deep neural network in online planning and in predicting the generalized lift coefficient and bank angle.
5.4. Validation with Vehicle Dynamics Model
The three-DOF model of the hypersonic vehicle reentry phase was used to further verify the prediction performance of the proposed model. The neural network consisted of eight layers, each of which had 500 neurons. There were 40 batches in total in training, and the number of samples per batch was set to 256. A single trajectory was taken as an example: a random trajectory was generated by the pseudo-spectral method, the start and end conditions used by the pseudo-spectral method were fed to the trained neural network, and the flight path computed by the pseudo-spectral method was compared with that predicted by the neural network to analyze the output error of the neural network model. The generalized lift coefficient and bank angle are presented in Figure 8 and Figure 9, respectively, where it can be seen that the predicted and computed values coincided well. The error curves of the generalized lift coefficient and bank angle are presented in Figure 10 and Figure 11. Based on the results, the error of the generalized lift coefficient was within ±0.01, and the error of the bank angle was within ±0.02°. The numerical values of the errors of the neural network prediction are given in Table 3. As shown in Table 4, the geocentric distance error was within 1 km, the longitude and latitude errors were 0.1° and 0.03°, respectively, and the velocity error was 4 m/s.
5.5. Monte Carlo Simulation Verification
To demonstrate the generalization ability of the developed neural network model, a Monte Carlo trajectory simulation and error analysis were carried out. In the simulations, random initial and terminal trajectory states were used, and there were 1000 target trajectories. The Monte Carlo simulation was performed using the online planning method based on the neural network.
The analysis results are shown in the following table.
6. Conclusions
In this study, a deep neural network-based method is developed to achieve fast prediction of optimal trajectories for a hypersonic vehicle. First, the reentry phase of a hypersonic vehicle is formulated as an optimal control problem, and the pseudo-spectral method is used to provide optimal solutions for DNN training. The DNN model is then tuned on the test set with respect to the numbers of layers and neurons, the learning rate, and the activation function. Based on the tuned DNN model, the DNN-based method is employed to solve the optimal trajectory problem. The proposed method is verified by numerical simulations, and the results demonstrate that the DNN-based method offers fast solution speed and stable convergence.
The proposed method provides an original idea for the online trajectory optimization of a hypersonic vehicle, and the optimization of the entire trajectory can be accomplished accurately in only a few seconds. The proposed method can also be applied to other models in the aerospace field, such as lunar landing and asteroid detection models. In future work, more complex flight missions and more rigorous constraints, including no-fly zones, will be considered to verify the effectiveness of the proposed method. We will also adopt more elaborate network structures to enhance the learning accuracy.