3.1. Generation of Sample Set
Because low-cost hypersonic glide vehicles are produced in large numbers, the actual thrust of individual booster engines deviates randomly from the nominal thrust during the boost phase, which introduces uncertainty into the initial glide state. In this paper, based on the model in Section 2.1, the hp-adaptive pseudospectral method [32] is used to solve the trajectory optimization problem established in Section 2.3, with height and speed as the dispersed initial conditions.
The name hp comes from the h method (refining the mesh elements) and the p method (increasing the polynomial order within an element) of the finite element method. The hp-adaptive pseudospectral method consists of two main parts: discretization and mesh refinement. First, the optimal control problem in Section 2.3 is discretized and parameterized.
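The time interval $[t_0, t_f]$ is divided into $K$ subintervals, $[t_{k-1}, t_k]$ is used to represent the $k$-th grid, and $N_k$ is the number of collocation points of the $k$-th subinterval. Each subinterval is then mapped to $\tau \in [-1, 1]$ by the following time domain transformation:
$$\tau = \frac{2t - (t_k + t_{k-1})}{t_k - t_{k-1}}$$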
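The time domain transformation can be illustrated with a minimal sketch. It assumes the usual affine map between a subinterval and the standard interval [-1, 1]; the function and argument names (`to_tau`, `to_time`, `t_prev`, `t_next`) are illustrative and do not come from the paper.

```python
def to_tau(t, t_prev, t_next):
    """Map physical time t in [t_prev, t_next] to tau in [-1, 1]."""
    return (2.0 * t - (t_next + t_prev)) / (t_next - t_prev)


def to_time(tau, t_prev, t_next):
    """Inverse map: tau in [-1, 1] back to physical time in [t_prev, t_next]."""
    return 0.5 * ((t_next - t_prev) * tau + (t_next + t_prev))
```

The inverse map is what a solver would use to evaluate the dynamics at the physical time that corresponds to each collocation point.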
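The state variable in the $k$-th subinterval can be expressed as
$$x^{(k)}(\tau) \approx X^{(k)}(\tau) = \sum_{j=0}^{N_k} X_j^{(k)} L_j^{(k)}(\tau)$$
where the Lagrange interpolation polynomial is
$$L_j^{(k)}(\tau) = \prod_{l=0,\, l \neq j}^{N_k} \frac{\tau - \tau_l^{(k)}}{\tau_j^{(k)} - \tau_l^{(k)}}$$
In these formulas, $X^{(k)}(\tau)$ is the approximate value of the state on the $k$-th grid, which is a function of $\tau$; $N_k$ is the number of Legendre-Gauss collocation points on the $k$-th grid; $\tau_j^{(k)}$ is the $j$-th node of the $k$-th grid; $X_j^{(k)}$ is the value of the state variable at $\tau_j^{(k)}$; and $L_j^{(k)}(\tau)$ is the Lagrange polynomial associated with $\tau_j^{(k)}$.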
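Then the system state equation on the $k$-th subinterval $[t_{k-1}, t_k]$ can be expressed as
$$\sum_{j=0}^{N_k} D_{ij}^{(k)} X_j^{(k)} = \frac{t_k - t_{k-1}}{2}\, f\left(X_i^{(k)}, U_i^{(k)}, \tau_i^{(k)}\right), \quad i = 1, \ldots, N_k \tag{21}$$
In Formula (21), $D^{(k)}$ is the Gauss pseudospectral differential matrix in the $k$-th subinterval, which is a matrix of size $N_k \times (N_k + 1)$, and $U_i^{(k)}$ is the value of the control at $\tau_i^{(k)}$.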
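As an illustration of the differential matrix, the sketch below builds it for one subinterval by differentiating the Lagrange basis at the Legendre-Gauss points. It assumes the standard Gauss pseudospectral layout, in which the interpolation nodes are the N collocation points plus the non-collocated point tau = -1; the function name `gauss_differentiation_matrix` is hypothetical.

```python
import numpy as np


def gauss_differentiation_matrix(n):
    """Return interpolation nodes, LG weights and the n x (n+1) differentiation matrix."""
    tau_lg, weights = np.polynomial.legendre.leggauss(n)  # n LG points in (-1, 1)
    nodes = np.concatenate(([-1.0], tau_lg))              # prepend the node tau_0 = -1
    D = np.zeros((n, n + 1))
    for i in range(1, n + 1):          # rows: collocation points tau_1..tau_n
        for j in range(n + 1):         # columns: interpolation nodes tau_0..tau_n
            if i == j:
                # derivative of L_j evaluated at its own node
                D[i - 1, j] = sum(1.0 / (nodes[j] - nodes[l])
                                  for l in range(n + 1) if l != j)
            else:
                # derivative of L_j evaluated at another node tau_i
                num = np.prod([nodes[i] - nodes[l]
                               for l in range(n + 1) if l not in (i, j)])
                den = np.prod([nodes[j] - nodes[l]
                               for l in range(n + 1) if l != j])
                D[i - 1, j] = num / den
    return nodes, weights, D
```

Multiplying this matrix by the state values at the nodes gives the derivative of the interpolating polynomial at the collocation points, which is the left-hand side of the discretized state equation.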
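In this paper, only integral-type performance indices are considered. The integral term of the performance index is approximated by Gauss quadrature over the $K$ subintervals $[t_{k-1}, t_k]$, and the objective function can be expressed as follows:
$$J = \sum_{k=1}^{K} \frac{t_k - t_{k-1}}{2} \sum_{i=1}^{N_k} w_i^{(k)}\, g\left(X_i^{(k)}, U_i^{(k)}, \tau_i^{(k)}\right)$$
where $w_i^{(k)}$ is the Gaussian weight and $g$ is the original integral term.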
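The Gauss quadrature of the integral term can be sketched as follows. For simplicity the integrand here is a hypothetical function of time only, whereas in the paper it depends on the state and control; `integral_cost`, `mesh` and `n_points` are illustrative names.

```python
import numpy as np


def integral_cost(g, mesh, n_points):
    """Approximate the integral of g over [mesh[0], mesh[-1]] by piecewise Gauss quadrature.

    mesh     : increasing sequence of subinterval endpoints t_0 < t_1 < ... < t_K
    n_points : number of Legendre-Gauss points used in each of the K subintervals
    """
    total = 0.0
    for t_a, t_b, n in zip(mesh[:-1], mesh[1:], n_points):
        tau, w = np.polynomial.legendre.leggauss(n)        # points and weights on [-1, 1]
        t = 0.5 * ((t_b - t_a) * tau + (t_b + t_a))        # map points to [t_a, t_b]
        total += 0.5 * (t_b - t_a) * np.dot(w, g(t))       # scale by the half-length
    return total
```

For example, `integral_cost(lambda t: t**2, [0.0, 1.0, 3.0], [3, 4])` reproduces the exact integral of t squared over [0, 3] to rounding error, because Gauss quadrature with n points is exact for polynomials up to degree 2n - 1.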
After discretization, the optimal control problem is transformed into a nonlinear programming problem. The core of the hp adaptive pseudospectral method is to determine the values of h and p, that is, the number of subintervals and the order of the interpolation polynomial.
The degree to which the constraints are satisfied is then evaluated. A maximum allowable deviation $\varepsilon$ is set, and the error at the collocation points is tested against it. If the error is less than $\varepsilon$, the solution obtained in that interval is considered feasible. If the deviation is greater than $\varepsilon$, the subinterval must be subdivided or the order of the interpolation polynomial increased.
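When the grid needs to be refined again, it must first be determined whether p or h should be increased. Let the curvature function of the $m$-th state component in the $k$-th mesh be
$$\kappa^{(k)}(\tau) = \frac{\left|\ddot{X}_m^{(k)}(\tau)\right|}{\left[1 + \dot{X}_m^{(k)}(\tau)^2\right]^{3/2}}$$
The maximum and average values of the curvature function are $\kappa_{\max}^{(k)}$ and $\bar{\kappa}^{(k)}$, respectively. Let
$$r_k = \frac{\kappa_{\max}^{(k)}}{\bar{\kappa}^{(k)}}$$
Define a judgment indicator $r_{\max}$. If $r_k < r_{\max}$, then p is increased; otherwise h is increased.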
After the original optimal control problem has been discretized, the sequential quadratic programming (SQP) algorithm is used to solve it. The core idea is that, at each iterate, the original problem is approximated by a quadratic programming subproblem; the search direction is obtained by solving this subproblem, and a one-dimensional (line) search is then performed, with the iteration repeated until convergence. The solution process is described in [33].
This paper takes the trajectory optimization problem as the starting point and specifies the optimal control problem to be solved together with its constraints. On this basis, the initial height and velocity of the aircraft are dispersed to construct a range of initial states that covers the conditions expected in actual missions. Each dispersion case is then solved numerically with the pseudospectral method: the continuous optimal control problem is discretized into a nonlinear programming problem, and the optimal trajectory satisfying the accuracy requirement is obtained with GPOPS [34] in MATLAB. After each solution, the state sequence and the control sequence are extracted to form the data of a single trajectory. Finally, the state and control sequences of all dispersion cases are collected into a trajectory data set, which is characterized by the number of data points contained in a single trajectory and the total number of trajectories. The solution process is shown in Figure 3.
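The sample generation loop can be outlined with the following sketch. The function `solve_trajectory` is a hypothetical stand-in for the pseudospectral solve of one dispersion case and returns synthetic arrays of the right shape only; the height and speed values are made-up illustrative numbers, not the dispersion ranges used in the paper.

```python
import itertools
import numpy as np


def solve_trajectory(h0, v0, n_nodes=20):
    """Hypothetical placeholder for the optimal-trajectory solver.

    Returns synthetic state and control histories with the expected shapes;
    a real run would return the optimized sequences for this initial state.
    """
    s = np.linspace(0.0, 1.0, n_nodes)[:, None]
    states = (1.0 - s) * np.array([[h0, v0]])   # (n_nodes, 2), not a physical trajectory
    controls = np.zeros((n_nodes, 1))           # (n_nodes, 1)
    return states, controls


def build_dataset(heights, speeds, n_nodes=20):
    """Solve every (height, speed) dispersion case and stack the results."""
    inputs, outputs = [], []
    for h0, v0 in itertools.product(heights, speeds):
        states, controls = solve_trajectory(h0, v0, n_nodes)
        inputs.append([h0, v0])
        # one row per trajectory: flattened state sequence followed by control sequence
        outputs.append(np.concatenate([states.ravel(), controls.ravel()]))
    return np.array(inputs), np.array(outputs)


heights = np.linspace(58e3, 62e3, 5)     # illustrative initial heights (m)
speeds = np.linspace(5800.0, 6200.0, 4)  # illustrative initial speeds (m/s)
X, Y = build_dataset(heights, speeds)
```

Each row of `X` holds one dispersed initial condition and the matching row of `Y` holds the corresponding trajectory, which is the layout a regression model needs.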
3.2. Establishment and Training of RBF Neural Network Model
The radial basis function neural network (RBFNN) is a machine learning model with strong nonlinear mapping ability. Its core idea is to approximate complex nonlinear functions with locally responding neurons. It adopts a three-layer feedforward structure. First, the input layer passes the original data to the network. The hidden layer then applies radial basis functions to transform the data nonlinearly. Each hidden-layer neuron represents a center, and a neuron is activated and responds only when the input is close to its center. The response intensity decays rapidly with increasing distance, which reflects the local perception of the network. Finally, the output layer forms a linear weighted sum of the responses of all hidden-layer neurons to obtain the final result.
Figure 4 shows the structure of the trajectory generation neural network.
The units of the hidden layer are activated by basis functions; in this paper, the Gaussian basis function is used. The output of the $j$-th hidden-layer node is:
$$\varphi_j(x_i) = \exp\left(-\frac{\|x_i - c_j\|^2}{2\sigma^2}\right)$$
where $x_i$ is the $i$-th input data, $c_j$ is the basis function center of the hidden node, and $\sigma$ is the width of the radial basis function, also known as the spread factor. The center $c_j$ determines the position of the function, and the width $\sigma$ determines its range of action. The radial basis function is plotted in Figure 5.
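The output layer linearly combines the outputs of the radial basis function hidden layer to generate the expected output. The output of the $l$-th output-layer node is:
$$y_l = \sum_{j=1}^{M} w_{jl}\, \varphi_j(x_i)$$
where $\varphi_j$ is the output of the $j$-th hidden node, $M$ is the number of hidden nodes, and $w_{jl}$ is the weight.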
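A forward pass through such a network can be sketched in a few lines. The sketch assumes the Gaussian form exp(-d^2 / (2 sigma^2)) with a single shared width and no output bias; other scalings of the width are also in use, and the function names are illustrative.

```python
import numpy as np


def rbf_hidden(X, centers, sigma):
    """Hidden-layer responses: one Gaussian per center, shape (n_samples, n_centers)."""
    # squared Euclidean distance between every input row and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))


def rbf_forward(X, centers, sigma, W):
    """Network output: linear combination of the hidden responses."""
    return rbf_hidden(X, centers, sigma) @ W
```

An input that coincides with a center produces a response of exactly 1 for that neuron, and the response falls to exp(-0.5) at a distance of one width, which is the local behaviour described above.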
From the database, 70% of the samples are drawn at random as the training set for the RBF neural network. The remaining 30% are used as test samples to check the accuracy of the network. Because the input and output data differ greatly in order of magnitude, which would degrade the accuracy of the trajectory data, both the input data and the output data are normalized before training.
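The split and normalization can be sketched as follows. The sketch assumes z-score normalization with statistics computed on the training portion only, which is one reasonable choice rather than necessarily the paper's; `split_and_normalize` and `denormalize` are illustrative names.

```python
import numpy as np


def split_and_normalize(X, Y, train_frac=0.7, seed=0):
    """Random train/test split followed by z-score normalization."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    tr, te = idx[:n_train], idx[n_train:]

    # statistics come from the training set so the test set stays unseen
    mu_x, sd_x = X[tr].mean(axis=0), X[tr].std(axis=0)
    mu_y, sd_y = Y[tr].mean(axis=0), Y[tr].std(axis=0)
    sd_x = np.where(sd_x == 0.0, 1.0, sd_x)   # guard against constant columns
    sd_y = np.where(sd_y == 0.0, 1.0, sd_y)

    return {
        "train_idx": tr, "test_idx": te,
        "X_train": (X[tr] - mu_x) / sd_x, "Y_train": (Y[tr] - mu_y) / sd_y,
        "X_test": (X[te] - mu_x) / sd_x, "Y_test": Y[te],
        "mu_x": mu_x, "sd_x": sd_x, "mu_y": mu_y, "sd_y": sd_y,
    }


def denormalize(Y_norm, mu_y, sd_y):
    """Map normalized network outputs back to physical units."""
    return Y_norm * sd_y + mu_y
```

Keeping the means and standard deviations alongside the data matters because the same values are needed later to normalize new inputs and to convert network outputs back to physical units.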
The key to training an RBF neural network lies in selecting the centers of its basis functions and computing the weights. In this paper, the network is trained with the orthogonal least squares method of Chen [35], and the parameters are updated according to the output data matrix.
Let the hidden layer output matrix be $\Phi$ with $M$ column vectors; the target output $Y$ can be linearly represented by the $M$ column vectors of $\Phi$. However, because the contributions of the $M$ column vectors of $\Phi$ to $Y$ differ markedly, $M_s$ vectors can be selected in turn, in order of contribution, to form $\Phi_s$ so that the error requirement is met, that is
$$\left\| Y - \Phi_s W \right\| < \varepsilon$$
In the formula, $W$ is the optimal weight that satisfies the error requirement. Different choices of $\Phi_s$ give different approximation errors. Once $\Phi_s$ is determined, the data centers of the RBFNN are determined. The weight matrix from the hidden layer to the output layer can then be obtained by solving the (pseudo-)inverse.
The training process of the RBF neural network is as follows. The data are first normalized by subtracting the mean from each value and dividing by the standard deviation. The orthogonal least squares method then adds neurons one at a time, each time selecting the candidate center that gives the largest reduction in error. Training terminates when the network output error falls below the specified tolerance or the number of neurons reaches its upper limit.
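The greedy center selection can be sketched as below. This is a simplified forward selection in the spirit of orthogonal least squares, not a reproduction of the cited algorithm: every training input is treated as a candidate center, candidates are deflated against the selected ones, and the final weights are found by ordinary least squares. The names `gaussian_design`, `ols_train`, `max_neurons` and `tol` are illustrative, and `tol` is taken here as a relative residual energy.

```python
import numpy as np


def gaussian_design(X, centers, sigma):
    """Gaussian hidden-layer outputs, shape (n_samples, n_centers)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))


def ols_train(X, Y, sigma, max_neurons, tol=1e-6):
    """Greedy center selection; returns the chosen centers and output weights."""
    P = gaussian_design(X, X, sigma)   # candidate regressors, one per training input
    R = P.copy()                       # candidates orthogonalized against chosen ones
    E = Y.astype(float).copy()         # current residual of the targets
    energy = (Y**2).sum()
    selected = []
    while len(selected) < max_neurons and (E**2).sum() / energy > tol:
        norms = (R**2).sum(axis=0)
        gains = np.full(norms.shape, -np.inf)
        ok = norms > 1e-12             # skip candidates already spanned by the selection
        gains[ok] = ((R[:, ok].T @ E) ** 2).sum(axis=1) / norms[ok]
        j = int(np.argmax(gains))      # candidate with the largest error reduction
        if not np.isfinite(gains[j]):
            break
        q = R[:, j] / np.sqrt(norms[j])
        R -= np.outer(q, q @ R)        # remove the chosen direction from all candidates
        E -= np.outer(q, q @ E)        # and from the residual
        selected.append(j)
    W, *_ = np.linalg.lstsq(P[:, selected], Y, rcond=None)
    return X[selected], W
```

Predictions for new normalized inputs `X_new` are then `gaussian_design(X_new, centers, sigma) @ W`. Selecting centers by error reduction is what keeps the network small: neurons that explain little of the target are never added.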
In this paper, cross-validation with a grid search over the spread factor and the number of neurons is used to select the hyperparameters. When the spread factor is too small, the network overfits and the test-set error increases; when it is too large, the network response becomes overly smooth and cannot capture complex dynamic characteristics. Similarly, when the number of neurons is less than 100, the model underfits; beyond 250, the training time increases significantly while the accuracy improvement is limited. Weighing model accuracy against the efficiency of online generation, the final spread factor and number of neurons are chosen so that the network has the smallest generalization error on the test set while maintaining a lightweight structure.
The input layer of the above RBF neural network is set to the dispersed height and speed data, with one height-speed pair for each discrete point. The output layer is set to the state sequence and control sequence from the current state to the terminal point; its size is determined by the number of data points contained in a single trajectory, and the number of samples by the total number of trajectories. The training pseudo-code of the radial basis neural network trajectory generation model is shown in Table 1. It uses the following quantities: the unnormalized input and output matrices for training; the normalized input and output matrices for training; the means of the training input and output matrices; the standard deviations of the training input and output matrices; the normalized input matrix for testing; the normalized output matrix of the neural network; the output matrix of the neural network after inverse normalization; the output matrix for testing; the width of the radial basis function; and the maximum number of neurons.
The trajectory generation method established in this paper allows the aircraft to select a trajectory during flight. Compared with traditional optimization methods, it reduces the computational load and generates trajectories quickly online.