1. Introduction
Hypersonic morphing vehicles with a variety of sweep angles have strong maneuverability [1]. The vehicles can perform a variety of missions under different flight conditions with excellent flight performance [2]. Research on this type of vehicle is mainly focused on structural design [3,4,5], trajectory planning [6,7], and attitude control [8,9,10], among which trajectory planning methods represent a very important research topic [3].
A morphing vehicle is a multi-purpose, multi-mode aircraft that can change its shape according to the environment and mission requirements. The trajectory, altitude, and speed of the aircraft are adjustable, so the aircraft can be adapted for multiple missions [11,12,13,14]. Most research on morphing aircraft has been carried out at low speeds. In the hypersonic realm, the Defense Advanced Research Projects Agency (DARPA) has proposed the Morphing Aircraft Structures (MAS) project [15]. NASA has proposed the National Aerospace Plane Program and designed a manned, horizontal-takeoff-and-landing, single-stage-to-orbit, airbreathing launch vehicle [16]. Takama, of the Japanese Space Agency, proposed a wave-rider with a wing configuration allowing improved lift-to-drag performance at lower speeds [17]. The study of hypersonic morphing vehicles will be an important research direction in the future.
Trajectory planning for hypersonic vehicles is usually divided into reference trajectory methods [18,19] and predictor–corrector methods [20,21]. Predictor–corrector algorithms have strong online planning ability, and the method and its improvements are often used to guide the re-entry of hypersonic vehicles. Liu et al. [22], using both the bank angle and attack angle as control variables, obtained much higher terminal altitude precision. M. Xu et al. [23] proposed a novel quasi-equilibrium glide auto-adaptive guidance algorithm based on the predictor–corrector concept that was able to meet the terminal position constraints. W. Li [24] proposed a guidance law using an extended Kalman filter to estimate the uncertain parameters for the re-entry flight of the X-33, which was of great value in reconfiguring auto-adaptive predictor–corrector guidance. Z. Liang [25] proposed a guidance algorithm based on the reference trajectory and the predictor–corrector algorithm for the re-entry of vehicles that required less computing time, while offering high guidance precision and good robustness. Jay W. McMahon et al. [26] discussed recent developments in robust predictor–corrector methodologies for addressing the stochastic nature of guidance problems. Current predictor–corrector trajectory planning methods for aircraft usually consist of three steps: (1) determine the attack angle scheme, which is usually a linear transition mode; (2) calculate the size of the bank angle according to the range error; and (3) calculate the bank angle sign according to the aircraft heading. For the hypersonic morphing vehicle in this paper, a sweep angle scheme must also be designed in order to improve the trajectory performance. The research described above obtained trajectories for hypersonic vehicles and combined predictor–corrector guidance with other methods to improve trajectory performance. However, it does not provide a guidance scheme for morphing aircraft; extending the method to such vehicles is the focus of this paper.
Reinforcement learning [27] and deep learning [28] methods have found many applications in trajectory planning algorithms due to their high levels of intelligence and efficiency. Z. Kai [29] used a backpropagation neural network, trained on the parameter profiles of optimized trajectories under different dispersions, to simulate the nonlinear mapping relationship between current flight states and terminal states. Using this trajectory-based guidance method, the neural network was able to satisfy both the path and terminal constraints well, offering good validity and robustness. Y. Lv [30] presented a trajectory planning method based on Q-learning to solve the problem of HCVs facing unknown threats. Brian Gaudet [31] used reinforcement meta-learning to optimize an adaptive guidance system suitable for the approach phase of gliding hypersonic vehicles, obtaining trajectories that bring the vehicle to the target location with a high degree of accuracy at the designated terminal speed while satisfying heating rate, load, and dynamic pressure constraints. Monte Carlo reinforcement learning [32] is a reinforcement learning approach used to control behavior [33]. This method has been applied to solve many decision problems [34,35]. According to the aforementioned research, reinforcement learning has been proven to be applicable in aircraft guidance and is able to improve the guidance performance and trajectory performance of aircraft. Therefore, this paper uses reinforcement learning to improve the trajectory planning method in order to obtain a better flight trajectory, thus improving the mission performance of the aircraft.
This article is divided into four sections:
The motion model of the aircraft is established.
The basic predictor–corrector algorithm is given. The Q-learning algorithm is used to obtain an attack and sweep angle scheme that enables the crossing of no-fly zones from above. The B-spline curve method is used to solve the locations of flight path points to ensure that the aircraft can cross no-fly zones via these points. The size of the bank angle is solved on the basis of the state error of the aircraft arriving at the target and flight point. The changing logic of the bank angle sign is determined to ensure that the aircraft can fly safely to the target.
The Monte Carlo reinforcement learning method is used to improve the predictor–corrector algorithm, and a Deep Neural Network is used to fit the reward function.
The effectiveness of the algorithm is verified by a simulation.
3. Basic Predictor–Corrector Guidance Algorithm
This section introduces the basic predictor–corrector guidance algorithm, which can be used to steer aircraft to reach the desired final position while fulfilling the no-fly zone constraint. The results of the basic algorithm serve as the input of the improved algorithm learning network as a sample, providing training and evaluation data. The basic algorithm includes an attack angle and sweep angle scheme, a flight path point plan, and a bank angle scheme.
3.1. Attack Angle and Sweep Angle Scheme
In this section, the Q-learning algorithm is used to generate the attack angle and sweep angle commands to avoid type 2 no-fly zones.
3.1.1. Q-Learning Principles
In the Q-learning algorithm, the immediate reward $r_t = R(s_t, a_t)$ is first calculated after action $a_t$ is performed in state $s_t$. Then, the discounted state-action value $\gamma \max_a Q(s_{t+1}, a)$ is calculated for the next state $s_{t+1}$, from which the value function $Q(s_t, a_t)$ in the current state can be estimated. If there are m states and n actions, the Q-table is an m × n matrix.
The objective of the algorithm is to find the optimal strategy π* by estimating the value of the state-action value function Q(st,at) in each state. The rows of the Q-table represent the states in the environment, and the columns of the table represent the actions that the aircraft can perform in each state. In the process of trajectory planning, the environment will provide feedback to the aircraft through reinforcement signals (reward function). During the learning process, the Q-value of the actions that are conducive to completing the task becomes larger with the number of times they are selected, while those not conducive to task completion will become smaller. Through multiple iterations, the action selection strategy π of the aircraft will converge to the optimal action selection strategy π*.
The rule for updating Q-values is:
$$Q(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)$$
where $\max_{a \in A} Q(s_{t+1}, a)$ is the largest Q-value over the actions a in action set A when the aircraft is in state $s_{t+1}$. The iterative Q-value process of the k-th iteration can be obtained as follows:
$$Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a \in A} Q_k(s_{t+1}, a) - Q_k(s_t, a_t) \right]$$
where $\alpha \in (0,1)$ is the learning rate, which controls how far each update moves the estimate toward the new target; a larger $\alpha$ gives faster but less stable learning. Generally, $\alpha$ is set as a constant.
Q-learning approximates the optimal state-action value function $Q^*(s, a)$ by updating the strategy. $Q^*(s, a)$ is the maximum Q-value function among all policies $\pi$, represented by:
$$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$$
where $Q^{\pi}(s_t, a_t)$ is the state-action value function under strategy $\pi$ and $Q^*(s, a)$ is the maximum value function, corresponding to the optimal strategy $\pi^*$. According to the Bellman optimality equation, there is:
$$Q^*(s_t, a_t) = R(s_t, a_t) + \gamma \max_{a \in A} Q^*(s_{t+1}, a)$$
where $R(s_t, a_t)$ represents the immediate reward obtained by executing action $a_t$ in state $s_t$ and reaching state $s_{t+1}$. The greedy strategy is used in this paper.
The basic process of the Q-learning algorithm is as follows:
1. Selection of algorithm parameters: α ∈ (0,1), γ ∈ (0,1), and maximum iteration steps tmax;
2. Initialization: for all s ∈ S and a ∈ A(s), initialize Q(s, a) = 0 and t = 0;
3. For each learning round: initialize state st; using the strategy π, randomly select at at st, and update Q using the iterative rule for the k-th iteration;
4. Stop when the termination state is reached or t > tmax.
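The four steps above can be sketched as a small tabular routine. This is a minimal illustration, not the paper's implementation: the environment callback `step`, the toy five-state chain `chain_step`, the ε-greedy exploration (added so that every action can be tried), and all numeric settings are assumptions made for the example.

```python
import random

def q_learning(n_states, n_actions, step, alpha=0.1, gamma=0.9,
               episodes=500, t_max=100, epsilon=0.1, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # step 2: Q(s, a) = 0
    for _ in range(episodes):                         # step 3: learning rounds
        s, t, done = 0, 0, False                      # initialize state s_t
        while not done and t <= t_max:                # step 4: stop rule
            # Assumption: epsilon-greedy selection instead of a pure greedy one.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda j: q[s][j])
            s_next, r, done = step(s, a)
            # Target r_t + gamma * max_a Q(s_{t+1}, a); no bootstrap at the end.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s, t = s_next, t + 1
    return q

def chain_step(s, a):
    """Hypothetical toy chain: action 1 moves right, action 0 stays put;
    a reward of 1 is paid on reaching the last state (index 4)."""
    s_next = min(s + a, 4)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done

q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
greedy = [max(range(2), key=lambda j: q_table[s][j]) for s in range(4)]
```

After training, `greedy` holds the preferred action in each non-terminal state of the toy chain; in the paper's setting the states would be range segments and the actions sweep and attack angle pairs.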
3.1.2. Q-Learning Algorithm Setting
The Q-learning network takes the aircraft motion model and the environment as inputs to obtain attack and sweep angle schemes. The parameters are set as follows:
1. State set
The state in the algorithm needs to be determined based on the flight process. Because the range usually varies monotonically during flight, using it as the state variable makes the state evolve along one dimension, which avoids random changes between states and reduces the dimension of the state to simplify the algorithm. The initial expected range of the aircraft is 6000 km; with each 300 km segment taken as a state, there are 20 states S = (S1, S2, …, S20) covering 0 km to 6000 km. No state transition function needs to be set, and the state transition is Si → Si+1 (i = 1 … 19).
2. Action set
Set the action set to Ai = (χ, α), which includes the sweep angle and attack angle. The sweep angle takes the values 30°, 45°, and 80°, and the attack angle ranges over 5°~25°. With an interval of 5°, the attack angle takes five values, namely 5°, 10°, 15°, 20°, and 25°, giving 15 actions in total. The action set can be expressed as A = {A1(30°, 5°), A2(30°, 10°), …, A15(80°, 25°)}.
3. Reward function
The setting of the reward function is crucial, as it determines whether the aircraft can avoid no-fly zones and reach the target. The rationality of the reward directly affects learning efficiency. Based on the environment, the reward function is set as follows:
$$R = \begin{cases} R_b, & \text{the aircraft enters a no-fly zone} \\ R_n, & \text{normal flight} \\ R_t, & \text{the aircraft reaches the target} \\ R_c, & \text{the flight constraints are not met} \end{cases}$$
where $R_b$ and $R_n$ are the rewards obtained by the aircraft when entering the no-fly zone and during normal flight, respectively. $R_b$ is set as a constant less than 0 to guide the aircraft to avoid the no-fly zone, and $R_n$ is set as a reward related to the aircraft's velocity so that the aircraft retains more velocity when reaching the target. $R_t$ is the reward for arriving at the target; setting it as a constant greater than 0 guides the aircraft to reach the desired range. $R_c$ is the reward when the aircraft does not meet the flight constraints; setting it to a constant less than 0 ensures the safety of the flight.
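The three settings above can be written down compactly. The sketch below enumerates the 20 range states and the 15 (sweep, attack) actions from the text, and adds a piecewise reward; the reward constants, the velocity normalization `v_ref`, and the order in which the cases are checked are hypothetical choices for illustration, since the source does not give them.

```python
from itertools import product

SWEEP_DEG = (30, 45, 80)             # sweep angle values from the text
ATTACK_DEG = (5, 10, 15, 20, 25)     # attack angle values from the text
ACTIONS = list(product(SWEEP_DEG, ATTACK_DEG))  # A1..A15 as (sweep, attack)

SEGMENT_KM = 300
N_STATES = 6000 // SEGMENT_KM        # 20 range segments, S1..S20

def state_index(range_km):
    """Map the flown range to a zero-based segment index (0..19)."""
    return min(int(range_km // SEGMENT_KM), N_STATES - 1)

def reward(in_no_fly_zone, violates_constraints, reached_target, velocity,
           r_b=-1.0, r_c=-1.0, r_t=0.9, v_ref=5300.0):
    """Piecewise reward; all constants are assumed, not taken from the paper."""
    if in_no_fly_zone:
        return r_b                   # negative constant: avoid the zone
    if violates_constraints:
        return r_c                   # negative constant: keep the flight safe
    if reached_target:
        return r_t                   # positive constant: reach the range
    return velocity / v_ref          # normal flight: velocity-related reward
```

With these sets the Q-table has 20 × 15 = 300 entries, and `ACTIONS[0]` and `ACTIONS[-1]` correspond to A1(30°, 5°) and A15(80°, 25°).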
In this section, the avoidance of type 2 no-fly zones has been achieved through the attack and sweep angle scheme, while the type 1 zones need to be avoided through lateral flight. The following is the lateral trajectory scheme. The attack and sweep angle schemes obtained in this section will be provided as inputs to the lateral planning algorithm.
3.2. Flight Path Point Plan
For the no-fly zones present in the environment, it is necessary to design avoidance methods. In the analysis in the last section, it can be seen that the type 2 zone can be avoided by pulling up the trajectory, while the type 1 zone cannot. Therefore, the type 1 zone needs to be avoided through lateral maneuvering, and it is necessary to plan the lateral trajectory. The B-spline curve is used to obtain flight path points, and the lateral guidance of the aircraft is realized by tracking the points.
3.2.1. B-Spline Curve Principle
The B-spline curve is composed of a starting point, an ending point, and control points. By adjusting the control points, the shape of the B-spline curve can be changed. B-spline curves are widely used in various trajectory planning problems due to their controllable characteristics [37]. The B-spline curve is expressed as:
$$P(u) = \sum_{i=0}^{n} P_i \, B_{i,n}(u), \quad u \in [0, 1]$$
where $P_i$ is the control point of the curve, $B_{i,n}(u)$ is the corresponding basis function, $P_0$ is the starting point, $P_n$ is the endpoint, and n is the order of the curve. As long as the first and last control points of two B-spline curves are connected and the four control points at the connection are collinear, the two curves have the same position and the same first derivative at the connection. The concatenated curve is still a B-spline curve. The lateral trajectory planning of the aircraft can be realized using this property.
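To make the role of the control points concrete, the following sketch evaluates a planar B-spline with the Cox-de Boor recursion. The clamped uniform knot vector, the cubic degree, and the function names are assumptions chosen so that the curve starts at the first control point and ends at the last one; the source does not state which basis or knot vector it uses.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector (assumed) of length n_ctrl + degree + 1."""
    inner = n_ctrl - degree - 1
    return ([0.0] * (degree + 1)
            + [i / (inner + 1) for i in range(1, inner + 1)]
            + [1.0] * (degree + 1))

def basis(i, k, u, knots):
    """Cox-de Boor recursion for the degree-k basis function N_{i,k}(u)."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    d2 = knots[i + k + 1] - knots[i + 1]
    if d1 > 0:  # skip terms with a zero-length knot span
        left = (u - knots[i]) / d1 * basis(i, k - 1, u, knots)
    if d2 > 0:
        right = (knots[i + k + 1] - u) / d2 * basis(i + 1, k - 1, u, knots)
    return left + right

def bspline_point(ctrl, u, degree=3):
    """Point on the curve for u in [0, 1]; ctrl is a list of (x, y) tuples."""
    if u >= 1.0:
        return tuple(ctrl[-1])  # the half-open spans exclude u = 1
    knots = clamped_knots(len(ctrl), degree)
    weights = [basis(i, degree, u, knots) for i in range(len(ctrl))]
    x = sum(w * p[0] for w, p in zip(weights, ctrl))
    y = sum(w * p[1] for w, p in zip(weights, ctrl))
    return (x, y)

# Hypothetical control polygon P0..P4.
ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0), (4.0, 1.0)]
start, mid, end = (bspline_point(ctrl, u) for u in (0.0, 0.5, 1.0))
```

Moving an interior control point changes only the nearby part of the curve, which is the controllable characteristic the text relies on when shaping the curve around a no-fly zone.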
3.2.2. No-Fly Zone Avoidance Methods
In the horizontal environment model, each cylindrical no-fly zone projects onto the horizontal plane as a circle. A 2D B-spline curve is designed that satisfies the constraint, and then flight path points are obtained according to the curve control points. The planning method is divided into the following steps:
1. Based on the location of the circles, choose an appropriate direction to obtain the tangent points of the circles, and then select different combinations of tangent points to obtain the initial control points. If the line connecting the initial point and the target passes through the threat zone, at least one tangent point is selected as the control point, and at most one tangent point is selected for each zone.
2. Augment the initial control point set. The initial augmentation control point is located on the initial heading to ensure the initial heading angle, and the intermediate augmented control points are located on both sides of the tangent points; then, the control point set is obtained. The initial position P0 and end position Pn of the curve correspond to the initial position of the aircraft and the target. In order to ensure that the aircraft can avoid the threat area, the aircraft must be on the other side of the threat area's tangent line. Therefore, the B-spline curve is designed to be tangent to the circle of the zone. According to the characteristic of the curve, the tangent point can be the middle point of three collinear control points. Then, adjust the distances d1 and d2 between the two adjacent control points to control the curvature of the curve near the tangent point so that it does not intersect the circle, as shown in Figure 3. In the figure, P0~P4 are the control points, and the red spline curve is tangent to the no-fly zone, preventing the curve from crossing the zone.
Choose the tangent point (P2) of the circle as the initial control point and augment the two control points (P1, P3) on both sides of the tangent point. The augmented control points are given by the distances (d1, d2) from the tangent point.
3. Take the distance between the tangent point and the augmented point as the optimization variable. Take the spline curve length and mean curvature as the performance indicators. The optimal curve is obtained through a genetic algorithm, and the control points are obtained. The optimization model is as follows:
where J1 and J2 are two performance index functions, Lb is the equivalent length of the curve, and nb represents the mean curvature of the curve. The equations are as follows:
It should be noted that the curve is not the lateral trajectory of the aircraft, so its length cannot represent the flight range and its curvature cannot represent the overload of the aircraft. However, as characteristics of the curve, these elements can be used to evaluate the performance of the curve. The optimal B-spline curves are obtained through optimization. Curves that cross the no-fly zones are discarded, and then the one with the best performance index among the remaining curves is selected.
4. Simplify the control points to obtain the flight path points.
The simplification rules are as follows. (1) Simplify from the beginning point to the endpoint and delete the augmented control point of the starting point. (2) If multiple points are located on one line segment, delete the intermediate points and keep the two endpoints. (3) For four consecutive control points (P0~P3), if, after deleting the second control point P1, the angle of the connecting lines through P0-P2-P3 is larger than the original angle and the connecting lines do not cross the no-fly zone, then delete P1. (4) Repeat the simplification until two consecutive point sets are identical, at which point the process is finished.
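Steps 1 and 2 of the planning method rest on two geometric operations: finding the tangent points of a circular zone as seen from an outside point, and placing the augmented control points on the tangent line. The sketch below shows both under the assumption that longitude and latitude are treated as planar coordinates; the function names and the sample zone are hypothetical.

```python
import math

def tangent_points(p, c, r):
    """Both tangent points on the circle (center c, radius r) from point p."""
    dx, dy = p[0] - c[0], p[1] - c[1]
    d = math.hypot(dx, dy)
    if d <= r:
        raise ValueError("point must lie outside the circle")
    base = math.atan2(dy, dx)   # direction from the center toward p
    half = math.acos(r / d)     # angle between c->p and c->tangent point
    return [(c[0] + r * math.cos(base + s * half),
             c[1] + r * math.sin(base + s * half)) for s in (1, -1)]

def augment(t, c, d1, d2):
    """Control points P1 and P3 on the tangent line through tangent point t,
    at distances d1 and d2 on either side, so that P1, t, P3 are collinear."""
    nx, ny = t[0] - c[0], t[1] - c[1]
    n = math.hypot(nx, ny)
    tx, ty = -ny / n, nx / n    # unit vector along the tangent line
    return ((t[0] - d1 * tx, t[1] - d1 * ty),
            (t[0] + d2 * tx, t[1] + d2 * ty))

# Hypothetical example: start point at the origin, zone centered at (5, 0).
start, center, radius = (0.0, 0.0), (5.0, 0.0), 3.0
candidates = tangent_points(start, center, radius)
p1, p3 = augment(candidates[0], center, d1=1.0, d2=2.0)
```

Because P1 and P3 lie on the tangent line, they stay outside the circle for any positive d1 and d2, which leaves the two distances free to be used as optimization variables in step 3.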
3.3. Bank Angle Scheme
The bank angle scheme includes the size scheme and the sign scheme, obtaining the value and the sign of the bank angle, respectively.
3.3.1. Bank Angle Size Scheme
The bank angle size scheme is achieved through the predictor–corrector algorithm. First, the horizontal error of the flight path point is predicted based on the attack and sweep angle scheme, and then the amplitude and size of the bank angle are corrected.
Based on the attack angle, sweep angle, and the initial bank angle, the equation of motion is integrated until the vehicle reaches the next path point. Then, the latitude position error eϕ and the velocity error ev are obtained. Using the secant method, the amplitude of the bank angle |σmax| is corrected by ev, and the size of the bank angle |σ| is corrected by eϕ. When the aircraft is between two points Pn and Pn+1, there is a relationship as shown in Figure 4.
The correction process for the size of the bank angle is given as follows:
(1) Taking an initial value σ0 = 20°, integrate the equations of motion to the longitude of the target, and calculate ev.
(2) For intermediate path points, if ev is less than 10% of the expected speed, the correction is completed; otherwise, set σ0 = σ0 + sgn(ev) and return to step (1). For the trajectory endpoint, no correction is required, so take σ0 = σ0 + 1.
To avoid large overshoots of position when the aircraft passes through the path point line, the bank angle size is set to be related to ψ1 and ψ2 in Figure 4. This reduces the bank angle as the aircraft approaches the path point connection line. The scheme is as follows:
where ke > 0 is the coefficient of the bank angle error, which is determined by eϕ. The correction process is as follows:
(1) Taking an initial σ0 satisfying |σ0| < |σmax|, obtain ke1 at this time, integrate the motion equations to the longitude of the target, and obtain eϕ1.
(2) Taking σ0 = 0 and ke0 = 0 at this time, integrate the equations of motion to the longitude of the next path point and calculate eϕ0.
(3) ke is obtained using the secant correction equation:
$$k_e = k_{e1} - e_{\phi 1}\,\frac{k_{e1} - k_{e0}}{e_{\phi 1} - e_{\phi 0}}$$
(4) Integrate the motion equations to the longitude of the target and then calculate eϕ.
(5) If eϕ < 0.01, the correction process is completed; otherwise, eϕ1 = min(eϕ1, eϕ0). Update ke1 and take ke0 = ke, eϕ0 = eϕ; return to step (3).
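The correction loop above is a secant iteration on the predicted error. The sketch below shows the plain secant scheme on a stand-in error function; in the guidance algorithm the error would come from integrating the equations of motion, which is replaced here by a hypothetical quadratic `toy_error` so that the example is self-contained. The bookkeeping of step (5) is simplified to the standard two-point update.

```python
def secant_correct(err, k0, k1, tol=0.01, max_iter=50):
    """Secant iteration: refine k until |err(k)| < tol.

    `err` maps a coefficient k to the predicted terminal error; k0 and k1
    are the two starting guesses (steps (1) and (2) in the text).
    """
    e0, e1 = err(k0), err(k1)
    for _ in range(max_iter):
        if abs(e1) < tol:
            return k1
        if e1 == e0:          # flat secant line: no further update possible
            break
        # Step (3): secant update; the right-hand side uses the old values.
        k0, k1, e0 = k1, k1 - e1 * (k1 - k0) / (e1 - e0), e1
        e1 = err(k1)          # step (4): re-predict the error
    return k1

def toy_error(k):
    """Hypothetical smooth error curve with a root near k = 0.536."""
    return 0.5 * k * k - 4.0 * k + 2.0

k_e = secant_correct(toy_error, k0=0.0, k1=1.0)
```

Each iteration costs one error evaluation, which matters when every evaluation is a full trajectory integration; this is a common reason to prefer a secant scheme over methods that need derivatives.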
The above is the scheme of the bank angle size.
3.3.2. Bank Angle Sign Scheme
After obtaining the set of flight path points, each point should be tracked to ensure the correct heading of the aircraft. At this time, it is necessary to give the change rule of the bank angle sign.
The heading angle ψp of the connecting line at points (λ1, ϕ1) and (λ2, ϕ2) is:
where ψ1 = ψs − ψp and ψ2 = ψt − ψp are the heading angle differences between the aircraft and the connecting line for the front and back path points, respectively. It is known that ψ1 and ψ2 have different signs. If the aircraft is located on the left side of the path point line (as shown in Figure 4), then ψ1 < 0 and ψ2 > 0. The aircraft needs to increase its heading angle, and the bank angle sign is positive. If the aircraft is located on the right side of the path point line (the opposite case), then ψ1 > 0 and ψ2 < 0. The aircraft needs to reduce its heading angle, and the bank angle sign is negative. The bank angle's sign-changing logic is:
where sgn(·) is the sign function.
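The two cases described in the text can be captured in a few lines. This sketch covers only those two cases (aircraft left or right of the path point line) and is not the paper's full sign-changing logic; the angle wrapping helper and the convention of returning 0 when the aircraft is exactly on the line are assumptions.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def bank_sign(psi_s, psi_p):
    """Sign of the bank angle from psi_1 = psi_s - psi_p (degrees).

    psi_1 < 0: aircraft left of the line, heading must increase -> +1.
    psi_1 > 0: aircraft right of the line, heading must decrease -> -1.
    """
    psi_1 = wrap_deg(psi_s - psi_p)
    if psi_1 < 0:
        return 1
    if psi_1 > 0:
        return -1
    return 0  # assumed convention when the aircraft is on the line

left_case = bank_sign(80.0, 85.0)    # psi_1 = -5 deg
right_case = bank_sign(90.0, 85.0)   # psi_1 = +5 deg
wrapped_case = bank_sign(350.0, 10.0)  # psi_1 wraps to -20 deg
```

Wrapping the difference before taking the sign avoids a spurious sign flip when the two headings straddle the 0°/360° boundary.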
The above is the entire process of the basic predictor–corrector guidance algorithm.
5. Simulation
The initial altitude of the aircraft is h0 = 68 km; the longitude and latitude are (λ0 = 0°, ϕ0 = 0°); the velocity is v0 = 5300 m/s; the initial ballistic inclination angle of the aircraft, the attack angle α0, and the bank angle σ0 are all 0°; the initial heading angle is ψ0 = 85°; the target point is located at (λt = 53.8°, ϕt = 5.4°); and the expected range is s = 6000 km. There are two type 1 no-fly zones, with centers located at (23°, 4.5°) and (37°, 1.5°), and two type 2 no-fly zones, with centers located at (30°, 3°) and (45°, 6°). The simulation was written in C, and the simulation environment was vs2021.
5.1. Simulation of Attack and Sweep Angle Scheme
According to the environment, two type 2 zones are set up, and σ = 20°, α = 0.01, and γ = 0.99 are taken based on engineering experience. The change in total reward over 50,000 learning iterations is shown in Figure 6, and the flight process statuses are shown in Figure 7, Figure 8, Figure 9 and Figure 10.
The total reward increases rapidly before 20,000 iterations; then, it tends to converge to a value of about 165. Because the reward for each (st, at) pair is set to be less than 1 and some actions cannot be selected, the total grows with the number of actions that can be selected, but it cannot exceed 300, the number of state-action pairs (20 states × 15 actions).
The trajectory shown in Figure 7 avoids the type 2 no-fly zones, and the total reward of the algorithm tends to converge after 30,000 learning iterations. The longitudinal trajectory of the aircraft can avoid the no-fly zones and fly to a range of 6000 km. At ranges of 3000 km and 4500 km, the aircraft changes both the attack and the sweep angle. Because the height of the aircraft at this time is less than 40 km, the air density is relatively large and can generate large lift, so the height of the aircraft begins to increase. The trajectory rises to more than 35 km, successfully crossing the no-fly zone.
Figure 8 is the speed curve. In this curve, the time of flight is 1670 s and the final speed is 1350 m/s. The speed decreases because the energy decreases throughout the flight, but the rate of this decrease alternates between fast and slow. This is because when the height of the aircraft increases, kinetic energy is converted into gravitational potential energy, and the speed decreases rapidly. When the height of the aircraft decreases, gravitational potential energy is converted back to kinetic energy, so although the aircraft is subjected to drag, the speed change is still small. The schemes of the attack and sweep angle are shown in Figure 9 and Figure 10. The changes in the two angles determine the aerodynamic lift and drag during flight, which make the altitude of the trajectory change. It is under the action of these two angles that the aircraft can fly over the no-fly zones.
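The alternating fast and slow deceleration can be read off the standard point-mass relations for unpowered flight. The derivation below is a sketch that assumes a non-rotating Earth and a constant gravitational acceleration g; the symbols m (mass), D (drag), θ (flight path angle), and E (mechanical energy per unit mass) are introduced here for illustration and are not taken from the paper's motion model.

```latex
\begin{aligned}
E &= \tfrac{1}{2} v^{2} + g h, \qquad \dot{h} = v \sin\theta, \\
\dot{v} &= -\frac{D}{m} - g \sin\theta, \\
\dot{E} &= v \dot{v} + g \dot{h} = -\frac{D\, v}{m} \le 0 .
\end{aligned}
```

While climbing (θ > 0) the drag and gravity terms both reduce the speed, so it drops quickly; while descending (θ < 0) the gravity term partly offsets drag, so the speed changes slowly even though the total energy E keeps decreasing.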
5.2. Flight Path Point Planning Results
Based on the method in Section 3.2, eight candidate curves are generated; their evaluation functions J are shown in Table 1. All curves generated using single and double tangent points are shown in Figure 11.
It can be seen that the single and double tangent points each generate four curves, and trajectory 5 is obtained as the optimal solution. The augmented and simplified points are shown in Table 2.
The path points required for trajectory planning are thus obtained.
5.3. Simulation of Network Training
Set the maximum number of iterations of network training to 1000, the minimum performance gradient to $10^{-7}$, the maximum number of validation failures to 6, the target value of the error limit to 0, and the learning rate to 0.05. The parameter settings in the reward value function are shown in Table 3 and are based on engineering experience and order-of-magnitude estimates.
The learning effect of the training process is shown in Figure 12 and Figure 13. Part of the sample (data from group 600 to group 800) was randomly selected for testing, and the test results were compared with the sample results, as shown in Figure 14.
In Figure 12, it can be seen that when the number of iterations reaches 1000, the mean square error of the network converges to $9.2425 \times 10^{-7}$, which meets the requirement. The sample regression performance indicator R = 1 indicates strong data regression, as shown in Figure 13. As shown in Figure 14, the test results basically coincide with the sample. The above results demonstrate the good fitting ability of the DNN, as it can achieve an accurate and fast estimation of the rewards.
2. MC Algorithm Simulation
In this paper, α = 0.01, ε = 0.1, and γ = 0.99 are set based on engineering experience. After 50,000 learning iterations, the total reward is shown in Figure 15.
The total reward increases rapidly before 800 iterations and tends to converge after 1000 iterations, at a value of about 78.
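The roles of the three parameters α, ε, and γ can be illustrated with a generic constant-α Monte Carlo update. This is a textbook-style sketch, not the improved algorithm of this paper: the every-visit update, the function names, and the episode format are assumptions made for the example.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda j: q_row[j])

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backward over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def mc_update(q, episode, alpha, gamma):
    """Every-visit update Q(s, a) += alpha * (G - Q(s, a)).

    `episode` is a list of (state, action, reward) tuples for one complete
    rollout; unlike Q-learning, the target is the full observed return.
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    for (s, a, _), g in zip(episode, returns):
        q[s][a] += alpha * (g - q[s][a])

# Hypothetical two-step episode on a 2-state, 2-action table.
q = [[0.0, 0.0], [0.0, 0.0]]
mc_update(q, [(0, 1, 0.0), (1, 0, 1.0)], alpha=0.5, gamma=0.5)
```

Here ε sets how often a non-greedy action is explored, γ weights later rewards within an episode, and α sets how strongly each observed return moves the stored value.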
5.4. Simulation of the Trajectory Planning Algorithm
5.4.1. Scenario 1
In this scenario, the basic and improved algorithms are used for simulation. The parameters of the bank angle are shown in Table 4 and are given by order-of-magnitude estimates.
The results are shown in the following figures. Among them, trajectory 1 is the basic algorithm’s result, and trajectory 2 is the improved algorithm’s result.
According to the 3D, longitudinal, and lateral trajectories shown in Figure 16, Figure 17 and Figure 18, the aircraft can reach the target using both methods. The trajectories can cross two type 2 zones (light red) from the top, avoid two type 1 zones (dark red) from the side, and fly through the planned path points. These results indicate the effectiveness of the attack and sweep angle scheme, path point curve scheme, and bank angle scheme. The improved method has a shorter trajectory. In both cases, the improved method takes less time and the trajectory makes fewer turns.
Figure 19 shows the bank angle curves, which show that there is a difference between the two methods: the trends are the same but the values are different. The improved method can be regarded as an optimized version of the basic method. The basic method applies a manually set gradient correction and outputs the result as soon as the aircraft reaches the target. In MCRL, the whole flight process is optimized, so its trajectory is better.
Figure 20 is the speed curve, and it can be seen that the final velocity of the improved method is 1350 m/s, which is larger than the 700 m/s velocity of the basic method. The trend of velocity change is the same as in Figure 8. Because the attack and sweep angle commands are the same, the flight path angles of the two trajectories shown in Figure 21 are almost the same. It can be seen in the figure that the flight path angle curve oscillates, but its absolute value is not greater than 10°, indicating that the altitude of the aircraft will not change very dramatically. The difference in bank angle makes the heading angle vary greatly, as shown in Figure 22. The change in the heading angle of the basic method is larger. The heading angle of trajectory 1 changes more dramatically after 900 s and reaches 156° at the end time, while that of trajectory 2 is only 110°, which indicates the advantage of the improved method. According to the h-v flight profile in Figure 23, the h-v curves of the aircraft are all above the three constraint curves for overload, heating rate, and dynamic pressure, indicating that the trajectories obtained by both methods meet the performance constraints of the aircraft. Finally, the improved method obtains a better end state for the trajectory.
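Why a constraint appears as a lower boundary in the h-v plane can be shown with the dynamic pressure limit alone. The sketch below assumes an exponential atmosphere with sea-level density 1.225 kg/m³ and a scale height of 7200 m, and a hypothetical limit of 100 kPa; none of these values come from the paper, and the overload and heating rate boundaries are not modeled.

```python
import math

RHO0 = 1.225        # assumed sea-level air density, kg/m^3
H_SCALE = 7200.0    # assumed atmospheric scale height, m

def density(h_m):
    """Exponential atmosphere model (an approximation)."""
    return RHO0 * math.exp(-h_m / H_SCALE)

def dynamic_pressure(h_m, v):
    """q = 0.5 * rho * v^2 in Pa, for altitude in m and speed in m/s."""
    return 0.5 * density(h_m) * v * v

def min_altitude_for_q(v, q_max):
    """Lowest altitude at which q stays within q_max at speed v."""
    return H_SCALE * math.log(RHO0 * v * v / (2.0 * q_max))

Q_MAX = 1.0e5       # hypothetical dynamic pressure limit, Pa
h_boundary = min_altitude_for_q(5300.0, Q_MAX)
```

A trajectory point satisfies the limit when its altitude lies above `min_altitude_for_q` at the current speed, which is what "the h-v curve is above the dynamic pressure curve" expresses.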
5.4.2. Scenario 2
In order to explore the influence on the trajectory of changing the wing sweep to avoid the no-fly zones, there are two trajectories in this scenario. In trajectory 1, the improved algorithm is used for simulation. In trajectory 2, the aircraft avoids the no-fly zones only through lateral maneuvers; that is, only the bank angle is adjusted, while the attack angle is fixed at 5° and the sweep angle is fixed at 45° during flight, producing the maximum lift–drag ratio. The locations of the no-fly zones are different in scenario 2. There are two type 1 no-fly zones, with centers located at (15°, 4.5°) and (50°, 1°), and two type 2 no-fly zones, with centers located at (25°, 4.5°) and (35°, 4.5°). All the parameters of this scenario are shown in Table 5. The flight path points are obtained via the B-spline curve scheme, and the bank angle command is obtained via the MCRL method.
According to the 3D, longitudinal, and lateral trajectories shown in Figure 24, Figure 25 and Figure 26, the aircraft can reach the target and avoid the no-fly zones using both methods. Trajectory 1 can cross two type 2 zones (light red) from the top and avoid two type 1 zones (dark red) from the side. Trajectory 2 can avoid all zones from the side. Both trajectories can fly through the planned path points. Because trajectory 2 flies around all the no-fly zones, it has a larger turn than trajectory 1.
Figure 27 shows the curves of the attack angle, and Figure 28 shows the curves of the sweep angle. The curves of trajectory 1, which are obtained via Q-learning, change in order to avoid the type 2 zones. The curves of trajectory 2 are fixed values. Figure 29 shows the bank angle curves, which are obtained using the bank angle scheme.
Figure 30 shows the speed curves, and it can be seen that the final velocity of trajectory 1 is 1650 m/s, which is smaller than the 2980 m/s velocity of trajectory 2. The trend of velocity change is the same as in Figure 8. This indicates that in order to fly over the top of the no-fly zones, more energy must be consumed, so the speed will be lowered, and the trajectory will take more time. The flight path angles of the two trajectories are shown in Figure 31. It can be seen in the figure that both flight path angle curves oscillate. The absolute value for trajectory 1 is not greater than 8°, while the absolute value for trajectory 2 is not greater than 5°. This is because trajectory 1 needs to fly higher to avoid the no-fly zone, so its path angle must be larger to raise the altitude of the trajectory, which causes the decrease in speed. The heading angle is shown in Figure 32. The change in the heading angle of trajectory 2 is larger than that of trajectory 1, indicating that in order to avoid the no-fly zones, its trajectory needs to have a larger turn, so a more drastic heading angle change is generated. The h-v flight profile is shown in Figure 33. It can be seen that the h-v curves of the aircraft are all above the three constraint curves for overload, heating rate, and dynamic pressure, indicating that the two trajectories both meet the constraints of the aircraft.
6. Conclusions
To address the safe trajectory planning problem of hypersonic morphing vehicles, this paper designed a trajectory planning algorithm using the predictor–corrector method, including a basic and an improved algorithm. In the basic algorithm, the angle of attack and sweep angle commands, flight path points, and bank angle commands are generated, which solves the aircraft trajectory planning problem. In the improved algorithm, MCRL and DNN are used to improve the predictor–corrector method, which reduces the planned turning angle and increases the final speed. The improved method produces a better trajectory by consuming less energy while ensuring safe flight and that the target is reached. The current work can be extended in the following respects. The trajectory planning method in this paper is carried out under ideal conditions, without considering the influence of errors such as sensor noise. In addition, the method is only a feasible one, and the optimality of the trajectory under various conditions is not guaranteed. In future research, trajectory planning methods that consider errors and multiple constraints will be an important topic. Such methods can improve the flight performance of the vehicle and allow it to accomplish a wider variety of missions.