Article

Coordinated Reentry Guidance with A* and Deep Reinforcement Learning for Hypersonic Morphing Vehicles Under Multiple No-Fly Zones

1 Defense Innovation Institute, Chinese Academy of Military Science, Beijing 100071, China
2 Department of Bomber and Transport Aircraft Pilots Conversion, Air Force Harbin Flying College, Harbin 150088, China
3 Intelligent Game and Decision Laboratory, Beijing 100071, China
4 School of Astronautics, Beihang University, Beijing 100191, China
5 College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
* Authors to whom correspondence should be addressed.
Aerospace 2025, 12(7), 591; https://doi.org/10.3390/aerospace12070591
Submission received: 28 May 2025 / Revised: 25 June 2025 / Accepted: 27 June 2025 / Published: 30 June 2025
(This article belongs to the Section Aeronautics)

Abstract

Hypersonic morphing vehicles (HMVs), renowned for their adaptive structural reconfiguration and cross-domain maneuverability, confront formidable reentry guidance challenges under multiple no-fly zones, stringent path constraints, and nonlinear dynamics exacerbated by morphing-induced aerodynamic uncertainties. To address these issues, this study proposes a hierarchical framework integrating an A*-based energy-optimal waypoint planner, a deep deterministic policy gradient (DDPG)-driven morphing policy network, and a quasi-equilibrium glide condition (QEGC) guidance law with continuous sliding mode control. The A* algorithm generates heuristic trajectories circumventing no-fly zones, reducing the evaluation function value by 6.2% compared to greedy methods, while DDPG optimizes sweep angles to minimize velocity loss and terminal errors (0.09 km position, 0.01 m/s velocity). The QEGC law ensures robust longitudinal-lateral tracking via smooth hyperbolic tangent switching. Simulations demonstrate generalization across diverse targets (terminal errors < 0.24 km) and robustness under Monte Carlo deviations (0.263 ± 0.184 km range, −12.7 ± 42.93 m/s velocity). This work bridges global trajectory planning with real-time morphing adaptation, advancing intelligent HMV control. Future research will extend this framework to ascent/dive phases and optimize its computational efficiency for onboard deployment.

1. Introduction

The hypersonic morphing vehicle (HMV), a class of advanced aerospace systems capable of autonomously adjusting their structural configurations and aerodynamic profiles in real time to align with mission-specific requirements and performance objectives, represents a transformative integration of hypersonic flight dynamics and adaptive morphing technology [1,2,3]. By enabling continuous shape transformation, this innovative design paradigm facilitates substantial improvements in aerodynamic efficiency, maneuvering agility, and energy utilization [4,5]. Combining the velocity advantages of hypersonic vehicles with the adaptability of morphing structures, the HMV demonstrates dual capabilities: executing long-endurance, high-maneuverability flights while maintaining the capacity to self-adapt to complex mission demands and cross-domain environments [6]. However, the HMV suffers from complex flight environments, severe force–thermal constraints, and diverse missions in the glide phase. These factors induce significant fluctuations in aerodynamic load distributions and dynamic system parameters, thereby introducing pronounced uncertainties and strong nonlinear behaviors in flight control dynamics, which collectively escalate the complexity of reentry guidance strategies [7].
The reentry guidance of the HMV directs the vehicle along a smooth flight trajectory from the initial reentry point precisely to the terminal area energy management (TAEM) interface at the target, without passing through the no-fly zones, provided that the process constraints and terminal constraints are satisfied. Reentry guidance methods include reference trajectory tracking guidance [8,9], predictor–corrector guidance [10], and quasi-equilibrium gliding condition (QEGC) guidance [11,12]. Owing to the characteristic quasi-equilibrium glide trajectories of hypersonic vehicles, the QEGC guidance method has become a focus of current research. By decomposing the reentry guidance into longitudinal and lateral guidance, so that the angle of attack and bank angle are jointly used as control inputs, the QEGC guidance method achieves high control accuracy and excellent robustness with the help of advanced control methods [13]. However, its control of velocity is underactuated and requires the correction of terminal velocity by analytic prediction. Sliding mode control can be used for longitudinal and lateral control in QEGC guidance because of its fast response, good robustness, and simple physical implementation [14,15].
No-fly zones are geographic areas that must be avoided due to air defense threats and political restrictions. Reentry trajectories with no-fly zones require the consideration of complex spatial constraints. Common no-fly zone avoidance methods include energy-optimal waypoint planning [16], lateral flip logic avoidance [17], the artificial potential field method [18,19], and direct trajectory optimization methods [20,21]. These methods keep away from no-fly zones by planning nominal trajectories; when the no-fly zones change, however, the vehicle may struggle to reach the target location. Search is one of the foundational methods of early artificial intelligence (AI). Inspired by ideas from ground robot and UAV path planning, graph search-based algorithms have also been applied to the reentry trajectory planning problem [22,23]. Among graph search methods, the A* algorithm offers efficient search performance, finding feasible paths by means of an evaluation function [24]. The A* algorithm is one of the crucial algorithms for solving the shortest path search problem, capable of exploring based on the spatial information of the problem area and utilizing auxiliary information for heuristic searching [25].
Designing a morphing policy and using the morphing ability to improve flight performance are important topics in HMV flight control research. Most previous studies on HMVs have focused on attitude stability and control under morphing [26,27], while there are relatively few studies on the morphing policy itself. Common morphing policy-solving methods include optimization algorithms [28,29], integrated guidance and attitude control [30], and reinforcement learning (RL) methods [31,32]. In recent years, the development of AI, represented by RL and deep learning (DL) [33], has provided new technical directions for exploring intelligent flight control technology for the HMV [34], and it has been successfully applied to reentry trajectory planning for hypersonic vehicles [35]. Deep reinforcement learning (DRL) techniques are mainly used to solve the continuous decision-making problem of an agent. Consequently, DRL is well suited to learning the morphing policy of the HMV by continuously interacting with the environment and improving the continuous action policy according to the reward or punishment feedback.
Inspired by the above research, an intelligent coordinated reentry guidance method based on the A* search algorithm and RL is proposed. The intelligent coordinated reentry guidance scheme is shown in Figure 1, and the framework of the reentry guidance is divided into three layers: the planning layer, guidance layer, and output layer. The planning layer includes two tasks. One is to plan the waypoints under multiple no-fly zones through the A* search algorithm and obtain an avoidance policy. The other is learning the morphing policy based on the RL algorithm to obtain the morphing command under different flight states. By combining the A* algorithm with DRL, utilizing A* to tackle the no-fly zone issues and employing DRL for morphing control, we can specifically address the trajectory planning and guidance problems of HMVs under no-fly zone conditions, enhancing the efficiency of online intelligent guidance. The guidance layer receives the waypoints and morphing command from the planning layer and calculates the longitudinal and lateral guidance inputs through the QEGC guidance law by the sliding mode control method. In the output layer, the guidance control input and the morphing command are solved jointly to obtain the corresponding angle of attack, bank angle, and sweep angle, and the input constraints are added to complete the coordinated reentry guidance for the HMV with no-fly zones. The main contributions of this paper are as follows:
  • An energy-optimal avoidance reentry trajectory search approach based on the A* algorithm is proposed, which effectively solves the challenge of finding optimal waypoints under the intricate constraints of multiple no-fly zones.
  • Leveraging DRL algorithms, an intelligent autonomous morphing policy network for HMVs is trained. This network intelligently utilizes morphing commands to adaptively control terminal velocity, resulting in a significant optimization of reentry guidance performance.
  • A QEGC guidance law that relies on a continuous switching sliding mode control is proposed. This approach transforms path constraints into direct guidance input constraints, enabling high-precision guidance with a robust performance.
The rest of this paper is organized as follows: Section 2 establishes the motion models of the HMV. Section 3 presents the no-fly zone avoidance policy. Section 4 describes the QEGC guidance law. The morphing policy based on DRL is discussed in Section 5. The results are tested in Section 6. Finally, Section 7 summarizes the main work.

2. Models of the HMV

2.1. Morphing Mode

The morphing mode of the class of variable-sweep hypersonic vehicle studied here is presented in Figure 2. Let the sweep angle of the wing be $\chi$; the morphing of the HMV is the simultaneous change in the sweep angles of both wings, with $\underline{\chi} \le \chi \le \bar{\chi}$, where $\bar{\chi}$ and $\underline{\chi}$ are the upper and lower bounds of the sweep angle, respectively. Define the morphing rate as
$$k = \frac{\chi - \underline{\chi}}{\bar{\chi} - \underline{\chi}}$$
When $k = 0$, the sweep angle takes its smallest value $\underline{\chi}$; when $k = 1$, it takes its largest value $\bar{\chi}$.
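As a small illustration of the morphing-rate definition above, the following Python sketch maps between the sweep angle and the morphing rate; the 30–90° default bounds are the values used later in Section 6.2 and serve only as examples, not as part of the model definition.

def morphing_rate(chi, chi_lo=30.0, chi_hi=90.0):
    """Normalized morphing rate k in [0, 1] for a sweep angle chi in degrees."""
    return (chi - chi_lo) / (chi_hi - chi_lo)

def sweep_from_rate(k, chi_lo=30.0, chi_hi=90.0):
    """Inverse mapping: sweep angle in degrees for a morphing rate k."""
    return chi_lo + k * (chi_hi - chi_lo)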

2.2. Motion Models

The reentry kinematic equations of the HMV are as follows:
$$\dot{r} = V\sin\theta,\qquad \dot{\phi} = \frac{V\cos\theta\cos\sigma}{r},\qquad \dot{\lambda} = \frac{V\cos\theta\sin\sigma}{r\cos\phi}$$
where r is the geocentric distance, ϕ is the geocentric latitude, and λ is the longitude. The three-degree-of-freedom dynamics equations for the HMV in the reentry coordinate system are as follows:
$$\mathbf{a} = \begin{bmatrix} \dot{V} \\ V\dot{\theta} \\ V\dot{\sigma}\cos\theta \end{bmatrix} = \frac{1}{m}\left(\mathbf{F}_a + \mathbf{G} + \mathbf{F}_e + \mathbf{F}_k + \mathbf{F}_T + \mathbf{F}_m\right)$$
where $\mathbf{a}$ is the acceleration vector, $V$ is the flight velocity, $\theta$ is the flightpath angle, $\sigma$ is the heading angle measured from the north direction, $m$ is the mass of the HMV, $\mathbf{G} = m[g_V\ g_\theta\ g_\sigma]^T$ is the Earth's gravitational force vector, $\mathbf{F}_e = [F_{eV}\ F_{e\theta}\ F_{e\sigma}]^T$ is the centrifugal inertia force vector, $\mathbf{F}_k = [F_{kx}\ F_{ky}\ F_{kz}]^T$ is the Coriolis inertia force vector, $\mathbf{F}_T = [F_{Tx}\ F_{Ty}\ F_{Tz}]^T$ is the transport inertial force resulting from the conversion of the ballistic coordinate system to the reentry coordinate system, and $\mathbf{F}_m = [F_{mx}\ F_{my}\ F_{mz}]^T$ is the vector of morphing-added forces due to shape change. The specific expressions of the above force vectors in the reentry coordinate system are given in Ref. [7]. $\mathbf{F}_a = [F_{aV}\ F_{a\theta}\ F_{a\sigma}]^T$ is the aerodynamic force vector with the following expressions:
$$\mathbf{F}_a = \begin{bmatrix} F_{aV} \\ F_{a\theta} \\ F_{a\sigma} \end{bmatrix} = qS_0\begin{bmatrix} -C_D \\ C_L\cos\upsilon - C_N\sin\upsilon \\ C_L\sin\upsilon + C_N\cos\upsilon \end{bmatrix}$$
where $q$ is the dynamic pressure, $S_0$ is the reference area, $\upsilon$ is the bank angle, $C_L$ is the lift coefficient, $C_D$ is the drag coefficient, and $C_N$ is the lateral force coefficient. The following part analyzes the effect of morphing on the aerodynamic coefficients of the HMV.

2.3. Effects of Morphing on the Aerodynamic Performance of the HMV

The effects of morphing on the longitudinal aerodynamic coefficients of the HMV in our research are shown in Figure 3, where $K$ is the lift-to-drag ratio. It can be seen that morphing can significantly change the drag coefficient, lift coefficient, and lift-to-drag ratio of the HMV at different velocities, so the aerodynamic characteristics of the HMV can be changed by adjusting the sweep angle to achieve the aerodynamic performance best suited to the current flight requirements.

2.4. Modeling of Constraints

The constraints of altitude, velocity, range, and heading error need to be satisfied when the vehicle reaches the TAEM interface as follows:
$$h_f = h_T,\qquad \Delta L_{Rf} = \Delta L_{RT},\qquad V_f = V_T,\qquad \left|\Delta\sigma_f\right| \le \Delta\sigma_T$$
where $h_f$ is the terminal altitude, $\Delta L_{Rf}$ is the range between the terminal position and the target position, $V_f$ is the terminal velocity, $\Delta\sigma_f$ is the terminal heading error, $h_T$ is the target terminal altitude, $\Delta L_{RT}$ is the range between the desired terminal position and the target point, $V_T$ is the target terminal velocity, and $\Delta\sigma_T$ is the target terminal heading angle error. The path constraints include the heat rate $\dot{Q}$, overload $n$, and dynamic pressure $q$ constraints, together with the QEGC constraint, as follows:
$$\dot{Q} = k_h\rho^{0.5}V^{3.15} \le \dot{Q}_m,\qquad q = \frac{\rho V^2}{2} \le q_m,\qquad n = \frac{\sqrt{D^2 + L^2}}{mg_0} \le n_m,\qquad m\left(g_0 - \frac{V^2}{r}\right)\cos\theta - L\cos\upsilon = 0\ \ (\dot{\theta} = 0)$$
where $k_h$ is the heat flow density coefficient, $\rho$ is the atmospheric density, $\dot{Q}_m$ is the maximum constraint of the heat flow density, $q_m$ is the maximum constraint value of the dynamic pressure, $D$ and $L$ are the drag and lift forces, $g_0$ is the gravitational acceleration, $n_m$ is the maximum constraint value of the overload, and $\dot{\theta}$ is the flightpath angle rate. Each no-fly zone is modeled with a no-fly circle as its bottom surface, centered at the no-fly zone center. In order to avoid the no-fly zones, the HMV's position projection on the ground must stay outside each no-fly circle. So,
$$L_{Ri} \ge R_{Zi}$$
where $L_{Ri}$ is the spherical distance between the position projection and the center of the $i$-th no-fly zone, and $R_{Zi}$ is the radius of the $i$-th no-fly circle. Considering attitude stability requirements and the limitations of the morphing and control mechanisms, the angle of attack $\alpha$, the bank angle $\upsilon$, and the sweep angle $\chi$ of the HMV need to be limited to certain ranges as follows:
$$\alpha_{\min} \le \alpha \le \alpha_{\max},\qquad \upsilon_{\min} \le \upsilon \le \upsilon_{\max},\qquad \chi_{\min} \le \chi \le \chi_{\max}$$
where $\alpha_{\min}$, $\upsilon_{\min}$, and $\chi_{\min}$ are the minimum constraint values of the angle of attack, bank angle, and sweep angle, respectively; $\alpha_{\max}$, $\upsilon_{\max}$, and $\chi_{\max}$ are the maximum constraint values of the angle of attack, bank angle, and sweep angle, respectively.

3. No-Fly Zone Avoidance Policy Based on the A* Search Algorithm

3.1. No-Fly Zone Avoidance Way

As shown in Figure 4, $O$ is the current waypoint of the HMV, $T$ is the next waypoint, $Z_1$ is the center of the no-fly circle, and $R_{Z1}$ is the radius of the no-fly zone. In order to ensure the safety of the avoidance flight, a safety factor is added to the radius of the no-fly zone to appropriately expand it as follows:
$$R_{Z1}' = \left(1 + \varpi\right)R_{Z1}$$
where $\varpi$ is the safety factor; $Z_{11}$ and $Z_{12}$ are the two points of tangency from $O$ to the circle of safety radius $R_{Z1}'$ around the no-fly circle $Z_1$; $\sigma_{OT}$, $\sigma_{OZ_{11}}$, and $\sigma_{OZ_{12}}$ are the azimuth angles of the trajectories $OT$, $OZ_{11}$, and $OZ_{12}$, respectively; and the azimuth angle $\sigma_c$ is expressed as
$$\sigma_c = \arctan\frac{\sin\left(\lambda_i - \lambda_0\right)}{\cos\phi_0\tan\phi_i - \sin\phi_0\cos\left(\lambda_i - \lambda_0\right)}$$
where $\lambda_0$ and $\phi_0$ are the current longitude and latitude of the HMV, and $\lambda_i$ and $\phi_i$ are the target longitude and latitude, respectively. Define $flag_{Z_1}^{OT}$ as the relative position marker between the trajectory $OT$ and the no-fly zone $Z_1$ as follows:
$$flag_{Z_1}^{OT} = \begin{cases} 0, & \text{if } \sigma_{OT} \in \left[\sigma_{OZ_{11}},\ \sigma_{OZ_{12}}\right] \\ 1, & \text{else} \end{cases}$$
For $N_Z$ no-fly zones: if $\prod_{j=1}^{N_Z} flag_{Z_j}^{Z_n^i Z_{n+1}^i} = 1$, the next waypoint to be flown is $Z_{n+1}^i$ and the flightpath remains unchanged; if $\prod_{j=1}^{N_Z} flag_{Z_j}^{Z_n^i Z_{n+1}^i} = 0$, the next waypoints have to be adjusted. An optimal set of waypoints needs to be solved that avoids all the no-fly zones and minimizes the energy consumption of the HMV.
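The following Python sketch illustrates the geometric test described above: the line-of-sight azimuth of Equation (10) and the relative position flag. It is an illustrative sketch rather than the authors' code; the blocked azimuth interval is obtained here from standard spherical trigonometry under the inflated radius, and angle wrap-around near ±π is ignored for brevity.

import math

def los_azimuth(lam0, phi0, lam_i, phi_i):
    """Line-of-sight azimuth sigma_c from (lam0, phi0) to (lam_i, phi_i), in radians (Equation (10))."""
    num = math.sin(lam_i - lam0)
    den = math.cos(phi0) * math.tan(phi_i) - math.sin(phi0) * math.cos(lam_i - lam0)
    return math.atan2(num, den)

def blocked_azimuth_window(lam0, phi0, lam_z, phi_z, R_z, safety=0.03):
    """Azimuth interval [sigma_OZ11, sigma_OZ12] blocked by one inflated no-fly circle."""
    sigma_oz = los_azimuth(lam0, phi0, lam_z, phi_z)
    # spherical distance from the current point to the zone center
    d = math.acos(math.sin(phi0) * math.sin(phi_z)
                  + math.cos(phi0) * math.cos(phi_z) * math.cos(lam_z - lam0))
    # half-angle subtended by a circle of angular radius (1 + safety) * R_z seen from distance d
    half = math.asin(min(1.0, math.sin((1.0 + safety) * R_z) / math.sin(d)))
    return sigma_oz - half, sigma_oz + half

def flag_ot(sigma_ot, window):
    """Return 0 if the O->T azimuth falls inside the blocked window, 1 otherwise."""
    lo, hi = window
    return 0 if lo <= sigma_ot <= hi else 1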

3.2. Design of Evaluation Function

As shown in Figure 5, for the avoidance flight problem under multiple no-fly zones, the HMV starts from the initial position and searches for the next feasible waypoint in the known graph at each current waypoint, so that the flight trajectory can avoid all no-fly zones until it reaches the target position. Finally, the avoidance trajectory with an optimal objective function is obtained. The A* algorithm performs a heuristic search by the evaluation function, which is set to estimate the cost that the current waypoint needs to spend to reach the target. The evaluation function is designed as follows:
$$J(n) = J_b(n) + \hat{J}(n)$$
where $n$ denotes the serial number of the node, $J_b(n)$ is the path cost from the initial node $O$ to node $n$, and $\hat{J}(n)$ is the estimated cost from node $n$ to the target. The avoidance flight of the HMV with multiple no-fly zones will lead to a large additional energy loss. However, in order to complete the long-range mission to reach the target, the HMV needs to have sufficient energy. Therefore, the additional energy loss for avoidance flight must be considered, and the optimal set of waypoints needs to satisfy the minimum energy consumption as follows:
$$\min J_b(n) = \min\sum_{i=2}^{N}\Delta E_i$$
where $N$ denotes the number of waypoints, and $\Delta E_i$ is the additional energy consumed when passing from the $(i-1)$-th waypoint to the $i$-th waypoint. To express the additional energy loss directly using the modeling states in Figure 4 and Figure 5, reference [11] treats $\Delta E_i$ as the line-of-sight azimuth angle increment $\Delta\sigma_{ci}$, giving
$$J_b(n) = \sum_{i=1}^{N-1}\left|\Delta\sigma_{ci}\right|,\qquad \Delta\sigma_{ci} = \sigma_{ci} - \sigma_{c(i-1)}$$
where $\sigma_{ci}$ denotes the line-of-sight azimuth angle when passing the $i$-th waypoint and $\sigma_{c0} = \sigma_0$. The line-of-sight azimuth angle increment $\Delta\sigma_{cT}$ from the current waypoint $n$ to the target $T$ is regarded as the estimate of the remaining cost as follows:
$$\hat{J}(n) = \left|\Delta\sigma_{cT}\right|,\qquad \Delta\sigma_{cT} = \sigma_{cT} - \sigma_{cn}$$
From Equations (14) and (15), finding the optimal set of waypoints among multiple no-fly zones is equivalent to minimizing the sum of the line-of-sight azimuth angle increments.

3.3. The A*-Based Waypoint Search Algorithm

With the graph modeling and design of the evaluation function, the A*-based multiple no-fly zones waypoint search algorithm is as follows (Algorithm 1):
Algorithm 1: A*-based multiple no-fly zone waypoint search algorithm
Initialization: input the starting position $O$, the target $T$, and the no-fly zone locations and radius parameters, and create two empty tables A1 and A2;
if $flag_{Z_j}^{OT} = 1$ for all $j = 1, 2, \dots, N_Z$:
  No operation. Direct flight from $O$ to $T$.
 else: Store the starting position $O$ in A1
  while $flag_{Z_j}^{Z_n^i T} = 0$ or A1 $\ne \varnothing$
   for $i = 1 : N_{A1}$ ($N_{A1}$ is the number of waypoints in A1)
    if $\prod_{j=1}^{N_Z} flag_{Z_j}^{Z_n^i Z_{n+1}^i} = 1$
     Calculate $\Delta\sigma_{ci}$ from Equation (10);
     Combine Equations (13), (15) and (16) to get $J(i)$
    end
   end
    $Z_n^i = \arg\min J(n)$;
   Expand $Z_n^i$ to the set $C_{Z_n^i}$ of all waypoints satisfying $\prod_{j=1}^{N_Z} flag_{Z_j}^{Z_n^i Z_{n+1}^i} = 1$;
   Store $C_{Z_n^i}$ in A1 and move the expanded waypoint $Z_n^i$ to A2: $C_{Z_n^i} \to$ A1, $Z_n^i \to$ A2.
  end
 end
Output the optimal set $C_{optimal}$ of the waypoints in A2
The above A*-based waypoint search algorithm solves for the waypoints under multiple no-fly zones, thus completing the no-fly zone avoidance policy in the planning layer of the HMV. The obtained optimal set $C_{optimal}$ of avoidance waypoints will be used in the guidance scheme design.
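The sketch below condenses Algorithm 1 into a generic best-first search in Python. It is an illustration rather than the authors' implementation: the functions neighbors (tangent-point expansion around blocking zones), blocked (the flag test of Section 3.1), and los_azimuth are assumed to be supplied by the caller, and azimuth wrap-around is ignored.

import heapq
from itertools import count

def a_star_waypoints(start, target, neighbors, blocked, los_azimuth):
    """Return a waypoint list from start to target that minimizes the summed
    line-of-sight azimuth increments (the evaluation function J(n) = J_b(n) + J_hat(n))."""
    tie = count()  # tie-breaker so the heap never compares waypoints directly
    sigma0 = los_azimuth(start, target)
    # open list entries: (f = g + h, g, tie, node, incoming azimuth, path so far)
    open_list = [(0.0, 0.0, next(tie), start, sigma0, [start])]
    closed = set()
    while open_list:
        f, g, _, node, sigma_in, path = heapq.heappop(open_list)
        if node == target:
            return path
        if node in closed:
            continue
        closed.add(node)
        # fly straight to the target if no zone blocks the segment, else expand tangent waypoints
        candidates = [target] if not blocked(node, target) else neighbors(node)
        for nxt in candidates:
            if nxt in closed or blocked(node, nxt):
                continue
            sigma_out = los_azimuth(node, nxt)
            g_new = g + abs(sigma_out - sigma_in)              # J_b: accumulated azimuth change
            h_new = abs(los_azimuth(nxt, target) - sigma_out)  # heuristic: remaining azimuth change
            heapq.heappush(open_list,
                           (g_new + h_new, g_new, next(tie), nxt, sigma_out, path + [nxt]))
    return None  # no feasible avoidance path found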

4. QEGC Guidance Law Based on Continuous Switching Sliding Mode

4.1. Longitudinal Guidance

The dynamic equation of the range angle $L_R$ is given by
$$\dot{L}_R = \frac{V\cos\theta}{r}$$
Combining Equation (16) with Equation (2) yields
$$\frac{\dot{L}_R}{\dot{r}} = \frac{1}{r\tan\theta}$$
The downrange in reentry flight is much larger than the change in altitude, so the longitudinal motion can be regarded as approximately constant-altitude flight, and the flightpath angle is small and can be regarded as constant. Therefore, by integrating Equation (17), the total range angle to the target TAEM interface is obtained as
$$L_R = \frac{\ln\left(r_T/r\right)}{\tan\theta}$$
where $r_T$ is the geocentric distance at the TAEM interface. The range angle between the current point and the target point can be found from spherical geometry as
$$L_{RT} = \arccos\left[\sin\phi_T\sin\phi + \cos\phi_T\cos\phi\cos\left(\lambda_T - \lambda\right)\right] - S_{TAEM}$$
where $S_{TAEM}$ is the spherical angle corresponding to the radius of the TAEM interface. By combining Equation (19) with Equation (18), the command of the flightpath angle can be obtained as
$$\theta_c = \arctan\frac{\ln\left(r_T/r\right)}{L_{RT}}$$
Considering Equation (3), the control-oriented flightpath angle model is expressed as follows:
$$\dot{\theta} = f_\theta(\mathbf{x}) + g_\theta(\mathbf{x})u_\theta$$
where
$$\mathbf{x} = \left[r\ \ \lambda\ \ \phi\ \ V\ \ \theta\ \ \sigma\right]^T$$
$$f_\theta(\mathbf{x}) = \frac{1}{mV}\left(G_\theta + F_{e\theta} + F_{k\theta} + F_{T\theta} + F_{m\theta}\right)$$
$$g_\theta(\mathbf{x}) = \frac{\rho V S_0}{2m}$$
$$u_\theta = C_L(\alpha, \chi)\cos\upsilon$$
The sliding mode surface is selected as
$$s_\theta = e_\theta$$
where $e_\theta$ is the tracking error of the flightpath angle command as follows:
$$e_\theta = \theta - \theta_c$$
In the traditional sliding mode convergence law, the switching term is designed with the piecewise, non-differentiable sign function sgn(·) or saturation function sat(·), which leads to a non-smooth control input and is unsuitable for differentiation. The hyperbolic tangent function tanh(·) is continuous and smooth, so it can effectively suppress the chattering of the sliding mode control. The sliding mode convergence law is designed as
$$\dot{s}_\theta = -k_{\theta 1}s_\theta - k_{\theta 2}\tanh\left(s_\theta/\varepsilon_\theta\right)$$
where $k_{\theta 1}$ and $k_{\theta 2}$ are positive gains, and $\varepsilon_\theta > 0$ is the boundary layer thickness. Combining Equations (21) and (26) with Equation (28), the control input of longitudinal guidance is designed as
$$u_\theta = g_\theta^{-1}\left[-k_{\theta 1}s_\theta - k_{\theta 2}\tanh\left(s_\theta/\varepsilon_\theta\right) + \dot{\theta}_c - f_\theta\right]$$
The first-order sliding mode control with hyperbolic tangent switching is chosen based on key considerations for hypersonic morphing control: it effectively handles bounded disturbances and meets strict tracking-precision requirements without unnecessary complexity, and its minimal computational load suits the real-time constraints, preserving resources for the DDPG policy and balancing efficiency.

4.2. Lateral Guidance

The lateral guidance has to keep the flight direction aligned with the target direction, and therefore the error between the heading angle and the line of sight azimuth angle shown in Equation (10) needs to be eliminated. The control-oriented heading angle model is obtained from Equation (3) as follows:
$$\dot{\sigma} = f_\sigma(\mathbf{x}) + g_\sigma(\mathbf{x})u_\sigma$$
where
$$f_\sigma(\mathbf{x}) = \frac{1}{mV\cos\theta}\left(G_\sigma + F_{e\sigma} + F_{k\sigma} + F_{T\sigma} + F_{m\sigma}\right)$$
$$g_\sigma(\mathbf{x}) = \frac{\rho V S_0}{2m\cos\theta}$$
$$u_\sigma = C_L(\alpha, \chi)\sin\upsilon$$
The sliding mode surface is selected as
$$s_\sigma = e_\sigma$$
where $e_\sigma$ is the tracking error of the heading angle command as follows:
$$e_\sigma = \sigma - \sigma_c$$
The sliding mode convergence law is designed considering the time to fly as follows:
$$\dot{s}_\sigma = -\frac{k_{\sigma 1}}{T_g}s_\sigma - \frac{k_{\sigma 2}}{T_g}\tanh\left(s_\sigma/\varepsilon_\sigma\right)$$
where $T_g$ is the time to fly, and it can be approximately calculated by
$$T_g = \frac{L_{RT}R_0}{V\cos\theta}$$
where $k_{\sigma 1}$ and $k_{\sigma 2}$ are positive gains, $\varepsilon_\sigma > 0$ is the boundary layer thickness, $R_0$ is the Earth's radius, and $L_{RT}$ is the flight range-to-go. By adding $T_g$ to Equation (36), the convergence law is slower when the HMV is far away from the target position, so as to prevent difficulty in maintaining the flight altitude when the bank angle is large due to excessive lateral commands. By contrast, the convergence law is faster when the HMV is near the target position, ensuring the control accuracy of the heading angle. Combining Equations (26), (28) and (30), the control input for lateral guidance is obtained as
$$u_\sigma = g_\sigma^{-1}\left[-\frac{k_{\sigma 1}}{T_g}s_\sigma - \frac{k_{\sigma 2}}{T_g}\tanh\left(s_\sigma/\varepsilon_\sigma\right) + \dot{\sigma}_c - f_\sigma\right]$$
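As an illustration of Equations (29) and (38), the following Python sketch evaluates the two tanh-based sliding mode guidance inputs. The terms f_theta, g_theta, f_sigma, g_sigma and the command derivatives are assumed to be supplied by the dynamics model of Section 2; the default gains are the values listed later in Section 6.2 and are not prescriptive.

import math

def longitudinal_input(theta, theta_c, dtheta_c, f_theta, g_theta,
                       k1=1.0, k2=0.001, eps=0.001):
    """Longitudinal guidance input u_theta of Equation (29)."""
    s = theta - theta_c  # sliding surface s_theta = theta - theta_c
    return (-k1 * s - k2 * math.tanh(s / eps) + dtheta_c - f_theta) / g_theta

def lateral_input(sigma, sigma_c, dsigma_c, f_sigma, g_sigma, T_g,
                  k1=4.0, k2=0.001, eps=0.001):
    """Lateral guidance input u_sigma of Equation (38), scaled by the time-to-go T_g."""
    s = sigma - sigma_c  # sliding surface s_sigma = sigma - sigma_c
    return (-(k1 / T_g) * s - (k2 / T_g) * math.tanh(s / eps)
            + dsigma_c - f_sigma) / g_sigma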

4.3. Stability Analysis

Define the Lyapunov function of the guidance system as
$$U = \frac{1}{2}s_\theta^2 + \frac{1}{2}s_\sigma^2$$
Taking the derivative of both sides of Equation (39) yields
$$\dot{U} = s_\theta\dot{s}_\theta + s_\sigma\dot{s}_\sigma$$
Substituting Equation (29) into Equation (40) yields
$$s_\theta\dot{s}_\theta = s_\theta\left[-k_{\theta 1}s_\theta - k_{\theta 2}\tanh\left(s_\theta/\varepsilon_\theta\right)\right] = -k_{\theta 1}s_\theta^2 - k_{\theta 2}s_\theta\tanh\left(s_\theta/\varepsilon_\theta\right)$$
where $s_\theta\tanh\left(s_\theta/\varepsilon_\theta\right) \ge 0$ holds for all $s_\theta$; thus we have
$$s_\theta\dot{s}_\theta \le -k_{\theta 1}s_\theta^2$$
and by the same token
$$s_\sigma\dot{s}_\sigma = -\frac{k_{\sigma 1}}{T_g}s_\sigma^2 - \frac{k_{\sigma 2}}{T_g}s_\sigma\tanh\left(s_\sigma/\varepsilon_\sigma\right) \le -\frac{k_{\sigma 1}}{T_g}s_\sigma^2$$
Substituting Equations (42) and (43) into Equation (39) yields $\dot{U} \le -kU$, where $k = \frac{1}{2}\min\left(k_{\theta 1},\ k_{\sigma 1}/T_g\right)$. Therefore, the closed-loop control system of the flightpath angle and heading angle is asymptotically stable.

4.4. Conversion of Control Input

After obtaining the control inputs for longitudinal and lateral guidance as in Equations (33) and (38), the bank angle and lift coefficient can be solved from Equations (25) and (33) as
$$\upsilon = \arctan 2\left(u_\sigma,\ u_\theta\right),\qquad C_L(\alpha, \chi) = \sqrt{u_\theta^2 + u_\sigma^2}$$
The lift coefficient $C_L(\alpha, \chi)$ is determined by both the angle of attack and the sweep angle, and the sweep angle needs to be known in order to solve for the angle of attack. For the HMV, the morphing policy needs to be planned to improve the flight performance.

5. DRL-Based Morphing Policy

The proposed method decouples global trajectory planning from local morphing adaptation to achieve full-trajectory optimality without dynamic interaction. The A* algorithm optimizes macro-scale energy consumption by generating waypoint sequences that minimize cumulative velocity heading changes—a state-dependent objective independent of morphing dynamics between waypoints. Meanwhile, the DDPG-based morphing policy optimizes local aerodynamic performance within these fixed waypoint constraints.

5.1. The MDP Model for Morphing Policy Learning

The mathematical model of RL is usually described by the Markov decision process (MDP) model, which generally consists of five elements $(S, A, P, R, \gamma)$, where $S$ and $A$ are the state space and action space of the agent, respectively; $P$ is the environment dynamic transfer function; $R$ is the reward function; and $\gamma \in [0, 1]$ is the discount factor. In the morphing policy learning problem of the HMV, the agent is the sweep angle morphing mechanism, and the environment comprises the flight environment, the dynamics model, and the guidance law of Section 4. For the reentry guidance of the HMV, the control requirements on velocity and altitude are strict, so the morphing can be used to enhance the longitudinal flight performance. Therefore, the state $s$ of RL is set as follows:
$$s = \left[r\ \ V\ \ \Delta\theta\ \ L_R\ \ \alpha\right]^T$$
The action a of the agent is the sweep angle of the HMV:
a = χ
Because the flight dynamics and guidance law are deterministic, the environmental state-transition probability is taken as 1. The discount factor $\gamma$ determines the influence of future rewards on the current cumulative reward. Because the reentry flight spans a long time horizon, a value of $\gamma$ close to, but less than, 1 is appropriate to ensure that future rewards play a role in the current decision.

5.2. Design of Reward Functions

From the effect of morphing on the aerodynamic characteristics of the vehicle in Section 2.3, the regulation of morphing on the guidance performance is reflected in the following three aspects: (1) by adjusting the drag coefficient through morphing, it is possible to adjust the velocity loss and thus satisfy the terminal velocity constraint; (2) by adjusting the lift coefficient through morphing, the longitudinal motion control can be adjusted, thus improving the tracking performance of the flightpath angle; (3) adjusting the lift-to-drag ratio by morphing is similar to adjusting the drag coefficient, which can adjust the energy loss during flight and realize the optimization of velocity or range.
For the reentry flight of the HMV with no-fly zones, the following guidance requirements are necessary. (1) During the avoidance flight, the HMV needs to minimize the velocity loss between every two no-fly zone waypoints to keep enough energy to cope with the subsequent no-fly zone avoidance tasks and complete the reentry successfully. (2) When there is an error in the flightpath angle tracking, it is necessary to quickly adjust the lift coefficient to achieve rapid convergence and improve the accuracy of longitudinal tracking guidance. (3) At the end of the reentry flight between the last no-fly zone waypoint and the target position, the velocity loss needs to be adjusted according to the flight status and remaining range so that the terminal velocity constraint is satisfied. Therefore, based on the above analysis, the reward function is set as follows in order to realize the improvement of guidance performance by morphing:
$$R = \begin{cases} -c_1\left|\Delta\theta\right|, & \text{the target waypoint is not } T \text{ and is not reached} \\ c_2\left(V - V_Z\right), & \text{the target waypoint is not } T \text{ and is reached} \\ -c_3\left|V - V_T\right|, & \text{the target waypoint is } T \text{ and is reached} \end{cases}$$
where $c_1$, $c_2$, and $c_3$ are positive coefficients, and $V_Z$ is a designed velocity constant. From Equation (47), it can be seen that when the target waypoint is not $T$ and the target waypoint is not reached, the optimization goal of the morphing policy is to reduce the flightpath angle tracking error. When the target waypoint is not $T$ and the target waypoint is reached, the optimization goal of the morphing policy is to increase the velocity when reaching the last no-fly zone waypoint, which is equivalent to reducing the velocity loss during the avoidance flight. When the target waypoint is $T$ and the target waypoint is reached, the optimization goal of the morphing policy is to reduce the difference between the terminal velocity and the desired terminal velocity, so as to satisfy the terminal velocity constraint.
The reward function coefficients $c_1$, $c_2$, and $c_3$ arose from extensive ablation and sensitivity tests. Ablation tests removing the path tracking, energy retention, or terminal velocity terms each caused critical performance degradation, showing that all components are essential. Sensitivity analyses fine-tuned the coefficients across wide ranges, evaluating hundreds of combinations to balance trajectory precision, energy efficiency, and velocity control from low-altitude to hypersonic scenarios. This test process validated the final values $c_1 = 1$, $c_2 = 0.5$, and $c_3 = 4$ as optimal trade-offs for robust real-world guidance performance.
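A minimal Python sketch of the piecewise reward in Equation (47) is given below. The coefficients are the reported values; the sign convention (penalizing tracking and terminal-velocity errors, rewarding velocity retained at the avoidance waypoint) is an assumption consistent with the discussion above rather than a statement of the authors' exact implementation.

def morphing_reward(dtheta, V, V_Z, V_T, target_is_T, reached,
                    c1=1.0, c2=0.5, c3=4.0):
    """Piecewise reward of Equation (47) for one guidance step."""
    if not target_is_T and not reached:
        return -c1 * abs(dtheta)       # track the flightpath-angle command
    if not target_is_T and reached:
        return c2 * (V - V_Z)          # retain velocity at the avoidance waypoint
    return -c3 * abs(V - V_T)          # meet the terminal velocity at the TAEM interface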

5.3. Policy Learning Based on the DDPG Algorithm

For problems involving high-dimensional continuous state and action spaces, combining the deterministic policy gradient with the successful experience of the DQN algorithm yields the DDPG algorithm, which can solve multi-dimensional state and continuous action space problems such as morphing guidance for the HMV. Define the cumulative reward during RL as
$$G_t = \sum_{k=0}^{\infty}\gamma^k R_{t+k+1}$$
where $G_t$ is the sum of all discounted rewards from time $t$ onwards. Define the action value function $Q^\mu(s, a)$ as the expected return obtained by performing action $a$ in the current state $s$ under the action policy $\mu$ as follows:
$$Q^\mu(s, a) = \mathbb{E}_\mu\left[G_t \mid s, a\right]$$
The deterministic policy is more effective than the stochastic policy in dealing with high-dimensional state spaces, so the deterministic policy gradient algorithm is used to learn high-dimensional continuous space problems such as HMV morphing policy planning. A deterministic policy function $\mu_w(s)$ is defined to construct a mapping from states $s$ to deterministic actions $a$, where $w$ is the parameter of $\mu_w(s)$. In this case, $Q^\mu(s, a)$ can be calculated by the Bellman equation as follows:
$$Q^\mu(s, a) = \mathbb{E}_{s' \sim E}\left[R(s, a) + \gamma Q^\mu\left(s', \mu(s')\right)\right]$$
The deep deterministic policy gradient (DDPG) algorithm can be used to solve the morphing policy of the studied HMV. The DDPG implements "end-to-end" learning directly from the original data, as shown in Figure 6, where $\theta^\mu$ denotes the parameters of the Actor network and $\theta^Q$ denotes the parameters of the Critic network. When training the neural networks, if the same network is used to represent both the target network and the currently updated online network, the learning process becomes unstable. Consequently, two separate target networks $Q'(s, a|\theta^{Q'})$ and $\mu'(s|\theta^{\mu'})$ are created, where $\theta^{\mu'}$ denotes the parameters of the target Actor network $\mu'$ and $\theta^{Q'}$ denotes the parameters of the target Critic network $Q'$. As shown in Figure 6, the Actor network outputs actions based on the deterministic policy network, and the Critic network evaluates the action value function of the Actor network. Then the Actor network updates the deterministic policy network parameters based on the policy gradient from the Critic network. This both exploits the advantages of policy gradient and deep neural network-based methods for continuous problems and improves the stability of the network through "memory replay".
After sampling a batch of data, the DDPG's online Critic network is updated by minimizing the mean square error:
$$L = \frac{1}{N_b}\sum_{i=1}^{N_b}\left(\delta_i^{TD}\right)^2$$
where $N_b$ is the number of batch samples, and $\delta_i^{TD}$ is the temporal-difference error as follows:
$$\delta_i^{TD} = r_i + \gamma Q'\left(s_{i+1}, \mu'\left(s_{i+1}|\theta^{\mu'}\right)\big|\theta^{Q'}\right) - Q\left(s_i, a_i|\theta^Q\right)$$
The target value $r_i + \gamma Q'\left(s_{i+1}, \mu'\left(s_{i+1}|\theta^{\mu'}\right)|\theta^{Q'}\right)$ is calculated by the target Critic network $Q'$ and the target Actor network $\mu'$ to make the learning process of the network more stable and easier to converge. Once the loss function $L$ is obtained, the gradient $\nabla_{\theta^Q}L$ can be expressed as follows:
$$\nabla_{\theta^Q}L = -\frac{1}{N_b}\sum_{i=1}^{N_b}\delta_i^{TD}\nabla_{\theta^Q}Q\left(s_i, a_i|\theta^Q\right)$$
Then $\theta^Q$ is updated by gradient descent:
$$\theta^Q \leftarrow \theta^Q - \beta_Q\nabla_{\theta^Q}L$$
where $\beta_Q$ is the update step, and $\theta^\mu$ is updated by the deterministic policy gradient theorem [36] as follows:
$$\theta^\mu \leftarrow \theta^\mu + \beta_\mu\nabla_{\theta^\mu}\mu$$
where $\beta_\mu$ is the update step, and the target network parameters are softly updated by the sliding average method:
$$\theta^{Q'} \leftarrow \tau\theta^Q + \left(1 - \tau\right)\theta^{Q'},\qquad \theta^{\mu'} \leftarrow \tau\theta^\mu + \left(1 - \tau\right)\theta^{\mu'}$$
The flow of the DDPG algorithm is summarized as follows (Algorithm 2):
Algorithm 2: DDPG algorithm
Initialize the online Critic network $Q(s, a|\theta^Q)$ and the online Actor network $\mu(s|\theta^\mu)$
Copy the parameters of the online Critic and Actor networks to the corresponding target networks: $\theta^{Q'} \leftarrow \theta^Q$, $\theta^{\mu'} \leftarrow \theta^\mu$
Initialize the replay memory $D$ with its capacity
for episode = 1 to M do
 Initialize the Gaussian exploration noise distribution $N$
 Initialize the state $s$
 for t = 1 to the end of the episode do
    $a_t = \mathrm{Clip}\left(\mathrm{Clip}\left(\mu(s_t|\theta^\mu) + A_\sigma\rho,\ a_{t-1} - \Delta t\,\dot{\chi}_{\max},\ a_{t-1} + \Delta t\,\dot{\chi}_{\max}\right),\ \chi_{\min},\ \chi_{\max}\right)$
   Execute the action $a_t$, obtain the reward $r$ and the new state $s'$
   Store $(s, a, r, s')$ in the memory $D$
   Randomly sample $N_b$ transitions $(s_i, a_i, r_i, s_i')$ from $D$
   Calculate $\delta_i^{TD} = r_i + \gamma Q'\left(s_{i+1}, \mu'\left(s_{i+1}|\theta^{\mu'}\right)|\theta^{Q'}\right) - Q\left(s_i, a_i|\theta^Q\right)$, $i = 1, 2, \dots, N_b$
   Update the Critic network by minimizing the Critic loss:
    $\nabla_{\theta^Q}L = -\frac{1}{N_b}\sum_{i=1}^{N_b}\delta_i^{TD}\nabla_{\theta^Q}Q\left(s_i, a_i|\theta^Q\right)$
    $\theta^Q \leftarrow \theta^Q - \beta_Q\nabla_{\theta^Q}L$
   Update the Actor network by policy gradient ascent:
    $\nabla_{\theta^\mu}\mu = \frac{1}{N_b}\sum_{i=1}^{N_b}\nabla_a Q\left(s, a|\theta^Q\right)\big|_{s=s_i, a=\mu(s_i)}\nabla_{\theta^\mu}\mu\left(s|\theta^\mu\right)\big|_{s=s_i}$
    $\theta^\mu \leftarrow \theta^\mu + \beta_\mu\nabla_{\theta^\mu}\mu$
   Softly update the parameters of the target networks $Q'(s, a|\theta^{Q'})$ and $\mu'(s|\theta^{\mu'})$:
    $\theta^{Q'} \leftarrow \tau\theta^Q + \left(1 - \tau\right)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^\mu + \left(1 - \tau\right)\theta^{\mu'}$
 end for
end for
The DDPG parameters were chosen through extensive sensitivity tests and systematic tuning to balance performance and computational efficiency: four hidden layers with 100 (Actor) and 120 (Critic) neurons using ReLU activation mitigate overfitting while capturing the nonlinear dynamics; $M = 3600$ episodes ensure policy convergence, with $\gamma = 0.99$ prioritizing long-horizon rewards and a batch size of $N_b = 500$ improving gradient accuracy; the learning rates $\beta_\mu = 2.5\times10^{-6}$ and $\beta_Q = 0.001$ prevent policy collapse, and $\tau = 0.001$ stabilizes the target network updates; the Gaussian noise amplitude $A_\sigma = 36^\circ$ is scaled to the sweep angle range for effective exploration.
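For concreteness, the following PyTorch sketch condenses one DDPG update step of Algorithm 2. It is a generic illustration, not the authors' code: the actor and critic networks, their target copies, and the optimizers are assumed to be constructed elsewhere (for example with the layer sizes quoted above), the critic is assumed to take (state, action) pairs, and the reward tensor is assumed to have shape (N_b, 1).

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t, actor_opt, critic_opt,
                batch, gamma=0.99, tau=0.001):
    """One DDPG update on a sampled batch (s, a, r, s_next) of transitions."""
    s, a, r, s_next = batch

    # Critic: minimize the mean-squared temporal-difference error against the target networks
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient ascent (maximize the critic's value of the policy action)
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft (sliding average) update of the target networks
    with torch.no_grad():
        for net, net_t in ((critic, critic_t), (actor, actor_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)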

6. Tests and Analysis

6.1. Solutions for the Path Constraint and Angle of Attack

The path constraints will be converted into the constraints of the angle of attack based on the QEGC and the relationship between the lift coefficient and the angle of attack. A transformation of Equation (6) yields
$$\rho \le \rho_{\dot{Q}_m} = \frac{\dot{Q}_m}{k_h V^{3.15}}$$
$$\rho \le \rho_{q_m} = \frac{2q_m}{V^2}$$
$$C_L \le C_{Ln_m} = \frac{2mg_0 n_m K}{\rho V^2 S_0\sqrt{K^2 + 1}}$$
$$C_L = \frac{2m\left(gr - V^2\right)\cos\theta}{\rho S_0 V^2 r\cos\upsilon}$$
Substituting Equations (57) and (58) into Equation (60) yields
$$C_L \ge C_{L\dot{Q}_m} = \frac{2mk_hV^{1.15}\left(gr - V^2\right)\cos\theta}{\dot{Q}_m S_0 r\cos\upsilon}$$
$$C_L \ge C_{Lq_m} = \frac{m\left(gr - V^2\right)\cos\theta}{q_m S_0 r\cos\upsilon}$$
Therefore, the lower bound of the lift coefficient can be obtained from Equations (61) and (62) as
$$C_{L\mathrm{down}} = \min\left(C_{L\dot{Q}_m},\ C_{Lq_m}\right)$$
Equation (59) determines the upper bound of the lift coefficient as
$$C_{L\mathrm{up}} = C_{Ln_m}$$
Therefore, from Equations (63) and (64), we can obtain the lift coefficient constraint when the path constraints are satisfied as
$$C_{L\mathrm{down}} \le C_L \le C_{L\mathrm{up}}$$
Considering that the sweep angle is output by the policy network in Section 5, the commanded value of the angle of attack is inversely solved from the lift coefficient in Equation (44), constrained by Equation (65), as follows:
$$\alpha_d = C_L^{-1}\left(\max\left(\min\left(\sqrt{u_\theta^2 + u_\sigma^2},\ C_{L\mathrm{up}}\right),\ C_{L\mathrm{down}}\right),\ \chi\right)$$
Recalling Equation (8), the actual angle of attack command is limited as
$$\alpha = \max\left(\min\left(\alpha_d,\ \alpha_{\max}\right),\ \alpha_{\min}\right)$$
The stability and safety of the learned morphing policy are ensured through multiple mechanisms. Hard morphing constraints, enforced by the network output layer and command denormalization, keep the sweep angle within safe limits. In the guidance framework, coordinated control of the sweep angle, angle of attack, and bank angle maintains trajectory tracking, with the Lyapunov analysis of Section 4.3 validating stability. Path constraints are transformed into angle-of-attack bounds, further safeguarding flight safety.
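The sketch below illustrates the control-input conversion of Sections 4.4 and 6.1 in Python. The helper cl_inverse, which inverts the aerodynamic table C_L(alpha, chi) for a given sweep angle, is a hypothetical placeholder, and the default angle-of-attack limits are the values listed in Section 6.2.

import math

def clamp(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

def convert_inputs(u_theta, u_sigma, CL_down, CL_up, chi, cl_inverse,
                   alpha_min=2.0, alpha_max=20.0):
    """Angle of attack and bank angle (deg) from the guidance inputs u_theta, u_sigma."""
    bank = math.degrees(math.atan2(u_sigma, u_theta))             # Equation (44)
    CL_cmd = clamp(math.hypot(u_theta, u_sigma), CL_down, CL_up)  # lift coefficient bounded per Equation (65)
    alpha_d = cl_inverse(CL_cmd, chi)                             # invert the aero table for the current sweep
    alpha = clamp(alpha_d, alpha_min, alpha_max)                  # limit per Equation (8)
    return alpha, bank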

6.2. Setup of Simulation Parameters

The physical variables are $m = 500$ kg and $S_0 = 5.66$ m². The initial flight states and target states are set as follows: $H_0 = 70$ km, $\lambda_0 = 0°$, $\phi_0 = 0°$, $V_0 = 6800$ m/s, $\theta_0 = 0°$, $\sigma_0 = 0°$, $H_T = 30$ km, $\lambda_T = 90°$, $\phi_T = 0°$, $V_T = 2500$ m/s, $\Delta L_{RT} = 0°$, $\Delta\sigma_T = 0°$, and $S_{TAEM} = 0°$. Lacking physical HMVs and open-source models, the simulation parameters were adapted from established hypersonic vehicle references such as HTV and CAV and adjusted for our model's structure and geometry. Accordingly, the parameters of the path constraints are set as follows: $\dot{Q}_m = 3.8\times10^6$ kW/m², $q_m = 90$ kPa, $n_m = 4$, and $k_h = 9.437\times10^{-5}$.
The parameters of the control input constraints and the guidance law are set as follows: $\alpha_{\min} = 2°$, $\alpha_{\max} = 20°$, $\upsilon_{\min} = -85°$, $\upsilon_{\max} = 85°$, $\chi_{\min} = 30°$, $\chi_{\max} = 90°$, $k_{\theta 1} = 1$, $k_{\theta 2} = 0.001$, $\varepsilon_\theta = 0.001$, $k_{\sigma 1} = 4$, $k_{\sigma 2} = 0.001$, and $\varepsilon_\sigma = 0.001$. The no-fly zone parameters are set as follows: $\lambda_{Z1} = 28°$, $\lambda_{Z2} = 35°$, $\lambda_{Z3} = 58°$, $\lambda_{Z4} = 60°$, $\lambda_{Z5} = 76°$, $\phi_{Z1} = 10°$, $\phi_{Z2} = 3°$, $\phi_{Z3} = 7°$, $\phi_{Z4} = 12°$, $\phi_{Z5} = 6°$, $R_{Z1} = 6°$, $R_{Z2} = 7°$, $R_{Z3} = 9°$, $R_{Z4} = 8°$, $R_{Z5} = 5°$, and $\varpi = 3\%$.
The simulations employ a fourth-order Runge–Kutta integration method with a fixed 0.05 s step size, terminating upon reaching the 30 km TAEM interface altitude.
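A generic fourth-order Runge–Kutta step consistent with this setup is sketched below; dynamics(t, x, u) stands for the three-degree-of-freedom model of Section 2.2 and is assumed to return the state derivative as a NumPy-style array.

def rk4_step(dynamics, t, x, u, dt=0.05):
    """Advance the state x by one fixed RK4 step of size dt under control u."""
    k1 = dynamics(t, x, u)
    k2 = dynamics(t + dt / 2.0, x + dt / 2.0 * k1, u)
    k3 = dynamics(t + dt / 2.0, x + dt / 2.0 * k2, u)
    k4 = dynamics(t + dt, x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)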

6.3. Avoidance Search Results Under Multiple No-Fly Zones

To compare the performance of the A* waypoint search algorithm under multiple no-fly zones, the greedy best-first (GBF) search algorithm in Ref. [37] is selected for comparative simulation. The two searched avoidance trajectories are displayed in Figure 7. The set of waypoints $O \to Z_{22} \to Z_{32} \to T$ found by the A* algorithm has the minimum value of the evaluation function, $J_{A^*,\min}(O) = 0.6449$. The GBF search is target-driven rather than exhaustive: it prioritizes the expansion of waypoints close to the target, i.e., with a small heuristic value, so its search is faster. The set of waypoints $O \to Z_{21} \to T$ found by the GBF algorithm has the minimum value of the evaluation function, $J_{GBF,\min}(O) = 0.6876$. Therefore, comparing the minimum evaluation functions of the two methods, the avoidance path of the A* algorithm reduces the evaluation function value by 6.2%, demonstrating its advantage for avoidance path searches under multi-no-fly-zone conditions and achieving optimal energy consumption for the HMV.

6.4. Training Results of RL

Following 3600 episodes of RL training for coordinated reentry guidance, the cumulative reward trajectory is visualized in Figure 8, where the deterministic policy gradient-based DDPG algorithm is benchmarked against the soft Actor–Critic (SAC) algorithm [38] to evaluate their control performance. As depicted in Figure 8, the DDPG algorithm demonstrates superior convergence speed and stability compared to its SAC counterpart. Quantitatively, the reward of the DDPG framework progressively converges to a stabilized maximum value of approximately −16, whereas the SAC algorithm plateaus at a significantly lower reward of −43. This substantial performance disparity indicates that the morphing policy network trained by the DDPG algorithm achieves an optimal mission-specific parameterization.
The DDPG training process, comprising 3600 episodes, is executed offline on a workstation equipped with an NVIDIA RTX 4070 GPU, requiring approximately 10 h for completion. Only the finalized policy network architecture featuring four hidden layers with 100 neurons per layer is deployed onboard. Performance benchmarks conducted on an Intel Core i7 CPU using Torch 2.1 demonstrate a model memory footprint of 122.2 kB and a mean inference time of 0.1 ms. This time is significantly below the 50 ms guidance cycle requirement, confirming real-time operational feasibility for deployment on modern embedded systems possessing comparable computational capabilities.

6.5. Effectiveness of the Intelligent Coordinated Guidance Method

In order to verify the effectiveness of the coordinated guidance method, the no-fly zone waypoint search algorithm, the QEGC guidance law, and the morphing policy network are substituted into the reentry flight simulation of the HMV. To compare against the morphing policy of the DDPG algorithm, three control groups are set: two with fixed sweep angles of $\chi = 30°$ and $\chi = 90°$, and one using the morphing policy of the SAC algorithm [38]. The simulation results are shown in Figure 9, Figure 10 and Figure 11. The terminal range error $\Delta S_f$, terminal velocity error $\Delta V_f$, terminal velocity of the avoidance flight $V_{Z32}$, and average flightpath angle tracking error $\Sigma\Delta\theta$ of the four morphing policies are shown in Table 1.
Table 1. Comparisons of reentry flight variables.
Variables        | DDPG       | χ = 30°    | χ = 90°    | SAC
ΔS_f (km)        | −0.09      | 0.01       | 0.24       | 0.39
V_Z32 (m/s)      | 5275.0     | 5074.14    | 5181.86    | 5260.2
ΔV_f (m/s)       | −0.01      | −419.39    | 80.61      | 87.23
ΣΔθ (°·s)        | 1.87 × 10³ | 1.96 × 10³ | 1.89 × 10³ | 1.97 × 10³
where $\Sigma\Delta\theta$ is defined as follows:
$$\Sigma\Delta\theta = \int_0^{T}\left|\Delta\theta(t)\right|\mathrm{d}t$$
where $T$ is the total flight time. Figure 9 and Figure 10 show that the proposed coordinated guidance method can guide the HMV from the starting position to the TAEM interface without crossing the no-fly zones, and the terminal position error is kept below 0.1 km. The terminal altitude error requirement is satisfied, since the simulation is cut off at the TAEM altitude. As shown in Figure 11 and Table 1, at the last avoidance waypoint $Z_{32}$, the velocity under the RL morphing policy is the largest, 5275.0 m/s, indicating that the reward function of DDPG can effectively reduce the velocity loss during the avoidance flight. Compared to SAC, DDPG's deterministic policy network achieves a 0.09 km terminal position error, 75% lower than SAC's 0.39 km error, and a terminal velocity error of only 0.01 m/s. This benefits from the DDPG algorithm's greater learning ability. Comparing the terminal velocities, the terminal velocity error of the DDPG morphing policy is only 0.01 m/s, while those of the three control groups are −419.39 m/s, 80.61 m/s, and 87.23 m/s, respectively. So the reward function of DDPG can effectively reduce the terminal velocity error. Under a positive flightpath angle tracking error, the tracking error with $\chi = 30°$ is smaller due to the larger lift at the small sweep angle; by contrast, for a negative flightpath angle tracking error, the error under $\chi = 90°$ is smaller because of the smaller lift at the large sweep angle. Therefore, as seen in Table 1, the average tracking error of the flightpath angle under the DDPG morphing policy is the smallest, $1.87\times10^3$ °·s, because the sweep angle is adjusted according to the flight state. The errors of the heading angle are gradually eliminated during the flight between waypoints and satisfy the terminal heading angle constraint.
According to Figure 12, the angles of attack and bank angles under the three morphing policies are within the constraints of the control inputs and maintain the QEGC gliding flight in the longitudinal and lateral directions. Compared with the two fixed morphing groups, the sweep angle of the DDPG morphing policy is relatively more complex. The sweep angle will be kept at 30° to increase the lift to achieve upward tracking under the large positive tracking error. Otherwise, the sweep angle will be kept at 90° to decrease the lift to achieve downward tracking. During the avoidance flight with no-fly zones, the sweep angle is kept near the optimal sweep angle corresponding to the maximum lift-to-drag ratio, so as to increase the lift-to-drag ratio, reduce the velocity loss during the avoidance flight, and retain more energy for the subsequent mission. Figure 13 shows that the constraints of heat rate density, dynamic pressure, and overload are all kept below the corresponding constraints to keep a safe reentry flight.
Comprehensive timing measurements show that the A* waypoint planning completes in under 0.1 ms and the DDPG policy achieves a 0.1 ms inference time on an Intel processor. The integrated planning–guidance cycle maintains a 1.60 ms average execution time per step, just 3.2% of the 50 ms operational window. These metrics validate the real-time feasibility of the method for avionics deployment, with significant computational margin retained.

6.6. Generalization of Intelligent Coordinated Guidance Method

The above simulation results are trained for a single mission. In order to check the generalization of the intelligent coordinated guidance method to different missions, flight targets different from the training mission are given for simulation. Three groups of target positions are set as T1 (85°, 16°), T2 (80°, −13°), and T3 (88°, −6°). Figure 14 shows that the proposed coordinated guidance method presents a good generalization performance and guides the HMV from the starting point to the TAEM interface of the different targets without crossing the multiple no-fly zones. The terminal position and no-fly zone constraints are satisfied. The terminal position errors for T1, T2, and T3 are −0.21 km, −0.24 km, and −0.09 km, respectively. As shown in Figure 15, for the three different flight targets, the terminal velocity errors are 5.90 m/s, 1.74 m/s, and 3.67 m/s, which are within the allowed range. This indicates that the DDPG morphing policy can effectively constrain the terminal velocity. The velocity and bank angle tracking, as well as the heading angle, satisfy the heading angle constraint. From Figure 16, it can be seen that the changes in the angle of attack and bank angle under the three different flight targets are within the input constraints. The DDPG morphing policy copes well, making autonomous adjustments to the sweep angle according to the flight mission. Figure 17 demonstrates that the path constraints are satisfied under the three different flight targets.

6.7. Robustness of the Intelligent Coordinated Guidance Method

To test the robustness of the designed intelligent morphing coordinated guidance method, Monte Carlo experiments are performed under given random deviations of the initial state, atmospheric density, and aerodynamic coefficients. In order to improve the adaptability of the deterministic DDPG algorithm to the nondeterministic environment, a correction term $\Delta\chi = k_V\left(V - V_r\right)$, with $k_V = 0.04$, is added to the sweep angle output from the policy network, where $V_r$ is the reference velocity in the nominal environment. Assuming that the deviations of the initial state, the atmospheric density, and the aerodynamic coefficients follow normal distributions, the $3\sigma$ deviations are given as follows: $\Delta H_0 = 100$ m, $\Delta\lambda_0 = 0.5°$, $\Delta\phi_0 = 0.5°$, $\Delta V_0 = 100$ m/s, $\Delta\theta_0 = 0.5°$, $\Delta\sigma_0 = 0.5°$, $\Delta\rho = 20\%$, $\Delta C_L = 10\%$, and $\Delta C_D = 10\%$. Gaussian noise with $V_\sigma = 10$ m/s and $H_\sigma = 10$ m was added to the velocity and altitude measurements, respectively. To mimic real-world actuator dynamics, a first-order delay ranging from 0.1 to 0.3 s was applied to the sweep angle commands.
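A small sketch of how one such deviation set can be drawn is given below; each listed value is treated as a 3σ bound, so the corresponding standard deviation is one third of it. The dictionary keys are illustrative names, not symbols from the paper.

import numpy as np

rng = np.random.default_rng()

THREE_SIGMA = {"H0_m": 100.0, "lam0_deg": 0.5, "phi0_deg": 0.5, "V0_mps": 100.0,
               "theta0_deg": 0.5, "sigma0_deg": 0.5, "rho_rel": 0.20,
               "CL_rel": 0.10, "CD_rel": 0.10}

def sample_deviation():
    """Draw one Monte Carlo deviation set with sigma = (3-sigma bound) / 3."""
    return {name: rng.normal(0.0, bound / 3.0) for name, bound in THREE_SIGMA.items()}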
The trajectories and velocities obtained from the 1000 Monte Carlo experiments are shown in Figure 18. The results show that the designed intelligent coordinated guidance method can accomplish the reentry mission with no-fly zones under the deviations of the initial state, atmospheric density, and aerodynamic coefficients. The statistics of the terminal position and velocity of the Monte Carlo experiments are presented in Figure 19. The mean value of the terminal range error is 0.263 km with a standard deviation of 0.184 km, while the mean value of the terminal velocity error is −12.7 m/s with a standard deviation of 42.93 m/s. Therefore, the errors of terminal range and velocity are within the allowed range, providing clear evidence of the method's resilience to sensor noise, actuator delays, and parameter uncertainties in the reentry flight. Figure 20 shows the sweep angle variation curve with time delay. Figure 21 shows that the path constraints and flight safety are satisfied in the Monte Carlo experiments.
The control framework’s inherent robustness is underpinned by two key mechanisms: DDPG’s adaptive learning capability enabling the policy network to learn and generalize across diverse uncertain dynamics during offline training for effective real-world performance, and a tanh-based sliding mode control layer in the guidance loop ensuring the finite-time convergence of tracking errors even under bounded disturbances for added stability and resilience.

7. Conclusions

This paper introduces an intelligent morphing coordinated reentry guidance method for hypersonic morphing vehicles (HMVs) under multiple no-fly zone constraints. The key contributions and findings are summarized as follows:
  • A hybrid framework integrating A* trajectory planning, DRL-based morphing control, and QEGC guidance law is proposed. The A* algorithm systematically generates energy-optimal avoidance trajectories by resolving waypoints within complex no-fly zones, outperforming the greedy best-first search method (GBF) with a 6.2% reduction in evaluation function value.
  • RL provides an online policy network for autonomous optimal morphing decisions. The DDPG algorithm trains a morphing policy network to adaptively adjust the sweep angle in real time. The DDPG policy requires 122.2 kB of memory and a 0.1 ms inference time on an Intel Core i7, well within the 50 ms guidance period.
  • The coordinated guidance law combining DDPG and QEGC ensures precise longitudinal and lateral tracking via continuous switching sliding mode control. Compared to fixed morphing policies and SAC-based control, the proposed approach reduces the terminal position error to 0.09 km and the terminal velocity error to 0.01 m/s. The reward-driven DDPG policy optimizes velocity retention during avoidance maneuvers, achieving a terminal avoidance velocity of 5275.0 m/s, at least 93 m/s higher than with a fixed sweep, while maintaining the minimum tracking error.
  • The framework demonstrates robust generalization and adaptability. Monte Carlo simulations validate its robustness, with terminal range errors confined to 0.263 km (mean) ± 0.184 km (std) and velocity errors to −12.7 m/s (mean) ± 42.93 m/s (std). Additionally, tests across three distinct target missions (T1–T3) show consistent terminal position errors below 0.24 km and velocity errors under 5.90 m/s, confirming the method’s capability to generalize beyond training scenarios.
  • This study bridges trajectory planning, adaptive morphing, and robust guidance for HMVs. By synergizing A*’s global optimization with DRL’s real-time adaptability, the method ensures safe reentry under stringent path constraints. Future work will focus on implementing model pruning and computational acceleration to further optimize the policy network for deployment on flight-grade embedded processors.

Author Contributions

The individual contributions of each author are as follows: Conceptualization, C.B. and G.T.; methodology, C.B., X.L. and W.Y.; software, C.B., X.L. and W.X.; validation, C.B. and X.L.; formal analysis, W.Y. and G.T.; investigation, C.B. and W.X.; resources, G.T. and W.Y.; data curation, C.B., X.L. and W.Y.; writing—original draft preparation, C.B., X.L. and W.X.; writing—review and editing, W.Y. and G.T.; visualization, C.B. and W.X.; supervision, W.Y. and G.T.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jin, Z.; Yu, Z.; Meng, F.; Zhang, W.; Cui, J.; He, X.; Lei, Y.; Musa, O. Parametric Design Method and Lift/Drag Characteristics Analysis for a Wide-Range, Wing-Morphing Glide Vehicle. Aerospace 2024, 11, 257. [Google Scholar] [CrossRef]
  2. Dai, P.; Yan, B.; Huang, W.; Zhen, Y.; Wang, M.; Liu, S. Design and aerodynamic performance analysis of a variable-sweep-wing morphing waverider. Aerosp. Sci. Technol. 2020, 98, 105703. [Google Scholar] [CrossRef]
  3. Cheng, L.; Li, Y.; Yuan, J.; Ai, J.; Dong, Y. L1 Adaptive Control Based on Dynamic Inversion for Morphing Aircraft. Aerospace 2023, 10, 786. [Google Scholar] [CrossRef]
  4. Li, D.; Zhao, S.; Da Ronch, A.; Xiang, J.; Drofelnik, J.; Li, Y.; Zhang, L.; Wu, Y.; Kintscher, M.; Monner, H.P.; et al. A review of modelling and analysis of morphing wings. Prog. Aeronaut. Sci. 2018, 100, 46–62. [Google Scholar] [CrossRef]
  5. Chu, L.; Li, Q.; Gu, F.; Du, X.; He, Y.; Deng, Y. Design, modeling, and control of morphing aircraft: A review. Chin. J. Aeronaut. 2022, 35, 220–246.
  6. Cai, G.; Shang, Y.; Xiao, Y.; Wu, T.; Liu, H. Predefined-Time Sliding Mode Control with Neural Network Observer for Hypersonic Morphing Vehicles. IEEE Trans. Aerosp. Electron. Syst. 2025, 1–17.
  7. Bao, C.; Wang, P.; Tang, G. Integrated guidance and control for hypersonic morphing missile based on variable span auxiliary control. Int. J. Aerosp. Eng. 2019, 2019, 6413410.
  8. Zhou, X.; He, R.-Z.; Zhang, H.-B.; Tang, G.-J.; Bao, W.-M. Sequential convex programming method using adaptive mesh refinement for entry trajectory planning problem. Aerosp. Sci. Technol. 2021, 109, 106374.
  9. Dai, P.; Feng, D.; Feng, W.; Cui, J.; Zhang, L. Entry trajectory optimization for hypersonic vehicles based on convex programming and neural network. Aerosp. Sci. Technol. 2023, 137, 108259.
  10. Lu, P.; Brunner, C.W.; Stachowiak, S.J.; Mendeck, G.F.; Tigges, M.A.; Cerimele, C.J. Verification of a fully numerical entry guidance algorithm. J. Guid. Control Dyn. 2017, 40, 230–247.
  11. Zhu, J.; Liu, L.; Tang, G.; Bao, W. Robust adaptive gliding guidance for hypersonic vehicles. Proc. Inst. Mech. Eng. Part G 2018, 232, 1272–1282.
  12. Zhu, J.; Zhang, S. Adaptive Optimal Gliding Guidance Independent of QEGC. Aerosp. Sci. Technol. 2017, 71, 373–381.
  13. Yao, D.; Xia, Q. Finite-Time Convergence Guidance Law for Hypersonic Morphing Vehicle. Aerospace 2024, 11, 680.
  14. Huang, S.; Jiang, J.; Li, O. Adaptive Neural Network-Based Sliding Mode Backstepping Control for Near-Space Morphing Vehicle. Aerospace 2023, 10, 891.
  15. Fazeliasl, S.B.; Moosapour, S.S.; Mobayen, S. Free-Will Arbitrary Time Cooperative Guidance for Simultaneous Target Interception with Impact Angle Constraint Based on Leader-Follower Strategy. IEEE Trans. Aerosp. Electron. Syst. 2025, 1–15.
  16. Xie, Y.; Liu, L.; Liu, J.; Tang, G.; Zheng, W. Rapid generation of entry trajectories with waypoint and no-fly zone constraints. Acta Astronaut. 2012, 77, 167–181.
  17. He, R.; Liu, L.; Tang, G.; Bao, W. Rapid generation of entry trajectory with multiple no-fly zone constraints. Adv. Space Res. 2017, 60, 1430–1442.
  18. Hu, Y.; Gao, C.; Li, J.; Jing, W.; Chen, W. A novel adaptive lateral reentry guidance algorithm with complex distributed no-fly zones constraints. Chin. J. Aeronaut. 2022, 35, 128–143.
  19. Zhang, D.; Liu, L.; Wang, Y. On-line reentry guidance algorithm with both path and no-fly zone constraints. Acta Astronaut. 2015, 117, 243–253.
  20. Wang, S.; Ma, D.; Yang, M.; Zhang, L.; Li, G. Flight strategy optimization for high-altitude long-endurance solar-powered aircraft based on Gauss pseudo-spectral method. Chin. J. Aeronaut. 2019, 32, 2286–2298.
  21. Zhang, R.; Xie, Z.; Wei, C.; Cui, N. An enlarged polygon method without binary variables for obstacle avoidance trajectory optimization. Chin. J. Aeronaut. 2023, 36, 284–297.
  22. Zhang, Y.; Zhang, R.; Li, H. Graph-based path decision modeling for hypersonic vehicles with no-fly zone constraints. Aerosp. Sci. Technol. 2021, 116, 106857.
  23. Radmanesh, R.; Kumar, M.; French, D.; Casbeer, D. Towards a PDE-based large-scale decentralized solution for path planning of UAVs in shared airspace. Aerosp. Sci. Technol. 2020, 105, 105965.
  24. AlShawi, I.S.; Yan, L.; Pan, W.; Luo, B. Lifetime enhancement in wireless sensor networks using fuzzy approach and A-star algorithm. In Proceedings of the IET Conference on Wireless Sensor Systems (WSS 2012), London, UK, 18–19 June 2012; IET: London, UK, 2012; pp. 1–6.
  25. Zhang, Z.; Zhao, Z. A multiple mobile robots path planning algorithm based on A-star and Dijkstra algorithm. Int. J. Smart Home 2014, 8, 75–86.
  26. Dai, P.; Feng, D.; Zhao, J.; Cui, J.; Wang, C. Asymmetric integral barrier Lyapunov function-based dynamic surface control of a state-constrained morphing waverider with anti-saturation compensator. Aerosp. Sci. Technol. 2022, 131, 107975.
  27. Dai, P.; Yan, B.; Han, T.; Liu, S. Barrier Lyapunov Function Based Model Predictive Control of a morphing waverider with input saturation and full-state constraints. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3071–3081.
  28. Chen, X.; Li, C.; Gong, C.; Gu, L.; Ronch, A.D. A study of morphing aircraft on morphing rules along trajectory. Chin. J. Aeronaut. 2021, 34, 232–243.
  29. Fasel, U.; Tiso, P.; Keidel, D.; Ermanni, P. Concurrent Design and Flight Mission Optimization of Morphing Airborne Wind Energy Wings. AIAA J. 2021, 59, 1254–1268.
  30. Bao, C.; Wang, P.; Tang, G. Integrated method of guidance, control and morphing for hypersonic morphing vehicle in glide phase. Chin. J. Aeronaut. 2021, 34, 535–553.
  31. Xu, W.; Li, Y.; Pei, B.; Yu, Z. Coordinated intelligent control of the flight control system and shape change of variable sweep morphing aircraft based on dueling-DQN. Aerosp. Sci. Technol. 2022, 130, 107898.
  32. Xu, D.; Hui, Z.; Liu, Y.; Chen, G. Morphing control of a new bionic morphing UAV with deep reinforcement learning. Aerosp. Sci. Technol. 2019, 92, 232–243.
  33. Hou, L.; Liu, H.; Yang, T.; An, S.; Wang, R. An Intelligent Autonomous Morphing Decision Approach for Hypersonic Boost-Glide Vehicles Based on DNNs. Aerospace 2023, 10, 1008.
  34. Tenenbaum, J.B.; Kemp, C.; Griffiths, T.L.; Goodman, N.D. How to grow a mind: Statistics, structure, and abstraction. Science 2011, 331, 1279–1285.
  35. Bao, C.Y.; Zhou, X.; Wang, P.; He, R.Z.; Tang, G.J. A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles. Aeronaut. J. 2023, 127, 1638–1658.
  36. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning, Volume 32, Beijing, China, 21–26 June 2014; JMLR.org: Beijing, China, 2014; pp. I-387–I-395.
  37. Heusner, M.; Keller, T.; Helmert, M. Best-Case and Worst-Case Behavior of Greedy Best-First Search. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, 13–19 July 2018; International Joint Conferences on Artificial Intelligence Organization: Stockholm, Sweden, 2018; pp. 1463–1470.
  38. Wu, Y.; Sun, G.; Xia, X.; Xing, M.; Bao, Z. An Improved SAC Algorithm Based on the Range-Keystone Transform for Doppler Rate Estimation. IEEE Geosci. Remote Sens. Lett. 2013, 10, 741–745.
Figure 1. Intelligent coordinated reentry guidance scheme.
Figure 2. Morphing mode of the HMV.
Figure 3. Effect of morphing on the lift-to-drag ratio at different Mach numbers: (a) Ma = 6; (b) Ma = 13; (c) Ma = 20.
Figure 4. Schematic diagram of the HMV's trajectory and the no-fly zones.
Figure 5. Graph modeling of multiple no-fly zones.
Figure 6. Schematic diagram of the DDPG algorithm.
Figure 7. Avoidance trajectory for multiple no-fly zones.
Figure 8. Variation in cumulative reward with episodes.
Figure 9. Three-dimensional reentry trajectory of the glide phase.
Figure 10. Variations in latitude, longitude, and altitude.
Figure 11. Variations in velocity and tracking errors of the flight-path angle and heading angle.
Figure 12. Variation curves of the control inputs.
Figure 13. Variations in the path constraints.
Figure 14. Variations in latitude, longitude, and altitude.
Figure 15. Variations in velocity and tracking errors of the flight-path angle and heading angle.
Figure 16. Variation curves of the control inputs.
Figure 17. Variations in the path constraints.
Figure 18. Variations in latitude, longitude, and altitude in the Monte Carlo experiments.
Figure 19. Statistics of terminal range error, velocity error, and landing point position.
Figure 20. Variation curves of the sweep angle with delay in the Monte Carlo experiments.
Figure 21. Variations in the path constraints in the Monte Carlo experiments.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
