Article

EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft

Department of Aeronautics and Astronautics, Fudan University, Shanghai 200433, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(5), 325; https://doi.org/10.3390/drones10050325
Submission received: 27 February 2026 / Revised: 18 April 2026 / Accepted: 22 April 2026 / Published: 26 April 2026

Highlights

What are the main findings?
  • We developed the EA-TD3 autonomous trajectory planning framework for eVTOL UAV platforms, which integrates a stochastic urban wind field model with an energy consumption model derived from battery data. By establishing flight energy boundaries, this framework mitigates the safety hazards that environmental interference and energy constraints pose to autonomous trajectory planning.
  • The EA-TD3 framework enables eVTOL UAV platforms to reduce energy consumption by 11.6% in wind conditions and maintain an 87.8% mission success rate even under energy constraints. Regarding energy efficiency and operational robustness, this method outperforms baseline algorithms, including DDPG, SAC, and standard TD3.
What are the implications of the main findings?
  • By embedding environmental and battery energy perception into the autonomous trajectory planning framework, the process evolves from geometric homing to a physics-aware trajectory optimization paradigm. This enhancement improves the energy efficiency and trajectory reliability of autonomous eVTOL UAV operations.
  • Through simulations utilizing the physical parameters of the EH216-S platform, this study demonstrates that energy consumption boundaries provide a safety mechanism to alleviate range anxiety. This offers a technical reference for ensuring the operational reliability of unmanned systems within UAM networks.

Abstract

Autonomous trajectory planning for electric Vertical Takeoff and Landing (eVTOL) Unmanned Aerial Vehicles (UAVs) faces the dual challenges of low-altitude environmental interference and limited onboard energy, which affect the reliability and safety of unmanned missions. To address these challenges, this paper develops the EA-TD3 autonomous trajectory planning framework for eVTOL UAV systems. First, a stochastic urban wind field model is established to simulate low-altitude interference. Then, using eVTOL UAV battery discharge data from Carnegie Mellon University (CMU), a mapping between maneuvers and energy consumption is identified to construct a nonlinear energy consumption model. Finally, an energy boundary penalty function is introduced into the TD3 algorithm to keep trajectory planning within battery safety margins. Experiments based on the parameters of the EH216-S platform show that EA-TD3 achieves a success rate of nearly 100% under ideal conditions and outperforms benchmark algorithms while reducing average energy consumption by 11.6%. Under an energy constraint of 120 J, its success rate remains at 87.80%, exceeding that of the DDPG, SAC, and standard TD3 algorithms. This study optimizes the autonomous trajectory planning of eVTOL UAV platforms in urban air mobility (UAM) to improve the energy perception and power management of the autonomous system.

1. Introduction

As an emerging strategic field, the low-altitude economy is driving the rapid development of urban air mobility (UAM), with electric Vertical Takeoff and Landing (eVTOL) aircraft serving as its core technological carrier. Owing to their flexibility, vertical takeoff and landing capability, and zero-emission operation, eVTOL systems are used in last-mile logistics [1,2], medical emergency services, and urban patrols. Although the eVTOL field encompasses both manned and unmanned platforms, advances in autonomous trajectory planning have made electric Vertical Takeoff and Landing Unmanned Aerial Vehicle (eVTOL UAV) systems the mainstream trend for future urban transportation. In the remainder of this paper, autonomous trajectory planning is referred to simply as trajectory planning. Unlike traditional aviation, where pilots adjust flight strategies based on real-time sensory feedback, eVTOL UAV systems have no human-in-the-loop intervention in extreme situations. Trajectory planning systems must therefore adapt automatically to complex physical constraints. For unmanned platforms, energy awareness is a fundamental safety requirement for successful trajectory planning, and trajectory planning performance has consequently become a key technology for flight safety. However, the limited onboard energy storage of eVTOL UAVs remains a key constraint because the energy state directly affects planning performance. Reducing energy-related risks during trajectory planning is therefore a prerequisite for deploying eVTOL UAV systems. Accordingly, this research focuses on eVTOL UAVs and their energy-aware trajectory planning technologies.
In the absence of a human pilot, eVTOL UAV systems must independently plan flight trajectories before takeoff, accounting for the destination, the complex urban environment, and the real-time energy status. In high-density metropolitan environments, three primary factors contribute to energy-related risks during trajectory planning [3]: stochastic low-altitude wind fields, maneuver-dependent power consumption, and the inherent energy constraints of onboard batteries. First, random wind fields in urban low-altitude airspace induce severe aerodynamic interference, which forces the eVTOL UAV to adjust its trajectory frequently and triggers unpredictable power surges. Second, the energy expenditure of these unmanned platforms varies significantly across flight phases such as vertical takeoff, cruise, and landing, which renders traditional linear energy estimation models inadequate. Third, the intrinsic discharge characteristics of onboard batteries impose strict safety boundaries on trajectory planning endurance [4]. Failure to optimize these three dimensions jointly not only compromises planning performance but also poses substantial risks of energy depletion and catastrophic system failure in complex urban airspace. Recent research has addressed each of these three dimensions to enhance the reliability of trajectory planning systems.
Wind is a primary environmental factor that directly influences energy consumption during urban flight, and it makes the trajectory planning of eVTOL UAV systems considerably more sensitive to energy utilization as operations become unmanned. Existing research on the impact of urban wind fields on trajectory planning focuses primarily on exploiting wind characteristics for energy conservation or risk mitigation. In terms of impact analysis, Milcsik et al. [5] used Monte Carlo simulations to evaluate how urban wind fields influence trajectory planning. Baskar et al. [6] released an open-source computational fluid dynamics (CFD) wind field dataset for real urban environments and demonstrated that optimal trajectories for fixed-wing UAVs deviate significantly from traditional shortest-path strategies when wind effects are considered. For flight safety, Jiang et al. [7] developed a method to identify urban no-fly zones through CFD simulations, while Chan et al. [8] proposed a deep learning-based trajectory planning model to ensure safety. Regarding energy efficiency, Frey et al. [9] combined CFD simulations with nonlinear energy-aware strategies to optimize trajectory planning, and Rienecker et al. [10] employed Large Eddy Simulation (LES) to reduce consumption by strategically exploiting airflows. However, while these strategies enhance efficiency for general UAVs, they often overlook the particular aerodynamic sensitivity and unpredictable energy expenditure that stochastic dynamic wind fields impose on eVTOL UAV trajectory planning [11]. A more robust energy-aware trajectory planning framework is required to address these challenges.
For energy modeling of eVTOL UAV systems, existing research primarily employs physical modeling methods, which use prior knowledge to analyze energy consumption mathematically and establish dynamic equations that predict energy expenditure under different maneuvers. For example, to assess the feasibility of trajectory planning, Marzougui et al. [12] implemented a rule-based strategy for a hydrogen fuel cell–battery hybrid eVTOL UAV to manage power allocation in search-and-rescue missions. Similarly, to integrate energy constraints into trajectory planning, Senkans et al. [13] proposed a first-principles model that calculates the power required for an eVTOL UAV to operate under specified conditions. To assess the correlation between random maneuvers and energy consumption, Jiao et al. [14] introduced a dynamic model-based evaluation method that couples aircraft mass, instantaneous velocity, and kinetic energy loss. Although physical energy models are interpretable, their limitations in trajectory planning are significant. These models often rely on oversimplified assumptions and neglect the nonlinear effects of flight speed and vertical motion, which reduces accuracy. Furthermore, static equations fail to capture complex power system interactions and dynamic equipment degradation [15]. In urban low-altitude environments, accelerated wear of batteries and motors widens the gap between theoretical models and actual performance [16]. For eVTOL UAV systems, this accumulated error creates energy depletion risks [17], which necessitates dynamic correction using flight data for robust trajectory planning.
In trajectory planning for eVTOL UAV systems, classical algorithms such as A* and the rapidly exploring random tree (RRT) have laid the foundation for collision-free trajectory generation [18,19]. However, these methods often treat the aircraft as an idealized point mass and focus primarily on geographic distance or computation time while neglecting the aircraft's dynamic and energy constraints. As discussed above, the trajectory planning performance of eVTOL UAV systems in urban environments is challenged by the coupling of aerodynamic interference and onboard energy availability. Traditional planners fail to incorporate stochastic wind field effects and nonlinear energy consumption into the planning loop. Without integrating the physical state and energy status of the eVTOL UAV into the decision-making process, a geometrically optimal path may still lead to risks such as energy depletion before the destination is reached. Bridging theoretical path planning and energy-aware trajectory planning is therefore essential for flight safety.
To meet the requirement that eVTOL UAV trajectory planning interact with urban wind fields while maintaining real-time energy awareness, deep reinforcement learning (DRL) offers a promising solution. DRL enables agents to autonomously learn mappings between environmental features and actions, providing a new paradigm for complex trajectory planning tasks [20]. Regarding risk identification, Primatesta et al. [21] used reinforcement learning to identify risk-aware optimal paths and demonstrated the potential of this learning mechanism for handling urban complexity. Gao et al. [22] introduced the concept of virtual risk terrain to plan eVTOL UAV trajectories safely through high-risk environments. To address energy constraints, Fu et al. [23] proposed the BiLG D3QN algorithm to optimize trajectory planning under payload-related energy constraints. However, conventional DRL algorithms are prone to overestimating Q values, which leads to performance instability in complex continuous action spaces. In contrast, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm uses two critic networks and takes the smaller of their estimates, which curbs this overestimation bias and achieves superior convergence stability. For eVTOL UAV systems, this results in smoother and safer maneuver amplitudes. Building on this, Chen et al. [24] proposed the TD3 RRT algorithm to improve planning success and path smoothness. For 3D energy-saving challenges, Lv et al. [25] developed a dynamic energy-efficient trajectory planning method using TD3, while Xie et al. [26] integrated a fluid dynamics-based reward mechanism into a hybrid TD3 algorithm to penalize inefficient maneuvers. Despite the potential of DRL, existing energy-aware strategies for eVTOL UAV systems still largely rely on simplified constant wind fields and theoretical physical models. This reliance poses operational risks for energy-critical trajectory planning [27].
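The overestimation remedy attributed to TD3 above can be made concrete. The following is a minimal NumPy sketch of the standard TD3 critic target from Fujimoto et al. (2018), combining clipped double-Q learning with target-policy smoothing; it is not the EA-TD3 implementation, which this section does not specify. The actor and critic arguments are hypothetical stand-in callables, and the default hyperparameters (discount 0.99, smoothing noise with standard deviation 0.2 clipped to ±0.5) follow the original TD3 paper. Delayed policy updates, the third TD3 ingredient, concern the actor update schedule and are not shown.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Compute TD3 critic targets y = r + gamma * (1 - done) * min(Q1', Q2')(s', a')."""
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    # Target-policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q learning: use the smaller of the two target critic estimates.
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_next


# Usage with toy stand-ins (hypothetical): two constant critics that disagree.
states = np.zeros((2, 3))
y = td3_target(reward=np.array([1.0, 1.0]), done=np.array([0.0, 1.0]), next_state=states,
               actor_target=lambda s: np.zeros((s.shape[0], 1)),
               critic1_target=lambda s, a: np.full(s.shape[0], 10.0),
               critic2_target=lambda s, a: np.full(s.shape[0], 8.0),
               rng=np.random.default_rng(0))
print(y)  # the lower critic (8.0) is used: [1 + 0.99 * 8, 1.0]
```

Taking the minimum of two independently trained critics biases the target slightly low rather than high, which is the property the paragraph above relies on when it credits TD3 with more stable convergence.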
To enhance the reliability and safety of trajectory planning, this paper proposes an Energy-Aware Twin Delayed Deep Deterministic Policy Gradient (EA-TD3) framework for eVTOL UAV systems. First, a dynamic turbulent wind field model, which characterizes wind disturbances as a combination of multi-frequency sine waves and stochastic noise, is integrated into the framework. Second, to address the discrepancy between theoretical physical models and real-world energy consumption, a data-driven energy expenditure model is constructed from the Carnegie Mellon University (CMU) eVTOL UAV battery dataset. This model captures the specific power demands of various flight maneuvers. Finally, to mitigate the safety risks associated with limited onboard energy, an energy-aware penalty function is incorporated into the TD3 algorithm so that trajectory planning remains within permissible energy boundaries. After the EA-TD3 framework is established, 3D simulations are conducted to validate its effectiveness and robustness. The experimental results demonstrate that the proposed framework addresses the limitations of traditional eVTOL UAV trajectory planning.
The main contributions of this paper are summarized as follows:
  • Establishment of an adaptive trajectory planning mechanism: We propose a framework for eVTOL UAV systems operating under stochastic wind fields. By integrating a stochastic urban low-altitude wind model, the framework enables the eVTOL UAV to learn from and interact effectively with realistic, time-varying wind environments.
  • Construction of a data-driven energy-aware model: Leveraging the Carnegie Mellon University (CMU) battery dataset, a data-driven energy model was developed. Compared with traditional theoretical models, this approach significantly enhances the physical realism of the trajectory planning environment.
  • Refinement of a survival-aware and energy-aware strategy: By incorporating a survival-aware reward function into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, we designed a trajectory planning strategy with energy boundary constraints to ensure mission completion and system safety.
The overall architecture of the proposed EA-TD3 trajectory planning framework for eVTOL UAV systems is illustrated in Figure 1, which details the system inputs, outputs, and methodology. The core objective of this framework is to address the safety and reliability challenges that stochastic wind field interference and nonlinear energy consumption characteristics pose to trajectory planning in urban low-altitude environments. By integrating a data-driven energy model with a robust reinforcement learning agent, the framework enables the eVTOL UAV to make planning decisions independently while maintaining energy awareness.

2. Method

This section introduces in detail the proposed energy-aware trajectory planning framework for eVTOL UAV systems, which is driven by real battery data. The framework aims to improve the decision-making intelligence of eVTOL UAV systems by integrating measured flight data into the planning loop. To facilitate understanding of the following derivations, the mathematical symbols used in this study and their physical meanings are listed in Table 1.

2.1. eVTOL Related Instructions

2.1.1. Determine the Research Subjects

To enhance the physical fidelity of the simulation, the EHang EH216-S eVTOL UAV platform (EHang Holdings Limited, Guangzhou, China) is selected as the primary research object of this study. The EH216-S is a multi-rotor eVTOL UAV and represents a milestone as a platform that has obtained both a type certificate (TC) and an airworthiness certificate (AC). Its design philosophy emphasizes a pilotless, fully redundant architecture, in which components such as propellers, motors, trajectory planning systems, and batteries have backups to maintain safe operation in the event of a component failure. The platform is illustrated in Figure 2. This structural complexity and reliance on autonomy make it a suitable candidate for evaluating energy-aware trajectory planning in urban environments.
In this framework, while the energy expenditure is modeled using a data-driven approach based on the Carnegie Mellon University (CMU) dataset, physical and kinematic constraints are derived from the parameters of the EH216-S eVTOL UAV to define the operational boundaries of the trajectory planning system. These parameters ensure that the action space explored by the EA-TD3 algorithm remains within the eVTOL UAV flight envelope. By integrating these constraints, the reinforcement learning agent learns to execute maneuvers that are both energetically feasible and structurally safe for trajectory planning. Basic parameters of the platform are shown in Table 2.

2.1.2. Trajectory Profile Definition

In a typical urban air mobility (UAM) scenario, a complete eVTOL UAV flight generally comprises five distinct phases: takeoff, climb, cruise, descent, and landing. Existing research indicates that the density of urban structures decreases at altitudes above 120 m. In the absence of dynamic obstacles, eVTOL UAV systems can often fly straight-line segments at such altitudes, which makes trajectory planning less critical in that airspace.
The scope of this research is the low-altitude environment below 120 m. Accordingly, the trajectory profile in this study is simplified to the inclined climb, low-altitude cruise, and constrained descent phases, omitting the vertical takeoff and landing segments at extremely low altitudes. Throughout these stages, the eVTOL UAV must plan trajectories through dense building clusters to avoid obstacles while managing power surges induced by stochastic wind fields. This simplified profile, illustrated in Figure 3, allows a focused investigation of the trajectory planning performance and energy resilience of eVTOL UAV systems in urban canyons.

2.1.3. Flight Constraints

In eVTOL UAV trajectory planning for urban environments, identifying and modeling constraints is essential to ensure the feasibility and stability of the trajectory planning system. These constraints reflect the physical, safety, and kinematic limitations that eVTOL UAV systems must respect during operation, ensuring that the generated trajectories are both robust and feasible. To establish a foundation for the EA-TD3 algorithm, this paper constructs the following constraints for eVTOL UAV systems operating in urban environments.
The kinematic and operational constraints of the eVTOL UAV are defined as follows:
  • Flight Altitude Constraints: The flight altitude of the eVTOL UAV must be regulated to avoid collisions with urban infrastructure and comply with low-altitude traffic management rules. The altitude constraint is defined as
    $$H_{\min} \le h_i \le H_{\max} \tag{1}$$
    where $h_i$ represents the instantaneous altitude at step $i$, while $H_{\min}$ and $H_{\max}$ denote the minimum and maximum permissible flight altitudes, respectively.
  • Climb Angle Constraints: To maintain aerodynamic stability and respect propulsion system performance limits, the climb angle θ must remain within a predefined operational range
    $$\theta_i = \arctan \frac{|z_i - z_{i-1}|}{\sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}} \le \theta_{\max} \tag{2}$$
  • Turning Angle Constraints: To ensure trajectory smoothness in the horizontal plane, the turning angle ψ is constrained. Unlike the climb angle defined in Equation (2), the turning angle ψ represents the change in heading between two consecutive flight segments in the $xy$-plane:
    $$\psi_i = \arccos \frac{(x_i - x_{i-1})(x_{i+1} - x_i) + (y_i - y_{i-1})(y_{i+1} - y_i)}{\sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2} \cdot \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}} \le \psi_{\max} \tag{3}$$
    where $(x, y, z)$ denotes the spatial position of the eVTOL UAV. Equation (2) constrains the vertical slope of each segment relative to the ground, while Equation (3) limits lateral maneuverability through the normalized inner product of consecutive horizontal displacement vectors.
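Equations (1)–(3) can be checked directly on a list of waypoints. The sketch below is a minimal Python illustration, not the authors' code, and it makes two assumptions the text does not state: angles are in radians, and a segment with no horizontal displacement is given a turning angle of zero because Equation (3) is undefined there. The function names and the numeric limits in the usage lines are hypothetical.

```python
import math


def climb_angle(p_prev, p_curr):
    """Equation (2): climb angle of the segment p_prev -> p_curr, in radians."""
    dx, dy, dz = (c - p for c, p in zip(p_curr, p_prev))
    # atan2(|dz|, horizontal) equals arctan(|dz| / horizontal) but returns pi/2
    # for a purely vertical segment instead of dividing by zero.
    return math.atan2(abs(dz), math.hypot(dx, dy))


def turning_angle(p_prev, p_curr, p_next):
    """Equation (3): heading change between consecutive horizontal segments, in radians."""
    ax, ay = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    bx, by = p_next[0] - p_curr[0], p_next[1] - p_curr[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: no heading change is defined for a purely vertical segment
    cos_psi = (ax * bx + ay * by) / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos_psi)))  # clamp guards against rounding error


def satisfies_constraints(path, h_min, h_max, theta_max, psi_max):
    """Check Equations (1)-(3) along a list of (x, y, z) waypoints."""
    if any(not (h_min <= p[2] <= h_max) for p in path):
        return False
    if any(climb_angle(path[i - 1], path[i]) > theta_max for i in range(1, len(path))):
        return False
    return all(turning_angle(path[i - 1], path[i], path[i + 1]) <= psi_max
               for i in range(1, len(path) - 1))


# Hypothetical limits: 20-120 m altitude band, 30 degree climb, 45 degree turn.
limits = dict(h_min=20.0, h_max=120.0, theta_max=math.radians(30), psi_max=math.radians(45))
path = [(0.0, 0.0, 30.0), (10.0, 0.0, 32.0), (20.0, 5.0, 34.0)]
print(satisfies_constraints(path, **limits))                                     # True
print(satisfies_constraints([(0, 0, 30), (10, 0, 30), (10, 10, 30)], **limits))  # False: 90 degree turn
```

Because the absolute value in Equation (2) makes the climb-angle limit symmetric, the same check also bounds descent steepness.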

2.2. Energy Consumption Modeling

This paper uses the technical parameters of the EH216-S to construct a flight environment with realistic physical constraints. The performance parameters of the EH216-S are primarily used to define the aircraft's kinematic envelope and spatial maneuverability constraints. Energy consumption modeling, however, cannot rely on the platform's own battery parameters: because of commercial confidentiality, the specific battery performance data and electrochemical parameters of the EH216-S remain undisclosed. Instead, this study uses a publicly available power battery dataset released by Carnegie Mellon University (CMU). The rationale behind this choice is to compensate for the lack of open-source industrial battery data while improving the versatility of the trajectory planning algorithm and the reproducibility of the research.
First, the CMU dataset is widely used in eVTOL UAV energy intensity research. In its experimental design, the dataset reproduces the energy and power characteristics of typical eVTOL UAV flight through combinations of discharge rates and temperature conditions. These profiles cover the entire process, from high instantaneous loads during vertical takeoff to steady-state loads during level flight, and therefore characterize the nonlinear loss patterns that occur in flight. Although the CMU experimental cells differ greatly in mass from full-scale eVTOL UAV systems, this paper does not attempt to force a fit on absolute power values. Instead, it uses the CMU dataset to construct a general energy intensity proxy model whose core value lies in characterizing the relative energy intensity trend of the battery under typical eVTOL UAV discharge profiles. Such a proxy can serve various multi-rotor flight platforms, which is intended to make the trajectory planning algorithm transferable across platforms. Second, this paper sets motion envelope constraints based on the basic kinematic parameters of the EH216-S. These constraints define the feasible region of the reinforcement learning agent in the search space so that the generated trajectories meet the airworthiness and safety standards of the EH216-S in engineering practice.
Under this logic, the energy intensity perceived by the algorithm reflects a characteristic trend, namely the fluctuation pattern of battery efficiency loss under specific maneuvers. This paper aims to verify the ability of the proposed trajectory planning algorithm to adaptively learn such energy intensity patterns. Combining a general energy consumption mechanism with platform-specific motion constraints in this way helps avoid the loss of generalization that parameter overfitting would cause.

2.2.1. Experimental Data Background

The dataset used in this paper was collected by Carnegie Mellon University (CMU) and published by Alexander Bills et al. [28]. It contains more than 15 million records covering charge–discharge cycles under diverse operating conditions. For eVTOL UAV applications, the experiments used Sony Murata 18650 VTC6 cylindrical cells (Murata Manufacturing Co., Ltd., Kyoto, Japan) with a capacity of 3000 mAh, a nominal voltage of 3.6 V, and a specific energy of 230 Wh/kg. During testing, the cells were maintained at 25 °C in a temperature-controlled chamber and cycled with a modular cycling device. The experimental conditions and corresponding CMU subdatasets are detailed in Table 3.
Core parameters recorded in the dataset include battery voltage (U), current (I), surface temperature (T), the cycle count and cycle segment recorded by the tester ($N_s$), time (t), the energy transferred during the charge and discharge phases ($E_{\text{Charge}}$ and $E_{\text{Discharge}}$), and the charge extracted from the cell ($Q_{\text{Charge}}$ and $Q_{\text{Discharge}}$). Mission durations span 400 to 1000 s. The dataset comprises 22 cells, which reproduce eVTOL UAV trajectory profiles over varying numbers of cycles. As shown in Table 3, the collected dataset is divided into subdatasets according to test conditions. The baseline subdataset follows a sequence consisting of a 75 s takeoff at 54 W, an 800 s cruise at 16 W, and a 105 s landing at 54 W [28]. The dataset also captures variations in critical parameters encountered during eVTOL UAV flight, including temperature, cruise duration, aircraft power, and charging current.
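For a sense of scale, the energy of the baseline schedule can be worked out from its phase durations and powers. The sketch below assumes the stated powers apply to a single cell and compares the total with that cell's nominal energy (capacity × nominal voltage); it ignores voltage sag and conversion losses, so the ratio is only indicative and is not a result reported in this paper.

```python
# Baseline CMU schedule quoted above: (phase, duration in s, power in W).
BASELINE_PROFILE = [("takeoff", 75, 54.0), ("cruise", 800, 16.0), ("landing", 105, 54.0)]


def phase_energies(profile):
    """Energy of each constant-power phase, E = P * duration (joules)."""
    return {name: duration * power for name, duration, power in profile}


energies = phase_energies(BASELINE_PROFILE)
total_j = sum(energies.values())   # 4050 + 12800 + 5670 = 22520 J (about 6.26 Wh)
cell_j = 3.0 * 3.6 * 3600.0        # assumed nominal cell energy: 3.0 Ah * 3.6 V = 10.8 Wh
print(energies, total_j, round(total_j / cell_j, 3))  # ratio of about 0.579
```

The takeoff and landing phases last only about a fifth of the mission yet account for more than 40% of its energy, which is the asymmetry that the maneuver intensity factors below are designed to capture.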

2.2.2. Power Demand Analysis

This study aims to establish a trajectory planning framework for eVTOL UAV systems applicable to general scenarios in which battery energy consumption is determined primarily by flight maneuvers. Because the study focuses on the planning decision-making mechanism within a single flight, the effect of battery aging on energy consumption is treated as quasi-static, that is, constant throughout a single flight. Long-term factors such as battery degradation and extreme environmental conditions are therefore neglected. Since the energy consumption model is based on the Carnegie Mellon University (CMU) dataset, the flight phases follow that dataset's division. The flight process is decomposed into three maneuver phases: climb, cruise, and descent. The climb phase spans from takeoff to the predetermined altitude, while the descent phase spans from cruise altitude to the ground. This study acknowledges, however, the limitation of omitting the pure vertical takeoff and landing segments. In actual operations, these segments are subject to complex near-ground wind disturbances. Neglecting them may lead to underestimating both the total mission energy and the influence of initial turbulent fluctuations on the trajectory planning process.
Based on these assumptions, three benchmark cells from the CMU dataset, VAH01, VAH17, and VAH27, were selected to represent baseline energy consumption data. Figure 4 shows the voltage, current, and power of VAH01 as functions of flight time during the climb, cruise, and descent phases of the tenth flight cycle. Analysis of the voltage U, current I, and temperature T curves within a single cycle reveals significant nonlinear characteristics. For eVTOL UAV operation, both the climb and descent phases require high, stable power output. During the climb phase, the propulsion system must generate sufficient lift to overcome gravity, which results in high power output. During the descent phase, the rotors must maintain a power output comparable to that during takeoff to generate balancing thrust against airflow disturbances.
This behavior appears in the data as follows: the voltage U decreases with increasing depth of discharge, and the onboard management system compensates by increasing the current I. Despite these fluctuations in the instantaneous electrochemical parameters, the measured power remains stable within each maneuver phase. During the cruise phase, by contrast, power demand drops to a lower steady-state range. These power distribution characteristics form the physical basis for the energy intensity assessment indices and the trajectory planning algorithm presented in this paper.
Based on these physical observations, this study employs a stage-integrated averaging method to reduce the dimensionality of the battery data. The average power for each maneuver stage $s \in \{\text{climb}, \text{cruise}, \text{descent}\}$ is calculated as
$$\bar{P}_s = \frac{1}{\Delta T_s} \int_{0}^{\Delta T_s} U(t)\, I(t)\, \mathrm{d}t \tag{4}$$
where $\Delta T_s$ is the duration of the respective phase. These averaged values are mapped to maneuver intensity factors $\xi_s$ relative to the cruise phase:
$$\xi_s = \frac{\bar{P}_s}{\bar{P}_{\text{cruise}}} \tag{5}$$
The temporal evolution of the underlying battery parameters and the resulting power distribution across these stages are illustrated in Figure 4. This approach preserves the power distribution observed in the datasets while filtering out transient noise that could hinder the convergence of the EA-TD3 reinforcement learning algorithm.
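The stage-averaging step and the intensity ratio ξ_s can be illustrated on a synthetic discharge trace. The sketch below does not read the CMU files; it builds a hypothetical 1 Hz constant-power trace that follows the baseline schedule of Section 2.2.1 (54 W, 16 W, 54 W), lets the voltage sag linearly while the current rises to hold power constant, and averages U(t)I(t) over each stage with the trapezoidal rule. The stage labels and array names are illustrative assumptions.

```python
import numpy as np


def stage_average_power(t, voltage, current, labels, stage):
    """Equation (4): time-averaged U(t) * I(t) over the samples labelled `stage`."""
    mask = labels == stage
    t_s, p_s = t[mask], voltage[mask] * current[mask]
    # Trapezoidal integral of power over the stage, divided by the stage duration.
    energy = np.sum(0.5 * (p_s[1:] + p_s[:-1]) * np.diff(t_s))
    return energy / (t_s[-1] - t_s[0])


# Hypothetical 1 Hz trace following the baseline schedule (NOT real CMU data).
t = np.arange(0.0, 980.0, 1.0)
labels = np.where(t < 75, "climb", np.where(t < 875, "cruise", "descent"))
power = np.where(labels == "cruise", 16.0, 54.0)
voltage = np.linspace(4.1, 3.3, t.size)   # voltage sags with depth of discharge
current = power / voltage                 # current chosen so that U * I matches the target power

p_bar = {s: stage_average_power(t, voltage, current, labels, s)
         for s in ("climb", "cruise", "descent")}
xi = {s: p_bar[s] / p_bar["cruise"] for s in p_bar}   # intensity factors relative to cruise
print(xi)  # climb and descent come out at about 54 / 16 = 3.375
```

Even though U and I both drift throughout the trace, the averaged power per stage stays flat, which mirrors the observation that power, not voltage or current, is the stable quantity worth compressing into one number per stage.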

2.2.3. Modeling of Energy Intensity Factors

To facilitate energy assessment in trajectory planning, this study establishes a proportional relationship between energy consumption and flight path length [30], in which the proportionality coefficient depends on the maneuver being performed. Energy expenditure is thus decoupled from flight duration and linked to spatial displacement, reflecting how trajectory selection affects eVTOL UAV energy reserves [31]. This spatial mapping enables the agent to evaluate the energy cost of candidate paths during trajectory planning.
The evolution of the battery state of charge ($SoC \in [0, 1]$) is governed by the following state transition equation:
$$SoC_{t+1} = SoC_t - \frac{\kappa(a_t) \cdot \Delta d}{E_{\text{total}}} \tag{6}$$
where $\Delta d$ denotes the magnitude of the spatial displacement within a single decision step, $E_{\text{total}}$ is the total rated energy capacity of the battery system, and $\kappa(a_t)$ is the energy intensity factor, in J/m, associated with action $a_t$.
By mapping the power characteristics derived from the Carnegie Mellon University (CMU) dataset to the aircraft flight dynamics, the energy intensity factor $\kappa(a_t)$ is formulated as
$$\kappa(a_t) = \frac{\bar{P}_s}{v(a_t)} = \begin{cases} \kappa_{\text{climb}}, & a_t \in \text{Climb} \\ \kappa_{\text{cruise}}, & a_t \in \text{Cruise} \\ \kappa_{\text{descent}}, & a_t \in \text{Descent} \end{cases} \tag{7}$$
In this formulation, $\bar{P}_s$ is the average power of maneuver stage $s$ obtained from Equation (4), and $v(a_t)$ is the corresponding average velocity. Because the climb phase demands high power at a comparatively low ascent speed, $\kappa_{\text{climb}}$ is higher than $\kappa_{\text{cruise}}$. By integrating this displacement-based cost into the state observation and reward function of EA-TD3, the agent learns to minimize high-intensity maneuvers and thereby complete the mission within battery operational boundaries.
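The displacement-based SoC update and the stage-wise intensity factor κ = P̄_s / v can be sketched as follows. This is a minimal illustration, not the EA-TD3 environment: the average speeds, the rule that labels an action as climb, cruise, or descent by the sign of its vertical displacement, and the small 1000 J capacity in the usage lines are all hypothetical values chosen for readability. Only the 54 W and 16 W power levels come from the CMU baseline schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageProfile:
    mean_power_w: float    # stage-averaged power P_bar_s (Equation (4))
    mean_speed_mps: float  # average speed v for the stage (hypothetical value)


def energy_intensity(profile: StageProfile) -> float:
    """kappa = P_bar_s / v, in J/m."""
    return profile.mean_power_w / profile.mean_speed_mps


def classify_stage(dz: float, tol: float = 1e-6) -> str:
    """Hypothetical rule: label an action by the sign of its vertical displacement."""
    if dz > tol:
        return "climb"
    if dz < -tol:
        return "descent"
    return "cruise"


def soc_step(soc: float, kappa: float, displacement_m: float, e_total_j: float) -> float:
    """SoC_{t+1} = SoC_t - kappa(a_t) * delta_d / E_total."""
    return soc - kappa * displacement_m / e_total_j


# Hypothetical stage table: powers follow the CMU baseline, speeds are illustrative.
STAGES = {
    "climb": StageProfile(54.0, 2.0),    # 27 J/m
    "cruise": StageProfile(16.0, 8.0),   # 2 J/m
    "descent": StageProfile(54.0, 3.0),  # 18 J/m
}

soc = 1.0
for dx, dy, dz in [(0.0, 0.0, 10.0), (40.0, 0.0, 0.0), (0.0, 0.0, -6.0)]:
    kappa = energy_intensity(STAGES[classify_stage(dz)])
    soc = soc_step(soc, kappa, (dx**2 + dy**2 + dz**2) ** 0.5, e_total_j=1000.0)
print(round(soc, 3))  # 1 - (270 + 80 + 108) / 1000 = 0.542
```

In this toy run, the 10 m climb costs more than three times as much as the 40 m cruise, which is the kind of asymmetry that pushes an energy-aware agent toward fewer and shorter high-intensity maneuvers.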

2.3. Environmental Formulation

This section details the construction of the three-dimensional simulation environment designed for the eVTOL UAV path planning task. By integrating high-fidelity urban digital twins with stochastic wind field models, this environment provides a rigorous platform for evaluating the trajectory planning capability of the unmanned platform in complex low-altitude airspace.

2.3.1. Continuous 3D Workspace and Obstacle Modeling

In this study, the simulation workspace is defined as a continuous three-dimensional Euclidean space $\mathcal{W} \subset \mathbb{R}^3$ with dimensions $L \times W \times H$. To ensure physical fidelity for the trajectory planning of the EH216-S, we employ an analytical cuboid model to represent the urban landscape. Each building $O_i \in \mathcal{O}$ is characterized by its center coordinates $(x_i, y_i)$, half-length $L_i$, half-width $W_i$, and height $h_i$.
For the eVTOL UAV located at any continuous position $P_t = (x_t, y_t, z_t)$, the collision detection function $C(P_t)$ is formulated as
$$C(P_t) = \begin{cases} 1, & \text{if } \exists\, O_i \in \mathcal{O} : |x_t - x_i| < L_i + d_{safe} \,\wedge\, |y_t - y_i| < W_i + d_{safe} \,\wedge\, z_t \le h_i + d_{safe} \\ 0, & \text{otherwise} \end{cases}$$
where $d_{safe}$ is the safety buffer derived from the physical dimensions of the aircraft. To help the agent perceive the spatial distribution of these continuous obstacles, the workspace is digitized into a 3D occupancy grid, as shown in Figure 5a.
In environment modeling, the search directions follow the geometric logic of neighborhood expansion. Traditional 2D planar modeling typically considers a $3 \times 3$ area centered on the current node. As shown in Figure 5b, excluding the center node leaves 8 basic expansion directions within this nine-cell region: 4 cardinal and 4 diagonal. When the scenario extends to 3D spatial modeling, the search neighborhood becomes a $3 \times 3 \times 3$ cubic volume, so each node connects to the 8 adjacent points in its own plane and to 9 adjacent points in each of the layers above and below. The three layers contain 27 potential node positions; subtracting the central node leaves the 26 search directions illustrated in Figure 5c. This 26-direction topology covers every face, edge, and vertex in contact with the center node, allowing connectivity within 3D environments to be evaluated. Through this approach, the EA-TD3 algorithm combines the efficiency of spatial discretization with the precision of continuous trajectory planning.
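The counting argument for the 8- and 26-direction neighborhoods can be checked by enumerating the offsets in $\{-1, 0, 1\}^n$ and discarding the zero offset. This sketch only reproduces the combinatorics described above; the helper name `neighborhood_offsets` is illustrative and does not describe how the grid is stored in EA-TD3.

```python
from itertools import product


def neighborhood_offsets(dim: int) -> list:
    """All grid offsets in {-1, 0, 1}^dim except the zero offset (the current node)."""
    return [d for d in product((-1, 0, 1), repeat=dim) if any(d)]


offsets_2d = neighborhood_offsets(2)  # 3 * 3 - 1 = 8 directions
offsets_3d = neighborhood_offsets(3)  # 3 * 3 * 3 - 1 = 26 directions

# Group 3D offsets by contact type: 1 changed axis -> face, 2 -> edge, 3 -> vertex.
contact = {"face": 0, "edge": 0, "vertex": 0}
for d in offsets_3d:
    changed = sum(1 for c in d if c != 0)
    contact[("face", "edge", "vertex")[changed - 1]] += 1

# Per layer: dz = 0 is the in-plane ring, dz = +1 / -1 are the layers above / below.
per_layer = {dz: sum(1 for d in offsets_3d if d[2] == dz) for dz in (-1, 0, 1)}
```

The breakdown gives 6 face, 12 edge, and 8 vertex neighbors, and per layer 9 below, 8 in-plane, and 9 above, matching the 8 + 9 + 9 = 26 count in the text.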

2.3.2. Wind Field Model and Stochastic Disturbance

The synthetic wind field in this study is defined on a discrete three-dimensional grid consistent with the trajectory planning space, where each grid point carries a three-dimensional wind velocity vector. To characterize the aerodynamic environment, the wind velocity at each grid cell is constructed as a superposition of dual-frequency sinusoidal terms and uniform stochastic noise. The phase of each wind velocity component is sampled independently during environment initialization and remains constant throughout the life cycle of the environment instance. The baseline wind velocity combines sine functions of different frequencies to simulate the multiscale oscillations of urban airflow. By integrating these components, the model captures horizontal wind fluctuations as well as vertical components such as updraughts and downdraughts, which are used to evaluate energy consumption during the climb and descent phases of the eVTOL UAV.
Building on this multidimensional sinusoidal foundation, this study accounts for atmospheric boundary layer (ABL) effects to reflect the dependence of wind intensity on altitude. The intensity of the synthetic wind field is modulated by a wind profile parameter according to the power law
$$v(z) = v_{ref}\left(\frac{z}{z_{ref}}\right)^{\delta}$$
where $v(z)$ is the wind speed at altitude $z$, $v_{ref}$ is the reference wind speed at height $z_{ref}$, and $\delta$ is the wind shear exponent determined by surface roughness. Although this formulation includes global vertical wind components, the current work does not explicitly model the interaction between the wind field and specific building geometries. Consequently, microscale phenomena such as flow separation and wake effects, which generate localized updraughts and downdraughts near obstacles, are not considered. Such vertical winds may affect energy consumption during the climb and descent phases, which constitutes a limitation of the present simulation environment. Future work could use computational fluid dynamics (CFD) methods to simulate wind conditions near structures, or apply the present findings to enhance battery capacity and extend flight endurance.
Environmental robustness is critical for the operation of eVTOL UAV systems [32]. In this study, the integrated wind field model assigns wind velocity and direction vectors to each grid cell to facilitate the trajectory planning process. To maintain computational tractability while isolating the impact of directional trajectory planning, we focus on how the relative angle between the wind vector and the aircraft heading modulates the energy intensity. A dimensionless wind angle correction factor is introduced to adjust the baseline energy intensity derived from the Carnegie Mellon University (CMU) dataset. This trigonometric formulation ensures that energy demand is minimized during tailwind conditions due to assistive flow and maximized during headwind conditions to compensate for resistive drag. By treating wind magnitude as a normalized constant in the energy calculation, this approach compels the EA-TD3 agent to perceive the spatial wind field topology and prioritize trajectories with favorable wind angles.

2.4. Reinforcement Learning Algorithms

2.4.1. MDP Formulation

To address the trajectory planning problem under coupled energy and environmental constraints, the task is formulated as a Markov decision process (MDP) [33] defined by the quintuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$. This MDP is extended to account for energy autonomy and directional aerodynamic disturbances.
The reinforcement learning components are defined as follows:
  • State Space ($\mathcal{S}$): The state vector $s_t \in \mathcal{S}$ is designed to provide the agent with multi-modal perception. At each time step $t$, the observation is defined as
    $$s_t = [d_{target}, \Delta\theta, \Delta\phi, d_{obs}, SoC_t, \theta_{wind}]$$
    where $d_{target}$ is the normalized distance to the target $P_{goal}$, while $\Delta\theta$ and $\Delta\phi$ represent the horizontal and vertical angular deviations. $d_{obs}$ denotes the proximity to the nearest obstacle. We integrate the battery state of charge $SoC_t$ and the local relative wind angle $\theta_{wind}$ as core variables, which enable the agent to perceive its energy survival boundary and the spatial wind field topology.
  • Action Space ($\mathcal{A}$): The action space $\mathcal{A}$ is continuous to ensure smooth trajectory control. The action vector $a_t$ controls the kinematic update
    $$a_t = [v_t, \theta_t, \phi_t]^T$$
    where $v_t$ is the velocity and $\theta_t$ and $\phi_t$ are the heading increments. These actions determine the maneuver-specific energy intensity $\kappa(a_t)$.
  • Energy-Aware Reward Function ($\mathcal{R}$): The reward function at time step $t$ balances mission efficiency, safety, path optimality, and energy consumption
    $$r_t = \omega_g r_{goal} + \omega_c r_{collision} + \omega_d r_{dist} + \omega_e r_{energy}$$
    where $\omega_g$, $\omega_c$, $\omega_d$, and $\omega_e$ are weighting coefficients. $r_{goal}$ is the incentive reward provided when the agent reaches the target, $r_{collision}$ is the collision penalty that imposes a negative reward when safety constraints are violated, and $r_{dist}$ is a distance-based shaping term that encourages the agent to move toward the goal.
The energy-related term $r_{energy}$ models the propulsion cost during motion and is defined as
$$r_{energy} = \kappa(a_t) \cdot [1 - \alpha\cos(\theta_{rel})] \cdot \Delta d$$
where $\kappa(a_t)$ denotes the energy intensity associated with the motion mode, $\alpha$ is the wind sensitivity coefficient, $\theta_{rel}$ is the relative angle between the heading of the eVTOL UAV and the wind direction, and $\Delta d$ is the distance traveled within the current step.
This formulation introduces an angle-dependent energy penalty through the wind angle correction factor $\eta(\theta_{rel}) = 1 - \alpha\cos(\theta_{rel})$. It reflects the insight that energy consumption depends on both motion intensity and alignment with the ambient wind. Energy cost increases under headwind conditions ($\theta_{rel} \approx 180°$) and decreases under tailwinds, while maneuvers with a larger $\kappa(a_t)$, such as climbing, incur higher penalties. As a result, the agent is encouraged to discover energy-efficient trajectories that minimize energy expenditure.
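A minimal sketch of the energy term and the weighted reward follows. It uses $\alpha = 0.5$ as given in Section 3.7; the weights are hypothetical because they are not reported here. The section does not state how the non-negative cost $r_{energy}$ is turned into a penalty, so the example assumes a negative $\omega_e$ and keeps the weighted sum exactly as written.

```python
import math


def wind_correction(theta_rel: float, alpha: float = 0.5) -> float:
    """eta(theta_rel) = 1 - alpha * cos(theta_rel); 0 rad is a tailwind, pi rad a headwind."""
    return 1.0 - alpha * math.cos(theta_rel)


def energy_term(kappa: float, theta_rel: float, delta_d: float, alpha: float = 0.5) -> float:
    """r_energy = kappa(a_t) * [1 - alpha * cos(theta_rel)] * delta_d (a non-negative cost)."""
    return kappa * wind_correction(theta_rel, alpha) * delta_d


def step_reward(r_goal, r_collision, r_dist, r_energy, w_g, w_c, w_d, w_e):
    """r_t = w_g*r_goal + w_c*r_collision + w_d*r_dist + w_e*r_energy, as written above."""
    return w_g * r_goal + w_c * r_collision + w_d * r_dist + w_e * r_energy


# Same 1 m climb step (kappa = 1.018) flown with a tailwind and with a headwind.
tail = energy_term(1.018, 0.0, 1.0)      # eta = 0.5
head = energy_term(1.018, math.pi, 1.0)  # eta = 1.5
# Hypothetical weights; w_e < 0 makes the energy cost act as a penalty (assumption).
r = step_reward(0.0, 0.0, 0.2, head, w_g=1.0, w_c=1.0, w_d=1.0, w_e=-0.1)
```

With $\alpha = 0.5$, the correction factor ranges from 0.5 (pure tailwind) to 1.5 (pure headwind), so the same maneuver costs three times as much flown into the wind.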
Figure 6 illustrates the operational logic of the proposed framework.

2.4.2. Improved Energy-Aware TD3 (EA-TD3) Algorithm

To train an agent for trajectory planning in three-dimensional environments under energy constraints, this study employs the TD3 algorithm [34]. As an extension of the deep deterministic policy gradient (DDPG) framework, TD3 is designed for high-dimensional continuous action spaces. In eVTOL UAV trajectory planning, where velocity and angular control are required alongside energy penalties, EA-TD3 addresses Q-value overestimation through three mechanisms. The Q-value represents the expected cumulative reward an agent receives by taking an action in a given state. Because the agent selects actions with the highest Q-values, overestimation drives the policy to exploit erroneous peaks, which leads to inaccurate flight commands, high-energy maneuvers, and oscillatory control outputs. By mitigating this estimation bias, EA-TD3 promotes stable and energy-efficient trajectory planning.
The core mechanisms of the EA-TD3 algorithm are organized as follows:
  • Twin Critic Architecture for Task-Oriented Energy Evaluation: The critic architecture evaluates flight decisions by mapping the state of the eVTOL UAV and its action to a value. Each critic network uses a multi-layer perceptron (MLP) structure to represent energy consumption patterns during mission phases such as climbing or cruising. By estimating expected future rewards, the critic networks provide feedback that guides the actor network toward trajectory choices that prioritize safety and minimize battery drain throughout the mission. To mitigate the overestimation bias caused by function approximation errors, TD3 computes the target value $y$ as the minimum output of the two target critic networks
    $$y = r + \gamma \min_{i=1,2} Q_{\theta_{target,i}}(s', \tilde{a})$$
    where $s'$ is the next state and $\tilde{a}$ is the target action smoothed by random noise. For the eVTOL UAV mission, this conservative estimate prevents the agent from selecting high-risk moves or power-hungry climb segments that might lead to battery exhaustion or mission failure in urban airspace.
  • Delayed Updates and Target Policy Smoothing: To ensure training stability in urban wind fields, EA-TD3 adopts two further strategies. First, the actor network and the target networks are updated less frequently than the critic networks, so that the policy gradient is computed only after the value function, which evaluates energy-safety trade-offs, has stabilized. Second, to reduce sensitivity to wind-induced perturbations of the flight state, clipped Gaussian noise $\epsilon$ is added to the target action
    $$\tilde{a} = \operatorname{clip}\left(\pi_{\phi_{target}}(s') + \epsilon,\ a_{low},\ a_{high}\right), \quad \epsilon \sim \operatorname{clip}\left(\mathcal{N}(0, \sigma^2), -c, c\right)$$
    This mechanism forces the Q-function to learn that similar trajectory planning actions should yield similar values, resulting in smoother trajectory outputs and greater aerodynamic stability.
  • Experience Replay and Energy-Balanced Optimization: The training process uses a replay buffer to store transitions $(s_t, a_t, r_t, s_{t+1}, done)$. The critics are optimized by minimizing the mean squared error (MSE) loss $L(\theta_i) = \mathbb{E}[(y - Q_{\theta_i}(s, a))^2]$. Because the reward function $r_t$ incorporates the energy intensity $\kappa(a_t)$ and the state of charge $SoC_t$, the EA-TD3 agent learns a state-dependent strategy: it prioritizes reaching the goal when $SoC_t$ is abundant and switches to energy-efficient maneuvers, such as optimizing its heading relative to the wind angle $\theta_{rel}$, when energy is scarce. Figure 7 illustrates the structural logic of the algorithm.
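The target computation described in this list can be sketched with NumPy, independently of any neural network library. The critic outputs are passed in as plain arrays standing in for $Q_{\theta_{target,1}}$ and $Q_{\theta_{target,2}}$. The $(1 - done)$ mask follows standard TD3 practice and is not part of the formula above; $\sigma = 0.2$, $c = 0.5$ and the mini-batch values are hypothetical. The delayed actor update (one actor step every few critic steps) is not shown.

```python
import numpy as np


def smoothed_target_action(mu_next, sigma, c, a_low, a_high, rng):
    """Target policy smoothing: clip(pi_target(s') + eps, a_low, a_high), eps ~ clip(N(0, sigma^2), -c, c)."""
    mu_next = np.asarray(mu_next, dtype=float)
    eps = np.clip(rng.normal(0.0, sigma, size=mu_next.shape), -c, c)
    return np.clip(mu_next + eps, a_low, a_high)


def td3_target(r, q1_next, q2_next, gamma, done):
    """Clipped double-Q target y = r + gamma * (1 - done) * min(Q1', Q2')."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


def critic_loss(y, q_pred):
    """Mini-batch estimate of L(theta_i) = E[(y - Q_theta_i(s, a))^2]."""
    return float(np.mean((np.asarray(y) - np.asarray(q_pred)) ** 2))


# Hypothetical mini-batch; the Q-values stand in for the two target-critic outputs at (s', a~).
rng = np.random.default_rng(0)
a_tilde = smoothed_target_action([0.9, -0.9, 0.0], sigma=0.2, c=0.5, a_low=-1.0, a_high=1.0, rng=rng)
y = td3_target(r=np.array([1.0, 1.0]), q1_next=np.array([5.0, 2.0]),
               q2_next=np.array([3.0, 4.0]), gamma=0.99, done=np.array([0.0, 1.0]))
```

Taking the element-wise minimum of the two critics is what makes the target conservative: in the first transition, the optimistic estimate of 5.0 is discarded in favor of 3.0.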

2.5. Algorithm Introduction

The operational logic of EA-TD3 comprises four primary stages: state perception, directional decision-making, energy-aware reward calculation, and twin critic optimization. The execution flow is presented in Algorithm 1.
The core components and logic of the proposed method are summarized as follows:
  • Initialization and Environmental Robustness: The EA-TD3 algorithm establishes twin critics $Q_{\theta_1}, Q_{\theta_2}$ and an actor network $\pi_\phi$ to mitigate the overestimation of action values in stochastic urban airspace. Because the simulation environment is reset at the beginning of each episode, the agent is exposed to a range of wind vectors. This training regime ensures that the learned trajectory planning policy $\pi$ captures the physical correlation between aerodynamic resistance and energy intensity. Consequently, the eVTOL UAV develops resilience that prioritizes energy efficiency rather than overfitting to a static geometric route.
  • Angular-Driven Autonomous Trajectory Planning: In line with the operational requirements for stable flight in the low-altitude economy, the EA-TD3 algorithm outputs maneuvers $a_t = [v_t, \theta_t, \phi_t]^T$ representing velocity and heading increments. Each action is translated into a spatial displacement $\Delta d_t$ within the environment. This approach ensures that the reinforcement learning agent identifies the energy-efficient spatial topology within the three-dimensional urban workspace while adhering to search connectivity. By prioritizing angular-driven trajectory planning, the framework enables the eVTOL UAV to execute smooth transitions between mission waypoints.
Algorithm 1: EA-TD3: energy-aware autonomous trajectory planning method.
  • Physics-Coupled Energy Mapping: The innovation of this stage lies in the real-time coupling of flight maneuvers with the Carnegie Mellon University (CMU) battery dataset. For each spatial displacement, the algorithm identifies the maneuver type $m \in \{climb, cruise, descent\}$ based on the vertical component of the action. Simultaneously, it calculates the relative wind angle $\theta_{rel}$ between the heading of the eVTOL UAV and the wind vector. These variables are used to compute the energy intensity $\kappa(a_t)$ with the wind angle correction factor $\eta(\theta_{rel})$
    $$\kappa(a_t) = \kappa_m \cdot (1 - \alpha\cos\theta_{rel})$$
    This mapping transforms each spatial movement into an energy depletion metric $\Delta E_t = \kappa(a_t) \cdot \Delta d_t$, integrating battery characteristics into the trajectory planning loop. By accounting for these physics-coupled factors, the framework ensures that the planned trajectories are energetically sustainable for long-range missions.
  • Energy-Aware Reward and Policy Evolution: The calculated energy depletion $\Delta E_t$ is integrated into the multi-objective reward function $r_t$ as a penalty term. This feedback loop compels the EA-TD3 agent to explore energy-efficient corridors where $\kappa(a_t)$ is minimized, such as trajectory segments that exploit tailwinds. Through delayed policy updates, the algorithm stabilizes the learning process against wind noise and converges on a policy that favors trajectories with the lowest cumulative electrochemical cost. This optimization allows the eVTOL UAV to complete its mission while keeping energy consumption within the battery safety boundaries, enhancing the operational reliability of autonomous systems.
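The physics-coupled energy mapping stage can be sketched as follows, using the normalized intensities reported in Section 3.5 and $\alpha = 0.5$ from Section 3.7. The maneuver is classified from the sign of the vertical displacement. The dead band `tol` is a hypothetical threshold, and charging the whole displacement at the stage intensity is an assumption about how mixed horizontal and vertical steps are handled.

```python
import math

# Normalized stage intensities reported in Section 3.5.
KAPPA = {"climb": 1.018, "cruise": 0.025, "descent": 2.032}


def classify_maneuver(dz: float, tol: float = 1e-3) -> str:
    """Label a step by the sign of its vertical displacement; 'tol' is a hypothetical dead band."""
    if dz > tol:
        return "climb"
    if dz < -tol:
        return "descent"
    return "cruise"


def energy_depletion(displacement, theta_rel: float, alpha: float = 0.5, tol: float = 1e-3):
    """Return (maneuver, Delta E_t) with Delta E_t = kappa_m * (1 - alpha cos theta_rel) * |Delta d_t|."""
    dx, dy, dz = displacement
    delta_d = math.hypot(dx, dy, dz)
    maneuver = classify_maneuver(dz, tol)
    kappa = KAPPA[maneuver] * (1.0 - alpha * math.cos(theta_rel))
    return maneuver, kappa * delta_d


# A 5 m crosswind cruise step, a 2 m climb in a tailwind, and a 1 m descent into a headwind.
cruise_step = energy_depletion((3.0, 4.0, 0.0), math.pi / 2)
climb_step = energy_depletion((0.0, 0.0, 2.0), 0.0)
descent_step = energy_depletion((0.0, 0.0, -1.0), math.pi)
```

Even with a favorable tailwind, the 2 m climb (1.018) costs roughly eight times as much as the 5 m crosswind cruise step (0.125).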

3. Experimental Setup

3.1. Trajectory Planning Problem Description

In urban air traffic systems, the trajectory planning of eVTOL UAV platforms involves stochastic variables and operational constraints. Modeling this process requires a mathematical framework that simulates the operating environment. The objective of the proposed model is to minimize the total energy consumption and flight distance of the eVTOL UAV from its origin to its destination while ensuring flight safety, compliance with low-altitude air traffic regulations, and the avoidance of structural obstacles, as illustrated in Figure 8.
Within this framework, the model integrates the flight performance constraints of the eVTOL UAV and the safety requirements for building avoidance. The flight performance constraints comprise limits on operational altitude, climb angle, and yaw rate, while the environmental constraints are defined by the spatial distribution of urban structures. These constraints, detailed in Section 2.1.3, are formulated as mathematical boundaries to ensure the feasibility of the trajectory planning solution. Consequently, the eVTOL UAV trajectory planning process is structured around three core pillars: spatial environment modeling, multi-objective function establishment, and the definition of operational constraints.

3.2. Model Assumptions

To ensure the scientific rigor and computational efficiency of the trajectory planning strategy, the following assumptions are established for the eVTOL UAV simulation environment.
1. Deterministic Mission Boundaries. The spatial coordinates of the takeoff point $S_0$ and landing destination $S_{goal}$ are predefined and remain fixed throughout the trajectory planning process.
2. Decoupled Kinematic and Geometric Model. The eVTOL UAV is treated as a point mass for inertia and energy consumption calculations to improve efficiency during reinforcement learning training. For collision detection and spatial interaction, however, the aircraft is represented by a three-dimensional safety cylinder rather than a geometric point. This equivalent safety envelope $W_{env}$ accounts for the physical dimensions of the EH216-S and ensures that the planned trajectories maintain a safety margin from urban obstacles.
3. Constant Ground Velocity and Its Limitations. The eVTOL UAV is assumed to maintain a constant ground speed $V_g$ throughout the mission. This simplification is adopted because the aerodynamic response data and flight control schedules of the EH216-S are not publicly available. By fixing the ground velocity, the simulation isolates the relationship between the energy intensity $\kappa(a_t)$ and the three-dimensional spatial topology under wind disturbances. This modeling choice ignores the energy consumed during acceleration and deceleration, which may lead to an underestimation of the total energy required for missions with frequent maneuvers.
4. Static Obstacle Environment. Urban structures and obstacles are treated as stationary entities within a three-dimensional occupancy grid representing an urban canyon for trajectory planning.
5. Profile-Compliant Mission Structure. The trajectory planning mission includes mandatory waypoints that enforce distinct climb, cruise, and descent phases, satisfying flight profile requirements.
6. Regulated Low-Altitude Airspace. To comply with low-altitude economy regulations, the cruising altitude is constrained to ensure safety in building shuttle scenarios while adhering to regional airspace management policies.
7. Geometric Scaling and Spatial Rationality. To improve training efficiency while preserving interaction fidelity, the simulation employs a geometric scaling strategy. Under the airworthiness constraints issued by the Civil Aviation Administration of China (CAAC), the maximum operational altitude of the EH216-S is 120 m. Mapping this real-world boundary onto the 20 m numerical simulation workspace yields a geometric scaling factor of $1/6$. Regarding the aircraft dimensions, the physical wingspan of 5.73 m is approximated by a core fuselage cylinder with a diameter of 3.0 m to improve search efficiency. With the $1/6$ scaling ratio, this dimension is represented by a 0.5 m equivalent safety envelope $W_{env}$ within the simulation. This alignment ensures that the spatial occupancy and building clearances reflect urban canyon dynamics.
8. Energy Constraints and SoC Safety Red Line. The initial energy $E_0$ is set between 100 and 200 J to construct an energy-constrained trajectory planning task. Combined with a 20% state of charge (SoC) safety red line, this setting compresses the energy redundancy of the mission and strengthens the sensitivity of the reward function to wind field disturbances. Within the EA-TD3 reward mechanism, breaching this threshold triggers a penalty that prioritizes flight safety, allowing decision-making performance to be evaluated under limited resources.
Note that the energy metrics in this simulation are presented in normalized Joules to maintain numerical stability during the reinforcement learning training process. These energy values reflect the proportional mapping of the power consumption characteristics derived from the CMU dataset onto the EH216-S kinematic model rather than the absolute kilowatt hour values of the actual aircraft. Consequently, all energy metrics mentioned hereafter are expressed in these normalized Joules to ensure consistency with the energy intensity definitions. This standardized convention ensures that the learned decision logic is evaluated based on the relative energy depletion patterns rather than absolute physical magnitudes, which enhances the robustness of the trajectory planning strategy.
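The arithmetic behind Assumptions 7 and 8 can be reproduced in a few lines. All constants are quoted from the assumptions above. Measuring the 20% red line against the initial energy $E_0$ is an assumption made for illustration, and `energy_budget` is a hypothetical helper.

```python
# Quantities quoted in Assumptions 7 and 8.
MAX_ALTITUDE_M = 120.0     # CAAC operational ceiling of the EH216-S
WORKSPACE_HEIGHT_M = 20.0  # height of the simulation workspace
CORE_DIAMETER_M = 3.0      # equivalent core fuselage cylinder diameter
SOC_RED_LINE = 0.20        # SoC safety red line

scale = WORKSPACE_HEIGHT_M / MAX_ALTITUDE_M                    # geometric scaling factor, 1/6
w_env = CORE_DIAMETER_M * WORKSPACE_HEIGHT_M / MAX_ALTITUDE_M  # scaled safety envelope, 0.5 m


def energy_budget(e0: float, red_line: float = SOC_RED_LINE) -> tuple:
    """Split E0 into the reserve held at the red line and the energy usable before the penalty.

    Assumes SoC is measured relative to the initial energy E0 (normalized Joules).
    """
    reserve = red_line * e0
    return reserve, e0 - reserve


for e0 in (100.0, 200.0):
    reserve, usable = energy_budget(e0)
    print(f"E0 = {e0:.0f} J -> reserve {reserve:.0f} J, usable {usable:.0f} J")
```

Under this reading, the agent must complete the mission on roughly 80 to 160 normalized J before the red-line penalty is triggered.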

3.3. Key Function Formulations

To ensure that the EA-TD3 agent achieves efficient trajectory planning while maintaining operational safety, we define two mathematical formulations: an objective function for performance optimization and an energy boundary for safety assurance.

3.3.1. Establishment of Objective Function

The objective function is designed to balance trajectory planning efficiency against energy autonomy. The total cost function $f$ is formulated as a weighted sum of the trajectory distance and the energy expenditure
$$\min f = \alpha_1 L + \alpha_2 E$$
where $\alpha_1$ and $\alpha_2$ are weighting coefficients, $L = \sum_{i=1}^{n} \lVert s_i - s_{i-1} \rVert_2$ is the cumulative geometric length cost, and $E = \sum_{i=1}^{n} \kappa(a_i) \cdot L_i$ is the energy expenditure cost, in which $L_i = \lVert s_i - s_{i-1} \rVert_2$ is the length of the $i$-th segment and $\kappa(a_i)$ is the energy intensity factor derived from the CMU dataset.
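The objective can be evaluated for a discrete trajectory as shown below. The three-segment path (climb 2 m, cruise 5 m, descend 2 m) and the weights $\alpha_1 = \alpha_2 = 0.5$ are hypothetical. The per-segment intensities are the normalized values reported in Section 3.5.

```python
import math


def segment_lengths(points):
    """L_i = ||s_i - s_{i-1}||_2 for consecutive trajectory points."""
    return [math.dist(p, q) for p, q in zip(points[:-1], points[1:])]


def total_cost(points, kappas, a1, a2):
    """Return (f, L, E) with f = a1 * L + a2 * E, L = sum L_i and E = sum kappa(a_i) * L_i."""
    lengths = segment_lengths(points)
    if len(kappas) != len(lengths):
        raise ValueError("need one energy intensity per segment")
    length_cost = sum(lengths)
    energy_cost = sum(k * l for k, l in zip(kappas, lengths))
    return a1 * length_cost + a2 * energy_cost, length_cost, energy_cost


# Climb 2 m, cruise 5 m, descend 2 m, with the normalized intensities of Section 3.5.
path = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (3.0, 4.0, 2.0), (3.0, 4.0, 0.0)]
f, L, E = total_cost(path, kappas=[1.018, 0.025, 2.032], a1=0.5, a2=0.5)  # a1, a2 hypothetical
```

Although the cruise segment is the longest, it contributes only 0.125 of the 6.225 energy cost, which illustrates why the optimizer penalizes vertical maneuvering far more than horizontal distance.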

3.3.2. Energy Boundary and Penalty Function

To prevent mission failure, a state of charge (SoC) safety red line is established at 20%. This boundary is implemented through a penalty mechanism in the reward structure
$$R_{penalty} = \begin{cases} 0, & SoC_t > 20\% \\ P_{redline}, & SoC_t \le 20\% \end{cases}$$
where $P_{redline}$ represents a constant penalty. This formulation ensures that the trajectory planning policy prioritizes energy preservation over path shortening when the eVTOL UAV approaches the safety limit.
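The piecewise penalty translates directly into code. The value of $P_{redline}$ is not given in this section, so the example uses a hypothetical magnitude of 10. Subtracting the penalty from the step reward is likewise an assumption about how $R_{penalty}$ enters the reward.

```python
def redline_penalty(soc_t: float, p_redline: float, red_line: float = 0.20) -> float:
    """R_penalty: 0 while SoC_t > 20 %, the constant P_redline once SoC_t <= 20 %."""
    return 0.0 if soc_t > red_line else p_redline


def apply_penalty(reward: float, soc_t: float, p_redline: float) -> float:
    """Assumption: the penalty magnitude is subtracted from the step reward."""
    return reward - redline_penalty(soc_t, p_redline)


# Hypothetical P_redline = 10: no penalty at 50 % SoC, full penalty at and below 20 %.
above = apply_penalty(1.0, soc_t=0.5, p_redline=10.0)
below = apply_penalty(1.0, soc_t=0.1, p_redline=10.0)
```

Note that the boundary case $SoC_t = 20\%$ already triggers the penalty, matching the $\le$ in the formula.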

3.4. eVTOL UAV Parameter Settings

With the expansion of the low-altitude economy, the field of eVTOL UAVs has continued to evolve. Development paths remain diverse, and an industry-standard configuration has yet to emerge. Because the eVTOL UAV in this study adopts a decoupled kinematic and geometric model for trajectory planning, the influence of design parameters is incorporated through the kinematic constraints. As detailed in Section 2.1, this paper selects the EHang EH216-S platform, the first eVTOL UAV to receive a type certificate (TC), as the research object. Using the parameters of this model as preset values, this research evaluates the robustness of the energy-aware trajectory planning method and conducts a comparative analysis. Table 4 summarizes the parameters that define the operational envelope for this study.

3.5. Energy Consumption Model Results

In this study, three battery cells from the CMU eVTOL UAV battery dataset (VAH01, VAH17, and VAH27) are selected for characterization and energy intensity modeling.
As presented in Table 5, the dataset provides time series measurements such as terminal voltage, discharge current, and cell temperature, which serve as the foundation for the power consumption analysis.
As presented in Table 6, a statistical analysis of the power characteristics over the life cycles of the eVTOL UAV batteries was used to calibrate the energy intensity factors $\kappa$ for each flight phase. To ensure the numerical stability and gradient convergence of the EA-TD3 algorithm, a linear normalization factor of 0.1 is applied to the calibrated values. This adjustment maps the energy metrics into a numerical range suitable for neural network training while preserving the relative proportions of energy depletion across flight phases. The resulting hierarchy shows that the descent energy intensity exceeds the climb energy intensity, and both remain higher than the cruise energy intensity.
The physical significance of $\kappa$ lies in its representation of energy depletion per unit of spatial displacement. Although the power demand during the climb or descent phase is approximately three times that of the cruise phase, the disparity in $\kappa$ reaches a factor of 40 to 80. This follows from $\kappa = P/v$: the cruise phase benefits from a high horizontal cruise velocity $v_{cruise}$, which shortens the time required to traverse a unit distance. Conversely, the climb and descent phases involve lower vertical velocities while still requiring power to counteract gravity, which results in a much larger energy expenditure per meter of displacement.
The calibrated and normalized values, $\kappa_{descent} = 2.032$, $\kappa_{climb} = 1.018$, and $\kappa_{cruise} = 0.025$, provide a physical foundation for the EA-TD3 trajectory planning agent. This scaling strategy ensures that the learned trajectory planning policy is driven by the coupling between three-dimensional trajectory topology and energy depletion patterns rather than by absolute physical magnitudes. By maintaining these relative ratios, the simulation preserves the energy-constrained nature of the mission within the compressed numerical workspace.
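As a quick consistency check on the stated 40- to 80-fold disparity, the ratios of the normalized intensities can be computed directly; because the 0.1 normalization is linear, these ratios equal the physical ones. Dividing by the approximately threefold power ratio quoted above gives a rough implied speed ratio. This assumes the same ≈3× power ratio for both climb and descent, so the figures are indicative only.

```latex
\frac{\kappa_{climb}}{\kappa_{cruise}} = \frac{1.018}{0.025} \approx 40.7,
\qquad
\frac{\kappa_{descent}}{\kappa_{cruise}} = \frac{2.032}{0.025} \approx 81.3

\frac{\kappa_{s}}{\kappa_{cruise}}
  = \frac{\bar{P}_{s}}{\bar{P}_{cruise}} \cdot \frac{v_{cruise}}{v_{s}}
  \;\Rightarrow\;
\frac{v_{cruise}}{v_{climb}} \approx \frac{40.7}{3} \approx 13.6,
\qquad
\frac{v_{cruise}}{v_{descent}} \approx \frac{81.3}{3} \approx 27.1
```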

3.6. 3D Urban Workspace and Obstacle Modeling Results

The simulation workspace is defined as a continuous three-dimensional Euclidean space with dimensions $L \times W \times H = 20\ \text{m} \times 20\ \text{m} \times 20\ \text{m}$. Consistent with the model assumptions established previously, the simulation environment is constructed using proportional scaling to reduce computational cost while preserving the physical logic. Consequently, the physical parameters of the eVTOL UAV, including its geometric dimensions and kinematic constraints, are scaled accordingly. This ensures that the interaction between the eVTOL UAV and the urban obstacles remains consistent with operational dynamics.
To simulate a structured urban landscape, the set of obstacles $\mathcal{O}$ is represented using an analytical box model. Each building $O_i \in \mathcal{O}$ is defined by its minimum corner coordinates $(x_{i,\min}, y_{i,\min}, z_{i,\min})$ and maximum corner coordinates $(x_{i,\max}, y_{i,\max}, z_{i,\max})$. The geometric dimensions of these urban architectural obstacles are characterized by
$$\Delta x_i = x_{i,\max} - x_{i,\min}, \quad \Delta y_i = y_{i,\max} - y_{i,\min}, \quad h_i = z_{i,\max} - z_{i,\min}$$
Therefore, for an eVTOL UAV located at position $P_t = (x_t, y_t, z_t)$ at time step $t$, the collision detection function $C(P_t)$ with safety buffer $d_{safe}$ can be updated from Formula (8) as follows
$$C(P_t) = \begin{cases} 1, & \text{if } \exists\, O_i \in \mathcal{O} \text{ such that } (x_{i,\min} - d_{safe} \le x_t \le x_{i,\max} + d_{safe}) \wedge (y_{i,\min} - d_{safe} \le y_t \le y_{i,\max} + d_{safe}) \wedge (0 \le z_t \le z_{i,\max} + d_{safe}) \\ 0, & \text{otherwise} \end{cases}$$
where $d_{safe} = 0.1$ m denotes the safety buffer margin, which complements the 0.5 m scaled aircraft dimension. Because all buildings in the urban environment are ground-based ($z_{i,\min} = 0$), the region is treated as free space whenever the flight altitude $z_t$ exceeds the expanded obstacle height $z_{i,\max} + d_{safe}$, which enables vertical overflight maneuvers. This margin accounts for aerodynamic disturbances and control uncertainties near building surfaces.
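The buffered box test above can be written as a direct transcription of the three interval conditions. The single building in the example is hypothetical; only $d_{safe} = 0.1$ m and the ground-based assumption $z_{i,\min} = 0$ come from the text, and `Building` and `collision` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Building:
    """Ground-based box obstacle (z_min = 0) given by its corner coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    z_max: float


D_SAFE = 0.1  # safety buffer d_safe from Section 3.6 (scaled metres)


def collision(p, buildings, d_safe: float = D_SAFE) -> int:
    """C(P_t): 1 if P_t lies inside any buffered box, else 0."""
    x, y, z = p
    for b in buildings:
        if (b.x_min - d_safe <= x <= b.x_max + d_safe
                and b.y_min - d_safe <= y <= b.y_max + d_safe
                and 0.0 <= z <= b.z_max + d_safe):
            return 1
    return 0


city = [Building(2.0, 2.0, 4.0, 4.0, 10.0)]  # hypothetical single 2 m x 2 m x 10 m building
```

Points within 0.1 m of a wall still count as collisions, while any point above $z_{i,\max} + d_{safe}$ is free space, which is what permits overflight.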
The environment settings are listed in Table 7.
Two intermediate waypoints, $W_1$ and $W_2$, are integrated into the simulation workspace. First, from a mission-oriented perspective, all waypoints are predefined at a consistent altitude of 12 m, which serves as a safe cruise layer and simulates a low-altitude flight corridor for eVTOL UAV logistics. Second, from an algorithmic perspective, these waypoints act as local sub-goals within the EA-TD3 framework. By decomposing the trajectory planning task into sequential segments, the waypoints mitigate the sparse reward problem and guide the agent through obstacle-dense regions, accelerating the convergence of the reinforcement learning process.
The configurations of the trajectory planning mission are summarized in Table 8. To ensure a consistent evaluation, the mission requires the eVTOL UAV to travel from the start position $P_0$ to the goal $P_{goal}$ within a time horizon $T_{max}$. The scale factor $\lambda_L$ defines the mapping between the physical dimensions and the simulation workspace, maintaining consistency with the proportional scaling assumption.

3.7. Wind Field Modeling Results

Environmental robustness is a requirement for eVTOL UAV operations [32]. In this study, a spatially varying turbulent wind field $M(\mathbf{x})$ is defined over a discrete grid of $20 \times 20 \times 20$ nodes, where $\mathbf{x} = [x, y, z]$ denotes the spatial coordinate vector. To reflect the characteristics of the atmospheric boundary layer, the grid wind velocity is formulated by coupling a temporal sinusoidal base with a height-dependent stochastic noise term. This setup simulates localized flow patterns and wind shear effects encountered during trajectory planning missions.
The baseline wind velocity component $w^{base}_{i,j,k,c}(t)$, where $(i, j, k)$ indexes the grid node and $c$ the velocity component, is constructed using a dual-frequency sinusoidal superposition such that
$$w^{base}_{i,j,k,c}(t) = \left[0.5\sin(t + \phi_{i,j,k,c}) + 0.3\sin(2t + \phi_{i,j,k,c})\right] \cdot s$$
where $s = 0.5$ is the amplitude scaling factor and $t$ is the dimensionless time derived from the step count. To incorporate atmospheric boundary layer effects, the stochastic noise intensity is coupled with the vertical height through a wind profile power law. The stochastic disturbance $\eta_{i,j,k,c}$ is defined as
$$\eta_{i,j,k,c} \sim \mathcal{U}\left[-\sigma\left(\frac{z}{z_{ref}}\right)^{\delta},\ \sigma\left(\frac{z}{z_{ref}}\right)^{\delta}\right]$$
where $\sigma = 0.2$ m/s represents the baseline noise magnitude and $z_{ref} = 20$ m is the reference altitude. The exponent $\delta = 0.35$ is the wind profile power law coefficient for high-density urban terrain, under which the turbulence intensity increases with flight altitude. The resulting grid wind velocity is $w_{i,j,k,c} = w^{base}_{i,j,k,c}(t) + \eta_{i,j,k,c}$.
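The wind field construction can be sketched with NumPy using the parameter values above ($s = 0.5$, $\sigma = 0.2$ m/s, $z_{ref} = 20$ m, $\delta = 0.35$). Three details are assumptions not stated in the text: phases are drawn uniformly from $[0, 2\pi)$, grid index $k$ is the vertical axis, and layer $k$ sits at $z = k$ m. Function names are illustrative.

```python
import numpy as np

# Parameter values quoted in Section 3.7.
GRID = 20      # 20 x 20 x 20 nodes
S = 0.5        # amplitude scaling factor s
SIGMA = 0.2    # baseline noise magnitude sigma (m/s)
Z_REF = 20.0   # reference altitude z_ref (m)
DELTA = 0.35   # power-law exponent delta for dense urban terrain


def sample_phases(rng: np.random.Generator) -> np.ndarray:
    """Phases phi[i, j, k, c], drawn once per environment instance (uniform on [0, 2*pi) is assumed)."""
    return rng.uniform(0.0, 2.0 * np.pi, size=(GRID, GRID, GRID, 3))


def base_wind(t: float, phases: np.ndarray) -> np.ndarray:
    """Dual-frequency baseline w_base = [0.5 sin(t + phi) + 0.3 sin(2t + phi)] * s."""
    return (0.5 * np.sin(t + phases) + 0.3 * np.sin(2.0 * t + phases)) * S


def wind_field(t: float, phases: np.ndarray, rng: np.random.Generator,
               layer_heights: np.ndarray) -> np.ndarray:
    """Grid wind w = w_base + eta, eta ~ U[-b(z), b(z)] with b(z) = sigma * (z / z_ref)^delta."""
    bound = SIGMA * (layer_heights / Z_REF) ** DELTA      # one noise bound per vertical layer k
    bound = bound[np.newaxis, np.newaxis, :, np.newaxis]  # broadcast over i, j and component c
    noise = rng.uniform(-1.0, 1.0, size=phases.shape) * bound
    return base_wind(t, phases) + noise


rng = np.random.default_rng(0)
phases = sample_phases(rng)
heights = np.arange(GRID, dtype=float)  # hypothetical: layer k sits at z = k metres
w = wind_field(t=1.0, phases=phases, rng=rng, layer_heights=heights)
```

Because the phases are sampled once and reused at every step, the spatial pattern stays fixed within an episode while its amplitude oscillates in time, and the noise vanishes at ground level and grows toward $\sigma$ near $z_{ref}$.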
The energy expenditure of the eVTOL UAV is modulated by the relative angle $\theta_{rel}$ between the wind vector and the heading of the aircraft. Under the constant ground velocity assumption, we introduce a dimensionless wind angle correction factor $\eta(\theta_{rel})$ to formulate the effective energy intensity $\kappa_{eff}$
$$\kappa_{eff}(a_t) = \kappa_s \cdot \eta(\theta_{rel})$$
where $\kappa_s \in \{\kappa_{climb}, \kappa_{cruise}, \kappa_{descent}\}$ is the calibrated energy intensity and $\eta(\theta_{rel}) = 1 - \alpha\cos(\theta_{rel})$, with $\alpha = 0.5$ as the wind sensitivity coefficient. This trigonometric formulation ensures that a tailwind reduces the energy demand while a headwind increases it. The wind vector at the position of the vehicle is obtained via nearest-neighbor interpolation. This synthesized wind field is incorporated into the kinematic state evolution such that $V_{ground}(t) = V_{air}(t) + M(P_t)$. This modeling approach incentivizes the EA-TD3 agent to perceive the spatial wind field topology and prioritize trajectories with favorable wind orientations, optimizing the trade-off between path length and energy consumption. The parameters are summarized in Table 9.
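The lookup and correction steps above can be sketched as follows: nearest-neighbor retrieval of $M(P_t)$, the angle $\theta_{rel}$ between heading and wind, $\kappa_{eff}$, and $V_{ground}$. The 1 m node pitch (`spacing`) and the treatment of calm air as a crosswind ($\eta \approx 1$) are assumptions, and the function names are illustrative.

```python
import numpy as np

ALPHA = 0.5  # wind sensitivity coefficient alpha (Section 3.7)


def nearest_wind(field: np.ndarray, position, spacing: float = 1.0) -> np.ndarray:
    """Nearest-neighbour lookup of M(P_t) on an (nx, ny, nz, 3) grid; 'spacing' is an assumed node pitch."""
    idx = np.rint(np.asarray(position, dtype=float) / spacing).astype(int)
    idx = np.clip(idx, 0, np.array(field.shape[:3]) - 1)  # stay inside the grid
    return field[idx[0], idx[1], idx[2]]


def relative_angle(heading, wind) -> float:
    """theta_rel in [0, pi] between heading and wind vectors: 0 = tailwind, pi = headwind."""
    h = np.asarray(heading, dtype=float)
    w = np.asarray(wind, dtype=float)
    norm = np.linalg.norm(h) * np.linalg.norm(w)
    if norm == 0.0:
        return float(np.pi / 2)  # calm air treated as crosswind, so eta is about 1 (assumption)
    return float(np.arccos(np.clip(np.dot(h, w) / norm, -1.0, 1.0)))


def effective_intensity(kappa_s: float, theta_rel: float, alpha: float = ALPHA) -> float:
    """kappa_eff = kappa_s * (1 - alpha * cos(theta_rel))."""
    return float(kappa_s * (1.0 - alpha * np.cos(theta_rel)))


def ground_velocity(v_air, wind) -> np.ndarray:
    """V_ground = V_air + M(P_t)."""
    return np.asarray(v_air, dtype=float) + np.asarray(wind, dtype=float)


# Hypothetical field with a single 1 m/s gust along +x at node (3, 4, 5).
field = np.zeros((20, 20, 20, 3))
field[3, 4, 5] = [1.0, 0.0, 0.0]
gust = nearest_wind(field, (3.2, 3.9, 5.4))
```

Clipping the cosine argument guards against rounding slightly outside $[-1, 1]$, and clipping the index keeps positions at the workspace edge inside the grid.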

3.8. Algorithmic Hyperparameters and Evaluation Metrics

The trajectory planning performance of the EA-TD3 model is sensitive to hyperparameter configurations. In this study, we tune four hyperparameters: learning rate, batch size, network architecture, and target policy noise. To facilitate rigorous cross-comparison and eliminate dimensional influences, all evaluation metrics are normalized and linearly mapped to a uniform $[0, 100]$ scale, so that the joint response of all indicators to hyperparameter changes can be visualized within a single coordinate system. Three metrics are used:
1. Best Average Reward $R_{best}$. This reflects the peak performance of the learned policy, capturing the maximum average reward achieved across all evaluation points during training.
2. Success Rate $P_{success}$. This indicates the reliability of task completion, defined as the proportion of episodes in which the eVTOL UAV satisfies the termination condition $\lVert P_T - P_{goal} \rVert < \varepsilon$ with $\varepsilon = 1.0$ m.
3. Composite Score $S_{composite}$. A weighted metric that integrates the reward signal with the success rate, formulated as
$$S_{composite} = w \cdot R_{best} + (1 - w) \cdot \beta \cdot P_{success}$$
where $w = 0.5$ is the weighting coefficient and $\beta = 200$ is a normalization factor used to balance the numerical scales.
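The scoring pipeline can be sketched in a few lines. The sketch assumes a per-metric min-max mapping for the linear $[0, 100]$ normalization (the exact transformation is not specified) and treats $P_{success}$ as a fraction in $[0, 1]$, so that with $w = 0.5$ and $\beta = 200$ the success term $(1 - w)\beta P_{success}$ spans $[0, 100]$. The numbers in the usage lines are made up.

```python
def to_0_100(values):
    """Linear min-max map onto [0, 100] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all configurations tie
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def composite_score(r_best, p_success, w=0.5, beta=200.0):
    """S = w * R_best + (1 - w) * beta * P_success, with P_success in [0, 1]."""
    return w * r_best + (1.0 - w) * beta * p_success


print(composite_score(120.0, 0.90))  # 60 + 90 = 150.0
print(to_0_100([1.0, 2.0, 3.0]))     # [0.0, 50.0, 100.0]
```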
To ensure the generalizability of the hyperparameter selection, the optimization is conducted under sufficient energy conditions. This enables the EA-TD3 algorithm to prioritize learning optimal trajectory planning and avoidance strategies without favoring overly conservative behaviors induced by energy scarcity. The impact of energy constraints on performance is further analyzed in Section 4.
The experimental results of hyperparameter tuning are illustrated in Figure 9. $S_{composite}$ is designated as the primary evaluation criterion because it jointly assesses learning efficiency and task reliability; the other two metrics, best average reward and success rate, serve as auxiliary indicators that provide multi-dimensional validation. When interpreting these results, the key point is the synchronization of the indicators. As illustrated in Figure 9, $S_{composite}$ and the auxiliary metrics peak at the same coordinate points for all four hyperparameters. This consistency confirms that the optimal $S_{composite}$ is not achieved at the expense of any individual metric, supporting the robustness of the selected configurations. The results for each hyperparameter are analyzed below.
1. Learning Rate. The learning rate $\alpha_{lr}$ affects convergence stability. As shown in Figure 9a, $\alpha_{lr} = 1 \times 10^{-4}$ was selected because it yielded the highest $S_{composite}$. The synchronization of the auxiliary indicators at this point confirms the reliability of this choice for the trajectory planning task.
2. Batch Size. We assessed batch sizes $B \in \{128, 256, 512\}$. A batch size of $B = 256$ was identified as the optimal point in Figure 9b. The alignment of the success rate and best reward at this same coordinate indicates that this batch size balances training efficiency against gradient variance.
3. Network Architecture. As shown in Figure 9c, representational capacity was tested across three multi-layer perceptron (MLP) configurations. A three-layer architecture with 512, 256, and 128 hidden units outperformed the alternative structures, capturing the nonlinear mapping between the high-dimensional state space and the eVTOL UAV action space.
4. Target Policy Noise. Target policy noise $\sigma_{tp}$, which EA-TD3 inherits from TD3, is used for target policy smoothing to mitigate Q-value overestimation. We evaluated noise levels of 0.1, 0.2, and 0.5. Figure 9d shows that $\sigma_{tp} = 0.1$ smoothed the target policy without destabilizing the Q-value targets, resulting in the highest $S_{composite}$ and task success rate.
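For reference, the target policy smoothing step that $\sigma_{tp}$ controls, together with the clipped double-Q target it feeds in standard TD3, can be written as below. The noise clip $c = 0.5$ follows the default of the original TD3 formulation and the action bounds of $[-1, 1]$ assume normalized actions; both are assumptions here, as is $\gamma = 0.99$, which the text reports only for the ablation runs.

```python
import numpy as np


def smoothed_target_action(target_action, rng, sigma_tp=0.1, noise_clip=0.5,
                           a_low=-1.0, a_high=1.0):
    """TD3 target policy smoothing: a' = clip(mu'(s') + clip(eps, -c, c), a_low, a_high)."""
    eps = rng.normal(0.0, sigma_tp, size=np.shape(target_action))
    eps = np.clip(eps, -noise_clip, noise_clip)
    return np.clip(np.asarray(target_action, dtype=float) + eps, a_low, a_high)


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a'))."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


rng = np.random.default_rng(0)
a_mu = np.array([0.95, -0.2, 0.0])           # hypothetical target-actor output mu'(s')
a_next = smoothed_target_action(a_mu, rng)
y = clipped_double_q_target(1.0, 0.0, 5.0, 3.0)  # 1 + 0.99 * min(5, 3)
```

A smaller $\sigma_{tp}$ keeps $a'$ closer to the target actor's output, which is consistent with the observation that 0.1 destabilized the Q-value targets least.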
The final selected hyperparameters are summarized in Table 10.
The hyperparameters and environmental constants used for training are documented in Table 11. These settings govern the trade-off between exploration efficiency, safety constraints, and energy optimization. The configuration is grouped into algorithm-specific hyperparameters, energy intensity factors for the different flight phases, and the weighting coefficients of the reward function. The reward weights for goal reaching and collision avoidance act as the primary constraints, while the energy penalty weight $\omega_e = 0.5$ regulates the consumption-aware behavior of the agent during the 200,000 training steps.
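Table 11 lists the reward weights, but the functional form of the reward is not given in this section. Purely as a hypothetical illustration of how an energy penalty weighted by $\omega_e = 0.5$ can enter a per-step reward alongside goal and collision terms, consider the following sketch; the progress term and the goal and collision magnitudes are placeholders, not the article's values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    goal_bonus: float = 100.0         # hypothetical
    collision_penalty: float = 100.0  # hypothetical
    progress: float = 1.0             # hypothetical
    energy: float = 0.5               # omega_e, from the text


def step_reward(prev_dist, dist, step_energy_j, reached, collided, w=RewardWeights()):
    """Hypothetical energy-aware shaping: progress - omega_e * energy + terminal terms."""
    r = w.progress * (prev_dist - dist) - w.energy * step_energy_j
    if reached:
        r += w.goal_bonus
    if collided:
        r -= w.collision_penalty
    return r


print(step_reward(10.0, 9.0, 2.0, reached=False, collided=False))  # 1.0 - 1.0 = 0.0
print(step_reward(1.5, 0.5, 1.0, reached=True, collided=False))    # 1.0 - 0.5 + 100 = 100.5
```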
The performance of various algorithms in trajectory planning is quantified through four metrics that evaluate the experimental results.
1. Success Rate. The success rate is the proportion of episodes where the eVTOL UAV reaches the target position within the termination threshold of 1.0 m. This metric reflects the reliability and task completion capability of the learned policy. A high success rate indicates that the agent can consistently navigate through urban environments with obstacles and wind disturbances to accomplish the mission.
2. Average Reward. The average reward is the mean cumulative reward across all evaluation episodes. Through the energy-aware reward function, this indicator integrates goal reaching, obstacle avoidance, distance minimization, and energy efficiency, so it captures policy quality as a balance of task completion, safety, and resource management.
3. Average Steps. The eVTOL UAV selects actions at each time step according to the learned policy and transitions to the next state based on the system dynamics. Computed over successful episodes, the average number of steps reflects the temporal efficiency of the algorithm. Fewer steps indicate faster mission completion and more direct trajectory planning, which is necessary for time-sensitive operations.
4. Energy Consumption. This metric records the total energy expended during each episode, accounting for the differential energy costs of climbing, cruise, and descent phases. The average energy consumption reflects the energy awareness of the algorithm and its ability to plan resource-efficient trajectories. Lower energy consumption demonstrates that the agent has learned to optimize maneuvers by leveraging the energy penalty term within the reward function.
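A small aggregation routine makes the four metric definitions precise. In this sketch, success uses the 1.0 m termination threshold, average steps are taken over successful episodes only as stated above, and average energy is assumed to be taken over all episodes; the `Episode` record and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean

GOAL_TOL_M = 1.0  # termination threshold epsilon from the text


@dataclass
class Episode:
    final_pos: tuple
    goal: tuple
    total_reward: float
    steps: int
    energy_j: float

    @property
    def success(self):
        return math.dist(self.final_pos, self.goal) < GOAL_TOL_M


def summarize(episodes):
    ok = [e for e in episodes if e.success]
    return {
        "success_rate": len(ok) / len(episodes),
        "avg_reward": mean(e.total_reward for e in episodes),
        "avg_steps": mean(e.steps for e in ok) if ok else float("nan"),  # successful only
        "avg_energy_j": mean(e.energy_j for e in episodes),              # assumed: all episodes
    }


demo_episodes = [
    Episode((0.0, 0.0, 10.0), (0.3, 0.0, 10.0), 120.0, 35, 60.0),   # hypothetical success
    Episode((5.0, 0.0, 10.0), (0.0, 0.0, 10.0), -50.0, 150, 90.0),  # hypothetical failure
]
print(summarize(demo_episodes))
```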

4. Discussion

4.1. Training Results and Comparative Analysis

The numerical experiments were conducted on a workstation equipped with an AMD Ryzen 9 7945HX processor, 64 GB of RAM, and an NVIDIA GeForce RTX 4060 GPU within a Windows 11 environment. To evaluate the performance of the proposed Energy-Aware Twin Delayed Deep Deterministic Policy Gradient (EA-TD3) trajectory planning algorithm, a comparative analysis was performed against three benchmark DRL frameworks: Deep Deterministic Policy Gradient (DDPG) [35], Soft Actor-Critic (SAC) [36], and the standard Twin Delayed Deep Deterministic Policy Gradient (TD3) [37].
To ensure a fair assessment of operational robustness and energy efficiency, all algorithms were trained and evaluated under identical environmental conditions, including synchronized wind field topologies and calibrated energy intensity models. The hyperparameter configurations for each algorithm are summarized in Table 12. EA-TD3 uses a 512 × 256 × 128 architecture and the refined target policy noise level identified in Section 3.8 to handle the nonlinearities and stochastic perturbations inherent in urban trajectory planning.
Comparative experiments were conducted across four algorithms. To mitigate stochastic training fluctuations, a smoothing technique was applied to the raw data. Figure 10 illustrates the training progress, where Figure 10a shows the success rate, Figure 10b shows the average reward, Figure 10c shows the average steps, and Figure 10d shows the energy consumption.
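The smoothing technique applied to the training curves is not specified. One common choice is an exponential moving average; the sketch below assumes that scheme, with an arbitrary weight and invented per-episode outcomes.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average: s_t = weight * s_{t-1} + (1 - weight) * x_t."""
    smoothed = []
    last = None
    for x in values:
        last = x if last is None else weight * last + (1.0 - weight) * x
        smoothed.append(last)
    return smoothed


raw_success = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical per-episode outcomes
print(ema_smooth(raw_success, weight=0.6))
```

Larger weights produce smoother curves but shift apparent transitions, such as the 10,000 to 20,000 step improvement, slightly later in the plot.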
In the initial training phase, the agents undergo stochastic exploration, leading to frequent collisions with urban obstacles or environmental boundaries. These early failures result in a success rate near 0% and negative cumulative rewards. During this stage, the absence of an optimized trajectory planning policy keeps the average episode length between 150 and 180 steps. Such inefficient maneuvers lead to rapid energy depletion, nearly exhausting the eVTOL UAV battery budget.
As training progresses, the agents internalize trajectory planning behaviors through experience replay and policy gradient updates. Performance improvements emerge between 10,000 and 20,000 steps, characterized by rising success rates and rewards transitioning into positive territory, while average episode lengths decrease to 80–100 steps. By the conclusion of the training, EA-TD3 achieves a success rate of approximately 90–95%, an average reward of 100–130, and a reduced temporal footprint of 30–40 steps per episode.
Comparatively, SAC achieves the second-best performance with a success rate between 75% and 85%, while TD3 and DDPG exhibit success rates in the range of 60–75% and 50–65%, respectively. The average energy consumption of EA-TD3 is 10–20 J lower than that of the baseline algorithms. This efficiency validates the energy-aware optimization strategy, which is required for battery-constrained missions in complex urban airspaces.

4.2. Analysis of Energy-Aware Trajectory Planning Mechanisms

The training results validate the feasibility of the EA-TD3 algorithm for trajectory planning. To facilitate a qualitative comparison, Figure 11 illustrates the 3D trajectories generated by the four evaluated algorithms within a standardized mission scenario. From a geometric perspective, all algorithms successfully execute a complete flight profile encompassing the phases of climb, cruise, and descent. However, the distinct topological variations in the trajectories, as shown in Figure 11, highlight the varying trajectory characteristics of the agents within the low-altitude urban airspace.
For eVTOL UAV platforms such as the EH216-S, the requirement for trajectory planning lies in the rational utilization of limited energy resources amidst environmental uncertainties. To evaluate the impact of physical constraints on performance, we conduct three progressive analyses within a unified experimental framework. These scenarios represent distinct analytical dimensions of the same mission environment to evaluate the EA-TD3 framework.
1. Maneuvering Behavior Analysis Based on Real Battery Dynamics (Analysis 1). This analysis evaluates how the data-driven energy consumption model shapes the fundamental flight behaviors of the eVTOL UAV. Crucially, since the eVTOL UAV must operate without human intervention, the system must independently evaluate the cost disparities between various maneuvers to ensure mission viability. This phase specifically tests the ability of the EA-TD3 algorithm to internalize nonlinear power costs and identify the optimal balance between climb, cruise, and descent. It should be noted that the dynamic wind field is consistently applied across all analyses to ensure environmental realism and generalizability of the findings.
2. Robustness Testing Under Safe Flight Envelope Constraints and Resource Scarcity (Analysis 2). To evaluate the sensitivity of the eVTOL UAV to the Safe Flight Envelope (SFE), we progressively compressed the battery energy budget within the established dynamic wind field. This stress test is designed to identify the critical thresholds at which traditional energy-agnostic algorithms suffer functional failure. By exposing the limitations of conventional methods under coupled environmental and resource pressures, this analysis demonstrates how the EA-TD3 autonomous agent maintains mission reliability through proactive trajectory reconfiguration. Including wind effects throughout this process validates that the energy-sensing mechanism remains effective under stochastic perturbations.
3. Analysis of Training Convergence Stability and Constraint Satisfaction (Analysis 3). This analysis is designed to dissect the underlying mechanisms responsible for the performance of the proposed algorithm. By analyzing the fluctuation rate of energy consumption throughout the training process, we distinguish the decision-making stability between soft penalties and hard constraints. The results reveal that incorporating explicit SOC observations enables the agent to internalize a risk-averse trajectory planning strategy. This mechanism ensures that the eVTOL UAV treats energy limits as a non-negotiable safety boundary. Such an intrinsic commitment to constraint satisfaction is the reason why the EA-TD3 autonomous agent maintains higher mission reliability in complex urban environments compared to traditional energy-agnostic frameworks.

4.2.1. Analysis 1: Maneuvering Behavior Analysis Based on Real Battery Dynamics

Under the baseline condition of sufficient energy reserves, this section investigates how the nonlinear power dynamics derived from the CMU battery dataset dictate the decision-making logic of the eVTOL UAV agent. Specifically, we examine the emergent ability of the agent to distinguish between energy-intensive climb/descent maneuvers and energy-efficient cruise flight within the unified experimental framework.
To mitigate the influence of stochastic policy exploration and ensure a rigorous validation of algorithmic robustness, we conducted 50 independent Monte Carlo (MC) simulations for each of the four trained algorithms. Table 13 presents the statistical averages and performance metrics derived from these trials. To evaluate the geometric quality of the generated trajectories, two directional metrics are introduced: the heading metric, representing the cumulative absolute changes in the horizontal yaw angle to quantify total steering effort, and the smoothness metric, defined as the integrated angular deviation between successive 3D velocity vectors to reflect trajectory fluidity and the suppression of abrupt maneuvers. This comparative analysis verifies the stability of the autonomous trajectory planning policy when interacting with high-fidelity physical constraints.
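The two directional metrics can be computed from a sequence of 3D positions as sketched below. This assumes a uniform time step, so that successive position differences stand in for velocity vectors, and it skips purely vertical steps when computing yaw, where yaw is undefined; both choices are assumptions about details the text does not specify.

```python
import numpy as np

_EPS = 1e-9


def heading_metric(positions):
    """Cumulative |change in horizontal yaw| (rad) along a 3D waypoint sequence."""
    d = np.diff(np.asarray(positions, dtype=float), axis=0)
    horiz = d[np.hypot(d[:, 0], d[:, 1]) > _EPS]           # skip purely vertical steps
    yaw = np.arctan2(horiz[:, 1], horiz[:, 0])
    dyaw = (np.diff(yaw) + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    return float(np.sum(np.abs(dyaw)))


def smoothness_metric(positions):
    """Sum of angles (rad) between successive 3D velocity vectors."""
    v = np.diff(np.asarray(positions, dtype=float), axis=0)
    v = v[np.linalg.norm(v, axis=1) > _EPS]                # skip zero-length steps
    a, b = v[:-1], v[1:]
    cos_ang = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.sum(np.arccos(np.clip(cos_ang, -1.0, 1.0))))


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
right_angle = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(heading_metric(straight), smoothness_metric(straight))        # 0.0 0.0
print(heading_metric(right_angle), smoothness_metric(right_angle))  # both about pi/2
```

Under this definition, a climb followed by level flight adds to the smoothness metric but not to the heading metric, which is why the two are reported separately.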
The experimental results show that EA-TD3 achieves an 11.6% reduction in average energy consumption compared with the baseline, together with the highest energy efficiency and trajectory quality among the evaluated algorithms. This saving is not solely a consequence of shorter trajectories; it originates from the agent internalizing the balance between climb and descent costs through extensive experience sampling. The specific findings are as follows.
First, the optimization of maneuver frequency is governed by power cost awareness. Analysis of the behavioral characteristics reveals that EA-TD3 recorded the lowest frequency of vertical maneuvers, with only 8.82 climbs and 7.72 descents on average. Incorporating the high-fidelity battery dynamics model confirms that eVTOL UAV power consumption is highly nonlinear. In autonomous flight, the vertical climb phase requires substantial power to counteract gravity, resulting in instantaneous demands that significantly exceed those of the horizontal cruise phase.
Traditional algorithms, such as DDPG and TD3, lack an explicit energy perception mechanism and tend to exhibit greedy obstacle-avoidance behavior, frequently resorting to drastic altitude changes. In contrast, the EA-TD3 agent accounts for the physical cost of high-rate battery discharge. As illustrated in the lateral profile in Figure 12, EA-TD3 maintains a more stable altitude layer and bypasses obstacles through horizontal trajectory adjustments rather than energy-intensive vertical shifts. Although the trajectories of DDPG and EA-TD3 appear nearly coincident during the initial climb and mid-course phases in Figure 12, this stems from their shared algorithmic lineage. Since EA-TD3 is an energy-aware extension of the TD3 framework, which itself evolved from DDPG, both algorithms use a deterministic actor structure that favors the most direct geometric path to satisfy the primary mission completion reward in the early training stages. Within the rigid constraints of a narrow urban canyon, this deterministic gradient leads both to converge toward a similar initial climb path. However, significant topological differences emerge when comparing EA-TD3 with SAC and the standard TD3. For SAC, the maximum entropy objective encourages continuous exploration of the action space, resulting in stochastic and curved climb paths. For the standard TD3, the absence of an integrated energy penalty causes the agent to ignore the high battery discharge rate during aggressive climbs. In contrast, EA-TD3 diverges markedly during the descent phase, using its twin-delayed value estimation to favor a gradual glideslope. This shift avoids the high power-loss regimes of the battery system and maintains the structural load stability of the eVTOL UAV throughout the trajectory planning mission.
Despite its superior energy performance, Figure 13 reveals that EA-TD3 exhibits noticeable right-angle maneuvers during the final landing phase. This behavior originates from the success-oriented logic in the terminal state: the agent executes aggressive heading corrections to eliminate residual positional errors so that the eVTOL UAV reaches the landing pad precisely and secures the success reward. Since the current reward structure does not heavily penalize rapid heading changes, these sharp turns emerge as the most effective strategy for guaranteeing mission completion. These results also highlight a limitation of the current model, namely the omission of strict kinematic smoothness constraints such as angular acceleration limits. Although the agent achieves high energy efficiency, it does so by sacrificing trajectory fluidity in the final seconds. In real-world scenarios, such abrupt maneuvers could impose severe structural stress on the airframe or even lead to aerodynamic stall. Future work will incorporate curvature-constrained action spaces or refined smoothness terms so that the generated trajectories are better aligned with physical flight dynamics.
Secondly, to intuitively reveal decision-making disparities under physical constraints, Figure 13 illustrates the correlation between flight altitude profiles and instantaneous energy consumption rates for each algorithm. As observed in the side plot of Figure 13, the energy consumption trajectories of the baseline algorithms and EA-TD3 diverge significantly during the initial 10 to 15 steps. The main plot further demonstrates that the instantaneous energy consumption rates of TD3, DDPG, and SAC are approximately 40% higher than that of EA-TD3 in the initial phase.
From a trajectory perspective, the baseline algorithms, driven by energy-agnostic logic, tend to execute extremely steep climb maneuvers to rapidly establish a vertical safety margin. While this strategy is valid in purely geometric trajectory planning, these aggressive climbs consume 40% to 50% of the total energy budget within the first third of the mission when physical dynamics are considered. In contrast, the EA-TD3 energy consumption rate remains stable between 0.8 and 2.5 J per step throughout the flight.
This stabilized flight profile offers significant engineering value for autonomous operations. Consistent power output effectively extends battery cycle life and alleviates thermal management pressure on motors and electronic controllers during high-power discharges. Furthermore, for eVTOL platforms such as the EH216-S, smooth altitude transitions ensure structural load stability and minimize maneuvering stress on the airframe, adhering to rigorous operational standards for civil UAVs. In the figures, the green circles and red crosses represent the starting positions and targets, respectively, while stars denote the waypoints. Most importantly, the EA-TD3 results demonstrate that the energy perception mechanism enables the agent to identify the path with the lowest physical energy cost in complex urban canyons, achieving a dual optimization of energy and spatial efficiency.

4.2.2. Analysis 2: Robustness Testing Under Safe Flight Envelope Constraints and Resource Scarcity

In the UAM environment, an eVTOL UAV must cope with both limited energy and wind turbulence in urban canyons. This section presents a stress test within the experimental framework in which the battery energy budget is reduced from 200 J to 120 J in a non-uniform dynamic wind field. To ensure statistical significance, we conducted 50 independent MC autonomous trajectory planning trials for each algorithm at every energy level. These experiments assess each algorithm's sensitivity to the flight envelope and its mission delivery capability under resource scarcity.
The results in Table 14 show that the reliability of the algorithms diverges as the energy budget tightens. As a pilotless platform, the EH216-S requires its trajectory planning agent to identify paths that satisfy the flight envelope without real-time human correction, making the robustness of the autonomous logic the primary factor in mission success.
First, we analyze the performance drop observed at the energy threshold. In this scenario, 120 J is defined as the survival threshold. At this limit, energy-agnostic algorithms such as DDPG and SAC proved highly sensitive, with success rates falling to 0% and 6.1%, respectively. This occurs because these baseline frameworks lack real-time SOC awareness and cannot reconfigure their trajectories based on the remaining energy reserves.
Figure 14 shows the flight status of different algorithms under energy limits. The thicker lines represent failed missions, and the circles of the same color represent the corresponding endpoints. The EA-TD3 algorithm results in the fewest failed missions, followed by TD3 and SAC, while DDPG shows the lowest reliability. As illustrated in Figure 14, when baseline algorithms encounter wind perturbations requiring additional thrust, the eVTOL UAV frequently suffers from functional failure in the final stages, typically between steps 25 and 35. This occurs because their early maneuvers are energy-intensive, leaving no power buffer to counteract late-stage environmental stochasticity. Without managing energy–safety trade-offs, these agents lead the platform to power depletion before mission completion.
Secondly, the autonomous adaptive mechanism under aerodynamic energy coupling distinguishes EA-TD3 from other algorithms. EA-TD3 maintains a success rate of 87.8% even under the 120 J constraint. This resilience stems from its trajectory reconfiguration capability where the algorithm couples the wind field correction factor η with SOC observations. Upon perceiving that energy levels are approaching critical boundaries, the agent identifies regions with lower aerodynamic resistance within the wind field. This decision-making logic moves beyond geometric obstacle avoidance and functions as a physics-driven resource-preserving strategy.
At the 120 J threshold, the performance lead of EA-TD3 over the standard TD3 reaches 61.3 percentage points. Analysis of the successful trajectories in Figure 14 reveals that when facing an energy deficit, the algorithm increases flight range elasticity by suppressing maneuver gradients and optimizing cruise altitudes. These experiments verify that energy boundary sensitivity is a pivotal metric for evaluating trajectory planning frameworks. EA-TD3 demonstrates that explicit energy perception mechanisms serve as a safety guarantee for eVTOL UAV platforms confronting meteorological disturbances, ensuring autonomous airworthiness and mission delivery capability during energy-critical phases.

4.2.3. Analysis 3: Training Convergence Stability and Constraint Satisfaction

This analysis is designed to examine the relationship between decision-making stability and constraint mechanisms by quantifying energy efficiency fluctuations throughout the training evolution. To validate the reliability of the algorithm during the learning process, this section evaluates its statistical performance across multiple independent training sessions. Through the comparative visualization in Figure 15, the distinction between EA-TD3 and the benchmark frameworks regarding energy consumption stability becomes evident.
First, we analyze the impact of soft penalties and hard constraints on training stability. In Figure 15a, we can see that during the training evolution, DDPG, TD3, and SAC exhibited energy consumption fluctuations which oscillated between 57 J and 70 J. This instability stems from their reliance on an energy efficiency penalty within the reward function, which lacks a mandatory environmental termination condition. Under this mechanism, the agent frequently attempts high-energy maneuvers during exploration without bearing the immediate consequence of mission failure, which causes the strategy to oscillate between energy conservation and aggressive execution. In contrast, EA-TD3 maintained energy consumption within a narrow range of 61 to 63 J. By establishing energy depletion as a hard constraint termination condition and introducing explicit SOC observations, the agent was forced to internalize a risk-averse decision-making logic from the early stages of training. This mechanism ensures that EA-TD3 exhibits policy consistency across independent stochastic trajectory planning trials.
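The difference between a soft penalty and a hard constraint comes down to whether depletion ends the episode. The sketch below shows the hard-constraint variant with SOC appended to the observation. The `EnergyBudget` class, the `energy_step` helper, and the depletion-penalty magnitude are hypothetical; only the ideas of terminating on energy depletion and exposing SOC to the agent come from the text.

```python
import numpy as np


class EnergyBudget:
    """Hypothetical helper: remaining energy, SOC, and a hard depletion check."""

    def __init__(self, capacity_j):
        self.capacity_j = float(capacity_j)
        self.remaining_j = float(capacity_j)

    @property
    def soc(self):
        return self.remaining_j / self.capacity_j

    def consume(self, energy_j):
        """Deduct step energy; return True once the budget is exhausted."""
        self.remaining_j = max(0.0, self.remaining_j - energy_j)
        return self.remaining_j <= 0.0


def energy_step(budget, obs, reward, step_energy_j, depletion_penalty=100.0):
    """Append SOC to the observation and terminate on depletion (hard constraint)."""
    depleted = budget.consume(step_energy_j)
    if depleted:
        reward -= depletion_penalty  # hypothetical magnitude
    obs_aug = np.append(np.asarray(obs, dtype=float), budget.soc)
    return obs_aug, reward, depleted


budget = EnergyBudget(120.0)  # the 120 J survival threshold used in Analysis 2
obs, r, done = energy_step(budget, [0.0, 0.0, 10.0], 0.0, 100.0)
print(obs[-1], done)  # SOC of about 0.167, episode continues
obs, r, done = energy_step(budget, [1.0, 0.0, 10.0], 0.0, 30.0)
print(obs[-1], done)  # 0.0 True: episode terminated
```

Under a soft-penalty scheme the same overspend would only lower the reward while the episode continued, which is the mechanism the text identifies behind the oscillating energy curves of the baselines.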
Secondly, we verified statistical consistency through 50 independent simulations. The box plot in Figure 15b illustrates the statistical distribution of the trials. EA-TD3 not only achieved the lowest average energy consumption of 62.46 J but also maintained the smallest interquartile range (IQR) among all evaluated algorithms. This indicates that its learned energy allocation strategy is highly repeatable across varying starting coordinates and environmental noise. In contrast, the baseline algorithms exhibit wider spreads and outliers, indicating that, in the absence of hard constraints, they are prone to unpredictable high-energy behaviors. Such uncertainty is a challenge for the autonomous operation of eVTOL UAV platforms.
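The box-plot statistics used in Figure 15b can be computed from per-trial energy values with a routine like the following. It assumes the conventional 1.5 × IQR whisker rule for flagging outliers, a common default rather than something the text specifies, and the sample values are invented.

```python
import numpy as np


def energy_stats(samples_j):
    """Mean, median, IQR, and 1.5 * IQR outlier count for per-trial energy (J)."""
    x = np.asarray(samples_j, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
    return {"mean": float(x.mean()), "median": float(median),
            "iqr": float(iqr), "n_outliers": int(outliers.size)}


print(energy_stats([60.0, 61.0, 62.0, 63.0, 64.0]))  # hypothetical tight cluster
print(energy_stats([57.0, 60.0, 63.0, 66.0, 90.0]))  # hypothetical wide spread, one outlier
```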
The experimental results demonstrate that relying solely on reward-based penalties is insufficient to cultivate strategies with consistent reliability. Through a hard constraint-driven training paradigm, EA-TD3 induces flight behaviors characterized by high consistency and low volatility, which provides technical support for the safe execution of unmanned missions in confined urban airspaces.

4.3. Ablation Study

To verify the necessity and effectiveness of the energy perception mechanism in DRL-based eVTOL UAV autonomous trajectory planning, this study conducts ablation experiments by increasing environmental fidelity. Unlike the previous stress tests focusing on energy boundaries, these experiments are performed with a sufficient energy budget of 200 J to isolate the impact of three physical factors on trajectory planning performance, including dynamic wind fields, high-fidelity energy consumption models, and complex urban obstacle layouts.
As summarized in Table 15, we designed four simulation tiers progressing from idealized environments to those with physical realism. By introducing key variables of the operational environment, we established a performance benchmark for eVTOL UAV autonomous trajectory planning. This tiered configuration is intended to deconstruct the influence of environmental fidelity on the algorithm decision-making logic to ensure that the agent possesses robustness for transition from simulation to real-world application.
In the Level 1 configuration, the environment consists of a constant wind field, a uniform energy consumption model, and three simplified building obstacles. The mission objective is to reach a single target landmark. This level serves as an idealized trajectory planning environment, which provides a performance baseline for the agent.
In the Level 2 configuration, while maintaining constant wind and simplified obstacles, a realistic energy consumption model derived from the CMU dataset is introduced. This model accounts for the nonlinear power distribution of the eVTOL UAV during maneuvers such as climbing, cruising, and descending. This stage subjects the trajectory planning algorithm to physical performance constraints to test its ability to internalize aerodynamic costs.
In the Level 3 configuration, the high-fidelity energy model is retained and the constant wind field is replaced with a dynamic sinusoidal wind field that fluctuates across spatial and temporal dimensions. This introduces non-stationary aerodynamic drag and evaluates the trajectory correction capability of the autonomous trajectory planning system under dynamic environmental uncertainty.
The Level 4 configuration represents the highest environmental fidelity, integrating dynamic wind fields, CMU-based power models, six building clusters, and multiple traversal landmarks. This scenario requires the algorithm to execute energy management across multiple task phases while navigating spatial complexity, and it serves as an assessment of the autonomous operational envelope of the eVTOL UAV in urban canyons.
Four algorithms, DDPG, TD3, SAC, and EA-TD3, were trained independently at each environment level under identical hyperparameter configurations. Each agent underwent $5 \times 10^{5}$ training steps per level with an initial energy budget of 200 J. The architecture employed a three-layer neural network with 256 units per layer, a batch size of 256, a learning rate of $3 \times 10^{-4}$, and a discount factor $\gamma = 0.99$. Upon convergence, 50 independent MC simulations were performed at each level for evaluation. Success was defined as reaching the final waypoint without collisions or power depletion.
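The ablation protocol can be summarized as a configuration sketch. The values below are taken from the descriptions of the four levels and the training settings above; where a level's obstacle layout or target structure is not stated explicitly, the entry is marked as assumed, and reading "three-layer" as three hidden layers of 256 units is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationLevel:
    level: int
    wind: str
    energy_model: str
    obstacles: str
    targets: str


# Entries marked "assumed" are not stated explicitly in the level descriptions.
LEVELS = (
    AblationLevel(1, "constant", "uniform", "3 simplified buildings", "single target"),
    AblationLevel(2, "constant", "CMU-derived nonlinear", "3 simplified buildings",
                  "single target (assumed)"),
    AblationLevel(3, "dynamic sinusoidal", "CMU-derived nonlinear",
                  "3 simplified buildings (assumed)", "single target (assumed)"),
    AblationLevel(4, "dynamic sinusoidal", "CMU-derived nonlinear", "6 building clusters",
                  "multiple waypoints"),
)


@dataclass(frozen=True)
class AblationTrainConfig:
    algorithms: tuple = ("DDPG", "TD3", "SAC", "EA-TD3")
    train_steps: int = 500_000
    initial_energy_j: float = 200.0
    hidden_units: tuple = (256, 256, 256)  # assumed reading of "three-layer"
    batch_size: int = 256
    learning_rate: float = 3e-4
    gamma: float = 0.99
    eval_episodes: int = 50  # Monte Carlo evaluations per level


cfg = AblationTrainConfig()
runs = [(lvl.level, algo) for lvl in LEVELS for algo in cfg.algorithms]
print(len(runs))  # 16 independent training runs
```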
The quantitative results of the ablation study in Table 16 show that as environmental fidelity increases, EA-TD3 maintains operational resilience. The performance gap between EA-TD3 and the baseline frameworks exhibits a nonlinear expansion as physical constraints tighten.
During Level 1 and Level 2, characterized by simplistic configurations, all algorithms achieved success rates between 96% and 100% under idealized spatial or static energy models. This suggests that in predictable environments without dynamic disturbances, reward penalty mechanisms are sufficient for basic trajectory planning. Consequently, the necessity of an explicit energy perception mechanism is less prominent under these low-fidelity conditions.
However, a performance bifurcation emerges at Level 3 with the introduction of non-stationary aerodynamic disturbances. Without coupled perception of wind field correction factors and real-time SOC, the success rates of SAC, TD3, and DDPG decline to between 82% and 92%. In contrast, EA-TD3 maintains a success rate of 94% while reducing energy consumption by 13%, owing to its adaptability to dynamic uncertainties.
Under the Level 4 operational envelope, the results define the survival red line for eVTOL UAV autonomous systems. DDPG shows a performance drop with a success rate of 62% and energy consumption of 46.61 J. Conversely, EA-TD3 maintains a 98% success rate despite the constraints of dynamic wind fields and multi-waypoint transitions, requiring 38.40 J on average. Compared to TD3, EA-TD3 improves the success rate by 23.3 percentage points and achieves a 9.9% gain in energy efficiency. These findings demonstrate that as operational environments shift toward high fidelity, energy awareness is a core safety driver ensuring the mission survivability of unmanned eVTOL UAV platforms in urban canyons.
The ablation experiments show that energy perception mechanisms become necessary in complex operational environments. Figure 16 illustrates the percentage advantage of EA-TD3 over DDPG, SAC, and TD3 in trajectory planning success rate and energy consumption at each environmental level. In simplified scenarios (Levels 1 and 2), energy optimization through reward penalty mechanisms is sufficient for basic mission success. As shown in Figure 16a, the success rate advantage of EA-TD3 is marginal at these levels because all algorithms achieve success rates near 100%.
However, once non-stationary wind disturbances and greater spatial complexity are introduced at Levels 3 and 4, explicit energy state observation becomes critical for mission survivability. The success rate advantage of EA-TD3 widens at Level 3, where it maintains a 94% success rate while the other algorithms decline to between 82% and 92%. At Level 4, the gap expands further: EA-TD3 sustains a 98% success rate, whereas TD3 drops to 90%, SAC to 72%, and DDPG to 62%, so EA-TD3 outperforms the weakest algorithm by 36 percentage points. These results indicate that, for unmanned eVTOL UAV systems operating in high-fidelity environments, energy awareness is required to remain within the safe flight envelope (SFE).
The improvement in success rate is accompanied by distinct energy consumption patterns. Figure 16b illustrates the differences in energy consumption across complexity levels. At Level 1, the difference is negligible, and at Level 2 it remains small; in Table 16, EA-TD3 in fact consumes slightly more energy than the baselines at this level. At Levels 3 and 4, however, EA-TD3 consumes roughly 10% less energy than the baseline average (about 11.6% and 9.9%, respectively). This margin reflects efficiency gains that accumulate in more complex scenarios. Both the energy-saving advantage and the success rate advantage of EA-TD3 are small or absent at Levels 1 and 2 and become pronounced at Levels 3 and 4, as the realism of the simulation environment increases.
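The percentage advantages discussed above can be recomputed directly from Table 16. The sketch below assumes that the "advantage" is measured against the arithmetic mean of the three baselines, which is the interpretation that reproduces the 23.3 percentage-point and 9.9% figures at Level 4; the function names are illustrative.

```python
from typing import Tuple

# Success rate (%) and average energy (J) for each level, copied from Table 16.
TABLE_16 = {
    "Level 1": {"DDPG": (100, 36.41), "SAC": (100, 32.68), "TD3": (100, 40.078), "EA-TD3": (100, 36.47)},
    "Level 2": {"DDPG": (98, 33.83), "SAC": (96, 32.34), "TD3": (96, 33.86), "EA-TD3": (98, 35.29)},
    "Level 3": {"DDPG": (82, 41.04), "SAC": (90, 36.88), "TD3": (92, 33.73), "EA-TD3": (94, 32.88)},
    "Level 4": {"DDPG": (62, 46.61), "SAC": (72, 42.01), "TD3": (90, 39.29), "EA-TD3": (98, 38.40)},
}
BASELINES = ("DDPG", "SAC", "TD3")


def advantage_vs_baseline_mean(level: str) -> Tuple[float, float]:
    """Return (success-rate gain in percentage points, energy saving in %)
    of EA-TD3 relative to the mean of the three baselines at one level."""
    rows = TABLE_16[level]
    mean_success = sum(rows[b][0] for b in BASELINES) / len(BASELINES)
    mean_energy = sum(rows[b][1] for b in BASELINES) / len(BASELINES)
    success, energy = rows["EA-TD3"]
    return success - mean_success, 100.0 * (mean_energy - energy) / mean_energy


def advantage_vs_worst(level: str) -> float:
    """Success-rate gap (percentage points) between EA-TD3 and the weakest baseline."""
    rows = TABLE_16[level]
    return rows["EA-TD3"][0] - min(rows[b][0] for b in BASELINES)


for level in TABLE_16:
    gain_pp, saving_pct = advantage_vs_baseline_mean(level)
    print(f"{level}: {gain_pp:+.1f} pp success, {saving_pct:+.1f}% energy saving")
```

For Level 4 this gives about 23.3 percentage points and 9.9%, and for Level 3 about 6 percentage points and 11.65%. At Level 2 the energy saving is negative, meaning EA-TD3 uses more energy than the baseline average there.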
The ablation experiments provide a foundation for integrating energy-aware reinforcement learning into autonomous trajectory planning frameworks for energy-constrained eVTOL UAVs. The empirical results indicate that energy awareness is more than a supplementary optimization function; it serves as a core mission capability for eVTOL UAV platforms operating in complex environments, and its importance grows with system complexity. For unmanned eVTOL UAV systems navigating under realistic constraints such as dynamic wind fields, urban building clusters, and multi-waypoint task sequences, the physics-aware trajectory planning paradigm provided by EA-TD3 is necessary. It supports reliable mission execution and energy safety management, providing technical support for autonomous flight within future UAM networks.

5. Conclusions

To address the energy constraints and dynamic disturbances faced by eVTOL UAVs in UAM environments, this paper proposes and validates an energy-aware reinforcement learning framework named EA-TD3. The method uses a battery dataset from Carnegie Mellon University (CMU) to construct a nonlinear energy consumption model, enabling the system to account for differences in energy consumption across the climb, cruise, and descent flight phases. By coupling this model with stochastic low-altitude wind fields and using an enhanced TD3 algorithm as the decision engine, the framework performs autonomous trajectory planning while satisfying battery safety constraints. The framework is validated using the technical parameters of the EH216-S eVTOL platform. The results indicate that physical perception is a prerequisite for the autonomous airworthiness of eVTOL UAV platforms. Compared with baseline frameworks, EA-TD3 improves energy efficiency and mission reliability in dynamically uncertain environments through optimized autonomous trajectory planning.
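The phase-dependent energy model summarized above can be illustrated with a short sketch. It uses the averaged κ values from Table 6 and the wind sensitivity coefficient α = 0.5 from Table 9, but the functional form of the wind correction η(θ_rel) = 1 − α cos θ_rel, the use of 3D segment length as the displacement, and the labelling of any segment with vertical motion as climb or descent are all assumptions for illustration, not the paper's equations.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# Averaged phase energy intensity factors (Table 6 / Table 11), treated here as
# energy per metre of flown segment length (J/m, following Table 1).
KAPPA = {"climb": 1.018, "cruise": 0.025, "descent": 2.032}
ALPHA_WIND = 0.5  # wind sensitivity coefficient alpha (Table 9)


def classify_phase(dz: float, tol: float = 1e-6) -> str:
    """Label a segment by its vertical displacement dz (m)."""
    if dz > tol:
        return "climb"
    if dz < -tol:
        return "descent"
    return "cruise"


def wind_correction(theta_rel: float, alpha: float = ALPHA_WIND) -> float:
    """ASSUMED form of eta(theta_rel): 1 - alpha * cos(theta_rel).

    theta_rel = 0 (tailwind) lowers the cost; theta_rel = pi (headwind) raises it.
    """
    return 1.0 - alpha * math.cos(theta_rel)


def segment_energy(p0: Point, p1: Point, theta_rel: float) -> float:
    """Energy (J) for one straight segment under the assumptions above."""
    phase = classify_phase(p1[2] - p0[2])
    return KAPPA[phase] * math.dist(p0, p1) * wind_correction(theta_rel)


def path_energy(points: Sequence[Point], theta_rels: Sequence[float]) -> float:
    """Sum segment energies; theta_rels[i] applies to points[i] -> points[i + 1]."""
    if len(theta_rels) != len(points) - 1:
        raise ValueError("need one relative wind angle per segment")
    return sum(
        segment_energy(a, b, th) for a, b, th in zip(points, points[1:], theta_rels)
    )
```

With these factors, one metre of climb costs about 40 times as much as one metre of level cruise (1.018 / 0.025 ≈ 40.7), which is consistent with the emphasis on the high energy cost of vertical maneuvers.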
The contributions of this research are reflected in the following three aspects.
First, this study constructs a physically aware power management mechanism. Unlike traditional trajectory planning algorithms that treat energy consumption as a linear function of displacement, this research maps maneuvering actions to energy consumption by analyzing voltage and current fluctuations in battery charge–discharge cycles. The experimental results show that this model captures the high power consumption of the eVTOL UAV during vertical maneuvers, advancing autonomous trajectory planning from geometric paths to physically feasible trajectories.
Second, the proposed framework yields robust trajectory planning strategies under energy constraints. By incorporating survival boundary logic with energy red-line awareness into the reward function, EA-TD3 adapts its flight behavior to the real-time SoC. In stress tests where the available energy drops to 120 J, EA-TD3 maintains an 87.8% mission success rate, compared with 26.5% for standard TD3, 6.1% for SAC, and 0% for DDPG (Table 14). This provides quantitative support for addressing operational endurance challenges in UAM scenarios.
Third, the research validates mission efficiency and statistical stability under dynamic environmental disturbances. In ablation experiments involving dynamic wind fields and 3D building obstacles, EA-TD3 demonstrates high energy efficiency and environmental adaptability. Compared with the average of the three baseline algorithms, EA-TD3 achieves energy savings of 9.9% to 11.6% in the complex scenarios (Levels 3 and 4) and improves the mission success rate by 23.3 percentage points at Level 4. Furthermore, statistical consistency validation reveals a narrow distribution of energy consumption. This degree of decision consistency provides a reference for optimizing control strategies for pilotless platforms such as the EH216-S in medical emergency and logistics scenarios.
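One way to quantify the "narrow distribution" of energy consumption is the interquartile range (IQR) of per-episode energy. The sketch below is illustrative: `energy_iqr` is a hypothetical helper, and the choice of the "inclusive" quartile method is an assumption, since the method used in the study is not stated here.

```python
import statistics
from typing import Sequence


def energy_iqr(samples: Sequence[float]) -> float:
    """Interquartile range (Q3 - Q1) of per-episode energy consumption (J)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    q1, _median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q3 - q1
```

Applied to each algorithm's evaluation energies, a smaller IQR indicates a tighter, more deterministic energy profile; the samples in the checks below are synthetic and chosen only to illustrate the comparison.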

Author Contributions

Conceptualization and visualization, J.C.; Methodology, J.C. and X.L.; Validation, J.C., Z.W. and L.Z.; Formal analysis, data processing and drafting, J.C. and J.X.; Survey research, J.C. and Z.W.; Resource provision, L.Z.; Review and editing and supervision guidance, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to our colleagues for their valuable feedback and constructive discussions that enriched this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
eVTOL UAV: Electric Vertical Takeoff and Landing Unmanned Aerial Vehicle
EA-TD3: Energy-Aware Twin Delayed Deep Deterministic Policy Gradient
TD3: Twin Delayed Deep Deterministic Policy Gradient
DDPG: Deep Deterministic Policy Gradient
SAC: Soft Actor-Critic
DRL: Deep Reinforcement Learning
MDP: Markov Decision Process
RRT: Rapidly Exploring Random Trees
UAM: Urban Air Mobility
CFD: Computational Fluid Dynamics
CMU: Carnegie Mellon University
SoC: State of Charge
SFE: Safe Flight Envelope
MSE: Mean Squared Error
IQR: Interquartile Range
MC: Monte Carlo
WP: Waypoints
V: Voltage
I: Current
T: Temperature

References

  1. Hassanalian, M.; Abdelkefi, A. Classifications, applications, and design challenges of drones: A review. Prog. Aerosp. Sci. 2017, 91, 99–131.
  2. Moradi, N.; Wang, C.; Mafakheri, F. Urban air mobility for last-mile transportation: A review. Vehicles 2024, 6, 1383–1414.
  3. Lozano Tafur, C.; Orduy Rodríguez, J.; Aldana Rodríguez, D.; Traslaviña, D.S.; Fernández Valencia, S.; Celis Ardila, F.H. Risk-Based Design of Urban UAS Corridors. Drones 2025, 9, 815.
  4. Li, Y.; Guo, T.; Chen, J.; Wu, J.; Zhang, Y.; Alam, S.; Cai, K.; Du, W. Urban air mobility: A review and challenges. IEEE Intell. Transp. Syst. Mag. 2024, 17, 67–87.
  5. Milcsik, C.J.; Johnson, E.N.; Khamvilai, T. Urban Aircraft Path Planning in Wind Fields. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025; p. 2427.
  6. Baskar, D.; Gorodetsky, A. A simulated wind-field dataset for testing energy efficient path-planning algorithms for UAVs in urban environment. In Proceedings of the AIAA Aviation 2020 Forum, Virtual, 15–19 June 2020; p. 2920.
  7. Jiang, S.; Wang, J.; Li, C.; Ou, J.; Duan, P.; Li, L. Identification of no-fly zones for delivery drone path planning in various urban wind environments. Phys. Fluids 2024, 36, 085166.
  8. Chan, Y.; Ng, K.K.; Lee, C.; Hsu, L.T.; Keung, K. Wind dynamic and energy-efficiency path planning for unmanned aerial vehicles in the lower-level airspace and urban air mobility context. Sustain. Energy Technol. Assess. 2023, 57, 103202.
  9. Frey, J.; Rienecker, H.; Schubert, S.; Hildebrand, V.; Pfifer, H. Wind tunnel measurement of the urban wind field for flight path planning of unmanned aerial vehicles. In Proceedings of the AIAA SCITECH 2024 Forum, Orlando, FL, USA, 8–12 January 2024; p. 2510.
  10. Rienecker, H.; Hildebrand, V.; Pfifer, H. Energy optimal 3D flight path planning for unmanned aerial vehicle in urban environments. CEAS Aeronaut. J. 2023, 14, 621–636.
  11. Tian, P.; Chao, H.; Rhudy, M.; Gross, J.; Wu, H. Wind sensing and estimation using small fixed-wing unmanned aerial vehicles: A survey. J. Aerosp. Inf. Syst. 2021, 18, 132–143.
  12. Marzougui, T.; Saenz, E.S.; Bareille, M. A rule-based energy management strategy for hybrid powered eVTOL. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2023; Volume 2526, p. 012024.
  13. Senkans, E.; Skuhersky, M.; Kish, B.; Wilde, M. A first-principle power and energy model for eVTOL vehicles. In Proceedings of the AIAA Aviation 2021 Forum, Virtual, 2–6 August 2021; p. 3169.
  14. Jiao, Q.; Liu, Y.; Zheng, Z.; Sun, L.; Bai, Y.; Zhang, Z.; Sun, L.; Ren, G.; Zhou, G.; Chen, X.; et al. Ground risk assessment for unmanned aircraft systems based on dynamic model. Drones 2022, 6, 324.
  15. Wu, Y.; Zhang, S.; Ni, X.; Li, X. System dynamics analysis of development risks in emerging eVTOL aircraft. Expert Syst. Appl. 2025, 300, 130363.
  16. Jastrzębska, A.; Łągiewka, Z.; Sieczka, P.; Zalewski, J. A Survey on Algorithms Used for Drone Energy Consumption Modelling. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing; Springer: Cham, Switzerland, 2025; pp. 38–49.
  17. Xu, J.; Guan, C.; Wang, Y.; Zhuang, J.; Gan, W. A Systematic Review of Urban Air Mobility Development: EVTOL Drones’ Technological Challenges and Low-Altitude Policies of Shenzhen. Drones 2025, 9, 842.
  18. Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894.
  19. Xu, T. Recent advances in Rapidly-exploring random tree: A review. Heliyon 2024, 10, e32451.
  20. Fagundes-Junior, L.A.; de Carvalho, K.B.; Ferreira, R.S.; Brandão, A.S. Machine learning for unmanned aerial vehicles navigation: An overview. SN Comput. Sci. 2024, 5, 256.
  21. Primatesta, S.; Guglieri, G.; Rizzo, A. A Risk-Aware Path Planning Strategy for UAVs in Urban Environments. J. Intell. Robot. Syst. 2019, 95, 629–643.
  22. Gao, L.; Ding, J.; Liu, W. A vision-based irregular obstacle avoidance framework via deep reinforcement learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2021.
  23. Fu, H.; Li, Z.; Zhang, W.; Feng, Y.; Zhu, L.; Long, Y.; Li, J. Path Planning for Agricultural UAVs Based on Deep Reinforcement Learning and Energy Consumption Constraints. Agriculture 2025, 15, 943.
  24. Chen, J.; Zhou, J.; Wu, D.; Jiang, H. A USV Path Planning Algorithm under Special Environment Based on TD3-RRT. J. Syst. Simul. 2025, 37, 2888–2903.
  25. Lv, H.; Chen, Y.; Li, S.; Zhu, B.; Li, M. Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy. Meas. Sci. Technol. 2024, 35, 056206.
  26. Xie, Y.; Ma, Y.; Cheng, Y.; Li, Z.; Liu, X. BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways. Appl. Sci. 2025, 15, 3446.
  27. Loquercio, A.; Kaufmann, E.; Ranftl, R.; Müller, M.; Koltun, V.; Scaramuzza, D. Learning high-speed flight in the wild. Sci. Robot. 2021, 6, eabg5810.
  28. Bills, A.; Sripad, S.; Fredericks, L.; Guttenberg, M.; Charles, D.; Frank, E.; Viswanathan, V. A battery dataset for electric vertical takeoff and landing aircraft. Sci. Data 2023, 10, 344.
  29. Phung, M.T.; Akhtar, M.S.; Yang, O.B. Machine learning approaches for assessing rechargeable battery state-of-charge in unmanned aircraft vehicle-eVTOL. J. Comput. Sci. 2024, 81, 102380.
  30. Debnath, S.K.; Omar, R.; Latip, N.B.A. A review on energy efficient path planning algorithms for unmanned air vehicles. In Computational Science and Technology, Proceedings of the 5th ICCST 2018, Kota, Kinabalu, Malaysia, 29–30 August 2018; Springer: Singapore, 2018; pp. 523–532.
  31. Liu, S.; Li, S.; Li, H.; Li, W.; Tan, J. TEeVTOL: Balancing Energy and Time Efficiency in eVTOL Aircraft Path Planning Across City-Scale Wind Fields. arXiv 2024, arXiv:2403.14877.
  32. Hong, D.; Lee, S.; Cho, Y.H.; Baek, D.; Kim, J.; Chang, N. Energy-efficient online path planning of multiple drones using reinforcement learning. IEEE Trans. Veh. Technol. 2021, 70, 9725–9740.
  33. Shani, G.; Heckerman, D.; Brafman, R.I. An MDP-based recommender system. J. Mach. Learn. Res. 2005, 6, 1265–1295.
  34. Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2018; pp. 1587–1596.
  35. Tan, H. Reinforcement learning with deep deterministic policy gradient. In Proceedings of the 2021 International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA); IEEE: Piscataway, NJ, USA, 2021; pp. 82–85.
  36. Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
  37. Wu, J.; Wu, Q.J.; Chen, S.; Pourpanah, F.; Huang, D. A-TD3: An adaptive asynchronous twin delayed deep deterministic for continuous action spaces. IEEE Access 2022, 10, 128077–128089.
Figure 1. EA-TD3 methodological framework.
Figure 2. EHang EH216-S model.
Figure 3. Simplified task profile.
Figure 4. VAH01 Tenth Cycle: Temporal evolution of voltage, current, and power during climb, cruise, and descent phases.
Figure 5. City modeling display. (a) City scene modeling diagram. (b) 2D modeling search direction. (c) 3D modeling search direction.
Figure 6. MDP methodological framework.
Figure 7. The logic of the EA-TD3 algorithm.
Figure 8. eVTOL UAV flight route.
Figure 9. Hyperparameter evaluation results. (a) Learning rate evaluation results. (b) Batch size evaluation results. (c) Network architecture evaluation results. (d) Target noise evaluation results.
Figure 10. Training process results across different algorithms: (a) Success rate; (b) Average reward; (c) Average Steps; (d) Energy consumption.
Figure 11. Comparison of path planning trajectory examples for four different algorithms.
Figure 12. Comparison of 3D trajectories and flight profiles for the four evaluated algorithms: (a) 3D trajectory visualization; (b) horizontal top view; (c) vertical side view; (d) multi-phase flight profile.
Figure 13. A comparison of the energy used to average the trajectory profiles of the four algorithms.
Figure 14. Comparison of trajectory planning results for the four algorithms under a 120 J energy limit.
Figure 15. Training process. (a) Energy expenditure displayed in normalized training progress. (b) Distribution differences in energy expenditure during training.
Figure 16. Advantages of EA-TD3 at different environmental levels. (a) Success rate advantage of EA-TD3. (b) Differences in energy consumption.
Table 1. Nomenclature of symbols and variables used in this study.

| Symbol | Description | Unit/Remark |
| --- | --- | --- |
| Energy Consumption Modeling Based on CMU Data | | |
| $U(t)$, $I(t)$ | Terminal voltage and current derived from eVTOL UAV battery profiles | V, A |
| $SoC_t$ | State of charge (SoC) reflecting the remaining energy at time $t$ | % |
| $C_{rated}$ | Total rated capacity of the lithium-ion battery system | Ah |
| $\kappa(a_t)$ | Energy intensity factor, defined as energy cost per unit ground displacement | J/m |
| $\eta(\theta_{rel})$ | Wind angle correction factor based on relative wind direction | – |
| Environmental Formulation and Wind Field | | |
| $W$ | Three-dimensional Euclidean workspace based on an urban digital twin | m |
| $O_i$ | Building obstacle identified in the urban simulation | – |
| $P_t$ | Instantaneous spatial coordinates $(x_t, y_t, z_t)$ of the eVTOL UAV | m |
| $d_{safe}$ | Safety buffer margin accounting for eVTOL UAV dimensions and rotor clearance | m |
| $v(z)$ | Horizontal wind speed from the atmospheric boundary layer (ABL) model | m/s |
| $\alpha$ | Wind shear exponent determining the vertical wind profile | – |
| $\theta_{rel}$ | Relative angle between the eVTOL UAV heading and the wind vector | rad |
| Reinforcement Learning and EA-TD3 Algorithm | | |
| $s_t$, $a_t$ | State vector and continuous action vector (including velocity and angular commands) | – |
| $d_{target}$ | Normalized Euclidean distance to the target waypoint | – |
| $\Delta\theta$, $\Delta\phi$ | Horizontal and vertical angular deviations relative to the target vector | rad |
| $d_{obs}$ | Minimum distance to obstacle boundaries within the trajectory planning range | m |
| $r_t$ | Composite reward function balancing safety, energy, and efficiency | – |
| $\omega_{g}$, $\omega_{c}$, $\omega_{d}$, $\omega_{e}$ | Weighting coefficients for goal reaching, collision, distance, and energy | – |
| $r_{energy}$ | Energy-aware penalty term sensitive to ground displacement and wind | – |
| $Q_{\theta_{1,2}}$, $\pi_{\phi}$ | Parameters of the twin critics and the energy-aware actor network | – |
| $\tau$ | Soft update coefficient for target network parameters | – |
Table 2. Basic parameters of the EHang EH216-S.

| Parameter Name | Value | Physical Constraint Role |
| --- | --- | --- |
| Fuselage Height | 1.93 m | Collision detection boundary |
| Fuselage Width | 5.73 m | Minimum passage clearance |
| Max Takeoff Weight | 620 kg | Initial dynamic mass |
| Maximum Range | 30 km | Mission planning radius |
| Max Design Speed | 130 km/h | Action space velocity upper bound |
Table 3. Experimental conditions and corresponding CMU subdatasets [29].

| Experimental Conditions | Charge/Discharge Settings | Subdatasets |
| --- | --- | --- |
| Baseline | 1 C, 4.2 V | VAH01, VAH17, VAH27 |
| Constant Current Charge | 0.5 C, 4.2 V | VAH06, VAH24 |
| | 1.5 C, 4.2 V | VAH16, VAH20 |
| Constant Voltage Charge | 1 C, 4.0 V and 4.1 V | VAH07, VAH23 |
| Extended Cruise | 1000 s, 1 C, 4.2 V | VAH02, VAH15, VAH22 |
| Short Cruise Length | 400 s, 1 C, 4.2 V | VAH12 |
| | 600 s, 1 C, 4.2 V | VAH13, VAH26 |
| Thermal Chamber Temperature | 20 °C (1 C, 4.2 V) | VAH09, VAH25 |
| | 30 °C (1 C, 4.2 V) | VAH10 |
| | 35 °C (1 C, 4.2 V) | VAH30 |
| Power Reduction during Discharge | 10% reduction (1 C, 4.2 V) | VAH05, VAH28 |
| | 20% reduction (1 C, 4.2 V) | VAH11 |

Note: All subdatasets are sourced from the CMU battery degradation database.
Table 4. Key physical parameters and operational constraints of the EHang EH216-S.

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Max aircraft dimension $W_a$ | 5.73 m | Max flight height $H_{max}$ | 120 m |
| Temperature $T$ | 20 °C | Min flight height $H_{min}$ | 5 m |
| Battery efficiency $\eta$ | 85% | Objective weight $\alpha_1$ | 0.4 |
| Max steering angle $\theta_{max}$ | $\pi/2$ | Objective weight $\alpha_2$ | 0.6 |
| Max climb angle $\beta_{max}$ | $\pi/2$ | | |
Table 5. Data samples and structural attributes of the VAH01 baseline mission dataset.

| Time (s) | Voltage (V) | Current (mA) | E-chg (Wh) | C-chg (mAh) | E-dis (Wh) | C-dis (mAh) | Temp. (°C) | Cycle Index |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.00 | 4.195 | 13,245 | 0.000 | 0.000 | 0.000 | 0.000 | 25.12 | 1 |
| 1.00 | 4.082 | 13,250 | 0.000 | 0.000 | 0.015 | 3.680 | 25.15 | 1 |
| 2.00 | 4.075 | 13,248 | 0.000 | 0.000 | 0.030 | 7.361 | 25.18 | 1 |
| 3.00 | 4.071 | 13,252 | 0.000 | 0.000 | 0.045 | 11.042 | 25.21 | 1 |
| 4.00 | 4.068 | 13,250 | 0.000 | 0.000 | 0.060 | 14.723 | 25.25 | 1 |

Note: E-chg and C-chg denote the energy and capacity during charging, while E-dis and C-dis denote the same quantities during discharge; these are the inputs for modeling the energy intensity κ.
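The E-dis and C-dis columns appear to be running integrals of power and current over time. The sketch below assumes a right-endpoint rectangle rule with the 1 s sampling shown in Table 5 (an assumption about how the logger integrates); with the five listed samples it reproduces the tabulated values to within about 0.001 mAh and 0.0001 Wh. The function name is illustrative.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

# First five samples of the VAH01 baseline mission (Table 5).
TIME_S = [0.0, 1.0, 2.0, 3.0, 4.0]
VOLTAGE_V = [4.195, 4.082, 4.075, 4.071, 4.068]
CURRENT_MA = [13245, 13250, 13248, 13252, 13250]


def cumulative_discharge(
    time_s: Sequence[float], voltage_v: Sequence[float], current_ma: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return cumulative discharge capacity (mAh) and energy (Wh).

    ASSUMPTION: each interval is integrated with the current and power logged
    at its end (right-endpoint rectangle rule).
    """
    cap_steps = [0.0]
    energy_steps = [0.0]
    for k in range(1, len(time_s)):
        dt_h = (time_s[k] - time_s[k - 1]) / 3600.0  # seconds -> hours
        cap_steps.append(current_ma[k] * dt_h)                            # mAh
        energy_steps.append(voltage_v[k] * current_ma[k] / 1000.0 * dt_h)  # Wh
    return list(accumulate(cap_steps)), list(accumulate(energy_steps))


capacity_mah, energy_wh = cumulative_discharge(TIME_S, VOLTAGE_V, CURRENT_MA)
print([round(c, 3) for c in capacity_mah])
print([round(e, 3) for e in energy_wh])
```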
Table 6. Extracted and normalized energy intensity factors κ for different battery datasets and flight phases.

| Battery ID | $\kappa_{climb}$ | $\kappa_{cruise}$ | $\kappa_{descent}$ |
| --- | --- | --- | --- |
| VAH01 | 1.059 | 0.034 | 2.118 |
| VAH17 | 1.000 | 0.020 | 1.993 |
| VAH27 | 0.995 | 0.020 | 1.986 |
| Average | 1.018 | 0.025 | 2.032 |
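The "Average" row of Table 6 is consistent with a simple arithmetic mean over the three Baseline subdatasets of Table 3 (VAH01, VAH17, VAH27), rounded to three decimals. The short check below recomputes it; `average_kappa` is an illustrative helper, not part of the authors' pipeline.

```python
from typing import Dict, Tuple

PHASES = ("climb", "cruise", "descent")

# Per-battery energy intensity factors from Table 6, ordered as PHASES.
KAPPA_BY_BATTERY: Dict[str, Tuple[float, float, float]] = {
    "VAH01": (1.059, 0.034, 2.118),
    "VAH17": (1.000, 0.020, 1.993),
    "VAH27": (0.995, 0.020, 1.986),
}


def average_kappa(
    table: Dict[str, Tuple[float, float, float]], ndigits: int = 3
) -> Dict[str, float]:
    """Arithmetic mean of each phase's factor across batteries, rounded like Table 6."""
    n = len(table)
    return {
        phase: round(sum(row[i] for row in table.values()) / n, ndigits)
        for i, phase in enumerate(PHASES)
    }


avg = average_kappa(KAPPA_BY_BATTERY)
print(avg)
print("climb/cruise ratio:", avg["climb"] / avg["cruise"])
```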
Table 7. Parameters of building obstacles and waypoints in the simulation workspace.

| Object ID | Position Range (m) | Dimensions (m) | Height (m) |
| --- | --- | --- | --- |
| $O_1$ | [2, 2, 0] to [5, 5, 12] | 3 × 3 × 12 | 12 |
| $O_2$ | [12, 2, 0] to [15, 6, 8] | 3 × 4 × 8 | 8 |
| $O_3$ | [2, 13, 0] to [6, 17, 7] | 4 × 4 × 7 | 7 |
| $O_4$ | [8, 8, 0] to [11, 11, 15] | 3 × 3 × 15 | 15 |
| $O_5$ | [14, 14, 0] to [18, 18, 5] | 4 × 4 × 5 | 5 |
| $O_6$ | [10, 1, 0] to [11, 3, 10] | 1 × 2 × 10 | 10 |
| $W_1$ | (6.0, 6.0, 12.0) | – | 12 |
| $W_2$ | (12.0, 12.0, 12.0) | – | 12 |
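Since each building in Table 7 is an axis-aligned box, a point-in-obstacle check with the safety buffer $d_{safe}$ from Table 1 reduces to inflating every box by that margin. This is a minimal sketch: the numeric value d_safe = 0.5 m is hypothetical (Table 1 defines the symbol but no value is given here), and only point collisions are checked, not full segment sweeps.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Axis-aligned building boxes from Table 7: (min corner, max corner) in metres.
OBSTACLES: Dict[str, Tuple[Vec3, Vec3]] = {
    "O1": ((2, 2, 0), (5, 5, 12)),
    "O2": ((12, 2, 0), (15, 6, 8)),
    "O3": ((2, 13, 0), (6, 17, 7)),
    "O4": ((8, 8, 0), (11, 11, 15)),
    "O5": ((14, 14, 0), (18, 18, 5)),
    "O6": ((10, 1, 0), (11, 3, 10)),
}


def in_collision(p: Vec3, d_safe: float = 0.5) -> bool:
    """True if p lies inside any obstacle box inflated by d_safe on every side.

    d_safe = 0.5 m is a HYPOTHETICAL default, not a value from the paper.
    """
    for lo, hi in OBSTACLES.values():
        if all(l - d_safe <= x <= h + d_safe for x, l, h in zip(p, lo, hi)):
            return True
    return False


print(in_collision((6.0, 6.0, 12.0)))  # waypoint W1 from Table 7
```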
Table 8. System mission parameters for the eVTOL UAV trajectory planning task.

| Parameter | Symbol | Value |
| --- | --- | --- |
| Start Position | $P_0$ | (0, 0, 0) m |
| Target Position | $P_{goal}$ | (18, 18, 0) m |
| Success Threshold | $\varepsilon$ | 1.0 m |
| Maximum Episode Steps | $T_{max}$ | 200 |
| Initial Energy Budget | $E_0$ | 100.0∼200.0 J |
| Environment Scale Factor | $\lambda_L$ | 1/6 |
Table 9. Parameters of the coupled wind field and atmospheric boundary layer model.

| Symbol | Description | Value |
| --- | --- | --- |
| $\Delta_{grid}$ | Spatial resolution of the wind field grid | 1.0 m |
| $s$ | Amplitude scaling factor for base wind | 0.5 |
| $\sigma$ | Baseline stochastic noise magnitude | 0.2 m/s |
| $\delta$ | Wind profile power-law coefficient for high-density urban terrain | 0.35 |
| $z_{ref}$ | Reference altitude for noise coupling | 20.0 m |
| $\alpha$ | Wind sensitivity coefficient for energy modulation | 0.5 |
| Observation limit | Velocity truncation per component | $[-2, 2]$ m/s |
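The parameters in Table 9 suggest a power-law boundary layer profile with additive noise and clipped observations. The sketch below is one plausible reading, not the paper's formulation: the profile $v(z) = s \, V_{ref} (z / z_{ref})^{\delta}$, the use of $z_{ref}$ as the profile reference height (the table describes it as the reference altitude for noise coupling), constant-magnitude Gaussian noise, and the base speed V_REF = 3 m/s are all assumptions.

```python
import math
import random
from typing import Tuple

S_AMPLITUDE = 0.5   # s, amplitude scaling factor (Table 9)
SIGMA_NOISE = 0.2   # sigma, stochastic noise magnitude in m/s (Table 9)
DELTA = 0.35        # delta, power-law coefficient (Table 9)
Z_REF = 20.0        # z_ref in m (Table 9), used here as the profile reference height
OBS_LIMIT = 2.0     # per-component observation truncation in m/s (Table 9)
V_REF = 3.0         # HYPOTHETICAL base wind speed at z_ref (m/s); not given in the source


def mean_wind_speed(z: float) -> float:
    """Assumed power-law profile: s * V_REF * (z / z_ref) ** delta."""
    if z <= 0.0:
        return 0.0
    return S_AMPLITUDE * V_REF * (z / Z_REF) ** DELTA


def sample_wind(z: float, wind_dir_rad: float, rng: random.Random) -> Tuple[float, float]:
    """Horizontal wind vector (u, v) at altitude z with Gaussian noise, before clipping."""
    speed = mean_wind_speed(z)
    u = speed * math.cos(wind_dir_rad) + rng.gauss(0.0, SIGMA_NOISE)
    v = speed * math.sin(wind_dir_rad) + rng.gauss(0.0, SIGMA_NOISE)
    return u, v


def observe_wind(wind: Tuple[float, float]) -> Tuple[float, ...]:
    """Clip each component to [-OBS_LIMIT, OBS_LIMIT] for the agent's observation."""
    return tuple(max(-OBS_LIMIT, min(OBS_LIMIT, c)) for c in wind)


rng = random.Random(42)
print(observe_wind(sample_wind(10.0, 0.0, rng)))
```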
Table 10. Finalized algorithmic hyperparameters for EA-TD3 training.

| Hyperparameter | Value |
| --- | --- |
| Learning Rate $\alpha_{lr}$ | $1 \times 10^{-4}$ |
| Batch Size | 256 |
| Network Architecture | 512 × 256 × 128 |
| Target Policy Noise $\sigma_{tp}$ | 0.1 |
| Discount Factor $\gamma$ | 0.99 |
| Buffer Capacity | $1 \times 10^{6}$ |
| Exploration Noise $\sigma_{exp}$ | 0.1 |
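The target policy noise $\sigma_{tp}$ and discount factor $\gamma$ enter the standard TD3 critic target (Fujimoto et al. [34]) through target policy smoothing and the clipped double-Q minimum. The scalar sketch below illustrates that target with the Table 10 values; the noise clip c = 0.5 and the normalized action bounds [−1, 1] are assumptions (c = 0.5 is the default in the original TD3 paper and is not listed in the tables), and the actor and critics are passed in as plain callables.

```python
import random
from typing import Callable, List, Sequence

GAMMA = 0.99         # discount factor gamma (Table 10)
TARGET_NOISE = 0.1   # target policy noise sigma_tp (Table 10)
NOISE_CLIP = 0.5     # ASSUMED noise clip c (TD3 paper default; not listed here)
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # ASSUMED normalized action bounds

Actor = Callable[[Sequence[float]], List[float]]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def _clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def smoothed_target_action(
    actor_target: Actor, next_state: Sequence[float], rng: random.Random
) -> List[float]:
    """Target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    return [
        _clip(a + _clip(rng.gauss(0.0, TARGET_NOISE), -NOISE_CLIP, NOISE_CLIP),
              ACTION_LOW, ACTION_HIGH)
        for a in actor_target(next_state)
    ]


def td3_target(reward: float, next_state: Sequence[float], done: bool,
               actor_target: Actor, critic1_target: Critic, critic2_target: Critic,
               rng: random.Random) -> float:
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    a_next = smoothed_target_action(actor_target, next_state, rng)
    q_next = min(critic1_target(next_state, a_next), critic2_target(next_state, a_next))
    return reward + GAMMA * (1.0 - float(done)) * q_next
```

Taking the minimum of the two target critics is what counteracts the overestimation bias of single-critic DDPG, which is the motivation for building EA-TD3 on TD3.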
Table 11. Parameter configurations for the EA-TD3 model and simulation environment.

| Category | Parameter Name | Symbol | Value |
| --- | --- | --- | --- |
| TD3 Hyperparameters | Learning Rate | $\alpha_{lr}$ | $1 \times 10^{-4}$ |
| | Batch Size | $B$ | 256 |
| | Discount Factor | $\gamma$ | 0.99 |
| | Soft Update Rate | $\tau$ | 0.005 |
| | Replay Buffer Size | $N_{buff}$ | $1 \times 10^{6}$ |
| Energy Parameters | Climb Cost Intensity | $\kappa_{climb}$ | 1.018 |
| | Cruise Cost Intensity | $\kappa_{cruise}$ | 0.025 |
| | Descent Cost Intensity | $\kappa_{descent}$ | 2.032 |
| Reward Weights | Goal Reward Weight | $\omega_g$ | 100.0 |
| | Collision Penalty Weight | $\omega_c$ | 100.0 |
| | Distance Penalty Weight | $\omega_d$ | 1.0 |
| | Energy Penalty Weight | $\omega_e$ | 0.5 |
| Training Settings | Total Training Steps | – | 200,000 |
| | Evaluation Frequency | – | 5000 |
| | Evaluation Episodes | – | 10 |
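The reward weights in Table 11 can be combined into a composite per-step reward $r_t$. The exact reward equation is not reproduced in this section, so the linear form below (goal bonus, collision penalty, distance penalty on $d_{target}$, and energy penalty on the step energy) is an assumed illustration using the tabulated weights; the energy red-line logic described in the conclusions is omitted because its form is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    goal: float = 100.0       # omega_g (Table 11)
    collision: float = 100.0  # omega_c (Table 11)
    distance: float = 1.0     # omega_d (Table 11)
    energy: float = 0.5       # omega_e (Table 11)


def step_reward(reached_goal: bool, collided: bool, d_target: float,
                step_energy_j: float, w: RewardWeights = RewardWeights()) -> float:
    """ASSUMED linear composite reward:
    r_t = omega_g*[goal] - omega_c*[collision] - omega_d*d_target - omega_e*E_step.
    """
    reward = -w.distance * d_target - w.energy * step_energy_j
    if reached_goal:
        reward += w.goal
    if collided:
        reward -= w.collision
    return reward
```

In this form a single collision outweighs roughly 200 J of step energy at $\omega_e = 0.5$, so safety dominates the energy term unless the weights are retuned.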
Table 12. Hyperparameter configurations for the comparative study of trajectory planning algorithms.

| Hyperparameter | DDPG | TD3 | SAC | EA-TD3 |
| --- | --- | --- | --- | --- |
| Learning rate $\alpha_{lr}$ | $1 \times 10^{-4}$ | $1 \times 10^{-4}$ | $3 \times 10^{-4}$ | $1 \times 10^{-4}$ |
| Total training steps | $6 \times 10^{4}$ | $6 \times 10^{4}$ | $6 \times 10^{4}$ | $6 \times 10^{4}$ |
| Batch size $B$ | 256 | 256 | 256 | 256 |
| Update rate $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
| Discount factor $\gamma$ | 0.99 | 0.99 | 0.99 | 0.99 |
| Network architecture | 400 × 300 | 400 × 300 | 400 × 300 | 512 × 256 × 128 |
| Policy noise $\sigma_{exp}$ | 0.2 | 0.1 | – | 0.1 |
| Delayed update $d$ | – | 2 | – | 2 |
| Target policy noise $\sigma_{tp}$ | – | 0.2 | – | 0.1 |
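The update rate τ and the delayed update d in Table 12 correspond to two standard TD3 mechanisms: Polyak (soft) averaging of target network parameters and updating the actor and targets only once every d critic updates. The sketch below shows both on flat parameter lists; representing network weights as Python lists and counting critic steps from 1 are simplifications for illustration.

```python
from typing import List

TAU = 0.005        # soft update rate tau (Table 12)
POLICY_DELAY = 2   # delayed update d for TD3 and EA-TD3 (Table 12)


def soft_update(target: List[float], online: List[float], tau: float = TAU) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


def should_update_actor(critic_step: int, delay: int = POLICY_DELAY) -> bool:
    """Actor and target networks update once every `delay` critic updates (steps from 1)."""
    return critic_step % delay == 0
```

With τ = 0.005, a target parameter tracking a fixed online value closes the gap by a factor of 0.995 per update, so about 99.3% of the gap is closed after 1000 updates (1 − 0.995^1000).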
Table 13. Performance comparison of trajectory planning algorithms under Analysis 1.

| Metric | DDPG | TD3 | SAC | EA-TD3 | Best |
| --- | --- | --- | --- | --- | --- |
| Average Steps | 34.50 | 26.94 | 29.52 | 27.48 | TD3 |
| Energy (J) | 46.61 | 39.29 | 42.01 | 38.40 | EA-TD3 |
| Path Length (m) | 39.67 | 37.10 | 35.75 | 34.81 | EA-TD3 |
| Turn Count | 16.04 | 9.74 | 7.40 | 11.22 | SAC |
| Planning Time (s) | 0.03 | 0.02 | 0.03 | 0.02 | TD3/EA-TD3 |
| Climb Count | 9.92 | 10.02 | 9.94 | 8.82 | EA-TD3 |
| Descent Count | 8.34 | 8.66 | 9.08 | 7.72 | EA-TD3 |
| Cruise Count | 16.24 | 8.26 | 10.50 | 10.94 | TD3 |
| Avg Altitude (m) | 6.47 | 6.41 | 6.19 | 6.24 | DDPG |
| Max Altitude (m) | 9.49 | 10.23 | 9.97 | 9.25 | TD3 |
| Heading (°) | 1234.63 | 443.89 | 433.06 | 496.98 | SAC |
| Smoothness (°) | 1321.28 | 497.15 | 486.78 | 512.04 | SAC |
Table 14. Success rates of trajectory planning algorithms under different energy constraints in Analysis 2.

| Energy Constraint | DDPG | SAC | TD3 | EA-TD3 |
| --- | --- | --- | --- | --- |
| 200 J | 85.70% | 93.90% | 96.95% | 100.00% |
| 180 J | 71.40% | 83.70% | 90.93% | 98.15% |
| 160 J | 63.30% | 75.50% | 85.90% | 96.30% |
| 140 J | 61.20% | 65.30% | 78.95% | 92.60% |
| 120 J | 0.00% | 6.10% | 26.50% | 87.80% |
Table 15. Environmental configuration across ablation study levels.

| Level | Wind Field | Energy Model | Obstacles | Waypoints | Complexity |
| --- | --- | --- | --- | --- | --- |
| Level 1 | Constant | Uniform | Simple (3) | Single | Baseline |
| Level 2 | Constant | Realistic | Simple (3) | Single | Low |
| Level 3 | Dynamic | Realistic | Simple (3) | Single | Medium |
| Level 4 | Dynamic | Realistic | Complex (6) | Multiple | High |
Table 16. Comparison of eVTOL UAV flight performance metrics across different levels.

| Level | Algorithm | Success Rate | Avg. Steps | Avg. Energy (J) | Avg. Path (m) |
| --- | --- | --- | --- | --- | --- |
| Level 1 | DDPG | 100% | 19.3 | 36.41 | 47.12 |
| | SAC | 100% | 19.0 | 32.68 | 46.23 |
| | TD3 | 100% | 19.0 | 40.078 | 49.56 |
| | EA-TD3 | 100% | 20.0 | 36.47 | 45.58 |
| Level 2 | DDPG | 98% | 19.0 | 33.83 | 45.39 |
| | SAC | 96% | 19.0 | 32.34 | 45.49 |
| | TD3 | 96% | 20.0 | 33.86 | 45.14 |
| | EA-TD3 | 98% | 19.0 | 35.29 | 45.55 |
| Level 3 | DDPG | 82% | 19.12 | 41.04 | 48.07 |
| | SAC | 90% | 19.88 | 36.88 | 48.62 |
| | TD3 | 92% | 19.2 | 33.73 | 47.71 |
| | EA-TD3 | 94% | 19.0 | 32.88 | 45.79 |
| Level 4 | DDPG | 62% | 34.5 | 46.61 | 39.67 |
| | SAC | 72% | 29.52 | 42.01 | 35.75 |
| | TD3 | 90% | 26.94 | 39.29 | 37.10 |
| | EA-TD3 | 98% | 27.48 | 38.40 | 34.81 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, J.; Xie, J.; Zhang, L.; Wang, Z.; Li, X.; Zhao, Y. EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones 2026, 10, 325. https://doi.org/10.3390/drones10050325

AMA Style

Cai J, Xie J, Zhang L, Wang Z, Li X, Zhao Y. EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones. 2026; 10(5):325. https://doi.org/10.3390/drones10050325

Chicago/Turabian Style

Cai, Jinxu, Juanzhang Xie, Lanxin Zhang, Ziyi Wang, Xueshun Li, and Yongjun Zhao. 2026. "EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft" Drones 10, no. 5: 325. https://doi.org/10.3390/drones10050325

APA Style

Cai, J., Xie, J., Zhang, L., Wang, Z., Li, X., & Zhao, Y. (2026). EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones, 10(5), 325. https://doi.org/10.3390/drones10050325
