EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft

Cai, Jinxu; Xie, Juanzhang; Zhang, Lanxin; Wang, Ziyi; Li, Xueshun; Zhao, Yongjun

doi:10.3390/drones10050325



by Jinxu Cai, Juanzhang Xie, Lanxin Zhang, Ziyi Wang, Xueshun Li and Yongjun Zhao *

Department of Aeronautics and Astronautics, Fudan University, Shanghai 200433, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(5), 325; https://doi.org/10.3390/drones10050325

Submission received: 27 February 2026 / Revised: 18 April 2026 / Accepted: 22 April 2026 / Published: 26 April 2026

(This article belongs to the Special Issue Intelligent Control and Optimization of Electric Vertical Take-Off and Landing Unmanned Aerial Vehicles)


Highlights

What are the main findings?

We developed the EA-TD3 autonomous trajectory planning framework for eVTOL UAV platforms, which integrates a stochastic urban wind field model with an energy consumption model derived from battery data. By establishing flight energy boundaries, the framework mitigates safety hazards in autonomous trajectory planning that arise from environmental interference and energy constraints.
The EA-TD3 framework enables eVTOL UAV platforms to reduce energy consumption by 11.6% in wind conditions and maintain an 87.8% mission success rate even under energy constraints. Regarding energy efficiency and operational robustness, this method outperforms baseline algorithms, including DDPG, SAC, and standard TD3.

What are the implications of the main findings?

By embedding environmental and battery energy perception into the autonomous trajectory planning framework, this work shifts the planning process from geometric homing to a physics-aware trajectory optimization paradigm. This improves the energy efficiency and trajectory reliability of autonomous eVTOL UAV operations.
Through simulations utilizing the physical parameters of the EH216-S platform, this study demonstrates that energy consumption boundaries provide a safety mechanism to alleviate range anxiety. This offers a technical reference for ensuring the operational reliability of unmanned systems within UAM networks.

Abstract

Autonomous trajectory planning for electric Vertical Takeoff and Landing (eVTOL) Unmanned Aerial Vehicles (UAVs) faces the dual challenges of low-altitude environmental interference and limited onboard energy, which affect the reliability and safety of unmanned missions. To address these challenges, this paper develops the EA-TD3 autonomous trajectory planning framework for eVTOL UAV systems. First, a stochastic urban wind field model is established to simulate low-altitude interference. Then, by integrating eVTOL UAV battery discharge data from Carnegie Mellon University (CMU), a mapping between maneuvers and energy consumption is identified and used to construct a nonlinear energy consumption model. Finally, an energy boundary penalty function is introduced into the TD3 algorithm to keep trajectory planning within battery safety margins. Experiments based on the parameters of the EH216-S platform show that EA-TD3 achieves a success rate of nearly 100% under ideal conditions and outperforms the benchmark algorithms while reducing average energy consumption by 11.6%. Under an energy constraint of 120 J, its success rate remains at 87.80%, exceeding that of the DDPG, SAC, and standard TD3 algorithms. This study improves the energy perception and power management of autonomous trajectory planning for eVTOL UAV platforms in urban air mobility (UAM).

Keywords:

eVTOL UAV; trajectory planning; energy perception; TD3; UAM

1. Introduction

As an emerging strategic field, the low-altitude economy is driving the rapid development of urban air mobility (UAM), with electric Vertical Takeoff and Landing (eVTOL) aircraft serving as its core technological carrier. Owing to their flexibility, vertical takeoff and landing capability, and zero-emission operation, eVTOL systems are used in last-mile logistics [1,2], medical emergency services, and urban patrols. While the eVTOL field encompasses both manned and unmanned platforms, advances in autonomous trajectory planning have made electric Vertical Takeoff and Landing Unmanned Aerial Vehicle (eVTOL UAV) systems the mainstream trend for future urban transportation. In the following sections, autonomous trajectory planning is referred to simply as trajectory planning. Unlike traditional aviation, where pilots adjust flight strategies based on real-time sensory feedback, eVTOL UAV systems lack human-in-the-loop intervention in extreme situations. Trajectory planning systems must therefore adapt automatically to complex physical constraints, and for unmanned platforms, energy-aware mechanisms are a fundamental safety requirement for successful trajectory planning. The performance of trajectory planning has thus become a key technology for flight safety. However, the limited onboard energy storage of eVTOL UAVs remains a key constraint because the energy state directly affects planning performance. Consequently, reducing energy-related risks during trajectory planning is a prerequisite for deploying eVTOL UAV systems. This research therefore focuses on eVTOL UAVs and their energy-aware trajectory planning technologies.

In the absence of a human pilot, eVTOL UAV systems must independently plan flight trajectories prior to takeoff, accounting for the destination, the complex urban environment, and the real-time energy status. In high-density metropolitan environments, three primary factors contribute to energy-related risks during trajectory planning [3]: stochastic low-altitude wind fields, maneuver-dependent power consumption, and the inherent energy constraints of onboard batteries. Specifically, random wind fields in urban low-altitude airspace induce severe aerodynamic interference, which forces the eVTOL UAV to adjust its trajectory frequently and triggers unpredictable power surges. At the same time, the power drawn by these unmanned platforms varies significantly across flight phases such as vertical takeoff, landing, and cruise, which renders traditional linear energy estimation models inadequate. Furthermore, the intrinsic discharge characteristics of onboard batteries impose strict safety boundaries on trajectory planning endurance [4]. Failure to optimize these three dimensions jointly not only compromises planning performance but also poses substantial risks of energy depletion and catastrophic system failure in complex urban airspace. Recent research has addressed each of these three dimensions to enhance the reliability of trajectory planning systems.

Wind is a primary environmental factor that directly influences energy consumption during urban flight, and it makes the trajectory planning of eVTOL UAV systems significantly more sensitive to energy utilization as operations become unmanned. Existing research on the impact of urban wind fields on trajectory planning primarily focuses on exploiting wind characteristics for energy conservation or risk mitigation. In terms of impact analysis, Milcsik et al. [5] used Monte Carlo simulations to evaluate how urban wind fields influence trajectory planning. Baskar et al. [6] released an open-source computational fluid dynamics (CFD) wind field dataset for real urban environments and demonstrated that optimal trajectories for fixed-wing UAVs deviate significantly from traditional shortest-path strategies when wind effects are taken into account. For flight safety, Jiang et al. [7] developed a method to identify urban no-fly zones through CFD simulations, while Chan et al. [8] proposed a deep learning-based trajectory planning model to ensure safety. Regarding energy efficiency, Frey et al. [9] combined CFD simulations with nonlinear energy-aware strategies to optimize trajectory planning, and Rienecker et al. [10] employed Large Eddy Simulation (LES) to reduce consumption by strategically exploiting airflows. However, while these strategies enhance efficiency for general UAVs, they often overlook the specific aerodynamic sensitivities and unpredictable energy expenditure that stochastic dynamic wind fields impose on eVTOL UAV trajectory planning [11]. A more robust energy-aware trajectory planning framework is required to address these challenges.

In energy modeling of eVTOL UAV systems, existing research primarily employs physical modeling methods, which use prior knowledge to analyze energy consumption mathematically and establish dynamic equations that predict energy expenditure under different maneuvers. For example, to assess the feasibility of trajectory planning, Marzougui et al. [12] implemented a rule-based power allocation strategy for a hydrogen fuel cell/battery hybrid eVTOL UAV in search and rescue missions. Similarly, to integrate energy constraints into trajectory planning, Senkans et al. [13] proposed a first-principles model that calculates the power required for an eVTOL UAV to operate under specified conditions. To assess the correlation between random maneuvers and energy consumption, Jiao et al. [14] introduced a dynamic model-based evaluation method that couples aircraft mass, instantaneous velocity, and kinetic energy loss. While physical energy models are interpretable, their limitations in trajectory planning are significant. They often rely on oversimplified assumptions and neglect the nonlinear effects of flight speed and vertical motion, which reduces accuracy. Furthermore, static equations fail to capture complex power system interactions and dynamic equipment degradation [15]. In urban low-altitude environments, accelerated wear of batteries and motors widens the gap between theoretical models and actual performance [16]. For eVTOL UAV systems, this accumulated error creates energy depletion risks [17], which necessitates dynamic correction using flight data for robust trajectory planning.

In the domain of trajectory planning for eVTOL UAV systems, classical algorithms such as A* and the rapidly exploring random tree (RRT) have laid the foundation for collision-free trajectory generation [18,19]. However, these methods often treat the aircraft as an idealized point mass and focus primarily on geographic distance or computation time while neglecting the physical and energy constraints on the trajectory. As discussed above, the trajectory planning performance of eVTOL UAV systems in urban environments is challenged by the coupling of aerodynamic interference and onboard energy availability. Traditional planners fail to incorporate stochastic wind field effects and nonlinear energy consumption into the planning loop. Without integrating the physical state and energy status of the eVTOL UAV into the decision-making process, a geometrically optimal path may still lead to risks such as energy depletion before the destination is reached. Bridging geometric path planning and energy-aware trajectory planning is therefore essential for flight safety.

Because eVTOL UAV trajectory planning must interact with urban wind fields and maintain real-time energy awareness, deep reinforcement learning (DRL) offers a robust solution. DRL enables agents to autonomously establish mappings between environmental features and action strategies and provides a new paradigm for complex trajectory planning tasks [20]. Regarding risk identification, Primatesta et al. [21] used reinforcement learning to identify risk-aware optimal paths and demonstrated the potential of this learning mechanism for handling urban complexity. Gao et al. [22] introduced the concept of virtual risk terrain to plan eVTOL UAV trajectories safely through high-risk environments. To address energy constraints, Fu et al. [23] proposed the BiLG D3QN algorithm to optimize trajectory planning under payload-related energy constraints. However, traditional DRL algorithms are prone to overestimating Q values, which leads to performance instability in complex continuous action spaces. In contrast, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm uses twin critic networks and takes the smaller of their two estimates when forming the learning target, which limits overestimation bias; together with delayed policy updates and target policy smoothing, this gives TD3 superior convergence stability. This in turn yields smoother and safer maneuver amplitudes for eVTOL UAV systems. Building on this, Chen et al. [24] proposed the TD3 RRT algorithm to improve planning success and path smoothness. For 3D energy-saving challenges, Lv et al. [25] developed a dynamic energy-efficient trajectory planning method using TD3, while Xie et al. [26] integrated a fluid dynamics-based reward mechanism into a hybrid TD3 algorithm to penalize inefficient maneuvers. Despite the potential of DRL, existing energy-aware strategies for eVTOL UAV systems still largely rely on simplified constant wind fields and theoretical physical models, which poses operational risks for energy-critical trajectory planning [27].
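To make the bias-control idea concrete, the sketch below computes the TD3 critic target for a single transition with a scalar action. It shows the two target-side ingredients of standard TD3: clipped double Q-learning (the smaller of two target-critic values is used) and target policy smoothing (clipped Gaussian noise is added to the target action). This is a generic TD3 illustration, not the EA-TD3 code; the function name, the toy lambda "networks", and the defaults (γ = 0.99, noise σ = 0.2 clipped to ±0.5), chosen to resemble commonly used TD3 settings, are assumptions.

```python
import random


def td3_critic_target(reward, next_state, done, actor_target, critic1_target,
                      critic2_target, gamma=0.99, policy_noise=0.2,
                      noise_clip=0.5, max_action=1.0, rng=None):
    """TD3 learning target y for one transition with a scalar action.

    1. Target policy smoothing: clipped Gaussian noise on the target action.
    2. Clipped double Q-learning: the smaller of the two target critics is used.
    Delayed policy updates (actor and targets refreshed only every few critic
    steps) belong to the training loop and are not shown here.
    """
    if rng is None:
        rng = random.Random()
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = max(-max_action, min(max_action, actor_target(next_state) + noise))
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))
    # Terminal transitions (done = 1.0) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_min
```

For example, with reward 1.0, γ = 0.9, and target critics that return 5.0 and 3.0, the target is 1.0 + 0.9 × 3.0 = 3.7 for any noise draw, because the pessimistic (smaller) estimate is always taken.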

To enhance the reliability and safety of trajectory planning, this paper proposes an Energy-Aware Twin Delayed Deep Deterministic Policy Gradient (EA-TD3) framework for eVTOL UAV systems. First, a dynamic turbulence wind field model, which represents wind disturbances as a combination of multi-frequency sine waves and stochastic noise, is integrated into the framework. Second, to address the discrepancy between theoretical physical models and real-world energy consumption, a data-driven energy expenditure model is constructed from the Carnegie Mellon University (CMU) eVTOL UAV battery dataset. This model captures the specific power demands of various flight maneuvers. Finally, to mitigate the safety risks associated with limited onboard energy, an energy-aware penalty function is incorporated into the TD3 algorithm so that trajectory planning remains within permissible energy boundaries. After establishing the EA-TD3 framework, 3D simulations were conducted to validate its effectiveness and robustness. The experimental results demonstrate that the proposed framework addresses the limitations of traditional eVTOL UAV trajectory planning.
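A minimal sketch of a wind model of this kind is shown below, assuming that each wind component is an independent sum of sinusoids plus zero-mean Gaussian noise. The function names and every amplitude, frequency, phase, and noise level are hypothetical placeholders; the actual EA-TD3 wind parameters are not specified in this section.

```python
import math
import random


def wind_component(t_s, components, noise_std_mps, rng):
    """One wind-velocity component (m/s) at time t_s.

    components: (amplitude_mps, frequency_hz, phase_rad) tuples whose sine waves
    are summed; zero-mean Gaussian noise with std noise_std_mps is then added.
    """
    periodic = sum(a * math.sin(2.0 * math.pi * f * t_s + phi)
                   for a, f, phi in components)
    return periodic + rng.gauss(0.0, noise_std_mps)


def wind_vector(t_s, per_axis_components, noise_std_mps, rng):
    """(u, v, w) wind vector; axes are treated independently (an assumption)."""
    return tuple(wind_component(t_s, comps, noise_std_mps, rng)
                 for comps in per_axis_components)


# Hypothetical parameters, for illustration only.
AXES = (
    [(2.0, 0.05, 0.0), (0.8, 0.30, 1.0)],  # x: slow gust plus faster oscillation
    [(1.0, 0.08, 0.5), (0.4, 0.50, 2.0)],  # y
    [(0.3, 0.20, 0.0)],                    # z: weak vertical component
)
rng = random.Random(42)  # fixed seed so an episode's wind is reproducible
gusts = [wind_vector(0.1 * k, AXES, noise_std_mps=0.3, rng=rng) for k in range(50)]
```

Seeding the generator per episode makes a given wind realization reproducible for evaluation, while reseeding across training episodes exposes the agent to many realizations.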

The main contributions of this paper are summarized as follows:

Establishment of an adaptive trajectory planning mechanism: We propose a framework for eVTOL UAV systems operating under stochastic wind fields. By integrating a stochastic urban low-altitude wind model, the framework enables the eVTOL UAV to learn from and interact with realistic wind environments.
Construction of a data-driven energy-aware model: Leveraging the Carnegie Mellon University (CMU) battery dataset, a data-driven energy model was developed. Compared with traditional theoretical models, this approach improves the physical fidelity of the trajectory planning environment.
Refinement of a survival-aware and energy-aware strategy: By incorporating a survival-aware reward function into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, we designed a trajectory planning strategy with energy boundary constraints to ensure mission completion and system safety.
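The exact form of the EA-TD3 survival- and energy-aware reward is not given in this section. The sketch below is therefore only a hypothetical illustration of how an energy boundary penalty can be combined with a shaped reward: progress is rewarded, energy use is charged, a soft penalty grows as the remaining energy falls below a reserve margin, and collision or depletion ends the episode with a large penalty. The function name, every weight, and the reserve value are assumptions.

```python
def energy_aware_reward(progress_m, step_energy_j, remaining_energy_j,
                        reached_goal, collided,
                        reserve_j=10.0, w_progress=1.0, w_energy=0.05,
                        barrier_weight=5.0, goal_bonus=100.0,
                        terminal_penalty=-100.0):
    """Hypothetical survival- and energy-aware reward; returns (reward, done).

    Not the EA-TD3 formula: all weights and the reserve margin are illustrative.
    """
    if collided or remaining_energy_j <= 0.0:
        # Terminal failure: a collision or battery depletion ends the episode.
        return terminal_penalty, True
    # Dense shaping: reward progress toward the goal, charge for energy spent.
    reward = w_progress * progress_m - w_energy * step_energy_j
    if remaining_energy_j < reserve_j:
        # Soft energy boundary: penalty grows linearly inside the reserve margin.
        reward -= barrier_weight * (reserve_j - remaining_energy_j) / reserve_j
    if reached_goal:
        return reward + goal_bonus, True
    return reward, False
```

A soft barrier of this kind gives the agent a gradient-like warning before the hard depletion penalty is reached, rather than a single sparse failure signal.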

The overall architecture of the proposed EA-TD3 trajectory planning framework for eVTOL UAV systems is illustrated in Figure 1, which details the system inputs, outputs, and methodology. The core objective of this framework is to address the safety and reliability challenges of trajectory planning that arise from stochastic wind field interference and nonlinear energy consumption in urban low-altitude environments. By integrating a data-driven energy model with a robust reinforcement learning agent, the framework enables the eVTOL UAV to make planning decisions independently while maintaining energy awareness.

2. Method

This section provides a detailed introduction to the proposed energy-aware trajectory planning framework for eVTOL UAV systems, which is driven by real measured data. The framework focuses on enhancing the decision-making intelligence of eVTOL UAV systems by integrating measured data into the planning loop. To facilitate understanding of the following derivation, the mathematical symbols used in this study and their physical meanings are listed in Table 1.

2.1. eVTOL Related Instructions

2.1.1. Determine the Research Subjects

To enhance the physical fidelity of the simulation, the EHang EH216-S eVTOL UAV platform (EHang Holdings Limited, Guangzhou, China) is selected as the primary research object of this study. The EH216-S is a multi-rotor eVTOL UAV and represents a milestone as a platform that has obtained both a type certificate (TC) and an airworthiness certificate (AC). Its design philosophy emphasizes a pilotless, fully redundant architecture in which components such as propellers, motors, trajectory planning systems, and batteries have backups to maintain safe operation in the event of a component failure. The platform is illustrated in Figure 2. This structural complexity and reliance on autonomy make it a suitable candidate for evaluating energy-aware trajectory planning in urban environments.

In this framework, while the energy expenditure is modeled using a data-driven approach based on the Carnegie Mellon University (CMU) dataset, physical and kinematic constraints are derived from the parameters of the EH216-S eVTOL UAV to define the operational boundaries of the trajectory planning system. These parameters ensure that the action space explored by the EA-TD3 algorithm remains within the eVTOL UAV flight envelope. By integrating these constraints, the reinforcement learning agent learns to execute maneuvers that are both energetically feasible and structurally safe for trajectory planning. Basic parameters of the platform are shown in Table 2.

2.1.2. Trajectory Profile Definition

In a typical urban air mobility (UAM) scenario, a complete eVTOL UAV flight generally comprises five distinct phases: takeoff, climb, cruise, descent, and landing. Existing research indicates that the density of urban structures decreases at altitudes above 120 m. In the absence of dynamic obstacles, eVTOL UAV systems can often fly in straight lines at such altitudes, which makes trajectory planning less critical in high-altitude airspace.

This research therefore focuses on the low-altitude environment below 120 m. Accordingly, the trajectory profile in this study is simplified to the inclined climb, low-altitude cruise, and constrained descent phases, omitting the vertical takeoff and landing segments at extremely low altitudes. Throughout these stages, the eVTOL UAV must plan its trajectory through dense building clusters to avoid obstacles while managing power surges induced by stochastic wind fields. This simplified profile, illustrated in Figure 3, allows a focused investigation of the trajectory planning performance and energy resilience of eVTOL UAV systems in urban canyons.

2.1.3. Flight Constraints

In the study of eVTOL UAV trajectory planning for urban environments, identifying and modeling constraints is essential to ensure the feasibility and stability of the trajectory planning system. These constraints reflect the physical, safety, and kinematic limitations that eVTOL UAV systems must respect during operation, so that the generated trajectory is both robust and feasible. To establish a foundation for the EA-TD3 algorithm, this paper defines the following constraints for eVTOL UAV systems operating in urban environments.

The kinematic and operational constraints of the eVTOL UAV are defined as follows:

Flight Altitude Constraints: The flight altitude of the eVTOL UAV must be regulated to avoid collisions with urban infrastructure and comply with low-altitude traffic management rules. The altitude constraint is defined as

$H_{\min} \leq h_{i} \leq H_{\max}$

(1)

where $h_{i}$ represents the instantaneous altitude at step $i$, and $H_{\min}$ and $H_{\max}$ denote the minimum and maximum permissible flight altitudes, respectively.
Climb Angle Constraints: To maintain aerodynamic stability and respect propulsion system performance limits, the climb angle $θ$ must remain within a predefined operational range

$θ_{i} = \arctan (\frac{| z_{i} - z_{i - 1} |}{\sqrt{{(x_{i} - x_{i - 1})}^{2} + {(y_{i} - y_{i - 1})}^{2}}}) \leq θ_{\max}$

(2)
Turning Angle Constraints: To ensure trajectory smoothness in the horizontal plane, the turning angle $ψ$ is constrained. Unlike the vertical climb angle defined in Equation (2), the turning angle $ψ$ represents the change in orientation between two consecutive flight segments in the $x y$ plane

$ψ_{i} = \arccos (\frac{(x_{i} - x_{i - 1}) (x_{i + 1} - x_{i}) + (y_{i} - y_{i - 1}) (y_{i + 1} - y_{i})}{\sqrt{{(x_{i} - x_{i - 1})}^{2} + {(y_{i} - y_{i - 1})}^{2}} \cdot \sqrt{{(x_{i + 1} - x_{i})}^{2} + {(y_{i + 1} - y_{i})}^{2}}}) \leq ψ_{\max}$

(3)

where $(x, y, z)$ denotes the spatial position of the eVTOL UAV. Equation (2) constrains the vertical slope of each segment relative to the ground, while Equation (3) limits lateral maneuverability through the normalized inner product of consecutive horizontal displacement vectors.
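Equations (1) to (3) can be checked directly on a list of waypoints, as in the sketch below. The function names are illustrative, and the limits used in the usage example are hypothetical placeholders rather than the EH216-S values of Table 2. Two small design choices are assumptions: `math.atan2` is used so that a purely vertical step returns π/2 instead of dividing by zero, and the cosine is clamped to [-1, 1] before `math.acos` to guard against rounding error.

```python
import math


def climb_angle(p_prev, p_curr):
    """Equation (2): climb angle (rad) of the segment p_prev -> p_curr.

    atan2(|dz|, horizontal) equals arctan(|dz| / horizontal) whenever the
    horizontal distance is positive, and gives pi/2 for a purely vertical step.
    """
    dx, dy, dz = (c - p for c, p in zip(p_curr, p_prev))
    return math.atan2(abs(dz), math.hypot(dx, dy))


def turning_angle(p_prev, p_curr, p_next):
    """Equation (3): heading change (rad) between consecutive horizontal segments."""
    ux, uy = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    vx, vy = p_next[0] - p_curr[0], p_next[1] - p_curr[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # heading undefined for a purely vertical segment (assumed 0)
    cos_psi = (ux * vx + uy * vy) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_psi)))  # clamp guards rounding


def is_feasible(path, h_min, h_max, theta_max, psi_max):
    """Check Equations (1)-(3) for (x, y, z) waypoints, with z as altitude."""
    if not all(h_min <= z <= h_max for _, _, z in path):
        return False
    if any(climb_angle(a, b) > theta_max for a, b in zip(path, path[1:])):
        return False
    return all(turning_angle(a, b, c) <= psi_max
               for a, b, c in zip(path, path[1:], path[2:]))


# Hypothetical limits: altitude 10-120 m, climb angle 30 deg, turning angle 45 deg.
path = [(0.0, 0.0, 30.0), (10.0, 0.0, 35.0), (20.0, 5.0, 35.0)]
print(is_feasible(path, 10.0, 120.0, math.radians(30), math.radians(45)))
```

Both angles in this example are about 26.6°, so the path satisfies the hypothetical limits; tightening the turning limit to 20° would make it infeasible.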

2.2. Energy Consumption Modeling

This paper uses technical parameters of the EH216-S to construct a flight environment with realistic physical constraints. These performance parameters are primarily used to define the aircraft kinematic envelope and spatial maneuverability constraints. For energy consumption modeling, however, this paper does not rely on proprietary battery parameters: owing to commercial confidentiality, the specific battery performance data and electrochemical parameters of the EH216-S remain undisclosed. Instead, this study uses a publicly available power battery dataset released by Carnegie Mellon University (CMU). This choice addresses the lack of open-source industrial battery data while improving the versatility of the trajectory planning algorithm and the reproducibility of the research.

First, the CMU dataset is widely regarded as a reference source for eVTOL UAV energy intensity research. Its experimental design emulates the energy and power characteristics of typical eVTOL UAV flight through combinations of discharge rates and temperature conditions. These operating conditions cover the entire process, from high instantaneous loads during vertical takeoff to steady-state loads during level flight, and thus characterize the nonlinear loss patterns of flight. Although the CMU test cells differ greatly in mass from the battery system of a full-scale eVTOL UAV, this paper does not attempt to fit absolute power values. Instead, it uses the CMU dataset to construct a general energy intensity proxy model whose core value lies in characterizing the relative energy intensity trend of a battery under typical eVTOL UAV discharge profiles. Such a proxy can serve various multi-rotor flight platforms and is intended to make the trajectory planning algorithm transferable across platforms. Second, this paper sets physical constraints based on the kinematic parameters of the EH216-S, using the basic information available for this model to define the motion envelope of the aircraft. These constraints define the feasible region of the reinforcement learning agent in the search space, so that the generated trajectories meet the airworthiness and safety standards of the EH216-S in engineering practice.

Under this logic, the energy intensity perceived by the algorithm reflects a characteristic trend, namely the fluctuation pattern of battery efficiency loss under specific maneuvers. This paper aims to verify the ability of the proposed trajectory planning algorithm to learn such energy intensity patterns adaptively in the simulated environment. Combining a general energy consumption mechanism with aircraft-specific motion constraints in this way helps avoid the loss of algorithm generalization caused by parameter overfitting.

2.2.1. Experimental Data Background

The dataset used in this paper was collected by Carnegie Mellon University (CMU) and published by Alexander Bills et al. [28]. It contains over 15 million records covering charge–discharge cycles under diverse operating conditions. For eVTOL UAV applications, the experiments used Sony Murata 18650 VTC6 cylindrical cells (Murata Manufacturing Co., Ltd., Kyoto, Japan) with a capacity of 3000 mAh, a nominal voltage of 3.6 V, and a specific energy of 230 Wh/kg. During testing, the cells were held at 25 °C in a temperature-controlled chamber and cycled using a modular cycling device. The experimental conditions and the corresponding CMU subdatasets are detailed in Table 3.

The core parameters recorded in the dataset include the battery voltage ($U$), current ($I$), surface temperature ($T$), cycle count, the cycle segment recorded by the tester ($N_{s}$), time ($t$), the charge and discharge energy ($E_{\text{Charge}}$ and $E_{\text{Discharge}}$), and the amount of charge extracted from the cell ($Q_{\text{Charge}}$ and $Q_{\text{Discharge}}$). The task duration parameters range from 400 to 1000 s. The dataset comprises 22 batteries, which emulate eVTOL UAV trajectory profiles over varying numbers of cycles. As shown in Table 3, the collected dataset is segmented into subdatasets according to test conditions. The baseline-condition subdataset follows a profile consisting of a 75 s takeoff at 54 W, an 800 s cruise at 16 W, and a 105 s landing at 54 W [28]. Across its subdatasets, the dataset captures variations in critical parameters encountered during eVTOL UAV flight, including temperature, cruise duration, aircraft power, and charging current.
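As a quick check on the scale of the baseline profile, the per-phase energies follow directly from power × duration for constant-power segments, and the total can be compared with the nominal cell energy (3.0 Ah × 3.6 V = 10.8 Wh). The variable names below are illustrative; the arithmetic uses only the figures stated above.

```python
# Baseline CMU profile per cell, as stated above: (phase, duration in s, power in W).
BASELINE_PROFILE = [("takeoff", 75, 54.0), ("cruise", 800, 16.0), ("landing", 105, 54.0)]

CELL_CAPACITY_AH = 3.0   # 3000 mAh
NOMINAL_VOLTAGE_V = 3.6


def phase_energies(profile):
    """Energy per phase in joules for constant-power segments: E = P * t."""
    return {name: power * duration for name, duration, power in profile}


energies = phase_energies(BASELINE_PROFILE)
mission_j = sum(energies.values())                         # 22520 J
nominal_j = CELL_CAPACITY_AH * NOMINAL_VOLTAGE_V * 3600.0  # about 38880 J (10.8 Wh)
takeoff_current_a = 54.0 / NOMINAL_VOLTAGE_V               # about 15 A at nominal voltage, i.e. about 5C

print(f"mission energy: {mission_j:.0f} J ({mission_j / 3600:.2f} Wh), "
      f"{100 * mission_j / nominal_j:.1f}% of nominal cell energy")
```

The baseline mission thus draws roughly 6.3 Wh, a little under 58% of the nominal cell energy, and the takeoff and landing segments alone account for 9720 J of the 22520 J despite lasting only 180 s.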

2.2.2. Power Demand Analysis

This study aims to establish a trajectory planning framework for eVTOL UAV systems applicable to general scenarios in which battery energy consumption is determined primarily by flight maneuvers. Because the study focuses on the planning decision-making mechanism within a single flight, the impact of battery aging on energy consumption is treated as quasi-static, that is, constant throughout a single flight. This study therefore neglects long-term factors such as battery degradation and extreme environmental conditions. Since the energy consumption model is based on the Carnegie Mellon University (CMU) dataset, the flight phase division follows the parameters of that dataset. This paper decomposes the flight process into three maneuver phases: climb, cruise, and descent. The climb phase covers the process from takeoff to the predetermined altitude, while the descent phase covers the descent from cruise altitude to the ground. However, this study acknowledges the limitation of omitting the pure vertical takeoff and landing segments. In actual operations, these segments are subject to complex near-ground wind disturbances, and neglecting them may lead to an underestimation of the total mission energy and of the influence of initial turbulent fluctuations on the trajectory planning process.

Based on the above assumptions, we selected three benchmark battery cycle samples from the CMU dataset, VAH01, VAH17, and VAH27, to represent baseline energy consumption data. Figure 4 shows the voltage, current, and power of VAH01 as functions of flight time during the climb, cruise, and descent phases of the tenth flight cycle. Analysis of the voltage $U$, current $I$, and temperature $T$ curves within a single cycle reveals significant nonlinear characteristics. For eVTOL UAV operation, both the climb and descent phases require high and stable power output. During the climb phase, the propulsion system must generate sufficient lift to overcome gravity, which results in high power output. During the descent phase, the rotors must maintain a power output comparable to that during takeoff to generate balancing thrust against airflow disturbances.

This behavior is reflected in the measurements: as the voltage $U$ decreases with increasing depth of discharge, the onboard management system compensates by increasing the current $I$. Despite these fluctuations in the instantaneous electrochemical parameters, the measured power remained stable within each maneuver phase. In contrast, during the cruise phase, power demand drops to a lower steady-state range. These power distribution characteristics form the physical basis for the energy intensity assessment indices and the trajectory planning algorithm presented in this paper.

Based on these physical observations, this study employs a stage-integrated averaging method to reduce the dimensionality of the battery data. We calculate the average power for each maneuver stage $s \in \{\text{climb}, \text{cruise}, \text{descent}\}$ as follows:

$\bar{P}_{s} = \frac{1}{\Delta T_{s}} \int_{0}^{\Delta T_{s}} U(t)\, I(t)\, dt$

(4)

where $\Delta T_{s}$ represents the duration of the respective phase. These integrated values are mapped to maneuver intensity factors $\xi_{s}$ relative to the cruise phase:

$\xi_{s} = \frac{\bar{P}_{s}}{\bar{P}_{\text{cruise}}}$

(5)

The temporal evolution of the underlying battery parameters and the resulting power distribution across these stages are illustrated in Figure 4. This approach preserves the power distribution observed in the datasets while filtering out transient noise that could hinder the convergence of the EA-TD3 reinforcement learning algorithm.
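Equations (4) and (5) can be evaluated on sampled data with the trapezoidal rule, as sketched below. With real data, `U` and `I` would be the measured VAH01, VAH17, or VAH27 columns; here a synthetic constant-power segment stands in for them, built from the baseline set points with climb and descent mapped to the 54 W takeoff and landing power and cruise at 16 W. The helper names and the voltage-sag rate are hypothetical.

```python
def stage_average_power(times_s, voltages_v, currents_a):
    """Equation (4) by the trapezoidal rule: (1 / dT) * integral of U(t) I(t) dt."""
    if len(times_s) < 2:
        raise ValueError("need at least two samples per stage")
    power = [u * i for u, i in zip(voltages_v, currents_a)]
    energy_j = sum(0.5 * (power[k] + power[k + 1]) * (times_s[k + 1] - times_s[k])
                   for k in range(len(times_s) - 1))
    return energy_j / (times_s[-1] - times_s[0])


def intensity_factors(avg_power_w):
    """Equation (5): xi_s = P_s / P_cruise."""
    return {s: p / avg_power_w["cruise"] for s, p in avg_power_w.items()}


def synthetic_phase(duration_s, power_w, v_start=4.1, sag_v_per_s=0.0005):
    """Hypothetical constant-power segment: voltage sags, current rises to match."""
    times = [float(k) for k in range(duration_s + 1)]
    volts = [v_start - sag_v_per_s * t for t in times]
    amps = [power_w / v for v in volts]
    return times, volts, amps


PROFILE = {"climb": (75, 54.0), "cruise": (800, 16.0), "descent": (105, 54.0)}
avg = {s: stage_average_power(*synthetic_phase(d, p)) for s, (d, p) in PROFILE.items()}
xi = intensity_factors(avg)  # climb and descent about 3.375, cruise exactly 1.0
```

Because the synthetic voltage falls while the current rises, the product stays at the set point, mirroring the stable per-phase power described above; for these set points the intensity factors are 54/16 = 3.375 for climb and descent.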

2.2.3. Modeling of Energy Intensity Factors

To facilitate energy assessment in trajectory planning, this study establishes a proportional relationship between energy consumption and flight path length [30], in which the proportionality coefficient depends on the maneuver being performed. Consequently, energy expenditure is decoupled from flight duration and linked to spatial displacement, reflecting how trajectory selection affects eVTOL UAV energy reserves [31]. This spatial mapping enables the agent to evaluate the energy cost of candidate paths during trajectory planning.

The evolution of the battery state of charge ($SoC \in [0, 1]$) is governed by the following state transition equation:

$SoC_{t+1} = SoC_{t} - \frac{\kappa(a_{t}) \cdot \Delta d}{E_{\text{total}}}$

(6)

where $\Delta d$ denotes the spatial displacement magnitude within a single decision step, $E_{\text{total}}$ represents the total rated energy capacity of the battery system, and $\kappa(a_{t})$ denotes the energy intensity factor, measured in J/m, associated with action $a_{t}$.
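Equation (6) can be rolled out step by step as sketched below. Clamping SoC at zero and the reserve threshold `soc_min` used to flag an energy-boundary breach are added assumptions, and all numeric values in the example (κ values, step length, $E_{\text{total}}$) are hypothetical.

```python
def soc_step(soc, kappa_j_per_m, displacement_m, e_total_j):
    """One step of Equation (6): SoC_{t+1} = SoC_t - kappa(a_t) * d / E_total.

    Clamping at zero is an added assumption so that SoC stays in [0, 1].
    """
    if e_total_j <= 0.0:
        raise ValueError("E_total must be positive")
    return max(0.0, soc - kappa_j_per_m * displacement_m / e_total_j)


def rollout_soc(steps, e_total_j, soc0=1.0, soc_min=0.2):
    """Apply Equation (6) over (kappa, displacement) pairs.

    Returns the SoC trace and the index of the first step that pushes SoC below
    the hypothetical reserve soc_min, or None if the reserve is never breached.
    """
    trace, soc, breach = [soc0], soc0, None
    for k, (kappa, d) in enumerate(steps):
        soc = soc_step(soc, kappa, d, e_total_j)
        trace.append(soc)
        if breach is None and soc < soc_min:
            breach = k
    return trace, breach


# Hypothetical episode: three 5 m climb steps at 4 J/m, then ten 5 m cruise steps at 1 J/m.
steps = [(4.0, 5.0)] * 3 + [(1.0, 5.0)] * 10
trace, breach = rollout_soc(steps, e_total_j=200.0)
```

In this example each climb step costs 0.1 SoC and each cruise step 0.025, so the climbs alone consume 0.3 while ten cruise steps consume 0.25, ending at SoC 0.45; this is the kind of asymmetry the agent is meant to learn from.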

By mapping the power characteristics derived from the Carnegie Mellon University (CMU) dataset to the aircraft flight dynamics, the energy intensity factor $κ (a_{t})$ is formulated as

κ (a_{t}) = \frac{{\bar{P}}_{s}}{v (a_{t})} = \begin{cases} κ_{climb}, & a_{t} \in \text{Climb} \\ κ_{cruise}, & a_{t} \in \text{Cruise} \\ κ_{descent}, & a_{t} \in \text{Descent} \end{cases}

(7)

In this formulation, ${\bar{P}}_{s}$ is the average power for maneuver stage $s$ derived from Formula (4), and $v (a_{t})$ is the corresponding average velocity. Because the climb phase demands high power output at a low ascent speed, the resulting $κ_{climb}$ is higher than $κ_{cruise}$. By integrating this displacement-based cost into the state observation and reward function of the EA-TD3 algorithm, the agent learns to avoid unnecessary high-intensity maneuvers so that the mission is completed within the operational limits of the battery.
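Formula (7) reduces to a per-phase division followed by a lookup on the maneuver class. The powers and speeds in the example are hypothetical round numbers chosen to show why a high-power, low-speed climb yields a larger $κ$ than cruise; they are not the calibrated values reported in Section 3.5.

```python
def energy_intensity_table(mean_power, mean_speed):
    """Formula (7): kappa_s = P_bar_s / v_s for each maneuver stage s (J/m)."""
    return {stage: mean_power[stage] / mean_speed[stage] for stage in mean_power}


def kappa_for_action(action_class, table):
    """Piecewise part of Formula (7): the action's maneuver class selects kappa."""
    return table[action_class]


# Hypothetical numbers: climb draws 3x the cruise power at 1/10 of the speed.
table = energy_intensity_table(
    {"climb": 300.0, "cruise": 100.0},  # W
    {"climb": 2.0, "cruise": 20.0},     # m/s
)
```

With these numbers the climb costs 150 J/m against 5 J/m in cruise: a 3× power ratio becomes a 30× energy-per-meter ratio once the 10× speed ratio is included.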

2.3. Environmental Formulation

This section details the construction of the three-dimensional simulation environment designed for the eVTOL UAV path planning task. By combining high-fidelity urban digital twins with stochastic wind field models, the environment provides a rigorous platform for evaluating the trajectory planning performance of the unmanned platform in complex low-altitude airspace.

2.3.1. Continuous 3D Workspace and Obstacle Modeling

In this study, the simulation workspace is defined as a continuous three-dimensional Euclidean space $W \subset \mathbb{R}^{3}$ with dimensions $L \times W \times H$. To ensure physical fidelity for the trajectory planning of the EH216-S, we employ an analytical cuboid model to represent the urban landscape. Each building $O_{i} \in O$ is characterized by its center coordinates $(x_{i}, y_{i})$, half-length $L_{i}$, half-width $W_{i}$, and height $h_{i}$.

For the eVTOL UAV located at any continuous position $P_{t} = (x_{t}, y_{t}, z_{t})$, the collision detection function $C (P_{t})$ is formulated as

C (P_{t}) = \begin{cases} 1, & \text{if } \exists O_{i} : | x_{t} - x_{i} | < L_{i} + d_{safe} \land | y_{t} - y_{i} | < W_{i} + d_{safe} \land z_{t} \leq h_{i} + d_{safe} \\ 0, & \text{otherwise} \end{cases}

(8)

where $d_{safe}$ is the safety buffer derived from the physical dimensions of the aircraft. To help the agent perceive the spatial distribution of these continuous obstacles, the workspace is also digitized into a 3D occupancy grid, as shown in Figure 5a.

In environment modeling, the search directions follow the geometric logic of neighborhood expansion. Traditional 2D planar models consider a 3 × 3 area centered on the current node. As shown in Figure 5b, excluding the center node of this nine-cell region leaves 8 basic expansion directions: 4 cardinal and 4 diagonal. When the scenario extends to 3D, the search neighborhood becomes a 3 × 3 × 3 cubic volume, and each node connects to the 8 adjacent points in its own plane and to 9 adjacent points in each of the layers above and below. The three layers contain 27 node positions; subtracting the central node leaves the 26 search directions illustrated in Figure 5c. This 26-direction topology covers every neighbor that shares a face, an edge, or a vertex with the center node (6, 12, and 8 neighbors, respectively), which allows connectivity to be evaluated throughout the 3D environment. Through this approach, the EA-TD3 algorithm combines the efficiency of spatial discretization with the precision of continuous trajectory planning.
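The neighborhood counts above can be checked by enumerating the offsets directly. This sketch uses `itertools.product` to build the 8-direction planar and 26-direction spatial neighborhoods and groups the 3D offsets by how many coordinates change; the helper names are hypothetical.

```python
from itertools import product


def neighbor_offsets(dim):
    """All offsets in the 3**dim block around a node, excluding the node itself."""
    return [d for d in product((-1, 0, 1), repeat=dim) if any(d)]


def classify_3d(offsets):
    """Count 3D neighbors by contact type: 1 changed axis = face, 2 = edge, 3 = vertex."""
    names = {1: "face", 2: "edge", 3: "vertex"}
    counts = {"face": 0, "edge": 0, "vertex": 0}
    for d in offsets:
        counts[names[sum(1 for c in d if c != 0)]] += 1
    return counts


planar = neighbor_offsets(2)   # 4 cardinal + 4 diagonal directions
spatial = neighbor_offsets(3)  # 26 search directions
```

Filtering `spatial` by its z component reproduces the layer breakdown in the text: 8 offsets in the current plane and 9 in each of the planes above and below.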

2.3.2. Wind Field Model and Stochastic Disturbance

The synthetic wind field in this study is defined on a discrete three-dimensional grid consistent with the trajectory planning space, where each grid point holds a three-dimensional wind velocity vector. To characterize the aerodynamic environment, the wind velocity at each grid cell is constructed by superposing dual-frequency sinusoidal terms and uniform stochastic noise. The phase of each wind velocity component is sampled independently at environment initialization and remains constant throughout the life cycle of the environment instance. Combining sine functions of different frequencies simulates the multi-scale oscillations of urban airflow. In this way, the model captures horizontal wind fluctuations as well as vertical components such as updraughts and downdraughts, which are used to evaluate energy consumption during the climb and descent phases of the eVTOL UAV.

Building on this multi-dimensional sinusoidal foundation, this study accounts for atmospheric boundary layer (ABL) effects so that wind intensity depends on altitude. The intensity of the synthetic wind field is modulated according to the power-law wind profile

v (z) = v_{ref} {\left(\frac{z}{z_{ref}}\right)}^{α}

(9)

where $v (z)$ is the wind speed at altitude $z$, $v_{ref}$ is the reference wind speed at height $z_{ref}$, and $α$ is the wind shear exponent determined by the surface roughness. Although this formulation includes global vertical wind components, the current work does not explicitly model the interaction between the wind field and specific building geometries. Consequently, microscale phenomena such as flow separation and wake effects, which generate localized updraughts and downdraughts near obstacles, are not considered. Such vertical winds may affect energy consumption during the climb and descent phases, which constitutes a limitation of the present simulation environment. Future work could use computational fluid dynamics (CFD) to simulate wind conditions near structures, and the findings could inform battery capacity requirements and the extension of flight endurance.
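A short sketch of the power-law profile in Formula (9). For illustration it assumes $z_{ref} = 20$ m and an exponent of 0.35, the values Section 3.7 later adopts for the noise profile; the reference speed of 1 m/s and the function name are hypothetical.

```python
def wind_speed_at(z, v_ref, z_ref, alpha):
    """Formula (9): power-law wind profile v(z) = v_ref * (z / z_ref) ** alpha."""
    if z < 0 or z_ref <= 0:
        raise ValueError("z must be non-negative and z_ref positive")
    return v_ref * (z / z_ref) ** alpha


# Half the reference height keeps roughly 78% of the reference speed for alpha = 0.35.
v_mid = wind_speed_at(10.0, 1.0, 20.0, 0.35)
```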

Environmental robustness is critical for the operation of eVTOL UAV systems [32]. In this study, the integrated wind field model assigns wind velocity and direction vectors to each grid cell to facilitate the trajectory planning process. To maintain computational tractability while isolating the impact of directional trajectory planning, we focus on how the relative angle between the wind vector and the aircraft heading modulates the energy intensity. A dimensionless wind angle correction factor is introduced to adjust the baseline energy intensity derived from the Carnegie Mellon University (CMU) dataset. This trigonometric formulation ensures that energy demand is minimized during tailwind conditions due to assistive flow and maximized during headwind conditions to compensate for resistive drag. By treating wind magnitude as a normalized constant in the energy calculation, this approach compels the EA-TD3 agent to perceive the spatial wind field topology and prioritize trajectories with favorable wind angles.

2.4. Reinforcement Learning Algorithms

2.4.1. MDP Formulation

To address the trajectory planning problem under coupled energy and environmental constraints, the task is formulated as a Markov decision process (MDP) [33] defined by the quintuple $(S, A, P, R, γ)$. This MDP is extended to account for energy autonomy and directional aerodynamic disturbances.

The reinforcement learning components are defined as follows:

State Space ($S$): The state vector $s_{t} \in S$ is designed to provide the agent with multi-modal perception. At each time step $t$, the observation is defined as

$s_{t} = [d_{target}, Δ θ, Δ ϕ, d_{obs}, SoC_{t}, θ_{wind}]$

(10)

where $d_{target}$ is the normalized distance to the target $P_{goal}$, while $Δ θ$ and $Δ ϕ$ represent the horizontal and vertical angular deviations. $d_{obs}$ denotes the proximity to the nearest obstacle. The battery state of charge $SoC_{t}$ and the local relative wind angle $θ_{wind}$ are included as core variables, which enables the agent to perceive its energy survival boundary and the spatial wind field topology.
Action Space ($A$): The action space $A$ is continuous to ensure smooth trajectory control. The action vector $a_{t}$ controls the kinematic update

$a_{t} = {[v_{t}, θ_{t}, ϕ_{t}]}^{T}$

(11)

where $v_{t}$ is the velocity and $θ_{t}$ and $ϕ_{t}$ are the horizontal and vertical heading increments, respectively. These actions determine the maneuver-specific energy intensity $κ (a_{t})$.
Energy-Aware Reward Function ($R$): The reward function at time step $t$ balances mission efficiency, safety, path optimality, and energy consumption:

$r_{t} = ω_{g} r_{goal} + ω_{c} r_{collision} + ω_{d} r_{dist} + ω_{e} r_{energy}$

(12)

where $ω_{g}$, $ω_{c}$, $ω_{d}$, and $ω_{e}$ are weighting coefficients. $r_{goal}$ is the reward granted as an incentive when the agent reaches the target, $r_{collision}$ is the collision penalty, a negative reward imposed when safety constraints are violated, and $r_{dist}$ is a distance-based shaping term that encourages the agent to move toward the goal.
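The section does not spell out how the heading increments in Formula (11) become a displacement. One plausible reading, sketched below, accumulates $θ_{t}$ as a yaw angle and $ϕ_{t}$ as a pitch (flight-path) angle and moves the aircraft at speed $v_{t}$ for a step $Δt$; the spherical-coordinate convention, the pitch clamp, the step length, and the function name are all assumptions.

```python
import math


def apply_action(position, yaw, pitch, action, dt=1.0):
    """Hypothetical kinematic update for a_t = [v_t, θ_t, ϕ_t].

    θ_t and ϕ_t are read as yaw and pitch increments in radians, and the
    speed v_t is held for one step dt. Returns the new position, the updated
    yaw and pitch, and the step length Δd.
    """
    v, d_yaw, d_pitch = action
    yaw = yaw + d_yaw
    # Keep the flight-path angle within [-90°, 90°]
    pitch = max(-math.pi / 2, min(math.pi / 2, pitch + d_pitch))
    step = v * dt
    delta = (
        step * math.cos(pitch) * math.cos(yaw),
        step * math.cos(pitch) * math.sin(yaw),
        step * math.sin(pitch),
    )
    new_position = tuple(p + d for p, d in zip(position, delta))
    return new_position, yaw, pitch, math.hypot(*delta)
```

Whatever the angles, the returned step length equals $v_{t} Δt$, which is the $Δ d$ that enters Formulas (6) and (13).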

The energy-related term $r_{energy}$ models the propulsion cost during motion and is defined as

r_{energy} = - \left(κ (a_{t}) \cdot [1 - α \cos (θ_{rel})] \cdot Δ d\right)

(13)

where $κ (a_{t})$ denotes the energy intensity associated with the motion mode, $α$ represents the wind sensitivity coefficient, $θ_{rel}$ is the relative angle between the heading of the eVTOL UAV and the wind direction, and $Δ d$ is the distance traveled within the current step.

This formulation introduces an angle-dependent energy penalty through the wind angle correction factor $η (θ_{rel}) = 1 - α \cos (θ_{rel})$. It reflects the fact that energy consumption depends on both motion intensity and alignment with the local wind. Energy cost increases under headwind conditions, where $θ_{rel} \to 180^{\circ}$, and decreases with tailwinds, while maneuvers with larger $κ (a_{t})$, such as climbing, incur higher penalties. As a result, the agent is encouraged to discover energy-efficient trajectories that minimize energy expenditure.
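A sketch of Formulas (12) and (13) as plain functions. The wind sensitivity $α = 0.5$ is the value listed in Section 3.7; the reward weights in the usage example are placeholders, since their values are not given in this section, and the function names are hypothetical.

```python
import math


def wind_correction(theta_rel, alpha=0.5):
    """η(θ_rel) = 1 - α cos θ_rel: below 1 in a tailwind (θ_rel = 0), above 1 in a headwind (θ_rel = π)."""
    return 1.0 - alpha * math.cos(theta_rel)


def energy_reward(kappa, theta_rel, delta_d, alpha=0.5):
    """Formula (13): r_energy = -(κ(a_t) · η(θ_rel) · Δd)."""
    return -(kappa * wind_correction(theta_rel, alpha) * delta_d)


def step_reward(r_goal, r_collision, r_dist, r_energy, weights):
    """Formula (12): weighted sum with weights (ω_g, ω_c, ω_d, ω_e)."""
    w_g, w_c, w_d, w_e = weights
    return w_g * r_goal + w_c * r_collision + w_d * r_dist + w_e * r_energy


# Placeholder weights; a 2 m headwind step at κ = 1 costs 3 energy units before weighting.
r = step_reward(0.0, 0.0, 0.0, energy_reward(1.0, math.pi, 2.0), (1.0, 1.0, 1.0, 0.5))
```

With $α = 0.5$ the same displacement costs three times as much in a pure headwind ($η = 1.5$) as in a pure tailwind ($η = 0.5$).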

Figure 6 illustrates the operational logic of the proposed framework.

2.4.2. Improved Energy-Aware TD3 (EA-TD3) Algorithm

To train an agent for trajectory planning in three-dimensional environments under energy constraints, this study employs the TD3 algorithm [34]. As an extension of the deep deterministic policy gradient (DDPG) framework, TD3 is designed for high-dimensional continuous action spaces. In eVTOL UAV trajectory planning, where velocity and angular control must be learned alongside energy penalties, EA-TD3 addresses Q-value overestimation through three mechanisms. The Q-value represents the expected cumulative reward an agent receives by taking a given action in a given state. Because the agent selects actions with the highest Q-values, overestimation drives the policy to exploit erroneous peaks, which leads to inaccurate flight commands, high-energy maneuvers, and oscillatory control outputs. By mitigating this estimation bias, EA-TD3 supports stable and energy-efficient trajectory planning.

The core mechanisms of the EA-TD3 algorithm are organized as follows:

Twin Critic Architecture for Task-Oriented Energy Evaluation: The critic architecture evaluates flight decisions by mapping the state of the eVTOL UAV and its action to a value. Each critic network uses a multi-layer perceptron (MLP) to capture energy consumption patterns across mission phases such as climbing and cruising. By estimating future returns, the critic networks give the actor network feedback that steers it toward trajectory planning choices that prioritize safety and minimize battery drain throughout the mission. To mitigate overestimation bias caused by function approximation errors, TD3 computes the target value $y$ by taking the minimum output of the two target critic networks:

$y = r + γ \min_{i = 1, 2} Q_{θ_{target, i}} (s', \tilde{a})$

(14)

where $\tilde{a}$ is the target action smoothed by random noise and $s'$ is the next state. For the eVTOL UAV mission, this conservative estimate discourages the agent from selecting high-risk moves or power-hungry climb segments that might lead to battery exhaustion or mission failure in urban airspace.
Delayed Updates and Target Policy Smoothing: To ensure training stability in urban wind fields, EA-TD3 adopts two further strategies. First, the actor network and the target networks are updated less frequently than the critic networks, so that the value function, which evaluates energy-safety trade-offs, can settle before the policy gradient is computed from it. Second, to keep the critic from fitting narrow, erroneous peaks in the value landscape, clipped Gaussian noise $ϵ$ is added to the target action:

$\tilde{a} = \operatorname{clip} (π_{ϕ_{target}} (s') + ϵ, a_{low}, a_{high}), \quad ϵ \sim \operatorname{clip} (\mathcal{N} (0, σ^{2}), - c, c)$

(15)

This mechanism forces the Q-function to learn that similar trajectory planning actions should yield similar values, which results in smoother trajectory outputs and more stable flight commands.
Experience Replay and Energy-Balanced Optimization: The training process uses a replay buffer to store transitions $[s_{t}, a_{t}, r_{t}, s_{t + 1}, done]$. The critics are optimized by minimizing the mean squared error (MSE) loss $L (θ_{i}) = \mathbb{E} [{(y - Q_{θ_{i}} (s, a))}^{2}]$. Since the reward function $r_{t}$ incorporates the energy intensity $κ (a_{t})$ and the state of charge $SoC_{t}$, the EA-TD3 agent learns a strategy that prioritizes goal reaching when $SoC_{t}$ is abundant and switches to energy-efficient maneuvers, such as optimizing the heading relative to the wind angle $θ_{rel}$, when energy is scarce. Figure 7 illustrates the structural logic of the algorithm.
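The target computation of Formulas (14) and (15) and the critic loss can be sketched in NumPy with the networks abstracted as callables. The $(1 - done)$ mask that stops bootstrapping at terminal transitions is the standard TD3 convention rather than part of Formula (14) as written, and the defaults $γ = 0.99$, $σ = 0.2$, $c = 0.5$ are common TD3 defaults, not the EA-TD3 hyperparameters.

```python
import numpy as np


def td3_target(r, s_next, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, sigma=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0, rng=None):
    """Clipped double-Q target (Formulas (14) and (15)) for a batch of transitions.

    actor_target(s) -> actions and critic*_target(s, a) -> Q-values are plain callables.
    """
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_target(s_next)
    # Formula (15): target policy smoothing with clipped Gaussian noise
    eps = np.clip(rng.normal(0.0, sigma, size=a_next.shape), -noise_clip, noise_clip)
    a_tilde = np.clip(a_next + eps, a_low, a_high)
    # Formula (14): take the smaller of the two target critic estimates
    q_min = np.minimum(critic1_target(s_next, a_tilde), critic2_target(s_next, a_tilde))
    return r + gamma * (1.0 - done) * q_min


def critic_loss(q_pred, y):
    """MSE loss L(θ_i) = E[(y - Q_θi(s, a))^2]."""
    return float(np.mean((y - q_pred) ** 2))
```

In the usual TD3 schedule this target is recomputed for every critic update, while the actor and target networks are updated only every few critic steps (commonly two).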

2.5. Algorithm Introduction

The operational logic of EA-TD3 is organized into four primary stages: state perception, directional decision-making, energy-aware reward calculation, and twin critic optimization. The execution flow is presented in Algorithm 1.

The core components and logic of the proposed method are summarized as follows:

Initialization and Environmental Robustness: The EA-TD3 algorithm establishes a twin critic framework $Q_{θ_{1}, θ_{2}}$ and an actor network $π_{ϕ}$ to mitigate the overestimation of action values in stochastic urban airspace. Because the simulation environment is reset at the beginning of each episode, the agent is exposed to a range of wind vectors. This training regime ensures that the learned trajectory planning policy $π$ captures the physical correlation between aerodynamic resistance and energy intensity. Consequently, the policy prioritizes energy efficiency rather than overfitting to a static geometric route.
Angular-Driven Autonomous Trajectory Planning: In line with the operational requirements for stable flight in the low-altitude economy, the EA-TD3 algorithm outputs maneuvers $a_{t} = {[v_{t}, θ_{t}, ϕ_{t}]}^{T}$ representing velocity and heading increments. Each action is translated into a spatial displacement $Δ d_{t}$ within the environment. This allows the reinforcement learning agent to identify energy-efficient routes within the three-dimensional urban workspace while respecting the search connectivity of the workspace grid. By prioritizing angular-driven trajectory planning, the framework enables the eVTOL UAV to execute smooth transitions between mission waypoints.

Algorithm 1: EA-TD3: energy-aware autonomous trajectory planning method.

Physics-Coupled Energy Mapping: The innovation of this stage lies in the real-time coupling of flight maneuvers with the Carnegie Mellon University (CMU) battery dataset. For each spatial displacement, the algorithm identifies the maneuver type $m \in \{climb, cruise, descent\}$ from the vertical component of the action. It also calculates the relative wind angle $θ_{rel}$ between the heading of the eVTOL UAV and the wind vector. These variables are used to compute the energy intensity $κ (a_{t})$ through the wind angle correction factor $η (θ_{rel})$:

$κ (a_{t}) = κ_{m} \cdot (1 - α \cos θ_{rel})$

(16)

This mapping transforms each spatial movement into an energy depletion metric $Δ E_{t} = κ (a_{t}) \cdot ∥ Δ d_{t} ∥$, integrating battery characteristics into the trajectory planning loop. By accounting for these physics-coupled factors, the framework helps ensure that the planned trajectories are energetically sustainable for long-range missions.
Energy-Aware Reward and Policy Evolution: The calculated energy depletion $Δ E_{t}$ enters the multi-objective reward function $r_{t}$ as a penalty term. This feedback loop drives the EA-TD3 agent to explore energy-efficient corridors where $κ (a_{t})$ is minimized, such as trajectory segments that exploit tailwinds. Through delayed policy updates, the algorithm stabilizes learning against wind noise and converges to a trajectory planning policy that favors trajectories with the lowest cumulative electrochemical cost. This optimization enables the eVTOL UAV to complete its mission while keeping energy consumption within the battery safety boundaries, enhancing the operational reliability of autonomous systems.
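A sketch of the physics-coupled mapping in Formula (16), using the calibrated intensities from Section 3.5 and $α = 0.5$. Classifying the maneuver by the sign of the vertical displacement with a small tolerance is an assumption about how "the vertical component of the action" is thresholded, and the function names are hypothetical.

```python
import math

# Calibrated, normalized energy intensities reported in Section 3.5
KAPPA = {"climb": 1.018, "cruise": 0.025, "descent": 2.032}


def classify_maneuver(dz, tol=1e-6):
    """Hypothetical rule: the sign of the vertical displacement selects m."""
    if dz > tol:
        return "climb"
    if dz < -tol:
        return "descent"
    return "cruise"


def step_energy(delta, theta_rel, alpha=0.5):
    """Formula (16) with ΔE_t = κ(a_t) · ||Δd_t|| for displacement delta = (dx, dy, dz)."""
    kappa = KAPPA[classify_maneuver(delta[2])] * (1.0 - alpha * math.cos(theta_rel))
    return kappa * math.hypot(*delta)
```

A 5 m level crosswind step ($θ_{rel} = 90°$) costs about 0.125 normalized J, whereas a 1 m vertical climb with a tailwind still costs 0.509, which illustrates why the agent learns to limit vertical maneuvers.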

3. Experimental Setup

3.1. Trajectory Planning Problem Description

In urban air traffic systems, trajectory planning for eVTOL UAV platforms involves stochastic variables and operational constraints. The aim of modeling this process is to construct a mathematical framework that simulates the operating environment. The objective of the proposed model is to minimize the total energy consumption and flight distance of the eVTOL UAV from its origin to its destination while ensuring flight safety, compliance with low-altitude air traffic regulations, and avoidance of structural obstacles, as illustrated in Figure 8.

Within this framework, the model integrates the flight performance constraints of the eVTOL UAV and the safety requirements for building avoidance. The flight performance constraints comprise limits on operational altitude, climb angle, and yaw rate, whereas the environmental constraints are defined by the spatial distribution of urban structures. These constraints, detailed in Section 2.1.3, are formulated as mathematical boundaries that ensure the feasibility of the trajectory planning solution. Consequently, the eVTOL UAV trajectory planning process is structured around three core pillars: spatial environment modeling, multi-objective function formulation, and the definition of operational constraints.

3.2. Model Assumptions

To ensure the scientific rigor and computational efficiency of the trajectory planning strategy, the following assumptions are established for the eVTOL UAV simulation environment.

1. Deterministic Mission Boundaries. The spatial coordinates of the takeoff point $S_{0}$ and landing destination $S_{goal}$ are predefined and remain fixed throughout the trajectory planning process.

2. Decoupled Kinematic and Geometric Model. The eVTOL UAV is treated as a point mass in its inertia and energy consumption calculations to improve efficiency during reinforcement learning training. For collision detection and spatial interaction, however, the aircraft is represented by a three-dimensional safety cylinder rather than a geometric point. This equivalent safety envelope $W_{env}$ accounts for the physical dimensions of the EH216-S and ensures that the planned trajectories maintain a safety margin from urban obstacles.

3. Constant Ground Velocity and Its Limitations. The eVTOL UAV is assumed to maintain a constant ground speed $V_{g}$ throughout the mission. This simplification is adopted because the aerodynamic response data and flight control schedules of the EH216-S were not publicly available for this study. By fixing the ground velocity, the simulation isolates the relationship between energy intensity $κ (a_{t})$ and the three-dimensional spatial topology under wind disturbances. This modeling choice ignores the energy consumed during acceleration and deceleration, which may lead to an underestimation of the total energy required for missions with frequent maneuvers.

4. Static Obstacle Environment. Urban structures and obstacles are treated as stationary entities within a three-dimensional occupancy grid that represents an urban canyon for trajectory planning.

5. Profile-Compliant Mission Structure. The trajectory planning mission includes mandatory waypoints that enforce distinct climb, cruise, and descent phases, satisfying flight profile requirements.

6. Regulated Low-Altitude Airspace. To comply with low-altitude economy regulations, the cruising altitude is constrained to ensure safety in building shuttle scenarios while adhering to regional airspace management policies.

7. Geometric Scaling and Spatial Rationality. To optimize training efficiency while preserving interaction fidelity, the simulation employs a geometric scaling strategy. According to the airworthiness constraints issued by the Civil Aviation Administration of China (CAAC), the maximum operational altitude of the EH216-S is 120 m. Mapping this real-world boundary onto the 20 m numerical simulation workspace yields a geometric scaling operator of $1/6$. Regarding aircraft dimensions, the 5.73 m physical wingspan is replaced by an equivalent core fuselage cylinder with a diameter of 3.0 m to improve search efficiency. Through the $1/6$ scaling ratio, this physical dimension is represented by a 0.5 m equivalent safety envelope $W_{env}$ within the simulation. This alignment ensures that the spatial occupancy and building clearances reflect urban canyon dynamics.

8. Energy Constraints and SoC Safety Red Line. The initial energy $E_{0}$ is set between 100 and 200 J to construct an energy-constrained trajectory planning task. Combined with a $20\%$ state of charge (SoC) safety red line, this setting compresses the energy redundancy of the mission and strengthens the sensitivity of the reward function to wind field disturbances. Within the EA-TD3 reward mechanism, breaching this threshold triggers a penalty, which prioritizes flight safety while allowing decision-making performance to be evaluated under limited resources.

Note that the energy metrics in this simulation are presented in normalized Joules to maintain numerical stability during the reinforcement learning training process. These energy values reflect the proportional mapping of the power consumption characteristics derived from the CMU dataset onto the EH216-S kinematic model rather than the absolute kilowatt hour values of the actual aircraft. Consequently, all energy metrics mentioned hereafter are expressed in these normalized Joules to ensure consistency with the energy intensity definitions. This standardized convention ensures that the learned decision logic is evaluated based on the relative energy depletion patterns rather than absolute physical magnitudes, which enhances the robustness of the trajectory planning strategy.
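The geometric scaling in Assumption 7 is a single division by 6. The sketch below reproduces the two mappings stated in the text and, for comparison only, shows what the unreduced 5.73 m wingspan would have scaled to; the constant and function names are hypothetical.

```python
PHYSICAL_TO_SIM = 6.0  # scaling operator 1/6 from Assumption 7


def to_sim(length_m):
    """Scale a physical length (m) into simulation-workspace metres."""
    return length_m / PHYSICAL_TO_SIM


max_altitude_sim = to_sim(120.0)  # CAAC 120 m ceiling -> 20 m workspace height
envelope_sim = to_sim(3.0)        # 3.0 m core fuselage cylinder -> 0.5 m W_env
wingspan_sim = to_sim(5.73)       # full wingspan, shown for comparison only (~0.955 m)
```

Using the 3.0 m cylinder instead of the full wingspan roughly halves the simulated envelope, which is the search-efficiency trade-off the assumption describes.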

3.3. Key Function Formulations

To ensure that the EA-TD3 agent achieves trajectory planning while maintaining operational safety, we define two mathematical formulations: the objective function for performance optimization and the energy boundary for safety assurance.

3.3.1. Establishment of Objective Function

The objective function is designed to balance the trade-off between trajectory planning efficiency and energy autonomy. The total cost function $f$ is formulated as a weighted sum of the trajectory distance and energy expenditure:

\min f = α_{1} L + α_{2} E

(17)

where $α_{1}$ and $α_{2}$ are weighting coefficients, $L = \sum_{i = 1}^{n} {∥ s_{i} - s_{i - 1} ∥}_{2}$ is the cumulative geometric length cost, and $E = \sum_{i = 1}^{n} κ (a_{i}) \cdot L_{i}$ is the energy expenditure cost, in which $L_{i} = {∥ s_{i} - s_{i - 1} ∥}_{2}$ is the length of segment $i$ and $κ (a_{i})$ is the energy intensity factor derived from the CMU dataset.
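A sketch of the cost in Formula (17) for a piecewise-linear path, assuming one energy intensity per segment and unit weights $α_{1} = α_{2} = 1$ as placeholders; the function name is hypothetical.

```python
import numpy as np


def path_cost(waypoints, kappas, alpha1=1.0, alpha2=1.0):
    """Formula (17): f = α1·L + α2·E for waypoints s_0..s_n and one κ(a_i) per segment."""
    pts = np.asarray(waypoints, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # L_i = ||s_i - s_{i-1}||_2
    kappas = np.asarray(kappas, dtype=float)
    if kappas.shape != seg.shape:
        raise ValueError("need exactly one kappa per segment")
    length = float(seg.sum())                           # L
    energy = float(np.dot(kappas, seg))                 # E = Σ κ(a_i) · L_i
    return alpha1 * length + alpha2 * energy, length, energy


# A 2 m climb followed by a 5 m level leg, using the calibrated climb and cruise intensities.
f, length, energy = path_cost([(0, 0, 0), (0, 0, 2), (3, 4, 2)], [1.018, 0.025])
```

The 2 m climb contributes 2.036 of the 2.161 units of energy cost even though it is the shorter segment.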

3.3.2. Energy Boundary and Penalty Function

To prevent mission failure, a state of charge (SoC) safety red line is established at $20\%$. This boundary is implemented through a penalty mechanism in the reward structure:

R_{penalty} = \begin{cases} 0, & SoC_{t} > 20\% \\ - P_{redline}, & SoC_{t} \leq 20\% \end{cases}

(18)

where $P_{redline}$ is a constant penalty. This formulation ensures that the trajectory planning policy prioritizes energy preservation over path shortening when the eVTOL UAV approaches the safety limit.
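A sketch of the red-line penalty in Formula (18) together with the energy budget it implies. Assuming the mission starts at $SoC = 1$ with $E_{0}$ as the full capacity, only 80% of $E_{0}$ (80 to 160 normalized J) can be spent before the penalty is triggered. The penalty constant of 10 is a placeholder, since $P_{redline}$ is not given here.

```python
def redline_penalty(soc, p_redline=10.0, threshold=0.20):
    """Formula (18): 0 above the SoC red line, -P_redline at or below it."""
    return 0.0 if soc > threshold else -p_redline


def usable_energy(e0, threshold=0.20):
    """Energy that can be spent before reaching the red line, assuming SoC starts at 1."""
    return (1.0 - threshold) * e0
```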

3.4. eVTOL UAV Parameter Settings

With the expansion of the low-altitude economy, the eVTOL UAV field has continued to evolve, yet configurations remain diverse and no industry-standard configuration has been established. Because this study adopts a decoupled kinematic and geometric model for trajectory planning, the influence of design parameters is captured through the kinematic constraints. As detailed in Section 2.1, this paper selects as its research object the EHang EH216-S, the first eVTOL UAV to receive a type certificate (TC). Using the parameters of this model as preset values, this research evaluates the robustness of the energy-aware trajectory planning method and conducts a comparative analysis. Table 4 summarizes the parameters that define the operational envelope for this study.

3.5. Energy Consumption Model Results

In this study, three battery cells from the CMU eVTOL UAV battery dataset, VAH01, VAH17, and VAH27, are selected for characterization and energy intensity modeling.

As presented in Table 5, the dataset provides time series measurements such as terminal voltage, discharge current, and cell temperature, which serve as the foundation for the power consumption analysis.

As presented in Table 6, this study calibrated the energy intensity factors $κ$ for the flight phases through statistical analysis of the power characteristics over the life cycles of the eVTOL UAV batteries. To ensure numerical stability and gradient convergence in the EA-TD3 algorithm, a linear normalization factor of $0.1$ is applied to the calibrated values. This adjustment maps the energy metrics into a range suitable for neural network training while preserving the relative physical proportions of energy depletion across flight phases. The resulting hierarchy shows that descent energy intensity exceeds climb energy intensity and that both remain higher than cruise energy intensity.

The physical significance of $κ$ lies in its representation of energy depletion per unit of spatial displacement. While the power demand during the climb or descent phases is approximately three times that of the cruise phase, the disparity in $κ$ reaches 40 to 80 times. This follows from the relationship $κ = P / v$: the cruise phase benefits from a high horizontal cruise velocity $v_{cruise}$, which shortens the time required to traverse a unit distance. Conversely, the climb and descent phases involve lower vertical velocities while still requiring power to counteract gravity, which results in a much higher energy cost per meter of displacement.

The calibrated and normalized values, $κ_{descent} = 2.032$, $κ_{climb} = 1.018$, and $κ_{cruise} = 0.025$, provide a physical foundation for the EA-TD3 trajectory planning agent. This scaling strategy ensures that the learned trajectory planning policy is driven by the coupling between three-dimensional trajectory topology and energy depletion patterns rather than by absolute physical magnitudes. By maintaining these relative ratios, the simulation preserves the energy-constrained nature of the mission within the compressed numerical workspace.
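The 40 to 80 times disparity follows directly from the calibrated values. Taking ratios cancels the 0.1 normalization factor, and combining them with the stated power ratio of roughly 3 (treated as exactly 3 here for illustration) gives the speed ratios implied by $κ = P / v$:

```latex
\frac{κ_{climb}}{κ_{cruise}} = \frac{1.018}{0.025} \approx 40.7,
\qquad
\frac{κ_{descent}}{κ_{cruise}} = \frac{2.032}{0.025} \approx 81.3

\frac{κ_{s}}{κ_{cruise}} = \frac{\bar{P}_{s}}{\bar{P}_{cruise}} \cdot \frac{v_{cruise}}{v_{s}}
\;\Rightarrow\;
\frac{v_{cruise}}{v_{climb}} \approx \frac{40.7}{3} \approx 13.6,
\qquad
\frac{v_{cruise}}{v_{descent}} \approx \frac{81.3}{3} \approx 27.1
```

Under this reading, most of the disparity comes from the slower vertical speeds rather than from the higher vertical power.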

3.6. 3D Urban Workspace and Obstacle Modeling Results

The simulation workspace is defined as a continuous three-dimensional Euclidean space with dimensions $L \times W \times H = 20\ \text{m} \times 20\ \text{m} \times 20\ \text{m}$. Consistent with the model assumptions established previously, the simulation environment is constructed using proportional scaling to save computational resources while preserving the physical logic. Accordingly, the physical parameters of the eVTOL UAV, including its geometric dimensions and kinematic constraints, are scaled. This ensures that the interaction between the eVTOL UAV and the urban obstacles remains consistent with operational dynamics.

To simulate a structured urban landscape, the set of obstacles $O$ is represented using an analytical box model. Each building $O_{i} \in O$ is defined by its minimum corner coordinates $(x_{i, \min}, y_{i, \min}, z_{i, \min})$ and maximum corner coordinates $(x_{i, \max}, y_{i, \max}, z_{i, \max})$. The geometric dimensions of these urban architectural obstacles are characterized by

\begin{aligned} Δ x_{i} & = x_{i, \max} - x_{i, \min} \\ Δ y_{i} & = y_{i, \max} - y_{i, \min} \\ h_{i} & = z_{i, \max} - z_{i, \min} \end{aligned}

(19)

Therefore, for an eVTOL UAV located at position $P_{t} = (x_{t}, y_{t}, z_{t})$ at time step $t$, the collision detection function $C (P_{t})$ with safety buffer $d_{safe}$ in Formula (8) can be rewritten as

C (P_{t}) = \begin{cases} 1, & \text{if } \exists O_{i} \in O \text{ such that } (x_{i, \min} - d_{safe} \leq x_{t} \leq x_{i, \max} + d_{safe}) \\ & \quad \land (y_{i, \min} - d_{safe} \leq y_{t} \leq y_{i, \max} + d_{safe}) \land (0 \leq z_{t} \leq z_{i, \max} + d_{safe}) \\ 0, & \text{otherwise} \end{cases}

(20)

where $d_{safe} = 0.1$ m denotes the safety buffer margin, which complements the 0.5 m scaled aircraft dimension. Since all buildings in the urban environment are ground-based ($z_{i, \min} = 0$), a region is treated as free space when the flight altitude $z_{t}$ exceeds the expanded obstacle height $z_{i, \max} + d_{safe}$, which enables vertical overflight maneuvers. This margin accounts for aerodynamic disturbances and control uncertainties near building surfaces.
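A sketch of the box-model collision test in Formula (20) with $d_{safe} = 0.1$ m. It follows the formula literally and treats the aircraft position as a point; how the 0.5 m envelope $W_{env}$ is combined with $d_{safe}$ is not specified in this section, so it is left out. The `Box` class and `collides` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Ground-based, axis-aligned building (z_min = 0); coordinates in simulation metres."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    z_max: float


def collides(p, obstacles, d_safe=0.1):
    """Formula (20): 1 if p lies inside any obstacle box inflated by d_safe, else 0."""
    x, y, z = p
    for o in obstacles:
        if (o.x_min - d_safe <= x <= o.x_max + d_safe
                and o.y_min - d_safe <= y <= o.y_max + d_safe
                and 0.0 <= z <= o.z_max + d_safe):
            return 1
    return 0


building = Box(5.0, 5.0, 7.0, 7.0, 10.0)
```

A point at 10.2 m above this 10 m building is free space, which is the vertical overflight case described above.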

The environment settings are listed in Table 7.

Two intermediate waypoints, $W_{1}$ and $W_{2}$, are integrated into the simulation workspace. First, from a mission-oriented perspective, all waypoints are predefined at a consistent altitude of 12 m, which serves as a safe cruise layer and simulates a low-altitude flight corridor for eVTOL UAV logistics. Second, from an algorithmic perspective, these waypoints act as local sub-goals within the EA-TD3 framework. By decomposing the trajectory planning task into sequential segments, the waypoints mitigate the sparse reward problem and guide the agent through obstacle-dense regions, accelerating the convergence of the reinforcement learning process.

The configurations of the trajectory planning mission are summarized in Table 8. To ensure a consistent evaluation, the mission requires the eVTOL UAV to travel from the start position $P_{0}$ to the goal $P_{goal}$ within a time horizon $T_{\max}$. The scale factor $λ_{L}$ defines the mapping between the physical dimensions and the simulation workspace, which maintains consistency with the proportional scaling assumption.

3.7. Wind Field Modeling Results

Environmental robustness is a requirement for eVTOL UAV operations [32]. In this study, a spatially varying turbulent wind field $M (\mathbf{x})$ is defined over a discrete grid of $20 \times 20 \times 20$ nodes, where $\mathbf{x} = {[x, y, z]}^{⊤}$ denotes the spatial coordinate vector. To reflect the characteristics of the atmospheric boundary layer, the grid wind velocity is formulated by coupling a temporal sinusoidal base with a height-dependent stochastic noise term. This setup simulates localized flow patterns and wind shear effects encountered during trajectory planning missions.

The baseline wind velocity component $w_{i, j, k, c}^{base} (t)$ is constructed using a dual-frequency sinusoidal superposition:

w_{i, j, k, c}^{base} (t) = [0.5 \sin (t + ϕ_{i, j, k, c}) + 0.3 \sin (2 t + ϕ_{i, j, k, c})] \cdot s

(21)

where $s = 0.5$ is the amplitude scaling factor and $t$ is the dimensionless time derived from the step count. To incorporate atmospheric boundary layer effects, the stochastic noise intensity is coupled with the vertical height through a wind profile power law. The stochastic disturbance $\eta_{i,j,k,c}$ is defined as

$$\eta_{i,j,k,c} \sim U\left(-\sigma \cdot \left(\frac{z}{z_{ref}}\right)^{\delta},\ \sigma \cdot \left(\frac{z}{z_{ref}}\right)^{\delta}\right) \tag{22}$$

where $\sigma = 0.2$ m/s represents the baseline noise magnitude and $z_{ref} = 20$ m is the reference altitude. The exponent $\delta = 0.35$ denotes the wind profile power law coefficient for high-density urban terrain, under which the turbulence intensity increases with flight altitude. The resulting grid wind velocity is formulated as $w_{i,j,k,c} = w_{i,j,k,c}^{base}(t) + \eta_{i,j,k,c}$.
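Equations (21) and (22) can be sketched in Python as follows, together with the nearest-neighbor lookup used to read the wind at the vehicle position. This is an illustrative sketch under stated assumptions, not the authors' code. The grid spacing `CELL`, the uniformly random per-node phases, and the choice to resample the noise at every call are assumptions, since the text does not specify them. At the 12 m cruise layer, the noise half-width is $0.2 \cdot (12/20)^{0.35} \approx 0.167$ m/s.

```python
import numpy as np

GRID = 20      # 20 x 20 x 20 nodes
S = 0.5        # amplitude scaling factor s, Eq. (21)
SIGMA = 0.2    # baseline noise magnitude sigma (m/s), Eq. (22)
Z_REF = 20.0   # reference altitude z_ref (m)
DELTA = 0.35   # wind profile power law exponent delta
CELL = 1.0     # ASSUMED grid spacing (m); not given in the text

rng = np.random.default_rng(0)
# ASSUMED: one random phase per node (i, j, k) and velocity component c.
phase = rng.uniform(0.0, 2.0 * np.pi, size=(GRID, GRID, GRID, 3))
# Node altitude z (m) for vertical index k, shaped to broadcast over (i, j, k, c).
z = (np.arange(GRID) * CELL).reshape(1, 1, GRID, 1)

def noise_bound(z):
    """Half-width sigma * (z / z_ref)^delta of the uniform noise in Eq. (22)."""
    return SIGMA * (np.asarray(z, dtype=float) / Z_REF) ** DELTA

def wind_grid(t):
    """Grid wind w = w_base(t) + eta, Eqs. (21)-(22); shape (20, 20, 20, 3)."""
    base = (0.5 * np.sin(t + phase) + 0.3 * np.sin(2.0 * t + phase)) * S
    bound = noise_bound(z)  # zero at ground level, growing with altitude
    eta = rng.uniform(-bound, bound, size=phase.shape)
    return base + eta

def wind_at(w, pos):
    """Nearest-neighbor lookup of the wind vector at a position (m)."""
    idx = np.clip(np.rint(np.asarray(pos, dtype=float) / CELL).astype(int), 0, GRID - 1)
    return w[idx[0], idx[1], idx[2]]

w = wind_grid(t=3.0)
print(w.shape, noise_bound(12.0))
print(wind_at(w, [3.2, 7.8, 12.0]))
```

With $\delta > 0$, the noise vanishes at ground level and grows with altitude, which reproduces the height-dependent turbulence described in the text.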

The energy expenditure of the eVTOL UAV is modulated by the relative angle $\theta_{rel}$ between the wind vector and the heading of the aircraft. Under the constant ground velocity assumption, we introduce a dimensionless wind angle correction factor $\eta(\theta_{rel})$ to formulate the effective energy intensity $\kappa_{eff}$:

$$\kappa_{eff}(a_{t}) = \kappa_{s} \cdot \eta(\theta_{rel}) \tag{23}$$

where $\kappa_{s} \in \{\kappa_{climb}, \kappa_{cruise}, \kappa_{descent}\}$ is the calibrated energy intensity and $\eta(\theta_{rel}) = 1 - \alpha \cdot \cos(\theta_{rel})$, with $\alpha = 0.5$ as the wind sensitivity coefficient. This trigonometric formulation ensures that a tailwind ($\theta_{rel} = 0$, giving $\eta = 0.5$) reduces the energy demand, while a headwind ($\theta_{rel} = \pi$, giving $\eta = 1.5$) increases it. The wind vector at the position of the vehicle is obtained via nearest-neighbor interpolation. This synthesized wind field is incorporated into the kinematic state evolution such that $V_{ground}(t) = V_{air}(t) + M(P_{t})$. This modeling approach incentivizes the EA-TD3 agent to perceive the spatial wind field topology and to prioritize trajectories with favorable wind orientations, optimizing the trade-off between path length and energy consumption. The parameters are summarized in Table 9.
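A minimal Python sketch of Eq. (23) and the ground-velocity update is given below. It is an illustration, not the authors' implementation. Measuring $\theta_{rel}$ from the full 3D wind and heading vectors, and returning $\eta = 1$ when either vector is zero, are assumptions. The calibrated values of $\kappa_{s}$ (Table 9/Table 11) are not reproduced here, so they are passed in as an argument.

```python
import numpy as np

ALPHA = 0.5  # wind sensitivity coefficient alpha

def wind_correction(wind, heading):
    """eta(theta_rel) = 1 - alpha * cos(theta_rel), where theta_rel is the
    angle between the wind vector and the aircraft heading."""
    wind = np.asarray(wind, dtype=float)
    heading = np.asarray(heading, dtype=float)
    nw, nh = np.linalg.norm(wind), np.linalg.norm(heading)
    if nw == 0.0 or nh == 0.0:
        return 1.0  # ASSUMPTION: calm air (or no heading) means no correction
    cos_rel = np.clip(np.dot(wind, heading) / (nw * nh), -1.0, 1.0)
    return 1.0 - ALPHA * cos_rel

def effective_intensity(kappa_s, wind, heading):
    """kappa_eff = kappa_s * eta(theta_rel), Eq. (23); kappa_s is the
    calibrated climb, cruise, or descent intensity."""
    return kappa_s * wind_correction(wind, heading)

def ground_velocity(v_air, wind):
    """V_ground(t) = V_air(t) + M(P_t)."""
    return np.asarray(v_air, dtype=float) + np.asarray(wind, dtype=float)

heading = [1.0, 0.0, 0.0]
print(wind_correction([2.0, 0.0, 0.0], heading))   # tailwind: 0.5
print(wind_correction([-2.0, 0.0, 0.0], heading))  # headwind: 1.5
print(wind_correction([0.0, 2.0, 0.0], heading))   # crosswind: 1.0
```

As written in Eq. (23), $\eta$ depends only on the wind direction relative to the heading, not on the wind speed. The wind magnitude enters the dynamics through $V_{ground}$.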

3.8. Algorithmic Hyperparameters and Evaluation Metrics

The trajectory planning performance of the EA-TD3 model is sensitive to hyperparameter configurations. In this study, we tune four hyperparameters: learning rate, batch size, network architecture, and target policy noise. To facilitate a rigorous cross-comparison and eliminate dimensional influences, all evaluation metrics are normalized and mapped to a uniform $[0, 100]$ scale through a linear transformation. This allows the joint response of all indicators to hyperparameter changes to be visualized within a single coordinate system.

1. Best Average Reward $R_{best}$. This reflects the peak performance of the learned policy by capturing the maximum average reward achieved across all evaluation points during the training process.

2. Success Rate $P_{success}$. This indicates the reliability of task completion, defined as the proportion of episodes in which the eVTOL UAV satisfies the termination condition $\lVert P_{T} - P_{goal} \rVert < \varepsilon$ with $\varepsilon = 1.0$ m.

3. Composite Score $S_{composite}$. A weighted metric designed to integrate the reward signal with the success rate, formulated as

$$S_{composite} = w \cdot R_{best} + (1 - w) \cdot \beta \cdot P_{success} \tag{24}$$

where $w = 0.5$ is the weighting coefficient and $\beta = 200$ is a normalization factor used to balance the numerical scales.
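Equation (24) and the $[0, 100]$ display scaling can be sketched as follows. This assumes that $P_{success}$ is a fraction in $[0, 1]$, which is consistent with $\beta = 200$ bringing it to the scale of the reward. The text states only that the display mapping is "linear", so the min-max form of `to_0_100` is an assumption. The input values below are hypothetical.

```python
import numpy as np

W = 0.5       # weighting coefficient w, Eq. (24)
BETA = 200.0  # normalization factor beta

def composite_score(r_best, p_success):
    """S_composite = w * R_best + (1 - w) * beta * P_success, Eq. (24).
    ASSUMPTION: p_success is a fraction in [0, 1]."""
    return W * r_best + (1.0 - W) * BETA * p_success

def to_0_100(values):
    """ASSUMED min-max form of the linear map onto [0, 100]."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else 100.0 * (v - v.min()) / span

# Hypothetical run: best average reward 120, success rate 90%.
print(composite_score(120.0, 0.9))  # 0.5*120 + 0.5*200*0.9 = 150.0
```

With $w = 0.5$ and $\beta = 200$, a perfect success rate contributes at most 100 points, so both terms carry comparable weight when $R_{best}$ is of order 100.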

To ensure the generalizability of the hyperparameter selection, the optimization is conducted under sufficient energy conditions. This enables the EA-TD3 algorithm to prioritize learning optimal trajectory planning and avoidance strategies without favoring overly conservative behaviors induced by energy scarcity. The impact of energy constraints on performance is further analyzed in Section 4.

The experimental results of hyperparameter tuning are illustrated in Figure 9. In this analysis, $S_{composite}$ is designated as the primary evaluation criterion because it jointly assesses learning efficiency and task reliability. The other two metrics, namely the best average reward and the success rate, serve as auxiliary indicators that provide multi-dimensional validation. When interpreting these results, the key consideration is whether the indicators move in step. As illustrated in Figure 9, $S_{composite}$ and the auxiliary metrics reach their peaks at the same setting for each of the four hyperparameters. This consistency indicates that the optimal $S_{composite}$ is not achieved at the expense of any individual metric, which supports the robustness of the selected configurations. The detailed analysis of each parameter is as follows.


1. Learning Rate. The learning rate $\alpha_{lr}$ affects convergence stability. As shown in Figure 9a, the value $\alpha_{lr} = 1 \times 10^{-4}$ was selected because it yielded the highest $S_{composite}$. The synchronization of the auxiliary indicators at this point confirms the reliability of this choice for the trajectory planning task.

2. Batch Size. We assessed batch sizes $B \in \{128, 256, 512\}$. A batch size of $B = 256$ was identified as the optimal point in Figure 9b. The alignment of the success rate and best reward at the same setting suggests that this batch size balances training efficiency against gradient variance.

3. Network Architecture. As shown in Figure 9c, the representational capacity was tested across three multi-layer perceptron (MLP) configurations. A three-layer architecture with 512, 256, and 128 hidden units outperformed the alternative structures, indicating that it adequately captures the nonlinear mapping between the high-dimensional state space and the eVTOL UAV action space.

4. Target Policy Noise. As a mechanism inherited from TD3, target policy noise $\sigma_{tp}$ is used in EA-TD3 to achieve policy smoothing and mitigate Q-value overestimation. We evaluated noise levels of 0.1, 0.2, and 0.5. Figure 9d shows that a noise level of $\sigma_{tp} = 0.1$ provided sufficient target smoothing without destabilizing the target Q-values, resulting in the highest $S_{composite}$ and task success rate.
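The role of $\sigma_{tp}$ can be made concrete with a sketch of the standard TD3 target computation, that is, target policy smoothing followed by the clipped double-Q target. This is the generic TD3 recipe, not code from this paper. The noise clip $c = 0.5$ and the normalized action bound $a_{\max} = 1$ are assumed TD3-style defaults, and the critic outputs are passed in as numbers so that the example stays self-contained.

```python
import numpy as np

def smoothed_target_action(mu_next, sigma_tp=0.1, noise_clip=0.5, a_max=1.0, rng=None):
    """TD3 target policy smoothing:
    a' = clip(mu'(s') + clip(eps, -c, c), -a_max, a_max), eps ~ N(0, sigma_tp^2).
    noise_clip (c) and a_max are ASSUMED defaults; sigma_tp = 0.1 as selected."""
    rng = np.random.default_rng() if rng is None else rng
    mu_next = np.asarray(mu_next, dtype=float)
    eps = np.clip(rng.normal(0.0, sigma_tp, size=mu_next.shape), -noise_clip, noise_clip)
    return np.clip(mu_next + eps, -a_max, a_max)

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target y = r + gamma * (1 - done) * min(Q1', Q2'),
    with Q1', Q2' from the two target critics at the smoothed action."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

rng = np.random.default_rng(0)
a_next = smoothed_target_action([0.95, -0.2, 0.0], rng=rng)
print(a_next)                            # perturbed action, kept inside [-1, 1]
print(td3_target(1.0, 0.0, 10.0, 12.0))  # 1 + 0.99 * min(10, 12), about 10.9
```

A larger $\sigma_{tp}$ averages the target over a wider neighborhood of actions. This explains why too much noise (0.5 in Figure 9d) can blur the value estimates that the policy relies on.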

The final selected hyperparameters are summarized in Table 10.

To support the trajectory planning performance of the model, the hyperparameters and environmental constants are configured as documented in Table 11. These settings govern the trade-off between exploration efficiency, safety constraints, and energy optimization. As detailed in the table, the configuration is divided into algorithm-specific hyperparameters, energy intensity factors for the different flight phases, and the weighting coefficients of the reward function. Specifically, the reward weights for goal reaching and collision avoidance are set as the dominant terms, while the energy penalty weight $\omega_{e} = 0.5$ regulates the consumption-aware behavior of the agent during the 200,000 training steps.

The performance of various algorithms in trajectory planning is quantified through four metrics that evaluate the experimental results.

1. Success Rate. The success rate is the proportion of episodes where the eVTOL UAV reaches the target position within the termination threshold of 1.0 m. This metric reflects the reliability and task completion capability of the learned policy. A high success rate indicates that the agent can consistently navigate through urban environments with obstacles and wind disturbances to accomplish the mission.

2. Average Reward. The average reward is the mean cumulative reward across all evaluation episodes. This indicator integrates objectives including goal reaching, obstacle avoidance, distance minimization, and energy efficiency through the energy-aware reward function. As a performance indicator, it captures how well the policy balances task completion, safety, and resource management.

3. Average Steps. The eVTOL UAV selects actions at each time step according to the learned policy and transitions to the next state based on the system dynamics. Computed over successful episodes, the average number of steps reflects the temporal efficiency of the algorithm. Fewer steps indicate faster mission completion and more direct trajectory planning, which is necessary for time-sensitive operations.

4. Energy Consumption. This metric records the total energy expended during each episode, accounting for the differential energy costs of climbing, cruise, and descent phases. The average energy consumption reflects the energy awareness of the algorithm and its ability to plan resource-efficient trajectories. Lower energy consumption demonstrates that the agent has learned to optimize maneuvers by leveraging the energy penalty term within the reward function.
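The four metrics can be computed from per-episode records as in the sketch below. The `Episode` record and `summarize` function are hypothetical names. Averaging the step count over successful episodes follows the text, while averaging energy over all episodes is an assumption, because the text does not say which set is used. The episode values are made up for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Episode:
    reached_goal: bool  # final distance to goal < 1.0 m
    reward: float       # cumulative episode reward
    steps: int          # number of decision steps
    energy: float       # total energy expended (J)

def summarize(episodes):
    """Success rate (%), average reward, average steps over successful
    episodes, and average energy (ASSUMED: over all episodes)."""
    success = [e for e in episodes if e.reached_goal]
    return {
        "success_rate": 100.0 * len(success) / len(episodes),
        "avg_reward": float(np.mean([e.reward for e in episodes])),
        "avg_steps": float(np.mean([e.steps for e in success])) if success else float("nan"),
        "avg_energy": float(np.mean([e.energy for e in episodes])),
    }

# Hypothetical evaluation batch: two successes and one failure.
episodes = [
    Episode(True, 120.0, 35, 60.0),
    Episode(True, 110.0, 39, 64.0),
    Episode(False, -50.0, 160, 120.0),
]
metrics = summarize(episodes)
print(metrics)
```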

4. Discussion

4.1. Training Results and Comparative Analysis

The numerical experiments were conducted on a workstation equipped with an AMD Ryzen 9 7945HX processor, 64 GB of RAM, and an NVIDIA GeForce RTX 4060 GPU within a Windows 11 environment. To evaluate the performance of the proposed Energy-Aware Twin Delayed Deep Deterministic Policy Gradient (EA-TD3) trajectory planning algorithm, a comparative analysis was performed against three benchmark DRL frameworks: Deep Deterministic Policy Gradient (DDPG) [35], Soft Actor-Critic (SAC) [36], and the standard Twin Delayed Deep Deterministic Policy Gradient (TD3) [37].

To ensure a fair comparison of operational robustness and energy efficiency, all algorithms were trained and evaluated under identical environmental conditions, including synchronized wind field topologies and calibrated energy intensity models. The hyperparameter configurations for each algorithm are summarized in Table 12. The EA-TD3 algorithm uses a $512 \times 256 \times 128$ architecture and the tuned target policy noise level identified in Section 3.8 to handle the nonlinearities and stochastic perturbations inherent in urban trajectory planning.

Comparative experiments were conducted across four algorithms. To mitigate stochastic training fluctuations, a smoothing technique was applied to the raw data. Figure 10 illustrates the training progress, where Figure 10a shows the success rate, Figure 10b shows the average reward, Figure 10c shows the average steps, and Figure 10d shows the energy consumption.
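The smoothing method used for Figure 10 is not specified. A common choice for training curves is an exponential moving average, sketched below purely as an assumption. The smoothing weight `weight=0.9` is likewise an assumed default.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average of a training curve (ASSUMED method).
    `weight` in [0, 1) controls how much of the previous smoothed value is kept."""
    smoothed, last = [], None
    for v in values:
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

print(ema_smooth([0.0, 10.0, 10.0], weight=0.5))  # [0.0, 5.0, 7.5]
```

Larger weights give flatter curves but delay the visible onset of improvements, such as the rise between 10,000 and 20,000 steps.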

In the initial training phase, the agents undergo stochastic exploration, leading to frequent collisions with urban obstacles or environmental boundaries. These early failures result in a success rate near 0% and negative cumulative rewards. During this stage, the absence of an optimized trajectory planning policy keeps the average episode length between 150 and 180 steps. Such inefficient maneuvers lead to rapid energy depletion, nearly exhausting the eVTOL UAV battery budget.

As training progresses, the agents internalize trajectory planning behaviors through experience replay and policy gradient updates. Performance improvements emerge between 10,000 and 20,000 steps, characterized by rising success rates and rewards that turn positive, while average episode lengths decrease to 80–100 steps. By the end of training, EA-TD3 achieves a success rate of approximately 90–95%, an average reward of 100–130, and a reduced temporal footprint of 30–40 steps per episode.

Comparatively, SAC achieves the second-best performance with a success rate between 75% and 85%, while TD3 and DDPG exhibit success rates in the ranges of 60–75% and 50–65%, respectively. The average energy consumption of EA-TD3 is 10–20 J lower than that of the baseline algorithms. This efficiency supports the energy-aware optimization strategy, which is needed for battery-constrained missions in complex urban airspace.

4.2. Analysis of Energy-Aware Trajectory Planning Mechanisms

The training results validate the feasibility of the EA-TD3 algorithm for trajectory planning. For a qualitative comparison, Figure 11 illustrates the 3D trajectories generated by the four evaluated algorithms in a standardized mission scenario. From a geometric perspective, all algorithms successfully execute a complete flight profile encompassing the climb, cruise, and descent phases. However, the distinct topological variations among the trajectories highlight the differing behaviors of the agents within the low-altitude urban airspace.

For eVTOL UAV platforms such as the EH216-S, the key requirement for trajectory planning is the efficient use of limited energy resources under environmental uncertainty. To evaluate the impact of physical constraints on performance, we conduct three progressive analyses within a unified experimental framework. Each analysis examines a distinct dimension of the same mission environment.

1. Maneuvering Behavior Analysis Based on Real Battery Dynamics (Analysis 1). This analysis evaluates how the data-driven energy consumption model shapes the fundamental flight behaviors of the eVTOL UAV. Crucially, since the eVTOL UAV must operate without human intervention, the system must independently evaluate the cost disparities between various maneuvers to ensure mission viability. This phase specifically tests the ability of the EA-TD3 algorithm to internalize nonlinear power costs and identify the optimal balance between climb, cruise, and descent. It should be noted that the dynamic wind field is consistently applied across all analyses to ensure environmental realism and generalizability of the findings.

2. Robustness Testing Under Safe Flight Envelope Constraints and Resource Scarcity (Analysis 2). To evaluate the sensitivity of the eVTOL UAV to the Safe Flight Envelope (SFE), we progressively compressed the battery energy budget within the established dynamic wind field. This stress test is designed to identify the critical thresholds at which traditional energy-agnostic algorithms suffer functional failure. By highlighting the limitations of conventional methods under coupled environmental and resource pressures, this analysis demonstrates how the EA-TD3 autonomous agent maintains mission reliability through proactive trajectory reconfiguration. The inclusion of wind effects throughout this process validates that the energy-sensing mechanism remains effective under stochastic perturbations.

3. Analysis of Training Convergence Stability and Constraint Satisfaction (Analysis 3). This analysis is designed to dissect the mechanisms underlying the performance of the proposed algorithm. By analyzing the fluctuation of energy consumption throughout the training process, we distinguish the decision-making stability obtained with soft penalties from that obtained with hard constraints. The results reveal that incorporating explicit SOC observations enables the agent to internalize a risk-averse trajectory planning strategy. This mechanism ensures that the eVTOL UAV treats energy limits as a non-negotiable safety boundary. This intrinsic commitment to constraint satisfaction is the main reason why the EA-TD3 autonomous agent maintains higher mission reliability in complex urban environments than traditional energy-agnostic frameworks.

4.2.1. Analysis 1: Maneuvering Behavior Analysis Based on Real Battery Dynamics

Under the baseline condition of sufficient energy reserves, this section investigates how the nonlinear power dynamics derived from the CMU battery dataset dictate the decision-making logic of the eVTOL UAV agent. Specifically, we examine the emergent ability of the agent to distinguish between energy-intensive climb/descent maneuvers and energy-efficient cruise flight within the unified experimental framework.

To mitigate the influence of stochastic policy exploration and ensure a rigorous validation of algorithmic robustness, we conducted 50 independent Monte Carlo (MC) simulations for each of the four trained algorithms. Table 13 presents the statistical averages and performance metrics derived from these trials. To evaluate the geometric quality of the generated trajectories, two directional metrics are introduced: the heading metric, representing the cumulative absolute changes in the horizontal yaw angle to quantify total steering effort, and the smoothness metric, defined as the integrated angular deviation between successive 3D velocity vectors to reflect trajectory fluidity and the suppression of abrupt maneuvers. This comparative analysis verifies the stability of the autonomous trajectory planning policy when interacting with high-fidelity physical constraints.
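The two directional metrics can be computed from a sampled trajectory as sketched below. The function names are hypothetical. The "integrated" deviation is approximated by a discrete sum over successive steps. The sketch assumes that consecutive positions differ, because a zero-length step has no direction, and it gives purely vertical steps a yaw of 0, because their horizontal yaw is undefined.

```python
import numpy as np

def heading_metric(positions):
    """Cumulative absolute change of the horizontal yaw angle (rad)."""
    v = np.diff(np.asarray(positions, dtype=float), axis=0)  # per-step velocity
    yaw = np.arctan2(v[:, 1], v[:, 0])
    dyaw = np.diff(yaw)
    dyaw = (dyaw + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    return float(np.sum(np.abs(dyaw)))

def smoothness_metric(positions):
    """Sum of angles (rad) between successive 3D velocity vectors."""
    v = np.diff(np.asarray(positions, dtype=float), axis=0)
    a, b = v[:-1], v[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.sum(np.arccos(np.clip(cos, -1.0, 1.0))))

# A cruise segment at 12 m with one 90-degree left turn.
path = [[0, 0, 12], [1, 0, 12], [2, 0, 12], [2, 1, 12]]
print(heading_metric(path), smoothness_metric(path))  # both equal pi/2
```

The heading metric ignores altitude changes, whereas the smoothness metric also penalizes abrupt transitions between climb, cruise, and descent.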

The experimental results demonstrate that EA-TD3 achieves an 11.6% reduction in average energy consumption compared to the baseline, exhibiting the highest energy efficiency and trajectory quality among the evaluated algorithms. This energy-saving effect is not solely a consequence of shorter trajectories. It also originates from the agent internalizing the cost balance between climb and descent through extensive experience sampling. The specific analytical findings are as follows.

First, the optimization of maneuver frequency is governed by power cost awareness. Analysis of the behavioral characteristics reveals that EA-TD3 recorded the lowest frequency of vertical maneuvers, with only 8.82 climbs and 7.72 descents on average. The high-fidelity battery dynamics model captures the strongly nonlinear power consumption of the eVTOL UAV. In autonomous flight, the vertical climb phase requires substantial power to counteract gravity, producing instantaneous demands that significantly exceed those of the horizontal cruise phase.

Traditional algorithms, such as DDPG and TD3, lack an explicit energy perception mechanism and tend to exhibit greedy obstacle-avoidance behavior by frequently resorting to drastic altitude changes. In contrast, the EA-TD3 agent accounts for the physical cost of high-rate battery discharge. As illustrated in the lateral profile in Figure 12, EA-TD3 maintains a more stable altitude layer and bypasses obstacles through horizontal trajectory adjustments rather than energy-intensive vertical shifts.

The trajectories of DDPG and EA-TD3 appear nearly coincident during the initial climb and mid-course phases in Figure 12, which stems from their shared algorithmic lineage. Since EA-TD3 is developed as an energy-aware extension of the TD3 framework, which itself evolves from DDPG, both algorithms use a deterministic actor that prioritizes the most direct geometric path to secure the primary mission completion reward in the early training stages. Within the rigid constraints of a narrow urban canyon, this deterministic policy gradient converges toward a similar initial climb path. However, significant topological differences emerge when comparing EA-TD3 with SAC and the standard TD3. For SAC, the maximum entropy objective encourages exploration of the continuous action space, resulting in stochastic, curved climb paths. For the standard TD3, the absence of an integrated energy penalty causes the agent to ignore the high battery discharge rate during aggressive climbs. EA-TD3, by contrast, diverges markedly during the descent phase, using its twin-delayed value estimation to prioritize a gradual glideslope. This strategic shift avoids the high-power-loss regimes of the battery system and ensures the structural load stability of the eVTOL UAV throughout the trajectory planning mission.

Despite its superior energy performance, Figure 13 reveals that the proposed EA-TD3 method exhibits noticeable right-angle maneuvers during the final landing phase. This behavior originates from the success-oriented logic in the terminal state, where the agent executes aggressive heading corrections to eliminate residual positional errors so that the eVTOL UAV precisely reaches the landing pad and secures the success reward. Since the current reward structure does not heavily penalize rapid heading changes, these sharp turns emerge as the most effective strategy for guaranteeing mission completion. However, these results also highlight a limitation of the current model: the omission of strict kinematic smoothness constraints such as angular acceleration limits. Although the agent achieves high energy efficiency, it does so through a trade-off that sacrifices trajectory fluidity in the final seconds. In real-world scenarios, such abrupt maneuvers could impose severe structural stress on the airframe or even lead to aerodynamic stall. Future work will focus on incorporating curvature-constrained action spaces or refined smoothness terms so that the generated trajectories are better aligned with physical flight dynamics.

Secondly, to intuitively reveal decision-making disparities under physical constraints, Figure 13 illustrates the correlation between flight altitude profiles and instantaneous energy consumption rates for each algorithm. As observed in the side plot of Figure 13, the energy consumption trajectories of the baseline algorithms and EA-TD3 diverge significantly during the initial 10 to 15 steps. The main plot further demonstrates that the instantaneous energy consumption rates of TD3, DDPG, and SAC are approximately 40% higher than that of EA-TD3 in the initial phase.

From a trajectory perspective, the baseline algorithms, driven by energy-agnostic logic, tend to execute extremely steep climb maneuvers to rapidly establish a vertical safety margin. While this strategy is valid in purely geometric trajectory planning, these aggressive climbs consume 40% to 50% of the total energy budget within the first third of the mission when physical dynamics are considered. In contrast, the EA-TD3 energy consumption rate remains stable between 0.8 and 2.5 J per step throughout the flight.

This stabilized flight profile offers significant engineering value for autonomous operations. Consistent power output effectively extends battery cycle life and alleviates thermal management pressure on motors and electronic controllers during high-power discharges. Furthermore, for eVTOL platforms such as the EH216-S, smooth altitude transitions ensure structural load stability and minimize maneuvering stress on the airframe, adhering to rigorous operational standards for civil UAVs. In the figures, the green circles and red crosses represent the starting positions and targets, respectively, while stars denote the waypoints. Most importantly, the EA-TD3 results demonstrate that the energy perception mechanism enables the agent to identify the path with the lowest physical energy cost in complex urban canyons, achieving a dual optimization of energy and spatial efficiency.

4.2.2. Analysis 2: Robustness Testing Under Safe Flight Envelope Constraints and Resource Scarcity

In the UAM environment, an eVTOL UAV must handle both limited energy and wind turbulence in urban canyons. This section presents a stress test within the experimental framework by reducing the battery energy budget from 200 J to 120 J in a non-uniform dynamic wind field. To obtain statistically reliable estimates, we conducted 50 independent MC autonomous trajectory planning trials for each algorithm at every energy level. These experiments characterize each algorithm's sensitivity to the flight envelope and its mission delivery capability under resource scarcity.

The results in Table 14 show that the reliability of the algorithms diverges as the energy budget tightens. As a pilotless platform, the EH216-S requires its trajectory planning agent to identify paths that satisfy the flight envelope without real-time human correction, making the robustness of the autonomous logic the primary factor in mission success.

First, we analyze the performance drop observed at the energy threshold. For this scenario, 120 J is defined as the survival threshold. At this limit, energy-agnostic algorithms such as DDPG and SAC showed high sensitivity, with success rates falling to 0% and 6.1%, respectively. This occurs because these baseline frameworks lack real-time SOC awareness and cannot reconfigure their trajectories based on remaining energy reserves.

Figure 14 shows the flight status of different algorithms under energy limits. The thicker lines represent failed missions, and the circles of the same color represent the corresponding endpoints. The EA-TD3 algorithm results in the fewest failed missions, followed by TD3 and SAC, while DDPG shows the lowest reliability. As illustrated in Figure 14, when baseline algorithms encounter wind perturbations requiring additional thrust, the eVTOL UAV frequently suffers from functional failure in the final stages, typically between steps 25 and 35. This occurs because their early maneuvers are energy-intensive, leaving no power buffer to counteract late-stage environmental stochasticity. Without managing energy–safety trade-offs, these agents lead the platform to power depletion before mission completion.

Secondly, the autonomous adaptive mechanism under aerodynamic energy coupling distinguishes EA-TD3 from the other algorithms. EA-TD3 maintains a success rate of 87.8% even under the 120 J constraint. This resilience stems from its trajectory reconfiguration capability, in which the algorithm couples the wind field correction factor $\eta$ with SOC observations. Upon perceiving that energy levels are approaching critical boundaries, the agent identifies regions with lower aerodynamic resistance within the wind field. This decision-making logic moves beyond geometric obstacle avoidance and functions as a physics-driven, resource-preserving strategy.

At the 120 J threshold, the performance lead of EA-TD3 over the standard TD3 reaches 61.3 percentage points. Analysis of the successful trajectories in Figure 14 reveals that, when facing an energy deficit, the algorithm preserves its range margin by suppressing steep maneuvers and optimizing cruise altitudes. These experiments indicate that sensitivity to the energy boundary is a pivotal criterion for evaluating trajectory planning frameworks. EA-TD3 demonstrates that explicit energy perception mechanisms serve as a safety guarantee for eVTOL UAV platforms confronting meteorological disturbances, ensuring autonomous airworthiness and mission delivery capability during energy-critical phases.

4.2.3. Analysis 3: Training Convergence Stability and Constraint Satisfaction

This analysis is designed to examine the relationship between decision-making stability and constraint mechanisms by quantifying energy efficiency fluctuations throughout the training evolution. To validate the reliability of the algorithm during the learning process, this section evaluates its statistical performance across multiple independent training sessions. Through the comparative visualization in Figure 15, the distinction between EA-TD3 and the benchmark frameworks regarding energy consumption stability becomes evident.

First, we analyze the impact of soft penalties and hard constraints on training stability. Figure 15a shows that, during training, the energy consumption of DDPG, TD3, and SAC oscillated between 57 J and 70 J. This instability stems from their reliance on an energy efficiency penalty within the reward function, without a mandatory termination condition in the environment. Under this mechanism, the agent frequently attempts high-energy maneuvers during exploration without bearing the immediate consequence of mission failure, which causes the strategy to oscillate between energy conservation and aggressive execution. In contrast, EA-TD3 maintained energy consumption within a narrow range of 61 to 63 J. By treating energy depletion as a hard-constraint termination condition and introducing explicit SOC observations, the agent was forced to internalize a risk-averse decision-making logic from the early stages of training. This mechanism ensures that EA-TD3 exhibits consistent policies across independent stochastic trajectory planning trials.
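The contrast between the two constraint mechanisms can be sketched as follows. The function names are hypothetical. The linear form $-\omega_{e}\,\Delta E$ of the soft penalty, the depletion penalty of $-100$, and the normalization of the SOC by the initial budget are assumptions for illustration. Only $\omega_{e} = 0.5$ and the idea of terminating on depletion come from the text.

```python
import numpy as np

OMEGA_E = 0.5  # energy penalty weight omega_e from the text

def soft_energy_penalty(energy_used_j, omega_e=OMEGA_E):
    """Soft constraint (ASSUMED linear form): energy only lowers the reward,
    and the episode continues even if the budget is overspent."""
    return -omega_e * energy_used_j

def energy_termination(soc_j, energy_used_j, depletion_penalty=-100.0):
    """Hard constraint: the episode ends with a failure penalty as soon as the
    remaining energy is exhausted. depletion_penalty is HYPOTHETICAL.
    Returns (remaining energy in J, terminal reward, done flag)."""
    remaining = soc_j - energy_used_j
    if remaining <= 0.0:
        return 0.0, depletion_penalty, True
    return remaining, 0.0, False

def build_observation(kinematic_state, soc_j, budget_j):
    """Append the normalized remaining energy (SOC) to the kinematic state so
    that the policy can condition its actions on it."""
    return np.append(np.asarray(kinematic_state, dtype=float), soc_j / budget_j)

obs = build_observation([1.0, 2.0, 12.0], soc_j=90.0, budget_j=120.0)
print(obs[-1])                       # 0.75
print(energy_termination(2.0, 2.5))  # (0.0, -100.0, True)
```

Under the soft penalty, an overspending episode still runs to completion and may collect the goal reward. Under the hard constraint, the same overspend ends the episode with a failure, and the SOC entry in the observation lets the policy anticipate that boundary.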

Secondly, we verified statistical consistency through 50 independent simulations. The box plot in Figure 15b illustrates the statistical distribution of these trials. EA-TD3 not only achieved the lowest average energy consumption of 62.46 J but also maintained the smallest interquartile range (IQR) among all evaluated algorithms. This demonstrates that its learned energy allocation strategy is highly repeatable across varying starting coordinates and environmental noise. In contrast, the baseline algorithms exhibit wider spreads and more outliers, which indicates that, in the absence of hard constraints, they are prone to unpredictable high-energy behaviors. Such uncertainty poses a challenge for the autonomous operation of eVTOL UAV platforms.

The experimental results demonstrate that relying solely on reward-based penalties is insufficient to cultivate strategies with consistent reliability. Through a hard constraint-driven training paradigm, EA-TD3 induces flight behaviors characterized by high consistency and low volatility, which provides technical support for the safe execution of unmanned missions in confined urban airspaces.

4.3. Ablation Study

To verify the necessity and effectiveness of the energy perception mechanism in DRL-based eVTOL UAV autonomous trajectory planning, this study conducts ablation experiments with progressively increasing environmental fidelity. Unlike the previous stress tests focusing on energy boundaries, these experiments are performed with a sufficient energy budget of 200 J to isolate the impact of three physical factors on trajectory planning performance, namely dynamic wind fields, high-fidelity energy consumption models, and complex urban obstacle layouts.

As summarized in Table 15, we designed four simulation tiers that progress from idealized environments to environments with greater physical realism. By introducing key variables of the operational environment, we established a performance benchmark for eVTOL UAV autonomous trajectory planning. This tiered configuration is intended to isolate the influence of environmental fidelity on the decision-making logic of the algorithm and to assess whether the agent is robust enough for the transition from simulation to real-world application.

In the Level 1 configuration, the environment consists of a constant wind field, a uniform energy consumption model, and three simplified building obstacles. The mission objective is to reach a single target landmark. This level serves as an idealized trajectory planning environment, which provides a performance baseline for the agent.

In the Level 2 configuration, while maintaining constant wind and simplified obstacles, a realistic energy consumption model derived from the CMU dataset is introduced. This model accounts for the nonlinear power distribution of the eVTOL UAV during maneuvers such as climbing, cruising, and descending. This stage subjects the trajectory planning algorithm to physical performance constraints to test its ability to internalize aerodynamic costs.

In the Level 3 configuration, the high-fidelity energy model is retained and the constant wind field is replaced with a dynamic sinusoidal wind field that fluctuates across spatial and temporal dimensions. This enhancement introduces non-stationary aerodynamic drag, which evaluates the trajectory correction capability of the autonomous trajectory planning system under dynamic environmental uncertainty.

The Level 4 configuration, representing the highest environmental fidelity, integrates dynamic wind fields, CMU-based power models, six building clusters, and multiple traversal landmarks. This scenario requires the algorithm to manage energy across multiple task phases while navigating spatial complexity. It serves as an assessment of the autonomous operational envelope of the eVTOL UAV in urban canyons.

Four algorithms, DDPG, TD3, SAC, and EA-TD3, were trained independently at each environment level under identical hyperparameter configurations. Each agent underwent $5 \times 10^{5}$ training steps per level with an initial energy budget of 200 J. The architecture employed a three-layer neural network with 256 units per layer, a batch size of 256, a learning rate of $3 \times 10^{-4}$, and a discount factor $\gamma = 0.99$. Upon convergence, 50 independent MC simulations were performed at each level for evaluation. Success was defined as reaching the final waypoint without collisions or power depletion.
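The success criterion above can be expressed as a short evaluation routine. This is a minimal sketch, assuming each Monte Carlo episode is summarized by a small record; the names `EpisodeResult`, `is_success`, and `summarize` are hypothetical, the episode values are invented for illustration, and averaging energy over successful episodes only is an assumption because the section does not state how Avg. Energy is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one Monte Carlo evaluation episode (hypothetical record)."""
    reached_final_waypoint: bool
    collided: bool
    energy_remaining: float  # J left in the budget at episode end
    energy_used: float       # J consumed during the episode


def is_success(ep: EpisodeResult) -> bool:
    """Final waypoint reached, no collision, and the energy budget not depleted."""
    return ep.reached_final_waypoint and not ep.collided and ep.energy_remaining > 0.0


def summarize(episodes):
    """Return (success rate in %, mean energy in J over successful episodes)."""
    successes = [ep for ep in episodes if is_success(ep)]
    rate = 100.0 * len(successes) / len(episodes)
    if not successes:
        return rate, float("nan")
    return rate, sum(ep.energy_used for ep in successes) / len(successes)


# Illustrative data: 50 runs with a 200 J budget, 49 clean arrivals and one collision
runs = [EpisodeResult(True, False, 161.6, 38.4)] * 49
runs.append(EpisodeResult(True, True, 150.0, 50.0))
rate, energy = summarize(runs)
print(f"{rate:.1f}% success, {energy:.2f} J mean energy")  # 98.0% success, 38.40 J mean energy
```

With 50 episodes, each run contributes 2 percentage points, which is consistent with the granularity of the success rates reported in Table 16.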

The quantitative results of the ablation study in Table 16 show that EA-TD3 remains resilient as environmental fidelity increases, keeping a success rate of at least 94% at every level. The performance gap between EA-TD3 and the baseline frameworks widens nonlinearly as physical constraints tighten.

At Level 1 and Level 2, which have simple configurations with idealized spatial layouts and a constant wind field, all algorithms achieved success rates between 96% and 100%. This suggests that in predictable environments without dynamic disturbances, reward penalty mechanisms are sufficient for basic trajectory planning. Consequently, an explicit energy perception mechanism is less critical under these low-fidelity conditions.

However, performance diverges at Level 3 once non-stationary aerodynamic disturbances are introduced. Without coupled perception of the wind field correction factors and the real-time SoC, the success rates of SAC, TD3, and DDPG decline to between 82% and 92%. In contrast, EA-TD3 maintains a 94% success rate and records the lowest average energy consumption of the four algorithms (32.88 J), about 11.6% below the mean of the three baselines, reflecting its adaptability to dynamic uncertainty.

At Level 4, the results mark the survival threshold ("red line") for eVTOL UAV autonomous systems. DDPG degrades to a 62% success rate with an average energy consumption of 46.61 J. Conversely, EA-TD3 maintains a 98% success rate under dynamic wind fields and multi-waypoint transitions while requiring 38.40 J on average. Compared with the mean of the three baselines (DDPG, SAC, and TD3), EA-TD3 improves the success rate by 23.3 percentage points and reduces energy consumption by 9.9%; relative to TD3 alone, the corresponding gains are 8 percentage points and 2.3%. These findings indicate that as operational environments shift toward high fidelity, energy awareness becomes a core safety driver for the mission survivability of unmanned eVTOL UAV platforms in urban canyons.
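The Level 3 and Level 4 percentages discussed in this section can be checked directly against Table 16. The sketch below assumes that the comparison is made against the arithmetic mean of the three baselines; under that assumption it yields a 23.3-point success gain and a 9.9% energy saving at Level 4, and a 6.0-point gain with an energy saving of about 11.65% at Level 3 (reported as 11.6% in the Highlights and Conclusions). The dictionary layout and function name are illustrative.

```python
# Level 3 and Level 4 rows of Table 16: algorithm -> (success rate %, avg. energy J)
TABLE16 = {
    "Level 3": {"DDPG": (82, 41.04), "SAC": (90, 36.88), "TD3": (92, 33.73), "EA-TD3": (94, 32.88)},
    "Level 4": {"DDPG": (62, 46.61), "SAC": (72, 42.01), "TD3": (90, 39.29), "EA-TD3": (98, 38.40)},
}
BASELINES = ("DDPG", "SAC", "TD3")


def gains_vs_baseline_mean(rows):
    """Success-rate gain (percentage points) and energy saving (%) of EA-TD3
    relative to the arithmetic mean of the three baselines."""
    mean_sr = sum(rows[b][0] for b in BASELINES) / len(BASELINES)
    mean_e = sum(rows[b][1] for b in BASELINES) / len(BASELINES)
    sr, e = rows["EA-TD3"]
    return sr - mean_sr, 100.0 * (mean_e - e) / mean_e


for level, rows in TABLE16.items():
    pp, saving = gains_vs_baseline_mean(rows)
    print(f"{level}: +{pp:.2f} pp success, {saving:.2f}% energy saving")
```

Comparing against TD3 alone instead (for example, replacing the mean with `rows["TD3"]`) gives much smaller Level 4 gains of 8 points and about 2.3%, which is why the reference point must be stated alongside these figures.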

Ablation experiments demonstrate that energy perception mechanisms are necessary in complex operational environments. Figure 16 illustrates the percentage advantage of the EA-TD3 algorithm over the DDPG, SAC, and TD3 models regarding the trajectory planning success rate and energy consumption under different environmental levels. In simplified scenarios such as Level 1 and Level 2, energy optimization managed through reward penalty mechanisms is sufficient for basic mission success. As shown in Figure 16a, the success rate advantage of EA-TD3 remains marginal at these stages as all algorithms achieve performance levels near 100%.

However, as environmental factors introduce non-stationary wind disturbances and increased spatial complexity in Level 3 and Level 4, explicit energy state observation becomes critical for mission survivability. The success rate advantage of EA-TD3 widens at Level 3, where it maintains a 94% success rate while the performance of other algorithms declines to between 82% and 92%. At Level 4, this performance gap expands further. EA-TD3 sustains a 98% success rate while the success rate for TD3 drops to 90%, SAC to 72%, and DDPG to 62%. EA-TD3 outperforms the least effective algorithm by 36 percentage points. These results confirm that for unmanned eVTOL UAV systems operating in high-fidelity environments, energy awareness is a requirement for maintaining the SFE.

The success-rate gains are accompanied by a corresponding shift in energy consumption. Figure 16b illustrates the differences in energy use across complexity levels. At Level 1, the energy consumption of EA-TD3 is essentially equal to the baseline mean (36.47 J versus about 36.39 J), and at Level 2 EA-TD3 in fact uses slightly more energy (35.29 J versus about 33.34 J). At Level 3 and Level 4, however, EA-TD3 achieves energy savings of roughly 10% relative to the baseline mean (about 11.6% and 9.9%, respectively). This margin reflects the accumulated efficiency gains in complex scenarios. Both the energy-saving advantage and the success-rate advantage of EA-TD3 are negligible at the low-fidelity levels and become substantial once dynamic disturbances are introduced.

The ablation experiments provide a foundation for integrating energy-aware reinforcement learning into energy-constrained eVTOL UAV autonomous trajectory planning frameworks. The empirical results indicate that energy awareness is more than a supplementary optimization function: it serves as a core mission capability for eVTOL UAV platforms operating in complex environments, and its importance grows with system complexity. For unmanned eVTOL UAV systems navigating under realistic constraints such as dynamic wind fields, urban building clusters, and multi-landmark mission sequences, the physics-aware trajectory planning paradigm provided by EA-TD3 is necessary. It supports mission execution and energy safety management and thereby underpins reliable autonomous flight within future UAM networks.

5. Conclusions

To address the energy constraints and dynamic disturbances faced by eVTOL UAVs in UAM environments, this paper proposes and validates an energy-aware reinforcement learning framework named EA-TD3. The method uses a battery dataset from CMU to construct a nonlinear energy consumption model, enabling the system to perceive differences in energy demand across the climb, cruise, and descent flight phases. By coupling this model with stochastic low-altitude wind fields and employing an enhanced TD3 algorithm as the decision engine, the framework achieves autonomous trajectory planning while satisfying battery safety constraints. The framework is validated using the technical parameters of the EH216-S eVTOL platform. The results indicate that physical perception is a prerequisite for the autonomous airworthiness of eVTOL UAV platforms. Compared with the baseline frameworks, EA-TD3 improves energy efficiency and mission reliability in dynamically uncertain environments.

The contributions of this research are reflected in the following three aspects.

First, this study constructs a physics-aware power management mechanism. Unlike traditional trajectory planning algorithms that model energy consumption as a linear function of displacement, this research establishes a mapping between maneuvering actions and energy consumption by analyzing voltage and current fluctuations in battery charge–discharge cycles. The experimental results demonstrate that this model captures the high power consumption of the eVTOL UAV during vertical maneuvers, shifting autonomous trajectory planning from geometric paths to physically feasible trajectories.

Second, the proposed framework yields robust trajectory planning strategies under energy constraints. By incorporating survival boundary logic with energy red line awareness into the reward function, EA-TD3 adapts flight behavior to the real-time SoC. In stress tests where the available energy budget is reduced to 120 J, EA-TD3 maintains an 87.8% mission success rate, compared with 26.5% for standard TD3, 6.1% for SAC, and 0% for DDPG, providing quantitative support for addressing operational endurance challenges in urban air mobility scenarios.

Third, the research validates mission efficiency and statistical stability under dynamic environmental disturbances. In ablation experiments involving dynamic wind fields and 3D building obstacles, EA-TD3 demonstrates both energy efficiency and environmental adaptability. Compared with the mean of the baseline algorithms, EA-TD3 achieves energy savings of 9.9% to 11.6% in complex scenarios and improves the mission success rate by up to 23.3 percentage points. Furthermore, statistical consistency validation reveals a narrow energy consumption distribution. This degree of decision determinism provides a reference for optimizing control strategies for pilotless platforms such as the EH216-S in medical emergency and logistics scenarios.

Author Contributions

Conceptualization and visualization, J.C.; Methodology, J.C. and X.L.; Validation, J.C., Z.W. and L.Z.; Formal analysis, data processing and drafting, J.C. and J.X.; Survey research, J.C. and Z.W.; Resource provision, L.Z.; Review and editing and supervision guidance, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to our colleagues for their valuable feedback and constructive discussions that enriched this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

eVTOL UAV	Electric Vertical Takeoff and Landing Unmanned Aerial Vehicle
EA-TD3	Energy-Aware Twin Delayed Deep Deterministic Policy Gradient
TD3	Twin Delayed Deep Deterministic Policy Gradient
DDPG	Deep Deterministic Policy Gradient
SAC	Soft Actor-Critic
DRL	Deep Reinforcement Learning
MDP	Markov Decision Process
RRT	Rapidly Exploring Random Trees
UAM	Urban Air Mobility
CFD	Computational Fluid Dynamics
CMU	Carnegie Mellon University
SoC	State of Charge
SFE	Safe Flight Envelope
MSE	Mean Squared Error
IQR	Interquartile Range
MC	Monte Carlo
WP	Waypoint
V	Voltage
I	Current
T	Temperature

References

Hassanalian, M.; Abdelkefi, A. Classifications, applications, and design challenges of drones: A review. Prog. Aerosp. Sci. 2017, 91, 99–131.
Moradi, N.; Wang, C.; Mafakheri, F. Urban air mobility for last-mile transportation: A review. Vehicles 2024, 6, 1383–1414.
Lozano Tafur, C.; Orduy Rodríguez, J.; Aldana Rodríguez, D.; Traslaviña, D.S.; Fernández Valencia, S.; Celis Ardila, F.H. Risk-Based Design of Urban UAS Corridors. Drones 2025, 9, 815.
Li, Y.; Guo, T.; Chen, J.; Wu, J.; Zhang, Y.; Alam, S.; Cai, K.; Du, W. Urban air mobility: A review and challenges. IEEE Intell. Transp. Syst. Mag. 2024, 17, 67–87.
Milcsik, C.J.; Johnson, E.N.; Khamvilai, T. Urban Aircraft Path Planning in Wind Fields. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025; p. 2427.
Baskar, D.; Gorodetsky, A. A simulated wind-field dataset for testing energy efficient path-planning algorithms for UAVs in urban environment. In Proceedings of the AIAA Aviation 2020 Forum, Virtual, 15–19 June 2020; p. 2920.
Jiang, S.; Wang, J.; Li, C.; Ou, J.; Duan, P.; Li, L. Identification of no-fly zones for delivery drone path planning in various urban wind environments. Phys. Fluids 2024, 36, 085166.
Chan, Y.; Ng, K.K.; Lee, C.; Hsu, L.T.; Keung, K. Wind dynamic and energy-efficiency path planning for unmanned aerial vehicles in the lower-level airspace and urban air mobility context. Sustain. Energy Technol. Assess. 2023, 57, 103202.
Frey, J.; Rienecker, H.; Schubert, S.; Hildebrand, V.; Pfifer, H. Wind tunnel measurement of the urban wind field for flight path planning of unmanned aerial vehicles. In Proceedings of the AIAA SCITECH 2024 Forum, Orlando, FL, USA, 8–12 January 2024; p. 2510.
Rienecker, H.; Hildebrand, V.; Pfifer, H. Energy optimal 3D flight path planning for unmanned aerial vehicle in urban environments. CEAS Aeronaut. J. 2023, 14, 621–636.
Tian, P.; Chao, H.; Rhudy, M.; Gross, J.; Wu, H. Wind sensing and estimation using small fixed-wing unmanned aerial vehicles: A survey. J. Aerosp. Inf. Syst. 2021, 18, 132–143.
Marzougui, T.; Saenz, E.S.; Bareille, M. A rule-based energy management strategy for hybrid powered eVTOL. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2023; Volume 2526, p. 012024.
Senkans, E.; Skuhersky, M.; Kish, B.; Wilde, M. A first-principle power and energy model for eVTOL vehicles. In Proceedings of the AIAA Aviation 2021 Forum, Virtual, 2–6 August 2021; p. 3169.
Jiao, Q.; Liu, Y.; Zheng, Z.; Sun, L.; Bai, Y.; Zhang, Z.; Sun, L.; Ren, G.; Zhou, G.; Chen, X.; et al. Ground risk assessment for unmanned aircraft systems based on dynamic model. Drones 2022, 6, 324.
Wu, Y.; Zhang, S.; Ni, X.; Li, X. System dynamics analysis of development risks in emerging eVTOL aircraft. Expert Syst. Appl. 2025, 300, 130363.
Jastrzębska, A.; Łągiewka, Z.; Sieczka, P.; Zalewski, J. A Survey on Algorithms Used for Drone Energy Consumption Modelling. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing; Springer: Cham, Switzerland, 2025; pp. 38–49.
Xu, J.; Guan, C.; Wang, Y.; Zhuang, J.; Gan, W. A Systematic Review of Urban Air Mobility Development: EVTOL Drones’ Technological Challenges and Low-Altitude Policies of Shenzhen. Drones 2025, 9, 842.
Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894.
Xu, T. Recent advances in Rapidly-exploring random tree: A review. Heliyon 2024, 10, e32451.
Fagundes-Junior, L.A.; de Carvalho, K.B.; Ferreira, R.S.; Brandão, A.S. Machine learning for unmanned aerial vehicles navigation: An overview. SN Comput. Sci. 2024, 5, 256.
Primatesta, S.; Guglieri, G.; Rizzo, A. A Risk-Aware Path Planning Strategy for UAVs in Urban Environments. J. Intell. Robot. Syst. 2019, 95, 629–643.
Gao, L.; Ding, J.; Liu, W. A vision-based irregular obstacle avoidance framework via deep reinforcement learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2021.
Fu, H.; Li, Z.; Zhang, W.; Feng, Y.; Zhu, L.; Long, Y.; Li, J. Path Planning for Agricultural UAVs Based on Deep Reinforcement Learning and Energy Consumption Constraints. Agriculture 2025, 15, 943.
Chen, J.; Zhou, J.; Wu, D.; Jiang, H. A USV Path Planning Algorithm under Special Environment Based on TD3-RRT. J. Syst. Simul. 2025, 37, 2888–2903.
Lv, H.; Chen, Y.; Li, S.; Zhu, B.; Li, M. Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy. Meas. Sci. Technol. 2024, 35, 056206.
Xie, Y.; Ma, Y.; Cheng, Y.; Li, Z.; Liu, X. BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways. Appl. Sci. 2025, 15, 3446.
Loquercio, A.; Kaufmann, E.; Ranftl, R.; Müller, M.; Koltun, V.; Scaramuzza, D. Learning high-speed flight in the wild. Sci. Robot. 2021, 6, eabg5810.
Bills, A.; Sripad, S.; Fredericks, L.; Guttenberg, M.; Charles, D.; Frank, E.; Viswanathan, V. A battery dataset for electric vertical takeoff and landing aircraft. Sci. Data 2023, 10, 344.
Phung, M.T.; Akhtar, M.S.; Yang, O.B. Machine learning approaches for assessing rechargeable battery state-of-charge in unmanned aircraft vehicle-eVTOL. J. Comput. Sci. 2024, 81, 102380.
Debnath, S.K.; Omar, R.; Latip, N.B.A. A review on energy efficient path planning algorithms for unmanned air vehicles. In Computational Science and Technology, Proceedings of the 5th ICCST 2018, Kota Kinabalu, Malaysia, 29–30 August 2018; Springer: Singapore, 2018; pp. 523–532.
Liu, S.; Li, S.; Li, H.; Li, W.; Tan, J. TEeVTOL: Balancing Energy and Time Efficiency in eVTOL Aircraft Path Planning Across City-Scale Wind Fields. arXiv 2024, arXiv:2403.14877.
Hong, D.; Lee, S.; Cho, Y.H.; Baek, D.; Kim, J.; Chang, N. Energy-efficient online path planning of multiple drones using reinforcement learning. IEEE Trans. Veh. Technol. 2021, 70, 9725–9740.
Shani, G.; Heckerman, D.; Brafman, R.I. An MDP-based recommender system. J. Mach. Learn. Res. 2005, 6, 1265–1295.
Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2018; pp. 1587–1596.
Tan, H. Reinforcement learning with deep deterministic policy gradient. In Proceedings of the 2021 International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA); IEEE: Piscataway, NJ, USA, 2021; pp. 82–85.
Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
Wu, J.; Wu, Q.J.; Chen, S.; Pourpanah, F.; Huang, D. A-TD3: An adaptive asynchronous twin delayed deep deterministic for continuous action spaces. IEEE Access 2022, 10, 128077–128089.

Figure 1. EA-TD3 methodological framework.

Figure 2. EHang EH216-S model.

Figure 3. Simplified task profile.

Figure 4. VAH01 Tenth Cycle: Temporal evolution of voltage, current, and power during climb, cruise, and descent phases.

Figure 5. City modeling display. (a) City scene modeling diagram. (b) 2D modeling search direction. (c) 3D modeling search direction.

Figure 6. MDP methodological framework.

Figure 7. The logic of the EA-TD3 algorithm.

Figure 8. eVTOL UAV flight route.

Figure 9. Hyperparameter evaluation results. (a) Learning rate evaluation results. (b) Batch size evaluation results. (c) Network architecture evaluation results. (d) Target noise evaluation results.

Figure 10. Training process results across different algorithms: (a) Success rate; (b) Average reward; (c) Average Steps; (d) Energy consumption.

Figure 11. Comparison of path planning trajectory examples for four different algorithms.

Figure 12. Comparison of 3D trajectories and flight profiles for the four evaluated algorithms: (a) 3D trajectory visualization; (b) horizontal top view; (c) vertical side view; (d) multi-phase flight profile.

Figure 13. Comparison of energy consumption along the averaged trajectory profiles of the four algorithms.

Figure 14. Comparison of trajectory planning results for the four algorithms under a 120 J energy limit.

Figure 15. Training process. (a) Energy expenditure displayed in normalized training progress. (b) Distribution differences in energy expenditure during training.

Figure 16. Advantages of EA-TD3 at different environmental levels. (a) Success rate advantage of EA-TD3. (b) Differences in energy consumption.

Table 1. Nomenclature of symbols and variables used in this study.

Symbol	Description	Unit/Remark
Energy Consumption Modeling based on CMU Data
$U (t), I (t)$	Terminal voltage and current derived from eVTOL UAV battery profiles	V, A
$S o C_{t}$	State of charge (SoC) reflecting the remaining energy at time t	%
$C_{r a t e d}$	Total rated capacity of the lithium ion battery system	Ah
$κ (a_{t})$	Energy intensity factor defined as energy cost per unit ground displacement	J/m
$η (θ_{r e l})$	Wind angle correction factor based on relative wind direction	–
Environmental Formulation and Wind Field
$W$	Three dimensional Euclidean workspace based on urban digital twin	m
$O_{i}$	Building obstacle identified in urban simulation	–
$P_{t}$	Instantaneous spatial coordinates $(x_{t}, y_{t}, z_{t})$ of the eVTOL UAV	m
$d_{s a f e}$	Safety buffer margin considering eVTOL UAV dimensions and rotor clearance	m
$v (z)$	Horizontal wind speed from atmospheric boundary layer (ABL) model	m/s
$α$	Wind shear exponent determining the vertical wind profile	–
$θ_{r e l}$	Relative angle between the eVTOL UAV heading and wind vector	rad
Reinforcement Learning and EA-TD3 Algorithm
$s_{t}, a_{t}$	State vector and continuous action vector including velocity and angular commands	–
$d_{t a r g e t}$	Normalized Euclidean distance to the target waypoint	–
$Δ θ, Δ ϕ$	Horizontal and vertical angular deviations relative to the target vector	rad
$d_{o b s}$	Minimum distance to obstacle boundaries within the trajectory planning range	m
$r_{t}$	Composite reward function balancing safety, energy, and efficiency	–
$ω_{g, c, d, e}$	Weighting coefficients for goal reaching, collision, distance, and energy	–
$r_{e n e r g y}$	Energy-aware penalty term sensitive to ground displacement and wind	–
$Q_{θ_{1, 2}}, π_{ϕ}$	Parameters of the twin critics and energy-aware actor networks	–
$τ$	Soft update coefficient for target network parameters	–

Table 2. Basic parameters of the EHang EH216-S.

Parameter Name	Value	Physical Constraint Role
Fuselage Height	1.93 m	Collision detection boundary
Fuselage Width	5.73 m	Minimum passage clearance
Max Takeoff Weight	620 kg	Initial dynamic mass
Maximum Range	30 km	Mission planning radius
Max Design Speed	130 km/h	Action space velocity upper bound

Table 3. Experimental conditions and corresponding CMU subdatasets [29].

Experimental Conditions	Charge/Discharge Settings	Subdatasets
Baseline	1 C, 4.2 V	VAH01, VAH17, VAH27
Constant Current Charge	0.5 C, 4.2 V	VAH06, VAH24
Constant Current Charge	1.5 C, 4.2 V	VAH16, VAH20
Constant Voltage Charge	1 C, 4.0 V and 4.1 V	VAH07, VAH23
Extended Cruise	1000 s, 1 C, 4.2 V	VAH02, VAH15, VAH22
Short Cruise Length	400 s, 1 C, 4.2 V	VAH12
Short Cruise Length	600 s, 1 C, 4.2 V	VAH13, VAH26
Thermal Chamber Temperature	20 °C (1 C, 4.2 V)	VAH09, VAH25
	30 °C (1 C, 4.2 V)	VAH10
	35 °C (1 C, 4.2 V)	VAH30
Power Reduction during Discharge	10% reduction (1 C, 4.2 V)	VAH05, VAH28
Power Reduction during Discharge	20% reduction (1 C, 4.2 V)	VAH11

Note: All subdatasets are sourced from the CMU battery degradation database.

Table 4. Key physical parameters and operational constraints of the EHang EH216-S.

Parameter	Value	Parameter	Value
Max aircraft dimension $W_{a}$	5.73 m	Max flight height $H_{\max}$	120 m
Temperature T	20 °C	Min flight height $H_{\min}$	5 m
Battery efficiency $η$	85%	Objective weight $α_{1}$	0.4
Max steering angle $θ_{\max}$	$π / 2$	Objective weight $α_{2}$	0.6
Max climb angle $β_{\max}$	$π / 2$

Table 5. Data samples and structural attributes of the VAH01 baseline mission dataset.

Time	Voltage	Current	E-chg	C-chg	E-dis	C-dis	Temp.	Cycle
(s)	(V)	(mA)	(Wh)	(mAh)	(Wh)	(mAh)	(°C)	Index
0.00	4.195	13,245	0.000	0.000	0.000	0.000	25.12	1
1.00	4.082	13,250	0.000	0.000	0.015	3.680	25.15	1
2.00	4.075	13,248	0.000	0.000	0.030	7.361	25.18	1
3.00	4.071	13,252	0.000	0.000	0.045	11.042	25.21	1
4.00	4.068	13,250	0.000	0.000	0.060	14.723	25.25	1

Note: E-chg and C-chg denote energy and capacity during charging, while E-dis and C-dis denote the same quantities during discharge; these are the inputs for modeling the energy intensity factor $κ$.
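The discharge columns of Table 5 are consistent with simple coulomb and energy counting on the 1 s samples. The sketch below integrates current and the product V·I with a rectangle rule (using the sample at the end of each interval) and reproduces C-dis and E-dis to within about 0.001 mAh and 0.0001 Wh. The `soc` helper applies the standard coulomb-counting definition with the Table 1 symbols $SoC_t$ and $C_{rated}$; it is an assumption rather than the authors' estimator, and the 3.0 Ah rated capacity is a placeholder.

```python
import numpy as np

# First five rows of Table 5 (VAH01 baseline mission, cycle 1)
t_s = np.array([0.0, 1.0, 2.0, 3.0, 4.0])               # time (s)
v = np.array([4.195, 4.082, 4.075, 4.071, 4.068])       # voltage (V)
i_ma = np.array([13245, 13250, 13248, 13252, 13250.0])  # current (mA)


def cumulative_discharge(t_s, v, i_ma):
    """Integrate discharged capacity (mAh) and energy (Wh) with a rectangle rule
    that uses the sample at the end of each interval."""
    dt_h = np.diff(t_s) / 3600.0                  # interval lengths in hours
    dq_mah = i_ma[1:] * dt_h                      # mA * h = mAh per interval
    de_wh = v[1:] * (i_ma[1:] / 1000.0) * dt_h    # V * A * h = Wh per interval
    c_dis = np.concatenate(([0.0], np.cumsum(dq_mah)))
    e_dis = np.concatenate(([0.0], np.cumsum(de_wh)))
    return c_dis, e_dis


def soc(c_dis_mah, c_rated_ah, soc0=1.0):
    """Coulomb counting: SoC_t = SoC_0 - Q_discharged / C_rated (as a fraction)."""
    return soc0 - (c_dis_mah / 1000.0) / c_rated_ah


c_dis, e_dis = cumulative_discharge(t_s, v, i_ma)
print("C-dis (mAh):", np.round(c_dis, 3))
print("E-dis (Wh): ", np.round(e_dis, 3))
print("SoC:", np.round(soc(c_dis, c_rated_ah=3.0), 4))  # 3.0 Ah is a placeholder
```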

Table 6. Extracted and normalized energy intensity factors $κ$ for different battery datasets and flight phases.

Battery ID	$κ_{climb}$	$κ_{cruise}$	$κ_{descent}$
VAH01	1.059	0.034	2.118
VAH17	1.000	0.020	1.993
VAH27	0.995	0.020	1.986
Average	1.018	0.025	2.032
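The Average row of Table 6 is the per-phase mean over the three baseline cells, and Table 1 defines $κ$ as an energy cost per unit ground displacement. The sketch below reproduces the Average row and then estimates the cost of a short path as the sum of $κ$(phase) times step length. The phase rule (sign of the vertical displacement), the use of 3D step length, and all function names are hypothetical simplifications; the wind correction factor $η(θ_{rel})$ is omitted, and because the factors are normalized, the resulting value is in normalized units rather than joules.

```python
# Table 6: normalized energy intensity factors kappa per flight phase
KAPPA_TABLE = {
    "VAH01": {"climb": 1.059, "cruise": 0.034, "descent": 2.118},
    "VAH17": {"climb": 1.000, "cruise": 0.020, "descent": 1.993},
    "VAH27": {"climb": 0.995, "cruise": 0.020, "descent": 1.986},
}
PHASES = ("climb", "cruise", "descent")


def average_kappa(table):
    """Per-phase mean over the battery datasets (the Average row of Table 6)."""
    return {p: sum(row[p] for row in table.values()) / len(table) for p in PHASES}


def classify_phase(dz, tol=1e-3):
    """Hypothetical rule: label a step by the sign of its vertical displacement."""
    if dz > tol:
        return "climb"
    if dz < -tol:
        return "descent"
    return "cruise"


def trajectory_energy(points, kappa):
    """Assumed form: sum of kappa(phase) * step length over consecutive 3D points."""
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        step = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
        total += kappa[classify_phase(z1 - z0)] * step
    return total


kappa = average_kappa(KAPPA_TABLE)
print({p: round(k, 3) for p, k in kappa.items()})  # {'climb': 1.018, 'cruise': 0.025, 'descent': 2.032}
# Climb 2 m, cruise 5 m horizontally, then descend 2 m
path = [(0, 0, 0), (0, 0, 2), (3, 4, 2), (3, 4, 0)]
print(round(trajectory_energy(path, kappa), 3))  # 6.224
```

Under this assumed form, the cruise leg is almost free compared with the vertical legs, which matches the qualitative point that vertical maneuvers dominate the energy budget.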

Table 7. Parameters of building obstacles and waypoints in the simulation workspace.

Object ID	Position Range (m)	Dimensions (m)	Height (m)
$O_{1}$	$[2, 2, 0] \to [5, 5, 12]$	$3 \times 3 \times 12$	12
$O_{2}$	$[12, 2, 0] \to [15, 6, 8]$	$3 \times 4 \times 8$	8
$O_{3}$	$[2, 13, 0] \to [6, 17, 7]$	$4 \times 4 \times 7$	7
$O_{4}$	$[8, 8, 0] \to [11, 11, 15]$	$3 \times 3 \times 15$	15
$O_{5}$	$[14, 14, 0] \to [18, 18, 5]$	$4 \times 4 \times 5$	5
$O_{6}$	$[10, 1, 0] \to [11, 3, 10]$	$1 \times 2 \times 10$	10
$W_{1}$	$(6.0, 6.0, 12.0)$	—	12
$W_{2}$	$(12.0, 12.0, 12.0)$	—	12
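Because every building in Table 7 is an axis-aligned box, the obstacle distance $d_{obs}$ and a safety-buffer check against $d_{safe}$ (both defined in Table 1) can be computed in closed form. This is a minimal sketch under that geometric assumption; the function names are hypothetical, the sensing-range limit mentioned in Table 1 is ignored, and the 0.5 m buffer is a placeholder because $d_{safe}$ is not given numerically in this section.

```python
import math

# Table 7: building obstacles as axis-aligned boxes (min corner, max corner), in m
OBSTACLES = {
    "O1": ((2, 2, 0), (5, 5, 12)),
    "O2": ((12, 2, 0), (15, 6, 8)),
    "O3": ((2, 13, 0), (6, 17, 7)),
    "O4": ((8, 8, 0), (11, 11, 15)),
    "O5": ((14, 14, 0), (18, 18, 5)),
    "O6": ((10, 1, 0), (11, 3, 10)),
}


def distance_to_box(p, box):
    """Euclidean distance from point p to a box (0 if p is inside or on it)."""
    lo, hi = box
    gaps = [max(c_lo - c, 0.0, c - c_hi) for c, c_lo, c_hi in zip(p, lo, hi)]
    return math.sqrt(sum(g * g for g in gaps))


def min_obstacle_distance(p):
    """d_obs: distance to the nearest obstacle boundary (sensing range ignored)."""
    return min(distance_to_box(p, box) for box in OBSTACLES.values())


def collides(p, d_safe):
    """Treat any approach closer than the safety buffer d_safe as a collision."""
    return min_obstacle_distance(p) <= d_safe


D_SAFE = 0.5  # hypothetical buffer; the actual value is not given in this section
for name, wp in {"W1": (6.0, 6.0, 12.0), "W2": (12.0, 12.0, 12.0)}.items():
    print(name, round(min_obstacle_distance(wp), 3), collides(wp, D_SAFE))
```

Both waypoints sit about 1.414 m (that is, √2 m) from the nearest building corner edge (O1 for W1, O4 for W2), so any buffer larger than that would make the waypoints themselves infeasible.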

Table 8. System mission parameters for the eVTOL UAV trajectory planning task.

Parameter	Symbol	Value
Start Position	$P_{0}$	$(0, 0, 0)$ m
Target Position	$P_{goal}$	$(18, 18, 0)$ m
Success Threshold	$ε$	$1.0$ m
Maximum Episode Steps	$T_{\max}$	200
Initial Energy Budget	$E_{0}$	100.0–200.0 J
Environment Scale Factor	$λ_{L}$	1/6

Table 9. Parameters of the coupled wind field and atmospheric boundary layer model.

Symbol	Description	Value
$Δ_{grid}$	Spatial resolution of the wind field grid	1.0 m
s	Amplitude scaling factor for base wind	0.5
$σ$	Baseline stochastic noise magnitude	0.2 m/s
$δ$	Wind profile power law coefficient for high density urban terrain	0.35
$z_{r e f}$	Reference altitude for noise coupling	20.0 m
$α$	Wind sensitivity coefficient for energy modulation	0.5
Observation limit	Velocity truncation per component	$[- 2, 2]$ m/s
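The parameters in Table 9 suggest a power-law boundary-layer profile with additive stochastic noise and per-component truncation. The sketch below shows one way to combine them, including a sinusoidal space–time modulation for the dynamic field of Levels 3 and 4. Everything beyond the Table 9 values is an assumption: the wind heading, the wavelength and angular frequency of the modulation, scaling the noise with the same height factor, and the function name `wind_at`. Passing `dynamic=False` and `sigma=0.0` yields a steady, height-dependent field, which is one possible reading of the constant wind used at Levels 1 and 2.

```python
import numpy as np

S_BASE = 0.5    # amplitude scaling factor s for the base wind (Table 9)
SIGMA = 0.2     # baseline stochastic noise magnitude, m/s (Table 9)
DELTA = 0.35    # wind-profile power-law coefficient, dense urban terrain (Table 9)
Z_REF = 20.0    # reference altitude, m (Table 9)
V_LIMIT = 2.0   # per-component velocity truncation, m/s (Table 9)


def wind_at(pos, t, rng, dynamic=True, sigma=SIGMA,
            heading=np.pi / 4, wavelength=10.0, omega=0.5):
    """Sample a 3D wind vector (m/s) at pos=(x, y, z) and time t.

    heading, wavelength, omega, and the height scaling of the noise are
    illustrative assumptions rather than values from the paper.
    """
    x, y, z = pos
    shear = (max(z, 0.0) / Z_REF) ** DELTA          # power-law height profile
    speed = S_BASE * shear
    if dynamic:                                      # sinusoidal modulation in space and time
        speed *= 1.0 + 0.5 * np.sin(2.0 * np.pi * (x + y) / wavelength - omega * t)
    mean = speed * np.array([np.cos(heading), np.sin(heading), 0.0])
    noise = rng.normal(0.0, sigma * shear, size=3)   # noise amplitude grows with height
    return np.clip(mean + noise, -V_LIMIT, V_LIMIT)


rng = np.random.default_rng(42)
print(wind_at((0.0, 0.0, 20.0), 0.0, rng, sigma=0.0))  # deterministic mean at z_ref
print(wind_at((5.0, 5.0, 12.0), 3.0, rng))             # one stochastic sample
```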

Table 10. Finalized algorithmic hyperparameters for EA-TD3 training.

Hyperparameter	Value
Learning Rate $α_{l r}$	$1 \times 10^{- 4}$
Batch Size	256
Network Architecture	$512 \times 256 \times 128$
Target Policy Noise $σ_{t p}$	0.1
Discount Factor $γ$	0.99
Buffer Capacity	$1 \times 10^{6}$
Exploration Noise $σ_{e x p}$	0.1

Table 11. Parameter configurations for the EA-TD3 model and simulation environment.

Category	Parameter Name	Symbol	Value
TD3 Hyperparameters	Learning Rate	$α_{l r}$	$1 \times 10^{- 4}$
	Batch Size	B	256
	Discount Factor	$γ$	0.99
	Soft Update Rate	$τ$	0.005
	Replay Buffer Size	$N_{b u f f}$	$1 \times 10^{6}$
Energy Parameters	Climb Cost Intensity	$κ_{climb}$	1.018
	Cruise Cost Intensity	$κ_{cruise}$	0.025
	Descent Cost Intensity	$κ_{descent}$	2.032
Reward Weights	Goal Reward Weight	$ω_{g}$	100.0
	Collision Penalty Weight	$ω_{c}$	100.0
	Distance Penalty Weight	$ω_{d}$	1.0
	Energy Penalty Weight	$ω_{e}$	0.5
Training Settings	Total Training Steps	-	200,000
	Evaluation Frequency	-	5000
	Evaluation Episodes	-	10
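Table 11 lists the four reward weights, and Table 1 describes $r_t$ as a composite of goal, collision, distance, and energy terms. The exact functional form is not stated here, so the sketch below assumes a simple additive combination; the function name `composite_reward` and the argument conventions are hypothetical.

```python
# Reward weights (Table 11)
W_GOAL = 100.0    # omega_g
W_COLL = 100.0    # omega_c
W_DIST = 1.0      # omega_d
W_ENERGY = 0.5    # omega_e


def composite_reward(reached_goal, collided, d_target, r_energy):
    """Hypothetical additive form of r_t built from the Table 11 weights.

    d_target: normalized distance to the target waypoint (Table 1).
    r_energy: non-negative energy cost of the step (e.g. kappa * displacement).
    """
    r = -W_DIST * d_target - W_ENERGY * r_energy
    if reached_goal:
        r += W_GOAL
    if collided:
        r -= W_COLL
    return r


print(composite_reward(False, False, 0.4, 0.2))  # ordinary step
print(composite_reward(True, False, 0.0, 0.2))   # step that reaches the goal
print(composite_reward(False, True, 0.3, 0.2))   # step that ends in a collision
```

Under this assumed form, $ω_g$ and $ω_c$ are two orders of magnitude larger than $ω_d$ and $ω_e$, so terminal events dominate the return while the distance and energy terms shape behavior step by step.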

Table 12. Hyperparameter configurations for the comparative study of trajectory planning algorithms.

Hyperparameter	DDPG	TD3	SAC	EA-TD3
Learning rate $α_{l r}$	$1 \times 10^{- 4}$	$1 \times 10^{- 4}$	$3 \times 10^{- 4}$	$1 \times 10^{- 4}$
Total training steps	$6 \times 10^{4}$	$6 \times 10^{4}$	$6 \times 10^{4}$	$6 \times 10^{4}$
Batch size B	256	256	256	256
Update rate $τ$	0.005	0.005	0.005	0.005
Discount factor $γ$	0.99	0.99	0.99	0.99
Network architecture	$400 \times 300$	$400 \times 300$	$400 \times 300$	$512 \times 256 \times 128$
Exploration noise $σ_{e x p}$	0.2	0.1	—	0.1
Delayed update d	—	2	—	2
Target policy noise $σ_{t p}$	—	0.2	—	0.1
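Tables 11 and 12 list the TD3-specific settings: twin critics, delayed policy updates, and target policy noise. The sketch below shows the standard TD3 critic target (clipped double-Q with target policy smoothing) and the soft target update from Fujimoto et al. (2018), parameterized with the EA-TD3 values. It does not include the energy-aware components that distinguish EA-TD3. The noise clip of 0.5 follows the original TD3 paper and is an assumption here, as are the action bounds of ±1, the function names, and the stand-in networks used in the usage example.

```python
import numpy as np

GAMMA = 0.99        # discount factor (Table 11)
TAU = 0.005         # soft update rate (Table 11)
TARGET_NOISE = 0.1  # target policy noise sigma_tp for EA-TD3 (Tables 10 and 12)
NOISE_CLIP = 0.5    # assumption: the noise clip is not reported in the tables
POLICY_DELAY = 2    # delayed policy update d (Table 12)


def td3_target(r, done, s_next, actor_targ, q1_targ, q2_targ, rng,
               a_low=-1.0, a_high=1.0):
    """Clipped double-Q target with target policy smoothing."""
    a = actor_targ(s_next)
    eps = np.clip(rng.normal(0.0, TARGET_NOISE, size=a.shape), -NOISE_CLIP, NOISE_CLIP)
    a_next = np.clip(a + eps, a_low, a_high)
    q_min = np.minimum(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
    return r + GAMMA * (1.0 - done) * q_min


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: theta_targ <- tau * theta + (1 - tau) * theta_targ."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


def actor_update_due(step):
    """Actor and target networks are updated once every POLICY_DELAY critic steps."""
    return step % POLICY_DELAY == 0


# Toy usage with stand-in networks (not trained models)
def actor(s):
    return np.zeros((s.shape[0], 4))


def q1(s, a):
    return np.full(s.shape[0], 5.0)


def q2(s, a):
    return np.full(s.shape[0], 3.0)


rng = np.random.default_rng(0)
y = td3_target(np.array([1.0]), np.array([0.0]), np.zeros((1, 3)), actor, q1, q2, rng)
print(y)  # [3.97]: r + gamma * min(5.0, 3.0)
```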

Table 13. Performance comparison of trajectory planning algorithms under Analysis 1.

Metric	DDPG	TD3	SAC	EA-TD3	Best
Average Steps	34.50	26.94	29.52	27.48	TD3
Energy (J)	46.61	39.29	42.01	38.40	EA-TD3
Path Length (m)	39.67	37.10	35.75	34.81	EA-TD3
Turn Count	16.04	9.74	7.40	11.22	SAC
Planning Time (s)	0.03	0.02	0.03	0.02	TD3/EA-TD3
Climb Count	9.92	10.02	9.94	8.82	EA-TD3
Descent Count	8.34	8.66	9.08	7.72	EA-TD3
Cruise Count	16.24	8.26	10.50	10.94	TD3
Avg Altitude (m)	6.47	6.41	6.19	6.24	DDPG
Max Altitude (m)	9.49	10.23	9.97	9.25	TD3
Heading (°)	1234.63	443.89	433.06	496.98	SAC
Smoothness (°)	1321.28	497.15	486.78	512.04	SAC

Table 14. Success rates of trajectory planning algorithms under different energy constraints in Analysis 2.

Energy Constraint	DDPG	SAC	TD3	EA-TD3
200 J	85.70%	93.90%	96.95%	100.00%
180 J	71.40%	83.70%	90.93%	98.15%
160 J	63.30%	75.50%	85.90%	96.30%
140 J	61.20%	65.30%	78.95%	92.60%
120 J	0.00%	6.10%	26.50%	87.80%

Table 15. Environmental configuration across ablation study levels.

Level	Wind Field	Energy Model	Obstacles	Waypoints	Complexity
Level 1	Constant	Uniform	Simple (3)	Single	Baseline
Level 2	Constant	Realistic	Simple (3)	Single	Low
Level 3	Dynamic	Realistic	Simple (3)	Single	Medium
Level 4	Dynamic	Realistic	Complex (6)	Multiple	High

Table 16. Comparison of eVTOL UAV flight performance metrics across different levels.

Level	Algorithm	Success Rate	Avg. Steps	Avg. Energy (J)	Avg. Path (m)
Level 1	DDPG	100%	19.3	36.41	47.12
	SAC	100%	19.0	32.68	46.23
	TD3	100%	19.0	40.08	49.56
	EA-TD3	100%	20.0	36.47	45.58
Level 2	DDPG	98%	19.0	33.83	45.39
	SAC	96%	19.0	32.34	45.49
	TD3	96%	20.0	33.86	45.14
	EA-TD3	98%	19.0	35.29	45.55
Level 3	DDPG	82%	19.12	41.04	48.07
	SAC	90%	19.88	36.88	48.62
	TD3	92%	19.2	33.73	47.71
	EA-TD3	94%	19.0	32.88	45.79
Level 4	DDPG	62%	34.5	46.61	39.67
	SAC	72%	29.52	42.01	35.75
	TD3	90%	26.94	39.29	37.10
	EA-TD3	98%	27.48	38.40	34.81

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, J.; Xie, J.; Zhang, L.; Wang, Z.; Li, X.; Zhao, Y. EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones 2026, 10, 325. https://doi.org/10.3390/drones10050325

AMA Style

Cai J, Xie J, Zhang L, Wang Z, Li X, Zhao Y. EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones. 2026; 10(5):325. https://doi.org/10.3390/drones10050325

Chicago/Turabian Style

Cai, Jinxu, Juanzhang Xie, Lanxin Zhang, Ziyi Wang, Xueshun Li, and Yongjun Zhao. 2026. "EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft" Drones 10, no. 5: 325. https://doi.org/10.3390/drones10050325

APA Style

Cai, J., Xie, J., Zhang, L., Wang, Z., Li, X., & Zhao, Y. (2026). EA-TD3: An Energy-Aware Autonomous Trajectory Planning Method for Unmanned Electric Vertical Takeoff and Landing Aircraft. Drones, 10(5), 325. https://doi.org/10.3390/drones10050325
