Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach

Kunarak, Sunisa

doi:10.3390/app16010503

Open AccessArticle

Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach

by

Sunisa Kunarak

Department of Electrical Engineering, Srinakharinwirot University, Nakhon Nayok 26120, Thailand

Appl. Sci. 2026, 16(1), 503; https://doi.org/10.3390/app16010503

Submission received: 23 September 2025 / Revised: 10 November 2025 / Accepted: 18 November 2025 / Published: 4 January 2026

Download

Browse Figures

Versions Notes

Abstract

Unmanned aerial vehicles (UAVs) are playing increasingly important roles in military operations, disaster relief, agriculture, and communications. However, their performance is limited by energy management problems, especially in hybrid systems such as those combining fuel cells with a lithium battery. The potential of deep learning to significantly improve UAV power management is investigated in this work through adaptive forecasting and real-time optimization. We develop smart algorithms that automatically balance energy efficiency and communication performance for heterogeneous wireless networks. The simulation results demonstrate energy consumption savings, optimized flight altitudes, and spectral efficiency improvements compared to Fixed Weight and Fuzzy Logic Weight schemes. At saturated user densities, the model enables up to 42% lower energy consumption and 54% higher throughput. Moreover, predictive models based on recurrent and transformer-based deep networks allow UAVs to predict energy requirements over a variety of mission and environmental contexts, shifting from reactive approaches to proactive control. The adoption of these methods in UAV-aided beyond-5G (B5G) and future 6G network scenarios can potentially prolong endurance times and enhance mission connectivity and reliability in challenging environments. This work lays the foundation for an all-aspect framework to control and manage UAV energy in the 5G era, which takes advantage of not only deep learning but also edge computing and hybrid power systems. Deep learning is confirmed to be a keystone of sustainable, autonomous, and energy-aware UAVs operation for next-generation networks.

Keywords:

artificial intelligence; deep learning; energy management; heterogeneous wireless networks; unmanned aerial vehicles; seamless networks

1. Introduction

The application of a deep learning approach in unmanned aerial vehicle (UAV) systems is a major step forward in addressing one of the key challenges faced in the drone industry: energy management. This paper provides an extensive review of the revolutionary effects of deep learning-based methods for UAV energy management systems, summarizing current advances and future opportunities in this fast-developing area.

1.1. Overview of UAV Industry and Applications

The unmanned aerial vehicle (UAV) sector has undergone rapid expansion in various industrial fields, reimagining conventional methods of carrying out aerial operations and services. Nowadays, UAVs are widely utilized in military operations, agriculture surveillance, disaster relief, and telecommunications [1]. They are particularly useful for operations in harsh conditions and can work independently, unlike humans. The growth of this industry is mainly due to advancements in technology such as autonomous systems, sensors, and efficient communication [2].

In recent years, UAVs have gained significant attention in public services and surveillance due to their flexibility and adaptability. The integration of drones into communication networks is regarded as a disruptive innovation that can improve connectivity, especially where there is no stable infrastructure [3]. This trend has gained prominence with the emergence of next-generation mobile networks, where UAVs play a key role in ensuring uninterrupted communication coverage in remote or disaster-affected regions.

1.2. Energy Management Challenges in UAVs

The limitations of UAVs are mainly due to their energy constraints rather than operational factors, posing a major restriction. Battery duration, energy flow, and cruising path significantly affect the effectiveness of UAV operations [4]. Conventional battery-powered UAVs are less efficient on long-endurance missions, especially considering their action time and range.

This challenge is further compounded when it comes to the integration of hybrid power systems into UAVs—a mix of fuel cells and lithium batteries—which shows potential but also poses challenges at the operational level. Although these systems have benefits compared to conventional internal combustion engine (ICE) systems in terms of energy density and flight time, they come with numerous complexities, requiring advanced management strategies [5]. The way in which they operate is also very complex, with challenges including efficiently tuning multiple load-dependent performance settings, incorporating accurate and real-time power allocation schemes, and optimally managing multiple energy sources simultaneously. On the other hand, energy consumption also significantly depends on environmental conditions (e.g., temperature and speed), while keeping an optimal battery state of charge (SOC) is a key requirement for sustainability. These inter-related challenges highlight the complexity of hybrid power system (HPS) implementation in UAV applications, demonstrating the need for sophisticated control strategies and a robust system architecture.

The situation becomes even more complicated in complex environments (e.g., mountainous terrains) where energy conservation is a vital factor in mission success. As the range of UAV applications continues to expand, highly efficient energy management is becoming increasingly important in order to reduce energy wastage and maximize its utilization [6].

1.3. The Role of Deep Learning in Modern UAV Systems

Deep learning has been promoted as a breakthrough in tacking energy management problems in UAVs. Applications of deep learning, especially Deep Reinforcement Learning (DRL), provide more intricate and adaptive approaches to control [7]. These systems are able to vary energy use in real time in response to parameters including, but not limited to, flight conditions, payload, and mission requirements.

In recent years, deep learning techniques have undergone rapid improvements and achieved excellent performance in underwater acoustic networks [8]; for example, the twin-delayed deep deterministic policy gradient (TD3) algorithm has shown promising results in the design of intelligent energy management strategies (EMSs) for hybrid electric UAVs [8]. This approach has shown to achieve superior performance in terms of maximizing the hydrogen equivalent fuel efficiency, preserving the battery state of charge, and enhancing system performance for dynamic flights.

Additionally, the combination of Deep Deterministic Policy Gradient (DDPG) and Particle Swam Optimization (PSO) algorithms represents a milestone in energy-efficient path planning [9]. This is a particularly effective combination to reduce unnecessary energy usage during flight and to navigate efficiently in cluttered environments. Deep learning-enabled UAV-EMSs also present significant opportunities in edge device optimization, especially for UAV-aided communications [10]. With these systems, the emission energy of an edge device can be dynamically adapted to increase communication efficiency and prolong operational lifetime.

The objective of this paper is to tackle unresolved issues in the field of EM for UAV-assisted communication systems by proposing novel DL-based solutions. The proposed work has three main goals: (1) to design an adaptive deep learning-based framework to minimize the energy consumption of edge devices in UAV communication networks, focusing on the dynamic nature of beyond-5G (B5G) and prospective 6G environments, (2) to develop intelligent algorithms that can effectively manage the trade-off between efficient communications and energy savings while ensuring robust connectivity at infrastructure-scarce locations; and (3) to evaluate our proposed approach in real-world application scenarios, particularly focusing on emergency response or disaster management scenarios where enhancing energy efficiency is crucial.

The rest of this paper is organized as follows: Section 2 provides a detailed overview of the existing literature on UAV-EMS and deep learning applications in aerial communications. In Section 3, we present our deep learning-assisted energy optimization framework and explain its architecture and key algorithmic blocks. Section 4 presents the experimental framework and simulation configuration used to evaluate our approach. Experimental results and comparisons with traditional energy management methods are presented in Section 5. Finally, Section 6 summarizes the paper, including the main findings, the limitations of the work, and potential research directions to enhance UAV-based communication in the next-generation 6G frameworks that are currently under development.

2. Background and Literature Review

The rapid rise of unmanned aerial vehicles (UAVs) has revolutionized various sectors, including military operations, agriculture, disaster management, and telecommunications. However, the widespread adoption of UAVs is threatened by a critical challenge: energy management. This comprehensive review examines the evolution of energy management approaches in UAVs, focusing on the transformative potential of deep learning techniques.

2.1. Traditional Energy Management in UAVs

Previous UAV energy control systems have been based on the traditional battery-powered performance of operations with predefined flight trajectories and static power distribution [11]. These systems typically use simple time-invariant strategies for power division between the propulsion plant and onboard load, potentially leading to sub-optimal energy usage. Conventionally, UAV energy consumption management has mainly been reactive and has not considered dynamic environments and variable mission requirements. This has resulted in severe operational limitations, particularly with respect to flight endurance and payload capacity.

The restrictions of conventional energy management are particularly pronounced in hybrid power systems, for example, fuel cells and lithium batteries [12]. Although hybrid systems offer several benefits in terms of energy density and cycle life improvements, controlling them requires complex methods that are not supported by classical strategies. The use of hybrid systems can also limit the amplitude of power distribution in real time, which lowers efficiency and restricts operations when UAVs are being sealed, in difficult situations, and in complex tasks.

2.2. Evolution of Deep Learning Applications in UAVs

The application of deep learning methods also represents a breakthrough in UAV energy management. This evolution started with simple machine learning algorithms for trajectory optimization and has led to advanced deep learning techniques for on-line power system control [13]. This trend has become especially apparent in the creation of smart EMSs based on modern algorithms, such as TD3 and DDPG.

Through deep learning techniques, active power management policies have been developed for UAVs in response to different performance criteria, such as environmental conditions, mission requirements, and system states [14]. To date, these systems are capable of forecasting energy needs, optimizing power grids, and reshaping conditions in real-time mode, leading to an increase in efficiency in the use of electricity. Advances in these applications have also driven the development of more advanced hybrid power systems, with improved management or coordination between various sources.

2.3. State of the Art in Energy Optimization Techniques

Current UAV power optimization methods based on deep learning exploit different types of DL in several different power management areas. Current research is dedicated to three major subjects: power allocation optimization, flight path planning, and system-level energy management [15]. Advanced algorithms, in particular, reinforcement learning-based ones, have been very successful in jointly enhancing energy efficiency (EE) and effectiveness.

The deep reinforcement learning-based PSO is regarded as especially useful in achieving energy efficient path planning of robots in complex environments [16]. These methods have demonstrated the ability to make strides in decreasing non-vital energy usage during flight and mission performance. Moreover, the application of deep learning in edge computing has facilitated better energy control in UAV-assisted communication networks, particularly with regard to beyond-5G (B5G) and emerging 6G network scenarios [17].

2.4. Research Gaps and Opportunities

Despite remarkable achievements in UAV energy management based on deep learning, some key research challenges remain. A key objective is to devise stronger algorithms that can tolerate the uncertainties involved in UAV operations and still enable minimum energy usage [18]. One area where further research is required involves expanding the possibilities for power distribution and operations within multi-power systems, particularly in hybrid systems.

There are multiple directions in which this work could be extended, such as using more advanced reinforcement learning algorithms to address complex, multi-objective optimization problems [19]. Another future research direction is the consideration of real-time environmental data and mission requirements in energy management. Furthermore, the combination of edge computing and deep learning approaches provides possibilities for more intelligent real-time energy management systems.

3. Deep Learning Approaches in UAV Energy Management

This section is divided into four subsections, as follows: Deep Reinforcement Neural Network architectures for energy optimization, reinforcement learning for flight path optimization, predictive models for energy management, and real-time decision-making systems.

3.1. Deep Reinforcement Neural Network Architectures for Energy Optimization

Deep learning is a branch of artificial intelligence (AI) designed to emulate cognitive actions through complex neural networks. These networks are complex arrangements of connected nodes that process and manipulate data through many layers of abstraction. At the most basic level, the power of deep learning is in automatically learning from raw data without needing hand-crafted representations through a hierarchical representation. In UAV energy systems, these networks have the potential to handle complicated multi-dimensional data from numerous sensors, the environment, and operation parameters to optimize energy consumption behaviors.

The architecture is multi-layered, containing three kernel layers that have different roles in the processing pipeline. With the input layer possibly being the sensory interface of the network, any single node represents a potential input (such as telemetry data or environmental sensors or power consumption). The design of this layer is vital since it determines the capability of networks to capture useful features from raw data flows.

The network then consists of hidden layers, which constitute the cognitive part and use one of several sophisticated activation functions (sigmoid, hyperbolic tangent, or rectifier linear units (ReLUs)) to work through data. The depth and width of these layers can be flexibly controlled according to the complexity of the energy optimization problem. The use of multiple hidden layers allows the network to learn high-level hierarchical representation, where each layer learns more and more abstract representations of energy consumptions patterns. This deepness enables the network to capture the high-order interaction between all the components that influence energy performance in UAVs, including, but not limited to, aerodynamic conditions, payload weight, and environmental conditions.

The output layer combines the processed information to provide actionable outputs that can determine, for example, the ideal power settings, aviation parameters, or energy consumption estimates. The level of complexity in this layer, from simple on/off decisions to multi-parameter extremal optimization, has enabled various search techniques.

The inherent flexibility of this layered architecture also allows us to tailor the model complexity and size according to application requirements and computational resources. The network’s capacity to learn through experience via back-propagation results in an ongoing process of tuning its energy optimization policies, a characteristic that is particularly useful for the dynamic adaptation of UAV applications for different operational scenarios and mission specifications. The autonomous learning ability and adaptable nature of the system enable a vastly improved overrule-based energy management methodology.

Deep Reinforcement Neural Networks (DRNNs) enhance cooperation between DRL and DLFNNs, enabling the development of optimal decision-making policies for intelligent agents in high-dimensional environments [20,21]. The DRNN framework takes advantage of the representational power of deep networks [22], using them to approximate the complex value functions or policies, and reinforcement learning, to iteratively improve these approximations through their interaction with the environment.

In the DRNN architecture, as shown in Figure 1, the input layer encodes the agent’s perception of the environment as a state vector

s_{t} \in ℝ^{n}

, where

n

denotes the number of features. These features may include sensory data, contextual information, and historical states [23]. The hidden layers, structured as in a DLFNN [24], transform these features through successive non-linear mapping:

h^{(l)} = f (W^{(l)} h^{(l - 1)} + b^{(l)}), l = 1, 2, \dots, L

(1)

where

h^{(0)} = s_{t} f (\cdot)

is an activation function, such as the tangent sigmoid

f (x) = \frac{2}{1 + e^{- 2 x}} - 1

, and

L

is the number of hidden layers. The non-linear transformation enables the network to capture intricate dependencies within the state space. The output layer produces an action-value estimate

Q (s_{t}, a_{t}; θ)

in value-based DRL, e.g., Deep Q-Networks, or a policy distribution

π (a_{t}| s_{t}; θ)

in policy-gradient methods. In a Q-learning formulation, the network parameters

θ = \{W, b\}

are updated to minimize the temporal-difference error as

Γ (θ) = Ε_{s_{t}, a_{t}, r_{t}, s_{t + 1}} [{(r_{t} + γ \max_{a^{'}} Q (s_{t + 1}, a^{'}; θ^{-}) - Q (s_{t}, a_{t}; θ))}^{2}]

(2)

where

r_{t}

is the immediate reward,

γ

is the discount factor, and

θ^{-}

represents a target network for regularized learning. Combining DRL and deep feed-forward networks [25], DRNNs show strength in autonomous control, energy consumption optimization, and adaptive communications. One of their abilities is to forecast the consumption of energy, being trained based on input patterns of historical consumption (input layer with nodes) and a pre-determined optimal number of hidden layers and nodes within these hidden layers (e.g., 8, or anywhere from 5–30) to produce the most accurate predictions, which can be used for informing scheduling policies. The main strength of a DRNN lies in the fact that it learns both representations and strategies at once, meaning that the deep layers generate abstract presentations from raw data as reinforcement signals to aid the network in achieving long-term performance objectives. This twofold function makes a DRNN a potent device for handling complex, sequential, and uncertain decision procedures within various domains.

3.2. Reinforcement Learning for Flight Path Optimization

In this paper, we employ state-of-the-art actor–critic reinforcement learning algorithms for flight path and energy optimization. In particular, Deep Deterministic Policy Gradient (DDPG) is used as a baseline algorithm; DDPG is an off-policy actor–critic method that learns deterministic continuous control policies through training on an actor network (policy) and a critic network (Q-function). In addition, we adopt Twin Delayed DDPG (TD3), which is a more robust variant of DDPG that addresses function-approximation overestimation using two independent critic networks, delayed policy updates, and target policy smoothing. These distinctions are important because TD3 typically yields more stable value estimates and improves sample efficiency in continuous control problems related to UAV energy management. UAVs can optimize routes in real time by combining multiple complicated factors such as terrain, energy consumption, or mission demands. This is quite different to conventional path planning techniques as it includes the features of on-line learning and adaptation. With these reinforcement learning methods, UAVs are now able to adaptively make decisions regarding flights paths during their runtime in an environment-driven manner according to feedback and energy consumption during flight. The use of advanced PSO techniques powered by deep reinforcement learning is beneficial in mountainous regions, where intelligent energy management is of the utmost importance. By combining the best of both approaches, these techniques enable UAVs to reduce their superfluous energy consumption while preserving mission efficacy, achieving substantial enhancements in operational efficiency compared with conventional path planning methods. Advances in the optimization of air flight paths have significantly increased the range of applications of UAVs, as well as their reliability in complex exploration missions, representing a milestone in transitioning aerial systems to autonomous platforms.

3.3. Predictive Models for Energy Management

As small UAVs find broader applications, spanning precision agriculture, disaster response, monitoring and surveillance, and telecommunication, among other sectors, there is increasing demand for advanced energy management systems. Conventional UAV energy management strategies are usually based on reactive behaviors; they do not allow the system to adapt to the changing characteristics of complex tasks and dynamic environments. Deep learning (DL) sparks a new paradigm for predictive energy management, enabling UAVs to proactively re-plan their operations according to predictions regarding environmental conditions and mission requirements, as well as the predicted generated power.

Deep learning techniques, including RNNs, LSTM, and the newer Transformer model, can be used to recognize patterns in temporal data (such as estimated power states), as well as historical telemetry and mission data, to predict faults with high accuracy. Through the analysis of past flight behavior under different wind regimes, payload configurations, and terrain complexities, these DL systems can construct an optimal energy budget for future missions so that the platform can make decisions before they need to be made, rather than reacting to an event.

Beyond prediction, reinforcement learning and the more sophisticated version named deep reinforcement learning (DRL) act as examples of active optimization based on policy learning. In reinforcement learning, it is known that algorithms such as TD3 are particularly effective in hybrid energy systems where fuel cells and lithium-ion batteries are used together. These advanced models dynamically and intelligently manage power supply from two sources, enabling both fuel economy- and battery-preserving operations all day long.

For path planning tasks, deep learning-enhanced metaheuristic methods such as enhanced PSO combined with DDPG allow UAVs to plan energy-efficient paths in complex 3D environments. These proactive planning methodologies result in significantly less energy being wasted on unnecessary maneuvers and hovering, which are the predominant sources of consumption during a UAV mission.

Predictive models are also useful for future UAV-supported communication networks. These models would employ UAVs as flying relays for edge devices in future network scenarios (e.g., 6G). Energy-efficient channel prediction based on deep learning algorithms dynamically would adjust power levels to match user demand and real-time channel status to save resources while providing reliable communication services in distant or emergency contexts. It is this paradigm shift from reactive to proactive energy management that makes deep learning-enabled predictive models so game-changing in UAV energy management. This shift in paradigm allows for greater endurance, mission reliability, and operational autonomy. With computational power moving toward edge computing and onboard AI finally a viable option, UAVs that are truly adaptive and predictive in their energy usage can transition from theoretical construction to a near-term reality.

3.4. Real-Time Decision-Making Systems

In this paper, we review the progress on deep learning for energy management in UAV systems with real-time decision-making capabilities that allow them to dynamically adapt to different environmental and mission conditions.

3.4.1. Deep Learning-Enabled Real-Time Decision Frameworks

Online estimates of UAV energy management require the real-time calculation of numerous parameters such as environmental conditions, payload mount styles, flight dynamic states, and power system statuses. Deep learning networks, as a computational tool, interpret these multivariate inputs and make optimal decisions within a few milliseconds. The TD3 algorithm represents a great leap forward in real-time energy management in hybrid UAV systems. By also optimizing fuel cell operation along with battery SOC maintenance, TD3-based systems have a dual mission of maximizing hydrogen fuel efficiency and extending battery life. The instantaneous coordination of multiple energy sources demonstrates how deep reinforcement learning allows UAVs to make sophisticated power allocation decisions under conditions of uncertainty and varying loads.

3.4.2. Energy-Efficient Adaptive Path Planning

Optimized energy-aware adaptive path planning is of great importance in real-time decision-making systems. The combination of Enhanced PSO and DDPG enables UAVs to rely on the computational infrastructure to keep adjusting flight paths in the face of newly emerging environmental threats, especially in challenging terrains such as mountains. These systems select a plan from multiple alternate paths in flight, calculating projected energy costs in order to choose the path that best balances the mission objectives and energy requirements.

Real-time decision-making frameworks provide a means to detect and mitigate energy-intensive flight maneuvers, such as hovering and excessive maneuvering, which are the primary sources of energy consumption. Through systematically adjusting flight parameters such as altitude, velocity, and attitude, these systems can reduce the energy consumption without compromising the mission performance.

3.4.3. Dynamic Power Management in Communication Networks

UAVs can adapt with varying conditions, balancing power allocation among hybrid systems and intelligently changing flight routes to save energy. As computational power advances and shifts toward edge computing, fully autonomous energy-aware UAVs are on course to reach the technology readiness level, achieving longer operational distances for mission endurance and better overall system efficiency.

The current research in this area implies that next-generation UAVs will have more advanced on-the-fly decision-making frameworks, leading to human-like intelligence in energy management but with better computational efficiency. This progression represents more than a vertical advancement; it signifies a revolutionary transformation in how UAV systems are designed to perform energy-aware operations across diverse application domains.

4. Implementation Strategies

This section presents the UAV-assisted 5G communication network, integration challenges and solutions, the experimental procedure and simulation configuration, and performance analysis of the experimental design.

4.1. UAV-Assisted 5G Communication Network

Figure 2 shows an integrated approach to wireless communication coverage by means of both ground-based and air-based technologies. The system consists of a two-level network architecture that allows the ground base station to work in coordination with UAVs to achieve better connectivity and coverage efficiency. The ground base station is the center of this network, which applies 4G techniques to establish the basic communication stratum. This is the main backhaul connectivity point that manages all core network operations and thus handles the overall system communication infrastructure. The position for this ground station is strategically chosen to provide wide-area coverage but may be unable to fully accommodate users in hard-to-reach areas or to allow UAVs to operate at their full capacity during crises. In addition to the ground system described above, 5G-enabled flying devices (or UAVs/transceivers) fly at high altitudes together with targeted coverage zones. These airborne vessels pick up signals from above (from satellites) and below (from the station down on the ground) before disseminating improved 5G service to people within their proximity. The drones serve as moving aerial base stations that deliver high-speed, low-latency connectivity to groups of users.

This is an advantageous hybrid solution compared to regular (single-layer) networks, as drones can be quickly relocated to areas with a lack of coverage, to temporarily increase capacity in crowded cities or help provide communications in disaster-stricken regions where base stations may have been weakened. The 5G system operated through UAVs provides higher bandwidth, lower latency, and a deeper connection than the underlying 4G deployments, enabling the delivery of next-generation applications and services. The system’s overlapping coverage increases reliability by providing redundancy and improves spectral efficiency across different frequency bands, as depicted in Figure 2. This network architecture signifies a new phase in networking, blending the rigidity of infrastructure with mobile aerial platforms to develop more resilient and agile communication systems.

4.2. Integration Challenges and Solutions

Energy consumption consists of two main components, mechanical energy and communication energy, as shown in Equation (3). The mechanical energy part that depends on power is required to maintain altitude and aerodynamic drag power and accounts for propulsion and control losses such as (4) and (5). The transmission power that travels along the wireless communication signal and the user’s bandwidth demand are considered in the communication part, as depicted in Equation (6).

The total mechanical power of the UAV can be expressed as

E_{c o n s} = E_{m e c h} + E_{c o m m}

(3)

P_{t o t} = P_{l i f t} + P_{d r a g} + P_{p r o p}

(4)

where

P_{l i f t} = \frac{w \cdot v_{y}}{η_{p}} = \frac{m g \cdot v_{y}}{η_{p}}

represents the power required to maintain altitude, and

m

(kg),

g

(m/s²), and

v_{y}

(m/s) yield dimensions of [kg

\cdot

m²/s³] consistent with watts (J/s).

P_{d r a g} = \frac{1}{2} ρ C_{D} A {v^{3}}_{h}

denotes the aerodynamic drag power with air density

ρ

(kg/m³), drag coefficient

C_{D}

, reference area

A

(m²), and horizontal speed

v_{h}

(m/s), with yield in units of watts, and

P_{p r o p}

accounts for propulsion and control losses.

The energy consumed with units J (kg·m²/s²) over a mission of duration

T

is

E_{m e c h} = \int_{0}^{T} (P_{l i f t} + P_{d r a g} + P_{p r o p}) d t

(5)

E_{c o m m} = P_{t r a n s} \cdot t_{c o m m} + α (w_{u s e r} \cdot B W_{u s e r})

(6)

where

$E_{c o n s}$ is the energy consumption (joules);
$E_{m e c h}$ is the energy mechanical (joules);
$E_{c o m m}$ is the energy communication (joules);
$v_{y}$ is the vertical speed (m/s);
$η_{p}$ is the propulsion efficiency;
$m$ is the mass of the UAV (kg);
$g$ is the Earth’s gravitational force, equal to 9.81 m/s²;
$P_{t r a n s}$ is the transmission power of the wireless communication signal (watts);
$t_{c o m m}$ is the time spent on wireless communication (seconds);
$w_{u s e r}$ is the weighted value of the user’s bandwidth demand;
$B W_{u s e r}$ is the user’s bandwidth demand (bps).

The term

w_{u s e r} \cdot B W_{u s e r}

represents the weighted bandwidth demand used in the reward+decision layer; it does not directly enter the energy balance equation unless scaled into power units, and thus we introduce a scaling factor

α

(J/bit) to preserve dimensional correctness. This correction ensures dimensional consistency across all energy terms. Additionally, we note that this hybrid model structure is consistent with established UAV power formulations in the aerodynamics and wireless communication literature and is validated against field telemetry data, supporting our study’s physical realism and analytical rigor.

4.3. Reinforcement Learning Framework Specification

The deep reinforcement learning (DRL) framework was implemented using a Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm within the Deep Reinforcement Neural Network (DRNN) structure.

(a): State Space: The state vector $S_{t}$ represents UAV environmental and operational parameters as $S_{t} = \{h_{t}, v_{t}, P_{t}, U_{t}, S N R_{t}, D_{t}, B W_{t}\}$ , where $h_{t}$ is the UAV’s altitude, $v_{t}$ is the velocity, $P_{t}$ is the remining power, $U_{t}$ is the number of active users, $S N R_{t}$ is the signal-to-noise ratio, $D_{t}$ is the distance to the nearest base station, and $h_{t}$ is the available bandwidth.
(b): Action Space: The action vector $A_{t} = \{Δ h_{t}, P_{t}, β_{t}\}$ defines the adjustment to UAV altitude, the transmission power, and the dynamic energy weight $β_{t}$ . All actions are continuous and bounded within the UAV’s physical limits.
(c): Reward Function: The reward function aims to jointly optimize energy efficiency and communication throughput, formulated as

$R_{t} = η_{E} (\frac{D a t a_{t}}{E_{t}}) + η_{T} (\frac{T_{t}}{T_{\max}}) - λ (\frac{P_{t}}{P_{\max}})$

(7)

where $E_{t}$ is the instantaneous energy consumption, $D a t a_{t}$ represents the transmitted bits, and $T_{t}$ is the throughput. The weighting coefficients $(η_{E}, η_{T})$ are equal to 0.6 and 0.4, respectively. $λ$ is the penalty factor (0.1) for violating Quality of Service constraints in order to balance between energy efficiency and transmission quality. This reward structure encourages the UAV to maintain optimal trade-offs between low energy consumption and high transmission quality, ensuring stable convergence in multi-user environments.
(d): Network Architecture: The policy and critic networks both comprise three fully connected layers with 128, 256, and 128 neurons, respectively, each using ReLU activation, and batch normalization is applied to prevent overfitting and accelerate convergence. Weights are trained on an NVIDIA RTX A6000 GPU (Santa Clara, CA, USA), and the Adam optimizer (learning rate = 1 × 10⁻⁴) is used.
(e): Training Procedure: The model is trained on a hybrid dataset comprising simulated data equal to 10,000 flight instances, generated from a MATLABMATLAB R2024b-based trajectory and power models across varying environmental conditions. Experimental data from 2000 records are collected via field trials using a Yuneec H520 UAV (Yuneec International, Shanghai, China) equipped with telemetry and CSI sensors. Data are normalized to the range [0, 1], and each sample sequence consists of 30 temporal steps for recurrent learning. Training continues with 50,000 episodes, with early stopping triggered when the validation loss fails to improve over 10 consecutive epochs. The typical training time is approximately 12 h. The final trained model demonstrates stable policy behavior with Q-value reduction variance consistently below 0.02. The proposed DRNN–TD3 model demonstrates faster convergence (≈40% fewer iterations) and improves energy efficiency by 21% relative to DDPG and 27% relative to Dueling DQN with identical training with an average return greater than or equal to −10, as shown in Figure 3.

4.4. Experimental Procedure and Simulation Configuration

The service system of unmanned aerial vehicles in wireless networks will consist of rotary-wing UAVs equipped with wireless network systems with an operating frequency of 2600 megahertz and a bandwidth of 20 megahertz. For this research, we considered the rotary-wing UAV model Yuneec H520 (hexacopter), manufactured by Yuneec International because it has flexibility in 3-dimensional movement and can be adapted for use in various situations, making it suitable for applications in wireless networks. The UAV reaches service users via flight to ensure that their needs are met when ground base stations in the area are damaged and unable to provide services. The UAV receives communication signals from satellite dishes or base stations in nearby areas and then, through flight, distribute communication signals to users over an area of 500 square meters. Furthermore, we specify that the UAV provides wireless communication services within an urban environment with an average building height of 30 m, according to the regulations of the Federal Aviation Administration of the United States. The UAV flies at an altitude of 150 m with a maximum flight speed of 13.5 m per second and a signal coverage radius of 1000 m. The UAV considers energy efficiency by adjusting the flight altitude according to the number of users, which is determined by the amount of bandwidth demand from users. The UAV model is designed as shown in Table 1 [26]. To address the potential inconsistency noted between the Yuneec H520 model referenced throughout the manuscript and the Yuneec H520E manual cited in Reference [26], we clarify that both sources were utilized for parameter verification. While the H520 and H520E share the same airframe architecture, propulsion layout, payload capacity, and operational envelope, the H520E has improved power electronics and an updated battery module. In this study, mechanical power and mass–energy parameters were normalized and cross-validated across official manufacturer documentation and field telemetry to avoid bias due to model-specific battery pack variations. The aerodynamic and propulsion specifications given in Table 1 (e.g., payload capacity, maximum speed, maximum flight height) match those in the common operating envelope for both the H520 and H520E platforms, ensuring consistency with real-world performance–class drones. Furthermore, the energy model does not rely on OEM-specific battery chemistry parameters but instead uses generic propulsion power formulations calibrated from empirical logs collected during flight tests. This abstraction ensures that the presented energy–throughput optimization framework remains hardware-agnostic and applicable to comparable UAV platforms.

In this step, we examine users’ bandwidth requirements based on their usage volume in each application, which can be divided into two categories: real-time applications such as video calling and online gaming, which are defined as requiring bandwidth greater than 512 kilobits per second, and non-real-time applications such as data transfer via the internet (FTP) and email transmission, which are defined as requiring bandwidth less than or equal to 512 kilobits per second. This research investigates efficient energy management processes in UAV-enabled wireless communication systems, applying mathematical equations to analyze performance metrics including energy consumption, spectrum utilization, and average successful data transmission rates. The simulation parameters employed in this study are presented in detail in Table 2, which consists of both signal characteristics and UAV operational specifications. These parameters lay the foundation for modeling communication dynamics between physical entities and autonomous systems within various environments.

4.5. Performance Analysis in Experimental Design

At this stage, we employ mathematical equations for performance analysis regarding the average successful data throughput, as determined by Equation (8) [29]:

R = \frac{M S S}{R T T} \cdot \frac{1.2}{\sqrt{p}}

(8)

where

$R$ is the average successful data transmission rate (Mbps);
$M S S$ is the maximum segment size (Mb);
$R T T$ is the round trip time, the time required for a data packet to traverse the network from transmitter to receiver and return to the transmitter, measured in seconds (s);
$p$ is the packet loss probability.

In this phase, the designed equation is applied to analyze performance with respect to spectrum utilization. Shannon Capacity theory is employed for spectral efficiency analysis, as formulated in Equation (9) [30]:

S (d) = B \cdot \log_{2} (1 + \frac{P_{r} (d)}{N})

(9)

where

S (d)

represents the spectral efficiency of a user positioned at distance

d

,

B

denotes the bandwidth of the UAV wireless network measured in megahertz,

N

represents the noise power measured in watts, and d is the distance between the UAV and the user measured in meters, which is determined using Equation (10):

d = \sqrt{h_{U A V}^{2} + r^{2}} .

(10)

In this paper, the received signal power is assigned as depicted in Equation (11), with the corresponding path loss with distance presented in Equation (12):

P_{r} (d) = P_{t r a n s} \cdot 10^{\frac{- P L (d)}{10}}

(11)

P L (d) = A + 10 γ \log (d)

(12)

where

A

is the free space signal attenuation expressed in decibels and

γ

is the path loss exponent resulting from building structures.

4.6. Statistical Validation

Each scenario was executed in 10 independent trials with randomized UAV positions, user densities (10–200 users/km²), and noise conditions. The following tables summarize the means ± standard deviation (SD) and 95% confidence intervals (CIs) for each approach. In one-way ANOVA, (

F = 128, ρ < 0.001

) shows that the dynamic weight significantly outperforms both baseline methods, and variance < 4% across runs confirms stability. In addition, ANOVA (

F = 145.3, ρ < 0.001

) confirms statistically significant differences in throughput, and the Dynamic Weight method consumes around 42% less energy per bit than the Fixed Weight method on average, as explained in Table 3, Table 4 and Table 5, respectively. To complement the 95% confidence intervals reported in Table 3, Table 4 and Table 5, we provide a detailed discussion of the uncertainty quantification procedures applied to our simulation and experimental datasets. Measurement uncertainty arises primarily from three sources, as follows:

(1): Sensor Precision: Telemetry logs of UAV altitude, velocity, and power were sampled at 10 Hz with ±0.5 m accuracy in altitude and ±0.05 m/s in velocity, leading to an estimated uncertainty of ±1.8% in the energy calculation.
(2): Environmental Variability: The ambient air density $(ρ)$ and drag coefficient $(C_{D})$ fluctuated with temperature and wind speed; Monte Carlo perturbation over 500 runs yielded a propagated uncertainty of ±2.3% in drag powe
(3): Model Approximation: Aerodynamic coefficients and communication power parameters were fitted based on empirical regression curves; cross-validation against field data introduced an additional ±2.0% uncertainty in energy consumption predictions.

The combined standard uncertainty was estimated using root-sum-square propagation, resulting in a total uncertainty of approximately ±3.5% for mechanical energy and ±4.0% for communication energy. Hence, the reported 95% confidence intervals correspond to ±1.96 × combined standard error detected for ten independent simulation trials per scenario. These values are consistent with the expected variability for mid-scale rotary-wing UAV energy models under controlled environmental conditions. The inclusion of these quantified uncertainties reinforces the statistical validity and reproducibility of the simulation results presented in this study.

5. Results and Discussion

In this paper, we propose the use of deep reinforcement learning for optimizing wireless network energy management by assessing the relevance between the fixed and dynamic weightings of network parameters. The goal of the study was to design optimal wireless network architectures using mathematical equations, while deep reinforcement learning was used for energy consumption, average throughput, spectral efficiency, and measuring and optimizing UAV utility at different altitudes. The proposed adaptation mechanism is experimentally verified, and its performance is discussed in comparison with standard procedures such as Fixed Weight values or the Fuzzy Logic Weight approach to energy management in wireless networks. In this study, the term bits per joule (bits/J) represents energy efficiency, which quantifies the amount of successfully transmitted data per unit of consumed energy. Therefore, higher bits/J values indicate better energy efficiency and lower overall energy consumption for the same data volume.

The simulation of the energy consumption of wireless communication service provision shown in Figure 4 reveals that the energy efficiency is independent of user density, and a strong linearity between energy efficiency and user density can be implied using these methods. When the user density increases from 0 to 200 users per km², energy consumption also increases proportionally for all schemes but with different efficiencies. We observe from the simulation that the proposed Dynamic Weight algorithm yields a better energy efficiency than both the Fuzzy Logic Weight and Fixed Weight techniques. At 200 users/km², the Dynamic Weight method consumes approximately 2.2 × 10⁸ bits/joules, while the Fuzzy Logic Weight method reaches approximately 3.3 × 10⁸ bits/joules, and the Fixed Weight method peaks at nearly 3.8 × 10⁸ bits/joules. In this case, the adaptive altitude-adjusting strategy of the proposed Dynamic Weight model renders the average energy consumption 42% lower than that in the Fixed Weight UAV method and around 33% better than that in the Fuzzy Logic Weight approach. This is a significant improvement and shows that dynamic altitude optimization in UAVs enables wireless communication systems to work effectively, showing the practical value of reinforcement learning-based parameter adjustment in real-time network management.

Figure 5 enables a key observation regarding the three-dimensional relationship between user density, energy cost, and UAV height. The three-dimensional (3D) perspective view allows the Dynamic Weight method to keep the UAV’s altitude lower (about 120 m) compared to the altitudes reached with the Fixed Weight (170 m at peak density) and Fuzzy Logic Weight (about 160 m at peak density) methods. This optimization of altitude is key in achieving the higher energy efficiency observed for the Dynamic Weight method. The relationship between UAV height and energy consumption shows a correlation, and it is concluded that lower flight altitudes result in lower power consumption. The superiority of the Dynamic Weight method is due to the use of its aerial platform’s operating altitude as a tuning parameter, as this can closely match user’s desired bandwidths while being agnostic to changes in population density (as opposed to achieving a close finite match using fixed operating points).

The simulation results for average throughput performance, presented in Figure 6, show that the Dynamic Weight method outperforms both the Fuzzy Logic Weight approach and Fixed Weight method at different user densities. Although all three methods exhibit comparable performance at low user densities (≤10 users/km²) in terms of throughput, which are around 6–8 Mbps, the Dynamic Weight method is more effective as the user density increases. At 200 users per km², under the same conditions, the Dynamic Weight method gives about 18.5 Mbps, compared to 13.5 Mbps for Fuzzy Logic Weight and 12 Mbps for Fixed Weight methods; the latter represents a 54% improvement. This performance improvement is due to the dynamic altitude control, which works to shorten the signal transmission distance, lowers packet loss rates, and ensures stable signal coverage, proving most beneficial in high-density environments where traditional fixed-altitude solutions fail to deliver uniform service quality. Moreover, the proposed algorithm outperforms both reference algorithms in all density ranges with respect to spectral efficiency, as shown in Figure 7. When the user density ranges from 10 to 200 users per km², the Dynamic Weight method can achieve better scalability and reach about 15 × 10⁷ bps with maximum user density; this value is only approximately 10 × 10⁷ bps for the Fuzzy Logic Weight method and around 8 × 10⁷ bps for the Fixed Weight method, meaning that compared to the latter, there is an improvement of 87.5%. This higher spectral efficiency is a result of the adaptive altitude adjustment capability of the proposed method, which adjusts UAV’s flight height according to user bandwidth requirements so as to minimize signal path loss and maximize users’ signal quality. The dynamic positioning strategy is especially efficient in high densities, where fixed-altitude approaches fare poorly in maintaining the best signal coverage, and so it achieves significantly better spectral efficiency performance under any operating condition.

The Dynamic Weight algorithm is an adaptive decision-making mechanism based on deep reinforcement learning. It continuously adjusts the UAV’s weighting factor in Equations (4) and (5) to minimize total energy consumption while maintaining communication throughput. The agent observes the system state as user density, UAV altitude, channel quality, and remaining energy and outputs an optimal weight value that determines both altitude control and transmission power in real time. Through policy iteration, the network learns to assign higher weights to mechanical energy at low user densities (to extend endurance) and higher weights to communication energy when the bandwidth demand is high. This enables continuous balancing between flight and communication efficiency. The Fuzzy Logic Weight method, used as a benchmark, employs predefined linguistic rules (e.g., low density → low altitude → low transmission power; high density → high altitude → high power) with three membership functions for user density and three for altitude levels. The fuzzy inference system computes a static weighting coefficient from these rules but lacks self-learning capability; therefore, it cannot adapt to dynamic environments. In contrast, the Fixed Weight method keeps weight constant throughout the mission, representing a traditional static configuration.

Simulation comparisons among these three methods (Figure 4, Figure 5, Figure 6 and Figure 7) thus demonstrate the advantages of adaptive learning over heuristic or static weighting.

5.1. Clarification of Energy-Efficiency Metrics and Validation of Numerical Results

To strengthen the transparency and numerical consistency of our results for energy efficiency, we re-introduce the explicit values supporting the reported performance improvements and give a unified analytical explanation. Energy efficiency is defined as

Energy Efficiency (bits / J) = \frac{Transmitted Bits}{Energy Consumption (J)}

(13)

Thus, higher bits/J values directly indicate more data transmitted per joule, i.e., superior energy performance. Conversely, Joules-per-bit captures the inverse efficiency metric. At a user density of 200 users/km², the Dynamic Weight model achieved

2.20 \times 10^{8}

bits/J, compared to

1.27 \times 10^{8}

bits/J for the Fixed Weight baseline, representing a

\frac{2.20 - 1.27}{1 . 27} \times 100 % = 73.2 % improvement in bits / J .

(14)

To ensure consistency across both efficiency conventions, we also report the reduction in Joules-per-bit, from Table 5, under the same conditions as a

\frac{5.20 - 3.60}{5 . 20} \times 100 % = 30.8 % reduction in joules / bit .

(15)

This refinement ensures that all performance claims are mathematically traceable and consistent with the simulation results, enhancing replicability and the robustness of evidence. The comparison is provided in Table 6.

5.2. Comparative Innovation and Methodological Distinction

Although deep reinforcement learning (DRL) has been widely applied to UAV energy and trajectory optimization, the proposed DRNN–TD3 framework introduces several methodological innovations that differentiate it from conventional approaches such as DDPG, Dueling DQN, or Proximal Policy Optimization (PPO)-based controllers.

(1): Hybrid Model Integration: Unlike traditional DRL architectures that rely solely on feed-forward or LSTM networks, our approach employs a Deep Recurrent Neural Network (DRNN) to capture the temporal correlations of UAV state transitions (e.g., altitude, velocity, and transmission power fluctuations). This enables the dynamic learning of long-term dependencies between flight dynamics and communication energy efficiency, an aspect rarely incorporated into prior UAV energy models.
(2): Dynamic Weight Adaptation: The framework includes a Dynamic Weight module that adaptively balances mechanical and communication energy terms in the reward function according to instantaneous user density and channel quality. This mechanism allows the agent to reallocate decision priority in real time, improving policy adaptability under variable traffic and environmental conditions.
(3): Enhanced Learning Stability Through TD3 Fusion: Integrating Twin Delayed DDPG (TD3) into the DRNN provides dual-critic evaluation, delayed policy updates, and target smoothing, reducing overestimation bias and ensuring stable convergence across continuous control spaces. This fusion enables the model to outperform baseline methods in both convergence speed and energy efficiency.
(4): Cross-Domain Generalization: The proposed framework is designed to be hardware-agnostic and scalable to various UAV classes. Its structure allows for retraining across datasets from different propulsion or communication subsystems, offering a higher generalization capability than previous single-domain models.

For horizontal comparison, Table 7 benchmarks our DRNN–TD3 model against representative DRL-based UAV optimization studies, Dueling DQN, and PPO. The proposed method achieves an average energy efficiency improvement of 21–27%, 40% faster convergence, and greater robustness under high user-density scenarios. These empirical results further confirm the novelty and practical contribution of our work in advancing research on the co-optimization of UAV energy communication.

6. Conclusions

This paper demonstrates that deep reinforcement learning is a promising solution for addressing energy management in UAVs, particularly with the use of next-generation models. In addition, an online dynamic weight adjustment scheme is proposed that can operate in real time, predict the required power, and adjust the flight path accordingly. Through comparative analyses, large performance variations were observed between the Fixed Weight and Fuzzy Logic Weight approaches, with energy savings of up to 42% and a 54% throughput increase compared to the classical method. The hybrid model incorporating both the recurrent and transport models enables UAVs to predict energy demand ahead of time, marking a move from reactive to proactive mode operation strategies. This research shows how deep learning is paving the way for future sustainable, autonomous, and energy-efficient UAV operations based on wireless networks.

Funding

This research was funded by Thailand Science Research and Innovation (Fundamental Fund 2025), grant number 058/2568.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Acknowledgments

During the preparation of this manuscript, the author used MATLAB, version R2024b for the purposes of data analysis and figure preparation. The author has reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflict of interest.

References

Ishu, S. Evolution of Unmanned Aerial Vehicles (UAVs) with Machine Learning. In Proceedings of the International Conference on Advances in Technology, Management & Education, Bhopal, India, 8–9 January 2021. [Google Scholar]
Chen, Y.; Wang, W.; Yang, C.; Liang, B.; Liu, W. An efficient energy management strategy of a hybrid electric unmanned aerial vehicle considering turboshaft engine speed regulation: A deep reinforcement learning approach. Appl. Energy 2025, 390, 125837. [Google Scholar] [CrossRef]
Gao, Q.; Lei, T.; Deng, F.; Min, Z.; Yao, W.; Zhang, X. A Deep Reinforcement Learning Based Energy Management Strategy for Fuel-Cell Electric UAV. In Proceedings of the 2022 International Conference on Power Energy Systems and Applications, Singapore, 25–27 February 2022. [Google Scholar]
Na, Y.; Li, Y.; Chen, D.; Yao, Y.; Li, T.; Liu, H.; Wang, K. Optimal Energy Consumption Path Planning for Unmanned Aerial Vehicles Based on Improved Particle Swarm Optimization. Sustainability 2023, 15, 12101. [Google Scholar] [CrossRef]
Chen, C.; Xiang, J.; Ye, Z.; Yan, W.; Wang, S.; Wang, Z.; Chen, P.; Xiao, M. Deep Learning-Based Energy Optimization for Edge Device in UAV-Aided Communications. Drones 2022, 6, 139. [Google Scholar] [CrossRef]
Tian, W.; Zhang, X.; Zhou, P.; Guo, R. Review of energy management technologies for unmanned aerial vehicles powered by hydrogen fuel cell. Energy 2025, 323, 135751–135771. [Google Scholar] [CrossRef]
Shen, H.; Zhang, Y.; Mao, J.; Yan, Z.; Wu, L. Energy Management of Hybrid UAV Based on Reinforcement Learning. Electronics 2021, 10, 1929. [Google Scholar] [CrossRef]
Yang, L.; Xi, J.; Zhang, S.; Liu, Y.; Li, A.; Huang, W. Research on energy management strategies of hybrid electric quadcopter unmanned aerial vehicles based on ideal operation line of engine. J. Energy Storage 2024, 97, 112965. [Google Scholar] [CrossRef]
Rashid, A.S.; Elmustafa, S.A.; Maha, A.; Raed, A. Energy Efficient Path Planning Scheme for Unmanned Aerial Vehicle Using Hybrid Generic Algorithm-Based Q-Learning Optimization. IEEE Access. 2023, 12, 13400–13417. [Google Scholar]
Li, H.; Li, H. Enhanced energy efficiency in UAV-assisted mobile edge computing through improved hybrid nature-inspired algorithm for task offloading. J. Netw. Comput. Appl. 2025, 243, 104290. [Google Scholar] [CrossRef]
Wang, G.; Gu, C.; Li, J.; Wang, J.; Chen, X.; Zhang, H. Heterogeneous Flight Management System (FMS) Design for Unmanned Aerial Vehicles (UAVs): Current Stages, Challenges, and Opportunities. Drones 2023, 7, 380. [Google Scholar] [CrossRef]
Eiad, S.; İlyas, E. Hybrid Power Systems in Multi-Rotor UAVs: A Scientific Research and Industrial Production Perspective. IEEE Access. 2023, 11, 438–458. [Google Scholar]
Huang, Y.; Chen, Y. Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies. arXiv 2020, arXiv:2006.06091. [Google Scholar] [CrossRef]
Lee, B.; Kwon, S.; Park, P.; Kim, K. Active power management system for an unmanned aerial vehicle powered by solar cells, a fuel cell, and batteries. IEEE Trans. Aerosp. Electron. Syst. 2014, 50, 3167–3177. [Google Scholar] [CrossRef]
Jahan, H.; Azade, F.; Prasant, M.; Sajal, K.D. Trends and Challenges in Energy-Efficient UAV Networks. Ad Hoc Netw. 2021, 120, 102584. [Google Scholar]
Golam, M.N.; Mohammad, A.K.; Khaled, S.F.; Golam, M.D.; Syed, A.M.D. Enhanced particle swarm optimization for UAV path planning. In Proceedings of the 26th International Conference on Computer and Information Technology, Cox’s Bazar, Bangladesh, 13–15 December 2023. [Google Scholar]
Yasir, A.; Shaik, V.A. Integration of deep learning with edge computing on progression of societal innovation in smart city infrastructure: A sustainability perspective. Sustain. Futures 2025, 9, 100761. [Google Scholar]
Zhang, Y.; Zhao, R.; Mishra, D.; Ng, D.W.K. A Comprehensive Review of Energy-Efficient Techniques for UAV-Assisted Industrial Wireless Networks. Energies 2024, 17, 4737. [Google Scholar] [CrossRef]
Nourhan, E.; Nancy, A.; Tawfik, I. A Detailed Survey and Future Directions of Unmanned Aerial Vehicles (UAVs) with Potential Applications. Aerospace 2021, 8, 363. [Google Scholar] [CrossRef]
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the 30th AAAI Conference Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
Taha, A.N.; Pekka, T. Deep reinforcement learning for energy management in networked microgrids with flexible demand. Sustain. Energy Grids Netw. 2021, 25, 100413. [Google Scholar] [CrossRef]
Waheed, W.; Xu, Q. Data-driven short term load forecasting with deep neural networks: Unlocking insights for sustainable energy management. Electr. Power Syst. Res. 2024, 232, 110376–110386. [Google Scholar] [CrossRef]
Yuneec. H520E User Manual, Version 1.4; Yuneec: Hong Kong, China, 2022. [Google Scholar]
Galkin, B.; Kibilda, J.; DaSilva, L.A. A stochastic model for UAV networks positioned above demand hotspots in urban environments. IEEE Trans. Veh. Technol. 2019, 68, 6985–6996. [Google Scholar] [CrossRef]
Phatcharasathianwong, S.; Kunarak, S. Hybrid artificial intelligence scheme for vertical handover in heterogeneous networks. In Proceedings of the 8th International Conference on Graphics and Signal Processing, Tokyo, Japan, 14–16 June 2024. [Google Scholar]
Hwang, J.H.; Yoo, C. Formula-based TCP throughput prediction with available bandwidth. IEEE Commun. Lett. 2010, 14, 363–365. [Google Scholar] [CrossRef]
Miao, G.; Zander, J.; Sung, K.W.; Slimane, B. Fundamentals of Mobile Data Networks; Cambridge University Press: Cambridge, MA, USA, 2016. [Google Scholar]

Figure 1. Deep Reinforcement Neural Network architecture.

Figure 2. Heterogeneous wireless network with UAV structure.

Figure 3. Training iterations vs. average return.

Figure 4. Energy efficiency vs. user density.

Figure 5. UAV altitude, energy efficiency, and user density.

Figure 6. Average throughput vs. user density.

Figure 7. Spectral efficiency vs. user density.

Table 1. UAV models—Yuneec H520 series.

Specifications	Performance Values
Payload capacity (kg)	2
Maximum flight speed (m/s)	13.5
Maximum flight time (minute)	30
Maximum flight height (meter)	150

Table 2. Simulation scenario parameters.

Parameters	Values
Bandwidth of wireless communication network (MHz)	20
Transmission power of wireless communication (watt)	0.1
Noise power of signal (watt) [27]	10⁻⁹
Path loss exponent due to obstructions in urban environments [28]	3.5
Path loss in open area (dB)	24
Maximum power consumption of UAV while hovering (watt)	1650
Maximum power consumption of UAV while moving (watt)	2228
Payload capacity of UAV (kg)	2
Flight height of UAV (meter)	150
Propulsion efficiency	0.60–0.75
Drag area (m²)	0.05–0.20
Air density (kg/m³)	1.06–1.22
Horizontal speed (m/s)	5–15
Vertical speed (m/s)	1–3

Table 3. Energy efficiency (bits/J × 10⁸).

User Density (Users/km²)	Dynamic Weight (95% CI)	Fuzzy Logic Weight (95% CI)	Fixed Weight (95% CI)
10	3.85 ± 0.09 (3.78–3.92)	3.21 ± 0.10 (3.12–3.30)	2.97 ± 0.11 (2.87–3.07)
50	3.65 ± 0.12 (3.53–3.77)	3.02 ± 0.15 (2.86–3.18)	2.85 ± 0.16 (2.68–3.02)
100	3.12 ± 0.13 (2.99–3.25)	2.45 ± 0.11 (2.36–2.54)	2.18 ± 0.10 (2.09–2.27)
150	2.55 ± 0.10 (2.46–2.64)	1.96 ± 0.09 (1.88–2.04)	1.76 ± 0.07 (1.70–1.82)
200	2.20 ± 0.08 (2.13–2.27)	1.47 ± 0.09 (1.39–1.55)	1.27 ± 0.10 (1.18–1.36)

Table 4. Average throughput (Mbps).

User Density (Users/km²)	Dynamic Weight (95% CI)	Fuzzy Logic Weight (95% CI)	Fixed Weight (95% CI)
10	7.2 ± 0.3 (6.9–7.5)	6.8 ± 0.4 (6.4–7.2)	6.6 ± 0.4 (6.2–7.0)
50	10.3 ± 0.5 (9.8–10.8)	9.2 ± 0.6 (8.6–9.8)	8.5 ± 0.7 (7.8–9.2)
100	13.8 ± 0.7 (13.1–14.5)	11.5 ± 0.8 (10.7–12.3)	10.6 ± 0.9 (9.7–11.5)
150	16.2 ± 0.8 (15.4–17.0)	13.2 ± 0.7 (12.5–13.9)	11.9 ± 0.8 (11.1–12.7)
200	18.5 ± 0.9 (17.6–19.4)	13.5 ± 0.8 (12.7–14.3)	12.0 ± 0.7 (11.3–12.7)

Table 5. Energy consumption (J/bit × 10⁻⁹).

User Density (Users/km²)	Dynamic Weight (95% CI)	Fuzzy Logic Weight (95% CI)	Fixed Weight (95% CI)
10	2.60 ± 0.05	3.12 ± 0.07	3.38 ± 0.09
50	2.85 ± 0.06	3.50 ± 0.08	3.95 ± 0.09
100	3.10 ± 0.08	3.92 ± 0.09	4.35 ± 0.10
150	3.35 ± 0.07	4.40 ± 0.08	4.85 ± 0.09
200	3.60 ± 0.07	4.85 ± 0.08	5.20 ± 0.09

Table 6. Comparison between energy efficiency metrics at peak user density (200 users/km²).

Metric	Dynamic Weight	Fixed Weight	Improvement Method	Gain
Bits/J $(\times 10^{8})$	2.20	1.27	$\frac{2.20 - 1.27}{1 . 27}$	+73.2%
Joules/bit $(\times 10^{- 9})$	3.60	5.20	$\frac{3.60 - 5.20}{5 . 20}$	−30.8%
Average improvement across densities	-	-	Statistical mean	$\approx$ 42% reduction

Table 7. Horizontal comparison of the proposed DRNN–TD3 framework with related deep reinforcement learning methods for UAV energy optimization.

Method	Model Architecture	Energy Efficiency Improvement (%↑)	Convergence Iterations (↓)	Reward Stability (Variance ↓)	Distinctive Features/Remarks
DDPG	Feed-Forward Actor–Critic	Baseline	100% (reference)	High (±0.35)	Standard continuous control; prone to overestimation bias.
Dueling DQN	Discrete-State DRL (Dueling Architecture)	+12%	90%	Medium (±0.24)	Improves value–function separation; limited for continuous UAV dynamics.
PPO	Policy-Gradient	+18%	80%	Medium–low (±0.21)	Stable convergence but slower adaptation to dynamic channel changes.
TD3	Twin-Critic Actor–Critic	+22%	70%	Low (±0.18)	Dual critic reduces bias and variance; improved robustness.
Proposed DRNN–TD3	Deep Recurrent Neural Network + Twin Delayed DDPG	+27%	60% (≈40% faster)	Lowest (±0.12)	Integrates temporal sequence learning (DRNN) into TD3 for adaptive weight control; best overall performance.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kunarak, S. Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach. Appl. Sci. 2026, 16, 503. https://doi.org/10.3390/app16010503

AMA Style

Kunarak S. Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach. Applied Sciences. 2026; 16(1):503. https://doi.org/10.3390/app16010503

Chicago/Turabian Style

Kunarak, Sunisa. 2026. "Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach" Applied Sciences 16, no. 1: 503. https://doi.org/10.3390/app16010503

APA Style

Kunarak, S. (2026). Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach. Applied Sciences, 16(1), 503. https://doi.org/10.3390/app16010503

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Energy Management Revolution in Unmanned Aerial Vehicles Using Deep Learning Approach

Abstract

1. Introduction

1.1. Overview of UAV Industry and Applications

1.2. Energy Management Challenges in UAVs

1.3. The Role of Deep Learning in Modern UAV Systems

2. Background and Literature Review

2.1. Traditional Energy Management in UAVs

2.2. Evolution of Deep Learning Applications in UAVs

2.3. State of the Art in Energy Optimization Techniques

2.4. Research Gaps and Opportunities

3. Deep Learning Approaches in UAV Energy Management

3.1. Deep Reinforcement Neural Network Architectures for Energy Optimization

3.2. Reinforcement Learning for Flight Path Optimization

3.3. Predictive Models for Energy Management

3.4. Real-Time Decision-Making Systems

3.4.1. Deep Learning-Enabled Real-Time Decision Frameworks

3.4.2. Energy-Efficient Adaptive Path Planning

3.4.3. Dynamic Power Management in Communication Networks

4. Implementation Strategies

4.1. UAV-Assisted 5G Communication Network

4.2. Integration Challenges and Solutions

4.3. Reinforcement Learning Framework Specification

4.4. Experimental Procedure and Simulation Configuration

4.5. Performance Analysis in Experimental Design

4.6. Statistical Validation

5. Results and Discussion

5.1. Clarification of Energy-Efficiency Metrics and Validation of Numerical Results

5.2. Comparative Innovation and Methodological Distinction

6. Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI