1. Introduction
The rapid expansion of wireless networks and the emergence of next-generation 6G systems have created a demand for highly flexible and intelligent communication infrastructures. Unmanned aerial vehicles (UAVs) have recently attracted significant attention as aerial communication nodes due to their rapid deployment capability, adaptive coverage, and enhanced line-of-sight (LoS) connectivity [1,2,3]. When multiple UAVs cooperate in a network, they can provide large-scale, on-demand coverage for heterogeneous users across wide geographic areas, supporting delay-sensitive and energy-constrained IoT applications [1,4]. However, the performance of such networks is constrained by the trade-offs among energy consumption, end-to-end latency, and inter-UAV connectivity, especially under dynamic channel conditions and co-channel interference [2,3].
Recent advances in the emerging low-altitude economy have further expanded the role of UAV networks beyond traditional aerial communication relays. For example, UAV-enabled Integrated Sensing and Communication (ISAC) systems allow aerial platforms to simultaneously perform wireless communication and environmental sensing, improving spectrum utilization and situational awareness. Joint trajectory and beamforming optimization for UAV-based ISAC has been investigated to enhance communication throughput while satisfying sensing requirements [5]. In addition, UAV-assisted Mobile Edge Computing (MEC) has emerged as a promising paradigm for supporting computation-intensive Internet-of-Things (IoT) applications, where adaptive multi-objective optimization frameworks can improve task completion efficiency, energy utilization, and system scalability in dynamic environments [6]. These emerging applications highlight the growing importance of efficient multi-UAV communication architectures capable of adaptive topology control, interference mitigation, and energy-aware resource allocation.
Traditional optimization frameworks, including convex relaxation, game-theoretic formulations, and heuristic power-control strategies, have been widely used to allocate communication resources such as transmit power and channel assignments for improved energy efficiency [1,2]. Despite their success, these approaches require accurate system models and centralized computation, making them unsuitable for large-scale, real-time multi-UAV environments.
Deep reinforcement learning (DRL) has recently emerged as a powerful tool for addressing high-dimensional and non-convex optimization problems in UAV communication networks. DRL frameworks such as Deep Q-Networks (DQN) and Double DQN (DDQN) have been successfully applied to adaptive power control, trajectory planning, and resource allocation without requiring explicit channel modeling [7,8,9,10,11,12]. However, most existing DRL-based solutions primarily focus on physical-layer parameters, such as transmit power or UAV mobility, while assuming fixed or implicitly defined network topologies.
From a network-level perspective, the inter-UAV graph structure plays a critical role in determining interference coupling, energy consumption, and connectivity robustness. Although several topology-control studies emphasize connectivity preservation and routing stability [13,14], they rarely integrate graph-level metrics with physical-layer resource optimization or end-to-end energy–latency objectives. In particular, graph density—which quantifies the fraction of active inter-UAV links—has a direct impact on both spectral efficiency and energy expenditure, yet remains largely unexplored in learning-based UAV optimization frameworks.
Moreover, standard DQN-based approaches often suffer from slow convergence and overestimation bias when applied to large and highly coupled action spaces, such as those arising from joint power allocation, association control, and topology adaptation. These limitations motivate the adoption of enhanced architectures, including dueling network decomposition, which separates state-value and action-advantage estimation to improve training stability and sample efficiency [15,16,17,18]. This observation further supports the need for a topology-aware and architecturally robust DRL framework for multi-UAV networks.
The interplay between network topology and physical-layer performance introduces a critical trade-off: a dense inter-UAV graph improves connectivity and reliability but increases interference and energy consumption, whereas a sparse topology reduces interference but risks network fragmentation and service disruption. Capturing this trade-off within a reinforcement learning framework requires careful reward design and constraint handling.
To address these challenges, this paper proposes a Dueling Deep Q-Network (Dueling DQN) framework for the joint optimization of transmit power, link association, and graph density in multi-UAV IoT networks. By decomposing the Q-value into a state-value and advantage component, the Dueling DQN improves learning stability, convergence speed, and policy generalization compared to conventional DQN and DDQN methods. Additionally, graph density is incorporated as a controllable parameter, allowing the UAV network to adapt its topology to balance energy efficiency, latency, and connectivity robustness. A reward normalization mechanism is applied to properly scale energy efficiency and latency metrics, addressing differences in units and ensuring meaningful joint optimization.
Unlike prior works that independently address power control, routing, or topology design in UAV networks, the proposed framework integrates these components into a unified reinforcement learning–assisted optimization architecture. The introduction of graph-density–aware topology control together with a two-timescale hybrid learning–analytical optimization distinguishes the proposed approach from existing UAV communication optimization methods.
The key contributions of this work are as follows. First, we formulate a novel joint optimization problem that integrates energy efficiency, latency, and graph density control, while considering realistic SINR, interference, and connectivity constraints. Second, we develop a Dueling DQN reinforcement learning framework capable of jointly learning optimal transmit power levels, link associations, and network topology configurations, improving convergence stability and decision robustness. Third, we investigate the impact of graph density on the energy–latency trade-off and demonstrate how adaptive topology regulation enhances connectivity and overall network performance. Finally, extensive simulations validate the proposed approach against baseline schemes, including conventional DQN, DDQN, heuristic, and equal power allocation strategies, demonstrating consistent improvements in energy efficiency, latency, and link reliability across diverse scenarios.
The remainder of this paper is organized as follows. Section 2 provides a review of related work. Section 3 describes the system model and problem formulation, followed by Section 4, which presents the proposed Dueling DQN-based optimization framework. Section 5 discusses simulation results and performance analysis, and Section 6 concludes the paper with future research directions.
2. Related Work
Energy and latency optimization in UAV-assisted communication systems has been extensively studied, particularly in the context of 6G and IoT networks. Many works focus on power or trajectory control, often neglecting topology-level parameters such as inter-UAV graph density, which directly influence connectivity robustness and interference dynamics.
Pervez et al. [1] formulated a joint communication–computation optimization problem for UAV-assisted MEC networks using iterative and water-filling methods. While demonstrating significant gains in energy and latency, the approach assumes a fixed UAV connectivity graph and centralized information.
Federated and edge learning techniques have been integrated into UAV resource management. Tang et al. [19] used federated learning and Lyapunov-based control to minimize energy and training latency, and Yuan et al. [20] proposed layered task offloading to reduce system energy. However, both methods rely on static inter-UAV links and do not incorporate adaptive topology control.
Multi-agent reinforcement learning (MARL) has been applied for distributed UAV coordination. Betalo et al. [21] employed a Multi-Agent DQN to enhance data freshness and energy harvesting, whereas Wang et al. [22] used DDQN for trajectory planning in obstacle-rich environments. Both studies, however, focus on single-dimensional objectives (scheduling or trajectory) and do not consider joint energy–latency–topology optimization.
Other approaches, including DRL combined with IRS or edge deployment strategies [17,23,24], improve energy efficiency or latency but lack explicit modeling of inter-UAV graph density or connectivity adaptation.
Unlike the above studies, which primarily optimize transmit power, trajectory, or scheduling under fixed connectivity assumptions, the proposed framework explicitly treats graph density as a controllable decision variable within the learning process. This enables true joint optimization of energy efficiency, end-to-end latency, and network topology under dynamic interference conditions. By integrating topology adaptation into the reinforcement learning architecture, the proposed approach captures the coupled relationship between connectivity robustness and physical-layer performance, which remains largely unexplored in existing UAV communication studies.
In summary, existing works either optimize a single performance metric or assume static network topologies. In contrast, the proposed Dueling DQN-based framework explicitly incorporates graph density as a controllable variable, enabling adaptive trade-offs between energy consumption, latency, and inter-UAV connectivity. To the best of our knowledge, this study is among the first to explicitly integrate graph-density adaptation with joint energy–latency optimization within a unified reinforcement learning framework.
3. System Model and Problem Formulation
We consider a multi-UAV communication network consisting of $M$ UAVs deployed over a target area to provide cooperative aerial connectivity, as illustrated in Figure 1. Each UAV operates as an aerial communication node capable of establishing wireless links with neighboring UAVs, forming a dynamically reconfigurable network topology. The UAVs collectively support data transmission and control signaling for the underlying ground network, which may include IoT devices, users, or terrestrial base stations.
The inter-UAV network topology is modeled as an undirected graph $G = (V, E)$, where $V$ denotes the set of UAVs and $E$ represents the set of active inter-UAV communication links [25]. The existence of a link between UAV $i$ and UAV $j$ is represented by a binary association variable [26]:
$$a_{ij} = \begin{cases} 1, & \text{if the link between UAV } i \text{ and UAV } j \text{ is active}, \\ 0, & \text{otherwise}. \end{cases}$$
The overall connectivity of the UAV network is quantified using the graph density metric $\rho$, defined as the ratio of active links to all possible UAV connections:
$$\rho = \frac{2\,|E|}{M(M-1)}.$$
Graph density serves as a system-level control variable that captures the tradeoff between connectivity robustness and interference intensity. A higher density improves routing reliability and cooperation among UAVs, while a lower density reduces interference and energy consumption.
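For illustration, the density metric above can be computed directly from an edge set; this is a minimal sketch, and the function name and edge-set representation are ours rather than the paper's:

```python
def graph_density(num_uavs: int, edges: set) -> float:
    """Density of an undirected graph: active links / all possible links,
    i.e. |E| / (M(M-1)/2)."""
    possible = num_uavs * (num_uavs - 1) / 2
    return len(edges) / possible if possible else 0.0

# Example: 4 UAVs with 3 active links out of 6 possible -> density 0.5
edges = {(0, 1), (1, 2), (2, 3)}
print(graph_density(4, edges))  # 0.5
```

A sparse topology (small `edges` set) lowers interference and energy cost, while a dense one improves routing reliability, matching the trade-off described above.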
Each UAV $i$ is located in three-dimensional space at coordinates $(x_i, y_i, h_i)$. The Euclidean distance between UAV $i$ and UAV $j$ is given by
$$d_{ij} = \sqrt{\Delta_{ij}^{2} + \Delta h_{ij}^{2}},$$
where $\Delta_{ij}$ and $\Delta h_{ij}$ denote the horizontal and vertical separations, respectively. Assuming dominant line-of-sight (LoS) propagation, the path loss between UAV $i$ and UAV $j$ is modeled as [27]
$$L_{ij} = L_0 + 10\,\alpha \log_{10}\!\left(\frac{d_{ij}}{d_0}\right).$$
For analytical clarity and controlled performance evaluation, the channel gain is modeled using a deterministic large-scale path-loss model. This modeling choice allows the impact of topology control, power allocation, and interference management to be clearly analyzed without additional variability introduced by fast fading. Nevertheless, the proposed reinforcement learning framework is not restricted to deterministic channels and can naturally accommodate stochastic channel variations (e.g., Rayleigh or Rician fading) by incorporating instantaneous channel realizations into the state representation.
The received power at UAV $j$ from UAV $i$ is
$$P^{\mathrm{rx}}_{ij} = P_{ij}\, g_{ij},$$
where $P_{ij}$ is the transmit power allocated to link $(i,j)$ and $g_{ij}$ is the linear-scale channel gain obtained from the path loss $L_{ij}$, with $L_0$ the reference path loss at distance $d_0$ and $\alpha$ the path-loss exponent. Note that although path loss and transmit power may be expressed in dB form for modeling convenience, all SINR and interference calculations in the simulations are performed using linear-scale power values after appropriate conversion.
The resulting signal-to-interference-plus-noise ratio (SINR) is expressed as
$$\gamma_{ij} = \frac{P_{ij}\, g_{ij}}{\sum_{(k,l)\in E,\,(k,l)\neq(i,j)} P_{kl}\, g_{kj} + \sigma^{2}},$$
where $\sigma^{2}$ denotes the additive white Gaussian noise power. The achievable data rate for link $(i,j)$ is
$$R_{ij} = B \log_{2}\!\left(1 + \gamma_{ij}\right),$$
with $B$ being the channel bandwidth. The energy efficiency (EE) of link $(i,j)$ is defined as the successfully transmitted data per unit transmission energy:
$$\eta_{ij} = \frac{R_{ij}}{P_{ij}}.$$
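The dB-to-linear conversion noted above, together with the SINR, rate, and EE computations, can be sketched as follows; the parameter names are illustrative, and the dBm-to-watt conversion subtracts 30 dB:

```python
import math

def db_to_linear(db: float) -> float:
    """Convert a dB quantity to linear scale."""
    return 10 ** (db / 10)

def link_metrics(p_tx_dbm, pl_db, interference_w, noise_w, bandwidth_hz):
    """SINR, achievable rate (bps), and EE (bps/W) for one link,
    computed with linear-scale power values as the paper prescribes."""
    p_rx_w = db_to_linear(p_tx_dbm - pl_db - 30)  # received power in watts
    sinr = p_rx_w / (interference_w + noise_w)
    rate = bandwidth_hz * math.log2(1 + sinr)     # Shannon rate
    p_tx_w = db_to_linear(p_tx_dbm - 30)
    ee = rate / p_tx_w                            # energy efficiency
    return sinr, rate, ee
```

For example, a 20 dBm transmitter over a 90 dB path loss yields a received power of $10^{-10}$ W, from which SINR, rate, and EE follow directly.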
The transmission latency for a packet of size $S$ is given by
$$T_{ij} = \frac{S}{R_{ij}} + \tau_{ij},$$
where $\tau_{ij}$ denotes the propagation delay. To ensure a meaningful joint optimization, both metrics are normalized as
$$\bar{\eta}_{ij} = \frac{\eta_{ij}}{\eta_{\max}}, \qquad \bar{T}_{ij} = \frac{T_{ij}}{T_{\max}},$$
where $\eta_{\max}$ and $T_{\max}$ represent reference maximum values. The joint optimization objective is formulated as a weighted, dimensionless utility function:
$$U_{ij} = w_1\,\bar{\eta}_{ij} - w_2\,\bar{T}_{ij},$$
where $w_1$ and $w_2$ control the relative importance of energy efficiency and latency. The network-wide optimization problem is expressed as
$$\max_{\{P_{ij}\},\,\{a_{ij}\},\,\rho}\; \sum_{(i,j)\in E} U_{ij}$$
subject to
$$0 \le P_{ij} \le P_{\max}, \qquad \gamma_{ij} \ge \gamma_{\min}, \qquad I_j \le I_{\max}, \qquad \sum_{j} a_{ij} \le D_{\max}, \qquad |\rho - \rho^{*}| \le \epsilon.$$
Here, $P_{\max}$ denotes the maximum transmit power, $\gamma_{\min}$ is the minimum SINR requirement, $I_{\max}$ represents the maximum tolerable interference at each UAV receiver, $D_{\max}$ limits the maximum number of connections per UAV, $\rho^{*}$ is the target graph density, and $\epsilon$ is a small tolerance parameter.
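As an illustrative aid, the constraint set can be checked for a candidate per-link solution as follows; all names and threshold arguments are ours, not the paper's:

```python
def feasible(p, sinr, interference, degree, rho, *,
             p_max, sinr_min, i_max, d_max, rho_target, eps):
    """Check the power, SINR, interference, degree, and graph-density
    constraints of the network-wide optimization problem."""
    return (0.0 <= p <= p_max          # transmit power budget
            and sinr >= sinr_min       # minimum SINR requirement
            and interference <= i_max  # interference tolerance
            and degree <= d_max        # per-UAV connection limit
            and abs(rho - rho_target) <= eps)  # density tolerance
```

Such a predicate is convenient for rejecting infeasible actions before they are scored by the learning agent.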
The resulting optimization problem is a mixed-integer nonlinear programming (MINLP) problem, characterized by non-convex fractional objectives and combinatorial topology constraints. The exponential growth of the solution space renders conventional optimization methods computationally infeasible for real-time multi-UAV networks, motivating the adoption of a learning-based solution, as described in the following section.
4. Solving the Optimization Problem Using Dueling DQN
The joint energy–latency optimization problem introduced in Section 3 is highly non-convex due to coupled interference, discrete link association variables, and the combinatorial nature of graph-density control. Classical optimization methods, such as convex relaxation or dual decomposition, become computationally intractable in large-scale multi-UAV networks and are unsuitable for real-time deployment. To address these challenges, a Dueling DQN framework is adopted, enabling UAVs to learn optimal policies for transmit power, link association, and network topology directly from interaction with the environment without requiring explicit channel or interference models [17]. As illustrated in Figure 2, the proposed Dueling DQN consists of an online network and a target network, where the action-value function is decomposed into state-value and advantage streams.
4.1. Two-Timescale Hybrid Optimization
To enhance learning efficiency and ensure feasibility, we adopt a two-timescale hybrid optimization approach. At the slower timescale, the Dueling DQN learns the optimal inter-UAV link associations and graph-density configuration. At the faster timescale, the transmit power of each active link is optimized using a Newton–Bisection solver under a fixed topology [28]. Formally, the hierarchical optimization can be expressed as
$$\max_{\mathbf{a},\,\rho}\; U\big(\mathbf{a}, \rho, \mathbf{P}^{*}(\mathbf{a},\rho)\big),$$
where the optimal power vector is obtained from
$$\mathbf{P}^{*}(\mathbf{a},\rho) = \arg\max_{\mathbf{0}\,\le\,\mathbf{P}\,\le\,P_{\max}\mathbf{1}} U(\mathbf{a},\rho,\mathbf{P}).$$
This decomposition reduces the action space for the DQN while preserving near-optimal continuous power allocation. It is important to emphasize that the Dueling DQN does not directly compute the final transmit power values. Instead, it determines high-level discrete decisions, including link association, graph-density configuration, and coarse power adjustment direction. Once these structural decisions are fixed, the Newton–Bisection solver performs continuous per-link power refinement to maximize the utility function under the current interference conditions. This hybrid design allows the reinforcement learning agent to focus on the combinatorial network optimization while the analytical solver efficiently handles continuous power optimization.
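The two-timescale decomposition can be sketched as follows. Here `candidate_topologies`, `solve_power`, and `utility` are hypothetical placeholders: a simplified enumeration stands in for the learned discrete policy, and `solve_power` stands in for the per-link Newton–Bisection refinement:

```python
def two_timescale_step(candidate_topologies, solve_power, utility):
    """Slow timescale: pick among discrete topology/association candidates.
    Fast timescale: refine power analytically for each fixed topology."""
    best = None
    for topo in candidate_topologies:
        power = solve_power(topo)       # inner continuous optimization
        score = utility(topo, power)    # joint energy-latency utility
        if best is None or score > best[0]:
            best = (score, topo, power)
    return best
```

In the actual framework the outer selection is performed by the Dueling DQN rather than exhaustive enumeration, which is what keeps the approach tractable at scale.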
4.2. Per-Link Power Optimization and Interference Awareness
For a given topology and association state, the utility of a link $(i,j)$ is defined as a weighted trade-off between normalized energy efficiency and latency:
$$U_{ij}(P_{ij}) = w_1\,\frac{\eta_{ij}(P_{ij})}{\eta_{\max}} - w_2\,\frac{T_{ij}(P_{ij})}{T_{\max}},$$
where $w_1$ and $w_2$ are weight coefficients, and $\eta_{\max}$, $T_{\max}$ denote the maximum expected values used for normalization to ensure both terms are dimensionless and comparable. With fixed interference $I_j$, the per-link utility reduces to a single-variable function of $P_{ij}$ through the SINR
$$\gamma_{ij} = c_{ij}\,P_{ij},$$
where $c_{ij} = g_{ij}/(I_j + \sigma^{2})$. The optimal per-link transmit power $P_{ij}^{*}$ satisfies the stationarity condition
$$\left.\frac{\partial U_{ij}(P_{ij})}{\partial P_{ij}}\right|_{P_{ij}=P_{ij}^{*}} = 0,$$
and is computed numerically using a Newton–Bisection solver (see Appendix A).
To incorporate interference coupling among UAVs, the interference at UAV $j$ is iteratively updated as
$$I_j = \sum_{(k,l)\in E,\,(k,l)\neq(i,j)} P_{kl}\, g_{kj},$$
allowing the power update to adapt to the current network state.
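A safeguarded Newton–Bisection root-finder for the stationarity condition $\partial U/\partial P = 0$ might look like the following generic sketch; it is not the paper's Appendix A implementation, and the derivative is approximated numerically:

```python
def newton_bisection(g, lo, hi, tol=1e-9, max_iter=100):
    """Find a root of g on [lo, hi], assuming g(lo) and g(hi) have
    opposite signs: take Newton steps while they stay bracketed,
    fall back to bisection otherwise."""
    g_lo = g(lo)
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        # shrink the bracket around the sign change
        if (gx > 0) == (g_lo > 0):
            lo, g_lo = x, gx
        else:
            hi = x
        # numerical Newton step; bisect if it escapes the bracket
        h = 1e-6 * max(1.0, abs(x))
        slope = (g(x + h) - g(x - h)) / (2 * h)
        x_new = x - gx / slope if slope else 0.5 * (lo + hi)
        x = x_new if lo < x_new < hi else 0.5 * (lo + hi)
    return x
```

The bracketing guard gives bisection's robustness when the utility derivative is poorly conditioned, while Newton steps provide fast local convergence near $P_{ij}^{*}$.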
4.3. Constraint Handling via Projection and Topology Pruning
Feasibility of the solution is guaranteed by projecting continuous power actions onto the allowable range:
$$P_{ij} \leftarrow \min\!\big(\max(P_{ij}, 0),\, P_{\max}\big).$$
Graph-density constraints are enforced by pruning the lowest-utility links such that the total number of active links satisfies
$$|E| \le \left\lfloor \rho^{*}\,\frac{M(M-1)}{2} \right\rfloor.$$
This deterministic pruning ensures connectivity while mitigating interference.
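The projection and utility-based pruning steps can be sketched as follows; the names are hypothetical, and the paper's pruning additionally preserves connectivity, which this minimal version omits:

```python
def project_power(p: float, p_max: float) -> float:
    """Project a continuous power action onto the feasible range [0, p_max]."""
    return min(max(p, 0.0), p_max)

def prune_to_density(link_utilities: dict, num_uavs: int, rho_target: float) -> set:
    """Keep only the highest-utility links so that
    |E| <= floor(rho_target * M(M-1)/2)."""
    budget = int(rho_target * num_uavs * (num_uavs - 1) / 2)
    ranked = sorted(link_utilities, key=link_utilities.get, reverse=True)
    return set(ranked[:budget])
```

Sorting by per-link utility makes the pruning deterministic: the same network state always yields the same retained topology.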
4.4. Markov Decision Process Formulation
For conciseness and to avoid repetition of widely known reinforcement learning formulations, only problem-specific elements of the MDP are detailed here, while standard definitions of Markov Decision Processes and DQN training mechanisms are summarized and properly referenced. This condensation enhances clarity without affecting mathematical completeness.
The Dueling DQN is trained on a Markov Decision Process (MDP) defined by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, r, \gamma)$ [29]. The state $s_t \in \mathcal{S}$ encodes the network conditions:
$$s_t = \big\{\gamma_{ij},\, e_i,\, \rho,\, a_{ij}\big\},$$
where $\gamma_{ij}$ is the link SINR, $e_i$ is the residual UAV energy, $\rho$ is the graph density, and $a_{ij}$ is the link association indicator. The action $a_t \in \mathcal{A}$ determines discrete adjustments in power and link associations. The reward function is defined as
$$r_t = \sum_{(i,j)\in E} U_{ij} \;-\; \lambda_1\,\mathbb{1}\{\gamma_{ij} < \gamma_{\min}\} \;-\; \lambda_2\,\mathbb{1}\{I_j > I_{\max}\} \;-\; \lambda_3\,\big|\rho - \rho^{*}\big|,$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are penalty coefficients for SINR violation, excessive interference, and graph-density deviation, respectively.
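The reward with its three penalty terms can be sketched as a minimal single-link illustration; the paper's reward aggregates over all links, and all argument names here are ours:

```python
def reward(utility_sum, sinr, interference, rho, *,
           sinr_min, i_max, rho_target, lam1, lam2, lam3):
    """Utility minus penalties for SINR violation, excess interference,
    and graph-density deviation (lam1..lam3 are penalty coefficients)."""
    penalty = (lam1 * (sinr < sinr_min)        # indicator: SINR violation
               + lam2 * (interference > i_max)  # indicator: interference cap
               + lam3 * abs(rho - rho_target))  # density deviation
    return utility_sum - penalty
```

The indicator penalties are zero whenever the constraints hold, so a feasible policy receives the unmodified utility as its reward.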
4.5. Dueling DQN Architecture and Learning
The following expressions follow the standard Dueling DQN value–advantage decomposition and are included for completeness, with emphasis on their adaptation to the proposed multi-UAV joint optimization framework.
The Dueling DQN approximates the action-value function as
$$Q(s, a; \theta) = V(s; \theta) + \left( A(s, a; \theta) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta) \right),$$
where $V(s;\theta)$ is the state-value function and $A(s,a;\theta)$ is the advantage function. The temporal-difference (TD) target is
$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}),$$
and the network parameters are updated by minimizing
$$\mathcal{L}(\theta) = \mathbb{E}\big[(y_t - Q(s_t, a_t; \theta))^{2}\big].$$
The target network parameters are softly updated via
$$\theta^{-} \leftarrow \tau\,\theta + (1 - \tau)\,\theta^{-},$$
with $0 < \tau \ll 1$, and an $\epsilon$-greedy policy maintains exploration.
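The value–advantage aggregation and the soft target update can be illustrated in a few lines. This is a plain-Python sketch on scalar lists; a real implementation applies the same arithmetic to neural-network outputs and parameter tensors:

```python
def dueling_q(value, advantages):
    """Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a')); subtracting the mean
    advantage makes the value/advantage decomposition identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def soft_update(theta_target, theta, tau):
    """theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * t + (1 - tau) * tt for t, tt in zip(theta, theta_target)]
```

Note that subtracting the mean advantage shifts all Q-values equally, so the greedy action is unchanged while the decomposition becomes unique.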
The Dueling DQN network consists of two fully connected hidden layers with 256 neurons each, using ReLU activation functions. The learning rate is set to , with discount factor . The replay buffer size is 50,000, and a mini-batch size of 64 is used for stochastic gradient updates. The soft target update factor is , and training is conducted over 1000 episodes to ensure stable convergence.
4.6. Convergence and Performance Metrics
The proposed hybrid Dueling DQN framework operates in a non-convex, high-dimensional action space due to coupled interference, discrete link association, and graph-density control. Under fixed interference conditions, the per-link power optimization admits a unique stationary solution. Furthermore, the Dueling DQN is trained with bounded rewards and a finite action space.
Due to the use of deep reinforcement learning in a non-convex environment, formal optimality guarantees cannot be strictly established. However, under bounded rewards and a finite action space, the training process empirically converges to a stable policy that consistently improves the joint energy–latency performance across different network configurations.
To quantitatively evaluate the performance of the multi-UAV network under the learned policy, standard metrics are adopted. These metrics capture the trade-offs between energy efficiency, latency, and reliability.
These metrics directly correspond to the objectives of the optimization framework: maximizing energy efficiency, minimizing latency, and ensuring robust inter-UAV connectivity. By monitoring the performance metrics listed in Table 1, namely energy efficiency, latency, and link reliability, the proposed Dueling DQN policy can be systematically evaluated and compared against baseline schemes across diverse network scenarios. This framework enables UAVs to autonomously adjust transmit power, link associations, and network topology, thereby achieving a balanced trade-off among energy consumption, latency, and connectivity reliability in real-time multi-UAV IoT networks.
4.7. Algorithm Description and Complexity Analysis
Algorithm 1 summarizes the proposed Dueling DQN-based framework for joint optimization of transmit power, inter-UAV link association, and graph density. At each time step, UAVs observe the network state and select actions using an $\epsilon$-greedy policy. The Dueling DQN decomposes the action-value function into state-value and advantage components for stable learning. For each active link, transmit power is refined using a Newton–Bisection solver and projected onto $[0, P_{\max}]$. Graph-density constraints are enforced by retaining the highest-utility links to satisfy the target density $\rho^{*}$. Experience replay and soft target updates are employed to stabilize training.
The computational complexity consists of three parts: Dueling DQN updates, per-link power optimization, and graph-density pruning. Let $B$ denote the mini-batch size and $|E|$ the number of active links. The Dueling DQN update has complexity $O(B\,N_{\theta})$, where $N_{\theta}$ is the number of network parameters. Power optimization requires $O(|E|\,K)$ operations, where $K$ is the number of Newton–Bisection iterations. Graph-density pruning has complexity $O(|E|\log|E|)$ due to utility-based link sorting. Therefore, the overall complexity per learning step is
$$O\!\big(B\,N_{\theta} + |E|\,K + |E|\log|E|\big).$$
Although the theoretical MDP action space grows with the number of UAVs, the proposed framework incorporates several mechanisms that improve scalability. The two-timescale hybrid design restricts the reinforcement learning agent to discrete topology and association decisions, while continuous power allocation is solved analytically via the Newton–Bisection method. In addition, graph-density control and connectivity-aware pruning limit the number of active links, ensuring that the effective decision space remains structured and computationally manageable even for larger UAV networks.
Algorithm 1: Dueling DQN for Joint Power, Association, and Graph Density Optimization.
5. Simulation and Results Analysis
The simulation environment considers a multi-UAV IoT network deployed over a smart city area. Each UAV operates as a mobile access point serving ground IoT devices distributed according to a Poisson Point Process (PPP). UAV altitudes are constrained within 80–120 m to ensure sufficient coverage overlap and dynamic inter-UAV connectivity. The average network graph density $\rho$ is varied from sparse to fully connected regimes to examine different connectivity conditions. UAV horizontal positions follow a bounded random waypoint mobility model with limited speed. The wireless channel follows a probabilistic LoS model with distance-dependent path loss and log-normal shadowing. UAVs are subject to limited onboard energy, with realistic hovering and propulsion power consumption models summarized in Table 2.
The proposed Dueling DQN framework jointly optimizes energy efficiency and end-to-end latency, including transmission and queuing delays, by adaptively adjusting transmit powers and inter-UAV associations. The reward weights $w_1$ and $w_2$ are selected as 0.6:0.4 to prioritize energy sustainability while maintaining latency fairness.
The reward formulation represents a linear scalarization of the underlying bi-objective optimization problem, where energy efficiency and latency are inherently conflicting objectives. Varying the weight ratio $w_1{:}w_2$ corresponds to selecting different operating points along the Pareto frontier of feasible energy–latency trade-offs. A sensitivity analysis was conducted by varying the weight ratio across (0.3:0.7), (0.5:0.5), and (0.8:0.2), confirming that increasing $w_1$ improves energy efficiency at the cost of a moderate latency increase, while increasing $w_2$ reduces latency with higher energy consumption. The selected configuration (0.6:0.4) lies near a balanced Pareto-efficient region and demonstrates stable convergence behavior.
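The linear scalarization described above is straightforward to express; this is illustrative only, with `ee_norm` and `lat_norm` standing for the normalized energy-efficiency and latency metrics:

```python
def scalarized_utility(ee_norm: float, lat_norm: float,
                       w1: float, w2: float) -> float:
    """Linear scalarization of the bi-objective problem:
    w1 * normalized EE  -  w2 * normalized latency."""
    return w1 * ee_norm - w2 * lat_norm

# Sweeping (w1, w2) selects different operating points on the trade-off:
for w1, w2 in [(0.3, 0.7), (0.5, 0.5), (0.6, 0.4), (0.8, 0.2)]:
    print(w1, w2, scalarized_utility(0.9, 0.3, w1, w2))
```

Each weight pair fixes one point on the energy–latency frontier, mirroring the sensitivity analysis reported above.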
The convergence behavior, stability, and generalization capability of the proposed scheme are evaluated under varying numbers of UAVs, traffic loads, and graph densities.
To demonstrate the effectiveness of the proposed framework, its performance is compared against several baseline schemes under identical simulation settings. The first benchmark is a convolutional DQN, which employs convolutional layers to extract features from the state representation but does not incorporate the dueling value–advantage decomposition [39]. The second baseline is DDQN, which mitigates Q-value overestimation by decoupling action selection and evaluation while maintaining a standard non-dueling network architecture [11]. For non-learning-based approaches, a heuristic power allocation (PA) scheme with fixed UAV associations and iterative water-filling optimization is considered [40], along with an equal power (EP) scheme assuming static connectivity [41]. These baselines enable a comprehensive assessment of convergence speed, stability, and performance gains in terms of energy efficiency, latency, and outage probability.
Figure 3 illustrates the training convergence behavior of the proposed Dueling DQN compared with Conv-DQN and DDQN. The energy efficiency, measured in bps/W, is plotted against the number of training episodes. The proposed Dueling DQN converges significantly faster and attains the highest steady-state performance, approaching bps/W. In contrast, Conv-DQN and DDQN exhibit slower convergence and stabilize at lower energy-efficiency levels. These results confirm that the dueling value–advantage decomposition improves learning stability and accelerates convergence.
Figure 4 compares the energy-efficiency performance under different network densities and UAV scales. Across all evaluated scenarios, the proposed Dueling DQN consistently outperforms the benchmark methods, demonstrating strong robustness to topology variations.
Under sparse connectivity conditions (), the proposed framework achieves an average energy efficiency of approximately bps/W with eight UAVs, yielding performance gains of about 11% and 17% over DDQN and Conv-DQN, respectively. The heuristic power allocation scheme attains only bps/W, highlighting the limitations of static optimization approaches in dynamic interference environments.
As the network density increases, the robustness of the proposed Dueling DQN becomes more pronounced. In fully connected topologies (), the proposed method maintains an energy efficiency of bps/W, corresponding to a marginal degradation of only 1.4% relative to the sparse case. In comparison, DDQN and Conv-DQN experience larger performance reductions due to increased state–action complexity and interference coupling. This resilience is attributed to the ability of the dueling architecture to decouple state values from action advantages, enabling more reliable policy updates in dense network conditions.
Scalability is further evaluated by increasing the number of UAVs under different graph densities. The proposed Dueling DQN sustains near-optimal performance as the network scales, maintaining energy efficiency above bps/W for UAV counts ranging from four to eight at . In contrast, Conv-DQN and DDQN begin to degrade beyond six UAVs, while non-learning baseline schemes exhibit limited scalability.
Figure 5 presents the end-to-end latency performance under sparse, moderate, and dense connectivity scenarios. The proposed Dueling DQN consistently achieves the lowest latency across all evaluated conditions.
Under sparse connectivity (), the proposed approach maintains an average latency below s with eight UAVs, achieving latency reductions of approximately 52% and 67% compared with the uncertainty-based and Double-FQPC schemes, respectively. The random selection strategy suffers from excessive delays exceeding s.
For moderate connectivity (), the proposed framework exhibits graceful scalability, with latency increasing from s to s as the number of UAVs grows from four to eight. Competing methods experience steeper latency growth, indicating inferior interference management under increasing network size. In dense topologies (), the proposed Dueling DQN maintains latency below s with eight UAVs, representing an increase of only 18% compared to sparse conditions.
Figure 6 and Figure 7 illustrate the impact of graph density on energy efficiency and latency for a 30-UAV network. As graph density increases, all schemes experience performance degradation due to intensified interference and coordination overhead. Nevertheless, the proposed Dueling DQN consistently achieves the highest energy efficiency and the lowest latency across all density levels.
Figure 8 depicts the energy–latency trade-off among the evaluated schemes. The proposed Dueling DQN occupies the Pareto-optimal region, achieving both high energy efficiency (approximately bps/W) and low latency (approximately s). In comparison, Conv-DQN and DDQN offer moderate trade-offs, while heuristic and random schemes cluster in the low-efficiency, high-latency region. This confirms the superiority of the proposed approach in jointly optimizing conflicting performance objectives.
Table 3 summarizes the statistical energy-efficiency performance of all evaluated methods. In addition to achieving the highest mean energy efficiency, the proposed Dueling DQN exhibits the lowest coefficient of variation (2.7%), indicating improved learning stability and robustness. Compared with Conv-DQN and DDQN, the reduced variance demonstrates that the dueling architecture not only enhances average performance but also yields more consistent and reliable convergence behavior.
6. Conclusions
This paper presented a Dueling Deep Q-Network (Dueling DQN) framework for joint energy efficiency and latency optimization in multi-UAV communication networks. Unlike conventional DRL-based approaches that optimize transmit power or trajectory in isolation, the proposed framework jointly learns transmit power allocation, inter-UAV link association, and adaptive graph density regulation within a unified learning model. This integrated design enables each UAV to autonomously balance energy consumption, end-to-end latency, and connectivity robustness in dynamic, interference-limited environments.
Extensive simulation results demonstrated that the proposed Dueling DQN consistently outperforms conventional DQN, Double DQN (DDQN), and non-learning heuristic schemes. Specifically, the proposed approach achieves up to 15% improvement in energy efficiency, reduces end-to-end latency by up to 12%, and exhibits significantly enhanced convergence stability across varying network densities and UAV scales. By explicitly incorporating graph density as a controllable decision variable, the framework dynamically adapts the network topology, preserving performance even under dense connectivity conditions. Furthermore, the dueling value–advantage decomposition improves learning efficiency by stabilizing Q-value estimation and accelerating convergence in high-dimensional state–action spaces.
Overall, the results confirm that the proposed Dueling DQN framework provides a robust, scalable, and data-driven solution for real-time resource management in 6G-enabled multi-UAV networks. Future work will extend this framework toward multi-agent coordination and federated reinforcement learning, as well as the integration of digital-twin-assisted environments, to further enhance scalability, security, and energy-aware intelligence in UAV-assisted smart city and emergency response applications.