Spectral Efficiency Enhancement in V2X Communications via Joint Subcarrier Assignment and Power Allocation: A Multi-DQN Agent Approach

Al-Masry, Ahmed Ali; Ibrahim, Michael; Elbadawy, Hesham; El-Hennawy, Hadia; Ahmed, Mehaseb

doi:10.3390/telecom7030066


by Ahmed Ali Al-Masry ¹,²,*, Michael Ibrahim ², Hesham Elbadawy ³, Hadia El-Hennawy ² and Mehaseb Ahmed ¹,*

¹ Electronics and Communications Department, Faculty of Engineering Science and Arts, Misr International University, Cairo 11828, Egypt

² Electronics and Communications Department, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt

³ Research and Development Central Administrative, Ministry of Communications and Information Technology (MCIT), Cairo 11524, Egypt

* Authors to whom correspondence should be addressed.

Telecom 2026, 7(3), 66; https://doi.org/10.3390/telecom7030066

Submission received: 13 March 2026 / Revised: 27 May 2026 / Accepted: 29 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Wireless Communications for UAVs, IoT, 5G Technologies, Information and Coding Theory)


Abstract

The rapidly growing interest in Vehicle-to-Everything (V2X) networks has created significant challenges for efficient radio resource management. This paper addresses the problem of joint subcarrier assignment and power allocation to maximize the spectral efficiency (SE) of the system. First, the joint subcarrier assignment and power allocation task is formulated mathematically as an optimization problem, which is solved using conventional optimization methodologies to establish a baseline for performance benchmarking. To overcome the high computational complexity associated with traditional optimization, we subsequently propose a Multi-Agent Deep Q-Network (Multi-DQN) framework based on deep reinforcement learning (DRL). The proposed agents learn optimal allocation strategies through interaction with the environment, enabling adaptive and real-time decision-making. The system performance is investigated in different environments under both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios, addressing a gap in prior approaches. Simulation results demonstrate that the proposed Multi-DQN approach significantly outperforms the enhanced conventional benchmark, achieving higher SE while substantially reducing the computational complexity.

Keywords:

Vehicle-to-everything (V2X); deep reinforcement learning (DRL); multi-agent deep Q-network (Multi-DQN); spectral efficiency (SE); line-of-sight (LOS); non-line-of-sight (NLOS)

1. Introduction

The advancement of Vehicle-to-Everything (V2X) communications is transforming transportation systems into intelligent, interconnected, and autonomous systems. V2X communication is essential for this advancement since it enables information exchange among various entities, including Vehicle-to-Infrastructure (V2I), Vehicle-to-Vehicle (V2V), Vehicle-to-Pedestrian (V2P), and Vehicle-to-Network (V2N) links [1]. By providing low-latency, highly dependable connectivity, V2X aims to enhance road safety, traffic efficiency, and infotainment services. Meeting these strict requirements depends on the integration of V2X with advanced cellular technology. The Third Generation Partnership Project (3GPP) initially introduced Cellular V2X (C-V2X) in Release 14, offering fundamental vehicle communication over LTE, with further enhancements introduced in Release 15 [2,3]. In Release 16, the 3GPP expanded the V2X features by introducing a new cellular V2X standard based on 5G New Radio (5G NR) [4]. Accordingly, the standards referenced in this work follow 3GPP Releases 14–18. Although basic V2X safety messages use broadcast topologies, this work focuses on the advanced V2X use cases offered in 3GPP Release 16. These high-data-rate applications critically require V2V unicast (pair-wise) communication, which in turn needs highly efficient resource allocation and non-orthogonal multiple access (NOMA) algorithms, as described in this work [5].

Furthermore, 5G NR offers enhanced capabilities for connected and automated driving applications that require low latency and high reliability and bandwidth requirements [1]. The development of 6G systems will allow V2X to achieve these strict requirements, introducing revolutionary characteristics such as sub-millisecond latency, terabit-per-second data speeds, ubiquitous artificial intelligence (AI), integrated sensing and communication (ISAC), and semantic-aware transmission [6]. Additionally, 6G technologies such as digital twinning, ultra-precise localization, and real-time high-fidelity environment mapping for fully autonomous vehicles are expected to fulfill the future requirements of V2X [6]. Recent research has proposed new frameworks and practical instruments for 6G-supported V2X. Moreover, another research direction examines millimeter-wave (mmWave) networks in order to achieve ultra-high data rates and manage congested traffic environments [7].

Owing to the strong interest in artificial intelligence (AI), developers are increasingly utilizing AI for real-time spectrum tuning [6]. In the past few years, AI and machine learning (ML) have attracted considerable interest in the field of communication systems owing to their remarkable efficacy in diverse domains, including V2X communication. The dynamic topology, high mobility, and diverse requirements of vehicular networks significantly challenge the maintenance of service quality and reliability. Under these conditions, conventional optimization methods relying on static models and rule-based heuristics are increasingly inadequate. In addition, a number of optimization-based methods require solving NP-hard problems, especially when allocating resources such as power, spectrum, and scheduling. As the size of the network grows, the computational complexity grows very quickly, making these methods difficult to use in real time [8]. Moreover, recent research indicates that Multi-Agent Deep Reinforcement Learning (MADRL) models enable V2V links to allocate resources and transmit power themselves. This increases the throughput of the whole system and handles random traffic changes without coordination by a central server such as the base station [9]. This has led to increasing interest in AI-driven optimization methods that utilize real-time data and adaptive learning algorithms to enhance decision-making across various levels of V2X. ML and deep reinforcement learning (DRL) have emerged as essential tools for enhancing spectrum allocation, beamforming, power control, network slicing, and collaborative behavior between infrastructure and vehicles [10]. Furthermore, energy awareness in DRL-based optimization for V2X communication systems has attracted strong interest. DRL frameworks have been used to jointly optimize the age of information (AoI) and energy consumption in C-V2X-enabled Internet of Vehicles (IoV) environments [11].

V2X confronts numerous complex challenges because of the constantly evolving vehicle surroundings and wide variety of vehicle types involved. This includes the functionality of ultra-reliable low-latency communications (URLLC), managing spectrum efficiency and congestion, addressing security and privacy concerns, and incorporating new network technologies such as the millimeter-wave (mmWave) and terahertz (THz) bands [6,12]. The simultaneous presence of legacy systems such as Dedicated Short-Range Communication (DSRC) and new systems like NR C-V2X with emerging high-band V2X implementations complicates interference management and spectrum utilization. This necessitates flexible coordination and dynamic resource distribution [12]. A significant concern is safety and privacy: V2X systems transmit critical information such as speed and position, rendering them susceptible to spoofing, eavesdropping, and Sybil attacks. Robust cryptographic frameworks and real-time behavior-monitoring systems are essential to ensure message security and user trust [13]. Moreover, a dedicated ITS spectrum is used for basic safety messages, but this paper explores V2X use cases that require high SE. To overcome spectrum limitations, in-band underlay spectrum allocation complying with 3GPP NR-V2X Mode 1 [2,14] is used, where V2V side link communications operate across the downlink cellular frequency. Moreover, NOMA is used to enable power-domain sharing of the same subcarrier.

1.1. NOMA-V2X Literature Review

The Non-Orthogonal Multiple Access (NOMA) scheme is one of the most promising candidates for 5G/6G because of its effectiveness in using the available bandwidth, high connectivity, and higher achieved data rate and spectral efficiency. Therefore, combining NOMA with the V2X communication system has gained a lot of attention in the previous few years. Moreover, V2X communication systems are highly dynamic, where channels suffer from significant changes (severe interference, Doppler shifts due to high mobility of vehicles, and shadowing). Therefore, the NOMA technique has an advantage over other multiple-access techniques to be used in V2X communication systems, enhancing the overall system performance [15]. The NOMA scheme is designed for modern vehicles equipped with multiple radio access technology (Multi-RAT) capabilities. This ensures orderly coexistence with legacy LTE/5G C-V2X vehicles. Therefore, legacy hardware will continue utilizing the dedicated 5.9 GHz ITS band, while Multi-RAT vehicles concurrently leverage cellular bands for advanced, high-throughput V2X services. Furthermore, deploying Multi-RAT architectures in V2X communication systems requires careful attention to both backward and forward compatibility. Specifically, ensuring the reliable transmission of critical safety messages across mixed-generation V2X environments remains a vital point for future research.

Due to the above combination and its potential benefits, a lot of research has been conducted in this area. In [13], a two-stage method is proposed that integrates centralized resource allocation and distributed power regulation by using graph-based matching and game theory methods. The authors in [16] propose a joint resource allocation mechanism based on weighted max-min fairness to improve fairness and spectrum efficiency in NOMA-V2X communication. This problem is then divided into three subproblems, which are solved using matching theories and iterative algorithms. The authors of [17] presented a NOMA mixed centralized/distributed (NOMA-MCD) approach that employs centralized semi-persistent scheduling (SPS) at the BS for resource allocation and distributed power control for V2X broadcasting. In [18], multi-channel resource allocation for 5G Device-to-Device (D2D) based on V2X is proposed to improve the V2I ergodic capacity, V2V reliability, and bandwidth utilization. This is achieved through power optimization utilizing three different allocation schemes. Moreover, the authors of [19] aimed to enhance the energy efficiency (EE) in a NOMA-based V2X system by optimizing power and spectrum allocation through a two-layer block coordinate descent (BCD) technique that integrates Dinkelbach’s method with a concave–convex process (CCCP) [20]. A summary of the NOMA-V2X literature review is presented in Table 1.

1.2. Using ML and DRL with V2X Literature Review

Recent studies have focused on the application of Multi-Agent Deep Reinforcement Learning (MADRL) to address the resource allocation issues in NOMA-V2X. The authors of [21] explored enhanced multi-agent frameworks to optimize resource and power allocation in dynamic topologies. The authors in [22] studied a matching-combined heterogeneous multi-agent deep deterministic policy gradient (MADDPG) algorithm to maximize the sum delivery rate for V2I uplinks while ensuring that V2V reliability is maintained in NOMA-V2X networks. Moreover, ref. [23] combined NOMA with 5G NR-V2X Mode 2, using a DRL framework to find the best trade-off between the age of information (AoI) and energy consumption by adjusting the resource reservation intervals and transmit power. In [24], a data-driven MADRL scheme is presented to meet the need for high capacity among cellular users and high reliability among V2V users.

Furthermore, researchers have recently investigated the application of DRL for resource and power allocation in V2X communication systems. In [25], a hybrid DRL system employs a Deep Q-Network (DQN) for sub-band allocation and Deep Deterministic Policy Gradient (DDPG) for continuous power control. An extension of meta-reinforcement learning was incorporated to facilitate rapid adaptation to novel circumstances. The authors in [26] proposed a decentralized multi-agent DQN architecture that considers roadside units (RSUs) as clustered virtual agents with limited actions, employing a weighted global reward system. The authors of the work represented in [27] developed a Multi-Agent Deep Q-Network (AMARL) utilizing DQN agents that emphasize attention for V2X joint spectrum and power allocation. This enhances V2I sum rates and V2V latency while adjusting to environmental variations. Moreover, ref. [28] proposes a DRL-based architecture utilizing decentralized DQN agents for V2X, which concurrently improves mode selection and resource allocation, outperforming heuristic methods in V2I throughput and V2V delay. In [29], a decentralized DRL architecture employing DQN enables V2V agents to autonomously allocate sub-bands and power for both unicast and broadcast, thereby improving the V2I capacity and V2V latency performance. Table 2 summarizes the reviewed literature on using ML and DRL with V2X.

1.3. Contribution and Organization

In this paper, the V2X resource allocation problem is investigated using both conventional optimization and AI-based approaches. First, we deploy an exhaustive search approach, whereas other published papers have used other techniques, such as matching theory [16,30,31]. Meanwhile, introducing the AI methodology will improve the overall system performance parameters, such as the spectral efficiency (SE), power consumption, and allocation scheme. Therefore, the paper introduces a replacement for the previously used model, based on the exhaustive search approach, by means of the DRL deployment scenario. The main contributions of this work can be summarized as follows.

Problem formulation and benchmarking: We formulate the joint subcarrier (SC) assignment and power allocation problem specifically for NOMA downlink V2X communication systems. The paper introduces an enhanced conventional algorithm that utilizes an exhaustive search approach. Unlike concurrent work that utilizes local optimization techniques, the proposed enhanced conventional algorithm conducts an exhaustive search over the subcarrier assignment space together with iterative power optimization. Therefore, the algorithm can find a near-optimal solution with respect to the considered search space and system assumptions, at the cost of considerably increased computational complexity. The enhanced conventional methodology is initially applied to solve this problem, serving as a baseline for performance comparison.
Proposal of Multi-DQN framework: This paper introduces a novel Multi-Agent Deep Q-Network (Multi-DQN) framework designed to optimize subcarrier (SC) assignment and power allocation in V2X environments. Unlike the conventional models presented in [10], our approach utilizes a decentralized architecture where each VUE pair operates as an autonomous DQN agent. By shifting the decision-making process to the edge, the model significantly reduces the cross-correlation between VUE and CUE operations, thereby simultaneously enhancing the performance and reliability of both user groups.
Performance evaluation: We benchmark the proposed Multi-DQN agent against established conventional models. Our simulation results reveal a dual advantage: the Multi-DQN approach not only achieves superior spectral efficiency (SE) but also minimizes the computational complexity, proving that it is both more effective and more scalable than current benchmarks. Moreover, a comprehensive performance evaluation including EE, fairness, and convergence behavior is provided. Furthermore, we investigate the system performance in different environments under both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios, addressing a gap in prior research.

The remainder of this paper is organized as follows. Section 2 describes the details of the system model. Section 3 outlines the problem formulation and discusses solutions derived using enhanced conventional optimization methodologies. Section 4 presents the structure of the V2X Multi-DQN agent framework. Section 5 provides a comparative analysis of the simulation results between enhanced conventional optimization and the proposed V2X Multi-DQN algorithms. Finally, Section 6 concludes the paper.

2. System Model

Consider a V2X communication system enhanced with downlink NOMA functionalities, as illustrated in Figure 1.

The system model comprises a roadside unit (RSU) equipped with N subcarriers (SCs) to serve M cellular user equipment (CUE), designated as V2I users, and Q pairs of vehicle user equipment (VUE), referred to as V2V users. Hence, CUE represents vehicles or users that communicate (either uplink or downlink) directly with the infrastructure (V2I links or V2N links). Furthermore, VUE pairs are vehicles that communicate with other vehicles directly via V2V links. In practical V2X environments, a single vehicle dynamically alternates between infrastructure communication (CUE mode) and V2V communication (VUE mode); this process is known as mode selection. In this study, it is assumed that the mode selection phase is predetermined by higher-layer protocols, allowing this work to focus on optimizing the joint subcarrier and power allocation for the established CUE and VUE links. Table 3 shows all the parameters that will be used in this paper.

To adequately satisfy the significant link capacity requirements of the M CUE, a power-domain NOMA method is employed for connecting with the RSU. In a traditional NOMA system, sharing a single SC among multiple CUE is standard practice, and it is assumed that each CUE is assigned at most one SC. The comprehensive CUE set is represented as $M = \{1, 2, 3, \dots, m\}$, concurrently with Q pairs of VUE, designated as $Q = \{1, 2, 3, \dots, q\}$, which may utilize the identical downlink frequency resources allocated for CUE. Each VUE pair interacts via D2D communication. Moreover, the SC allocated to a specific VUE pair cannot be utilized by another VUE pair, hence reducing the complexity and interference for both CUE and VUE pairs. In the NOMA downlink of a V2X communication system, the CUE receives messages from the RSU and an adjacent VUE pair on an identical subcarrier ${SC}_{n}$. Consequently, the received signal $y_{i}$ at CUE $i$ on ${SC}_{n}$ is expressed as follows:

y_{i} = \sum_{k \in M_{n}} \sqrt{p_{k}} H_{i, B}^{n} s_{k} + \sqrt{p_{j}^{v}} H_{i, j}^{n} s_{j}^{v} + n_{o},

(1)

where $p_{i}$ and $s_{i}$ are the allocated power and signal for CUE $i$; $p_{j}^{v}$ and $s_{j}^{v}$ are the transmit power and signal of VUE pair $j$; and $n_{o}$ is additive white Gaussian noise with variance $σ^{2}$. Moreover, $H_{i, B}^{n}$ is the channel gain between CUE $i$ and the BS using ${SC}_{n}$, and $H_{i, j}^{n}$ is the channel gain between CUE $i$ and the transmitter of VUE pair $j$ using ${SC}_{n}$. The channel gains $H_{i, B}^{n}$, $H_{i, j}^{n}$, $H_{j}^{n}$, and $H_{j, B}^{n}$ all take the same form:

H_{i, B}^{n} = {| h_{i, B}^{n} |}^{2} β_{i, B} {PL}^{- α},

(2)

where ${| h_{i, B}^{n} |}^{2}$ denotes the small-scale fast fading (Rayleigh coefficient) component, $β_{i, B}$ is log-normal shadowing with standard deviation $ξ$, $PL$ is the path loss model, and $α$ is the path loss exponent. Using the same definitions, all other channel gains are expressed with the same equation. The channel gain of VUE pair $j$ using ${SC}_{n}$ is denoted by $H_{j}^{n}$, and $H_{j, B}^{n}$ is the channel gain between the BS and the receiver of VUE pair $j$ using ${SC}_{n}$.
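As a rough illustration of Equation (2), the following sketch draws one random channel-gain sample. The function name `sample_channel_gain`, the default path loss exponent, the 8 dB shadowing deviation, and the choice of taking the path loss term PL equal to the link distance are illustrative assumptions and are not taken from the system parameters of this paper.

```python
import random


def sample_channel_gain(distance_m, alpha=3.0, shadow_std_db=8.0, rng=random):
    """Draw one sample of H = |h|^2 * beta * PL^(-alpha), as in Equation (2)."""
    # Rayleigh fading: the power |h|^2 is exponentially distributed, unit mean.
    fading = rng.expovariate(1.0)
    # Log-normal shadowing: Gaussian in dB, converted to linear scale.
    beta = 10.0 ** (rng.gauss(0.0, shadow_std_db) / 10.0)
    # Assumption: the path loss term PL is taken to be the link distance.
    path_loss = distance_m ** (-alpha)
    return fading * beta * path_loss


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_channel_gain(100.0, rng=rng))
```

With shadowing disabled (`shadow_std_db=0.0`), the average over many draws approaches the pure path loss value, since the fading power has unit mean.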

In downlink NOMA, a CUE with a poor channel gain is allocated greater transmit power than one with a better channel gain. The interference originating from the signals designated for CUE with poorer channel gains, all utilizing the same SC, can be diminished through successive interference cancellation (SIC). Ideal SIC with perfect channel state information (CSI) is assumed in this work, although imperfect SIC and channel estimation significantly degrade the system performance in highly dynamic V2X environments. Thus, the CUE with the smallest channel gain faces interference originating from all other CUE allocated to the same SC, as well as from the VUE pair utilizing the same SC. Conversely, the CUE with the maximum channel gain experiences interference solely from the VUE pair. The presence of the VUE pairs disrupts the decoding sequence of the CUE. The indicator functions $x_{i, n}$ and $x_{j, n}^{v}$ specify which CUE and VUE pairs are allocated to ${SC}_{n}$, respectively. $x_{i, n}$ is the CUE indicator function indicating whether ${SC}_{n}$ is allocated to CUE $i$, and it is an element of the binary matrix $X_{i}^{n}$. It is described as follows:

x_{i, n} = \begin{cases} 1, & \text{if } {SC}_{n} \text{ is allocated to CUE } i \\ 0, & \text{otherwise} \end{cases}

(3)

Likewise, for VUE, the indicator function is $x_{j, n}^{v}$, which is an element of the binary matrix $X_{j}^{n}$ for SC assignment:

x_{j, n}^{v} = \begin{cases} 1, & \text{if } {SC}_{n} \text{ is allocated to VUE pair } j \\ 0, & \text{otherwise} \end{cases}

(4)

Without loss of generality, the channel gains of the CUE are arranged in descending order as $H_{1, B}^{n} \geq H_{2, B}^{n} \geq H_{3, B}^{n} \geq \dots \geq H_{m_{n}, B}^{n}$, where $m_{n}$ is the number of CUE on an identical SC. The signal-to-interference-plus-noise ratio (SINR) of CUE $i$ on ${SC}_{n}$ can be expressed as follows:

γ_{i, n} = \frac{p_{i} H_{i, B}^{n}}{σ^{2} + \sum_{k = 1}^{i - 1} p_{k} H_{i, B}^{n} + \sum_{j = 1}^{Q} x_{j, n}^{v} p_{j}^{v} H_{i, j}^{n}},

(5)

If the ${SC}_{n}$ bandwidth is $W$, the achievable data rate of CUE $i$ on ${SC}_{n}$ can be expressed as

R_{i, n} = W \log_{2} (1 + γ_{i, n}) .

(6)

For the VUE pair, the SINR and the achievable data rate of VUE pair $j$ on ${SC}_{n}$ (assuming that at most one VUE pair is assigned to the SC) can be written as

γ_{j, n}^{v} = \frac{p_{j}^{v} H_{j}^{n}}{σ^{2} + \sum_{i = 1}^{m_{n}} x_{i, n} p_{i} H_{j, B}^{n}},

(7)

R_{j, n}^{v} = W \log_{2} (1 + γ_{j, n}^{v}) .

(8)
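The SINR and rate expressions of this section can be sketched numerically as follows. The sketch assumes the SIC ordering described above (CUE sorted by descending channel gain, so that the strongest CUE is disturbed only by the VUE pair and the weakest one by all other CUE on the SC). All function and argument names are hypothetical, and a single SC with a single co-channel VUE pair is considered.

```python
import math


def cue_sinr(i, powers, gains_to_bs, vue_power, gain_from_vue, noise):
    """SINR of the i-th CUE on one SC (0-based, sorted by descending gain).

    Assumption: after ideal SIC only the signals of CUE with a stronger
    channel (indices below i) remain as intra-SC interference, while the
    co-channel VUE pair always interferes.
    """
    intra = sum(powers[k] for k in range(i)) * gains_to_bs[i]
    inter = vue_power * gain_from_vue[i]
    return powers[i] * gains_to_bs[i] / (noise + intra + inter)


def vue_sinr(vue_power, vue_gain, cue_powers, gain_bs_to_vue, noise):
    """SINR of the VUE pair sharing the SC, following Equation (7)."""
    return vue_power * vue_gain / (noise + sum(cue_powers) * gain_bs_to_vue)


def rate(bandwidth_hz, sinr):
    """Achievable rate W * log2(1 + SINR), Equations (6) and (8)."""
    return bandwidth_hz * math.log2(1.0 + sinr)


if __name__ == "__main__":
    p = [0.2, 0.8]       # the weaker CUE receives more power
    h = [1.0, 0.1]       # descending channel gains
    for i in range(2):
        g = cue_sinr(i, p, h, 0.1, [0.05, 0.05], 0.01)
        print(i, g, rate(180e3, g))
```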

3. Enhanced Conventional Optimization Algorithm

This section proposes an enhanced conventional optimization algorithm to solve the joint subcarrier and power allocation problem for a downlink NOMA-V2X system serving both VUE and CUE. The power and resource allocation problem addresses the needs of V2I and V2V links and concurrently distributes the resources for CUE and VUE pairs. V2I links require a high data rate for CUE users, while the main requirement of V2V links is link reliability for VUE pairs. Therefore, the main goal is to increase the CUE data rate while maintaining the minimum SINR requirement for VUE pairs. To satisfy these requirements in the downlink NOMA-V2X communication system, the optimization problem can be expressed as follows:

\max_{X, X^{v}, p, p^{v}} \min_{i \in M} R_{i, n}

(9a)

subject to

γ_{j, n}^{v} \geq γ_{0}

(9b)

p_{j}^{v} \leq P_{max}

(9c)

\sum_{i = 1}^{M} p_{i} \leq P_{total}

(9d)

\sum_{n = 1}^{N} x_{i, n} = 1, x_{i, n} \in \{0, 1\}, \forall i

(9e)

\sum_{i = 1}^{M} x_{i, n} = m_{n}, \forall n

(9f)

\sum_{n = 1}^{N} x_{j, n}^{v} = 1, x_{j, n}^{v} \in \{0, 1\}, \forall j

(9g)

\sum_{j = 1}^{Q} x_{j, n}^{v} \leq 1, \forall n

(9h)

where $p = {[p_{1}, p_{2}, p_{3}, \dots, p_{M}]}^{T}$ and $p^{v} = {[p_{1}^{v}, p_{2}^{v}, p_{3}^{v}, \dots, p_{Q}^{v}]}^{T}$ are the power allocation vectors for CUE users and VUE pairs, respectively. Constraint (9b) ensures that the SINR requirement of each VUE pair is met, which bounds the interference tolerated at the VUE pairs. Constraint (9c) guarantees that the VUE transmitted power does not exceed $P_{max}$. Constraint (9d) indicates that the sum of the transmitted power to all CUE users does not exceed the total transmitted power from the BS. Constraints (9e) and (9f) ensure that each CUE user is assigned to a single subcarrier and that multiple CUE users can be assigned to the same subcarrier according to the NOMA concept. Constraints (9g) and (9h) ensure that each VUE pair is assigned exactly one SC and that each SC is assigned to at most one VUE pair.
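To make the role of the constraints concrete, a minimal feasibility check for one candidate allocation could look as follows. The data layout (one SC index per CUE and per VUE pair) and every name in the sketch are assumptions made for illustration. With this layout, constraints (9e) and (9g) hold by construction, and (9f) merely defines the occupancy count of each SC.

```python
def is_feasible(cue_sc, vue_sc, p_cue, p_vue, vue_sinrs,
                n_sc, p_total, p_max, sinr_min):
    """Check constraints (9b)-(9d) and (9h) for one candidate allocation.

    cue_sc[i] and vue_sc[j] hold the single SC index given to CUE i and to
    VUE pair j, so (9e) and (9g) are satisfied by construction.
    """
    # Every assigned index must refer to an existing SC.
    if any(not 0 <= n < n_sc for n in list(cue_sc) + list(vue_sc)):
        return False
    # (9h): an SC is shared by at most one VUE pair.
    if len(set(vue_sc)) != len(vue_sc):
        return False
    # (9c): per-pair VUE transmit power limit.
    if any(p > p_max for p in p_vue):
        return False
    # (9d): total downlink power budget for the CUE.
    if sum(p_cue) > p_total:
        return False
    # (9b): minimum SINR for every VUE pair.
    return all(g >= sinr_min for g in vue_sinrs)


if __name__ == "__main__":
    print(is_feasible([0, 0, 1], [0, 1], [0.3, 0.3, 0.3], [0.1, 0.1],
                      [6.0, 7.5], n_sc=2, p_total=1.0, p_max=0.2, sinr_min=5.0))
```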

The formulated optimization problem is a mixed-integer non-linear programming (MINLP) problem because of the coupled subcarrier assignment and power allocation variables [32]. Such problems are usually non-convex and are challenging to solve optimally for large-scale scenarios [33]. In this work, the proposed enhanced conventional algorithm iteratively solves the associated power allocation problem and performs an exhaustive search over the feasible subcarrier assignment combinations. Therefore, the obtained solution can be near-optimal within the explored feasible search space under the considered system assumptions. However, the computational complexity of the algorithm is $O ({(N!)}^{M})$, which increases exponentially with the network size; this motivates the development of the proposed DRL-based framework.

The proposed enhanced conventional optimization algorithm is shown in Algorithm 1, which illustrates the solution of the optimization problem in (9a). In this algorithm, the SC and power allocation are jointly optimized in an iterative feasible space exploration approach to satisfy the QoS and interference constraints of the V2I and V2V links. The inputs to Algorithm 1 are the available SC, the number of CUE users and VUE pairs, all channel gains on all links, the maximum VUE transmit power, and the RSU total transmitted power. The expected output of the algorithm is the allocated power and SC assignment for both the CUE users and VUE pairs.

Algorithm 1: Enhanced conventional algorithm

Inputs: $N$, $M$, $Q$, $H_{i, B}^{n}$, $H_{i, j}^{n}$, $H_{j}^{n}$, $H_{j, B}^{n}$, $P_{max}$, $P_{total}$

1: Step 1: Swapping Step
2: for each swap do
3:     Step 1a: Apply Algorithm 2
4:     Step 1b: Apply Algorithm 3
5: end for
6: Step 2: Find $τ^{*}$
7: Step 3: Find $X$
8: Step 4: Find $P$
9: Step 5: Find $X^{v}$ and $p^{v}$

Outputs: $X$, $P$, $X^{v}$, $p^{v}$

Furthermore, included in this algorithm are two sub-algorithms called Algorithm 2 and Algorithm 3, whose details will be discussed later in this section. These two algorithms are considered as key components of the main algorithm to improve its overall efficiency and performance.

Algorithm 2: Power allocation for CUE users

Inputs: $p^{v} = P_{max}$, $X$, $X^{v}$

1: while $| τ^{up} - τ^{down} | > error$ do
2:     $τ (k) = (τ^{down} + τ^{up}) / 2$
3:     Compute the spectral radius $f (τ (k))$ using Equation (13)
4:     if $f (τ (k)) > 1$ then
5:         $τ^{up} = τ (k)$
6:     else
7:         $τ^{down} = τ (k)$
8:     end if
9: end while
10: $τ^{*} = τ (k)$

Output: optimum power allocation vector $p^{*}$, as shown in Equation (14)

Similarly to [16], the proposed algorithm starts with the swapping step, which finds all possible SC assignments for CUE users. In step 2, each CUE SC assignment is tested to find the optimum power allocation for CUE, the optimum SC assignment for VUE, and the power allocated to the VUE pair. At the beginning of the algorithm (before the swapping step), the CUE users assume that the VUE pairs are transmitting at $P_{max}$. In steps 1a and 1b, the power allocation algorithm for CUE users (Algorithm 2) and the power allocation and SC assignment algorithm for VUE pairs (Algorithm 3) are applied, respectively.

In step 1a, Algorithm 2 is applied. We assume that CUE users have different weights. For the allocation binary matrix $X_{i}^{n}$ under test, a binary search algorithm is used to find the optimum control rate $τ^{*}$. The control rate is defined as the minimum weighted rate among all CUE in the system, that is, $τ = \min_{i \in M} \frac{R_{i}}{w_{i}}$, and

τ \leq \frac{\sum_{i = 1}^{M} R_{i}}{\sum_{i = 1}^{M} w_{i}},

(10)

The numerator in (10) can be regarded as the system throughput of CUE users. Therefore, to determine an upper bound for the system throughput, we assume that the CUE $l$ with the highest channel gain among all CUE users is assigned the total transmit power. Because of the interference among CUE and VUE pairs,

\sum_{i = 1}^{M} R_{i} < \log_{2} (1 + \frac{P_{total} H_{l, B}^{n}}{σ^{2}}),

(11)

so the upper bound of the search interval of the binary search algorithm can be defined as follows:

τ^{up} = \frac{\log_{2} (1 + \frac{P_{total} H_{l, B}^{n}}{σ^{2}})}{\sum_{i = 1}^{M} w_{i}},

(12)

Since $τ^{*} > 0$, we can take the lower bound of the search interval to be $τ^{down} = 0$, so that the search interval is limited to $(τ^{down}, τ^{up})$. To find the optimal control rate, we must find the spectral radius as illustrated in [16]:

f (τ) = \max \{ρ ((\mathrm{diag} (2^{τ w}) - I) (B + \frac{1}{q_{j}} v e_{j}^{T} Q))\},

(13)

Therefore, to find $τ^{*}$, we start by applying the bisection method, as shown in Algorithm 2. After obtaining $τ^{*}$, the optimum power allocation vector $p^{*}$ is calculated using the following equation:

p^{*} = {(I - (\mathrm{diag} (2^{τ^{*} w}) - I) B)}^{- 1} (\mathrm{diag} (2^{τ^{*} w}) - I) v .

(14)
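The bisection of Algorithm 2 can be illustrated with the following standard-library sketch. Several simplifying assumptions are made that are not stated in this paper: a single total-power constraint replaces the general term of Equation (13), so that $f(τ)$ is the spectral radius of $\mathrm{diag}(2^{τ w} - 1)(B + v 1^{T} / P_{total})$; the matrix $B$ holds normalized cross gains and $v$ holds the noise power normalized by the direct gain; and $f(τ)$ is taken to grow with $τ$, so a value above 1 marks $τ$ as infeasible and lowers the upper bound. The power vector is obtained by iterating $p = D (B p + v)$, which converges to the closed form of Equation (14) when the spectral radius of $D B$ is below 1. All names are hypothetical.

```python
import math


def spectral_radius(matrix, iters=500):
    """Perron root of a positive square matrix via power iteration."""
    n = len(matrix)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(matrix[r][c] * x[c] for c in range(n)) for r in range(n)]
        lam = max(y)
        if lam == 0.0:
            return 0.0
        x = [val / lam for val in y]
    return lam


def f(tau, w, B, v, p_total):
    """Assumed single-constraint form of the spectral radius in Equation (13)."""
    n = len(w)
    d = [2.0 ** (tau * w[i]) - 1.0 for i in range(n)]
    m = [[d[r] * (B[r][c] + v[r] / p_total) for c in range(n)] for r in range(n)]
    return spectral_radius(m)


def bisect_control_rate(w, B, v, p_total, tol=1e-9):
    """Bisection over the control rate on the interval (0, tau_up)."""
    tau_down = 0.0
    # Upper bound in the spirit of Equation (12), using the best direct link.
    tau_up = math.log2(1.0 + p_total / min(v)) / sum(w)
    while tau_up - tau_down > tol:
        tau = 0.5 * (tau_down + tau_up)
        if f(tau, w, B, v, p_total) > 1.0:
            tau_up = tau      # rate too high to be feasible
        else:
            tau_down = tau    # rate still feasible
    return tau_down


def cue_powers(tau, w, B, v, iters=2000):
    """Iterate p = D (B p + v); the fixed point matches Equation (14)."""
    n = len(w)
    d = [2.0 ** (tau * w[i]) - 1.0 for i in range(n)]
    p = [0.0] * n
    for _ in range(iters):
        p = [d[r] * (sum(B[r][c] * p[c] for c in range(n)) + v[r]) for r in range(n)]
    return p


if __name__ == "__main__":
    w, B, v = [1.0, 1.0], [[0.0, 0.1], [0.2, 0.0]], [0.05, 0.1]
    tau = bisect_control_rate(w, B, v, p_total=1.0)
    print(tau, cue_powers(tau, w, B, v))
```

In this toy setting, the resulting powers use the full budget and give both users the same weighted rate, which is the behavior expected from a max-min solution.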

In step 1b, we apply Algorithm 3, the power allocation and SC assignment algorithm for VUE pairs. For the current swapping (SC assignment for CUE users), the power allocated to the CUE users is known. Then, we apply Algorithm 3 to find the power allocation and SC assignment for VUE pairs. VUE SC assignment is modeled as a bipartite graph matching problem [16] and is solved based on maximum weight matching (MWM), as shown in Figure 2. The vertex set of the bipartite graph shown in Figure 2 is divided into two subsets: the SC set and the VUE pair set. For each edge (link) between an SC and a VUE pair, the reliability of the link is tested by calculating $p_{j}^{v}$, as shown below:

p_{j}^{v} = \frac{γ_{0} (\sum_{i = 1}^{m_{n}} p_{i} H_{j, B}^{n} + σ^{2})}{H_{j}^{n}}

(15)

A solid edge (link) meets constraint (9c) of the optimization problem, which means that $p_{j}^{v} < P_{max}$; for example, SC 1 can be assigned to VUE pair 2, as shown in Figure 2. On the other hand, a dashed edge indicates that constraint (9c) is not met, which means that $p_{j}^{v} > P_{max}$; for example, SC 2 cannot be assigned to VUE pair 2, as shown in Figure 2. Therefore, only an SC that is connected to a VUE pair by a solid edge can be assigned to that VUE pair. Each solid edge is weighted with $\min_{i \in M_{n}} \frac{R_{i, n}}{w_{i}}$, where the weighted rate is calculated for each CUE user assigned to the SC. To solve the SC assignment problem, we apply the Kuhn–Munkres algorithm [34]. The power allocation and SC assignment algorithm for VUE pairs is presented in Algorithm 3. Then, Algorithm 2 and Algorithm 3 are repeated for all possible SC assignments for CUE users (swapping phase), which is step 1 in the proposed enhanced conventional optimization algorithm shown in Algorithm 1. Furthermore, at the end of the swapping step, the maximum optimum control rate is calculated, and, according to this value, the optimum SC assignment and power allocation vector for CUE users and the SC assignment and power allocation vector for VUE pairs are found.

Algorithm 3: Power allocation algorithm for VUE pairs and SC assignment for VUE pairs

Inputs: Power allocation and SC assignment for CUE users, $p$ and $X$

1: for n = 1:N (each SC) do

2: for j = 1:Q (each VUE pair) do

3: Find the reliability of each link between $\mathrm{SC}_{n}$ and VUE pair $j$ by calculating

4: $p_{j}^{v}$ using Equation (15)

5: if $p_{j}^{v} < P_{max}$, do

6: calculate $\min_{i \in M_{n}} \frac{R_{i, n}}{w_{i}}$

7: end if

8: end for

9: end for

10: Apply Kuhn–Munkres Algorithm to find $X^{v}$ and $p^{v}$

Output: $p^{v}$ and $X^{v}$
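As a rough illustration of Algorithm 3, the following Python sketch computes the required VUE power from Equation (15), skips infeasible (dashed) edges, and selects a maximum-weight matching. It is a minimal sketch under assumptions, not the authors' implementation: the function names, the nested-list layout `required_power[n][j]`, the one-to-one matching of VUE pairs to SCs, and the toy numbers are hypothetical. A brute-force search over permutations stands in for the Kuhn–Munkres algorithm, which is only practical for small N and Q.

```python
from itertools import permutations


def required_vue_power(gamma0, cue_powers, h_jb, noise, h_j):
    """Eq. (15): VUE power needed to reach SINR target gamma0 (linear units)."""
    interference = sum(cue_powers) * h_jb  # sum_i p_i * H_{j,B}^n
    return gamma0 * (interference + noise) / h_j


def match_vue_to_sc(required_power, weight, p_max):
    """Maximum-weight one-to-one matching of VUE pairs to SCs.

    required_power[n][j] and weight[n][j] refer to SC n and VUE pair j.
    Only edges with required_power < p_max (solid edges) may be used.
    Returns (sc_of_vue, total_weight), or (None, -inf) if nothing is feasible.
    """
    n_sc = len(required_power)
    n_vue = len(required_power[0])
    best, best_weight = None, float("-inf")
    # Each candidate assigns a distinct SC to every VUE pair (needs n_vue <= n_sc).
    for sc_of_vue in permutations(range(n_sc), n_vue):
        if any(required_power[n][j] >= p_max for j, n in enumerate(sc_of_vue)):
            continue  # uses a dashed edge, i.e., constraint (9c) is violated
        total = sum(weight[n][j] for j, n in enumerate(sc_of_vue))
        if total > best_weight:
            best, best_weight = sc_of_vue, total
    return best, best_weight


# Hypothetical toy instance with 2 SCs and 2 VUE pairs.
required = [[0.1, 0.5], [0.3, 0.1]]
weights = [[2.0, 9.0], [9.0, 3.0]]
assignment, total = match_vue_to_sc(required, weights, p_max=0.2)
# assignment == (0, 1): VUE pair 0 uses SC 0 and VUE pair 1 uses SC 1; total == 5.0
```

In this toy instance the two heaviest edges are infeasible, so the matching falls back to the only assignment that uses solid edges exclusively.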

4. DRL Approach for VUE Pair Resource Allocation

In this section, the basics of DRL are presented along with its most recent advancements. Additionally, the optimization problem in Equation (9a) is mapped into a DRL environment, and a decentralized method based on DRL is suggested to address the VUE SC assignment and power allocation problem.

4.1. DRL Definitions and Preliminaries

Reinforcement learning (RL) is a key area of machine learning that studies how a smart agent can improve its action policy and make decisions that adapt to changes in the environment by repeatedly interacting with it [35]. RL can generally be defined by the state space, action space, transition probabilities, and immediate rewards. Furthermore, RL can be split into model-based and model-free learning, depending on what is known in advance about the environment and the next state. In model-based learning, the environment, state space, and action space are modeled as an MDP, and the learning process anticipates the next state from the transition probabilities defined in the MDP. However, model-free algorithms have attracted considerable research interest because the transition probabilities and rewards are often uncertain in real-world settings, where an explicit model of the environment is not available [36]. Deep neural networks (DNNs) and RL are combined to create deep reinforcement learning (DRL), which can cope with large-scale, continuously changing environments [37]. The DNN approximates the Q-table, so updates to the Q-table become updates to the DNN’s weights. Two techniques, experience replay and the fixed target network, have been developed for DRL to speed up training and improve convergence. Experience replay updates the DRL model with randomly chosen past transitions to break the correlations between consecutive transition tuples [28]. In the fixed target network technique, a separate target Q-network is used to estimate the target Q-value, and delayed updates to the target Q-network speed up and stabilize training. Therefore, DRL can be used in situations where the state space has continuous values.

As mentioned earlier, the environment considered here is complicated; therefore, the Deep Q-Network (DQN) algorithm is adopted. DQN is an off-policy reinforcement learning technique designed for problems with a discrete action space. A DQN agent trains a Q-value function critic to approximate the value of the optimal policy, while following an epsilon-greedy policy based on the critic’s estimates. The epsilon-greedy strategy balances exploration and exploitation [35,38]. DQN is a modification of Q-learning that incorporates a target critic and an experience replay buffer.

The definition of the DQN agent includes several options (hyperparameters) that must be specified, starting with the discount factor, which indicates the relative value of future rewards compared to immediate ones. A discount factor of 0 means that the agent considers only the immediate reward, while, as the value approaches 1, the agent places more weight on future rewards [35]. The exploration–exploitation trade-off is commonly handled with an epsilon-greedy policy. Throughout training, ε is often annealed to ensure sufficient exploration during the initial phases [39]. Furthermore, the experience buffer size denotes the maximum number of experiences (transitions) that can be stored in the experience replay buffer. When the agent learns, it does not rely solely on the most recent experience; it samples random past experiences from the buffer [35]. The mini-batch size is the number of samples drawn from the buffer for each gradient update. Moderate batch sizes improve the computational efficiency and reduce the variance of the updates [40]. The target network update frequency indicates how often the target network synchronizes its weights with the online network (hard update) or gradually adjusts them using a smoothing factor (Polyak averaging) [39].
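The hyperparameters described above can be made concrete with a small Python sketch. It is only an illustration under assumptions: the class and function names, the linear annealing schedule, and all default values are hypothetical and are not taken from the simulation settings, and the network weights are represented as plain lists of floats.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay buffer."""

    def __init__(self, capacity):
        # A bounded deque discards the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random mini-batch, drawn without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def annealed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def soft_update(target, online, tau):
    """Polyak averaging of weight lists; tau = 1.0 reproduces a hard update."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]
```

With `tau = 1.0`, `soft_update` copies the online weights into the target network, which corresponds to the hard update applied at the target update frequency.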

4.2. Multi-DQN Agent Decentralized DRL Algorithm

DRL is a key part of ML that uses a DNN to make RL more efficient. In this section, a DQN agent based on a decentralized DRL algorithm is presented to solve the resource allocation and power allocation for VUE pairs. A decentralized DRL algorithm enables multiple VUE agents to learn and make decisions independently based on local observations of the environment, rather than relying on a central unit (the RSU). Therefore, decentralized DRL improves scalability, speeds up computation, and reduces communication overhead [41]. The main requirements for V2V pairs are latency and reliability. These requirements, together with resource management among V2V pairs and unreliable V2V links, add considerable complexity to interference control and lead to a continuous-valued state space and a large action space. Due to this complexity, we transform the optimization problem in (9a) into an MDP based on the main principle of DRL. The DRL training framework, with a DQN agent for the resource allocation optimization of VUE pairs, is illustrated in Figure 3. In this scenario, every VUE pair is considered an intelligent DRL agent that interacts with the environment. The training process is an iterative interaction loop between the DQN agent and the wireless network environment at discrete time steps $t$. At each time step, the agent observes a continuous-valued state $s (t)$ that describes the current network conditions, such as CUE power allocation, channel gains, SC assignments, and the measured interference levels at the CUE and VUE. Based on the state observation processed by its neural network, the agent selects an action $a (t)$ from the action space, which consists of selecting a particular SC and allocating transmit power to the VUE pair. After $a (t)$ is executed in the environment, the network state changes and the agent receives an immediate reward $r (t)$, which evaluates the performance of the action. Usually, the reward is given for successful VUE transmissions, while it is deducted for intense interference caused by the CUE. These experiences are then used iteratively to update the neural network weights, which improves the policy, approximates the optimal Q-values, and increases the expected cumulative long-term reward. Therefore, it is essential to define three key elements of the DRL process: the continuous state space, the action space, and the reward.

State Space

The continuous (environment) state space is an essential part of policy learning, as it encompasses valuable information, including channel gains, interference levels, etc. The state space is represented as $S$, encompassing the states of all agents for each transmission time interval (TTI). The state space of a VUE pair agent at each TTI (subframe) $t$ contains six parameters, namely the power allocated to CUE users, the sum of the power allocated to CUE users per subcarrier, the channel gain, the CUE user’s subcarrier assignment, interference at CUE users, and interference received at the VUE receiver. Specifically, the power allocated to CUE users at time slot $t$ can be expressed as $p_{i} [t] = \{p_{1} [t], p_{2} [t], \dots, p_{M} [t]\}$, and the power on each SC is represented as $P_{n} [t] = \{P_{1} [t], P_{2} [t], \dots, P_{N} [t]\}$. Moreover, the channel gain for the j-th VUE pair agent on $\mathrm{SC}_{n}$ at time slot $t$ is expressed as $G_{j}^{n} [t] = \{H_{i, B}^{n} [t], H_{i, j}^{n} [t], H_{j}^{n} [t], H_{j, B}^{n} [t]\}$, which includes the channel gain between the i-th CUE user and the BS using $\mathrm{SC}_{n}$, the channel gain between the current j-th VUE transmitter and the i-th CUE user, the channel gain of the current j-th VUE pair, and the channel gain between the j-th VUE receiver and the BS. The CUE SC assignment at the N SCs is represented as $X_{i}^{n} [t] = \{x_{1}^{n} [t], x_{2}^{n} [t], \dots, x_{M}^{n} [t]\}$. Finally, the interference power at the VUE pair and the CUE users on each SC at time slot $t$ can be written as $I_{j}^{n} [t] = \{I_{1}^{n} [t], I_{2}^{n} [t], \dots, I_{Q}^{n} [t]\}$ and $I_{i}^{n} [t] = \{I_{1}^{n} [t], I_{2}^{n} [t], \dots, I_{M}^{n} [t]\}$, respectively. Therefore, the continuous (environment) state $s_{t}$ at time slot $t$ on $\mathrm{SC}_{n}$ for the j-th VUE pair agent is expressed as follows [28]:

s_{t} = \{p_{i} [t], P_{n} [t], G_{j}^{n} [t], X_{i}^{n} [t], I_{j}^{n} [t], I_{i}^{n} [t]\}

(16)

Action Space

Each VUE pair agent independently determines its SC selection $n$ and the power allocated to the VUE pair $p_{j}^{v}$ based on the state observed from the environment. Therefore, we define the action of each VUE pair agent as $a_{t} = \{n, p_{j}^{v}\}$, where $n \in N$ and $p_{j}^{v} \in \{0, \frac{1}{L_{p} - 1} P_{max}, \frac{2}{L_{p} - 1} P_{max}, \dots, P_{max}\}$ represent the SC allocation and the power level of the VUE pair agent, respectively. The transmit power of each VUE pair has $L_{p}$ power levels. Consequently, the size of the action space is $N L_{p}$.

Although transmit power is naturally a continuous variable, the proposed framework quantizes the VUE transmit power into a finite set of predetermined power levels to be compatible with the DQN architecture, which operates on discrete action spaces. This discretization greatly reduces the action space dimensionality and improves the training stability and convergence behavior.
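The discrete action space can be enumerated directly, as in the Python sketch below. It assumes N = 4 SCs (the simulation setting) and 11 power levels (the value mentioned in the results discussion); the function name, the zero-based SC index, and the value of `p_max` are hypothetical choices made for illustration.

```python
def build_action_space(num_sc, num_levels, p_max):
    """List every (SC index, power level) pair available to one VUE agent."""
    # Power levels 0, P_max/(L_p - 1), 2 P_max/(L_p - 1), ..., P_max.
    levels = [k * p_max / (num_levels - 1) for k in range(num_levels)]
    return [(n, p) for n in range(num_sc) for p in levels]


# p_max = 0.2 W is a hypothetical value used only for this example.
actions = build_action_space(num_sc=4, num_levels=11, p_max=0.2)
print(len(actions))  # N * L_p = 4 * 11 = 44 discrete actions
```

Each output neuron of the DQN would then correspond to one entry of `actions`, so the output layer has $N L_{p}$ units under this encoding.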
Reward
One of the main advantages of DRL is its ability to address decision-making problems with a flexible reward structure that can handle complex, multi-constraint objectives [25]. The environment provides an immediate reward after the agent executes an action in accordance with the policy and the observed state. The reward indicates how successful the decision taken under the current policy is. Therefore, the reward function is expressed as follows:

$r_{t} = c_{1} \frac{\sum_{i \in M} R_{i, n}}{B W} + c_{2} \sum_{j \in Q} (R_{j, n}^{v} > R_{o}) + c_{3} \sum_{j \in Q} (γ_{j, n}^{v} \geq γ_{o})$

(17)
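Equation (17) can be evaluated as in the following Python sketch. It assumes that the bracketed comparisons act as indicator functions (1 if the condition holds, 0 otherwise); the function name, argument names, and example values, including the weights, are hypothetical and are not the coefficients used in the simulations.

```python
def reward(cue_rates, vue_rates, vue_sinrs, bandwidth,
           rate_min, sinr_min, c1, c2, c3):
    """Evaluate Eq. (17) for one time step (linear units, not dB)."""
    se_term = sum(cue_rates) / bandwidth                    # CUE spectral efficiency
    rate_term = sum(1 for r in vue_rates if r > rate_min)   # VUE pairs above R_o
    sinr_term = sum(1 for g in vue_sinrs if g >= sinr_min)  # VUE pairs meeting gamma_o
    return c1 * se_term + c2 * rate_term + c3 * sinr_term


# Hypothetical example: two CUE users and two VUE pairs on a 1 MHz subcarrier.
r_t = reward(cue_rates=[2e6, 4e6], vue_rates=[1.5e6, 0.5e6], vue_sinrs=[3.0, 1.0],
             bandwidth=1e6, rate_min=1e6, sinr_min=1.0, c1=1.0, c2=2.0, c3=3.0)
print(r_t)  # 1.0 * 6.0 + 2.0 * 1 + 3.0 * 2 = 14.0
```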

The first term in (17) aims to maximize the spectral efficiency of the CUE users to enhance the overall system throughput. The second term encourages each VUE pair to achieve a data rate that satisfies the minimum Quality of Service (QoS) requirement. The third term represents the reliability constraint through the minimum SINR threshold. The weighting coefficients $c_{1}$, $c_{2}$, and $c_{3}$ were chosen empirically, giving priority to reliable V2V communication while ensuring high spectral efficiency for V2I users. The multi-objective reward formulation allows the DQN agent to learn a balanced resource allocation policy in wireless environments. At the beginning of each episode (subframe $t$), each VUE agent assesses its own state $s_{t}$; it then performs its action in terms of SC assignment and power allocation for VUE pairs based on the action value function $Q (s_{t}, a_{t}; θ)$. In our framework, the action value function can be defined as in [28]:

Q (s_{t}, a_{t}; θ) = E [\sum_{t^{'} = t}^{T} γ^{t^{'} - t} r_{t^{'}} | s_{t} = s, a_{t} = a; θ],

(18)

where $T$ is the terminal step of each episode, $0 < γ < 1$ is the discount factor that reflects the influence of future rewards, $E [\cdot]$ is the expected value, $s_{t}$ is the current state, $a_{t}$ is the action, $r_{t}$ is the reward, and $θ$ is the weight vector of the DNN.

Subsequently, following the actions executed by the agents, each agent obtains the immediate reward $r_{t}$, calculated according to Equation (17), and the environment transitions to a new state $s_{t + 1}$. Based on $r_{t}$ and $s_{t + 1}$, the VUE pairs adjust the weights of the DQN by minimizing the loss function $L (θ)$ at each step of the episode. Similarly to [28], the mean square error is selected as the loss function:

L (θ) = E {{(y_{t} - Q (s_{t}, a_{t}; θ))}^{2}},

(19)

where $y_{t} = r_{t} + γ \max_{a_{t + 1}} \hat{Q} (s_{t + 1}, a_{t + 1}; θ^{-})$, and $\hat{Q} (s_{t + 1}, a_{t + 1}; θ^{-})$ is the target Q-network value.
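The target $y_{t}$ and the loss in Equation (19) can be computed for a mini-batch as in the sketch below, where the expectation is replaced by a sample mean. The function names are hypothetical, Q-values are passed as plain lists, and the `terminal` flag, which drops the bootstrap term at the last step of an episode, is a common convention added here as an assumption rather than something stated in the text.

```python
def td_target(reward, next_q_target, gamma, terminal=False):
    """y_t = r_t + gamma * max_a Q_hat(s_{t+1}, a; theta^-)."""
    if terminal:
        return reward  # no future reward after the terminal step
    return reward + gamma * max(next_q_target)


def mse_loss(targets, predictions):
    """Sample estimate of Eq. (19): mean squared TD error over a mini-batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


# Hypothetical transition: reward 1.0, target-network Q-values for two actions.
y = td_target(reward=1.0, next_q_target=[0.5, 2.0], gamma=0.9)  # 1.0 + 0.9 * 2.0
loss = mse_loss(targets=[2.0, 1.0], predictions=[1.0, 1.0])     # (1.0 + 0.0) / 2
```

In a full implementation the gradient of this loss is taken with respect to the online weights only, while the target-network values are treated as constants.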

The DRL processes used to address the original problem are outlined in Algorithm 4. Under the epsilon-greedy policy, the agent selects a random action $a_{t} \in A$ with probability $ϵ$ and selects the greedy action $a_{t} = \arg\max_{a} Q (s_{t}, a; θ)$ with probability $1 - ϵ$. Therefore, $ϵ$ represents the exploration factor.
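A minimal Python sketch of epsilon-greedy selection is given below. The function name and the list-based representation of the Q-values are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform over all actions
    # Exploit: index of the largest Q-value (the first one in case of ties).
    return max(range(len(q_values)), key=q_values.__getitem__)


greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)  # always index 1
```

Passing a seeded `random.Random` instance as `rng` makes the exploration sequence reproducible across runs.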

Algorithm 4 shows the training procedure of the Multi-DQN agent decentralized DRL algorithm. Each VUE agent independently interacts with the environment, observes the state of the system, selects actions according to the epsilon-greedy policy, and updates its DQN parameters based on the rewards that it receives and the experiences that it replays. The training process starts with the random initialization of the DQN parameters and the replay memory. In each training episode, each VUE agent observes the current network state and takes an action consisting of SC assignment and power allocation from the action pool according to the epsilon-greedy policy. The environment then returns the appropriate reward and the new system state. The collected transition tuples are stored in the replay memory and sampled periodically to update the DQN parameters by mini-batch gradient descent. The target network parameters are periodically synchronized to stabilize the learning process.

Algorithm 4: Multi-DQN agent decentralized DRL algorithm

Inputs: Discount factor $γ$, learning rate $β$, replay memory size, target update frequency, epsilon-greedy exploration factor $ϵ$, mini-batch size, number of episodes

Initialize: Initialize the DNN of each agent with random weights $θ$ to represent the action value function $Q (s, a; θ)$. Initialize the continuous state space and the discrete action space. Then, the VUE pairs randomly select actions until a number of transitions have been stored in the replay memory.

1: for each episode do

2: Observe the state $s_{1}$

3: for each step in each episode do

4: Each DQN agent (VUE pair) selects an action $a_{t}$

5: Obtain the current reward $r_{t}$ and next state $s_{t + 1}$, then store the transition tuple $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in the replay memory

6: end for

7: end for
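The training procedure described above, including the mini-batch update and the periodic target synchronization, can be sketched as the following Python loop. This is a toy illustration under strong assumptions and not the authors' implementation: a small Q-table stands in for each agent's DNN, `toy_env_step` is a hypothetical two-state environment that rewards agents for choosing distinct actions, and all hyperparameter values are placeholders.

```python
import random
from collections import deque


class TabularAgent:
    """Stand-in for one VUE agent; a Q-table replaces the DNN (assumption)."""

    def __init__(self, n_states, n_actions, gamma=0.99, lr=0.1, capacity=500):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # online estimates
        self.q_target = [row[:] for row in self.q]             # delayed copy
        self.memory = deque(maxlen=capacity)                   # replay memory
        self.gamma, self.lr, self.n_actions = gamma, lr, n_actions

    def act(self, state, epsilon):
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        row = self.q[state]
        return max(range(self.n_actions), key=row.__getitem__)

    def learn(self, batch_size):
        if len(self.memory) < batch_size:
            return  # wait until enough transitions are stored
        for s, a, r, s_next in random.sample(list(self.memory), batch_size):
            y = r + self.gamma * max(self.q_target[s_next])  # target value
            self.q[s][a] += self.lr * (y - self.q[s][a])     # step toward target

    def sync_target(self):
        self.q_target = [row[:] for row in self.q]


def toy_env_step(state, actions):
    """Hypothetical environment: reward 1 for each agent with a unique action."""
    rewards = [1.0 if actions.count(a) == 1 else 0.0 for a in actions]
    return rewards, (state + 1) % 2


def train(n_agents=2, n_actions=2, episodes=200, steps=10,
          batch_size=16, sync_every=20):
    random.seed(0)
    agents = [TabularAgent(2, n_actions) for _ in range(n_agents)]
    step_count = 0
    for episode in range(episodes):
        state = 0
        epsilon = max(0.05, 1.0 - episode / (0.8 * episodes))  # annealed exploration
        for _ in range(steps):
            actions = [agent.act(state, epsilon) for agent in agents]
            rewards, next_state = toy_env_step(state, actions)
            for agent, a, r in zip(agents, actions, rewards):
                agent.memory.append((state, a, r, next_state))  # own memory only
                agent.learn(batch_size)
            step_count += 1
            if step_count % sync_every == 0:
                for agent in agents:
                    agent.sync_target()
            state = next_state
    return agents
```

Each agent keeps its own replay memory and value estimates, which mirrors the decentralized design in which no agent relies on a central controller for its updates.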

The proposed Multi-DQN framework is equipped with several stabilization mechanisms to improve the training stability and avoid divergence during the learning process. First, experience replay is employed to decorrelate the training samples and improve data efficiency by randomly sampling transition tuples from the replay memory. Second, a target network with periodic parameter updates is employed to stabilize convergence. In addition, the epsilon-greedy policy shifts gradually from exploration to exploitation during training, facilitating efficient policy improvement and preventing premature convergence to poor actions.

4.3. Computational Complexity

The enhanced conventional algorithm performs an exhaustive search over all feasible SC assignment combinations and then iteratively optimizes the power. Therefore, its computational complexity increases exponentially with the number of users and SCs. On the other hand, the computational complexity of the proposed Multi-DQN framework mostly comes from the forward propagation of the fully connected DNN. The DNN consists of an input layer, three fully connected hidden layers, and an output layer. The total computational complexity of the Multi-DQN framework can be roughly described as

O (Q (2 Q + M) + N) .

(20)

The expression $Q (2 Q + M) + N$ accounts for the complexity of processing the states received from the environment, interference evaluation, and action selection for all VUE agents. In particular, the $2 Q$ term reflects the interaction and interference among VUE pairs, whereas the $M$ term captures the effects of CUE users on the shared wireless resources. The $N$ term accounts for the operations for distributing subcarriers across the available spectrum resources. Since the proposed Multi-DQN framework makes decentralized decisions using forward propagation only in the online deployment phase, the corresponding computational complexity increases polynomially with the network size, which is considerably lower than that of the enhanced conventional algorithm based on the exhaustive search. Thus, the proposed framework is computationally efficient and suitable for real-time V2X resource allocation in highly dynamic vehicular scenarios. Table 4 summarizes the computational complexity of the enhanced conventional algorithm and the proposed Multi-DQN agent framework.
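To give a feeling for the gap between the two growth rates, the following Python sketch evaluates the expression in (20) for the simulated network size and compares it with one possible count of exhaustive-search candidates. The candidate count is an assumption made purely for illustration: it is the number of ways to split M CUE users evenly over N labeled SCs, which may differ from the exact search space of the enhanced conventional algorithm. The function names are hypothetical.

```python
from math import factorial


def dqn_operation_count(q, m, n):
    """Expression inside the O(.) of Eq. (20): Q(2Q + M) + N."""
    return q * (2 * q + m) + n


def balanced_assignments(m, n):
    """Ways to split m users evenly over n labeled SCs (m divisible by n)."""
    per_sc = m // n
    return factorial(m) // (factorial(per_sc) ** n)


# Simulated network size: N = 4 SCs, M = 12 CUE users, Q = 4 VUE pairs.
print(dqn_operation_count(q=4, m=12, n=4))  # 4 * (8 + 12) + 4 = 84
print(balanced_assignments(m=12, n=4))      # 12! / (3!)^4 = 369600
```

Doubling the network size to M = 24 and N = 8 leaves the first quantity in the hundreds, while the assumed candidate count grows by many orders of magnitude.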

5. Simulation Results

In this section, simulation results are presented to verify the effectiveness of the proposed enhanced conventional model and the proposed DRL framework. Extensive simulations were conducted using MATLAB 2024. The simulation parameters, which are consistent with the 3GPP R-18 standards, are presented in Table 5. The system performance is evaluated through two simulation scenarios designed to assess both validity and robustness. First, to verify the correctness of the proposed model, we benchmark the enhanced conventional algorithm and the DQN VUE agent against the results established in [16]. Second, we analyze the operational behavior of both the proposed enhanced conventional model and the DRL DQN VUE agent under varying wireless channel conditions, specifically comparing line-of-sight (LOS) and non-line-of-sight (NLOS) environments.

To ensure consistency with the work used in the comparison, the path loss models $128.1 + 37.6 \log_{10} (d)$ and $148.1 + 40 \log_{10} (d)$, with the distance $d$ in km, are used for the CUE and VUE, respectively. Moreover, the RSU transmitted power is set to 40 dBm. Furthermore, all additional parameters pertaining to the DQN VUE agent are listed in Table 6. The DNN implemented in the simulation is a fully connected neural network comprising an input layer, three hidden layers, and an output layer. The numbers of neurons in the hidden layers are 500, 250, and 120, respectively [25,26,29], with a ReLU activation function. The Adam optimizer is employed in this simulation.
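The two path loss expressions can be evaluated with a few lines of Python, as sketched below. The function names are hypothetical, the result is assumed to be in dB with the distance given in km, and the dBm-to-watt helper is the standard conversion rather than something specified in the text.

```python
from math import log10


def path_loss_cue_db(d_km):
    """CUE path loss: 128.1 + 37.6 log10(d), with d in km."""
    return 128.1 + 37.6 * log10(d_km)


def path_loss_vue_db(d_km):
    """VUE path loss: 148.1 + 40 log10(d), with d in km."""
    return 148.1 + 40.0 * log10(d_km)


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)


print(path_loss_cue_db(0.1))  # about 90.5 at a distance of 100 m
print(path_loss_vue_db(0.1))  # about 108.1 at a distance of 100 m
print(dbm_to_watt(40))        # 40 dBm corresponds to 10 W
```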

The selected hyperparameters were empirically chosen according to the training stability and convergence performance. The discount factor was set to 0.99 to take long-term rewards into consideration in the V2X communication environment, and the learning rate was set to 0.01 to balance the convergence speed and stability. The replay memory size and mini-batch size were chosen to provide sufficient training diversity and computational efficiency.

In alignment with [16], the RSU is placed at the center of the road. All operational parameters in [16] are used in this work for comparison. Throughout the simulation, it is assumed that there are 4 SCs (N = 4), 12 CUE users (M = 12), and 4 VUE pairs (Q = 4). To simplify the SIC decoding, we assume that the same number of CUE users, $M / N$, is assigned to each SC.

5.1. DRL Framework and Enhanced Conventional Model Benchmarking

Figure 4 shows that the total allocated power for CUE users decreases as the SINR requirement for VUE increases, which means that the interference caused by VUE pairs is increased, affecting the power allocated to the CUE users. For the proposed enhanced conventional approach, the RSU is the main controlling part that coordinates between the VUE pairs and the CUE users. Therefore, the RSU must decrease its power to satisfy the VUE pairs’ SINR requirement. The results in Figure 4 show that the maximum power allocated for CUE is 35.5 dBm at a 0 dB SINR requirement for VUE pairs, and it decreases as the VUE pairs’ SINR requirement increases. On the other hand, the RSU in the DRL framework will operate separately from the DQN VUE agent. The DQN VUE agent will be handled by the environment in the DRL framework with respect to the RSU transmit power. Therefore, as shown in Figure 4, the RSU does not decrease the power to meet the SINR requirement of the DQN VUE agent.

Figure 5 illustrates the CUE average SE versus the minimum SINR requirement of VUE pairs. For the proposed enhanced conventional model, the CUE users’ SE decreases as the SINR requirement for VUE increases. In [16], the CUE’s SE also decreases as the SINR requirement for VUE increases, but at a lower rate than in the proposed enhanced conventional model. This is because the proposed enhanced conventional model allocates lower power to the CUE than that allocated in [16]. Moreover, the proposed DRL framework achieves higher CUE SE than the other two models. Therefore, these results indicate a tradeoff between the power allocated to the CUE and the SE achieved by the CUE. Overall, the proposed enhanced conventional model outperforms the other two models in terms of power consumption, at the cost of lower SE. On the other hand, the proposed Multi-DQN agent consumes more power but achieves higher SE for the CUE.

Moreover, the average SE of the CUE links versus the minimum SINR requirement of the VUE for the proposed Multi-DQN framework and the Hetero-V2X (MADQN) framework in [22] is shown in Figure 5. The proposed Multi-DQN agent always achieves higher SE for all SINR requirements. In particular, the proposed framework can achieve relatively stable SE performance even with more stringent SINR constraints, while Hetero-V2X (MADQN) suffers from obvious degradation with an increase in the SINR requirement.

Figure 6 illustrates the VUE average SE versus the minimum SINR requirement of VUE pairs. The proposed Multi-DQN agent framework achieves 5.3 bps/Hz at a 0 dB SINR requirement for VUE, which continues to increase as the SINR requirement increases. Moreover, the SE achieved with the DRL framework surpasses that of the proposed enhanced conventional model.

5.2. DRL Framework and Enhanced Conventional Model for LOS Scenario

In this part, the operational behavior of both the proposed enhanced conventional model and the DRL framework under varying wireless channel conditions, specifically comparing line-of-sight (LOS) and non-line-of-sight (NLOS) environments, is analyzed.

Figure 7 illustrates the CUE average SE versus the RSU transmitted power for the LOS path loss model. The proposed enhanced conventional model is compared to the proposed DRL framework. The test has been performed for four VUE SINR requirements (0, 5, 10, and 15 dB). Figure 7a shows that, at a 0 dB VUE SINR requirement, the CUE average SE of the DRL framework increases with the RSU power until it outperforms the enhanced conventional model at 17 dBm. Furthermore, as shown in Figure 7b, the DRL DQN VUE agent starts outperforming the enhanced conventional model at a lower RSU power level (2.5 dBm). Moreover, as the SINR requirement of the VUE pairs increases, the crossover point that appears at the 0 dB and 5 dB requirements shifts toward lower RSU power, which indicates consistent system behavior.

Generally, as the RSU power increases, the DRL framework significantly outperforms the proposed enhanced conventional model. The slope of the DRL framework’s curves increases sharply, indicating it can effectively leverage the extra power offered by the RSU to boost the SE. In contrast, the proposed enhanced conventional model’s results tend to be saturated; it is limited by interference constraints from the VUE rather than power availability. However, using the DRL DQN VUE agent offers different power levels that can be used by the VUE to limit the interference over the CUE channels. The DRL framework still demonstrates a strong ability to optimize the CUE SE without violating the VUE constraint. The proposed enhanced conventional model struggles to improve the SE beyond 2.8 bps/Hz at 0 dB, even with the maximum power, likely because it conservatively manages interference to protect the sensitive VUE links. However, the DRL framework learns more complex interference patterns and improves the SE to up to 3.5 bps/Hz at 0 dB. If the system can afford higher transmit power, the DRL framework is far superior. It breaks through the interference-limited ceiling that limits the conventional model.

Figure 8 evaluates the performance of VUE pairs versus the RSU transmitted power under LOS conditions, verifying that the DQN VUE agent still satisfies the safety-critical VUE constraints while the RSU simultaneously delivers significantly higher data rates to cellular users, as shown in Figure 7. In other words, the DRL framework can satisfy the QoS constraints of the V2V system. In Figure 8, the proposed enhanced conventional model keeps the SE of the VUE constant at 2.4 bps/Hz, 3.8 bps/Hz, 5.4 bps/Hz, and 7 bps/Hz, meeting the strict VUE SINR requirements of 0, 5, 10, and 15 dB, respectively. However, the DRL framework maintains a significant lead over the proposed enhanced conventional model regarding the SE of VUE, even at the maximum RSU power (highest interference for VUE). Unlike the CUE SE, which improves as the RSU power increases, the DQN VUE agent’s performance generally degrades as the RSU power increases. The RSU transmits higher power to CUE users, whereas, for the DQN VUE agent, the RSU signal acts as interference. Therefore, higher RSU power lowers the SINR of the DQN VUE agents, reducing their SE. The DRL framework is highly effective in exploiting low-interference environments. When the RSU transmits at low power (0–10 dBm), the DRL framework maximizes the DQN VUE agent’s SE, achieving rates far higher than the requirement, while maintaining the performance of CUE users. In contrast, the enhanced conventional model appears to treat the SINR requirement as a target rather than a floor. It allocates power to strictly meet the reliability constraint (0, 5, 10, or 15 dB) but fails to utilize the excess capacity available when interference is low. As the reliability requirements become stricter (approaching 15 dB), the system has less freedom to optimize.

Under the most severe conditions, with a strict 15 dB SINR requirement and high RSU interference at 20 dBm, the flexible DQN VUE agent struggles to find a solution that is better than the rigid conventional scheme, leading to the crossover point seen in Figure 8d. However, the DRL framework still provides better efficiency overall than the enhanced conventional baseline, which only maintains the strict SINR requirement for the VUE pairs, as noted.

5.3. DRL Framework and Enhanced Conventional Model for NLOS Scenario

Figure 9 evaluates the system performance by plotting the CUE average SE versus RSU transmitted power in the case of the NLOS path loss model. The proposed enhanced conventional model is compared against the proposed DRL framework. The test has been performed for different SINR VUE requirements using four scenarios (0, 5, 10, and 15 dB). From Figure 9a, it can be seen that, at a VUE SINR requirement of 0 dB, the enhanced conventional model achieves extremely low SE at 0.2 bps/Hz, which rises linearly to 3.2 bps/Hz at 25 dBm. However, utilizing the DRL DQN VUE agent improves the CUE SE, which starts significantly higher at 1.4 bps/Hz and increases linearly until it achieves 4.2 bps/Hz. Hence, the DRL framework reduces the interference over the CUE and provides an improvement in performance even at low power; it also maintains a consistent gap of about 1 bps/Hz over the enhanced conventional model throughout all ranges of RSU power.

Figure 10 evaluates the system performance by plotting the VUE average SE versus RSU transmitted power in the case of the NLOS path loss model. The test has been performed for four VUE SINR requirements (0, 5, 10, and 15 dB). In an NLOS environment, path loss likely attenuates the interfering signal from the RSU to the VUE more than it affects the VUE-to-VUE channel. Since the RSU acts as an interferer to the DQN VUE agent and its signal is weakened under NLOS conditions, the SE of the DQN VUE agent is significantly higher in NLOS scenarios than in LOS scenarios. In the LOS scenario, the VUE pairs achieve peak SE of 2.39 bps/Hz with the enhanced conventional model and 8.4 bps/Hz with the proposed DRL framework (at a 0 dB requirement). In the NLOS scenario shown in Figure 10, the VUE pairs achieve peak SE of 3.98 bps/Hz with the enhanced conventional model and ~12.0 bps/Hz with the DRL framework (at a 0 dB requirement). This shows that the VUE pairs benefit from NLOS conditions because the physical obstruction naturally suppresses interference. The DQN VUE agent exploits this large SINR advantage to push the SE up to 13 bps/Hz, whereas the enhanced conventional algorithm fails to capitalize on it, staying flat at around 4–8.5 bps/Hz, as shown in Figure 10. Therefore, the DQN VUE agent learns the non-linear relationship between power and interference. It effectively performs opportunistic power allocation (based on the 11 power levels defined in the agent). The DRL framework can shift the DQN VUE agent to different resources or exploit the channel with lower CUE power to reduce the interference.

5.4. Overall System Performance

Figure 11 and Figure 12 illustrate the overall system SE (CUE and VUE) versus the RSU transmitted power. In Figure 11, the overall system SE is shown for the LOS scenario. The interference from the RSU to the VUE is direct and strong, making interference management difficult. From Figure 11a, it can be seen that the enhanced conventional algorithm starts at ~5.5 bps/Hz and rises slowly, because it cannot effectively balance the CUE gain against the VUE SINR losses. However, using the DQN VUE agent significantly increases the overall SE to ~9.3 bps/Hz; the SE then drops slightly as the power increases but remains superior. This decrease is caused by the drop in the VUE SE due to higher interference as the RSU power increases.

The overall performance of the system is similarly shown in Figure 11a,b. The difference between the enhanced conventional model and DRL framework narrows slightly, as shown in Figure 11a, but the DQN VUE agent still offers a 35% gain overall. From Figure 11b at an SINR threshold of 10 dB, the enhanced conventional model reaches 6 bps/Hz. The enhanced conventional model prioritizes VUE, which keeps the overall system SE constant. Generally, in LOS environments, optimization using the enhanced conventional model faces a saturation point, while the DQN VUE agent achieves better performance at low reliability requirements (0 and 5 dB). The DQN VUE agent’s performance diminishes as the constraint approaches 15 dB. The strong direct interference limits the freedom of the DQN VUE agent to find optimum resource and power allocation without interfering with the CUE.

Figure 12 illustrates the overall SE versus the RSU transmitted power for the NLOS scenario. In the NLOS scenario, RSU interference is naturally attenuated by obstacles, creating an opportunity for the DQN VUE agent to optimize the VUE performance. Figure 12a (at SINR = 0 dB) shows that the enhanced conventional model starts at 4.0 bps/Hz and rises marginally to 7.0 bps/Hz. It does not exploit the weak interference and therefore remains conservative. The DQN VUE agent, however, increases the overall system SE to 13.5 bps/Hz at 0 dB. Therefore, at low interference from the RSU (low RSU power levels), the VUE SE is maximized while the CUE users are served simultaneously. From Figure 12d (at SINR = 15 dB), the enhanced conventional model starts at 8.8 bps/Hz and rises marginally to 9.2 bps/Hz, whereas the DQN VUE agent remains superior, ranging from 13.8 bps/Hz to 12.8 bps/Hz. Hence, the DQN VUE agent outperforms the enhanced conventional model. The enhanced conventional model does not interact with the environment and cannot exploit the natural interference attenuation in the NLOS case. The DQN VUE agent, in contrast, learns the channel characteristics, specifically that the interference link is weak, and exploits them to double the system capacity. These results suggest that, for deployments in dense urban (NLOS) environments, the DQN VUE agent is needed to use the spectrum efficiently.

Figure 13a,b illustrate the percentage improvement in the overall system SE of the proposed DQN VUE agent over the enhanced conventional method for different SINR requirements and RSU Tx power levels, for the LOS and NLOS scenarios, respectively. The improvement is plotted as a percentage against the minimum VUE SINR threshold, which allows a direct comparison of the SE gains of the DRL framework over the enhanced conventional model. In the LOS scenario, the proposed DQN VUE agent consistently outperforms the enhanced conventional model, regardless of the transmit power level or SINR threshold. The improvement is largest when the SINR requirement is low (0 dB or 5 dB), reaching up to 90% at the minimum transmitted power. As the SINR constraint becomes more stringent, the improvement diminishes.

This trend arises because it becomes harder to adjust the resource allocation when the Quality of Service (QoS) requirements are stricter. In the NLOS scenario, the SE improvements are far greater, particularly when the SINR requirements are low, with improvements exceeding 200%. The DQN VUE agent demonstrates superior adaptability to adverse propagation conditions and significant interference. The proposed approach maintains a considerable performance advantage over the enhanced conventional model across all operating points, although the improvement diminishes as the SINR requirements increase. The results indicate that employing the DRL framework for resource management enhances the spectral efficiency of V2X communications under both favorable and unfavorable channel conditions.
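The percentage improvement discussed above can be sketched as a relative gain over the baseline. The exact definition used for Figure 13 is not restated in this section, so the formula below (difference divided by the baseline SE) is an assumption, and the function name and input values are illustrative rather than read from the figures.

```python
def se_improvement_percent(se_drl: float, se_conventional: float) -> float:
    """Relative SE gain of the DRL framework over the baseline, in percent.

    Assumed definition: 100 * (SE_DRL - SE_conv) / SE_conv.
    """
    if se_conventional <= 0:
        raise ValueError("baseline SE must be positive")
    return 100.0 * (se_drl - se_conventional) / se_conventional


# Illustrative SE pairs in bps/Hz (hypothetical, not taken from Figure 13).
examples = {
    "moderate gain": (9.5, 5.0),
    "large gain": (12.0, 4.0),
}
for label, (drl, conv) in examples.items():
    print(f"{label}: {se_improvement_percent(drl, conv):.1f}%")
```

Under this definition, a gain above 100% means the DRL framework more than doubles the baseline SE.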

Figure 14 illustrates the average SE of both CUE and VUE pairs versus the number of training episodes for the proposed DQN VUE agent at various transmit power levels and an SINR requirement of 5 dB. Figure 14a,b show the CUE average SE and VUE average SE for the LOS scenario, while Figure 14c,d show the CUE average SE and VUE average SE for the NLOS scenario. The figure illustrates the learning process of the proposed agent and its convergence. Initially, the average SE exhibits significant fluctuations due to the exploratory behavior of the DQN VUE agent. The learned policy becomes increasingly stable as the number of training episodes grows, which smooths the SE trajectories and allows them to settle at steady-state values. Convergence typically occurs within a limited number of episodes, indicating that the proposed method learns and adapts rapidly. Moreover, elevated transmit power levels result in increased convergence among CUE SE values, indicating that the learned strategy effectively utilizes the available transmitted power while maintaining stability in the VUE and CUE links. The observed convergence tendency indicates that the proposed DQN VUE agent resource allocation scheme is both stable and robust.

Furthermore, the Multi-DQN agent shows stable convergence behavior during training, with the accumulated reward increasing and stabilizing over episodes. In practice, convergence is achieved when the change in the cumulative episode reward is small enough across consecutive training episodes.
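The convergence criterion described above can be sketched as a simple check on the episode rewards. The window length, the tolerance, and the use of a moving average are assumptions made for illustration; the text only states that the change in cumulative episode reward should be small enough across consecutive episodes.

```python
from typing import Sequence


def has_converged(episode_rewards: Sequence[float],
                  window: int = 20,
                  tol: float = 0.01) -> bool:
    """Return True when the moving-average reward has stopped changing.

    Compares the mean of the last `window` episodes with the mean of the
    `window` episodes before them, and checks the relative change against
    `tol`. Both `window` and `tol` are hypothetical choices.
    """
    if len(episode_rewards) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(episode_rewards[-window:]) / window
    previous = sum(episode_rewards[-2 * window:-window]) / window
    scale = max(abs(previous), 1e-12)  # avoid division by zero
    return abs(recent - previous) / scale <= tol


# A reward curve that is still rising versus one that has flattened out.
rising = [float(i) for i in range(100)]
flattened = [float(i) for i in range(60)] + [60.0] * 40
print(has_converged(rising), has_converged(flattened))
```

Averaging over a window rather than comparing two single episodes makes the check less sensitive to the exploration noise that is visible early in training.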

Figure 15 illustrates the convergence performance of the proposed DQN VUE agent in terms of the average cumulative reward versus the number of training episodes in both LOS and NLOS scenarios at an SINR requirement of 5 dB. The results demonstrate that the proposed Multi-DQN framework converges successfully for different RSU transmitted power levels. The agent experiences faster convergence and higher reward values in the case of NLOS conditions. On the other hand, the LOS scenario has slower convergence due to the higher channel uncertainty and larger interference fluctuations. In addition, Figure 15 compares the achieved maximum cumulative reward for multiple transmitted power levels (5, 10, and 20 dBm). This comparison indicates more efficient interference mitigation and resource allocation performance for lower transmitted power. In general, the proposed DQN framework demonstrates stable learning abilities and adaptability in NOMA-V2X environments.

5.5. Additional Performance Metric Evaluation

In addition to SE, the energy efficiency (EE) performance of the proposed Multi-DQN framework is evaluated as follows [42]:

EE = \frac{\sum_{i=1}^{M} R_{i} + \sum_{j=1}^{Q} R_{j}^{v}}{P_{total} + \sum_{j=1}^{Q} p_{j}^{v}}

(21)

where the numerator is the total achieved throughput of the system, while the denominator is the total transmitted power. The proposed Multi-DQN framework improves the EE by learning interference-aware power allocation policies that avoid unnecessary transmitted power while guaranteeing the QoS requirements of both V2I and V2V links.
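As a minimal sketch, Equation (21) can be evaluated as follows. The function and argument names are illustrative, the input values are hypothetical, and the powers are assumed to be converted from dBm to watts before they are summed, since powers in dBm cannot be added directly.

```python
from typing import Sequence


def dbm_to_watt(p_dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30.0) / 10.0)


def energy_efficiency(cue_rates: Sequence[float],
                      vue_rates: Sequence[float],
                      p_total_watt: float,
                      vue_powers_watt: Sequence[float]) -> float:
    """Equation (21): total achieved rate divided by total transmitted power."""
    total_rate = sum(cue_rates) + sum(vue_rates)
    total_power = p_total_watt + sum(vue_powers_watt)
    if total_power <= 0:
        raise ValueError("total transmitted power must be positive")
    return total_rate / total_power


# Hypothetical operating point: 12 CUE and 4 VUE pairs (rates in bps/Hz),
# RSU at 20 dBm and every VUE transmitter at 10 dBm.
ee = energy_efficiency(
    cue_rates=[1.0] * 12,
    vue_rates=[2.0] * 4,
    p_total_watt=dbm_to_watt(20.0),
    vue_powers_watt=[dbm_to_watt(10.0)] * 4,
)
print(f"EE = {ee:.2f} (rate units per watt)")
```

The unit of the result follows the unit of the rates: bps/W if the rates are in bps, or bps/Hz/W if spectral efficiencies are used.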

Figure 16 shows the EE performance of the proposed Multi-DQN framework versus the RSU transmitted power for different minimum SINR thresholds in the LOS and NLOS V2X environments. The results for both scenarios show that the EE slowly decreases as the RSU transmitted power increases, because excessive transmitted power causes more interference and additional energy consumption. Moreover, a tighter VUE SINR requirement results in a significant decrease in EE, as the proposed agent must allocate more transmitted power to meet the higher QoS requirement. Furthermore, the comparison between the LOS and NLOS scenarios shows that the NLOS environment consistently achieves better EE because of the lower interference between VUE and CUE. In the LOS case, by contrast, the EE is degraded by the higher interference levels. The proposed Multi-DQN agent demonstrates stable adaptation capabilities and efficient power allocation under both propagation conditions.

Moreover, the fairness metric can be used to assess the fairness of the proposed framework when assigning radio resources to users and preventing excessive throughput concentration to users with good channel conditions. Therefore, to evaluate the resource allocation fairness among VUE users, Jain’s fairness index is adopted and is expressed as follows [43]:

J(R_{1,n}^{v}, R_{2,n}^{v}, \dots, R_{Q,n}^{v}) = \frac{\left(\sum_{j=1}^{Q} R_{j,n}^{v}\right)^{2}}{Q \sum_{j=1}^{Q} \left(R_{j,n}^{v}\right)^{2}}

(22)

The VUE fairness index versus the RSU transmitted power for the proposed DQN VUE agent in both LOS and NLOS environments at different SINR requirements for VUE is illustrated in Figure 17. The fairness index generally decreases with the increase in RSU transmitted power under both channel conditions. The fairness degradation is due to the increase in RSU transmitted power, which increases the interference, therefore leading to more imbalanced resource allocation among VUE pairs. Moreover, the results show that the proposed Multi-DQN framework provides high fairness performance in all investigated scenarios, where the fairness index is close to 1, even at higher transmitted power levels from the RSU. Overall, the results obtained confirm the robustness of the proposed Multi-DQN framework in terms of fairness in different transmitted power and channel propagation conditions.
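Jain's fairness index in Equation (22) can be sketched as below. The function name and the example rate vectors are illustrative and are not taken from the simulations.

```python
from typing import Sequence


def jain_fairness_index(rates: Sequence[float]) -> float:
    """Jain's index: (sum of rates)^2 / (Q * sum of squared rates)."""
    q = len(rates)
    sum_of_squares = sum(r * r for r in rates)
    if q == 0 or sum_of_squares == 0:
        raise ValueError("need at least one non-zero rate")
    return sum(rates) ** 2 / (q * sum_of_squares)


# Equal rates give an index of 1; one pair taking everything gives 1/Q.
print(jain_fairness_index([2.0, 2.0, 2.0, 2.0]))
print(jain_fairness_index([4.0, 0.0, 0.0, 0.0]))
```

The index ranges from 1/Q (one VUE pair receives all the throughput) to 1 (all pairs receive the same rate), which is why values close to 1 in Figure 17 indicate a balanced allocation.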

6. Conclusions

This paper presented a decentralized Multi-Agent Deep Q-Network (Multi-DQN) framework tailored to the joint optimization of SC assignment and power allocation in V2X communication environments. By transitioning from centralized models to an edge-based architecture, our approach effectively decouples VUE and CUE operations. Experimental evaluations under both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios confirm that the VUE Multi-DQN framework consistently outperforms established benchmarks based on conventional methods. Specifically, the proposed model achieved increases in spectral efficiency (SE) of 90% and 225% in the LOS and NLOS cases, respectively, while simultaneously reducing the computational complexity. These results demonstrate that shifting decision-making to the edge not only enhances system reliability and performance but also provides a scalable solution for high-density vehicular networks, where conventional optimization methods often struggle because their complexity grows with the number of CUE and VUE. Building on these results, future research will aim to achieve global coordination between VUE and CUE pairs to guarantee full system optimality. By implementing a federated deep reinforcement learning (FDRL) framework, we will enable decentralized agents to collaboratively optimize their policies, facilitating simultaneous coordination across the network while maintaining the low-latency benefits of edge-based processing. Furthermore, integrating dynamic ‘mode selection’ algorithms with our proposed Multi-DQN framework represents another direction for future research.

Author Contributions

Conceptualization, A.A.A.-M., M.A., H.E., H.E.-H. and M.I.; methodology, A.A.A.-M., M.A., H.E. and M.I.; software, A.A.A.-M. and M.A.; validation, A.A.A.-M., M.A., H.E., H.E.-H. and M.I.; formal analysis, A.A.A.-M. and M.A.; resources, H.E.-H.; writing—original draft preparation, A.A.A.-M.; writing—review and editing, A.A.A.-M., M.A., H.E.-H., H.E. and M.I.; visualization, A.A.A.-M.; supervision, H.E., H.E.-H., M.I. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Garcia, M.H.C.; Molina-Galan, A.; Boban, M.; Gozalvez, J.; Coll-Perales, B.; Sahin, T.; Kousaridas, A. A Tutorial on 5G NR V2X Communications. IEEE Commun. Surv. Tutor. 2021, 23, 1972–2026.
2. 3GPP (3rd Generation Partnership Project). Technical Specification Group Radio Access Network; Study on LTE-Based V2X Services; TR 36.885 V14.0.0; 3GPP: Nice, France, 2016.
3. Chen, S.; Hu, J.; Shi, Y.; Peng, Y.; Fang, J.; Zhao, R.; Zhao, L. Vehicle-to-Everything (V2x) Services Supported by LTE-Based Systems and 5G. IEEE Commun. Stand. Mag. 2017, 1, 70–76.
4. 3GPP. Study on Evaluation Methodology of New V2X Use Cases; 3GPP: Nice, France, 2017.
5. 3GPP (3rd Generation Partnership Project). Technical Specification Group Services and System Aspects; Release 16 Description; Summary of Rel-16 Work Items; 3GPP: Nice, France, 2022.
6. Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Moessner, K.; Saad, W.; Poor, H.V. 6G for Vehicle-to-Everything (V2X) Communications: Enabling Technologies, Challenges, and Opportunities. Proc. IEEE 2022, 110, 712–734.
7. Giordani, M.; Polese, M. Towards 6G Networks: Use Cases and Technologies. IEEE Commun. Mag. 2020, 58, 55–61.
8. Ji, M.; Wu, Q.; Fan, P.; Cheng, N.; Chen, W.; Wang, J.; Letaief, K.B. Graph Neural Networks and Deep Reinforcement Learning-Based Resource Allocation for V2X Communications. IEEE Internet Things J. 2025, 12, 3613–3628.
9. Sun, M.; Xu, J.; Wang, J. State-Aware Resource Allocation for V2X Communications. Sensors 2026, 26, 344.
10. Wang, D.; Qiu, A.; Zhou, Q.; Schotten, H.D. A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications. In 2025 IEEE 11th World Forum on Internet of Things (WF-IoT); IEEE: New York, NY, USA, 2025.
11. Zhang, Z.; Wu, Q.; Fan, P.; Cheng, N.; Chen, W.; Letaief, K.B. DRL-Based Optimization for AoI and Energy Consumption in C-V2X Enabled IoV. IEEE Trans. Green Commun. Netw. 2025, 9, 2144–2159.
12. Gyawali, S.; Xu, S.; Qian, Y.; Hu, R.Q. Challenges and Solutions for Cellular Based V2X. IEEE Commun. Surv. Tutor. 2020, 23, 222–255.
13. Zhang, F.; Wang, M.M.; Bao, X.; Liu, W. Centralized Resource Allocation and Distributed Power Control for NOMA-Integrated NR V2X. IEEE Internet Things J. 2021, 8, 16522–16534.
14. Rehman, A.; Valentini, R.; Cinque, E.; Di Marco, P.; Santucci, F. On the Impact of Multiple Access Interference in LTE-V2X and NR-V2X Sidelink Communications. Sensors 2023, 23, 4901.
15. ETSI. Multiple Access Techniques (MAT); Classification of Candidate Multiple Access Techniques for 6G and Their Comparison with Specified 3GPP Features; ETSI: Antipolis, France, 2026; Volume 1.
16. Zheng, H.; Li, H.; Hou, S.; Song, Z. Joint Resource Allocation With Weighted Max-Min Fairness for NOMA-Enabled V2X Communications. IEEE Access 2018, 6, 65449–65462.
17. Di, B.; Song, L.; Li, Y.; Li, G.Y. Non-Orthogonal Multiple Access for High-Reliable and Low-Latency V2X Communications in 5G Systems. IEEE J. Sel. Areas Commun. 2017, 35, 2383–2397.
18. Hussein, H.H.; Radwan, M.H.; Elsayed, H.A.; Abd El-Kader, S.M. Multi V2X Channels Resource Allocation Algorithms for D2D 5G Network Performance Enhancement. Veh. Commun. 2021, 31, 100371.
19. Shan, L.; Gao, S.; Chen, S.; Xu, M.; Zhang, F.; Bao, X.; Chen, M. Energy-Efficient Resource Allocation in NOMA-Integrated V2X Networks. Comput. Commun. 2023, 197, 23–33.
20. Phillips, A. Quadratic Fractional Programming: Dinkelbach Method. In Encyclopedia of Optimization; Springer: Boston, MA, USA, 2008; pp. 3149–3153.
21. Li, J.; Leng, Q.; Cheng, M. Resource Allocation in NOMA-V2X Networks with Multi-Agent Parameterized Action Space Reinforcement Learning. IEEE Trans. Veh. Technol. 2026, 1–16.
22. Gao, A.; Zhu, Z.; Zhang, J.; Liang, W.; Hu, Y. Matching Combined Heterogeneous Multi-Agent Reinforcement Learning for Resource Allocation in NOMA-V2X Networks. IEEE Trans. Veh. Technol. 2024, 73, 15109–15124.
23. Song, S.; Zhang, Z.; Wu, Q.; Fan, P.; Fan, Q. Joint Optimization of Age of Information and Energy Consumption in NR-V2X System Based on Deep. Sensors 2024, 24, 4338.
24. Zhao, J.; Hu, F.; Li, J.; Nie, Y. Multi-Agent Deep Reinforcement Learning Based Resource Management in Heterogeneous V2X Networks. Digit. Commun. Netw. 2025, 11, 182–190.
25. Yuan, Y.; Zheng, G.; Wong, K.; Letaief, K.B. Meta-Reinforcement Learning Based Resource Allocation for Dynamic V2X Communications. IEEE Trans. Veh. Technol. 2021, 70, 8964–8977.
26. Lee, I.; Kim, D.K. Decentralized Multi-Agent DQN-Based Resource Allocation for Heterogeneous Traffic in V2X Communications. IEEE Access 2023, 12, 3070–9084.
27. Ding, Y.; Huang, Y.; Tang, L.; Qin, X.; Jia, Z. Resource Allocation in V2X Communications Based on Multi-Agent Reinforcement Learning with Attention Mechanism. Mathematics 2022, 10, 3415.
28. Zhang, X.; Peng, M.; Yan, S.; Sun, Y. Deep-Reinforcement-Learning-Based Mode Selection and Resource Allocation for Cellular V2X Communications. IEEE Internet Things J. 2020, 7, 6380–6391.
29. Ye, H.; Li, G.Y.; Juang, B.F. Deep Reinforcement Learning Based Resource Allocation for V2V Communications. IEEE Trans. Veh. Technol. 2019, 68, 3163–3173.
30. Zhao, J.; Liu, Y.; Chai, K.K.; Elkashlan, M.; Chen, Y. Matching with Peer Effects for Context-Aware Resource Allocation in D2D Communications. IEEE Commun. Lett. 2017, 21, 837–840.
31. Zhao, J.; Liu, Y.; Chai, K.K.; Chen, Y.; Elkashlan, M. Many-To-Many Matching with Externalities for Device-To-Device Communications. IEEE Wirel. Commun. Lett. 2017, 6, 138–141.
32. Bazaraa, M.; Sherali, H.; Shetty, C.M. Nonlinear Programming Theory and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2006; Volume 2.
33. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University: Cambridge, UK, 2004.
34. Yaw, B.; Kuhn, H.W. The Hungarian Method For the Assignment Problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
35. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, UK, 2015.
36. Plaat, A. Deep Reinforcement Learning; Springer: Singapore, 2023.
37. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
38. Kumar, A.; Singh, D. Adaptive Epsilon Greedy Reinforcement Learning Method in Securing IoT Devices in Edge Computing. Discov. Internet Things 2024, 4, 27.
39. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
40. Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of Deep Reinforcement Learning in Communications and Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174.
41. Lu, S.; Liu, S.; Zhu, Y.; Liang, W.; Li, K.; Lu, Y. A DRL-Based Decentralized Computation Offloading Method: An Example of an Intelligent Manufacturing Scenario. IEEE Trans. Ind. Inform. 2023, 19, 9631–9641.
42. Liu, Q.; Tan, F.; Lv, T.; Gao, H. Energy Efficiency and Spectral-Efficiency Tradeoff in Downlink NOMA Systems. In 2017 IEEE International Conference on Communications Workshops (ICC Workshops); IEEE: New York, NY, USA, 2017.
43. Jain, R.; Chiu, D.M.; Hawe, W. A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. arXiv 1984.

Figure 1. V2X communication environment system model.

Figure 2. Bipartite matching example for VUE pair SC assignment.

Figure 3. Multi-DQN VUE agent.

Figure 4. RSU transmitted power for NVRA algorithm [16], Enhanced conventional algorithm and Proposed Multi-DQN Agent vs. minimum SINR requirement of VUE pairs.

Figure 5. CUE average SE for NVRA algorithm [16], Enhanced conventional algorithm, Proposed Multi-DQN Agent, and Hetero-V2X_(MADQN) [22] vs. minimum SINR requirement of VUE.

Figure 6. VUE average SE vs. minimum SINR requirement of VUE pairs.

Figure 7. CUE average SE vs. RSU transmitted power for LOS path loss model at different minimum SINR requirements for VUE pairs for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 8. VUE average SE vs. RSU transmitted power for LOS path loss model at different minimum SINR requirements for VUE for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 9. CUE average SE vs. RSU transmitted power for NLOS path loss model at different minimum SINR requirements for VUE for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 10. VUE average SE vs. RSU transmitted power for NLOS path loss model at different minimum SINR requirements for VUE for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 11. Overall system SE vs. RSU transmitted power for LOS path loss model at different SINR requirements for VUE for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 12. Overall system SE vs. RSU transmitted power for NLOS path loss model at different SINR requirements for VUE for both enhanced conventional model and DRL framework: (a) 0 dB, (b) 5 dB, (c) 10 dB, and (d) 15 dB.

Figure 13. The overall SE improvement percentage (%) vs. minimum VUE SINR requirement at different RSU Tx power for (a) the LOS path loss model and (b) the NLOS path loss model.

Figure 14. Average SE for both CUE users and VUE pairs versus the number of episodes for the proposed DQN VUE agent for both LOS (a,b) and NLOS (c,d) scenarios at SINR requirement 5 dB.

Figure 15. Average cumulative reward versus the number of episodes for the proposed DQN VUE agent for both LOS (a) and NLOS (b) scenarios at SINR requirement 5 dB.

Figure 16. Energy efficiency (EE) versus RSU transmitted power for the proposed DQN VUE agent for both LOS (a) and NLOS (b) scenarios at different SINR requirements for VUE.

Figure 17. VUE fairness index versus RSU transmitted power for the proposed DQN VUE agent for both LOS (a) and NLOS (b) scenarios at different SINR requirements for VUE.

Table 1. Summary of NOMA-V2X literature review.

Ref.	Approach	Methodology	Results	Challenges
[13]	Two-stage centralized/decentralized resource allocation for NOMA-NR-V2X	Graph matching theory for resource allocation and game theory for power control	Increases capacity by 5% and reduces power consumption by 36%	High vehicle mobility, extensive negotiation
[16]	Joint resource allocation mechanism based on max-weight fairness	Problem divided into 3 subproblems and solved via matching theories and iterative algorithms	Outperforms OMA and equal power distribution methods in terms of fairness and throughput	High computational complexity and convergence issues in dense environments
[17]	NOMA-MCD for V2X broadcasting	Centralized SPS at BS for resource allocation and distributed iterative power control among vehicles	Better packet reception and lower delay than OMA	Computational complexity from power control
[18]	Multi-channel resource allocation in 5G D2D-based V2X	Power optimization utilizing three different allocation schemes	Enhanced V2I capacity and bandwidth utilization	Trade-off exists among fairness, processing time, and the loss of V2V links during significant movement
[19]	EE optimization in NOMA-V2X system	Two-layer BCD that integrates Dinkelbach’s method	Superior EE performance over other benchmarks	System complexity, interference management, imperfect CSI

Table 2. Summary of reviewed literature on using ML and DRL with V2X.

Ref.	Approach	Methodology	Results	Challenges
[21]	Multi-agent DRL with parameterized action space (res-MAPDDPG)	Decomposes the problem into V2I and V2V problems. Convex optimization NOMA grouping is applied for V2I links, while a res-MAPDDPG framework is utilized for V2V links.	Improves spectral efficiency and system capacity and reduces outage probability compared to OMA/NOMA baselines.	High training and computational complexity due to hybrid action spaces
[22]	Matching-combined heterogeneous MADDPG for NOMA-V2X resource allocation	Uses one-to-many matching for channel allocation and heterogeneous MADDPG for distributed power control of V2I and V2V links.	Improves convergence speed, spectral efficiency, and outage probability.	Scalability limitations under dense V2V deployment
[23]	DRL-based joint optimization of AoI and energy consumption in NOMA-enabled NR-V2X using MPDQN	Formulates AoI and energy consumption as an optimization problem in NR-V2X Mode 2.	Reduces AoI and energy consumption, demonstrating better performance than LTE-V2X under dense vehicular conditions.	High complexity; fairness issues in NOMA resource allocation
[24]	Multi-agent DRL for heterogeneous V2X resource management	Cooperative learning for spectrum allocation and power control in dynamic V2X networks.	Improves resource allocation efficiency, spectrum utilization, and interference mitigation.	High complexity and convergence challenges
[25]	Hybrid DRL for sub-band and power control	DQN for sub-band allocation, DDPG for power control, and meta-RL for fast adaptation.	Increased throughput and flexibility over quantized power systems.	High computational demands due to many DRL models
[26]	Decentralized Multi-DQN agents with RSU clustering	RSUs are modeled as agents with limited actions and weighted global reward.	Near-optimal PRR performance, outperforming light and heavy traffic.	High complexity
[27]	AMARL system for joint spectrum and power allocation	Each V2V is a DQN agent enhanced by an attention mechanism.	Higher V2I sum rates and lower V2V latency.	Needs careful design for computational efficiency and practical application
[28]	DRL-based mode selection and resource allocation	Each V2V pair is a DQN agent that takes actions in terms of mode selection, spectrum, and power allocation.	Increase in total capacity and reduced latency. Outperforms heuristic algorithms.	High complexity, scalability problems under heavy traffic
[29]	Decentralized DRL for unicast and broadcast scenarios	Each V2V link is considered as a DQN agent that selects sub-band and power via local environment	Higher V2I capacity, lower V2V latency over conventional models.	Higher computational complexity

Table 3. List of parameters with definitions.

Parameter	Definition	Unit
$N$	Number of subcarriers ($SC$)	-
$M$	Number of cellular user equipment (CUE)	-
$Q$	Number of vehicle user equipment (VUE) pairs	-
$y_{i}$	Received signal	Volt (V)
$p_{i}$	CUE allocated power	Watt (W)
$s_{i}$	CUE signal	Volt (V)
$p_{j}^{v}$	Transmit power for VUE pair	Watt (W)
$s_{j}^{v}$	VUE pair $j$ signal	Volt (V)
$n_{o}$	Additive white Gaussian noise with variance $σ^{2}$
$H_{i,B}^{n}$	Channel gain between CUE $i$ and BS using $SC_{n}$	-
$H_{i,j}^{n}$	Channel gain between CUE $i$ and VUE pair $j$ transmitter using $SC_{n}$	-
$H_{j}^{n}$	Channel gain between VUE pair $j$ using $SC_{n}$	-
$H_{j,B}^{n}$	Channel gain between BS and the receiver of VUE pair $j$ using $SC_{n}$	-
$\| h_{i,B}^{n} \|^{2}$	Small-scale fast fading (Rayleigh coefficient) component	-
$β_{i,B}$	Log-normal shadowing with standard deviation $ξ$	-
$PL$	Path loss model	-
$α$	Path loss exponent	-
$x_{i,n}$	Binary indicator element for CUE SC assignment	-
$X_{i}$	Binary matrix for CUE SC assignment	-
$x_{j,n}^{v}$	Binary indicator element for VUE pair SC assignment	-
$X_{j}^{n}$	Binary matrix for VUE pair SC assignment	-
$γ_{i,n}$	SINR of CUE $i$ on $SC_{n}$	-
$W$	Bandwidth	Hertz (Hz)
$R_{i,n}$	Achievable data rate of CUE $i$ on $SC_{n}$	Bit per second (bps)
$γ_{j,n}^{v}$	SINR of VUE pair $j$ on $SC_{n}$	-
$γ_{o}$	Minimum SINR VUE pair requirement	dB
$R_{o}$	Minimum achievable data rate of VUE pair	Bit per second (bps)
$R_{j,n}^{v}$	Achievable data rate of VUE pair $j$ on $SC_{n}$	Bit per second (bps)
$P_{total}$	RSU total transmitted power	dBm
$P_{max}$	Maximum VUE transmit power	dBm
$p$	Power allocation matrix for CUE users	Watt (W)
$p^{v}$	Power allocation matrix for VUE pairs	Watt (W)
$Χ$	CUE subcarrier assignment	-
$Χ^{v}$	VUE pair subcarrier assignment	-
$τ$	Control rate	Bit per second (bps)
$τ^{*}$	Optimum control rate	Bit per second (bps)
$I$	Identity matrix	-
$η$	All possible swaps between CUE and SC	-
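Table 3 lists both a minimum SINR requirement and a minimum achievable rate for the VUE pairs. As a hedged sketch of how the two relate, the snippet below assumes the standard Shannon relation, SE = log2(1 + SINR), which is not restated in this section; the function names are illustrative.

```python
import math


def db_to_linear(value_db: float) -> float:
    """Convert a ratio from dB to linear scale."""
    return 10 ** (value_db / 10.0)


def min_spectral_efficiency(sinr_db: float) -> float:
    """SE in bps/Hz at a given SINR, assuming SE = log2(1 + SINR)."""
    return math.log2(1.0 + db_to_linear(sinr_db))


# The four minimum VUE SINR requirements used in the simulations.
for threshold_db in (0, 5, 10, 15):
    se = min_spectral_efficiency(threshold_db)
    print(f"{threshold_db:>2} dB -> {se:.2f} bps/Hz")
```

Multiplying the resulting SE by the bandwidth gives the corresponding minimum rate in bps under the same assumption.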

Table 4. Computational complexity comparison.

Model	Computational Complexity
Enhanced Conventional Algorithm	$O ({(N!)}^{M})$
Multi-DQN Agent Framework	$O (Q (2 Q + M) + N)$
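To give a sense of scale, the two complexity expressions in Table 4 can be evaluated for the simulated configuration in Table 5 (4 SCs, 12 CUE, 4 VUE pairs). This is only a sketch: big-O expressions hide constant factors, so the numbers indicate growth rather than actual operation counts, and the function names are illustrative.

```python
import math


def conventional_complexity(n_sc: int, n_cue: int) -> int:
    """Evaluate the (N!)^M term of the enhanced conventional algorithm."""
    return math.factorial(n_sc) ** n_cue


def multi_dqn_complexity(n_sc: int, n_cue: int, n_vue: int) -> int:
    """Evaluate the Q(2Q + M) + N term of the Multi-DQN framework."""
    return n_vue * (2 * n_vue + n_cue) + n_sc


N, M, Q = 4, 12, 4  # values from the simulation parameters
print(f"conventional: {conventional_complexity(N, M):.3e}")
print(f"multi-DQN:    {multi_dqn_complexity(N, M, Q)}")
```

For these values the conventional term is 24^12, on the order of 10^16, while the Multi-DQN term evaluates to 84, which illustrates why the factorial expression becomes impractical as the numbers of SCs and CUE grow.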

Table 5. Simulation parameters.

Parameter		Value
Street width		50 m
Street length		600 m
RSU maximum transmit power		25 dBm
VUE maximum transmit power		15 dBm
Minimum SINR requirement for VUE pairs		0, 5, 10, 15 dB
LOS path loss (R18)	CUE	$38.40 + 21.0 \log_{10}(d)$, $d$ in km
LOS path loss (R18)	VUE	$44.23 + 16.7 \log_{10}(d)$, $d$ in km
NLOS path loss (R18)	CUE	$38.40 + 31.9 \log_{10}(d)$, $d$ in km
NLOS path loss (R18)	VUE	$42.52 + 30.0 \log_{10}(d)$, $d$ in km
Standard deviation of shadow fading (CUE)		8 dB
Standard deviation of shadow fading (VUE)		3 dB
Noise power		−174 dBm/Hz
Number of SCs		4
Number of CUE		12 (3 per SC)
Number of VUE pairs		4 (one for each SC)
Maximum number of users per SC		4 (3 CUE and 1 VUE pair)

Table 6. DRL DQN VUE agent hyperparameters.

Parameter	Value
Discount factor	0.99
Learning rate	0.01
Total number of episodes	1000
Maximum step size of each episode	500
Replay memory size	$1 \times 10^{6}$
Target update frequency	4
Mini-batch size	64
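The hyperparameters in Table 6 could be collected in a small configuration object, as sketched below. The class and function names are hypothetical, the temporal-difference target is the standard DQN form from the cited literature rather than a transcription of the authors' code, and the unit of the target update frequency (steps) is an assumption, since the table does not state it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DQNConfig:
    """Table 6 values; field names are illustrative."""
    discount_factor: float = 0.99
    learning_rate: float = 0.01
    num_episodes: int = 1000
    max_steps_per_episode: int = 500
    replay_memory_size: int = 1_000_000
    target_update_frequency: int = 4  # assumed to be counted in steps
    batch_size: int = 64


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float) -> float:
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward  # no bootstrap term at the end of an episode
    return reward + gamma * max(next_q_values)


def should_sync_target(step: int, cfg: DQNConfig) -> bool:
    """True when the target network would be refreshed (assumed per step)."""
    return step % cfg.target_update_frequency == 0


cfg = DQNConfig()
print(td_target(1.0, [0.5, 2.0], done=False, gamma=cfg.discount_factor))
print([s for s in range(1, 10) if should_sync_target(s, cfg)])
```

Keeping the values in one frozen dataclass makes it harder for a training script to drift from the reported settings.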

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

