1. Introduction
In recent years, the rapidly growing number of private vehicles [1,2] has led to a steady increase in road congestion and traffic accidents [3]. Intelligent Transportation Systems (ITS) [4] and Cooperative Intelligent Transportation Systems (C-ITS) [5] have thus attracted considerable attention from researchers as promising solutions to these challenges. C-ITS facilitates coordinated cooperation among vehicles, infrastructure, and other road users through wireless communication, data exchange, and real-time information sharing, enhancing road safety and traffic flow efficiency while mitigating congestion [6]. Cellular Vehicle-to-Everything (C-V2X) has received extensive attention and developed rapidly since its inception, owing to its high data transmission rates, low latency, and high reliability [7]. These attributes have established C-V2X as the primary enabling technology for C-ITS. However, due to the highly dynamic nature of vehicular environments, characterized by continuous high-speed mobility, C-V2X must perform resource allocation under complex, time-varying, and highly stochastic channel conditions. This necessitates a high-performance channel resource allocation algorithm to manage communication resources efficiently.
Early research on communication resource allocation algorithms established the fundamental understanding that resource allocation problems in communication systems can be equivalently modeled as optimization problems [8]. Building upon this foundation, the study of communication resource allocation algorithms has roughly undergone several phases: convex optimization and fairness modeling, approximate solution of non-convex problems, distributed game theory and structured approaches, and learning-driven and transferable intelligence. Among these, the convex optimization and fairness phase primarily utilized Lagrangian duality theory and distributed pricing mechanisms to model and solve resource allocation problems, achieving a critical transition from idealized theoretical models to implementable distributed algorithms. This phase constitutes one of the key theoretical frameworks for modern communication resource allocation [9]. Subsequently, to address the widespread non-convex characteristics in practical wireless systems, researchers proposed iterative approximation methods such as sequential convex approximation (SCA) and block successive upper-bound minimization (BSUM), which ensured convergence while maintaining engineering feasibility, and provided “expert algorithms” for subsequent learning-based resource allocation methods to imitate [10]. Furthermore, with the increase in network scale and system autonomy, distributed game theory and structured approaches (such as auction mechanisms and matching theory) were introduced into communication resource allocation research, offering rigorous mathematical tools for resource coordination in multi-agent autonomous systems and becoming an important theoretical framework [11].
Multi-Agent Deep Reinforcement Learning (MADRL) offers the combined advantages of Deep Reinforcement Learning (DRL) and multi-agent collaboration [12]. MADRL aims to enable multiple agents to cooperatively interact within a shared environment to achieve optimal system-wide performance [13]. The inherent capability of MADRL to support cooperative interactions among multiple agents renders it particularly well-suited for addressing resource allocation challenges in distributed vehicular networks. Specifically, MADRL enables vehicle-to-vehicle (V2V) links to collaboratively allocate communication resources, thereby improving the overall V2V transmission success rate and enhancing the throughput of vehicle-to-infrastructure (V2I) links.
Current research primarily focuses on mitigating multi-user interference among V2X links and improving spectrum utilization. Although MADRL has been widely applied to V2X resource allocation [14–25] and achieved significant progress, these studies have not explicitly addressed how to enhance cooperation among agents in such environments. In existing approaches, inter-agent collaboration emerges implicitly through individual interactions with the environment. However, since each agent can only observe partial, localized information, the resulting level of cooperation remains limited. The dynamic nature of vehicular networks, combined with the independent decision-making of agents, makes it particularly challenging to improve collaborative capabilities. In V2X resource allocation tasks, poor cooperation often leads multiple links to concentrate on a small subset of channels, significantly increasing mutual interference and causing inefficient resource usage, ultimately degrading system-wide performance.
This paper focuses on addressing three key limitations in current research: limited utilization of observable data by agents, weak environmental perception, and insufficient inter-agent collaboration. Ji et al. [26] took both the agent’s own state information and the state information of surrounding nodes as inputs to a graph neural network (GNN). By leveraging the GNN to aggregate the state information of these nodes, they used the aggregated information as the input to a Double DQN (DDQN). Zhang et al. [27], while using a mean-field approach to maintain environmental stationarity, enhanced the representational capacity of the input state with respect to the environment. Both studies improve the environmental representational capability of the input state by augmenting the input information of the reinforcement learning algorithm. Inspired by these studies, this paper innovatively incorporates the historical impact of agents on the environment into the state space and proposes a state-aware resource allocation scheme for vehicular networks. By explicitly modeling agent-environment interactions and enhancing state awareness through conditional attention, the proposed algorithm significantly improves each agent’s perception of the global context. This enhanced awareness enables agents to make more informed and coordinated decisions, thereby fostering effective collaboration even under partial observability and high environmental dynamics. As a result, the proposed approach achieves more balanced channel utilization, reduced interference, and improved overall spectral efficiency in complex V2X scenarios.
The main contributions of this paper are as follows:
This study explicitly embeds the historical impact of agents on the environment into the state space. This innovation enhances the utilization efficiency of observable information and strengthens the representational capacity of inputs for environmental dynamics, thereby providing a crucial foundation for improving the perceptual capabilities of agents.
This paper proposes a conditional attention model, which injects the influence of other agents on the environment as conditional information into the backbone network. This significantly enhances the model’s perception of the environment, enabling it to sensitively capture environmental changes and thereby improve cooperation among different agents.
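To make the second contribution concrete, a minimal PyTorch sketch of conditional attention is given below; the module layout, token shapes, and embedding dimension are illustrative assumptions, not the paper’s exact architecture. The influence of other agents on the environment enters the backbone as conditional tokens attended to via cross-attention.

```python
import torch
import torch.nn as nn

class ConditionalAttentionBlock(nn.Module):
    """Illustrative sketch: other agents' environmental influence is injected
    as conditional tokens via cross-attention (names/sizes are assumptions)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x, cond):
        # x: (B, N, dim) backbone features; cond: (B, M, dim) condition tokens
        attn_out, _ = self.cross_attn(self.norm1(x), cond, cond)
        x = x + attn_out                  # residual injection of the condition
        return x + self.ffn(self.norm2(x))

block = ConditionalAttentionBlock()
x = torch.randn(2, 8, 128)     # an agent's own state tokens
cond = torch.randn(2, 5, 128)  # other agents' historical influence, embedded
y = block(x, cond)
print(y.shape)  # torch.Size([2, 8, 128])
```

Stacking several such blocks re-injects the condition at multiple stages, which is what lets the backbone stay sensitive to environmental changes caused by other agents.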
The research framework diagram of this paper is shown in Figure 1. The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 introduces the C-V2X communication network model used in this paper, the architecture of the agents, the information aggregation method, and the state-aware network design; Section 4 presents the simulation results.
2. Related Work
In the early stages of research on vehicular network resource allocation, researchers primarily relied on traditional methods. Sun et al. [28] proposed the Separate resource block and power allocation (SOLEN) algorithm, which first transforms the problem into a maximum weighted matching (MWM) problem on a bipartite graph and solves it efficiently using the Hungarian algorithm, ensuring orthogonal resource allocation and initial satisfaction of power constraints. Subsequently, based on the allocated resource blocks, power is further optimized via convex optimization and dual decomposition to improve cellular user rates while strictly meeting the signal-to-interference-plus-noise ratio (SINR) requirements of V2V links. Mei et al. [29] introduced a Long-Term Evolution (LTE)-based resource allocation scheme for V2V communication, aiming to jointly optimize wireless resources, power allocation, and modulation and coding schemes (MCS) to meet specific latency and reliability requirements. Ashraf et al. [30] proposed a decentralized algorithm to jointly optimize transmission delay and success probability. Yang et al. [31] formulated a dual-timescale resource allocation framework, addressing both long-term mobility patterns and short-term channel variations. Wang et al. [32] designed a hybrid architecture combining decentralized clustering, inter-cluster distributed decision-making, and intra-cluster centralized communication to balance scalability and performance. In these conventional approaches, resource allocation is typically formulated as an optimization problem, often NP-hard, making it computationally challenging to solve, especially in large-scale or dynamic environments. Moreover, traditional optimization methods exhibit significant limitations when dealing with complex, uncertain, or rapidly changing conditions. Additionally, heuristic-based resource allocation algorithms generally require a multi-step process involving information collection, centralized computation, and command dissemination. This procedure introduces substantial time overhead, making such approaches less suitable for highly dynamic vehicular networks where low-latency responses are critical.
Following the introduction of the Deep Q-Network (DQN) algorithm [33], researchers began to adopt deep reinforcement learning (DRL)-based approaches for solving resource allocation problems in vehicular networks. In [14], each V2V link is modeled as an independent agent, with agents making decentralized decisions during execution. The framework adopts the centralized training with decentralized execution (CTDE) paradigm and introduces low-dimensional action–observation histories, referred to as fingerprints, to address the non-stationarity issue inherent in multi-agent environments. Compared with random and several traditional baseline methods, the multi-agent reinforcement learning (MARL) algorithm significantly improves the sum throughput of V2I links and enhances the success rate of V2V messages delivered within their latency budgets. Although this study has certain limitations, such as limited inter-agent collaboration and relatively low V2V link throughput, it represents one of the early pioneering efforts to apply reinforcement learning to resource allocation in vehicular networks and greatly stimulated subsequent research in this field. Building upon the foundation laid by [14], researchers have subsequently introduced more advanced DRL algorithms into vehicular communication resource allocation, including Double DQN (DDQN) [34] and Dueling Double DQN (D3QN) [35], aiming to further improve training stability, action evaluation accuracy, and overall system performance.
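The Double-DQN refinement mentioned above can be sketched in a few lines: the online network selects the greedy next action while the target network evaluates it, which reduces the overestimation bias of vanilla DQN. The toy Q-values below are invented for illustration.

```python
import torch

def ddqn_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    """Double-DQN bootstrap target (sketch): online net picks the action,
    target net evaluates it."""
    with torch.no_grad():
        best_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# Hand-crafted Q tables for a single transition (illustrative values):
q_online = lambda s: torch.tensor([[1.0, 3.0]])   # online net picks action 1
q_target = lambda s: torch.tensor([[5.0, 2.0]])   # target net values it at 2.0
y = ddqn_target(q_online, q_target, None,
                rewards=torch.tensor([1.0]), dones=torch.tensor([0.0]))
print(round(y.item(), 4))  # 1 + 0.99 * 2.0 = 2.98
```

Plain DQN would instead bootstrap from the target net’s own maximum (5.0 here), illustrating the overestimation Double DQN avoids.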
Reference [18] proposes a two-stage, dual-model resource allocation algorithm that employs the DQN algorithm for channel assignment and the DDPG algorithm for power control. This approach ensures reliable transmission performance while achieving high robustness and a low bit error rate. Reference [19] designs a centralized vehicular communication resource allocation algorithm using the DQN framework. The algorithm treats channel allocation and power control as a joint optimization problem and leverages reinforcement learning to maximize spectrum utilization while minimizing multi-user interference among V2X links. In [20], the D3QN framework is applied to vehicular communication resource allocation. The study demonstrates the effectiveness of reinforcement learning in addressing resource allocation under spectrum-constrained scenarios, showing superior performance in dense and dynamic network environments. Reference [21] proposes a communication mode selection model based on the DQN algorithm, which enables V2V links to dynamically select appropriate communication modes (e.g., direct V2V or infrastructure-assisted) according to the observed state. This adaptive selection mechanism achieves high reliability and low communication latency, enhancing the overall efficiency and resilience of the network.
Although these studies [18,19,20] have successfully applied deep reinforcement learning algorithms to vehicular communication resource allocation and achieved promising performance, demonstrating the effectiveness of such approaches in this domain and promoting further research progress, they remain primarily exploratory in nature. These works do not incorporate environment-specific adaptations tailored to the highly dynamic and time-varying characteristics of vehicular networks. In particular, they fail to address two critical challenges: the limited environmental perception capability of individual agents and the insufficient level of cooperation among multiple agents in complex V2X scenarios. As a result, there remains a need for more advanced frameworks that enhance situational awareness and foster collaborative decision-making to fully unlock the potential of multi-agent learning in realistic vehicular environments.
Reference [15] investigates the challenges of limited observability for individual agents and unstable joint training in the context of vehicular communication resource allocation. In this framework, each V2V link is modeled as an independent agent, and D3QN is employed within each agent to reduce action-value estimation bias and improve learning efficiency. Furthermore, the study integrates Federated Learning (FL) into the multi-agent training process: each agent trains its D3QN model locally using private experience data, while a central server periodically aggregates the local model parameters through federated averaging and broadcasts the updated global model back to the agents. This approach avoids direct transmission of raw data, thereby reducing communication overhead and enhancing privacy protection. Experimental results show that the proposed method outperforms baseline schemes, both with and without FL or D3QN, in terms of cellular network sum rate and V2V packet transmission success rate. Reference [26] addresses the issue of high environmental dynamics in vehicular networks by introducing a Graph Neural Network (GNN) as a feature extractor within the deep reinforcement learning framework. The extracted features are then fed into a DDQN model for decision-making. By incorporating GNNs, the algorithm enhances the representation capability of spatial-temporal network states, significantly improving agents’ perception of their environment and leading to better resource utilization. However, this approach requires collecting information from neighboring nodes for message passing and aggregation, which introduces additional computational and communication latency. Reference [27] tackles the scalability challenge in large-scale, high-density vehicular networks, where traditional Multi-Agent Reinforcement Learning (MARL) becomes infeasible due to the exponential growth in interaction complexity as the number of agents increases. To address this, the study proposes a Mean-Field Multi-Agent Reinforcement Learning (MF-MARL) framework, in which each agent does not interact directly with every other agent. Instead, agents make decisions based on the average effect, or mean-field approximation, of the entire population’s behavior. This simplification enables scalable learning and coordination in massive vehicular networks, achieving efficient resource allocation without requiring full peer-to-peer observation or communication.
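The mean-field simplification described above can be sketched numerically: instead of conditioning on every neighbor’s individual one-hot action, an agent conditions on their empirical mean. The observation and action values below are hypothetical.

```python
import numpy as np

def mean_field_input(own_obs: np.ndarray, neighbor_actions: np.ndarray) -> np.ndarray:
    """Mean-field approximation (sketch): replace per-neighbor actions with
    their average, so the input size no longer grows with the agent count."""
    mean_action = neighbor_actions.mean(axis=0)   # average over neighbors
    return np.concatenate([own_obs, mean_action])

obs = np.array([0.5, -1.2])        # hypothetical local observation
acts = np.array([[1, 0, 0],        # one-hot channel choices of 3 neighbors
                 [0, 1, 0],
                 [1, 0, 0]])
state = mean_field_input(obs, acts)
print(state)  # [ 0.5  -1.2  0.667  0.333  0. ] (mean over neighbors)
```

Whether three or three hundred neighbors are present, the conditioned input stays the same size, which is the source of MF-MARL’s scalability.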
These studies [15,26,27] not only recognize the advantages of deep reinforcement learning in vehicular communication resource allocation but also adapt reinforcement learning frameworks to address the unique characteristics of this domain. While these works have further advanced the application and development of deep reinforcement learning in vehicular networks, they still fall short in fostering effective cooperation among agents. Specifically, the interactions between different agents remain largely independent or implicitly coordinated, without explicit mechanisms to enhance collaborative decision-making. As a result, the potential for synergy among agents is not fully exploited, limiting the overall system performance in highly dynamic and interference-prone environments.
To enhance the utilization of observable information, improve agents’ perception of the environment, and strengthen cooperation among agents, this paper proposes a state-aware communication resource allocation algorithm for vehicular networks. The proposed algorithm incorporates the impact of each agent on the environment as a conditional input. To maximize the utilization of observable information, this environmental impact is extended from the current time step to a historical time window, capturing temporal dynamics and long-term interactions. Furthermore, additional contextual information from the environment, such as channel states, interference levels, and mobility patterns, is incorporated into the state representation. This state information is repeatedly injected into the proposed state-aware backbone network at multiple stages during the learning process. By conditioning the network on both the agent’s influence and rich environmental context throughout the decision-making pipeline, the model achieves significantly enhanced environmental awareness. This improved perception enables agents to better anticipate the consequences of their actions and respond cooperatively to the behaviors of others, thereby promoting effective collaboration in dynamic and complex V2X scenarios.
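The historical time window described above can be sketched with a fixed-length buffer of the agent’s past environmental impact appended to its current observation; the window length, impact features, and dimensions below are illustrative assumptions.

```python
from collections import deque
import numpy as np

class HistoryAugmentedState:
    """Sketch: keep the agent's past environmental impact (e.g., chosen
    channel and transmit power) over a sliding window and append it to the
    current observation. Window length and feature choice are assumptions."""
    def __init__(self, window: int = 3, impact_dim: int = 2):
        # zero-initialized so the state has a fixed size from step one
        self.hist = deque([np.zeros(impact_dim)] * window, maxlen=window)

    def push(self, impact: np.ndarray):
        self.hist.append(impact)  # oldest entry is evicted automatically

    def build(self, obs: np.ndarray) -> np.ndarray:
        return np.concatenate([obs, *self.hist])

s = HistoryAugmentedState(window=3)
s.push(np.array([1.0, 0.5]))           # e.g., (channel index, power level)
state = s.build(np.array([0.2]))       # hypothetical 1-D local observation
print(state.shape)  # (7,) = 1 obs dim + 3 window steps x 2 impact features
```

The fixed-size output keeps the Q-network input dimension constant while still exposing temporal dynamics to the learner.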
4. Simulation Results
In this section, the proposed resource allocation algorithm is evaluated through simulation experiments. The simulation is implemented in Python using PyTorch (version 2.3.0) to construct the network architecture. The experiments are conducted on a PC with 32 GB of RAM and an NVIDIA RTX 4070 Super GPU (12 GB VRAM). The simulation environment is configured strictly according to the urban scenario defined in Annex A of 3GPP TR 36.885 [27], including parameters such as the number of lanes, transmit power levels, bandwidth, and vehicle mobility speed. The key simulation parameters are listed in Table 1 and Table 2; unless otherwise specified, all default experimental settings are based on these tables. Each agent consists of three Q-networks and is trained for 11,000 iterations. The Q-network employs the Mean Squared Error (MSE) loss function and is optimized using the Adam optimizer with a learning rate of 0.00001. The batch size is set to 512, and the experience replay buffer has a capacity of 1024, with stored transitions updated dynamically over time. To ensure statistical reliability, performance metrics reported in the simulations are averaged over 4000 interactions with the environment, except for quantities that cannot be meaningfully averaged (cases where averaging is not applied are explicitly noted in the text). The evaluation metrics include the average transmission rate of V2I links, the throughput per transmission period of V2V links, and the communication delay of V2V links. This section first presents training-related results of the proposed algorithm, including the reward curve and ablation studies. Subsequently, comparative simulation results demonstrate the performance of the proposed algorithm across these metrics.
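The reported training settings (MSE loss, Adam at lr = 0.00001, batch size 512, replay capacity 1024) can be assembled into a single illustrative update step; the network sizes and dummy data below are placeholders, not the paper’s actual architecture.

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Placeholder Q-network; layer widths here are NOT the paper's.
q_net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-5)   # lr = 0.00001
loss_fn = nn.MSELoss()                                # MSE loss
replay = deque(maxlen=1024)                           # replay capacity 1024

# Fill the buffer with dummy (state, target) pairs for illustration;
# old transitions are evicted automatically as new ones arrive.
for _ in range(1024):
    replay.append((torch.randn(10), torch.randn(4)))

batch = random.sample(list(replay), 512)              # batch size 512
states = torch.stack([s for s, _ in batch])
targets = torch.stack([t for _, t in batch])

loss = loss_fn(q_net(states), targets)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item() >= 0.0)  # True
```

One such step per iteration, repeated 11,000 times per agent, corresponds to the training budget stated above.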
Table 1.
V2X Network Parameters.

| Parameter | V2I Link | V2V Link |
|---|---|---|
| Path loss model | 128.1 + 37.6 log10(d), d in km | LOS in WINNER+ B1 Manhattan |
| Shadowing distribution | Log-normal | Log-normal |
| Shadowing standard deviation ξ | 8 dB | 3 dB |
| Decorrelation distance | 50 m | 10 m |
| Path loss and shadowing update | A.1.4, every 100 ms | A.1.4, every 100 ms |
| Fast fading | Rayleigh fading | Rayleigh fading |
| Fast fading update | Every 1 ms | Every 1 ms |
Table 2.
Channel Model Parameters.

| Parameter | Value |
|---|---|
| Carrier frequency | 2 GHz |
| Bandwidth | 1.5 MHz |
| BS antenna height | 25 m |
| BS antenna gain | 8 dBi |
| BS receiver noise figure | 5 dB |
| Vehicle antenna height | 1.5 m |
| Vehicle antenna gain | 3 dBi |
| Vehicle receiver noise figure | 9 dB |
| Absolute vehicle speed v | 36 to 54 km/h |
| Vehicle drop and mobility model | Urban case of A.1.2 |
| V2I transmit power | 23 dBm |
| Noise power | −114 dBm |
| Time constraint of V2V payload transmission T | 100 ms |
| V2V payload size B | [1, 2, …] × 1060 Bytes |
| V2V transmit power | 23 dBm |
To better evaluate the proposed algorithm, this paper introduces the following baseline and comparative algorithms:
DDQN-CA: The proposed state-aware communication resource allocation algorithm based on DDQN.
DDQN: The DQN-based resource allocation algorithm proposed by Liang [20], but with the network implemented using only an MLP, serving as a baseline for DDQN.
DDQN-C: The proposed state-aware communication resource allocation algorithm, but with the network implemented using only an MLP.
DDQN-A: The proposed state-aware communication resource allocation algorithm without the use of information aggregation.
GNN: An algorithm proposed by Ji [26] that combines a GNN with reinforcement learning. This algorithm exhibits strong state perception capabilities but requires the collection of information from neighboring nodes.
GA: A well-established heuristic optimization method that has demonstrated strong performance on various complex optimization problems. However, due to its lack of adaptability and dynamic decision-making capability, GA requires multiple iterations over the joint action space at each time step, along with continuous feedback from the environment. This high computational overhead and latency make the genetic algorithm difficult to apply in real-world V2X networks, where timely and responsive resource allocation is critical.
In DRL-based methods, the cost of updating network parameters typically dominates. Consider a DRL agent with H hidden layers, where the h-th hidden layer (h = 1, 2, …, H) contains n_h neurons. Let Z denote the dimension of the input layer, and O denote the dimension of the output layer (generally corresponding to the size of the action space). The number of trainable parameters in such a network architecture is approximately Z·n_1 + Σ_{h=1}^{H−1} n_h·n_{h+1} + n_H·O, with a computational complexity of O(B·(Z·n_1 + Σ_{h=1}^{H−1} n_h·n_{h+1} + n_H·O)), where B is the batch size. The embedding dimension d is set to 128.
DDQN, DDQN-C, GNN: The neural networks used in DDQN, DDQN-C, and GNN all adopt a three-layer fully connected architecture with 520, 250, and 120 neurons per layer, respectively. The GNN algorithm consists of a centralized graph neural network and a DDQN-based reinforcement learning component. Because the trainable parameters of the graph neural network component are significantly fewer than those of the DDQN component, the total trainable parameters of the GNN algorithm are approximately equal to those of the DDQN component alone. Consequently, the computational complexities of DDQN, DDQN-C, and GNN are all on the order of O(B·(520Z + 520·250 + 250·120 + 120·O)).
DDQN-A, DDQN-CA: The attention-based DDQN algorithms consist of 3 attention modules and an output head. The embedding dimension for each vector is d = 128, and the trainable parameters of each attention module are approximately 4d² (the query, key, value, and output projections). DDQN-CA includes one additional output head compared to DDQN-A. Therefore, the computational complexities of DDQN-A and DDQN-CA are on the order of O(B·(12d² + d·O)) and O(B·(12d² + 2d·O)), respectively, where d = 128.
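The fully connected parameter count for the 520/250/120 architecture can be checked numerically; the input dimension Z = 82 and action-space size O = 20 below are hypothetical placeholders, since the paper does not restate them here.

```python
def mlp_params(dims):
    """Trainable parameters (weights + biases) of a fully connected network
    with consecutive layer widths dims[0] -> dims[-1]."""
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

# Three hidden layers of 520, 250, and 120 neurons; Z and O are assumed.
Z, O = 82, 20
print(mlp_params([Z, 520, 250, 120, O]))  # 205950 for these illustrative sizes
```

The sum is dominated by the 520×250 hidden-to-hidden block, which is why the hidden widths, not the input or output dimensions, drive the per-update cost.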
Based on the computational complexity analysis, the DDQN algorithm exhibits the lowest complexity but also the poorest performance. In contrast, the DDQN-CA algorithm has the highest number of trainable parameters but achieves the best performance. The GNN algorithm appears to strike a favorable balance between performance and complexity; however, it relies on a centralized processing mechanism that aggregates information from surrounding nodes via a graph neural network as input for the current node, thereby introducing additional communication overhead and latency. Therefore, although the trainable parameters of the DDQN-CA algorithm increase substantially, the significant performance improvement it delivers makes this additional cost acceptable.
Figure 5 illustrates the variation in the training reward with respect to the number of training iterations under three different parameter settings in the proposed algorithm. The setting labeled “4-V2V link” corresponds to a scenario with 4 V2V links and 4 V2I links, where the packet size to be transmitted over each V2V link is 5 × 1060 Bytes. The “8-V2V link” setting includes 8 V2V links and 4 V2I links, with a packet size of 3 × 1060 Bytes per V2V link. The “15-V2V link” setting comprises 15 V2V links and 15 V2I links, also with a packet size of 3 × 1060 Bytes per V2V link. As shown in the figure, the reward function increases with the number of iterations across all three configurations, indicating that the network is effectively trained and gradually learns to make better decisions. However, the reward curve for the “15-V2V link” case exhibits larger fluctuations and a slower convergence rate compared to the other two cases. This phenomenon arises from the significantly larger action space and increased mutual interference among the numerous V2V and V2I links in this setting, which intensifies the non-stationarity and complexity of the environment, thereby making the learning process more challenging.
Table 3 presents the simulation results obtained while determining the optimal value of δ, under the configuration of 8 V2V links, 4 V2I links, and a packet size of 6 × 1060 Bytes. As shown in the table, the best performance is achieved when δ = 5. The channel resource allocation task is conducted in a highly dynamic and stochastic environment. When δ is too large, the historical information becomes outdated due to channel variations over time, leading to an ineffective state representation. Conversely, when δ is too small, the accumulated historical data are insufficient to capture meaningful temporal patterns in channel conditions. As a result, the performance exhibits a unimodal trend, first increasing and then decreasing, as observed in Table 3. Based on this empirical evaluation, δ is set to 5 in this study.
Figure 6 presents the simulation results for determining the optimal values of σ and τ, under the configuration of 8 V2V links, 4 V2I links, and a packet size of 6 × 1060 Bytes. Given that σ and τ are interrelated within the same formula, they must be discussed together. Figure 6a shows the surface plot of the V2V transmission success rate, while Figure 6b provides the surface plot of the average V2I transmission rate. Observations from Figure 6 reveal that parameter combinations yielding high V2V transmission success rates and high average V2I transmission rates are predominantly located near the diagonal. This suggests that both metrics are influenced not only by the absolute values of σ and τ, but also by the difference between them. Based on these findings, this paper sets σ to −0.7 and τ to 0.7. These settings aim to achieve an optimal balance between the V2V transmission success rate and the average V2I transmission rate, thereby maximizing overall system performance.
Figure 7 illustrates the convergence curves of the two reward functions: DDQN-CAR uses the new reward value, while DDQN-CA uses the original reward value. As shown in the figure, both reward functions achieve stable convergence, though their convergence points differ. Since DDQN-CAR includes an additional success rate gain term in its reward function, its final convergence value is higher than that of DDQN-CA. However, in subsequent performance evaluations, both methods exhibit comparable system performance. This phenomenon can be explained by two reasons: (a) The optimization objectives of this study encompass both maximizing the transmission rate and maximizing the transmission success rate. Although the reinforcement learning reward function defined in Equation (16) is presented in a weighted form, explicit constraints on the transmission success rate were imposed during the problem modeling stage. During model training, any state violating these constraints yields a significantly low rate reward and is therefore autonomously avoided by the agent. (b) Because payload transmission is difficult to complete within extremely short time intervals, the transmission success rate exhibits minimal fluctuation across adjacent time slots; the primary optimization driver of the reward function therefore remains the weighted sum rate. From a communication system perspective, higher transmission rates rely on favorable channel conditions and signal-to-noise ratios, which simultaneously provide a physical-layer foundation for improving transmission success rates. Thus, continuous optimization of the weighted sum rate inherently drives the fulfillment of the success rate constraints, indicating that the two objectives share a consistent optimization direction at the system level.
Based on this, and to maintain the simplicity of the reward function, Equation (16) is ultimately adopted as the reward function for training in this study.
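As a purely illustrative sketch of a weighted sum-rate reward of the kind discussed here (not the paper’s Equation (16); its exact terms and weights are defined in Section 3, and the weights and rate values below are invented):

```python
def weighted_sum_rate_reward(v2i_rates, v2v_rates, lam_i=0.1, lam_v=0.9):
    """Hedged sketch of a weighted sum-rate reward: a convex combination of
    the aggregate V2I and V2V rates. Weights are hypothetical."""
    return lam_i * sum(v2i_rates) + lam_v * sum(v2v_rates)

# Hypothetical per-link rates (arbitrary units) at one time step:
r = weighted_sum_rate_reward([4.0, 6.0], [2.0, 3.0, 5.0])
print(r)  # 0.1 * 10 + 0.9 * 10 = 10.0
```

Keeping the reward in this simple weighted form, with the success rate handled as a modeling-stage constraint, is exactly the simplicity argument made above.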
Figure 8 illustrates the relationship between instantaneous transmission rates and channel selection for six randomly selected V2V links over one transmission period, under the configuration of 4 V2I links, 8 V2V links, and a packet size of 4 × 1060 Bytes per V2V link. Since this set of simulation experiments aims to reflect the instantaneous changes in channel conditions and transmission rates over time, no averaging has been performed. In the figure, −1 indicates that the channel is off, while the values 0, 1, 2, and 3 on the left vertical axis represent the four channels used in this experiment; the right vertical axis indicates the transmission rate of each V2V link. This visualization provides insight into how dynamically changing channel conditions affect the transmission performance of individual V2V links within a given period, without smoothing out short-term variations.
As shown in
Figure 8, this experiment tracks two key metrics for each agent over time: the selected communication channel and the achieved transmission rate. By observing
Figure 8c,e, it can be readily observed that once these agents find a channel capable of satisfying their transmission requirements, they do not continue to compete for potentially better channel resources to achieve higher rates. Furthermore, although the maximum achievable rate for V2V links exceeds 10, none of the agents persistently contend for optimal channels once their transmission needs are met. Instead, after securing a suitable channel, each agent refrains from further competition. This behavior demonstrates that the proposed algorithm enables effective cooperation among multiple agents. Moreover,
Figure 8 shows that the V2V-1, V2V-2, V2V-4, and V2V-7 links complete their transmission tasks first and subsequently deactivate their communication links. Following this, V2V-3 and V2V-5 quickly perceive the change in the environment—specifically the reduction in channel contention—and promptly switch to channel 1 and channel 2, respectively, for their transmissions. This dynamic response clearly illustrates that the proposed algorithm is capable of sensitively detecting environmental changes and reacting swiftly to them. Additionally, by examining the transmission rates during the initial 20 ms across all subplots in
Figure 8, it is evident that not every V2V link successfully finds a suitable channel within this period. This is due to the limited number of available channels—only one-third of the total number of transmission links—which results in intense competition. Differences in individual link conditions and experienced interference further contribute to varying access delays. In summary, through comprehensive analysis of
Figure 8, it is demonstrated that the proposed algorithm enables agents to rapidly adapt to dynamic environments and exhibit cooperative behavior, effectively balancing resource utilization and system stability under constrained spectrum conditions.
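The behaviour read off Figure 8 (hold a channel until the payload is delivered, switch off into the −1 state, and let waiting links claim freed channels) can be sketched as a toy slot loop. The payload sizes, per-slot rate, and channel count below are illustrative stand-ins, not the paper's simulation parameters.

```python
def run(payloads, rate=2.0, n_channels=2, n_slots=8):
    """Toy trace of channel-holding links: -1 means the link is off."""
    remaining = list(payloads)
    assigned = {}  # link index -> channel currently held
    trace = []
    for _ in range(n_slots):
        free = set(range(n_channels)) - set(assigned.values())
        # idle links with pending data claim freed channels
        for link, left in enumerate(remaining):
            if left > 0 and link not in assigned and free:
                assigned[link] = free.pop()
        actions = []
        for link, left in enumerate(remaining):
            ch = assigned.get(link, -1)
            actions.append(ch)
            if ch >= 0:
                remaining[link] = max(0.0, left - rate)
                if remaining[link] == 0.0:
                    del assigned[link]  # payload delivered: switch off next slot
        trace.append(actions)
    return trace
```

Running `run([4.0, 4.0, 6.0])` reproduces the qualitative pattern of Figure 8: the blocked third link sits at −1 while the channels are occupied, then claims a channel as soon as the first two links finish and deactivate.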
Figure 9 illustrates the variation in V2V transmission success rate with respect to the number of transmitted payloads under the configuration of 4 V2V links, comparing several different algorithms. This figure evaluates the performance of each algorithm in terms of V2V transmission reliability. As shown in the figure, the success rates of all algorithms decrease as the number of transmitted payloads increases. This degradation is primarily attributed to the longer transmission duration required for larger payloads, which intensifies the temporal overlap and mutual interference among V2V links. Notably, the proposed algorithm consistently achieves a higher V2V transmission success rate compared to baseline methods when transmitting the same number of payloads, demonstrating its superior efficiency. This improvement stems from the enhanced cooperation among agents enabled by the proposed approach, which effectively reduces inter-agent interference and promotes coordinated channel access. Under high payload loads, the proposed method performs slightly worse than the centralized heuristic algorithm; however, it still outperforms all other distributed or learning-based baselines. The marginal gap with the centralized heuristic highlights a favorable trade-off between performance and scalability, as the proposed method operates in a fully distributed manner without requiring global coordination.
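The V2V transmission success rate plotted in Figure 9 can be computed from a simulation log as the fraction of links that deliver their full payload within the transmission period. The helper below is a hypothetical post-processing step, not code from the paper.

```python
def v2v_success_rate(delivered_bits, payload_bits):
    """Fraction of V2V links whose delivered bits reach the payload size."""
    return sum(b >= payload_bits for b in delivered_bits) / len(delivered_bits)
```
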
Figure 10 presents the average V2I transmission rate of different algorithms under the configuration of 4 V2V links, as the number of V2V transmitted payloads increases. This figure evaluates the impact of V2V traffic load on the performance of V2I communications. As observed in
Figure 10, the average V2I transmission rate decreases for all algorithms as the V2V payload load increases. This degradation is caused by the prolonged transmission duration and increased channel occupancy of V2V links, which in turn leads to more frequent and sustained interference on V2I communications. Notably, the proposed algorithm achieves a comparable or higher average V2I transmission rate than the baseline methods under the same payload load, demonstrating its ability to better mitigate cross-link interference. By enabling intelligent and cooperative spectrum access among agents, the proposed approach effectively preserves V2I communication quality even under high V2V traffic loads. When comparing
Figure 9 and
Figure 10, it is evident that, under identical payload conditions, the proposed algorithm outperforms existing methods in both V2V transmission success rate and average V2I transmission rate. This simultaneous improvement in both metrics highlights the effectiveness of the proposed method in achieving a favorable balance between V2V reliability and V2I throughput, thereby enhancing the overall performance of the C-V2X network.
Figure 11 shows the average transmission time of the proposed algorithm compared to other algorithms under the configuration of 4 V2V links and 4 V2I links, with a total of 6 data packets being transmitted.
Figure 12 presents similar results but under the configuration of 8 V2V links and 4 V2I links, also transmitting 6 data packets. By observing
Figure 11 and
Figure 12, it can be seen that the proposed algorithm achieves lower average transmission times than the other algorithms in both configurations. Specifically, under the settings with 4 V2V links and 8 V2V links, the proposed method consistently outperforms its counterparts, indicating its superior efficiency in managing transmission delays. These results highlight the effectiveness of the proposed algorithm in optimizing resource allocation and reducing interference among concurrent transmissions, thereby enhancing overall system throughput and reliability.
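The average transmission time reported in Figures 11 and 12 reduces to how many slots each link needs to push its payload through at its achieved rate, averaged across links. The slot length and rate values below are assumed for illustration.

```python
import math

def transmission_time(payload_bits, rate_bps, slot_s=1e-3):
    """Time to deliver a payload at a constant rate, rounded up to whole slots."""
    bits_per_slot = rate_bps * slot_s
    return math.ceil(payload_bits / bits_per_slot) * slot_s

def average_transmission_time(payloads_bits, rates_bps):
    """Mean completion time across a set of links."""
    times = [transmission_time(p, r) for p, r in zip(payloads_bits, rates_bps)]
    return sum(times) / len(times)
```

Because the per-link rate drops when more links contend for the same channels, the same payload takes more slots to complete, which is why the 8-V2V-link configuration shows longer average times than the 4-V2V-link one.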
The suboptimal data transmission performance of V2V links is primarily attributable not to the data transmission process itself, but to packet collisions. In conventional distributed algorithms, each link independently selects the optimal channel, which frequently leads to multiple links simultaneously contending for the same high-quality resources, thereby inducing severe interference and packet collisions. As observed in
Figure 8,
Figure 9,
Figure 10,
Figure 11 and
Figure 12, the proposed algorithm in this work, through collaborative learning among agents, enables the links to distribute themselves more evenly across different channel resources. This results in more balanced channel occupancy, effectively avoiding collisions with other links and allowing each link to select resource combinations that minimize interference. Consequently, as demonstrated in
Figure 11 and
Figure 12, the proposed algorithm achieves significantly superior performance in the key metric of average transmission time compared to other benchmark algorithms, fully validating the effectiveness of its design in reducing end-to-end delay.
Figure 13 illustrates the V2V transmission success rate as a function of payload load under different numbers of V2V links. As shown in the figure, when the payload load is fixed, the V2V transmission success rate decreases with an increasing number of V2V links. This degradation is primarily caused by the heightened interference among V2V links due to greater channel contention in denser network scenarios. Furthermore, for a fixed number of V2V links, the success rate declines as the payload size increases. This is because larger payloads require longer transmission durations, which extend the time intervals during which V2V links interfere with each other. The prolonged channel occupancy intensifies mutual interference, thereby reducing the probability of successful packet delivery. These results highlight the challenges of reliable V2V communication under high traffic density and large data demands.
Figure 14 illustrates the variation in the average V2I transmission rate with respect to the number of V2V transmitted payloads under different numbers of V2V links. The figure provides insights into how the presence of additional V2V links and increased payload sizes impact V2I communication performance. As observed in
Figure 14, for a fixed payload size, the average V2I transmission rate decreases as the number of V2V links increases. This decline is attributed to the increased competition for channel resources between V2V and V2I links. More V2V links lead to higher levels of noise and interference, which adversely affect the reliability and throughput of V2I communications. Additionally, for a fixed number of V2V links, the average V2I transmission rate also decreases as the payload size increases. This reduction occurs because larger payloads require longer transmission times for V2V links, thereby extending the periods during which V2I links experience interference from V2V transmissions. The extended interference duration negatively impacts the overall performance of V2I links. These observations underscore the significant influence of V2V traffic on V2I communication quality, particularly in terms of channel contention and interference. Despite these challenges, the proposed algorithm demonstrates its effectiveness by mitigating the adverse effects of increased V2V activity and maintaining robust V2I performance under varying network conditions. This highlights the algorithm’s capability to balance the competing demands of V2V and V2I communications in dynamic vehicular environments.
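The interference mechanism described above, with V2V transmissions entering the SINR denominator of V2I links, can be made concrete with the standard Shannon-rate expression. The bandwidth and power values used here are arbitrary linear-scale examples, not the paper's parameters.

```python
from math import log2

def v2i_rate(bandwidth_hz, signal, noise, interference):
    """Shannon rate of a V2I link; V2V activity adds to the interference term."""
    sinr = signal / (noise + interference)
    return bandwidth_hz * log2(1.0 + sinr)
```

Any growth in the interference term strictly lowers the achievable V2I rate, which is the downward trend visible in Figures 10 and 14 as V2V activity intensifies.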
The declining trend in the transmission performance of both V2V and V2I links with increasing payload volume, as shown in
Figure 9,
Figure 10,
Figure 13 and
Figure 14, can be understood from two perspectives. From the collision perspective, the increase in payload directly leads to a higher frequency of transmission attempts and an extended duration per transmission within the network. This significantly raises the probability of multiple links transmitting concurrently over the same time-frequency resources, producing a nonlinear increase in collision events. For V2V links, collisions directly cause packet loss, lowering the transmission success rate; for V2I links, collision-induced interference occupies available bandwidth, reducing the average transmission rate. From the channel-occupancy perspective, the larger data volume substantially increases the proportion of time the channel remains busy. The sharp reduction in available transmission windows intensifies competition among all links for limited resources. As V2V links struggle to secure transmission opportunities within the required low-latency constraints, their success rate decreases accordingly. Meanwhile, for V2I links, which typically rely on base-station coordination, the continuous or stable bandwidth that can be allocated is reduced, which in turn lowers the average V2I rate.
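The nonlinear rise in collision events as more links contend for the same channels follows a birthday-problem pattern. The sketch below assumes uncoordinated uniform random channel selection, which is the baseline behaviour the proposed cooperative algorithm is designed to avoid.

```python
from math import perm

def p_collision(n_links, n_channels):
    """P(at least two links pick the same channel) under uniform random choice."""
    if n_links > n_channels:
        return 1.0  # pigeonhole: a collision is certain
    return 1.0 - perm(n_channels, n_links) / n_channels ** n_links
```

With 4 channels the probability jumps from 0.25 for two contending links to about 0.91 for four, so each additional contender raises the collision risk by more than the previous one did.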
Figure 15 and
Figure 16 illustrate the variation in V2V transmission delay with respect to payload load under two different configurations: 4 V2V links and 8 V2V links, respectively. As observed in both figures, the V2V transmission delay increases as the payload size grows, which is expected due to longer transmission durations required for larger data volumes. Moreover, by comparing
Figure 15 and
Figure 16, it can be seen that, under the same payload load, the transmission delay in the 4-V2V-link scenario is significantly lower than that in the 8-V2V-link scenario. This performance gap arises because, under fixed communication resources, a higher number of V2V links leads to increased channel contention and interference, resulting in more frequent access delays and retransmissions. The increasing trend in transmission delay with payload load is primarily attributed to the finite system capacity. When the total transmission workload increases, whether through larger payloads or a greater number of active links, the time required to complete all transmissions increases accordingly. In dense scenarios with 8 V2V links, this effect is further exacerbated by intensified mutual interference and reduced per-link resource availability. The proposed algorithm demonstrates effective delay control across both network scales, particularly by enabling cooperative spectrum access and reducing redundant collisions, thereby improving overall system efficiency even under constrained resources.