Article

Reinforcement Learning-Based Resource Allocation Scheme of NR-V2X Sidelink for Joint Communication and Sensing

College of Information Science and Technology, Donghua University, Shanghai 201620, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(2), 302; https://doi.org/10.3390/s25020302
Submission received: 11 November 2024 / Revised: 17 December 2024 / Accepted: 6 January 2025 / Published: 7 January 2025
(This article belongs to the Special Issue Communication, Sensing and Localization in 6G Systems)

Abstract

Joint communication and sensing (JCS) is becoming an important trend in 6G, owing to its efficient utilization of spectrum and hardware resources. By processing the echoes of the transmitted signal, a JCS system can sense object locations in addition to providing V2X communication, which gives it application potential in advanced driver assistance systems (ADAS) and autonomous vehicles. The NR-V2X sidelink has been standardized by 3GPP to support low-latency, high-reliability direct communication. To combine the benefits of direct communication and JCS, it is promising to extend existing NR-V2X sidelink communication toward sidelink JCS. However, with the limited sidelink spectrum, the requirements of radar sensing accuracy and communication reliability conflict. To overcome the challenges of distributed resource allocation in full-duplex sidelink JCS, this paper proposes a novel consecutive-collision mitigation semi-persistent scheduling (CCM-SPS) scheme, which comprises a collision detection stage and a Q-learning training stage to suppress collision probabilities. The sensing performance of sidelink JCS is analyzed theoretically using the Cramér–Rao Lower Bound (CRLB). Key performance metrics, namely the CRLB, packet reception rate (PRR), and update delay (UD), are evaluated. Simulation results show that CCM-SPS outperforms similar solutions and has promising application prospects.

1. Introduction

In the future, vehicles on the road will need to frequently exchange information with surrounding vehicles, pedestrians, and road traffic infrastructure, which drives the development of Vehicle-to-Everything (V2X) technology. Today’s vehicles have transformed from traditional vehicles into intelligent vehicles. Through the V2X network, there is potential to achieve an Advanced Driver Assistance System (ADAS), improving driving safety and comfort. Recently, the latest beyond 5G and 6G standards have introduced new requirements for V2X, including enhanced demands for sensing accuracy, precision, and resolution, alongside the existing communication criteria of latency, reliability, capacity, and coverage [1]. Therefore, the joint communication and sensing (JCS) system that utilizes a signal to simultaneously achieve two functions has attracted much attention.
In previous systems, communication and radar sensing were separate systems using different frequencies and hardware resources. However, with increasingly scarce spectrum resources, there is a need for more efficient utilization of spectrum resources by communication and radar systems. As the bandwidth of commercial communication systems increases, coexistence with various existing radar systems is anticipated, leading to the development of the JCS concept [2]. JCS can provide integrated and collaborative gains for future systems [3]. On the one hand, sharing spectrum and hardware resources can lead to high resource utilization efficiency. On the other hand, the sensing function can assist communication in obtaining more accurate channel estimation models, which are beneficial for beamforming and spectrum resource management.
3GPP Release 16 has established standards for vehicle sidelink communication based on the 5G-NR PC5 air interface, enabling vehicles to communicate directly without the assistance of gNB [4], as illustrated in Figure 1. Sidelink is beneficial for reducing latency and improving communication. In addition, sidelink signals can also be used for near-field positioning, range sensing, and distance measurements [5], thereby complementing or enhancing positioning systems that may be limited by obstacles or other factors, such as network-based positioning or the Global Navigation Satellite System (GNSS). Therefore, the V2X sidelink JCS system has significant development potential.
However, due to limited available bandwidth, and without the assistance of a base station, there is a conflicting requirement between radar and communication in spectrum resource utilization. Radar accuracy requires large bandwidth occupancy, which reduces available resources in the resource pool, increases the resource collision probability, and thereby affects the performance of communication. The issue of resource collision in the sidelink scenario is related to resource allocation schemes. Therefore, a flexible and robust resource allocation scheme is crucial for mitigating resource pool conflicts.
Traditional sidelink resource allocation schemes are divided into dynamic allocation and sensing-based semi-persistent scheduling (SB-SPS). SB-SPS is widely used for sidelink resource allocation due to its better reliability and latency [6]. The SB-SPS scheme first senses the channel quality and selects a resource from the available candidates; to avoid packet collisions during sidelink access, it does not select a new resource for subsequent transmissions until the reselection counter ($RC$) reaches zero. Numerous studies have also modified SPS schemes to improve low-latency communication [7,8]. However, existing research has rarely studied the impact of packet collisions on sidelink sensing performance. According to the theoretical model of the Cramér–Rao Lower Bound (CRLB), the SINR of the echo, which is degraded by collisions, significantly affects the sensing performance of the JCS sidelink. In high-density scenarios in particular, the probability that different vehicles select the same resource rises quickly, which degrades both communication and sensing performance. Moreover, obtaining knowledge of the dynamic echo channel state of JCS remains a major challenge. It is therefore reasonable for this paper to study the consecutive collision problem for both the echo and the transmitted signals of the JCS sidelink. With the advancements in self-interference cancellation (SIC) technology in recent years [9], simultaneous transmission and reception on the same frequency band with in-band full-duplex (FD) transceivers have become feasible, offering hope for the implementation of JCS systems in the sidelink. Additionally, the sensing capability of full-duplex transceivers enables collision detection, creating new opportunities for enhancing sidelink resource allocation schemes.
Moreover, in response to the high dynamics of V2X traffic density and network load, inspired by references [10,11], reinforcement learning and other intelligent algorithms can be employed to optimize resource allocation, ensuring the stability and accuracy of communication and sensing tasks.
Inspired by the above, this paper focuses on high positioning accuracy, low latency, and high reliability in the 5G NR-V2X sidelink JCS system. By studying the comprehensive impact of interference due to consecutive collisions, we propose a reinforcement learning-based consecutive-collision mitigation resource allocation scheme (CCM-SPS). Specifically, this scheme employs JCS full-duplex collision detection and reinforcement learning to optimize traditional SB-SPS parameters, mitigating performance degradation from consecutive collisions. Furthermore, the impact of varying vehicle density and packet sizes on JCS performance in dynamic vehicular networks is discussed. Finally, the effectiveness of the proposed scheme in enhancing overall performance is validated using a V2X sidelink system-level simulator.
The main contributions of this work can be summarized as follows:
  • In order to address the conflicting requirement between sensing accuracy and communication reliability for sidelink resources, a novel collision mitigation resource allocation scheme is proposed. The algorithm integrates the full-duplex detection capability of JCS with the resource sensing reservation process of the traditional SB-SPS scheme. This allows vehicles to dynamically optimize reservation times based on sensing channel information, effectively reducing consecutive packet collisions and enhancing the overall utilization efficiency of resources in the sidelink JCS system.
  • The performance of the proposed CCM-SPS and of the traditional SB-SPS is theoretically analyzed across a range of vehicle densities and packet sizes. Through reinforcement learning, comprehensive optimization of resource utilization for sensing and communication is achieved.
  • Comprehensive evaluations are performed using the Cramér–Rao Lower Bound (CRLB), packet reception rate (PRR), and update delay (UD). The proposed scheme shows advantages over the comparison scheme in positioning accuracy, latency, and reliability.
The remainder of the article is organized as follows. In Section 2, a review of recent literature is conducted, and in Section 3, a theoretical analysis is conducted on the performance of the sidelink JCS system. In Section 4, the consecutive collision problem is analyzed using a traditional resource allocation scheme. Then, in Section 5, the specific implementation of the improved resource allocation scheme is introduced, including full-duplex collision detection and a Q-learning based collision mitigation scheme. The extensive results are presented in Section 6, which analyzes the performance indicators in various scenarios. Section 7 presents the concluding remarks.

2. Related Works

Signal coexistence for achieving higher spectral efficiency has recently garnered significant attention, e.g., in [12,13,14,15,16]. Reference [12] proposes a system that combines collaborative communication technology with cognitive radio, which improves the overall performance of the system through cooperative spectrum-sharing networks. Reference [13] explored the coexistence of LAA and WIFI over unlicensed bands through a static contention window method. Their work provides practical solutions for achieving technological coexistence in existing network architectures. Reference [14] explores the advantages of integrating sensing and communication technologies (ISAC), including high spectral efficiency, low hardware costs, and improved system performance, as well as the potential for ISAC in future 6G applications. Reference [15] proposed a joint design framework for communication and sensing in small cellular networks, which optimizes the collaborative work of communication and sensing through waveform selection and resource allocation. Reference [16] addresses the practical challenges of residual hardware impairments (RHIs) and imperfect successive interference cancellation, deriving the superior performance of the ISAC framework compared to the sensing-communication coexistence (SCC) framework. It demonstrates that the integration of sensing and communication is a significant trend for future development.
In a highly dynamic V2X scenario, JCS systems can make network service adjustments more flexible and robust by handling communication and sensing signals simultaneously and coordinating them dynamically [17]. Extensive research has focused on JCS in vehicular networks, primarily concentrating on signal waveform design [18,19,20] and power management [21,22,23]. Firstly, extensive research on the design of signal waveforms provides the theoretical foundation for the feasibility of JCS systems. Secondly, regarding interference and resource management, existing research in JCS primarily focuses on large-bandwidth millimeter-wave bands or Vehicle-to-Infrastructure (V2I) scenarios, which have certain limitations. Optimizing JCS resource allocation remains a challenge for V2V direct communication with limited resources and no base station assistance.
In Release 16, the 3rd Generation Partnership Project (3GPP) developed the NR-V2X and defined a new air interface PC5 as the sidelink, which allows for direct vehicle-to-vehicle communication to support various advanced use cases, covering four areas: platooning, extended sensing, remote driving, and autonomous driving [24]. In recent years, research on the sensing capabilities of NR-V2X sidelink signals has been increasingly explored. Studies on the theoretical foundation of CRLB for the sidelink sensing performance highlight the potential value for the resource allocation scheme of the JCS sidelink system. Reference [25] utilizes unused communication subcarriers of the 5G NR waveform as radar sensing subcarriers, aiming to minimize the CRLB of distance estimation by optimizing the amplitude of radar subcarriers, thereby improving sensing performance. Reference [26] derives the CRLB for distance/angle estimation accuracy based on sidelink OFDM signals, demonstrating the feasibility of sidelink JCS signals for near-field localization. References [5,27] derive the CRLB and mean squared error (MSE) for the sensing location of sidelink signals and analyze in detail the impact of communication physical layer parameters on sensing performance under interference conditions. Reference [28] compares the radar sensing performance of sidelink resource allocation SPS algorithms versus random allocation algorithms under interference, indicating that the JCS resource allocation mechanism is also a key factor affecting sidelink sensing performance. The above existing research contributions can be referenced to conduct a theoretical performance analysis of the sidelink JCS system in this paper.
According to the 3GPP standard [29], the NR-V2X sidelink typically employs a distributed resource allocation scheme based on a reservation mechanism, known as the SB-SPS algorithm [30], to obtain low-latency communication performance. However, SB-SPS still faces consecutive resource collisions caused by resource allocation conflicts among multiple vehicles [31]. Recently, many improvements to the SPS algorithm have been proposed to alleviate communication collisions [32,33,34]. Reference [32] improves the efficiency of resource utilization, enhances reliability, and reduces latency by designing a vehicle reuse distance during the SPS reselection process. Reference [33] develops an adaptive sidelink open-loop power control (AS-OLPC) algorithm to dynamically adjust transmission power, thereby improving communication reliability in complex urban environments. Reference [34] analyzes the effectiveness of introducing a re-evaluation mechanism at the MAC layer of NR-V2X to avoid collisions caused by packet re-transmissions, though the overall improvement is modest and requires more refined allocation strategies. However, in highly dynamic scenarios, uncertain and complex changes in the network environment intensify packet collisions, and the above methods cannot adapt to prevent the resulting degradation in communication performance.
To overcome this problem, recent studies have increasingly focused on applying reinforcement learning techniques to optimize V2X resource allocation [35,36,37]. Existing resource allocation methods often rely on static models, which are unable to adapt in real time to the dynamic traffic environment. In contrast, reinforcement learning can adjust autonomously, significantly improving resource utilization and system performance. Specifically, Reference [35] uses Q-learning to optimize SPS algorithm parameters, including the reservation probability (RP) and the reselection counter ($RC$), enhancing the packet reception rate (PRR) and reducing the update delay (UD) in highly dynamic vehicular networks. Reference [36] proposes a deep reinforcement learning-based congestion control mechanism that optimizes the channel busy rate (CBR) and age-of-information (AoI), showing significant improvements over traditional decentralized congestion control (DCC) algorithms. Reference [37] treats each vehicle as an independent agent and employs a multi-agent deep reinforcement learning (MARL) resource allocation algorithm, enabling vehicles to learn to select resource blocks and transmission power for periodic packet broadcasting. These studies show that an $RC$ parameter optimized by RL can reduce consecutive collision probabilities in highly dynamic sidelink communication. Similarly, because $RC$ optimization is crucial to the sensing CRLB, this paper introduces an RL approach to $RC$ optimization in JCS sidelink resource allocation to improve sensing performance.
Additionally, full-duplex technology makes it feasible to track dynamic changes in the channel state and thus to detect sidelink collisions. With advancements in self-interference cancellation (SIC) technology for in-band full-duplex communication [38], some studies have begun using full-duplex antennas to enhance resource allocation mechanisms. Reference [39] utilizes an in-band full-duplex transceiver and its collision detection capabilities to trigger SPS resource reselection, improving communication performance and analyzing the relationship between the collision detection threshold and vehicle density variations. Reference [40] also leverages the collision detection capabilities of full-duplex communication to adjust the probability of maintaining the same subchannel for transmission.
In short, most of the current resource allocation algorithm improvements only focus on improving communication performance, and there are few references on the joint optimization of sidelink JCS systems. Indeed, in practical sidelink scenarios with a limited frequency spectrum, the radar sensing accuracy can impact communication reliability due to the conflict in bandwidth requirements [28]. Due to the lack of base station coordination, a traditional SPS scheme may lead to serious consecutive packet collisions. To our knowledge, there is currently a lack of extensive research on joint optimization for sensing and communication in sidelink JCS systems. Therefore, this paper leverages pertinent research and proposes the CCM-SPS scheme to address the issue of consecutive collisions in sidelink JCS systems. This scheme utilizes full-duplex collision detection and Q-learning reinforcement learning methods, optimizing both the sensing range (CRLB) and ultra-reliable low latency communications (URLLCs) quality in vehicular JCS systems.

3. Theoretical Performance Analysis on Sidelink JCS System

Multiple vehicles simultaneously transmit data packets via full-duplex broadcasts and use echo signals to detect and locate passive targets in the surrounding environment, obtaining sensing information such as target distance, relative speed, and the signal-to-noise ratio (SNR). In this scenario, multiple vehicles send sidelink signals, such as a cooperative awareness message (CAM) or collective perception message (CPM) [41]. It is initially assumed that all vehicles are equipped with in-band full-duplex transceivers with perfect SIC, enabling the use of echoes to detect channel quality [42].
Assume that during time slot $t \in \mathcal{T}$, vehicle $i \in \mathcal{I}_t$ generates a data packet and begins broadcasting the OFDM symbols $s_m(t)$. The symbol on the $n$-th subcarrier of the $m$-th OFDM symbol is denoted as $x_{n,m}$, with symbol power $P_{n,m}$. The symbol duration is defined as $T_{sym} = T + T_{cp}$, where $T_{cp}$ is the cyclic prefix duration and $T = 1/\Delta f$, with $\Delta f$ the subcarrier spacing. The representation in the complex baseband is as follows:
$$s_m(t) = \sum_{n=0}^{N-1} \sqrt{P_{n,m}}\, x_{n,m}\, e^{j 2\pi n \Delta f t}\, \mathrm{rect}\!\left(\frac{t - m T_{sym}}{T_{sym}}\right)$$
The sidelink JCS data packet consists of $N$ subcarriers and $M$ OFDM symbols. All the symbols $x_{n,m}$ in the packet form an $N \times M$ matrix $\mathbf{X} = \{x_{n,m}\} \in \mathbb{C}^{N \times M}$, where each column represents an OFDM symbol and each row represents a subcarrier. Assume that the power of each symbol on each subcarrier is the same and normalized, i.e., $P_{n,m} = P_{avg}$ for all $n$ and $m$. The transmission power of each vehicle is then given by $P_T = N \cdot P_{avg}$.
Sidelink data packets are used not only for communication but also for echo sensing. Assume the target vehicle is at distance $d$ from the transmitting vehicle and the relative velocity is $v$. The complex baseband representation of the received echo signal for the $n$-th subcarrier of the $m$-th symbol is given by:
$$y_{n,m} \approx \alpha\, e^{j 2\pi m T_{sym} \nu}\, e^{-j 2\pi n \Delta f \tau}\, x_{n,m}$$
Here, $\alpha$ is the channel coefficient, and $\nu$ and $\tau$ represent the Doppler shift and the round-trip delay of the echo signal, respectively. From the echoes received at the transmitting vehicle, the estimates $\hat{\nu}$ and $\hat{\tau}$ are obtained. The distance estimate $\hat{d}$ is calculated from $\hat{\tau}$ as:
$$\hat{d} = \frac{\hat{\tau}}{2}\, c$$
where $c$ represents the speed of light. With carrier frequency $f_c$, the relative speed estimate $\hat{v}$ is calculated from $\hat{\nu}$ as:
$$\hat{v} = \frac{\hat{\nu}\, c}{2 f_c}$$
$P_R$ is the echo power received by the full-duplex transceiver, given by:
$$P_R = \frac{P_T\, G^2\, c^2\, \sigma}{(4\pi)^3 f_c^2\, d^4}$$
where $P_T$ represents the transmit power, $G$ represents the transmit/receive antenna gain, and $\sigma$ is the radar cross-section of the target. The noise power is $P_n = k_B T_0 F W$, where $k_B$ is the Boltzmann constant, $T_0 = 290$ K is the reference temperature, $F$ is the noise figure of the full-duplex radar receiver, and $W = N \Delta f$ is the bandwidth of the sidelink transmission signal.
In practice, other vehicles that transmit on the same frequency resources during the transmission slot cause interference, which degrades radar sensing performance. The interference power is:
$$I_{t,i} = \sum_{k \in \mathcal{I}_t,\, k \neq i} \eta_{t,ki}\, h_{t,ki}\, P_T\, L(d_{t,ki})$$
In (6), $L(d_{t,ki})$ represents the path loss at distance $d_{t,ki}$ between the interfering vehicle $k$ and the transmitting vehicle $i$ at time slot $t$, and the parameter $h_{t,ki}$ represents the large-scale fading at time slot $t$. Additionally, $\eta_{t,ki}$ is a coefficient that takes the value 0 or 1, indicating the absence or presence of interference in that time slot, respectively.
Therefore, considering Gaussian white noise and interference from other vehicles, the signal-to-interference-plus-noise ratio (SINR) of the echo reflected from a target located at distance $d$ from the transmitter is calculated as follows:
$$\mathrm{SINR}_{t,i} = \frac{P_R}{P_n + I_{t,i}}$$
According to [43,44], the radar sensing performance of NR-V2X signals with OFDM waveforms is represented by the Cramér–Rao Lower Bound (CRLB) on the distance estimation variance. The CRLB represents the best performance that can be achieved by an unbiased estimator of these parameters. The CRLB for distance estimation for the transmitting/sensing vehicle $i$ at time slot $t$ is given by:
$$\mathrm{CRLB}_{t,i}(\hat{d}) = \frac{3 c^2}{8 \pi^2\, \Delta f^2\, \mathrm{SINR}_{t,i}\, M N (N^2 - 1)}$$
The CRLB clearly describes the most optimistic performance achievable and serves as a benchmark for characterizing sensing accuracy. Practically, the signal processing methods cannot achieve performance below the theoretical CRLB.
Based on the preceding analysis, the impact of sidelink bandwidth resources on sensing performance can be analyzed. According to (8), for a given SINR the CRLB for distance estimation scales as $1/(\Delta f^2 N (N^2 - 1)) \approx 1/(\Delta f^2 N^3)$, so it decreases with the square of the subcarrier spacing and approximately with the cube of the number of subcarriers. Therefore, increasing the subcarrier spacing ($\Delta f$) or the number of subcarriers ($N$) is advantageous for distance estimation.
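The chain from link budget to sensing bound can be illustrated with a short numerical sketch. The code below is a minimal Python rendering of the echo power, noise power, SINR, and distance CRLB expressions of this section; the function names are hypothetical, all quantities are assumed to be in linear units (watts, not dBm or dB), and the interference term is passed in directly rather than computed from a path loss model.

```python
import math

C = 299_792_458.0     # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K
T0 = 290.0            # reference temperature, K

def echo_power(p_t, gain, sigma, f_c, d):
    """Echo power P_R in watts; gain is linear and sigma is the radar cross-section in m^2."""
    return p_t * gain**2 * C**2 * sigma / ((4 * math.pi) ** 3 * f_c**2 * d**4)

def noise_power(noise_figure, n_sub, delta_f):
    """Noise power P_n = k_B * T0 * F * W with W = N * delta_f; noise_figure is linear."""
    return K_B * T0 * noise_figure * n_sub * delta_f

def sinr(p_r, p_n, interference=0.0):
    """SINR of the echo; interference is the summed power of colliding transmitters."""
    return p_r / (p_n + interference)

def crlb_distance(sinr_lin, n_sub, m_sym, delta_f):
    """CRLB on the distance-estimation variance, in m^2."""
    denom = 8 * math.pi**2 * delta_f**2 * sinr_lin * m_sym * n_sub * (n_sub**2 - 1)
    return 3 * C**2 / denom

# Illustrative values only (assumptions, not taken from the paper's simulation setup).
p_r = echo_power(p_t=0.2, gain=2.0, sigma=10.0, f_c=5.9e9, d=50.0)
p_n = noise_power(noise_figure=8.0, n_sub=120, delta_f=30e3)
clean = crlb_distance(sinr(p_r, p_n), 120, 14, 30e3)
collided = crlb_distance(sinr(p_r, p_n, interference=1e-10), 120, 14, 30e3)
```

In this sketch a collision enters only through the interference argument, so comparing `clean` and `collided` shows how a resource collision loosens the bound for an otherwise identical packet.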
Simultaneously, according to Shannon’s theorem, increasing bandwidth benefits the data rate in vehicular networks. However, in a multi-user distributed resource allocation system with limited resources, indiscriminately increasing N to enhance sensing and communication performance may exacerbate consecutive packet collisions within the existing sidelink resource allocation strategy. This interference significantly impacts the performance of the JCS system and reduces resource utilization efficiency [28]. The subsequent sections will provide a detailed analysis of optimizing the allocation scheme to mitigate collisions.

4. Consecutive Collision Problem Analysis of the JCS Sidelink Resource Allocation

4.1. Principle of the SB-SPS Resource Allocation Scheme

NR-V2X supports Mode 2 sidelink communication and employs the sensing-based semi-persistent scheduling (SB-SPS) scheme [45]. Initially, the SB-SPS scheme is designed to support periodic safety messages, utilizing sensing windows and a resource reservation mechanism to reduce end-to-end latency. The fundamental working principle of SB-SPS is illustrated in Figure 2, with specific steps as follows:
In the sensing window, vehicles measure the reference signal received power (RSRP) of each physical resource block (PRB) and continuously maintain a list of available resources $L_a$. This list contains the time-frequency resources whose RSRP values are below the threshold $P_{th}$. If the number of resources in $L_a$ is less than $X\%$ of the total resources, $P_{th}$ is increased by 3 dB to enlarge $L_a$. The $X\%$ threshold can be set to 20%, 35%, or 50%, depending on the configuration and service priorities. Subsequently, during the resource selection window, a reselection counter ($RC$) manages the use of reserved resources, and resource reselection is considered only when $RC$ reaches 0. At that point, the vehicle selects a new resource with probability $(1 - P_k)$ or continues to use the previously reserved resource with probability $P_k$, where $P_k$ ranges from 0 to 0.8. Once a resource is selected, the vehicle transmits repeatedly in the same resource block, and the number of transmissions is determined by the value of $RC$. $RC$ is randomly chosen between 5 and 15 and is decremented by 1 after each transmission until it reaches 0, at which point the next selection process is triggered. Equation (9) gives $\Delta RC$, the $RC$ decreasing step size of SB-SPS, where $RC$ and $RC'$ denote the counter value before and after a transmission:
$$\Delta RC = RC - RC' = 1$$
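The reservation behavior described above can be sketched for a single vehicle as follows. This is a simplified illustration rather than the standardized procedure: the sensing step is omitted, so the hypothetical `candidates` list stands in for $L_a$, the function name and default `p_keep` value are assumptions, and the counter is redrawn as soon as it reaches zero.

```python
import random

def sbsps_resource_trace(n_tx, candidates, p_keep=0.4, delta_rc=1, rng=None):
    """Return the resource index used in each of n_tx transmissions of one vehicle."""
    rng = rng or random.Random()
    resource = rng.choice(candidates)   # initial selection from the available list
    rc = rng.randint(5, 15)             # reselection counter drawn from [5, 15]
    trace = []
    for _ in range(n_tx):
        trace.append(resource)          # transmit on the reserved resource
        rc -= delta_rc                  # SB-SPS uses a fixed step of 1
        if rc <= 0:
            # Keep the old resource with probability p_keep, otherwise reselect.
            if rng.random() >= p_keep:
                resource = rng.choice(candidates)
            rc = rng.randint(5, 15)
    return trace

trace = sbsps_resource_trace(40, candidates=list(range(20)), rng=random.Random(0))
```

Because the counter starts at 5 or more and the step is 1, every reservation is held for at least five consecutive transmissions, which is why two vehicles that pick the same resource keep colliding.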
Due to the distributed resource allocation characteristics of the sidelink, it is impossible to obtain complete channel state information, which means that it cannot be guaranteed that all vehicles select idle resources. In addition, there is partial overlap in the candidate resource pools between neighboring vehicles, resulting in packet collisions. This issue is exacerbated in dynamic scenarios with high-density and large packet-size services. Furthermore, the sidelink echo signals are also subject to interference from consecutive packet collisions, which degrade the sensing performance.

4.2. Markov Chain Model of SB-SPS

In this section, a Markov chain analytical model [46] is presented, as shown in Figure 3. At any slot $t$, the counter satisfies $RC(t) \in [0, 15]$. If $RC(t) = 0$, it is randomly re-initialized such that $RC(t+1) \in [5, 15]$. Thus, for each $i \in [5, 15]$,
$$\Pr\{RC(t+1) = i \mid RC(t) = 0\} = \frac{1}{11}.$$
Denote $\pi_i$ as the steady-state probability that $RC(t) = i$, $0 \le i \le 15$. According to Figure 3, $\pi_i$ satisfies:
$$\pi_i = \begin{cases} \pi_0, & 0 \le i \le 4, \\ \frac{1}{11}\pi_0 + \pi_{i+1}, & 5 \le i \le 14, \\ \frac{1}{11}\pi_0, & i = 15. \end{cases}$$
Using the normalization condition $\sum_{i=0}^{15} \pi_i = 1$ and solving (10), we obtain $\pi_0 = \frac{1}{11}$.
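The value of $\pi_0$ can be checked numerically. The sketch below solves the balance equations above with exact rational arithmetic; the function name and its parameters are hypothetical, and the counter range [5, 15] is the one stated in the text.

```python
from fractions import Fraction

def rc_stationary(rc_min=5, rc_max=15):
    """Stationary distribution of the reselection counter for initial draws in [rc_min, rc_max]."""
    p_init = Fraction(1, rc_max - rc_min + 1)   # 1/11 for the range [5, 15]
    w = [Fraction(0)] * (rc_max + 1)            # unnormalized weights with pi_0 set to 1
    w[rc_max] = p_init                          # pi_15 = pi_0 / 11
    for i in range(rc_max - 1, rc_min - 1, -1):
        w[i] = p_init + w[i + 1]                # pi_i = pi_0 / 11 + pi_{i+1}
    for i in range(rc_min):
        w[i] = Fraction(1)                      # pi_i = pi_0 for 0 <= i <= 4
    total = sum(w)
    return [x / total for x in w]

pi = rc_stationary()
```

With the weights expressed in units of $\pi_0$, states 5 to 15 contribute $(1 + 2 + \dots + 11)/11 = 6$ and states 0 to 4 contribute 5, so the total is 11 and `pi[0]` equals 1/11.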
Since access collisions are caused by resource reselection, we first define the probability of a reselection resource collision. A collision occurs when multiple vehicles reselect the same resource within overlapping selection windows. Assume that at time $t$, vehicle UE0 is in the state $RC = 0$ and performs a reselection during the selection window $[t, t + \mathrm{RRI}]$, where RRI is the resource reservation interval. During this time, other UEs may also reselect: each UE is in the state $RC = 0$ with probability $\pi_0$ and then reselects with probability $1 - P_k$. Since the $RC$ states of the UEs are independent, the probability that $n$ out of $N_{UE}$ UEs also trigger reselection when UE0 triggers reselection is given by:
$$P_s(n) = \Pr\{n\ RC_{UEs} = 0, \text{Reselect} \mid RC_{UE0} = 0, \text{Reselect}\} = \binom{N_{UE}}{n} \left[\pi_0 (1 - P_k)\right]^{n} \left[1 - \pi_0 (1 - P_k)\right]^{N_{UE} - n}$$
where $n$ represents the number of vehicles simultaneously reselecting within UE0's reselection window. An access collision may occur if at least one other vehicle selects the same available PRB as UE0.
We define the collision involving $n$ UEs reselecting within the overlapping selection window as an $n$-fold collision, whose probability is given by the following formula:
$$P_r(n) = \Pr\{n\text{-fold Collision} \mid n\ RC_{UEs} = 0, \text{Reselect}\} = 1 - \left(1 - \frac{1}{\bar{N_a}}\right)^{n}$$
where $\bar{N_a}$ represents the average number of available PRBs within the selection window. This number is influenced by the vehicle density and the packet size. Higher vehicle density and larger packet sizes result in fewer available resources.
Thus, when UE0 performs reselection within the selection window, the access collision probability can be obtained as follows:
$$P_c = \Pr\{\text{Collision} \mid RC_{UE0} = 0, \text{Reselect}\} = \sum_{n=1}^{N_{UE}} P_r(n)\, P_s(n)$$
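The access collision probability can be evaluated directly from the two expressions above. The following sketch is a plain transcription into Python; the function names are hypothetical, and the default $\pi_0 = 1/11$ is the value obtained from the Markov chain model.

```python
from math import comb

def p_s(n, n_ue, pi0, p_k):
    """Probability that n of n_ue UEs reselect in the same window as UE0."""
    q = pi0 * (1 - p_k)   # per-UE probability of being at RC = 0 and reselecting
    return comb(n_ue, n) * q**n * (1 - q) ** (n_ue - n)

def p_r(n, n_avail):
    """Probability that at least one of n reselecting UEs picks UE0's resource."""
    return 1 - (1 - 1 / n_avail) ** n

def access_collision_probability(n_ue, n_avail, pi0=1 / 11, p_k=0.0):
    """P_c: sum over n of P_r(n) * P_s(n)."""
    return sum(p_r(n, n_avail) * p_s(n, n_ue, pi0, p_k) for n in range(1, n_ue + 1))

p_c = access_collision_probability(n_ue=50, n_avail=100, p_k=0.4)
```

By the binomial theorem, the sum collapses to $1 - (1 - q/\bar{N_a})^{N_{UE}}$ with $q = \pi_0 (1 - P_k)$, which makes the dependence explicit: the collision probability grows with the number of UEs and with the reselection rate, and shrinks as more PRBs are available.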
In SB-SPS, once an access collision occurs, the colliding vehicles keep colliding on the same resource. Between two vehicles UE0 and UE1, the collision lasts for at least $\min[RC_{UE0}, RC_{UE1}]$ transmissions. It is therefore useful to adjust $\Delta RC$, the $RC$ decreasing step size, to mitigate consecutive collisions. Increasing $\Delta RC$ shortens runs of consecutive collisions. However, an overly aggressive increase in $\Delta RC$ raises the number of vehicles entering the selection window simultaneously and thereby increases the probability of access collision, according to (14). Therefore, it is necessary to optimize $\Delta RC$.

5. Q-Learning-Based CCM-SPS Resource Allocation Scheme Proposed for JCS Sidelink

Details of our proposed CCM-SPS scheme to improve the sensing and resource reservation process will be elaborated below.

5.1. Collision Detection Mechanism

The full-duplex (FD) transceiver uses the measured received power as the condition for collision detection. The total received power of the FD receiver can be expressed as:
$$P_r^{FD} = P_R + P_n + I_{t,i}$$
where $P_R$ represents the power of the reflected echo signal. Only the echo from the vehicle nearest to the transmitting vehicle is considered. Since the CAM signal contains distance information, $P_R$ can be calculated through the path loss using the distance obtained from CAM messages. $P_n$ is the Gaussian white noise power, and $I_{t,i}$ represents the interference from other transmitting vehicles to VUE $i$ in the same time slot $t$ according to (6).
By subtracting the echo power $P_R$ of the nearest vehicle and the noise power $P_n$ from the received power $P_r^{FD}$, the interference power $I_{t,i}$ can be estimated.
The collision detection rule in the JCS sidelink is then given as follows:
$$COL = \begin{cases} 1, & P_r^{FD} - P_R - P_n > P_\Delta, \\ 0, & P_r^{FD} - P_R - P_n < P_\Delta. \end{cases}$$
The threshold $P_\Delta$ is used to determine whether the interference exceeds the critical power. $COL = 0$ indicates no resource collision, and $COL = 1$ indicates a resource collision.
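The detection rule amounts to a threshold test on the residual power. A minimal sketch is given below, assuming all powers are in watts; the function names are hypothetical, and treating the boundary case of equality as no collision is an assumption, since the rule above only covers the strict inequalities.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

def detect_collision(p_rx_fd, p_echo, p_noise, p_threshold):
    """Return COL: 1 if the estimated interference exceeds the threshold, else 0."""
    # Residual power after removing the expected echo and the noise floor.
    interference_estimate = p_rx_fd - p_echo - p_noise
    # Assumption: equality with the threshold is treated as no collision.
    return 1 if interference_estimate > p_threshold else 0

col = detect_collision(p_rx_fd=3e-9, p_echo=1e-9, p_noise=1e-9, p_threshold=0.5e-9)
```

Because the subtraction is done in linear units, measurements reported in dBm would need to pass through `dbm_to_watt` before being compared.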

5.2. Q-Learning-Based CCM-SPS Scheme for JCS Sidelink

After detecting a resource collision, a consecutive collision elimination mechanism based on Q-learning is proposed to mitigate consecutive resource collisions. Vehicles interact with the environment in real-time and intelligently determine the optimal actions given the current state.

5.2.1. Vehicular Agent Based on the Reinforcement Learning Model

In a typical reinforcement learning framework [47], as illustrated in Figure 4, the agent achieves its learning objectives through an iterative process of receiving rewards from the environment. Intelligent agents improve their strategies by exchanging rewards with the environment.
In the proposed sidelink JCS system, each vehicle updates its current state after transmitting data packets and detecting collisions. The current state includes the collision status and the current $RC$ value. For vehicle $i$, the state $s_i \in S$ can be described as follows:
$$s_i = (COL_i, RC_i)$$
where $COL_i$ indicates whether vehicle $i$ experiences a resource collision when transmitting a data packet, and $RC_i$ represents the current value of the reselection counter, with a range of [0, 15]. Therefore, the state set is a discrete state space.
The traditional SB-SPS scheme reduces $RC$ by 1 after each packet transmission. To control how quickly $RC$ decreases, we define a discrete action space from which an action is selected based on the acquired state information. This enables vehicles that experience a resource collision to adjust the $RC$ decreasing step size ($\Delta RC$).
$$A = \{\, a_i \mid a_i = \Delta RC \,\}$$
where $\Delta RC$ represents the amount by which vehicle $i$ reduces its $RC$ after each transmission. The vehicle dynamically selects an action from the action space [0, 15] based on the observed state. When $RC - \Delta RC < 0$, the vehicle enters the reselection process and reselects a new resource.
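One possible way to index a tabular Q-function over this state and action space is sketched below. The flat indexing scheme and the helper names are assumptions made for illustration; the text only specifies the state tuple, the action range [0, 15], and the reselection condition.

```python
from typing import Optional, Tuple

RC_MAX = 15                    # reselection counter range is [0, 15]
N_STATES = 2 * (RC_MAX + 1)    # COL in {0, 1} times RC in 0..15
N_ACTIONS = RC_MAX + 1         # delta_rc in 0..15


def state_index(col: int, rc: int) -> int:
    """Map the state (COL, RC) to a row index of a Q-table (hypothetical layout)."""
    if col not in (0, 1) or not 0 <= rc <= RC_MAX:
        raise ValueError("state out of range")
    return col * (RC_MAX + 1) + rc


def apply_action(rc: int, delta_rc: int) -> Tuple[Optional[int], bool]:
    """Apply the chosen step size after a transmission.

    Returns (new_rc, reselect). When RC - delta_rc < 0 the vehicle enters
    reselection; the new RC is then drawn by the caller, so None is returned.
    """
    if rc - delta_rc < 0:
        return None, True
    return rc - delta_rc, False


print(N_STATES, N_ACTIONS)   # 32 16
print(apply_action(5, 2))    # (3, False)
print(apply_action(2, 5))    # (None, True)
```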
After a vehicle executes an action, its state transitions, and it receives the instantaneous reward $R$ from the environment.
$$R = \begin{cases} -\dfrac{\exp(N_{col}) - 1}{\log_{10}(RC + 1)}, & COL = 1 \\ 1, & COL = 0 \end{cases}$$
where $N_{col}$ represents the number of consecutive resource collisions experienced by the vehicle. When no collision occurs, the reward is 1. When a collision occurs, the greater $N_{col}$ is, the greater the penalty is, ensuring that the collision state is removed as quickly as possible. In addition, the current $RC$ value is introduced as a constraint: the smaller the $RC$, the greater the penalty. When vehicles with different $RC$ values collide on the same resources, the vehicle with the small $RC$ is more likely to enter the reselection stage, while the vehicle with the large $RC$ can continue to transmit using the current resources. This mechanism effectively reduces the number of vehicles that are in the reselection stage simultaneously, thus helping to maintain the stability of the system.
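A minimal Python sketch of this reward is given below. It assumes that the collision branch carries a leading minus sign, so that it acts as a penalty as the text describes. Because $\log_{10}(RC + 1)$ is zero at $RC = 0$, the sketch evaluates the denominator at $RC = 1$ in that case; this guard is an assumption of the sketch and is not specified in the text.

```python
import math


def reward(col: int, n_col: int, rc: int) -> float:
    """Instantaneous reward for one transmission.

    col   -- 1 if a resource collision was detected, else 0
    n_col -- number of consecutive collisions so far (>= 1 when col == 1)
    rc    -- current reselection counter value
    """
    if col == 0:
        return 1.0
    # Assumption: avoid division by zero at RC = 0 by evaluating at RC = 1.
    denom = math.log10(max(rc, 1) + 1)
    return -(math.exp(n_col) - 1.0) / denom


print(reward(0, 0, 9))   # 1.0, no collision
print(reward(1, 1, 9))   # first collision, large RC: mild penalty
print(reward(1, 3, 9))   # third consecutive collision: much larger penalty
print(reward(1, 1, 1))   # small RC: larger penalty than with RC = 9
```

The penalty grows exponentially with the number of consecutive collisions and is larger for small RC values, which matches the two trends described above.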
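In Q-learning, the Q value $Q(s, a)$ is calculated and updated using the reward model to evaluate the state-action mapping policy for the state-action pair $(s, a)$. The Q value is updated by the Bellman equation, as shown below:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \eta \max_{a' \in A} Q(s', a') - Q(s, a) \right]$$
where $\alpha$ is the learning rate, $\eta$ is the discount factor, and $s'$ is the state reached after taking action $a$ in state $s$.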
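The update can be written as a short Python function. The sketch below uses the standard tabular Q-learning form, with the maximum taken over the actions available in the next state, and the learning rate and discount factor reported for training in this section (0.01 and 0.9). The dictionary-based table and the function name are illustrative choices rather than the authors' implementation.

```python
from collections import defaultdict

ALPHA = 0.01   # learning rate
ETA = 0.9      # discount factor


def q_update(q, state, action, r, next_state, actions,
             alpha=ALPHA, eta=ETA):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = r + eta * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q_table = defaultdict(float)   # unseen (state, action) pairs start at 0
actions = range(16)            # candidate delta_rc values

# One update after a collision: state (COL=1, RC=8), action delta_rc=4,
# reward -1.0, next state (COL=0, RC=4).
print(q_update(q_table, (1, 8), 4, -1.0, (0, 4), actions))
```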
When searching for the optimal action, vehicles need to balance the exploitation of learned knowledge against exploration, so that each action has a chance of being selected.
$$a = \begin{cases} \arg\max_{a \in A} Q(s, a), & \text{with prob. } 1 - \varepsilon \\ \text{random}, & \text{with prob. } \varepsilon \end{cases}$$
This paper adopts an $\varepsilon$-greedy strategy to balance the exploitation–exploration process. During training, $\varepsilon$ is initially set to 0.9 and gradually decreased to 0.1. The training lasts for 100 s, consisting of 1000 iterations. The initial learning rate is set to 0.01, and the discount factor is set to 0.9.
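The selection rule and the decay of the exploration probability can be sketched as follows. The text states only the start value (0.9), the end value (0.1) and the number of iterations (1000); the linear shape of the decay and the function names are assumptions of this sketch.

```python
import random

EPS_START, EPS_END, N_ITER = 0.9, 0.1, 1000


def epsilon_at(iteration: int) -> float:
    """Exploration probability at a given iteration (linear decay assumed)."""
    frac = min(max(iteration / (N_ITER - 1), 0.0), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice over one row of Q values (one entry per action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore
    return max(range(len(q_row)), key=q_row.__getitem__)    # exploit


print(epsilon_at(0), epsilon_at(N_ITER - 1))
print(select_action([0.0, 2.0, 1.0], epsilon=0.0))   # greedy choice: index 1
```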

5.2.2. Algorithm Flow and Pseudo-Code

Based on the previous analysis, we propose the CCM-SPS scheme to reduce the number of consecutive collisions. Specifically, during transmission, the full-duplex (FD) echo detection capability is used to sense the channel state. If a collision is detected, a $\Delta RC$ is selected based on the current state. In this way, the scheme reduces the number of consecutive collisions while restricting the number of vehicles entering the reselection process simultaneously, as shown in Figure 5.
The pseudocode of its algorithm is shown below (Algorithm 1):
Algorithm 1: Pseudo-code of the proposed CCM-SPS
Input: Vehicle density, bandwidth occupied by packets
Output: Q-table
Initialize the parameters such as the learning rate $\alpha$ and the discount factor $\eta$
Initialize the vehicles' states, actions, and the Q-table
loop
    Begin the new packet transmission
    Observe the state $s_i$ of the transmitting vehicle
    if a collision occurred ($COL = 1$) then
        Update the number of consecutive resource collisions: $N_{col} = N_{col} + 1$
        Vehicle $i$ obtains the negative reward $-(\exp(N_{col}) - 1)/\log_{10}(RC + 1)$
    else
        Reset the number of consecutive resource collisions: $N_{col} = 0$
        Vehicle $i$ obtains the positive reward 1
    end if
    Vehicle $i$ updates $Q(s, a)$
    Vehicle $i$ updates the probability $\varepsilon$ according to the simulation time
    if exploration then
        Vehicle $i$ randomly selects an action $a$
    else
        Vehicle $i$ selects the optimal action $a_{opt}$ with $Q_{max}$
    end if
end loop
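To make the flow of Algorithm 1 concrete, the following Python sketch trains a single agent in a deliberately simplified toy environment. The environment is an assumption made purely for illustration and is not the system-level simulator used in this paper: a collision persists until the vehicle reselects, a newly selected resource collides with a hypothetical fixed probability `P_COLLIDE`, and the counter is redrawn uniformly from [5, 15] after reselection. The exponent cap and the guard at $RC = 0$ in the reward are numerical safeguards added for the sketch.

```python
import math
import random
from collections import defaultdict

ACTIONS = range(16)              # candidate delta_rc values
ALPHA, ETA = 0.01, 0.9           # learning rate, discount factor
EPS_START, EPS_END = 0.9, 0.1    # exploration schedule end points
P_COLLIDE = 0.2                  # hypothetical collision probability after reselection


def reward(col, n_col, rc):
    if not col:
        return 1.0
    # Guards (assumptions): cap the exponent, avoid log10(1) = 0 at RC = 0.
    return -(math.exp(min(n_col, 50)) - 1.0) / math.log10(max(rc, 1) + 1)


def train(n_iter=1000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)       # Q-table keyed by ((col, rc), action)
    col, rc, n_col = 1, rng.randint(5, 15), 0
    for it in range(n_iter):
        state = (col, rc)
        eps = EPS_START + (EPS_END - EPS_START) * it / max(n_iter - 1, 1)
        if rng.random() < eps:                       # exploration
            action = rng.choice(ACTIONS)
        else:                                        # exploitation
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        if rc - action < 0:                          # reselect a new resource
            rc = rng.randint(5, 15)
            col = 1 if rng.random() < P_COLLIDE else 0
        else:                                        # keep the current resource
            rc -= action
        n_col = n_col + 1 if col else 0              # consecutive collision count
        r = reward(col, n_col, rc)
        next_state = (col, rc)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (r + ETA * best_next - q[(state, action)])
    return q


q_table = train()
print(len(q_table), "state-action entries in the learned table")
```

The loop applies the action before computing the reward, so each update uses the transition it has just observed; this is an equivalent ordering of the steps listed in Algorithm 1 rather than a line-by-line transcription.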

6. JCS Sidelink Performance Evaluation Using CCM-SPS

In this section, key performance metrics such as CRLB, PRR, and UD were evaluated under varying vehicle density and packet size scenarios, and comprehensive experimental discussions were conducted. Among them, CRLB is a sensing performance metric that affects target detection accuracy, thereby influencing autonomous driving decision-making; PRR is a communication reliability metric that impacts communication quality; and UD is a latency metric that affects the real-time performance of intelligent transportation systems.

6.1. Simulation Setup

The main settings are reported in Table 1 and discussed hereafter.
In this section, we simulate the performance of a JCS sidelink system using a system-level simulator. In particular, vehicles perform SINR evaluation for collision detection and CRLB calculation after each transmission.
Scenario. We simulated a three-lane bidirectional highway scenario, with vehicle densities of 50, 150, and 250 veh/km, depending on the setting. The average vehicle speed was 70 km/h, and the standard deviation of the vehicle speed was 7 km/h. The radar cross-section of the vehicle is 10 dBsm.
Power settings and channel model. The channel model is WINNER + B1, with a fixed available bandwidth of 40 MHz and a center frequency of 5.9 GHz, to simulate a V2V channel scenario. The transmission power of the vehicle is 23 dBm, assuming that the gain of both the transmitting and receiving antennas is 3 dBi and the noise gap is 6 dB.
Physical layer and data traffic. For the physical layer, we used fixed SCS and MCS settings. For data traffic, we simulated periodic V2V message transmission with a period of 100 ms and supported packet sizes of 350 or 1000 bytes, representing the CAM and CPM service types, respectively. Under fixed MCS and SCS configurations, the packet size determines the bandwidth occupied by a vehicle's transmitted packets, as shown in Table 2.

6.2. JCS Sidelink Performances with Dynamic Vehicle Density

Firstly, the JCS performance of the proposed CCM-SPS is evaluated against a benchmark scheme under various vehicle densities. The benchmark scheme, FD-enhanced, was proposed in [39]. It detects resource collisions using full-duplex during transmission and then applies aggressive resource reselection by setting $RC$ to 0 for all vehicles that experience an access collision. This quickly breaks off consecutive collisions and improves communication performance to some extent. The proposed CCM-SPS scheme employs Q-learning to optimize the resource reselection process, in addition to FD detection in the resource reservation process.
Figure 6 illustrates the empirical CDF of the root CRLB for range using SB-SPS, FD-enhanced and CCM-SPS in various density scenarios. As the vehicle density increases, the range sensing performance degrades, i.e., the root CRLB grows. The FD-enhanced scheme significantly improves performance at medium and low vehicle densities, but the improvement is less pronounced at high vehicle densities. In contrast, the proposed CCM-SPS not only further enhances performance at medium and low densities but also effectively improves performance in high-density scenarios. Figure 7 presents the root CRLB for range at the 95th percentile for the different algorithms at varying densities. The bar graph shows more directly the superior performance of the proposed CCM-SPS across all density scenarios compared to the other two schemes.
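For readers who wish to reproduce this kind of summary from their own simulation logs, the sketch below computes an empirical CDF and a nearest-rank percentile in plain Python. The sample values are synthetic placeholders, not results from this paper, and the nearest-rank percentile definition is one common choice among several.

```python
import math


def empirical_cdf(samples):
    """Return sorted (value, cumulative probability) pairs."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample whose CDF value is >= p/100."""
    xs = sorted(samples)
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]


# Synthetic root-CRLB samples in metres (placeholder data only).
root_crlb_m = [0.02, 0.05, 0.03, 0.10, 0.04, 0.08, 0.06, 0.07, 0.09, 0.01]

for value, prob in empirical_cdf(root_crlb_m)[:3]:
    print(value, prob)
print(percentile(root_crlb_m, 95))   # 0.1 for this synthetic sample
```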
Similar results also appear in discussions on communication performance. Figure 8 illustrates PRR over a distance using SB-SPS, FD-enhanced and CCM-SPS in different density scenarios. As the vehicle density increases, the PRR of different schemes all show a downward trend. In mid- and low-density scenarios, the FD-enhanced scheme shows some improvements in PRR within a 200 m communication range. However, in high-density scenarios, the improvements are not significant. By examining the maximum distance where PRR exceeds 0.95 under three different schemes at varying densities, as shown in Figure 9, the advantages of the CCM-SPS scheme become more obvious. Compared to the FD-enhanced scheme, the proposed algorithm can further enhance communication metrics effectively across different density scenarios, especially in high vehicle density. The maximum communication ranges in low-, mid-, and high-density scenarios are increased by 22.2%, 30%, and 50%, respectively. This analysis indicates that the CCM-SPS scheme demonstrates robustness and effectiveness in various traffic density scenarios.
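The range metric used in Figure 9 can be computed from per-packet logs along the following lines. The sketch assumes PRR is the fraction of transmitted packets that are correctly received within a distance bin, and it defines the reliable range as the outer edge of the last bin before PRR first drops to 0.95 or below. The bin width, record format and function names are illustrative assumptions.

```python
from collections import defaultdict


def prr_by_distance(records, bin_m=50):
    """Compute PRR per distance bin.

    records -- iterable of (distance_m, received) pairs, one per packet
    Returns a dict mapping the outer edge of each bin (in metres) to its PRR.
    """
    sent = defaultdict(int)
    ok = defaultdict(int)
    for distance_m, received in records:
        b = int(distance_m // bin_m)
        sent[b] += 1
        ok[b] += 1 if received else 0
    return {(b + 1) * bin_m: ok[b] / sent[b] for b in sorted(sent)}


def max_reliable_range(prr, target=0.95):
    """Outer edge of the last bin before PRR first fails to exceed target."""
    best = 0
    for edge in sorted(prr):
        if prr[edge] <= target:
            break
        best = edge
    return best


# Synthetic log (placeholder data): reliability falls off with distance.
log = ([(10, True)] * 20
       + [(60, True)] * 39 + [(60, False)]
       + [(120, True)] * 8 + [(120, False)] * 2)

prr = prr_by_distance(log)
print(prr)
print(max_reliable_range(prr))   # 100 for this synthetic log
```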
In short, as vehicle density rises, available resources in the pool decrease, increasing the probability of access collisions and resulting in consecutive resource collisions, which degrades JCS performance. The FD-enhanced method employs full-duplex to detect collisions and terminate consecutive collisions early. However, its overly aggressive reselection mechanism leads to an excessive number of vehicles entering the reselection stage in high-density scenarios. According to (13), an increase in the number of vehicles in the reselection process within the same time slot raises the probability of new access collisions. The advantages of the FD-enhanced scheme over traditional SB-SPS are therefore more pronounced at mid and low densities but diminish at high densities. In contrast, the CCM-SPS scheme outperforms FD-enhanced in a variety of density scenarios, and the Q-learning-based scheme significantly enhances performance, particularly in high-density environments. This is due to the reward function defined in the proposed Q-learning algorithm: vehicles that do not undergo resource collisions receive rewards and strive to maintain the current state as much as possible, ensuring the stability of the resource pool. Vehicles that experience resource collisions are penalized based on the current $RC$ and the number of consecutive collisions, which encourages vehicles to learn the best strategy to minimize consecutive collisions and avoid new collisions.
The results demonstrate that the proposed CCM-SPS scheme outperforms the comparison methods in terms of CRLB and PRR across various density scenarios. These advantages have strong application potential in fields of high-accuracy sensing. For example, in high-density environments, it maintains a CRLB accuracy of 0.1 m and a reliable communication range of 50 m, ensuring both accurate target sensing and reliable communication, which are critical for autonomous driving decision-making.

6.3. JCS Sidelink Performances with Dynamic Packet Sizes

6.3.1. Conflicting Impacts of Packet Sizes on JCS Performance Metrics

Secondly, in this section, we investigate the conflicting impacts of packet sizes on JCS performance metrics in terms of root CRLB range and PRR in different vehicle density scenarios by using a traditional SB-SPS scheme. Then, we evaluate the optimization performance results of a Q-learning-based CCM-SPS scheme, including sensing and communication metrics in different vehicle density scenarios.
Figure 10 illustrates the empirical CDF of root CRLB range estimation with different packet sizes and vehicle densities. With a given packet size, the empirical CDF of the root CRLB range decreases rapidly with an increase in vehicle density. As the packet size increases, the CDF curve of the root CRLB range shifts toward the left, i.e., toward smaller CRLB values, which indicates enhanced sensing accuracy.
Figure 11 illustrates the communication reliability metric PRR vs. distance under different packet sizes and vehicle densities. With a given packet size, the PRR decreases with the transmission distance. As the packet size or the vehicle density increases, the PRR curves decline faster.
From the results in Figure 10 and Figure 11 for the JCS system with the traditional SB-SPS scheme, it can be seen that both sensing and communication performance decrease with increasing vehicle density. This is because the worse channel quality of the sidelink in a high-density vehicle network yields more consecutive collisions over the shared resource pool with traditional SB-SPS. Moreover, with increasing packet size, the sensing accuracy improves (a smaller root CRLB range value), while the communication reliability declines (a smaller PRR value). With a large packet size, the required transmission bandwidth increases because more subcarriers are allocated. An echo radio with a large subcarrier number N can produce a small CRLB value range, according to (8). At the same time, there is a high probability that different vehicles occupy the same spectrum resource, resulting in serious consecutive collisions due to resource competition over the sidelink, which prevents successful access and produces a small PRR.
In short, the traditional sidelink resource allocation scheme is less effective for a sidelink JCS system that must support various services in highly dense dynamic vehicular networks. To overcome the conflicting requirements on spectrum resource allocation between sensing and communication, this paper has proposed a novel CCM-SPS scheme to realize effective sidelink resource allocation for enhancing JCS performance. Its JCS performance is evaluated in the following section.

6.3.2. Optimization Performance Evaluation of Q-Learning-Based CCM-SPS

The proposed CCM-SPS optimizes JCS performance metrics by introducing the Q-learning method to control how many times the reserved resources are reused: it adjusts the $RC$ decreasing step size and thereby suppresses the probability of consecutive collisions.
Figure 12 illustrates the range sensing performance of CCM-SPS in terms of the empirical CDF of the root CRLB with different packet sizes for densities of 50, 150, and 250 veh/km. The performance of traditional SB-SPS with the same configuration is also given for comparison. At a given packet size and vehicle density, the CDF curve of the root CRLB range for CCM-SPS rises faster than that of traditional SB-SPS, giving a higher probability of a small CRLB value. Meanwhile, a larger packet size yields a smaller root CRLB, as observed for SB-SPS in Figure 10. As the vehicle density increases, CCM-SPS maintains a smaller CRLB value than SB-SPS.
On the other hand, Figure 13 shows the PRR vs. distance performance of CCM-SPS, and Figure 14 shows the empirical CDF of its update delay, with different packet sizes and densities. The performance of traditional SB-SPS with the same configuration is also given for comparison.
It can be seen that, at a given vehicle density, CCM-SPS obtains a higher PRR than SB-SPS. As the packet size increases, the PRR of CCM-SPS decreases more slowly than that of SB-SPS. As the vehicle density grows, CCM-SPS holds a relatively high PRR, whereas SB-SPS yields a low PRR even in the low-density scenario. Similarly, at a given vehicle density, CCM-SPS achieves an obvious reduction in update delay compared with SB-SPS. As the vehicle density increases, the update delay increases in both schemes, but CCM-SPS maintains a high probability of a small update delay. Therefore, the proposed CCM-SPS achieves better communication quality, in terms of higher reliability and lower latency, than the traditional scheme.
In view of the above results, CCM-SPS delivers comprehensive performance enhancements in both sensing and communication, without improving one at the expense of the other. CCM-SPS makes resource reservation feasible for sidelink JCS access after FD detects the dynamic SINR from the echo signal. According to the reward function (19), which depends on $RC$ and $N_{col}$, the JCS vehicle agent learns from the dynamic network environment and receives a corresponding reward, which it uses to optimize the $\Delta RC$ selection actions in the reservation process through the Q-learning approach. As a result, effective resource allocation for both sensing and communication can be realized simultaneously by using CCM-SPS. Therefore, the CCM-SPS algorithm is capable of supporting dynamic data traffic service scenarios such as the V2X network in intelligent transportation systems.
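The update delay statistic can be derived from reception timestamps as sketched below. The sketch assumes the commonly used definition of update delay as the time between two consecutive successful receptions from the same transmitter; that definition is not restated in this section, so it should be checked against the one used earlier in the paper. Times are kept in integer milliseconds and all values are synthetic.

```python
def update_delays(rx_times_ms):
    """Gaps between consecutive successful receptions from one transmitter."""
    ts = sorted(rx_times_ms)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def fraction_within(delays, limit_ms):
    """Empirical CDF value at limit_ms: share of delays not exceeding it."""
    if not delays:
        return 0.0
    return sum(d <= limit_ms for d in delays) / len(delays)


# Synthetic example with a 100 ms period: the packets due at 300 ms and
# 400 ms were lost, so the next reception arrives at 500 ms.
rx_times_ms = [0, 100, 200, 500]

delays = update_delays(rx_times_ms)
print(delays)                         # [100, 100, 300]
print(fraction_within(delays, 100))   # two of the three gaps are within 100 ms
```

Under this definition, each run of consecutive collisions appears as one long gap, which is why shortening such runs shifts the update delay CDF toward smaller values.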

7. Conclusions

This paper proposes a resource allocation scheme for a sidelink JCS system, named consecutive collision mitigation semi-persistent scheduling (CCM-SPS). By combining collision detection based on an echo power threshold with Q-learning to train the $RC$ decreasing step size, this scheme effectively suppresses the probability of consecutive collisions. Compared with traditional SB-SPS and the FD-enhanced scheme, CCM-SPS achieves superior sensing and communication performance, even in high-density vehicle scenarios. Furthermore, CCM-SPS can support services with large packet sizes and achieve accurate sensing, with a smaller loss of communication reliability as the distance increases. CCM-SPS is particularly meaningful from the perspective of enabling sidelinks to support sensing and communication collaboration in 6G networks. In future work, there are interesting topics to be studied, such as the impact of interference on practical full-duplex operation and cross-layer optimization. In addition to the V2X network studied in this paper, there is still room to explore the use of the CCM-SPS scheme in various JCS applications, such as the AIoT network. Additionally, the integration of edge computing with the CCM-SPS scheme can further enhance the performance of the sidelink JCS system to support rich and broad JCS application tasks.

Author Contributions

Conceptualization, Z.L. and P.W.; methodology, Z.L. and Y.S.; software, Z.L.; validation, Z.L. and S.L.; formal analysis, P.W.; investigation, Z.L., P.W. and Y.S.; resources, Z.L.; data curation, Z.L.; writing—original draft preparation, Z.L.; visualization, Z.L.; supervision, P.W.; project administration, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yan, S.; Peng, M.G.; Wang, W.B. Integration of communication, sensing and computing: The vision and key technologies of 6G. J. Beijing Univ. Posts Telecommun. 2021, 44, 1.
  2. Griffiths, H.; Cohen, L.; Watts, S.; Mokole, E.; Baker, C.; Wicks, M.; Blunt, S. Radar spectrum engineering and management: Technical and regulatory issues. Proc. IEEE 2014, 103, 85–102.
  3. Liu, F.; Cui, Y.; Masouros, C.; Xu, J.; Han, T.X.; Eldar, Y.C.; Buzzi, S. Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond. IEEE J. Sel. Areas Commun. 2022, 40, 1728–1767.
  4. Garcia, M.H.C.; Molina-Galan, A.; Boban, M.; Gozalvez, J.; Coll-Perales, B.; Şahin, T.; Kousaridas, A. A tutorial on 5G NR V2X communications. IEEE Commun. Surv. Tutor. 2021, 23, 1972–2026.
  5. Decarli, N.; Bartoletti, S.; Bazzi, A.; Stirling-Gallacher, R.A.; Masini, B.M. Performance Characterization of Joint Communication and Sensing With Beyond 5G NR-V2X Sidelink. IEEE Trans. Veh. Technol. 2024, 73, 10044–10059.
  6. Jeon, Y.; Kuk, S.; Kim, H. Reducing message collisions in sensing-based semi-persistent scheduling (SPS) by using reselection lookaheads in cellular V2X. Sensors 2018, 18, 4388.
  7. Gu, B.; Chen, W.; Alazab, M.; Tan, X.; Guizani, M. Multiagent reinforcement learning-based semi-persistent scheduling scheme in c-v2x mode 4. IEEE Trans. Veh. Technol. 2022, 71, 12044–12056.
  8. Gu, X.; Peng, J.; Cai, L.; Cheng, Y.; Zhang, X.; Liu, W.; Huang, Z. Performance analysis and optimization for semi-persistent scheduling in c-v2x. IEEE Trans. Veh. Technol. 2022, 72, 4628–4642.
  9. Kolodziej, K.E.; Perry, B.T.; Herd, J.S. In-band full-duplex technology: Techniques and systems survey. IEEE Trans. Microw. Theory Tech. 2019, 67, 3025–3041.
  10. Zhang, H.; Zhai, X.; Zhang, J.; Bai, X.; Li, Z. Mechanism Analysis of the Effect of the Equivalent Proportional Coefficient of Inertia Control for a Doubly Fed Wind Generator on Frequency Stability in Extreme Environments. Sustainability 2024, 16, 4965.
  11. Zhang, H.; Li, Z.; Xue, Y.; Chang, X.; Su, J.; Wang, P.; Guo, Q.; Sun, H. A Stochastic Bi-level Optimal Allocation Approach of Intelligent Buildings Considering Energy Storage Sharing Services. IEEE Trans. Consum. Electron. 2024, 70, 5142–5153.
  12. Ibrahem, L.N.; Al-Mistarihi, M.F.; Khodeir, M.A.; Alhulayil, M.; Darabkh, K.A. Best relay selection strategy in cooperative spectrum sharing framework with mobile-based end user. Appl. Sci. 2023, 13, 8127.
  13. Alhulayil, M.; López-Benítez, M. Static contention window method for improved LTE-LAA/Wi-Fi coexistence in unlicensed bands. In Proceedings of the 2019 International Conference on Wireless Networks and Mobile Communications (WINCOM), Fez, Morocco, 29 October–1 November 2019; pp. 1–6.
  14. Masouros, C.; Zhang, J.A.; Liu, F.; Zheng, L.; Wymeersch, H.; Di Renzo, M. Guest editorial: Integrated sensing and communications for 6G. IEEE Wirel. Commun. 2023, 30, 14–15.
  15. Wild, T.; Braun, V.; Viswanathan, H. Joint design of communication and sensing for beyond 5G and 6G systems. IEEE Access 2021, 9, 30845–30857.
  16. Liu, M.; Yang, M.; Zhang, Z.; Li, H.; Liu, F.; Nallanathan, A.; Hanzo, L. Sensing-Communication Coexistence vs. Integration. IEEE Trans. Veh. Technol. 2023, 72, 8158–8163.
  17. Li, Y.; Liu, F.; Du, Z.; Yuan, W.; Shi, Q.; Masouros, C. Frame Structure and Protocol Design for Sensing-Assisted NR-V2X Communications. IEEE Trans. Mob. Comput. 2024, 23, 11045–11060.
  18. Ni, Z.; Zhang, J.A.; Yang, K.; Liu, R. Frequency-hopping based joint automotive radar-communication systems using a single device. In Proceedings of the 2022 IEEE International Conference on Communications Workshops (ICC Workshops), Seoul, Republic of Korea, 16–20 May 2022; pp. 480–485.
  19. Liu, Y.; Liao, G.; Chen, Y.; Xu, J.; Yin, Y. Super-resolution range and velocity estimations with OFDM integrated radar and communications waveform. IEEE Trans. Veh. Technol. 2020, 69, 11659–11672.
  20. Gaudio, L.; Kobayashi, M.; Caire, G.; Colavolpe, G. On the effectiveness of OTFS for joint radar parameter estimation and communication. IEEE Trans. Wirel. Commun. 2020, 19, 5951–5965.
  21. Zhang, Q.; Li, Z.; Gao, X.; Feng, Z. Performance evaluation of radar and communication integrated system for autonomous driving vehicles. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; pp. 1–2.
  22. Yang, H.; Wei, Z.; Feng, Z.; Qiu, C.; Fang, Z.; Chen, X.; Zhang, P. Queue-aware dynamic resource allocation for the joint communication-radar system. IEEE Trans. Veh. Technol. 2020, 70, 754–767.
  23. Liu, F.; Masouros, C. Joint localization and predictive beamforming in vehicular networks: Power allocation beyond water-filling. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 8393–8397.
  24. Zhong, Y.; Bi, T.; Wang, J.; Zeng, J.; Huang, Y.; Jiang, T.; Wu, Q.; Wu, S. Empowering the V2X network by integrated sensing and communications: Background, design, advances, and opportunities. IEEE Netw. 2022, 36, 54–60.
  25. Liyanaarachchi, S.D.; Barneto, C.B.; Riihonen, T.; Valkama, M. Experimenting joint vehicular communications and sensing with optimized 5G NR waveform. In Proceedings of the 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), Helsinki, Finland, 25–28 April 2021; pp. 1–5.
  26. Decarli, N.; Guerra, A.; Giovannetti, C.; Guidi, F.; Masini, B.M. V2X sidelink localization of connected automated vehicles. IEEE J. Sel. Areas Commun. 2023, 42, 120–133.
  27. Giovannetti, C.; Decarli, N.; Bartoletti, S.; Stirling-Gallacher, R.A.; Masini, B.M. Target Positioning Accuracy of V2X Sidelink Joint Communication and Sensing. IEEE Wirel. Commun. Lett. 2023, 13, 849–853.
  28. Bartoletti, S.; Decarli, N.; Masini, B.M. Sidelink 5G-V2X for integrated sensing and communication: The impact of resource allocation. In Proceedings of the 2022 IEEE International Conference on Communications Workshops (ICC Workshops), Seoul, Republic of Korea, 16–20 May 2022; pp. 79–84.
  29. European Telecommunications Standards Institute (ETSI). LTE; 5G; Overall Description of Radio Access Network (RAN) Aspects for Vehicle-to-Everything (V2X) Based on LTE and NR; ETSI: Nice, France, 2024.
  30. 3GPP. 3GPP TR 37.895 v16.0.0, Overall Description of Radio Access Network (RAN) Aspects for Vehicle-to-Everything (V2X) Based on LTE and NR. Technical Report, Release 16. 2020. Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3601 (accessed on 5 January 2025).
  31. Jeon, Y.; Kim, H. An explicit reservation-augmented resource allocation scheme for C-V2X sidelink mode 4. IEEE Access 2020, 8, 147241–147255.
  32. Yin, J.; Hwang, S.H. Reuse Distance-Aided Resource Selection Mechanisms for NR-V2X Sidelink Communication. Sensors 2023, 24, 253.
  33. Jiawei, T.; Pawase, C.J.; Chang, K. Adaptive Sidelink Open Loop Power Control Optimization Strategies for Vehicle-to-Vehicle Communications in 5G-NR-V2X. IEEE Access 2024, 12, 25079–25089.
  34. Molina-Galan, A.; Lusvarghi, L.; Coll-Perales, B.; Gozalvez, J.; Merani, M.L. On the Impact of Re-evaluation in 5G NR V2X Mode 2. IEEE Trans. Veh. Technol. 2023, 73, 2669–2683.
  35. Lu, Y.; Wang, P.; Wang, S.; Yao, W. A Q-learning based SPS resource scheduling algorithm for reliable C-V2X communication. In Proceedings of the 2021 5th International Conference on Digital Signal Processing, Chengdu, China, 26–28 February 2021; pp. 201–206.
  36. Saad, M.M.; Tariq, M.A.; Seo, J.; Ajmal, M.; Kim, D. Age-of-information aware intelligent MAC for congestion control in NR-V2X. In Proceedings of the 2023 Fourteenth International Conference on Ubiquitous and Future Networks (ICUFN), Paris, France, 4–7 July 2023; pp. 265–270.
  37. Urmonov, O.; Aliev, H.; Kim, H. Multi-agent deep reinforcement learning for enhancement of distributed resource allocation in vehicular network. IEEE Syst. J. 2022, 17, 491–502.
  38. Campolo, C.; Molinaro, A.; Berthet, A.O.; Vinel, A. Full-duplex radios for vehicular communications. IEEE Commun. Mag. 2017, 55, 182–189.
  39. Campolo, C.; Bazzi, A.; Todisco, V.; Bartoletti, S.; Decarli, N.; Molinaro, A.; Berthet, A.O.; Stirling-Gallacher, R.A. Enhancing the 5G-V2X sidelink autonomous mode through full-duplex capabilities. In Proceedings of the 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring), Helsinki, Finland, 19–22 June 2022; pp. 1–6.
  40. Campolo, C.; Molinaro, A.; Romeo, F.; Bazzi, A.; Berthet, A.O. Full duplex-aided sensing and scheduling in cellular-V2X mode 4. In Proceedings of the 1st ACM MobiHoc Workshop on Technologies, Models, and Protocols for Cooperative Connected Cars, Catania, Italy, 2 July 2019; pp. 19–24.
  41. Bazzi, A.; Masini, B.M.; Zanella, A. Performance Analysis of V2V Beaconing Using LTE in Direct Mode with Full Duplex Radios. IEEE Wirel. Commun. Lett. 2015, 4, 685–688.
  42. Barneto, C.B.; Liyanaarachchi, S.D.; Heino, M.; Riihonen, T.; Valkama, M. Full duplex radio/radar technology: The enabler for advanced joint communication and sensing. IEEE Wirel. Commun. 2021, 28, 82–88.
  43. Gaudio, L.; Kobayashi, M.; Bissinger, B.; Caire, G. Performance analysis of joint radar and communication using OFDM and OTFS. In Proceedings of the 2019 IEEE International Conference on Communications Workshops (ICC Workshops), Shanghai, China, 20–24 May 2019; pp. 1–6.
  44. Keskin, M.F.; Koivunen, V.; Wymeersch, H. Limited feedforward waveform design for OFDM dual-functional radar-communications. IEEE Trans. Signal Process. 2021, 69, 2955–2970.
  45. Lien, S.Y.; Deng, D.J.; Lin, C.C.; Tsai, H.L.; Chen, T.; Guo, C.; Cheng, S.M. 3GPP NR sidelink transmissions toward 5G V2X. IEEE Access 2020, 8, 35368–35382.
  46. Shuai, W.; Yan, L.; Jie, Z.; Ping, W. A novel collision supervision and avoidance algorithm for scalable MAC of vehicular networks. Chin. J. Electron. 2021, 30, 164–170.
  47. Wang, P.; Wang, S. A fairness-enhanced intelligent MAC scheme using Q-learning-based bidirectional backoff for distributed vehicular communication networks. Tsinghua Sci. Technol. 2022, 28, 258–268.
Figure 1. NG-RAN architecture supporting the PC5 interface.
Figure 2. Process flow of sensing-based semi-persistent scheduling (SB-SPS).
Figure 3. Markov chain for state transition of SPS.
Figure 4. Reinforcement learning framework.
Figure 5. CCM-SPS accelerate reselection.
Figure 6. Empirical CDF of the root CRLB for a range using SB-SPS, FD-enhanced and CCM-SPS.
Figure 7. Bar graph of the root CRLB for range (at CCDF = 95-percentile) using different schemes with varying vehicle densities.
Figure 8. PRR over distance using SB-SPS, FD-enhanced and CCM-SPS.
Figure 9. The maximum distance allowing PRR larger than 0.95 is evaluated using conventional SB-SPS, FD-enhanced methods and CCM-SPS.
Figure 10. Empirical CDF of root CRLB for range estimation.
Figure 11. PRR vs. distance performance of SB-SPS with different packet sizes at densities of 50, 150 and 250 veh/km.
Figure 12. Range sensing performance of CCM-SPS: empirical CDF of the root CRLB with different packet sizes at densities of 50, 150 and 250 veh/km.
Figure 13. PRR vs. distance performance of CCM-SPS with different packet sizes at densities of 50, 150 and 250 veh/km.
Figure 14. Communication performance of CCM-SPS: empirical CDF of the update delay with different packet sizes at densities of 50, 150 and 250 veh/km.
Table 1. Simulation parameters.

| Parameter | Symbol | Value |
|---|---|---|
| Scenario | | |
| Road layout | - | Highway, 3 + 3 lanes |
| Density | - | 50, 150, 250 vehicles/km |
| Average speed | - | 70 km/h |
| STD of vehicle speed | - | 7 km/h |
| Target RCS | $\sigma$ | 10 dBsm |
| Power and propagation | | |
| Channel model (interference) | - | WINNER+, B1 |
| Available channel bandwidth | $W_{ch}$ | 40 MHz |
| Transmitted power | $P_T$ | 23 dBm |
| Antenna gain | $G$ | 3 dBi |
| Noise figure | $F$ | 6 dB |
| Center frequency | $f_c$ | 5.9 GHz |
| Shadowing | - | Variance 3 dB, decorrelation distance 25 m |
| Physical layer | | |
| SCS | $\Delta f$ | 15 kHz |
| MCS | - | 5 (QPSK, $R_c = 0.3$) |
| Subchannel size | - | 10 PRBs |
| Access layer | | |
| Keep probability | $P_{keep}$ | 0.8 |
| Initial reselection counter | $RC$ | [5, 15] |
| RSRP sensing threshold | - | −126 dBm |
| Data traffic | | |
| Packet generation interval | - | 100 ms |
| Packet size | - | 350, 1000 bytes |
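
The access-layer values in Table 1 give a rough idea of how long a vehicle holds the same resource under SPS. The sketch below is an illustration only, under an assumed standard SPS behaviour: the reselection counter is drawn uniformly from the integers in [5, 15], decremented once per transmission, and when it reaches zero the resource is kept with probability $P_{keep}$ and the counter is redrawn. The function name and its arguments are hypothetical and not taken from the paper.

```python
def expected_holding(rc_min: int, rc_max: int, p_keep: float, period_ms: float):
    """Mean number of transmissions (and seconds) before a resource reselection.

    Assumes the counter is uniform on the integers rc_min..rc_max and that each
    time it expires the resource is kept with probability p_keep (assumption).
    """
    if not 0.0 <= p_keep < 1.0:
        raise ValueError("p_keep must be in [0, 1)")
    mean_rc = (rc_min + rc_max) / 2          # mean of a uniform integer counter
    mean_cycles = 1.0 / (1.0 - p_keep)       # geometric number of counter cycles
    n_tx = mean_rc * mean_cycles             # transmissions on the same resource
    return n_tx, n_tx * period_ms / 1000.0   # (transmissions, seconds)


# Table 1 values: RC in [5, 15], P_keep = 0.8, 100 ms packet interval
n_tx, seconds = expected_holding(5, 15, 0.8, 100.0)
print(f"about {n_tx:.0f} transmissions, about {seconds:.1f} s")
```

With these settings a resource is held for roughly 50 transmissions on average, which suggests why two vehicles that pick the same resource can collide many times in a row before either reselects.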
Table 2. Impact of parameters on occupied bandwidth.

| Packet [bytes] | SCS [kHz] | MCS | $N_{PRB}^{max}$ | $N_{sub}$ | $N_{PRB}$ | $W$ [MHz] |
|---|---|---|---|---|---|---|
| 350 | 15 | 5 | 216 | 4 | 40 | 7.2 |
| 1000 | 15 | 5 | 216 | 10 | 100 | 18 |
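
The occupied bandwidth values in Table 2 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each PRB spans 12 subcarriers (the usual NR definition) and using the 10-PRB subchannel size from Table 1. The function name is hypothetical.

```python
SUBCARRIERS_PER_PRB = 12  # assumption: standard NR resource block width


def occupied_bandwidth(n_sub: int, prbs_per_subchannel: int, scs_khz: float):
    """Return (N_PRB, W in MHz) for a packet occupying n_sub subchannels."""
    n_prb = n_sub * prbs_per_subchannel
    w_mhz = n_prb * SUBCARRIERS_PER_PRB * scs_khz / 1000.0
    return n_prb, w_mhz


# Rows of Table 2: 350-byte packet (4 subchannels), 1000-byte packet (10 subchannels)
for packet_bytes, n_sub in [(350, 4), (1000, 10)]:
    n_prb, w = occupied_bandwidth(n_sub, prbs_per_subchannel=10, scs_khz=15.0)
    print(f"{packet_bytes} bytes: N_PRB = {n_prb}, W = {w:.1f} MHz")
```

Both rows fit within the 216 PRBs available in the 40 MHz channel at 15 kHz SCS.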
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Li, Z.; Wang, P.; Shen, Y.; Li, S. Reinforcement Learning-Based Resource Allocation Scheme of NR-V2X Sidelink for Joint Communication and Sensing. Sensors 2025, 25, 302. https://doi.org/10.3390/s25020302