Applied Sciences
  • Article
  • Open Access

13 March 2023

A Reinforcement Learning-Based Congestion Control Approach for V2V Communication in VANET

School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Special Issue Vehicular Edge Computing and Networking

Abstract

Vehicular ad hoc networks (VANETs) are crucial components of intelligent transportation systems (ITS) aimed at enhancing road safety and providing additional services to vehicles and their users. To achieve reliable delivery of periodic status information, referred to as basic safety messages (BSMs), and of event-driven alerts, vehicles need to manage the conflicting requirements of situational awareness and congestion control in a dynamic environment. To address this challenge, this paper focuses on controlling the message transmission rate through a Markov decision process (MDP) and solves it using a novel reinforcement learning (RL) algorithm. The proposed RL approach selects the most suitable transmission rate based on the current channel conditions, resulting in a balanced performance in terms of packet delivery and channel congestion, as shown by simulation results for different traffic scenarios. Additionally, the proposed approach offers increased flexibility for adaptive congestion control through the design of an appropriate reward function.

1. Introduction

Road traffic safety is a persistent issue that has been the subject of study for over 80 years [1]. The safety of motor vehicles, as the most widely used mode of transportation, is of utmost importance. Despite reduced travel during the global pandemic, an estimated 38,680 people lost their lives in motor vehicle crashes in the United States in 2020, the highest projected number of fatalities since 2007 [2]. Beyond driver behavior and attitudes, inter-vehicular communication is a crucial avenue for improving vehicle safety. Vehicular ad hoc networks (VANETs) [3] have gained recognition from government agencies, automobile industries, and academia as key components of intelligent transportation systems (ITS) [4] to enhance safety and efficiency on the road. VANETs enable direct communication between vehicles through onboard units (OBUs) or with infrastructure nodes such as roadside units (RSUs), thereby facilitating the dissemination of safety-related information. This information, including a vehicle’s position, speed, and acceleration, is periodically broadcast as basic safety messages (BSMs) [5] (also referred to as cooperative awareness messages (CAMs) [6] in Europe) to surrounding vehicles, enabling safety-critical applications, such as collision avoidance, lane change warnings, and hard braking warnings. The timely and accurate dissemination of these messages is essential for the effective operation of various safety applications.
The 5.9 GHz band has been allocated 75 MHz of spectrum for vehicle-to-vehicle (V2V) communication in VANET through dedicated short-range communications/wireless access in vehicular environments (DSRC/WAVE) [7]. An additional 30 MHz of the spectrum is reserved for cellular vehicle-to-everything (C-V2X) communication [8]. Channel 172, with a bandwidth of 10 MHz, has been designated for vehicle safety and uses the IEEE 802.11p protocol [7], a contention-based random access MAC layer protocol. Because channel access is not coordinated, simultaneous transmissions can occur, causing packet collisions and reducing the reliability of communication [9]. As the number of vehicles increases, the broadcasting of BSMs can easily lead to congestion on this single channel, resulting in lower reception probability and decreased transmission ranges. Traditional methods attempt to address this by controlling transmission parameters, such as the transmission power and transmission rate (also known as beacon rate), but this can also lead to reduced awareness and increased inter-packet delay (IPD) [10]. Although an optimal combination of transmission power and rate could potentially provide sufficient awareness while keeping the channel load below a specified threshold, the optimization problem is non-convex, making it challenging to solve, as noted in [11].
In vehicular ad hoc networks (VANETs), it is challenging to find optimal solutions for congestion and awareness due to the highly mobile nodes and dynamic environments. As an alternative, this paper approaches the problem as a decision-making task, where each vehicle must make the appropriate choice of transmission parameters for its safety messages based on the information it gathers from its surroundings. The transmission rate selection is formulated as a Markov Decision Process (MDP) [12], where each vehicle is modeled as an independent agent that interacts with its environment and selects its transmission parameters based on the current conditions. Reinforcement learning (RL) [13] is utilized to train the vehicles to make the right decisions, and the learning is based solely on the observations of the surrounding environment. After training, the vehicles will have learned to make optimal decisions in dynamic conditions, adapting to changes in vehicle density and channel load. The key contributions of this paper are:
  • A framework to solve the MDP using RL methods, with a focus on discrete action and state spaces.
  • A Q-learning algorithm, where the training data are obtained directly from a simulated dynamic traffic environment, allowing for a more realistic representation of state transitions by observing channel busy ratio (CBR) values for different transmission rates and vehicle densities.
  • A reward function is defined, combining CBR and transmission rate, to keep the channel load under a target threshold while maximizing the transmission rate for congestion control.
  • Our simulation results demonstrate that the proposed Q-learning approach is successful in maintaining the desired channel load under various dynamic traffic scenarios and exhibits a lower Beacon Error Rate (BER) compared to existing methods.
The structure of this paper is outlined as follows:
In Section 2, we conduct a review of current approaches for congestion control, with a specific emphasis on recent machine learning (ML)-based techniques. In Section 3, we present our formulation of congestion control as an MDP and introduce our proposed Q-learning-based algorithm for congestion control. In Section 4, we demonstrate the application of our framework with a concrete example and discuss the results obtained. Finally, in Section 5, we provide a conclusion and suggest directions for future work.

3. An RL-Based Framework of Congestion and Awareness Control

In the field of VANET, congestion control is a vital challenge for ensuring the safety of communication over the limited bandwidth of the wireless channel. The objective of congestion control is to alleviate channel congestion by reducing the number of transmitted safety messages. This can be achieved by increasing the time interval between transmissions. However, reducing the transmission frequency also reduces awareness, since each vehicle becomes less visible to its neighbors; conversely, increasing the transmission frequency raises the channel load, and congestion can reappear as vehicle density grows. Vehicles must therefore balance congestion control against awareness control, which becomes particularly challenging in dynamic mobility environments. Making the correct decision in different scenarios is crucial for successful congestion control. For instance, when the vehicle density is low, it may be beneficial to increase the transmission power to achieve a larger transmission range, while still maintaining an acceptable channel load by adjusting the transmission rate. When the vehicle density is high, the transmission power and rate must be adjusted accordingly. This decision-making problem is influenced by various factors, including vehicle density, channel congestion, and packet delay, making it challenging to find an optimal combination of transmission parameters using traditional methods, especially since some constraints may conflict with each other. It is important to note that the decision-making process in VANET considers the current situation only. This corresponds to the Markov property, which states that the future is independent of the past given the present. This means that the vehicle’s decision is based on the current state of the traffic flow and channel, rather than prior conditions. The Markov property can be formalized as:
$$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t] \quad (1)$$
Equation (1) shows that the state at the next moment, t+1, depends solely on the state at the current moment, t. Hence, we can model the problem as an MDP. RL is a well-suited framework for finding solutions to an MDP [13]. The main learning principle of a typical RL cycle in V2V communication is illustrated in Figure 1.
Figure 1. Typical RL Cycle in V2V communication.
Here, a vehicle and its environment interact at each of a sequence of discrete time steps, t = 0, 1, 2, 3, …. At each time step t, the vehicle receives some representation of the environment’s state, S_t ∈ S, and based on this representation, it selects an action, A_t ∈ A(s). One time step later, the vehicle receives a numerical reward, R_{t+1} ∈ R, and moves to a new state, S_{t+1}. The vehicle’s goal is to maximize the total reward it receives; that is, not the immediate reward, but the cumulative reward in the long run. If the sequence of rewards received after time step t is denoted R_{t+1}, R_{t+2}, R_{t+3}, …, the quantity to be maximized, called the return G_t, could be the sum of all the rewards received up to the final state. However, BSM transmission is a continuing task with no time limit. To prevent the return from becoming infinite, a discount rate γ is used to determine the present value of future rewards: a reward received k time steps in the future is worth only γ^{k−1} times what it would be worth if received immediately. Then, when a vehicle selects an action A_t to send a BSM, the expected return can be expressed as the following formula:
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \quad (2)$$
In Equation (2), G_t is the total return and R_{t+i} is the reward at each time step, where i ∈ ℕ, i ≥ 1, and 0 ≤ γ ≤ 1. When γ is close to 1, future rewards are taken into account more strongly. With RL, vehicles learn by estimating how good it is to be in a given state, or how good it is to perform a given action in a given state, in terms of return. The higher the return, the better the state or the action just taken. The mapping from each state to the probabilities of selecting each possible action is called a policy, denoted π(a|s), which gives the probability that A_t = a if S_t = s. Solving an RL task means, roughly, finding a policy that achieves a high reward over the long run [13]. In each state, a vehicle can have many actions to choose from, which means it can follow different policies when choosing an action. The value of taking action a in state s under a policy π, denoted q_π(s, a), is the expected return starting from s, taking action a, and thereafter following policy π; q_π(s, a) is called the state–action value function. The learning target is to find the optimal state–action value function, which tells the vehicle the maximum return it can obtain if it is in state s and takes action a; it is defined as follows:
$$q_*(s, a) = \max_{\pi} q_{\pi}(s, a) \quad (3)$$
After finding q_*(s, a), the vehicle can then pick the action that achieves this optimal state–action value, as follows:
$$\pi_*(a \mid s) = \begin{cases} 1 & \text{if } a = \arg\max_{a \in A} q_*(s, a) \\ 0 & \text{otherwise} \end{cases} \quad (4)$$
In Equation (4), when the agent is in state s, it simply selects the action a that maximizes q_*(s, a) and ignores the other available actions.
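To make Equations (2) and (4) concrete, the short Python sketch below computes the discounted return of a finite reward sequence and extracts the greedy action from one row of a Q-table. It is an illustrative aid only; the reward values and the Q-row are arbitrary examples, not data from the paper.

import numpy as np

def discounted_return(rewards, gamma=0.9):
    # G_t of Equation (2) for a finite sequence R_{t+1}, R_{t+2}, ...
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def greedy_action(q_row):
    # Equation (4): select the action that maximizes q*(s, a) in state s
    return int(np.argmax(q_row))

print(discounted_return([1.0, 0.5, 0.2]))                  # 1.0 + 0.9*0.5 + 0.81*0.2 = 1.612
print(greedy_action(np.array([0.1, 0.7, 0.3, 0.2, 0.0])))  # action index 1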

3.1. Elements of the RL Framework for Congestion and Awareness Control

The primary objective of this paper is to demonstrate the application of RL in selecting appropriate transmission parameters for V2V congestion control. The selection process is modeled as an MDP. This paper covers the design of the elements of the MDP, considering the following factors:
  • Finite state and action space: In the VANET application layer, it is assumed that each vehicle can only choose from a finite set of actions at each state. As the state space is finite, Q-learning can be used to solve the problem.
  • Determination of neighboring vehicles: The number of neighboring vehicles is determined based on the interactions with the environment, as indicated by the BSMs received from these vehicles.
  • Experimental observations: All observations are calculated through experimental actions taken by the vehicle.
  • Independent decision-making: Each vehicle independently selects its actions based on its own observations, without exchanging any information with other vehicles except for BSMs.
The decision-making problem is formalized using the framework of MDP. The following are the key elements of the RL framework used to solve the MDP for V2V congestion control:
  • The agent: A learning agent must have the capability to perceive the state of its environment and take actions that can alter it. In the context of this problem, the vehicle acts as the agent that makes the decision of which action to take [13].
  • The goal: The objective is to select the optimal action for each state, with the aim of maximizing the reward. A well-defined goal is crucial for the agent (vehicle), such as reducing congestion or enhancing awareness. In this paper, the goal is to maximize the reward of actions that maintain the CBR below 0.6.
  • The environment: The environment represents the uncertain world in which the agent operates and interacts. The agent can interact with the environment and modify it through its actions, but it cannot change the rules or dynamics of the environment. In the context of VANETs, the environment encompasses the wireless channel and other vehicles. The uncertainty arises from the dynamic traffic flow, such as changing vehicle velocities and densities. The actions of the vehicles can impact the state of the environment (wireless channel status), but will not affect the density of vehicles on the road.
  • The action: Actions are the means by which the agent interacts with and influences its environment. In the VANET application layer, the most common actions include setting the transmission power, message transmission rate, or data rate of the messages to be transmitted. In this paper, for simplicity, we only consider the transmission rate. The maximum transmission rate in DSRC is 10 Hz. The action space is defined as 10 discrete transmission rates, ranging from 1 BSM per second to 10 BSMs per second, i.e., a ∈ ℕ with 1 ≤ a ≤ 10 for each action a.
  • The state (observation): The state of the environment in the V2V communication problem is a collection of information that identifies the current situation. This information includes the wireless channel status, such as the CBR, BER, IPD, etc. Moreover, the vehicle density, i.e., the number of neighbors of a given vehicle, which represents the dynamics of the environment, should be considered. Even with the same action, the state could be different when the vehicle density is different. In our case, we define the state as a 2-tuple consisting of the CBR and the vehicle density, denoted s = (CBR, VD), where CBR ∈ ℝ⁺, 0 ≤ CBR ≤ 1, and VD ∈ ℕ, 1 ≤ VD ≤ maxVD. The CBR is a real number between 0 and 1 that represents the channel busy ratio, the vehicle density VD is the number of vehicles within a 100 m radius, and maxVD is the maximum vehicle density in this range. In this paper, we set maxVD to 50. For each vehicle density, there are 10 CBR values corresponding to the 10 transmission rates. Note that the vehicle density cannot be changed by the action but is only calculated from the BSMs received from the neighbors, so the whole state space consists of 500 individual states. In each state, the vehicle selects a new transmission rate from the 10 possible rates and updates its state accordingly, based on the information received from its neighbors in the form of BSMs.
  • The reward: A reward is a scalar value that measures the quality of an action taken by an agent. The agent uses the rewards provided by the environment after each action to learn and improve its behavior over time [13]. In the context of V2V communication, the reward is calculated based on observations from the environment and the goals of the vehicle. The reward calculation is performed using a reward function, which should be designed to meet the desired learning objectives. The goal of our proposed approach is to maintain the CBR below a predefined threshold η while simultaneously maximizing the number of BSMs transmitted. To accomplish this, we have defined the reward function as follows:
    $$r(CBR, BR) = BR \cdot CBR \cdot \operatorname{sign}(\eta - CBR) \quad (5)$$
    where sign is the signum function shifted by the target value η . Any action that causes the CBR to exceed η will have a negative reward, which can speed up the learning process [13]. A very low transmission rate is not encouraged because the resulting reward will be lower. In this paper, we have used η = 0.6 as the target channel load. For different learning objectives, this value can be modified as needed or a different reward function can be implemented.
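The reward of Equation (5) can be expressed in a few lines of code. The following Python sketch is a minimal illustration under the paper’s setting of η = 0.6; treating sign(0) as 0 when the CBR exactly equals η is our own assumption, since the equation does not specify that case.

import math

ETA = 0.6  # target channel load threshold used in this paper

def reward(cbr: float, br: int, eta: float = ETA) -> float:
    # Equation (5): r(CBR, BR) = BR * CBR * sign(eta - CBR)
    sign = 0.0 if cbr == eta else math.copysign(1.0, eta - cbr)
    return br * cbr * sign

print(reward(0.5, 8))  # rate 8 BSM/s with CBR below the threshold ->  4.0 (encouraged)
print(reward(0.7, 8))  # same rate once the channel is overloaded   -> -5.6 (penalized)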

3.2. A Q-Learning Approach

Traditional MDP models require knowledge of state-transition probabilities, which can be challenging to obtain in the context of V2V communication. As an alternative, Q-learning is a model-free algorithm in RL that allows an agent to select and perform actions without relying on a prior understanding of the state-transition probabilities. Instead, the agent learns an optimal policy by directly interacting with the environment. The proposed Q-learning-based congestion control algorithm is implemented in two stages as follows:
  • Stage 1: The Q-learning algorithm is implemented using observation data obtained from a simulation. The algorithm generates a Q-table that represents the optimal policies at each state, as demonstrated in Algorithm 1.
  • Stage 2: The Q-table generated in the first stage is utilized by the vehicle to determine the interval before the next BSM transmission takes place. This stage is explained in detail in Algorithm 2.
Algorithm 1 provides an overview of our proposed Q-learning-based adaptive congestion control (QBACC) approach. The algorithm starts by initializing all values in the Q-table, which contains all the state–action pairs, to 0. Then, at each time step t, the vehicle selects an action a_t, observes the environment and receives a reward r_t, transitions to a new state s_{t+1}, and updates the value of Q(s, a) using Equation (6):
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \quad (6)$$
In Equation (6), α is the learning rate, where 0 < α ≤ 1, and γ is the discount rate, where 0 < γ ≤ 1. Q(s_t, a_t) is the current value of the state–action pair, max_a Q(s_{t+1}, a) is the estimate of the optimal future value, and r(s_t, a_t) is the reward when the agent takes action a_t in state s_t at time step t.
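For reference, a single application of the update in Equation (6) can be sketched in Python as follows, assuming the Q-table is stored as a NumPy array indexed by state and action (an implementation detail not prescribed by the paper):

import numpy as np

ALPHA, GAMMA = 0.01, 0.9  # learning rate and discount rate, as set later in this section

def q_update(Q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    # Equation (6): temporal-difference update of the entry Q[s, a]
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q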
Figure 2 shows the flowchart of Algorithm 1.
Algorithm 1 Q-Learning-based Adaptive Congestion Control (QBACC)
Parameters: step size α ∈ (0, 1], small ϵ > 0, number of episodes
Result: Q-table with the value of each state–action pair
  1: Initialize S, the set of states (which contains one state for each beacon rate)
  2: Set A(s) to be the set of actions that can be taken in state s, which consists of the ten possible beacon rates regardless of the value of s
  3: Initialize the Q-table Q, where Q(s, a) = 0 for all s ∈ S, a ∈ A(s)
  4: for each episode do
  5:     Set s to be a random state in S
  6:     for i = 1 to 10 do
  7:         Choose action a from A(s) using the ϵ-greedy strategy (with probability ϵ choose a random action; otherwise choose the action with the highest value in the Q-table so far)
  8:         Compute the reward using Equation (5), where the CBR is estimated using Equation (7) with the vehicle density as input
  9:         Update Q(s, a) using Equation (6) and the obtained reward
 10:         Take action a and move to the corresponding state
 11:         Set s to be the new state
 12:     end for
 13: end for
Figure 2. Flow Chart of QBACC.
To generate the Q-table in Algorithm 1, we created 12 different traffic models with different vehicle densities, varying from 0 to maxVD vehicles within a 100 m radius, where we set maxVD = 50. Simulations were run for each traffic model with our designed action space, i.e., using transmission rates ranging from 1 to 10 BSMs per second, to obtain the observation data (CBR) for each action in each state. These data were then used to fit curves that capture the correlation between the average beacon rate used by the vehicles and the average CBR experienced by the network. The fitted functions were combined into Equation (7), which estimates the CBR for each combination of transmission rate and density based on the observed trends. We capped the return value at 0.92 to prevent densities beyond the tested range from yielding values greater than 1, since the change in CBR was found to be negligible at high densities. The Q-learning algorithm was then run with the observation data and Equation (7) below to generate the Q-table entries for every combination of vehicle density and transmission rate in the state space. In the following equation, VD is the current vehicle density, BR is the estimated average transmission rate used by the surrounding vehicles, and estCBR is the estimated CBR when there are VD neighboring vehicles using an average transmission rate of BR BSMs per second.
$$\mathrm{estCBR}(VD, BR) = \begin{cases}
0.0101\,VD + 0.0301 & \text{if } BR = 1 \\
0.0189\,VD + 0.0703 & \text{if } BR = 2,\ VD \le 33 \\
0.2730\,\ln(VD) - 0.2526 & \text{if } BR = 2,\ VD > 33 \\
0.0249\,VD + 0.1194 & \text{if } BR = 3,\ VD \le 27 \\
0.1663\,\ln(VD) + 0.2487 & \text{if } BR = 3,\ VD > 27 \\
0.0314\,VD + 0.1500 & \text{if } BR = 4,\ VD \le 21 \\
0.0988\,\ln(VD) + 0.5318 & \text{if } BR = 4,\ VD > 21 \\
0.0379\,VD + 0.1818 & \text{if } BR = 5,\ VD \le 17 \\
0.0884\,\ln(VD) + 0.5843 & \text{if } BR = 5,\ VD > 17 \\
0.0686\,VD + 0.1425 & \text{if } BR = 6,\ VD \le 10 \\
0.0819\,\ln(VD) + 0.6817 & \text{if } BR = 6,\ VD > 10 \\
0.0772\,VD + 0.1688 & \text{if } BR = 7,\ VD \le 8 \\
0.0659\,\ln(VD) + 0.6940 & \text{if } BR = 7,\ VD > 8 \\
0.0843\,VD + 0.1972 & \text{if } BR = 8,\ VD \le 7 \\
0.0442\,\ln(VD) + 0.7760 & \text{if } BR = 8,\ VD > 7 \\
0.0891\,VD + 0.2289 & \text{if } BR = 9,\ VD \le 7 \\
0.0304\,\ln(VD) + 0.8246 & \text{if } BR = 9,\ VD > 7 \\
0.0930\,VD + 0.2602 & \text{if } BR = 10,\ VD \le 6 \\
0.0151\,\ln(VD) + 0.8736 & \text{if } BR = 10,\ VD > 6
\end{cases} \quad (7)$$
By using this equation to predict the CBR of every combination of vehicle density and transmission rate, the Q-learning algorithm is able to find which combinations yield the best results and can generate the corresponding actions as a result. Each row in the Q-table corresponds to a combination of a vehicle density (ranging from 0 to 50) and an estimated average transmission rate used by the neighboring vehicles (an integer from 1 to 10), while each column represents a transmission rate that can be chosen by the current vehicle.
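Equation (7) translates directly into a lookup of the fitted coefficients. The Python sketch below is our reading of the piecewise definition (linear branch up to the switch-over density, logarithmic branch beyond it) together with the 0.92 cap mentioned above; it is an illustration, not the authors’ code.

import math

# (slope, intercept) of the linear branch, switch-over density, and
# (coefficient, offset) of the logarithmic branch, indexed by beacon rate BR
_FIT = {
    1:  (0.0101, 0.0301, None, None, None),
    2:  (0.0189, 0.0703, 33, 0.2730, -0.2526),
    3:  (0.0249, 0.1194, 27, 0.1663,  0.2487),
    4:  (0.0314, 0.1500, 21, 0.0988,  0.5318),
    5:  (0.0379, 0.1818, 17, 0.0884,  0.5843),
    6:  (0.0686, 0.1425, 10, 0.0819,  0.6817),
    7:  (0.0772, 0.1688,  8, 0.0659,  0.6940),
    8:  (0.0843, 0.1972,  7, 0.0442,  0.7760),
    9:  (0.0891, 0.2289,  7, 0.0304,  0.8246),
    10: (0.0930, 0.2602,  6, 0.0151,  0.8736),
}

def est_cbr(vd: int, br: int, cap: float = 0.92) -> float:
    # Equation (7): estimated CBR for vehicle density vd and average beacon rate br
    slope, intercept, switch, log_coef, log_off = _FIT[br]
    if switch is None or vd <= switch:
        cbr = slope * vd + intercept
    else:
        cbr = log_coef * math.log(vd) + log_off
    return min(cbr, cap)  # cap keeps estimates below 1 at densities beyond the tested range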
In Algorithm 1, the first two steps define the state and action spaces. Step 3 initializes the Q-table and sets each of its values to 0. Since there is no “terminating state”, we set the number of episodes to 80,000 in this paper. During each episode, the vehicle executes steps 4 to 13 to update the Q-table. The algorithm follows the ϵ-greedy exploration strategy of RL [13]: with a probability of 0.1 it takes a random action to search for better policies, and otherwise it follows the current best policy. The discount factor γ and the learning rate α are set to 0.9 and 0.01, respectively. After 80,000 episodes, the difference between the new and old Q-tables is negligible, indicating that the algorithm has converged. The final Q-table is saved to a file for use in Stage 2. As noted in [37], the complexity of Algorithm 1 is O(n) in the general case with duplicate actions. A compact sketch of this training stage is given below.
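Continuing the sketches above, the training stage of Algorithm 1 can be outlined in Python as follows. The state indexing by (vehicle density, estimated average neighbor rate) follows the Q-table layout described after Equation (7); treating the chosen rate as the next state’s average neighbor rate and keeping the density fixed within an episode are simplifying assumptions of this sketch, not statements about the authors’ implementation. It reuses est_cbr and reward from the earlier sketches.

import random
import numpy as np

MAX_VD, N_RATES = 50, 10
EPISODES, EPSILON, GAMMA, ALPHA = 80_000, 0.1, 0.9, 0.01

# one slice per vehicle density, one row per estimated neighbor rate, one column per own rate
Q = np.zeros((MAX_VD + 1, N_RATES, N_RATES))

def train(Q):
    for _ in range(EPISODES):
        vd = random.randint(1, MAX_VD)            # random starting state (step 5)
        avg_br = random.randint(1, N_RATES)
        for _ in range(10):                       # inner loop (step 6)
            if random.random() < EPSILON:         # epsilon-greedy selection (step 7)
                a = random.randint(0, N_RATES - 1)
            else:
                a = int(np.argmax(Q[vd, avg_br - 1]))
            br = a + 1
            r = reward(est_cbr(vd, br), br)       # Equations (7) and (5) (step 8)
            next_avg_br = br                      # state implied by the chosen rate (step 10)
            best_next = np.max(Q[vd, next_avg_br - 1])
            Q[vd, avg_br - 1, a] += ALPHA * (r + GAMMA * best_next - Q[vd, avg_br - 1, a])  # Eq. (6), step 9
            avg_br = next_avg_br                  # step 11
    return Q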
Algorithm 2 Policy Application of QBACC in OMNeT++
  1: curCBR = Obtain current CBR
  2: curVD = Obtain current vehicle density
  3: if curVD > maxVD then
  4:     curVD = maxVD
  5: end if
  6: maxVal = −999
  7: index = 9 (default value to ensure there is always a valid output)
  8: for i = 0 to 9 do
  9:     if estCBR(curVD, i + 1) computed using Equation (7) ≥ curCBR then
 10:         index = i
 11:         break
 12:     end if
 13: end for
 14: for i = 0 to 9 do
 15:     qVal = Obtain the entry at index i of the row in the Q-table corresponding to curVD and index
 16:     if qVal > maxVal then
 17:         maxVal = qVal
 18:         bestBeaconRate = i + 1
 19:     end if
 20: end for
 21: bestBeaconInterval = 1 / bestBeaconRate
 22: Send beacon using bestBeaconInterval
Once an optimal policy has been determined in Stage 1, a vehicle can apply this policy to select the BSM transmission rate in Stage 2 (Algorithm 2). In steps 1 and 2, a vehicle senses its environment in terms of CBR and vehicle density. If the measured vehicle density is higher than maxVD, it is set to maxVD (steps 3 and 4). In steps 6 and 7, the values of maxVal and index are initialized. maxVal is set to a very small number to ensure that the algorithm will encounter a higher value in the Q-table. From steps 8 to 13, the vehicle considers each of the possible average transmission rates of the surrounding vehicles and uses these values, together with the vehicle density, as inputs to Equation (7) to determine which inputs give the smallest estimated CBR value that is greater than or equal to the current CBR. From steps 14 to 20, the vehicle selects the best beacon transmission rate based on the Q-table given as input. Each row in the Q-table represents a state of the environment, i.e., the vehicle density, the estimated average transmission rate of the surrounding vehicles, and the corresponding estimated CBR, and contains the Q-table values for each possible transmission rate that the vehicle can use. The transmission rate with the maximum value in a row is the best action, i.e., the optimal policy for the corresponding state. An example with selected rows of the Q-table (rather than the entire table) is shown in Table 2; the complete table could not be included due to space limitations.
Table 2. Example Items of a Q-Table.
In steps 14–20, the vehicle checks each possible transmission rate for its current state and selects the one with the highest corresponding value in the Q-table. In step 15, it accesses the row in the Q-table corresponding to the state, and in steps 16–19, it compares each entry with the maximum Q-table value encountered so far, updating the maximum if the new value is higher. After checking every value in the row, it selects the beacon rate with the highest value. Once this is selected, the beacon interval, which determines the time before the next BSM is sent, is calculated as the reciprocal of the beacon rate. For example, if the current vehicle density is 15, with vehicles sending 1 BSM/s, and the estimated CBR is 0.1816, then using Table 2, the best transmission rate is 3 BSMs/s, with qVal = 86, and the corresponding beacon interval is 0.333 s. Since the number of iterations in Algorithm 2 is fixed, its time complexity is O(1), so the vehicle can select the best action very quickly. Figure 3 shows the flowchart of Algorithm 2, and a code sketch of the policy application follows the figure caption.
Figure 3. Flow Chart of Policy Application of QBACC.
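The sketch below mirrors Algorithm 2 in Python, using the same three-dimensional Q-table layout as the training sketch above; the indexing convention is again our assumption rather than the authors’ implementation. Given a Q-table with the values shown in Table 2, calling it with cur_vd = 15 and cur_cbr = 0.1816 would reproduce the worked example in the text (best rate 3 BSM/s, interval 0.333 s).

def select_beacon_interval(Q, cur_cbr: float, cur_vd: int, max_vd: int = 50) -> float:
    # Algorithm 2: map the observed (CBR, vehicle density) to a beacon interval in seconds
    cur_vd = min(cur_vd, max_vd)                  # steps 3-5
    index = 9                                     # default row so there is always a valid output (step 7)
    for i in range(10):                           # steps 8-13
        if est_cbr(cur_vd, i + 1) >= cur_cbr:     # smallest estimated CBR >= observed CBR
            index = i
            break
    max_val, best_rate = float("-inf"), 1         # steps 6 and 14-20
    for i in range(10):
        q_val = Q[cur_vd, index, i]
        if q_val > max_val:
            max_val, best_rate = q_val, i + 1
    return 1.0 / best_rate                        # steps 21-22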

4. Evaluation

In this section, we evaluate the performance of the QBACC approach using the Vehicles in Network Simulation (Veins) framework [38], which contains a basic implementation of the IEEE 802.11p and IEEE 1609 protocols to facilitate the testing of V2V networks. Veins couples a widely used network simulator, the Objective Modular Network Testbed in C++ (OMNeT++) [39], with the road traffic mobility simulator Simulation of Urban MObility (SUMO) [40]. To evaluate the Q-learning performance, we simulated a 20 km highway with four lanes (two in each direction). We only considered data from a 4 km stretch in the middle of the highway to eliminate inaccuracies caused by vehicles entering or exiting the simulation. The simulation involved either 300 or 500 vehicles, with random velocities ranging from 80 to 130 km/h, to produce a dynamic traffic flow with varying vehicle densities. The two vehicle numbers represent low- and high-density traffic scenarios. The simulation time was 1000 s, but we only analyzed data from 350 to 750 s, when vehicles were present within the 4 km stretch.
The configuration parameters for the evaluation are summarized in Table 3.
Table 3. Configuration Parameters.
The performance of our proposed QBACC congestion control approach is evaluated by comparing it with other techniques using the dynamic traffic model described above. For all approaches, a fixed transmission power of 20 mW is utilized.
  • 10 Hz: BSMs are transmitted at a constant rate of 10 BSM/s.
  • 5 Hz: BSMs are transmitted at a constant rate of 5 BSM/s.
  • MDPRP: An RL-based congestion control algorithm proposed in [28].

4.1. Comparison of CBR

The CBR is defined as the ratio between the time the channel is detected as busy and the total observation time. CBR is a useful indicator of the channel load, with higher values indicating higher channel load and vice versa. To monitor the change in CBR throughout the simulation process, each vehicle calculates the current CBR based on the status of the channel before sending each BSM. Figure 4 illustrates the individual CBR values observed by each vehicle over the simulation interval for the 10 Hz 20 mW scenario. This is referred to as “Real Time CBR”, with the X-axis in seconds. It can be difficult to determine overall variations in the channel CBR from this plot. Hence, we introduce a new metric, “average CBR”, which calculates the average CBR from all vehicles in 5-s intervals until the end of the simulation time.
Figure 4. Real Time CBR.
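As a small aid for reproducing this metric, the following Python sketch (our own, with hypothetical input arrays) averages the per-vehicle CBR samples into 5-second bins:

import numpy as np

def average_cbr(timestamps, cbr_values, sim_end=1000.0, bin_s=5.0):
    # mean CBR over all vehicles within each 5-second interval of the simulation
    t = np.asarray(timestamps, dtype=float)
    c = np.asarray(cbr_values, dtype=float)
    edges = np.arange(0.0, sim_end + bin_s, bin_s)
    bins = np.digitize(t, edges)
    return {float(edges[b - 1]): float(c[bins == b].mean()) for b in np.unique(bins)}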
Figure 5 and Figure 6 compare the average CBR of the four approaches with 300 and 500 vehicles, respectively. In both cases, the 10 Hz transmission rate consistently results in the highest average CBR because all vehicles use it to send BSMs. The average CBR of 5 Hz is lower than 10 Hz. In both traffic models, QBACC consistently demonstrates stability with all average CBR values remaining below 0.6, which was defined in the reward function. In contrast, MDPRP has a higher average CBR in both traffic models.
Figure 5. Average CBR with 300 Vehicles.
Figure 6. Average CBR with 500 vehicles.

4.2. Comparison of PDR

In addition to average CBR, we also evaluated other metrics, such as the total number of packets sent, received, and lost during the simulation time. Figure 7 and Figure 8 compare the total number of packets sent by each approach for the low and high vehicle density scenarios, respectively. The PDR represents the percentage of sent packets that were successfully received by a vehicle and is shown in Figure 9 and Figure 10. The proposed QBACC approach has the highest PDR value, at 98.7%, for high vehicle densities, and achieves the second-best results for low vehicle densities, with a PDR of 98.2%, just slightly lower than the PDR of 98.8% obtained using 5 Hz.
Figure 7. Total sent packets with 300 vehicles.
Figure 8. Total sent packets with 500 vehicles.
Figure 9. Packet delivery rate with 300 vehicles.
Figure 10. Packet delivery rate with 500 vehicles.
The number of lost packets, representing the packets that were not received by any vehicle during the simulation, is shown in Figure 11 and Figure 12. The BER, which is the percentage of sent packets that were lost, is shown in Figure 13 and Figure 14. QBACC consistently had low BER values, with the lowest value for 500 vehicles and the second-lowest value for 300 vehicles. The MDPRP BER values were slightly higher than those of QBACC in both cases, while 5 Hz had a slightly lower BER for 300 vehicles but a significantly higher BER for 500 vehicles. A comparison of the total sent packets, total lost packets, BER, and PDR is listed in Table 4.
Figure 11. Total lost packets with 300 vehicles.
Figure 12. Total lost packets with 500 vehicles.
Figure 13. Beacon error rate with 300 vehicles.
Figure 14. Beacon error rate with 500 vehicles.
Table 4. Metrics Comparison.
In summary, QBACC has been shown to effectively maintain the channel load below the threshold of 0.6 defined in the reward function. This is critical for ensuring the efficient flow of data in high-density vehicular environments. By keeping the channel load below the threshold, QBACC reduces the likelihood of packet losses, resulting in improved network performance, as reflected in its BER and PDR results.

5. Conclusions

The reliable delivery of safety messages in V2V communication requires keeping channel congestion below a critical level. At the same time, an increased transmission rate is necessary for higher awareness. To address this challenge, we present an RL framework that employs Q-learning to determine the most suitable transmission rate policy for BSMs. The aim of our proposed QBACC approach is to strike a balance between maintaining the channel busy ratio (CBR) below a specified threshold and maximizing awareness under different traffic conditions. To evaluate QBACC, we used two dynamic traffic models and compared it with existing approaches that employ constant transmission rates, as well as with another RL-based approach. The results showed that QBACC outperformed the other approaches by consistently maintaining the channel load at or near the specified level, without exceeding it, for both low and high traffic densities. It also achieved the best performance in terms of packet delivery and beacon error rates for high vehicle densities. For low vehicle density, the constant 5 Hz rate performed the best, but QBACC was a close second, with less than a 1% difference.
We are currently working on enhancing QBACC by designing a comprehensive reward function that takes into account additional metrics, such as inter-packet delay and packet loss. We are also exploring the possibility of adjusting multiple parameters, such as transmission power and data rate, in addition to the transmission rate, to create a more robust Q-table.

Author Contributions

Paper design, X.L. and A.J.; methodology, X.L. and B.S.A.; writing—original draft editing, X.L.; writing—review and editing, A.J., B.S.A. and X.L.; simulation design, X.L.; simulation implementation, B.S.A.; data collection and visualization, X.L. and B.S.A.; data analysis, X.L., B.S.A. and A.J.; supervision, A.J.; funding acquisition, A.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSERC DG, grant no. RGPIN-2015-05641.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The work of A.J. has been supported by a research grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Evans, L. Traffic fatality reductions: United States compared with 25 other countries. Am. J. Public Health 2014, 104, 1501–1507. [Google Scholar] [CrossRef]
  2. National Highway Traffic Safety Administration. Early Estimate of Motor Vehicle Traffic Fatalities in 2020; United States Department of Transportation: Washington, DC, USA, 2020.
  3. Toh, C.K. Ad Hoc Mobile Wireless Networks: Protocols and Systems; Pearson Education: London, UK, 2001. [Google Scholar]
  4. Dimitrakopoulos, G.; Demestichas, P. Intelligent transportation systems. IEEE Veh. Technol. Mag. 2010, 5, 77–84. [Google Scholar] [CrossRef]
  5. Society of Automotive Engineers. SAE J2735: Dedicated Short Range Communications (DSRC) Message Set Dictionary; Technical Report; Society of Automotive Engineers: Warrendale, PA, USA, 2009. [Google Scholar]
  6. ETSI (2013) ETSI EN 302 637-2 (V1.3.0)—Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 2: Specification of Cooperative Awareness Basic Service. Technical Report. 2013. Available online: https://www.etsi.org/deliver/etsi_en/302600_302699/30263702/01.03.00_20/en_30263702v010300a.pdf (accessed on 5 January 2023).
  7. Kenney, J.B. Dedicated short-range communications (DSRC) standards in the United States. Proc. IEEE 2011, 99, 1162–1182. [Google Scholar] [CrossRef]
  8. Eggerton, J. FCC to Split Up 5.9 GHZ. 2019. Available online: https://www.nexttv.com/news/fcc-to-split-up-5-9-ghz (accessed on 3 February 2022).
  9. Bilstrup, K.S.; Uhlemann, E.; Strom, E.G. Scalability issues of the MAC methods STDMA and CSMA of IEEE 802.11p when used in VANETs. In Proceedings of the 2010 IEEE International Conference on Communications Workshops, Cape Town, South Africa, 23–27 May 2010; pp. 1–5. [Google Scholar]
  10. Liu, X.; St Amour, B.; Jaekel, A. Balancing Awareness and Congestion in Vehicular Networks Using Variable Transmission Power. Electronics 2021, 10, 1902. [Google Scholar] [CrossRef]
  11. Egea-Lopez, E.; Pavon-Mariño, P. Fair congestion control in vehicular networks with beaconing rate adaptation at multiple transmit powers. IEEE Trans. Veh. Technol. 2016, 65, 3888–3903. [Google Scholar] [CrossRef]
  12. Garcia, F.; Rachelson, E. Markov decision processes. Markov Decis. Process. Artif. Intell. 2013, 2, 1–38. [Google Scholar]
  13. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  14. Goyal, A.K.; Agarwal, G.; Tripathi, A.K.; Sharma, G. Systematic Study of VANET: Applications, Challenges, Threats, Attacks, Schemes and Issues in Research. Green Comput. Netw. Secur. 2022, 33–52. [Google Scholar]
  15. Bansal, G.; Kenney, J.B.; Rohrs, C.E. LIMERIC: A linear adaptive message rate algorithm for DSRC congestion control. IEEE Trans. Veh. Technol. 2013, 62, 4182–4197. [Google Scholar] [CrossRef]
  16. Bansal, G.; Lu, H.; Kenney, J.B.; Poellabauer, C. EMBARC: Error model based adaptive rate control for vehicle-to-vehicle communications. In Proceedings of the Tenth ACM International Workshop on Vehicular Inter-Networking, Systems, and Applications, 2013, Taipei, Taiwan, 25 June 2013; pp. 41–50. [Google Scholar]
  17. Ogura, K.; Katto, J.; Takai, M. BRAEVE: Stable and adaptive BSM rate control over IEEE 802.11p vehicular networks. In Proceedings of the 2013 IEEE 10th Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 11–14 January 2013; pp. 745–748. [Google Scholar]
  18. Subramaniam, M.; Rambabu, C.; Chandrasekaran, G.; Kumar, N.S. A Traffic Density-Based Congestion Control Method for VANETs. Wirel. Commun. Mob. Comput. 2022, 2022, 7551535. [Google Scholar] [CrossRef]
  19. Sharma, S.; Panjeta, M. Optimization transmit rate-based decentralized congestion control scheme in vehicular ad hoc networks. AIP Conf. Proc. 2022, 2555, 030006. [Google Scholar]
  20. Aznar-Poveda, J.; García-Sánchez, A.J.; Egea-López, E.; García-Haro, J. Approximate reinforcement learning to control beaconing congestion in distributed networks. Sci. Rep. 2022, 12, 1–11. [Google Scholar]
  21. Torrent-Moreno, M.; Santi, P.; Hartenstein, H. Distributed fair transmit power adjustment for vehicular ad hoc networks. In Proceedings of the 2006 3rd Annual IEEE Communications Society on Sensor and Ad Hoc Communications and Networks, Reston, VA, USA, 28 September 2006; Volume 2, pp. 479–488. [Google Scholar]
  22. Torrent-Moreno, M.; Santi, P.; Hartenstein, H. Fair sharing of bandwidth in VANETs. In Proceedings of the 2nd ACM International Workshop on Vehicular Ad Hoc Networks, Cologne, Germany, 2 September 2005; pp. 49–58. [Google Scholar]
  23. Wang, M.; Chen, T.; Du, F.; Wang, J.; Yin, G.; Zhang, Y. Research on adaptive beacon message transmission power in VANETs. J. Ambient. Intell. Humaniz. Comput. 2020, 13, 1307–1319. [Google Scholar] [CrossRef]
  24. Jiang, D.; Chen, Q.; Delgrossi, L. Optimal data rate selection for vehicle safety communications. In Proceedings of the fifth ACM international workshop on VehiculAr Inter-NETworking, 2008, San Francisco, CA, USA, 15 September 2008; pp. 30–38. [Google Scholar]
  25. Yang, S.; Kim, H.; Kuk, S. Less is more: Need to simplify ETSI distributed congestion control algorithm. Electron. Lett. 2014, 50, 279–281. [Google Scholar] [CrossRef]
  26. Jayachandran, S.; Jaekel, A. Adaptive Data Rate Based Congestion Control in Vehicular Ad Hoc Networks (VANET). In Proceedings of the Ad Hoc Networks and Tools for IT: 13th EAI International Conference, ADHOCNETS 2021, Virtual Event, 6–7 December 2021; pp. 144–157. [Google Scholar]
  27. Sepulcre, M.; Gozalvez, J.; Härri, J.; Hartenstein, H. Contextual Communications Congestion Control for Cooperative Vehicular Networks. IEEE Trans. Wirel. Commun. 2011, 10, 385–389. [Google Scholar] [CrossRef]
  28. Aznar-Poveda, J.; Garcia-Sanchez, A.J.; Egea-Lopez, E.; Garcia-Haro, J. Mdprp: A q-learning approach for the joint control of beaconing rate and transmission power in vanets. IEEE Access 2021, 9, 10166–10178. [Google Scholar] [CrossRef]
  29. Deeksha, M.; Patil, A.; Kulkarni, M.; Shet, N.S.V.; Muthuchidambaranathan, P. Multistate active combined power and message/data rate adaptive decentralized congestion control mechanisms for vehicular ad hoc networks. J. Phys. Conf. Ser. 2022, 2161, 012018. [Google Scholar] [CrossRef]
  30. Cho, B.M.; Jang, M.S.; Park, K.J. Channel-aware congestion control in vehicular cyber-physical systems. IEEE Access 2020, 8, 73193–73203. [Google Scholar] [CrossRef]
  31. Mittag, J.; Schmidt-Eisenlohr, F.; Killat, M.; Härri, J.; Hartenstein, H. Analysis and design of effective and low-overhead transmission power control for VANETs. In Proceedings of the fifth ACM international workshop on VehiculAr Inter-NETworking, San Francisco, CA, USA, 15 September 2008; pp. 39–48. [Google Scholar]
  32. Joseph, M.; Liu, X.; Jaekel, A. An adaptive power level control algorithm for DSRC congestion control. In Proceedings of the 8th ACM Symposium on Design and Analysis of Intelligent Vehicular Networks and Applications, Montreal, QC, Canada, 28 October–2 November 2018; pp. 57–62. [Google Scholar]
  33. Kumar, S.; Kim, H. BH-MAC: An efficient hybrid MAC protocol for vehicular communication. In Proceedings of the 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, 7–11 January 2020; pp. 362–367. [Google Scholar]
  34. Taherkhani, N.; Pierre, S. Centralized and localized data congestion control strategy for vehicular ad hoc networks using a machine learning clustering algorithm. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3275–3285. [Google Scholar] [CrossRef]
  35. Jiang, H.; Li, Q.; Jiang, Y.; Shen, G.; Sinnott, R.; Tian, C.; Xu, M. When machine learning meets congestion control: A survey and comparison. Comput. Netw. 2021, 192, 108033. [Google Scholar] [CrossRef]
  36. Liu, X.; Amour, B.S.; Jaekel, A. A Q-learning based adaptive congestion control for V2V communication in VANET. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2022; pp. 847–852. [Google Scholar]
  37. Koenig, S.; Simmons, R.G. Complexity analysis of real-time reinforcement learning. AAAI 1993, 93, 99–105. [Google Scholar]
  38. Sommer, C.; German, R.; Dressler, F. Bidirectionally coupled network and road traffic simulation for improved IVC analysis. IEEE Trans. Mob. Comput. 2010, 10, 3–15. [Google Scholar] [CrossRef]
  39. Varga, A.; Hornig, R. An overview of the OMNeT++ simulation environment. In Proceedings of the 1st International Conference on Simulation Tools and Techniques for Communications, Networks and Systems & Workshops, Marseille, France, 3–7 March 2008; pp. 1–10. [Google Scholar]
  40. Behrisch, M.; Bieker, L.; Erdmann, J.; Krajzewicz, D. SUMO–simulation of urban mobility: An overview. In Proceedings of the SIMUL 2011, the Third International Conference on Advances in System Simulation, Barcelona, Spain, 23–29 October 2011. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
