SROR: A Secure and Reliable Opportunistic Routing for VANETs

Xu, Huibin; Wang, Ying

doi:10.3390/vehicles6040084

by Huibin Xu ¹,* and Ying Wang ²

¹ School of Information Engineering, Huzhou University, Huzhou 313000, China

² School of Electronic Information Engineering, JiuJiang University, Jiujiang 332005, China

* Author to whom correspondence should be addressed.

Vehicles 2024, 6(4), 1730-1751; https://doi.org/10.3390/vehicles6040084

Submission received: 5 September 2024 / Revised: 26 September 2024 / Accepted: 28 September 2024 / Published: 30 September 2024

Abstract

In Vehicular Ad Hoc Networks (VANETs), the high mobility of vehicles poses a major challenge to the reliability and security of packet transmission. Therefore, a Secure and Reliable Opportunistic Routing (SROR) algorithm is proposed in this paper. During construction of the Candidate Forwarding Nodes (CFNs) set, the relative velocity, connectivity probability, and packet forwarding ratio are taken into consideration. The aim of SROR is to maximize the packet delivery ratio while reducing the end-to-end delay. The selection of a relay node from the CFNs is formalized as a Markov Decision Process (MDP) optimization. The SROR algorithm extracts useful knowledge from the historical behavior of nodes by interacting with the environment. This knowledge is used to select the relay node and to prevent malicious nodes from forwarding packets. In addition, the influence of different learning rates and exploratory factor policies on the rewards of agents is analyzed. The experimental results show that SROR outperforms the benchmarks in terms of the packet delivery ratio, end-to-end delay, and attack success ratio. When the vehicle density ranges from 10 to 50 and the percentage of malicious vehicles is fixed at 10%, the average packet delivery ratio, end-to-end delay, and attack success ratio are 0.82, 0.26 s, and 0.37, respectively.

Keywords:

vehicular ad hoc networks; opportunistic routing; deep reinforcement learning; black hole attack; gray hole attack

1. Introduction

Vehicular Ad Hoc Networks (VANETs) have drawn wide attention because they are an effective means of easing traffic congestion and ensuring road safety. As a form of the Internet of Things (IoT) [1,2], VANETs are a promising technology designed to provide mature solutions for Intelligent Transport Systems (ITS). As an important component of ITS, VANETs have emerged in recent years as a promising research topic for improving traffic management efficiency and road safety and for providing entertainment services. A VANET generally comprises On-Board Units (OBUs) and Road-Side Units (RSUs), which support Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communications. Thus, VANETs can support safety applications that help drivers avoid collisions.

In V2V communication, vehicles exchange traffic-related information with other vehicles using the Dedicated Short Range Communication (DSRC) standard. RSUs, which are deployed along the roadside, are the most common infrastructure in VANETs. Each RSU is in charge of providing services for nearby vehicles. Therefore, based on V2I communication, these vehicles can access the network quickly to download traffic data and obtain news. The typical structure of VANETs is shown in Figure 1.

VANETs have some challenging issues that distinguish them from Mobile Ad Hoc Networks (MANETs) [3]. The high mobility of vehicles [4] leads to dynamic topology changes and frequent network disconnections. Thus, communication links in VANETs may suffer frequent outages, and constructing a reliable path is a major challenge. Additionally, the open communication environment of VANETs increases the likelihood of cyber attacks, including but not limited to distributed denial-of-service [5], jamming, and spoofing. Among the many cyber attacks, the packet drop attack is the most representative. The loss of packets endangers the transmission of information and even the safety of pedestrians and drivers. Therefore, to achieve a high packet delivery ratio, designing an attack detection method is essential for VANETs [6].

Constructing a reliable and secure route is a major challenge because of the inherent mobility of vehicles and packet drop attacks. Mobility results in frequent topology changes and intermittent connections. Consequently, it is hard to construct a stable end-to-end path between a source node and its destination. Instead, relay nodes are chosen continually on the basis of the current network environment, and the packet is delivered hop by hop toward its destination with the help of these relay nodes.

Due to high mobility, choosing a relay node that can successfully forward a packet is a challenging issue [7]. Both the metric and the strategy employed for selecting a relay node may affect packet transmission performance. The former is used to evaluate the link from a sender to a receiving node, and the latter refers to whether a relay node is chosen from the candidate set opportunistically or deterministically.

Because the topology changes dynamically, opportunistic routing is more appropriate for VANETs than deterministic routing. In opportunistic routing, a source node transmits a packet toward a set of neighbor nodes instead of just one neighbor node. This transmission mechanism improves the success rate of packet transmission. One of the nodes in this set is selected as the relay node, and it is responsible for forwarding the packet.

Li et al. [8] proposed an adaptive routing protocol based on ant colony optimization. In the routing protocol, the routing problem is formulated as a constrained optimization problem. Then, an ant colony optimization algorithm is employed to solve the problem. However, the computation time of ant colony optimization increases with the number of nodes. Besides, some important network metrics like relative velocity and connectivity are not considered in route discovery. In [9], an efficient multi-metric routing algorithm was proposed. The routing algorithm considered the connectivity and relative velocity, but it did not consider the attack issues.

Notably, V2V communication is vulnerable to many hostile attacks due to the open communication environment. As typical attacks, the Black Hole Attack (BHA) and Gray Hole Attack (GHA) disrupt packet transmission by dropping packets deliberately. This behavior increases the packet loss ratio. Therefore, to achieve a high packet delivery ratio, a method of detecting these attacks needs to be employed so that the attackers are prevented from routing packets.

Recently, Reinforcement Learning (RL) [10,11] has been introduced to solve critical problems in many applications, and attack detection in VANETs is a typical one. In VANETs, some attacks can be detected by RL algorithms owing to their ability to learn in dynamic environments. For example, Sherazi et al. [12] proposed an RL-based Distributed Denial of Service (DDoS) attack detection algorithm, in which the detection of a DDoS attack is performed by Q-learning.

Motivated by these issues, a Secure and Reliable Opportunistic Routing (SROR) algorithm is proposed in this paper. In the SROR algorithm, three important network metrics, namely relative velocity, connectivity probability, and packet forwarding ratio, are taken into account, for several reasons. Firstly, in vehicular networks, the relative distance depends on the relative velocity among vehicles, while the transmission delay depends on the relative distance. Therefore, considering the relative velocity may help reduce the delay. Secondly, the stability of packet routing is influenced by the connectivity probability of the link. Considering the connectivity probability therefore helps reduce the delay and improve the packet delivery ratio. Finally, to prevent malicious nodes from forwarding packets, the packet forwarding ratio is used when selecting a relay node. This helps avoid the BHA and GHA during packet forwarding.

To this end, during the selection of relay nodes, SROR algorithm takes the above-mentioned metrics into consideration. Specifically, by jointly considering relative velocity, connectivity probability, and packet forwarding ratio, the Candidate Forwarding Nodes (CFNs) are selected. Furthermore, the selection of relay node is modeled as a Markov Decision Process (MDP) [13] and the solution of the MDP is obtained by the Deep Reinforcement Learning (DRL). In addition, to achieve a better convergence, the dynamic learning rate and exploratory factor policy are employed.

The main contributions of our work can be summarized as follows.

(1): We discuss the secure and reliable opportunistic routing protocol. To achieve a shorter transmission delay as well as a higher packet delivery ratio, a CFNs set is constructed based on relative velocity, connectivity probability, and packet forwarding ratio metrics.
(2): The selection of a relay node from the CFNs set is expressed as an MDP, and the solution of the MDP is obtained with the help of DRL. In addition, a dynamic learning rate and exploratory factor are adopted, and the sensitivity of DRL to the learning rate and exploratory factor is analyzed.
(3): An extensive performance evaluation is conducted. Compared with benchmarks, the effectiveness of SROR algorithm is evaluated by simulation.

The remainder of this paper is organized as follows. Related work is reviewed in Section 2. Then, we formulate the network model and some important metrics in Section 3. We describe the opportunistic routing and the problem in Section 4, and the DRL-based selection of the relay node is described in Section 5. Section 6 presents numerical results. Finally, a conclusion is presented in Section 7.

2. Related Work

To date, many researchers have made much effort to optimize the routing strategy and apply reinforcement learning. We mainly review related work of routing protocols in VANETs.

2.1. Routing Protocol for VANETs

Unlike general Mobile Ad Hoc Networks (MANETs), VANETs require a more stable path to deliver packets. A wealth of work has been performed on routing for VANETs. For unicast routing, existing routing protocols can be roughly divided into four categories, namely location-based, topology-based, cluster-based, and opportunity-based protocols.

In location-based routing, the locations of nodes are taken into consideration in route discovery. Greedy Perimeter Stateless Routing (GPSR) is a representative location-based routing protocol. In GPSR, a source node makes greedy forwarding decisions based on the location data of its neighbors. Wang et al. [14] proposed a location privacy-based secure routing protocol. However, this routing protocol focuses only on preserving location privacy. A velocity and position-based message transmission scheme is proposed in [15], in which velocity and position information are used to alleviate rebroadcast message collisions. Additionally, in [16], the authors discussed position-based routing and proposed a FoG-oriented VANETs structure in support of location-based routing.

In topology-based routing protocols, each vehicle needs to establish a routing table based on topology information. Kadadha et al. [17] proposed a Stackelberg-game-model-based routing protocol, in which the relay nodes are selected according to street topology. Wang et al. [18] proposed a network connectivity-based low-latency routing protocol. They took the most representative features of vehicles into consideration during route discovery. A novel geographic routing protocol is proposed in [19], in which some feasible paths to the destination are selected according to the network topology.

Based on the idea of cluster-based routing, the nodes in the network are divided into different clusters by a certain principle [20,21]. In [22], a chain-branch-leaf cluster-based routing protocol is proposed and analyzed. In addition, for the sake of improving performance, vehicles with similar trajectory features are grouped in [23]. In [24], a connectivity prediction-based dynamic clustering model is proposed, and the proposed model is used to realize stable communications.

Opportunistic routing is generally combined with geographic routing to form a Geographical Opportunistic Routing (GOR) protocol. In the GOR protocol, a source node tends to choose a relay node that is closer to the destination. Salkuyeh et al. [25] proposed an adaptive geographic routing protocol to transmit video data, in which a number of independent paths are discovered. However, multiple paths result in high complexity. In [26], the authors proposed a trajectory-based opportunistic routing protocol. The Global Positioning System (GPS) information of vehicles is used to transmit data, and a relay node is selected based on its proximity to the trajectory.

2.2. Secure Routing for VANETs

V2V communication is vulnerable to some attacks due to the open communication environment in VANETs, and routing is one of the targets of attackers. A major line of work [27,28,29,30] concentrates on the security of routing data. Eiza et al. [28] proposed an ant colony optimization-based secure routing protocol, in which some feasible routes are constructed by the ant colony optimization technique. Lyu et al. [29] put forward a geographic position-based secure routing protocol. They employed a trust model to resist the black hole attack in order to choose a secure routing path. In [27], the authors put forward an intelligent cluster-based routing protocol, in which an artificial neural network (ANN) model was used to detect malicious nodes.

Trust-based solutions [31,32] are common methods for dealing with the routing security issue. Xia et al. [30] first studied the trust properties and put forward a novel trust inference model. Shen et al. [33] also employed a trust model, in which the cloud is in charge of evaluating the trustworthiness of each individual. The evaluation results are used to choose a reliable relay node. Nevertheless, evaluating the trustworthiness of vehicles is very complex due to the dynamic topology, and the trustworthiness of vehicles depends on many factors. Hence, it is still a challenge to objectively evaluate the behavior of vehicles by trustworthiness alone. In addition, in [34], the authors proposed an RSU-assisted trust-based routing protocol for VANETs. It provides a more reliable monitoring process for trust management in order to increase resistance to trust-based attacks.

2.3. Routing Protocol Based on RL

To cope with the dynamic environment of VANETs, RL [35] has been widely explored in route discovery and attack detection. A decentralized routing protocol based on multi-agent RL is put forward in [36]. In the protocol, the problem of how vehicles learn from the environment is modeled as a multi-agent problem. Each vehicle is considered an agent, and it autonomously establishes the packet transmission path with its neighbors. In addition, a geographic routing protocol based on the Q-learning algorithm is put forward in [37]. Each vehicle holds a Q-table, and an optimal relay node is selected by querying the Q-table. Luo et al. [38] suggested an intersection-based V2X routing protocol. They employed a multidimensional Q-table, which is used to select the most appropriate road segment for transmitting the packet.

Unfortunately, Q-learning needs to maintain a Q-table, and the Q-table grows as the number of nodes increases. To overcome this issue, a DRL-based collaborative routing protocol is put forward in [39]. The problem of minimizing the delay is formulated as an MDP, and the MDP is solved by a Deep Q-network (DQN) [40]. Similarly, in [41], deep reinforcement learning is used to tackle the network dynamics in VANETs.

In addition, RL has been widely used to detect attacks in order to reduce the packet loss ratio in VANETs. In [42], an adaptive multi-agent RL algorithm is designed to detect packet drop attacks. In [43], malicious vehicle activity is detected by Q-learning to prevent malicious vehicles from forwarding packets. Besides, in [44], an intelligent detection algorithm is put forward, in which four network parameters are taken into consideration to detect the black hole attack. In [45], the IoT security issue is discussed, and a protection scheme using an effective decision-making strategy over appropriate features is put forward.

2.4. Problems That Need to Be Solved

Although the above-mentioned work achieves high performance in transmitting packets, there is still much room for improvement.

Firstly, selecting the relay node in VANETs is a challenging task due to the dynamic topology. A deep reinforcement learning algorithm has strong learning ability and can quickly capture knowledge of a dynamic environment. Therefore, the proposed SROR algorithm adopts deep reinforcement learning to select the relay node.

Secondly, V2V communication is vulnerable to packet drop attacks due to the open communication environment in VANETs. However, most existing routing protocols have not taken packet drop attacks into account, even though these attacks reduce the success rate of packet transmission and cause a large number of packet losses. For this reason, the proposed SROR algorithm takes the packet delivery ratio into account when selecting the relay node. The rationale is that packet drop attacks result in a low packet delivery ratio, so a node with a low packet delivery ratio is prevented from participating in routing. Thus, attackers are not considered as relay nodes, which helps ensure the security of VANETs.

Finally, although some existing work has also applied DRL to routing protocols, the sensitivity of DRL to the learning rate and exploratory factor has not been analyzed. In contrast, the proposed SROR algorithm analyzes this sensitivity.

Therefore, the proposed SROR algorithm addresses the performance in transmitting packet as well as defence against packet drop attacks. In addition, a DRL is employed in SROR algorithm to obtain the optimal route path as well as to prevent attackers from forwarding packets.

3. System Model and Routing-Related Metrics

The network model is presented in this section. Then, the relative velocity, packet forwarding ratio, and connectivity probability are defined. Besides, some important notations are demonstrated in Table 1. Note that “–” represents a dimensionless quantity.

3.1. Network Model

Figure 2 depicts the considered network model, in which there are N vehicles, denoted by the set $\mathcal{N} = \{ϑ_{1}, ϑ_{2}, \dots, ϑ_{N}\}$. By means of an On-Board Unit (OBU) installed in each vehicle, each vehicle can communicate with adjacent vehicles based on the IEEE 802.11p protocol. We assume that only a small fraction of vehicles are malicious and that they may launch the BHA or GHA.

Each vehicle can be a source node, a relay node, or a destination. If the destination is out of the transmission range of the source node, the source node needs the help of relay nodes to send a packet to the destination, so multi-hop routing requires multiple relay nodes to participate. If some relay nodes are malicious, they may launch a BHA or GHA, namely deliberately drop packets, which increases the packet loss ratio, as shown in Figure 2. For example, assume that $ϑ_{1}$ is a source node that needs to transmit a packet toward the relay node $ϑ_{2}$. The arrow represents the direction of packet transmission. After receiving the packet from $ϑ_{2}$, the attacker $ϑ_{3}$ should forward the packet, but it drops the packet instead. The dotted lines without arrows in Figure 2 indicate that the packet is not forwarded.

Without loss of generality, assume that all vehicles have the same transmission range, denoted by $ϱ$. Furthermore, $ϱ$ is commonly much larger than the length of a vehicle, similar to [46], and thus the length of a vehicle is not taken into consideration. The inter-vehicle distances are independent and identically distributed exponential random variables with mean $λ^{-1}$.

Additionally, if the distance between two vehicles is not more than $ϱ$, they can form a link. Let $l_{i, j}$ represent the link between the vehicle pair $(ϑ_{i}, ϑ_{j})$, where $ϑ_{i}$ is the sender and $ϑ_{j}$ is the receiver. Moreover, each vehicle needs to maintain a neighbor set. Let $M_{i}$ denote the neighbor set of vehicle $ϑ_{i}$. The distance between vehicle $ϑ_{i}$ and each vehicle in $M_{i}$ is not more than $ϱ$. The set $M_{i}$ is updated whenever a periodic beacon packet is received.

Delay is a measure of the efficiency of packet transmission. The total delay is the time taken for a packet to travel from a sender to a receiver, which includes the propagation delay as well as the transmission delay. Let $T_{i, j}^{tr}$ and $T_{i, j}^{pr}$ represent the transmission delay and propagation delay, respectively. They are given by [47]:

$$T_{i, j}^{tr} = \frac{L^{da}}{R_{i, j}}, \qquad T_{i, j}^{pr} = \frac{D_{i, j}}{c}, \tag{1}$$

where $L^{da}$ represents the length of the transmitted packet, $R_{i, j}$ represents the transmission rate of link $l_{i, j}$, $D_{i, j}$ represents the distance from $ϑ_{i}$ to $ϑ_{j}$, and c is the speed of light. $T_{i, j}^{tr}$ and $T_{i, j}^{pr}$ are used to calculate the end-to-end delay in Section 6.

Accordingly, the total delay of transmitting a packet over link $l_{i, j}$ is denoted as $T_{i, j}^{to}$, satisfying $T_{i, j}^{to} = T_{i, j}^{tr} + T_{i, j}^{pr}$. The sender $ϑ_{i}$ expects its packet to be forwarded quickly and reliably toward the next-hop node. Here, quickness means low delay, and reliability means that the packet is forwarded successfully; that is, the receiver $ϑ_{j}$ receives the packet within the time limit.
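The per-link delay of Equation (1) can be illustrated with a minimal Python sketch. The function name `link_delay` and the numeric values (a 512-byte packet, a 6 Mbit/s link, 250 m between vehicles) are illustrative assumptions, not parameters taken from the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # c in m/s


def link_delay(packet_bits: float, rate_bps: float, distance_m: float) -> float:
    """Total one-hop delay T^{to} = T^{tr} + T^{pr}, following Equation (1)."""
    transmission = packet_bits / rate_bps       # T^{tr} = L^{da} / R_{i,j}
    propagation = distance_m / SPEED_OF_LIGHT   # T^{pr} = D_{i,j} / c
    return transmission + propagation


# Hypothetical example values, chosen only for illustration.
delay_s = link_delay(packet_bits=512 * 8, rate_bps=6e6, distance_m=250.0)
print(f"one-hop delay: {delay_s * 1e3:.4f} ms")
```

For distances within a typical transmission range, the propagation term is several orders of magnitude smaller than the transmission term, so in this sketch the link delay is dominated by the packet length and the transmission rate.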

Additionally, the free-space path-loss model is used as the propagation model in our work. In this model, the reception power (in dBm) is expressed as:

$$P_{r} = P_{t} + 10 \log_{10} \left(\frac{G_{t} G_{r} λ^{2}}{16 π^{2} d^{η}}\right), \tag{2}$$

where $P_{t}$ is the transmission power in dBm, $G_{t}$ and $G_{r}$ are the antenna gains of the transmitter and receiver, respectively, $λ$ is the wavelength (not the rate parameter of the inter-vehicle distance distribution), d is the distance between the transmitter and receiver, and $η$ is the path-loss exponent.
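As a rough illustration of Equation (2), the following Python sketch evaluates the reception power in dBm. The function name and the example values (20 dBm transmit power, unit antenna gains, a wavelength of about 0.0508 m corresponding to a 5.9 GHz carrier, and an exponent of 2) are assumptions made here for illustration only.

```python
import math


def received_power_dbm(pt_dbm: float, gt: float, gr: float,
                       wavelength_m: float, distance_m: float,
                       eta: float = 2.0) -> float:
    """Reception power in dBm according to Equation (2)."""
    ratio = (gt * gr * wavelength_m ** 2) / (16 * math.pi ** 2 * distance_m ** eta)
    return pt_dbm + 10 * math.log10(ratio)


# Hypothetical values: 5.9 GHz carrier, unit gains, 100 m and 200 m separation.
wavelength = 299_792_458.0 / 5.9e9
p_100 = received_power_dbm(20.0, 1.0, 1.0, wavelength, 100.0)
p_200 = received_power_dbm(20.0, 1.0, 1.0, wavelength, 200.0)
print(f"{p_100:.2f} dBm at 100 m, {p_200:.2f} dBm at 200 m")
```

With an exponent of 2, doubling the distance lowers the reception power by 20 log10(2), roughly 6 dB, which is a quick sanity check on the formula.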

3.2. Relative Velocity

As shown in Equation (1), the delay depends on the distance $D_{i, j}$ when the packet length and transmission rate are given. In fact, within the same VANET system, the packet length and transmission rate differ only slightly between vehicles. Because vehicles are mobile, the relative distance reflects their spatial relationship more accurately, and the relative distance in turn depends on the relative velocity. If two vehicles move in the same direction, a small relative velocity keeps the distance between them nearly unchanged for a long time. A stable distance is conducive to network connectivity, which improves the packet delivery ratio.

As a consequence, the relative velocity is considered an important routing metric, and it is taken into account when the next-hop relay node is selected. For the vehicle pair $(ϑ_{i}, ϑ_{j})$, the relative velocity is denoted by $ν_{i, j}$, which can be calculated as [9]:

$$ν_{i, j} = \sqrt{υ_{i}^{2} + υ_{j}^{2} - 2 υ_{i} υ_{j} \cos θ}, \tag{3}$$

where $υ_{i}$ and $υ_{j}$ represent the velocities of $ϑ_{i}$ and $ϑ_{j}$, respectively, and $θ$ is the angle between $ϑ_{i}$ and $ϑ_{j}$, which is defined as:

$$θ = \begin{cases} \arctan \dfrac{Y_{i, j}}{X_{i, j}} & \text{if } X_{i, j} \geq 0,\ Y_{i, j} > 0; \\ \arctan \dfrac{Y_{i, j}}{X_{i, j}} + π & \text{if } X_{i, j} < 0,\ Y_{i, j} \neq 0; \\ - π & \text{if } X_{i, j} < 0,\ Y_{i, j} = 0; \\ \arctan \dfrac{Y_{i, j}}{X_{i, j}} + 2 π & \text{if } X_{i, j} \geq 0,\ Y_{i, j} < 0, \end{cases} \tag{4}$$

where $X_{i, j} = x_{i} - x_{j}$ and $Y_{i, j} = y_{i} - y_{j}$. Furthermore, $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$ represent the coordinates of $ϑ_{i}$ and $ϑ_{j}$, respectively.
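Equations (3) and (4) can be sketched in Python as follows. The function names are hypothetical. Two boundary cases are filled in by assumption, since Equation (4) does not resolve them explicitly: for X = 0 the arctangent is replaced by its limit (π/2 or −π/2), and for X ≥ 0 with Y = 0 the angle is taken as 0. The result is also clamped at zero before the square root to guard against floating-point round-off.

```python
import math


def position_angle(xi: float, yi: float, xj: float, yj: float) -> float:
    """Angle theta of Equation (4), from X = xi - xj and Y = yi - yj."""
    x, y = xi - xj, yi - yj
    if x < 0:
        return -math.pi if y == 0 else math.atan(y / x) + math.pi
    if y == 0:
        return 0.0  # assumption: this case is not listed in Equation (4)
    # assumption: for X == 0 use the limit of arctan(Y / X), i.e. +-pi/2
    base = math.copysign(math.pi / 2, y) if x == 0 else math.atan(y / x)
    return base if y > 0 else base + 2 * math.pi


def relative_velocity(vi: float, vj: float, theta: float) -> float:
    """Relative velocity of Equation (3)."""
    squared = vi ** 2 + vj ** 2 - 2 * vi * vj * math.cos(theta)
    return math.sqrt(max(0.0, squared))  # clamp tiny negative round-off


theta = position_angle(1.0, 1.0, 0.0, 0.0)   # pi / 4
print(relative_velocity(30.0, 20.0, theta))
```

Because θ enters Equation (3) only through its cosine, angles that differ by a multiple of 2π (for example −π and π) give the same relative velocity.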

3.3. Packet Forwarding Ratio

Assume that some vehicles are attackers that intentionally drop some, or even all, of the packets that they should forward. Intentionally dropping packets increases the packet loss ratio and the transmission delay. The Packet Forwarding Ratio (PFR) is a metric of packet forwarding efficiency in VANETs, equal to the ratio of the number of packets forwarded to the number of packets that should have been forwarded [48]. The PFR of vehicle $ϑ_{j}$ is defined as:

$$\mathrm{PFR}_{j} = \frac{NF(ϑ_{j})}{NS(ϑ_{i}, ϑ_{j})}, \tag{5}$$

where $NF(ϑ_{j})$ represents the total number of packets forwarded by $ϑ_{j}$ and $NS(ϑ_{i}, ϑ_{j})$ represents the number of packets that $ϑ_{j}$ should have forwarded. If a vehicle’s PFR is very low, the vehicle may habitually drop packets, which may indicate a BHA or GHA.
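A minimal sketch of Equation (5) in Python is given below. The function name and the handling of a node with no forwarding history (returning `None`) are assumptions introduced here; the paper does not say how such a node is treated.

```python
from typing import Optional


def packet_forwarding_ratio(forwarded: int, should_have_forwarded: int) -> Optional[float]:
    """PFR_j = NF / NS from Equation (5).

    Returns None when no packets were due to be forwarded, because the
    ratio is undefined in that case (an assumption of this sketch).
    """
    if should_have_forwarded == 0:
        return None
    return forwarded / should_have_forwarded


# Hypothetical observation counts for three kinds of node.
print(packet_forwarding_ratio(50, 50))  # well-behaved node
print(packet_forwarding_ratio(20, 50))  # node dropping some packets
print(packet_forwarding_ratio(0, 50))   # node dropping every packet
```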

3.4. Connectivity Probability

During the transmission, the sender hopes that the selected next-hop relay node can stably connect to its nearby nodes when the next-hop relay node is out of the transmission range of the destination. That way, the next-hop relay node can probably forward the packet to its own next-hop relay node after it receives the packet from the sender. If the selected next-hop relay node cannot stably connect to its nearby nodes, it may not be able to deliver the packet to the destination even if it has received the packet.

Therefore, the connectivity probability should be considered when selecting the next-hop relay node. According to [49], if the vehicle intensity is less than 1000 veh/h, the inter-vehicle distance follows an exponential distribution. For the vehicle pair $(ϑ_{i}, ϑ_{j})$, the sender $ϑ_{i}$ hopes that the receiver $ϑ_{j}$ can stably connect to its nearby vehicles. The connectivity probability of vehicle $ϑ_{j}$ is defined as:

$$P_{j} = \begin{cases} 1 - e^{- λ ϱ} & \text{if } ϑ_{j} \in M_{i}, \\ 0 & \text{if } ϑ_{j} \notin M_{i}. \end{cases} \tag{6}$$
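Equation (6) can be sketched in Python as shown below. The function name and the example values (a rate parameter of 0.004 vehicles per metre and a 250 m transmission range) are illustrative assumptions.

```python
import math


def connectivity_probability(lam: float, rho: float, is_neighbor: bool) -> float:
    """P_j from Equation (6).

    lam is the rate parameter of the exponential inter-vehicle distance
    (mean 1 / lam), rho is the transmission range, and is_neighbor states
    whether vehicle j belongs to the neighbor set M_i of the sender.
    """
    if not is_neighbor:
        return 0.0
    return 1.0 - math.exp(-lam * rho)


# Hypothetical values: lam * rho = 1, so P_j = 1 - 1/e, about 0.632.
print(connectivity_probability(0.004, 250.0, True))
print(connectivity_probability(0.004, 250.0, False))
```

The expression 1 − e^{−λϱ} is the probability that an exponentially distributed inter-vehicle distance does not exceed the transmission range, so a denser road or a longer range both raise the connectivity probability.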

3.5. Attack Model

VANETs are vulnerable to packet drop attacks, which are among the most frequent attacks on VANETs, due to their dynamic topology. A node compromised by a packet drop attack drops some or all of the packets that it should forward. The BHA and GHA are the most representative packet drop attacks, so they are taken into consideration in the proposed routing algorithm.

In a BHA, an attacker that receives a packet and should forward it toward the next-hop node drops it instead; in brief, the BHA drops all packets instead of forwarding them. In contrast, the GHA selectively drops some packets but not all of them.

Assume that a small number of vehicles are malicious. Let $σ$ represent the percentage of malicious vehicles. Each malicious vehicle launches either a BHA or a GHA.
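The difference between the two attack behaviors can be illustrated with a small simulation. This is only a sketch under stated assumptions: the node labels, the function names, and in particular the gray hole's drop probability of 0.5 are hypothetical, since the paper does not specify how a GHA chooses which packets to drop.

```python
import random


def forwards_packet(kind: str, rng: random.Random, gray_drop_prob: float = 0.5) -> bool:
    """Decide whether a node forwards one packet it should forward."""
    if kind == "black_hole":
        return False                            # BHA: every packet is dropped
    if kind == "gray_hole":
        return rng.random() >= gray_drop_prob   # GHA: some packets are dropped
    return True                                 # benign node: always forwards


def simulated_pfr(kind: str, packets: int, seed: int = 0) -> float:
    """Fraction of packets forwarded over a run, i.e. an observed PFR."""
    rng = random.Random(seed)
    forwarded = sum(forwards_packet(kind, rng) for _ in range(packets))
    return forwarded / packets


for kind in ("benign", "gray_hole", "black_hole"):
    print(kind, simulated_pfr(kind, 1000))
```

In this sketch a black hole always yields an observed forwarding ratio of 0 and a benign node a ratio of 1, while a gray hole lands in between, which is why a forwarding-ratio threshold can separate attackers from well-behaved nodes.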

4. Opportunistic Routing and Problem Formulation

We outline the opportunistic routing and then expound the problem in this section.

4.1. Overview of Opportunistic Routing

In opportunistic routing, a sender does not rely on a single relay node to forward a packet. Instead, the sender transmits its packet to several neighbor nodes at the same time, and one of these neighbor nodes is selected to forward the packet. By properly setting up the selection mechanism, the most suitable relay node is selected.

In order to obtain a higher packet delivery ratio, neighbor nodes cooperate to forward packets. Cooperation means that each neighbor node can forward the packet, but with a different priority. The Node with the Highest Priority (NHP) forwards the packet first. Once the packet has been forwarded, the other neighbor nodes give up forwarding the packet and drop it immediately. To this end, each node sets a timer based on its priority: the higher the priority, the shorter the timer duration. This ensures that the NHP forwards the packet first.

As shown in Figure 3, vehicle $ϑ_{i}$ is the sender; it holds a packet and forwards it to its neighbor vehicles $ϑ_{1}$, $ϑ_{2}$, and $ϑ_{3}$ simultaneously. Once they have received the packet, the neighbors each start a timer and listen for whether the packet is forwarded by another vehicle. Assuming that $ϑ_{1}$ has the highest priority, it forwards the packet first. In this case, $ϑ_{2}$ and $ϑ_{3}$ give up forwarding the packet and drop it at once.
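The timer-based coordination described above can be sketched as follows. The mapping from priority to timer duration (one fixed slot per rank) and the names `forwarding_timers` and `elect_forwarder` are assumptions made for illustration; the text only states that a higher priority means a shorter timer.

```python
from typing import Dict, List, Tuple


def forwarding_timers(priorities: Dict[str, float], slot_s: float = 0.001) -> Dict[str, float]:
    """Assign each candidate a timer; a higher priority gives a shorter timer."""
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    return {node: rank * slot_s for rank, node in enumerate(ranked)}


def elect_forwarder(priorities: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return the node whose timer expires first and the nodes that back off."""
    timers = forwarding_timers(priorities)
    winner = min(timers, key=timers.get)
    # The remaining candidates overhear the forwarding and drop their copy.
    suppressed = [node for node in timers if node != winner]
    return winner, suppressed


# Hypothetical priorities for the three neighbors of the sender.
winner, suppressed = elect_forwarder({"v1": 0.9, "v2": 0.5, "v3": 0.2})
print(winner, suppressed)
```

If the highest-priority node fails to forward, the node with the next shortest timer would forward instead, which is the source of the reliability gain of opportunistic routing.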

In our methodology, each packet acts as an agent in DRL. The agent selects a relay node based on its knowledge of the environment, and the packet is then transmitted to that relay node. The proposed SROR algorithm makes full use of the ability of DRL to learn about the environment, so that the agent can select an appropriate relay node, prevent attacker nodes from participating in routing, and finally construct a secure and stable route.

4.2. Problem Formulation

By optimizing the selection of next-hop relay node, the Packet Delivery Ratio (PDR) can be improved while the delay is reduced. The PDR is the ratio of the number of packets received by destination to the number of packets that should have been received. The delay is the End-to-End Delay (EED).

Specifically, as Figure 4 shows, $ϑ_{0}$ and $ϑ_{m}$ represent the source node and the destination, respectively. There is an m-hop path from $ϑ_{0}$ to $ϑ_{m}$. The EED of the path is the sum of the delays of its m links. Therefore, the optimization problem is given by:

$$\begin{aligned} \mathrm{P1}: \quad & \min \left[\sum_{i = 0}^{m - 1} T_{i, i + 1}^{to} + \prod_{i = 0}^{m - 1} (1 - \mathrm{PFR}_{i + 1})\right] \\ \text{s.t.} \quad & P_{i} \geq 0.63, \quad \forall i = 1, 2, \dots, m, \\ & 0 \leq ν_{i, i + 1} \leq 200, \quad \forall i = 0, 1, \dots, m - 1, \\ & \mathrm{PFR}_{i} \geq 0.4, \quad \forall i = 1, 2, \dots, m, \end{aligned} \tag{7}$$

where $\sum_{i = 0}^{m - 1} T_{i, i + 1}^{to}$ is the EED of the m-hop path and $\prod_{i = 0}^{m - 1} (1 - \mathrm{PFR}_{i + 1})$ reflects the packet loss ratio of the path. The smaller the EED and the packet loss ratio are, the better the routing performance is. As indicated by the constraint $P_{i} \geq 0.63$, only a node with a connectivity probability of at least 0.63 may be added to the CFNs set. The constraint $0 \leq ν_{i, i + 1} \leq 200$ means that the relative velocity of a candidate forwarding node with respect to $ϑ_{i}$ must lie between 0 and 200. The constraint $\mathrm{PFR}_{i} \geq 0.4$ means that only a node with a PFR of at least 0.4 can be selected as a candidate forwarding node.
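To make the objective and constraints of P1 concrete, the following Python sketch evaluates them for a given path. The function names and the example per-hop values are hypothetical, and the sketch only scores a candidate path; it does not solve the optimization problem.

```python
import math
from typing import Sequence


def path_objective(link_delays_s: Sequence[float], next_hop_pfrs: Sequence[float]) -> float:
    """Objective of P1: sum of link delays plus the product of (1 - PFR)."""
    loss_term = math.prod(1.0 - pfr for pfr in next_hop_pfrs)
    return sum(link_delays_s) + loss_term


def satisfies_constraints(conn_probs: Sequence[float],
                          rel_velocities: Sequence[float],
                          pfrs: Sequence[float]) -> bool:
    """Check the three constraints of P1 for every hop of a path."""
    return (all(p >= 0.63 for p in conn_probs)
            and all(0 <= v <= 200 for v in rel_velocities)
            and all(r >= 0.4 for r in pfrs))


# Hypothetical two-hop path.
print(path_objective([0.1, 0.2], [0.9, 0.8]))
print(satisfies_constraints([0.7, 0.8], [30.0, 55.0], [0.9, 0.8]))
```

Note that the product term becomes zero as soon as one hop on the path has a PFR of 1, so in this form it is the delay term and the per-hop constraints that discriminate between otherwise well-behaved paths.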

5. Deep Reinforcement Learning—Opportunistic Routing

We first discuss the problem of selecting the relay nodes in this section and formulate it as an MDP. Next, the problem is solved by DRL. The DRL model is shown in Table 2. The learning environment of the DRL model is the VANET, and the agent is each packet that needs to be transmitted. The state of the agent is the union of the node that currently holds the packet, abbreviated as Node-HP, and that node's CFNs set. The action of the agent is to select a relay node.

The problem P1 is a sequential decision problem. To select the most suitable next-hop relay node, the problem is modeled as an MDP. Let the tuple $\langle S, A, R \rangle$ represent the MDP, where $S$, $A$, and $R$ represent the state space, action space, and immediate reward, respectively. Next, we describe each element in detail.

5.1. State Space

At time step t, the state of the agent is denoted by

s_{t}

, where the agent is the packet that needs to be transmitted. Hence, the state

s_{t}

depends on the location of the agent. If the vehicle

ϑ_{i}

holds the packet, namely the agent is located in vehicle

ϑ_{i}

, the current state can be defined as:

s_{t} = \{ϑ_{i} \cup C_{i}\},

(8)

where

C_{i}

is the Candidate Forwarding Nodes (CFNs) set of vehicle

ϑ_{i}

, satisfying

C_{i} \subseteq M_{i}

. Next, we discuss how to construct a CFNs set. In opportunistic routing, multiple neighbor nodes of the sender receive the packet simultaneously. To improve resource utilization and avoid broadcast storms, only some of these neighbors are selected to forward the packet. The selected neighbors are the CFNs, which together form the CFNs set.

To construct a fast and reliable route, a CFNs set is formed by considering three metrics: relative velocity, packet forwarding ratio, and connectivity probability. Specifically, a sender

ϑ_{i}

needs to construct a CFNs set

C_{i}

. If a neighbor node

ϑ_{j} \in M_{i}

meets the following three conditions [50], the node is added to the set

C_{i}

. Condition 1:

0 \leq ν_{i, j} \leq 200

; Condition 2:

P_{j} \geq 0.63

; Condition 3:

P F R_{j} > 0.4

. The process of adding

ϑ_{j}

to the set

C_{i}

can be expressed as:

C_{i} ⟵ C_{i} \cup \{ϑ_{j} | ν_{i, j} \in [0, 200] and P_{j} \geq 0.63 and P F R_{j} > 0.4\} .

(9)
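As an illustration of Equation (9), the following Python sketch filters a sender's neighbor list into a CFNs set. The `Neighbor` record, its field names, and the function `build_cfn_set` are hypothetical names introduced here; only the three thresholds come from Conditions 1 to 3 above.

```python
from dataclasses import dataclass

# Thresholds taken from Conditions 1-3 in the text.
MAX_RELATIVE_VELOCITY = 200.0
MIN_CONNECTIVITY_PROBABILITY = 0.63
MIN_PFR = 0.4


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical record of what a sender knows about one neighbor."""
    node_id: str
    relative_velocity: float         # nu_{i,j}
    connectivity_probability: float  # P_j
    pfr: float                       # PFR_j


def build_cfn_set(neighbors):
    """Return the ids of the neighbors that satisfy all three conditions."""
    return {
        n.node_id
        for n in neighbors
        if 0.0 <= n.relative_velocity <= MAX_RELATIVE_VELOCITY
        and n.connectivity_probability >= MIN_CONNECTIVITY_PROBABILITY
        and n.pfr > MIN_PFR
    }
```

A neighbor that fails any single condition is excluded, so a fast-moving, poorly connected, or unreliable forwarder never enters the candidate set.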

5.2. Action Space

The state transition depends on the selected action. Based on the definition of the state, the action refers to which node is considered to be the next-hop relay node. So, at time step t, the action

a_{t}

is given by:

a_{t} = \{ϑ_{j} | ϑ_{j} \in C_{i}\},

(10)

where

a_{t} = ϑ_{j}

means that the vehicle

ϑ_{j}

is considered to be the next-hop relay node.

5.3. Reward Function

The reward is an important part of the RL algorithm: it evaluates the effect of an action on packet transmission. Accordingly, the reward function takes into account the node's location, relative velocity, connectivity probability, and PFR. Specifically, at time step t, the reward function is given by:

r_{t} = R_{i, j}^{j} = R_{s_{t}, s_{t + 1}}^{a_{t}} | s_{t} = ϑ_{i}, s_{t + 1} = ϑ_{j}, a_{t} = ϑ_{j},

(11)

where

R_{i, j}^{j}

is the immediate reward and

s_{t + 1}

represents the state at time step

t + 1

. The case

s_{t} = ϑ_{i}

,

a_{t} = ϑ_{j}

, and

s_{t + 1} = ϑ_{j}

means that the packet is held by

ϑ_{i}

at time step t,

ϑ_{i}

selects

ϑ_{j}

as the next-hop relay node, and the next state is

ϑ_{j}

. Furthermore, we define the immediate reward

R_{i, j}^{j}

with four factors, i.e., relative velocity, connectivity probability, PFR, and an additional reward, which is given by:

R_{i, j}^{j} = ω_{1} \frac{200 - ν_{i, j}}{200} + ω_{2} P_{j} + ω_{3} P F R_{j} + δ_{i, j},

(12)

where

ω_{1}

,

ω_{2}

, and

ω_{3}

are non-negative weighting factors, and

δ_{i, j}

represents the additional reward, which can be defined as:

δ_{i, j} = \begin{cases} R_{1}, & \text{if } ϑ_{j} \text{ is the destination} \\ R_{2}, & \text{if } ϑ_{j} \text{ is a neighbor node of the destination} \\ 0, & \text{otherwise,} \end{cases}

(13)

where

R_{1}

and

R_{2}

are positive constants. If the selected action is

ϑ_{j}

, the

δ_{i, j}

will be a positive constant

R_{1}

when

ϑ_{j}

is the destination. If the

ϑ_{j}

is not the destination but is a neighbor node of the destination, the

δ_{i, j}

will be a positive constant

R_{2}

; obviously,

R_{1}

should be greater than

R_{2}

.
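Equations (12) and (13) can be sketched in Python as follows. The function names, the default weights, and the default values of R_1 and R_2 are placeholders chosen for illustration, not values taken from the paper; the only requirements stated in the text are non-negative weights and R_1 greater than R_2.

```python
def additional_reward(is_destination, is_neighbor_of_destination, r1=10.0, r2=5.0):
    """Equation (13): bonus for reaching, or getting next to, the destination.

    r1 and r2 are placeholder constants with r1 > r2 > 0.
    """
    if is_destination:
        return r1
    if is_neighbor_of_destination:
        return r2
    return 0.0


def immediate_reward(rel_velocity, conn_prob, pfr, delta,
                     w1=1.0 / 3, w2=1.0 / 3, w3=1.0 / 3):
    """Equation (12): weighted sum of the three metrics plus the bonus.

    The velocity term is normalized so that a lower relative velocity
    (a more stable link) yields a larger contribution.
    """
    velocity_term = (200.0 - rel_velocity) / 200.0
    return w1 * velocity_term + w2 * conn_prob + w3 * pfr + delta
```

The destination check comes first, so a candidate that is the destination always receives R_1 rather than R_2.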

The

r_{t}

just reflects the short-term rewards. For a multi-hop path, we need to have an insight into the long-term rewards so that the impact of future states on the current state is analyzed. The long-term reward is defined as:

R_{t} = r_{t} + γ r_{t + 1} + γ^{2} r_{t + 2} + \dots = \sum_{k = 0}^{\infty} γ^{k} r_{t + k},

(14)

where

γ

is a discount factor. In practice, the sum is truncated at k = 5 because an infinite sum cannot be computed.
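The truncated form of Equation (14) can be sketched as below. The function name is hypothetical, and the sketch assumes that k runs from 0 to 5 inclusive; the text only states that k is set to 5, so the exact number of terms is an assumption.

```python
def discounted_return(rewards, gamma, horizon=5):
    """Sum gamma**k * r_{t+k} for k = 0..horizon.

    `rewards` lists r_t, r_{t+1}, ...; if fewer than horizon + 1 rewards
    are available (for example, the path ended early), all of them are used.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards[: horizon + 1]))
```

With a discount factor below 1, rewards collected at later hops contribute progressively less to the long-term reward.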

5.4. Selection of Relay Node

The DRL model employed in the SROR algorithm consists of an evaluation network and a target network with the same structure, and a DRL-based method is designed to find the relay node. Figure 5 depicts the overall SROR framework, and the pseudocode of SROR is given in Algorithm 1. Next, we analyze the computational complexity of Algorithm 1. For SROR, the computational complexity of each time slot is denoted by

τ_{1} = O (|A| \times |S| \times N)

. Here,

|A|

and

|S|

represent the sizes of the action space and state space, respectively, and N is the number of vehicles. Considering the total number of episodes and the number of time slots in each episode, the total computational complexity of Algorithm 1 is

K_{m a x} \times T_{m a x} \times τ_{1}

, where

K_{m a x}

and

T_{m a x}

represent the total number of episodes and the number of time slots in each episode, respectively.

Algorithm 1: SROR

Specifically, a current state

s_{t}

is generated randomly by the environment at time step t, and the state

s_{t}

is fed into the DRL model. Then, an action

a_{t}

is selected according to an

ε

-greedy policy. Finally, the Q value of the action

a_{t}

is evaluated by the evaluation network:

Q (s_{t}, a_{t}; θ_{t}) = E [R_{t}] = E [\sum_{k = 0}^{\infty} γ^{k} r_{t + k} | s_{t}, a_{t}],

(15)

where

E [\cdot]

represents the expectation operation and

θ_{t}

represents the parameters of the evaluation network.
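The ε-greedy selection step can be sketched in Python as follows. The function name and the dictionary-based interface are assumptions made for illustration; in SROR the Q values would come from the evaluation network rather than from a hand-written dictionary.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a relay from the CFNs set.

    q_values maps each candidate node id to its estimated Q value.
    With probability epsilon a random candidate is explored; otherwise
    the candidate with the largest Q value is exploited.
    """
    if not q_values:
        raise ValueError("the CFNs set is empty")
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)
```

Setting `epsilon` to 0 makes the choice purely greedy, while a value of 1 makes it purely random.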

Then, the state

s_{t}

is transferred to the state

s_{t + 1}

. The quadruple

(s_{t}, a_{t}, r_{t}, s_{t + 1})

is stored in a replay buffer. Recalling Equation (15),

Q (s_{t}, a_{t}; θ_{t})

is a prediction value, and it is evaluated by the evaluation network. Correspondingly, the target value is denoted by

y_{t}

, which is evaluated by the target network. The

y_{t}

is given by:

y_{t} = r_{t} + γ Q (s_{t + 1}, \arg \max_{a \in A} Q (s_{t + 1}, a; θ_{t}); θ_{t}^{'}),

(16)

where

θ_{t}^{'}

is the target network parameter and

\arg {max}_{a \in A} Q (s_{t + 1}, a; θ_{t})

represents the action with the greatest Q value in the evaluation network.
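The target in Equation (16) separates action selection from action evaluation: the evaluation network picks the best next action, and the target network scores it. A minimal sketch follows, with Q values passed in as plain dictionaries. The function name and the `terminal` flag are assumptions added here; the text does not describe how terminal states are handled.

```python
def double_dqn_target(reward, next_q_eval, next_q_target, gamma, terminal=False):
    """Compute y_t for one stored transition.

    next_q_eval:   action -> Q(s_{t+1}, a; theta)   (evaluation network)
    next_q_target: action -> Q(s_{t+1}, a; theta')  (target network)
    """
    if terminal or not next_q_eval:
        # Assumption: no bootstrapping once the packet has been delivered
        # or when no candidate relay is available.
        return reward
    best_action = max(next_q_eval, key=next_q_eval.get)  # arg max under theta
    return reward + gamma * next_q_target[best_action]   # value under theta'
```

Because the action is chosen by one network and valued by the other, an overestimated Q value in the target network is not automatically selected.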

In addition, the target network parameter

θ_{t}^{'}

is copied from the evaluation network parameter

θ_{t}

every

ϖ

time steps, which is possible because the two networks have the same structure. Once the target value

y_{t}

is computed, the loss function [51] of the evaluation network is defined as:

L (θ_{t}) = E [{(y_{t} - Q (s_{t}, a_{t}; θ_{t}))}^{2}] .

(17)

Subsequently, the gradient of

L (θ_{t})

can be given by:

▽ L (θ_{t}) = - 2 E [(y_{t} - Q (s_{t}, a_{t}; θ_{t})) ▽ Q (s_{t}, a_{t}; θ_{t})] .

(18)

Using stochastic gradient descent [52], the parameter

θ_{t}

is updated as:

θ_{t + 1} = θ_{t} - α ▽ L (θ_{t}),

(19)

where

α

represents the learning rate (LR).
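To make the update in Equation (19) concrete, the sketch below applies one gradient step to a single sample. It approximates Q by a linear function of a hypothetical feature vector, which is a simplification: the paper uses a fully connected neural network, for which the gradient would be obtained by backpropagation. The gradient is obtained by differentiating the squared error of Equation (17) directly, with the target held fixed.

```python
def sgd_step(theta, features, target, lr):
    """One SGD step on the squared error (target - Q)**2.

    Q is modeled as the dot product of theta and features, so the
    gradient of Q with respect to theta is simply `features`.
    """
    q = sum(t * f for t, f in zip(theta, features))
    error = target - q
    # d/dtheta (target - q)**2 = -2 * error * dq/dtheta
    grad = [-2.0 * error * f for f in features]
    return [t - lr * g for t, g in zip(theta, grad)]
```

When the prediction is below the target, the step moves the parameters so that Q increases, which reduces the loss.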

To obtain better convergence, we employ a dynamic LR rather than a fixed LR. When the Adam optimizer is used, the learning rate can be adjusted with a learning rate decay strategy; fixed decay, exponential decay, and cosine decay are common choices. Our methodology adopts the OneCycleLR strategy. This strategy provides a large learning rate early in training; after a certain amount of training, the learning rate gradually decreases, and in the later stage of training it decays slowly.

Specifically, the OneCycleLR policy is introduced in the proposed SROR. In the OneCycleLR policy, the LR changes after every batch. The maximum LR

a_{\max}

and the percentage of the cycle

ρ

are the important parameters of the OneCycleLR policy: the former is the upper learning rate boundary in the cycle, and the latter is the percentage of the cycle spent increasing the LR. The default value of

ρ

is 0.3.

To get a better idea of how the learning rate varies, Figure 6 shows the curve of the learning rate using the OneCycleLR policy, where the total number of episodes is 20,000,

a_{\max} = 0.9

, and

ρ = 0.3

. As shown in Figure 6, in a cycle (

ρ

× 20,000 = 6000), the learning rate first increases to the maximum learning rate

a_{\max}

as the number of episodes increases from 1 to 6000. Then, the learning rate decays from the maximum learning rate to the initial learning rate as the number of episodes increases from 6001 to 12,000. Finally, the learning rate decays to the minimum value

a_{\min} = 1.0 \times 10^{- 5}

following a cosine curve as the number of episodes increases from 12,001 to 20,000, as shown in Figure 6b. In the OneCycleLR policy, the LR grows quickly in the early stages of training, which helps the model converge faster. In contrast, the LR decays slowly in the later stages of training, which helps avoid over-fitting and improves generalization.
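The three phases described above can be reproduced with a short, dependency-free Python sketch. Several details are assumptions: the text does not state the initial learning rate (the sketch uses a_max / 25 as a placeholder), nor the shape of the first two phases (cosine interpolation is assumed for all three). The function names are hypothetical, and the sketch is not tied to any particular library implementation.

```python
import math


def cosine_anneal(start, end, fraction):
    """Cosine interpolation from `start` (fraction 0) to `end` (fraction 1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * fraction))


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 initial_lr=None, min_lr=1.0e-5):
    """Three-phase schedule: warm up, return to the initial LR, decay to min_lr."""
    if initial_lr is None:
        initial_lr = max_lr / 25.0  # assumption: not specified in the text
    warm_end = pct_start * total_steps        # e.g. 6000 of 20,000
    cool_end = 2.0 * pct_start * total_steps  # e.g. 12,000 of 20,000
    if step <= warm_end:
        return cosine_anneal(initial_lr, max_lr, step / warm_end)
    if step <= cool_end:
        return cosine_anneal(max_lr, initial_lr,
                             (step - warm_end) / (cool_end - warm_end))
    return cosine_anneal(initial_lr, min_lr,
                         (step - cool_end) / (total_steps - cool_end))
```

For example, `one_cycle_lr(6000, 20000, 0.9)` evaluates to the peak value 0.9, and `one_cycle_lr(20000, 20000, 0.9)` to the minimum of 1.0e-5, matching the phase boundaries described for Figure 6.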

6. Simulation Results

Simulation tools, settings, and numerical results are presented in this section to demonstrate the performance of the SROR algorithm in terms of attack success ratio, EED, and PDR. In addition, we discuss the convergence of SROR by analyzing the average reward and cumulative rewards of the agent under different learning rates and exploratory factor policies.

6.1. Simulation Tools and Settings

The code was written in Python 3.9.0. Traffic was simulated with Simulation of Urban MObility (SUMO) version 0.3.0. The simulation parameters are listed in Table 3. Each packet that needs to be transmitted is treated as an agent, and the agent learns the VANET environment. The agent selects an action according to Equation (10) and then adjusts its actions based on the received reward. Ultimately, the optimal action is obtained to improve the routing performance.

In addition, Table 4 lists the values of the RL hyperparameters used in the simulation. A fully connected neural network is adopted, with one input layer, two hidden layers, and one output layer. The input layer has five neurons, consistent with the dimension of the state. The influence of key parameters, such as the learning rate and exploration factor, on the performance of the DRL is analyzed. Moreover, numerical results are averaged over 100 runs to reduce random error. The environment and state are reset at the beginning of each episode, and each episode contains 100 time steps.

6.2. Key Parameter of the OneCycleLR Policy

As described in Section 5,

a_{\max}

is the key parameter of the OneCycleLR policy, and its effect on convergence and reward is analyzed. Figure 7 shows the average reward and cumulative rewards with different

a_{\max}

, where

ρ = 0.3

, and

a_{\min} = 1.0 \times 10^{- 5}

. Other parameters are the default values of the OneCycleLR policy.

Rewards are the agent's direct source of experience: they allow it to improve continually and to achieve its goals independently. The agent judges whether a behavior is good or bad from the reward feedback. Figure 7a shows the average reward with different

a_{\max}

. As Figure 7a shows, the average reward of the agent with

a_{\max} = 0.05

converges after about 7500 episodes, which is the best convergence among the tested values. For

a_{\max} = 0.5

or

a_{\max} = 0.005

, the average reward does not converge until about 11,000 episodes.

Figure 7b shows the cumulative rewards with different

a_{\max}

. As shown in Figure 7b, the cumulative rewards are the highest when

a_{\max} = 0.05

. As can be seen from Figure 7, the convergence is the best when

a_{\max} = 0.05

. Therefore, in our simulations, we set

a_{\max}

to 0.05 to obtain the best performance.

6.3. Effect of Exploratory Factor ε

In the

ε

-greedy policy,

ε

is a very important parameter. Therefore, the effect of

ε

on convergence is measured. Figure 8 shows the average reward and cumulative rewards under three different policies, where

a_{\max} = 0.05

,

ρ = 0.3

, and

a_{\min} = 1.0 \times 10^{- 5}

. These policies are defined as: (1) Dynamic

ε

:

ε = ε_{init} \times 0.99^{episode}

, where

ε_{init}

is the initial value of

ε

and

ε_{init} = 1

; (2) Fixed

ε

, where the

ε

is fixed at 0.5; (3) Fixed

ε

, where

ε

is fixed at 0.9.
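The dynamic policy (1) can be written as a one-line schedule. The function name is hypothetical; the initial value of 1 and the decay base of 0.99 are the values stated above.

```python
def dynamic_epsilon(episode, epsilon_init=1.0, decay=0.99):
    """Exploration factor after `episode` episodes: epsilon_init * decay**episode."""
    return epsilon_init * decay ** episode
```

Under this schedule the agent explores almost purely at random in the first episodes, and ε falls below 0.01 after roughly 460 episodes, so later episodes are dominated by exploitation. The two fixed policies, by contrast, keep exploring at the same rate throughout training.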

Figure 8a shows the average reward of the agent under different policies. As shown in Figure 8a, the average reward does not vary much with the number of episodes when

ε

is fixed. Additionally, when

ε

is fixed, the average reward and cumulative rewards for

ε = 0.5

are higher than those for

ε = 0.9

. In contrast, the average reward increases slowly with the number of episodes when

ε

is dynamic, where

ε = ε_{init} \times 0.99^{episode}

. It converges when the number of episodes reaches about 10,000. In addition, the converged average reward with dynamic

ε

is higher than that with fixed

ε

. The average reward is the lowest when

ε = 0.9

. Figure 8b shows the cumulative rewards of the agent under different policies; it shows the advantages of dynamic

ε

more intuitively compared with Figure 8a.

6.4. Comparison with Benchmarks

To analyze the effectiveness of the proposed SROR, it is compared with three baseline algorithms in terms of PDR, EED, and Attack Success Ratio (ASR). ASR is the ratio of the number of packets dropped by attackers to the number of packets that should have been forwarded. The three baseline algorithms are: (1) Strength Pareto Evolutionary Algorithm (SPEA) [9], (2) Q-learning-based Trustworthy Routing Technique (QL-TRT) [43], and (3) Probability-based Reliable Opportunistic Routing (PROR) [53].
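The ASR definition above amounts to a simple ratio, sketched here in Python. The function and argument names are hypothetical, and returning 0 when no packet was due to be forwarded is an assumption added to avoid division by zero.

```python
def attack_success_ratio(dropped_by_attackers, should_have_been_forwarded):
    """ASR = packets dropped by attackers / packets that should have been forwarded."""
    if should_have_been_forwarded == 0:
        return 0.0  # assumption: no traffic means no successful attack
    return dropped_by_attackers / should_have_been_forwarded
```

For example, 37 packets dropped by attackers out of 100 that should have been forwarded gives an ASR of 0.37; a lower value indicates a more secure protocol.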

6.4.1. Impact of Vehicle Density

Figure 9 shows the simulation results where vehicle density ranges from 10 to 50, and the percentage of malicious vehicles

σ

is fixed at 10%.

As Figure 9a shows, the PDR increases rapidly as the vehicle density increases, because the network becomes better connected. However, the PDR of the algorithms rises at different rates. When vehicle density is low, the PDRs of SROR and PROR are higher than that of SPEA. However, when vehicle density is high, such as 60, the PDR of SPEA is the highest. This phenomenon can be explained as follows: (1) Both the PROR and SROR algorithms belong to opportunistic routing. The network is sparse when vehicle density is low, and opportunistic routing gives full flexibility in selecting the next-hop relay in a sparse network. (2) The SPEA algorithm belongs to geographic routing, which takes the position of nodes into consideration in route discovery and is not suitable for sparse networks. In addition, the PDR of the QL-TRT algorithm is the lowest. The explanation is that QL-TRT only focuses on detecting attacks; the relative velocity and connectivity probability are not taken into consideration in the selection of relay nodes.

Figure 9b compares the EED of the SPEA, QL-TRT, PROR, and SROR algorithms. It is observed that the EED of QL-TRT is the highest. The QL-TRT algorithm belongs to secure routing, and its goal is to detect the BHA and GHA. In contrast, the proposed SROR algorithm outperforms SPEA, QL-TRT, and PROR in terms of EED. For example, when vehicle density is 60, the EED of SROR is about 0.20 s, and the EEDs of SPEA, QL-TRT, and PROR are 0.34 s, 0.58 s, and 0.339 s, respectively.

Figure 9c depicts the attack success ratio of the SPEA, QL-TRT, PROR, and SROR algorithms. As shown in Figure 9c, the ASR of SPEA is similar to that of PROR, and the ASR of SROR is similar to that of QL-TRT. In addition, the ASRs of the SPEA and PROR algorithms are much larger than those of the QL-TRT and SROR algorithms. This is because the SPEA and PROR algorithms do not take malicious nodes into consideration and do nothing to prevent attackers from forwarding packets, which increases the number of packets dropped by the attackers.

In summary, Table 5 lists the average results of each protocol where vehicle density ranges from 10 to 50, and

σ = 10

%. As Table 5 shows, the average PDR of the proposed SROR is 0.82, outperforming QL-TRT, PROR, and SPEA. Additionally, the EED and ASR of SROR are the lowest among the four protocols. These results show that our methodology can improve the PDR, reduce the EED, and lower the ASR.

6.4.2. Impact of the Percentage of Malicious Vehicles

In this simulation, vehicle density is fixed at 40. Figure 10 shows the simulation results where the

σ

ranges from 5% to 30%.

Figure 10a illustrates the PDR of the SPEA, QL-TRT, PROR, and SROR algorithms. It is observed that the PDR decreases with the increase of the

σ

. The reason lies in the increase of attacks when the

σ

increases. More attacks result in more packets being dropped by the attackers, and the PDR ultimately decreases.

In addition, it is observed that the PDR of the SROR algorithm is the highest, while the PDRs of the SPEA and PROR algorithms are the lowest and similar to each other. The PDR of the SPEA and PROR algorithms declines quickly because they do nothing to prevent attackers from forwarding packets. This means they are not robust against attacks.

Figure 10b compares the EED of the SPEA, QL-TRT, PROR, and SROR algorithms. As Figure 10b shows, the EED increases when

σ

increases. This is because, as

σ

increases, the number of packets dropped by the attackers increases. Once the destination fails to receive the packet, the process of route discovery is restarted, which increases the transmission delay.

Furthermore, as Figure 10b shows, the SROR algorithm outperforms SPEA in terms of EED, which is attributed to the detection of attacks. In addition, SROR combines opportunistic and geographic routing, so the EED of the SROR algorithm is very low even when

σ

is high. However, when

σ

exceeds 15%, the EED of the SROR algorithm is slightly higher than that of the QL-TRT algorithm. This is because the Expected Transmission Time (ETT) is taken into consideration in the route discovery of the QL-TRT algorithm.

The numerical results for the attack success ratio are shown in Figure 10c. As shown in Figure 10c, the attack success ratio increases as the

σ

increases. For the PROR and SPEA algorithms, the attack success ratio grows slowly, whereas for the QL-TRT and SROR algorithms it grows quickly when

σ

varies from 5% to 30%. The main reason is that neither the PROR nor the SPEA algorithm detects attacks, so their attack success ratios do not vary much with

σ

. In terms of the attack success ratio, the SROR algorithm achieves a reduction of about 0.62, 0.62, and 0.34 compared with the SPEA, PROR, and QL-TRT algorithms, respectively, when

σ

is 20%.

Table 6 shows the average PDR, EED, and ASR of the four protocols, where vehicle density is fixed at 40, and

σ

ranges from 5% to 30%. As Table 6 shows, the average PDR of the proposed SROR algorithm is up to 0.82 as

σ

ranges from 5% to 30%. Compared with the benchmark protocols, our methodology improves the PDR by 12%. Additionally, the SROR algorithm also lowers the ASR because it prohibits attackers from routing packets. Compared with the QL-TRT protocol, SROR lowers the ASR by about 26%. The SROR algorithm outperforms the SPEA and PROR protocols in terms of EED.

7. Conclusions, Limitations, and Future Directions

In this paper, a secure and reliable opportunistic routing algorithm named SROR is proposed for selecting a stable route path. The proposed methodology addresses the selection of relay nodes in the presence of attackers, whose malicious activities often lead to packet drops. Candidate forwarding nodes are first selected by using three important network metrics, and the SROR algorithm then selects relay nodes from these candidates. In addition, the dynamic exploratory factor and OneCycleLR policy are adopted to improve the reward and achieve better convergence. Simulation experiments show that the SROR algorithm achieves better routing performance than the benchmarks. When the percentage of malicious vehicles ranges from 5% to 30% and the vehicle density is fixed at 40, the SROR algorithm improves the average PDR by 12% and lowers the ASR by about 26% compared with QL-TRT, displaying the best overall performance among the compared protocols. The comparative analysis illustrates the efficiency of the proposed SROR algorithm.

Although the effectiveness of the SROR algorithm was verified, it has some limitations. The main limitations of our research are as follows: (1) this study was confined to identifying BHA and GHA in the routing protocol; (2) this study verified the effectiveness of the SROR algorithm by simulation only, and real-world validation is not discussed.

Therefore, in future work, we plan to consider multiple types of attacks simultaneously in the routing protocol. Additionally, the proposed SROR algorithm will be refined for application to real vehicular networks, and its performance will be further analyzed.

Author Contributions

Methodology, H.X.; Software, H.X.; Validation, H.X.; Writing—original draft, H.X.; Writing—review & editing, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Huzhou Natural Science Foundation, Grant No. 2021YZ20, and the Science and Technology Project of Jiangxi Provincial Department of Education, Grant No. GJJ201818.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following main abbreviations are used in this manuscript:

VANETs	Vehicular Ad-Hoc Networks
IoT	Internet of Things
V2V	Vehicle-to-Vehicle
BHA	Black Hole Attack
GHA	Gray Hole Attack
RL	Reinforcement Learning
CFNs	Candidate Forwarding Nodes
MDP	Markov Decision Process
DRL	Deep Reinforcement Learning
DQN	Deep Q-network
OBU	On-Board Unit
PFR	Packet Forwarding Ratio
PDR	Packet Delivery Ratio
EED	End-to-end Delay
ASR	Attack Success Ratio

References

1. Wang, C.; Jiang, C.; Wang, J.; Shen, S.; Guo, S.; Zhang, P. Blockchain-aided network resource orchestration in intelligent Internet of Things. IEEE Internet Things J. 2023, 10, 6151–6168.
2. Wiseman, Y. Adapting the H.264 Standard to the Internet of Vehicles. Technologies 2023, 11, 103.
3. Al-essa, R.; Al-suhail, G. AFB-GPSR: Adaptive Beaconing Strategy Based on Fuzzy Logic Scheme for Geographical Routing in a Mobile Ad Hoc Network (MANET). Computation 2023, 11, 174.
4. Shen, Y.; Shen, S.; Li, Q.; Zhou, H.; Wu, Z.; Qu, Y. Evolutionary privacy-preserving learning strategies for edge-based IoT data sharing schemes. Digit. Commun. Netw. 2023, 9, 906–919.
5. Mustafa, S.; Khattab, M.; Abdul, K. An Assessment of Ensemble Voting Approaches, Random Forest, and Decision Tree Techniques in Detecting Distributed Denial of Service (DDoS) Attacks. J. Electr. Electron. Eng. 2023, 20, 16–24.
6. Karabulut, M.; Shah, A. Inspecting VANET with Various Critical Aspects—A Systematic Review. Ad Hoc Netw. 2023, 150, 103281.
7. Zhang, X.; Cao, X.; Yan, L.; Sung, D.K. A Street-Centric Opportunistic Routing Protocol Based on Link Correlation for Urban VANETs. IEEE Trans. Mob. Comput. 2016, 15, 1586–1599.
8. Li, G.; Boukhatem, L.; Wu, J. Adaptive Quality-of-Service-Based Routing for Vehicular Ad Hoc Networks with Ant Colony Optimization. IEEE Trans. Veh. Technol. 2017, 66, 3249–3264.
9. Ghorai, C.; Shakhari, S.; Banerjee, I. A SPEA-Based Multimetric Routing Protocol for Intelligent Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2021, 22, 6737–6747.
10. Shen, S.; Wu, X.; Sun, P.; Zhou, H.; Wu, Z.; Yu, S. Optimal privacy preservation strategies with signaling Q-learning for edge-computing-based IoT resource grant systems. Expert Syst. Appl. 2023, 225, 120192.
11. Zhang, P.; Chen, N.; Shen, S.; Yu, S.; Kumar, N.; Hsu, C.H. AI-enabled space-air-ground integrated networks: Management and optimization. IEEE Netw. 2023, 38, 186–192.
12. Sherazi, H.H.R.; Iqbal, R.; Ahmad, F.; Khan, Z.A.; Chaudary, M.H. DDoS Attack Detection: A Key Enabler for Sustainable Communication in Internet of Vehicles. Sustain. Comput. Inform. Syst. 2019, 23, 13–20.
13. Wu, G.; Xu, Z.; Zhang, H.; Shen, S.; Yu, S. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT. J. Parallel Distrib. Comput. 2022, 176, 80–94.
14. Wang, Y.; Li, X.; Zhang, X.; Liu, X.; Weng, J. ARPLR: An All-Round and Highly Privacy-Preserving Location-Based Routing Scheme for VANETs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16558–16575.
15. Khan, A.; Siddiqui, A.A.; Ullah, F.; Bilal, M.; Piran, M.J.; Song, H. VP-CAST: Velocity and Position-Based Broadcast Suppression for VANETs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18512–18525.
16. Ullah, A.; Yao, X.; Shaheen, S.; Ning, H. Advances in Position Based Routing Towards ITS Enabled FoG-Oriented VANET–A Survey. IEEE Trans. Intell. Transp. Syst. 2020, 21, 828–840.
17. Kadadha, M.; Otrok, H.; Barada, H.; Al-Qutayri, M.; Al-Hammadi, Y. A stackelberg game for street-centric QoS-OLSR protocol in urban Vehicular Ad Hoc networks. Veh. Commun. 2018, 13, 64–77.
18. Wang, X.; Weng, Y.; Gao, H. A Low-Latency and Energy-Efficient Multimetric Routing Protocol Based on Network Connectivity in VANET Communication. IEEE Trans. Green Commun. Netw. 2021, 5, 1761–1776.
19. Chen, C.; Liu, L.; Qiu, T.; Yang, K.; Gong, F.; Song, H. ASGR: An Artificial Spider-Web-Based Geographic Routing in Heterogeneous Vehicular Networks. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1604–1620.
20. Khan, Z.; Fan, P.; Fang, S.; Abbas, F. An Unsupervised Cluster-Based VANET-Oriented Evolving Graph (CVoEG) Model and Associated Reliable Routing Scheme. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3844–3859.
21. Abboud, K.; Zhuang, W. Impact of Microscopic Vehicle Mobility on Cluster-Based Routing Overhead in VANETs. IEEE Trans. Veh. Technol. 2015, 64, 5493–5502.
22. Rivoirard, L.; Wahl, M.; Sondi, P. Multipoint Relaying Versus Chain-Branch-Leaf Clustering Performance in Optimized Link State Routing-Based Vehicular Ad Hoc Networks. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1034–1043.
23. Liu, B.; Fang, Z.; Wang, W.; Shao, X.; Wei, W.; Jia, D.; Wang, E.; Xiong, S. A Region-Based Collaborative Management Scheme for Dynamic Clustering in Green VANET. IEEE Trans. Green Commun. Netw. 2022, 6, 1276–1287.
24. Cheng, J.; Yuan, G.; Zhou, M.; Gao, S.; Huang, Z.; Liu, C. A Connectivity-Prediction-Based Dynamic Clustering Model for VANET in an Urban Scene. IEEE Internet Things J. 2020, 7, 8410–8418.
25. Asgharpoor Salkuyeh, M.; Abolhassani, B. An Adaptive Multipath Geographic Routing for Video Transmission in Urban VANETs. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2822–2831.
26. Cao, Y.; Kaiwartya, O.; Aslam, N.; Han, C.; Zhang, X.; Zhuang, Y.; Dianati, M. A Trajectory-Driven Opportunistic Routing Protocol for VCPS. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 2628–2642.
27. Hanssan, U.; Mahmmod, A.; Amin, A. ANN-Based Intelligent Secure Routing Protocol in Vehicular Ad Hoc Networks (VANETs) Using Enhanced AODV. Sensor 2024, 3, 818.
28. Hashem Eiza, M.; Owens, T.; Ni, Q. Secure and Robust Multi-Constrained QoS Aware Routing Algorithm for VANETs. IEEE Trans. Dependable Secur. Comput. 2016, 13, 32–45.
29. Lyu, J.; Chen, C.; Tian, H. Secure Routing Based on Geographic Location for Resisting Blackhole Attack In Three-dimensional VANETs. In Proceedings of the 2020 IEEE/CIC International Conference on Communications in China (ICCC), Chongqing, China, 9–11 August 2020; pp. 1168–1173.
30. Xia, H.; Zhang, S.s.; Li, Y.; Pan, Z.k.; Peng, X.; Cheng, X.z. An Attack-Resistant Trust Inference Model for Securing Routing in Vehicular Ad Hoc Networks. IEEE Trans. Veh. Technol. 2019, 68, 7108–7120.
31. Muzammal, S.M.; Murugesan, R.K.; Jhanjhi, N.Z. A Comprehensive Review on Secure Routing in Internet of Things: Mitigation Methods and Trust-Based Approaches. IEEE Internet Things J. 2021, 8, 4186–4210.
32. Shokrollahi, S.; Dehghan, M. TGRV: A trust-based geographic routing protocol for VANETs. Ad Hoc Netw. 2023, 140, 103062.1–103062.16.
33. Shen, J.; Wang, C.; Castiglione, A.; Liu, D.; Esposito, C. Trustworthiness Evaluation-Based Routing Protocol for Incompletely Predictable Vehicular Ad Hoc Networks. IEEE Trans. Big Data 2022, 8, 48–59.
34. Azizi, M.; Shokrollahi, S. RTRV: An RSU-assisted trust-based routing protocol for VANETs. Ad Hoc Netw. 2024, 154, 103387.
35. Wu, G.; Wang, H.; Zhang, H.; Zhao, Y.; Yu, S.; Shen, S. Computation offloading method using stochastic games for software-defined-network-based multiagent mobile edge computing. IEEE Internet Things J. 2023, 10, 17620–17634.
36. Lu, C.; Wang, Z.; Ding, W.; Li, G.; Liu, S.; Cheng, L. MARVEL: Multi-agent reinforcement learning for VANET delay minimization. China Commun. 2021, 18, 1–11.
37. Jiang, S.; Huang, Z.; Ji, Y. Adaptive UAV-Assisted Geographic Routing with Q-Learning in VANET. IEEE Commun. Lett. 2021, 25, 1358–1362.
38. Luo, L.; Sheng, L.; Yu, H.; Sun, G. Intersection-Based V2X Routing via Reinforcement Learning in Vehicular Ad Hoc Networks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5446–5459.
39. Li, Z.; Li, Y.; Wang, W. Deep reinforcement learning-based collaborative routing algorithm for clustered MANETs. China Commun. 2023, 20, 185–200.
40. Shen, S.; Xie, L.; Zhang, Y.; Wu, G.; Zhang, H.; Yu, S. Joint Differential Game and Double Deep Q-Networks for Suppressing Malware Spread in Industrial Internet of Things. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5302–5315.
41. Wu, Y.; Wu, J.; Chen, L.; Yan, J.; Han, Y. Load Balance Guaranteed Vehicle-to-Vehicle Computation Offloading for Min-Max Fairness in VANETs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 11994–12013.
42. Zhang, T.; Xu, C.; Zhang, B.; Shen, J.; Kuang, X.; Grieco, L.A. Toward Attack-Resistant Route Mutation for VANETs: An Online and Adaptive Multiagent Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23254–23267.
43. Mianji, E.M.; Muntean, G.M.; Tal, I. Trustworthy Routing in VANET: A Q-learning Approach to Protect Against Black Hole and Gray Hole Attacks. In Proceedings of the 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, 20–23 June 2023; pp. 1–6.
44. Stępień, K.; Poniszewska-Marańda, A. Security methods against Black Hole attacks in Vehicular Ad-Hoc Network. In Proceedings of the 2020 IEEE 19th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 24–27 November 2020; pp. 1–4.
45. Ullah, I.; Noor, A.; Aslam, G. Protecting IoT devices from security attacks using effective decision-making strategy of appropriate features. J. Supercomput. 2024, 80, 5870–5899.
46. Huang, J.J. Accurate Probability Distribution of Rehealing Delay in Sparse VANETs. IEEE Commun. Lett. 2015, 19, 1193–1196.
47. Zhou, M.; Liu, L.; Sun, Y.; Wang, K.; Dong, M.; Atiquzzaman, M.; Dustdar, S. On Vehicular Ad-Hoc Networks with Full-Duplex Radios: An End-to-End Delay Perspective. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10912–10922.
48. Gao, H.; Liu, C.; Li, Y.; Yang, X. V2VR: Reliable Hybrid-Network-Oriented V2V Data Transmission and Routing Considering RSUs and Connectivity Probability. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3533–3546.
49. Wisitpongphan, N.; Bai, F.; Mudalige, P.; Sadekar, V.; Tonguz, O. Routing in Sparse Vehicular Ad Hoc Wireless Networks. IEEE J. Sel. Areas Commun. 2007, 25, 1538–1556.
50. He, J.; Cai, L.; Pan, J.; Cheng, P. Delay Analysis and Routing for Two-Dimensional VANETs Using Carry-and-Forward Mechanism. IEEE Trans. Mob. Comput. 2017, 16, 1830–1841.
51. Lee, C.; Jung, J.; Chung, J.M. Intelligent Dual Active Protocol Stack Handover Based on Double DQN Deep Reinforcement Learning for 5G mmWave Networks. IEEE Trans. Veh. Technol. 2022, 71, 7572–7584.
52. Liu, G.; Lv, S.; Wang, C.; Li, X.; Nai, W. Surface Material Classification Based on Unbalanced Visual and Haptic Data: A Double-DQN Method. IEEE Trans. Instrum. Meas. 2023, 72, 5006712.
53. Ning, L.; Jose-Fernan, M.O.; Hernandez, D.V.; Sanchez, F.J.A. Probability Prediction-Based Reliable and Efficient Opportunistic Routing Algorithm for VANETs. IEEE/ACM Trans. Netw. 2018, 26, 1933–1947.

Figure 1. A typical network structure of VANETs.

Figure 2. System model.

Figure 3. The idea of opportunistic routing.

Figure 4. An m hops path from a source node to its destination.

Figure 5. Structure of SROR.

Figure 6. Learning rate for OneCycleLR policy: (a) from episode 1 to 20,000; (b) from episode 12,000 to 20,000.

Figure 7. Average reward and cumulative rewards for the OneCycleLR policy. (a) Average reward. (b) Cumulative rewards.

Figure 8. Effect of $ε$ on reward. (a) Average reward. (b) Cumulative rewards.

Figure 9. Comparison of the performance under different vehicle densities. (a) Packet delivery ratio. (b) End-to-end delay. (c) Attack success ratio.

Figure 10. Comparison of the performance under different $σ$. (a) Packet delivery ratio. (b) End-to-end delay. (c) Attack success ratio.

Table 1. List of symbols.

Notation	Description	Unit
$ϑ_{i}$	i-th vehicle	–
$υ_{i}$	Velocity of $ϑ_{i}$	m/s
$(x_{i}, y_{i})$	Coordinates of $ϑ_{i}$	m
$D_{i, j}$	Distance between $ϑ_{i}$ and $ϑ_{j}$	m
$l_{i, j}$	Link between $ϑ_{i}$ and $ϑ_{j}$	–
$T_{i, j}^{to}$	Total delay of a packet transmitted over link $l_{i, j}$	s
$ν_{i, j}$	Relative velocity of $ϑ_{i}$ with respect to $ϑ_{j}$	m/s
$PFR_{j}$	Packet forwarding ratio of vehicle $ϑ_{j}$	–
$P_{j}$	Probability that $ϑ_{j}$ is connected to $ϑ_{i}$	–

Table 2. The basic components of the DRL model.

Basic Components	System
Learning environment	VANETs
Agent	Each packet that needs to be transmitted
State of the agent	Union of the Node-HP and its CFNs set
Action	Selection of a relay node

Table 3. Simulation parameters.

Parameter	Default Value
Simulation area	6 km × 6 km
Communication protocol	IEEE 802.11p
Packet size	512 bytes
Packets transmitted per second	10
Vehicle density	10–50 vehicles/km²
Number of directions	2
Number of lanes	2 in each direction
Traffic type	Constant bit rate
Vehicle velocity	5–25 m/s
Transmission range	250 m
Mobility model of vehicles	Random waypoint

Table 4. Hyperparameters.

Hyperparameter	Default Value
Number of episodes	20,000
Number of time slots	100
Learning rate	OneCycleLR policy
Initial exploration factor	1
Discount factor	0.99
$R_{1}$, $R_{2}$	1, 0.7
Replay buffer size	10,000
Minimum batch size	32
Optimizer	Adam
Activation function	ReLU
Loss function	Equation (17)
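
The settings in Table 4 could be collected in code roughly as follows. This is a minimal sketch, not the authors' implementation: the class name `Hyperparameters`, its field names, and the helper `select_relay` are hypothetical, and the helper assumes the exploration factor $ε$ is applied in the usual ε-greedy manner over the Q-values of the candidate forwarding nodes. The learning-rate schedule and any decay of $ε$ are left out because the table does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 4; the field names are hypothetical."""
    episodes: int = 20_000
    time_slots: int = 100
    initial_epsilon: float = 1.0
    discount_factor: float = 0.99
    reward_r1: float = 1.0
    reward_r2: float = 0.7
    replay_buffer_size: int = 10_000
    batch_size: int = 32


def select_relay(q_values, epsilon, rng=random):
    """Return the index of a candidate relay chosen epsilon-greedily.

    Assumption: q_values[k] is the estimated Q-value of the k-th
    candidate forwarding node for the current state.
    """
    if not q_values:
        raise ValueError("candidate set is empty")
    if rng.random() < epsilon:
        # Explore: pick any candidate uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the candidate with the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon` equal to the initial value of 1, every choice is exploratory; as `epsilon` approaches 0, the selection converges to the candidate with the highest estimated Q-value.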

Table 5. PDR, EED, and ASR performance under different vehicle densities.

Protocol	Packet Delivery Ratio	End-to-End Delay (s)	Attack Success Ratio
SPEA	0.80	0.28	0.78
QL-TRT	0.73	0.37	0.42
PROR	0.81	0.28	0.80
SROR	0.82	0.26	0.37
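
To make the gaps in Table 5 easier to read, the relative difference of SROR with respect to each benchmark can be computed as below. This is an illustrative sketch only: the dictionary `TABLE5` simply restates the table, and the helper `relative_change` is a hypothetical name, not something defined in the paper.

```python
# Rows of Table 5: (packet delivery ratio, end-to-end delay in s, attack success ratio)
TABLE5 = {
    "SPEA": (0.80, 0.28, 0.78),
    "QL-TRT": (0.73, 0.37, 0.42),
    "PROR": (0.81, 0.28, 0.80),
    "SROR": (0.82, 0.26, 0.37),
}


def relative_change(value, baseline):
    """Signed change of `value` relative to `baseline` (e.g. -0.10 means 10% lower)."""
    return (value - baseline) / baseline


def compare_to_sror(table=TABLE5):
    """Relative change of SROR against every other protocol, per metric."""
    sror = table["SROR"]
    return {
        name: tuple(relative_change(s, b) for s, b in zip(sror, row))
        for name, row in table.items()
        if name != "SROR"
    }


if __name__ == "__main__":
    for name, (pdr, eed, asr) in compare_to_sror().items():
        print(f"vs {name}: PDR {pdr:+.1%}, EED {eed:+.1%}, ASR {asr:+.1%}")
```

For example, the attack success ratio of SROR (0.37) is about 12% lower than that of QL-TRT (0.42), the closest benchmark on that metric.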

Table 6. PDR, EED, and ASR performance under different percentages of malicious vehicles.

Protocol	Packet Delivery Ratio	End-to-End Delay (s)	Attack Success Ratio
SPEA	0.73	0.56	0.84
QL-TRT	0.72	0.44	0.58
PROR	0.72	0.48	0.84
SROR	0.82	0.44	0.46

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

6.3. Effect of Exploratory Factor $ε$