Electronics
  • Article
  • Open Access

26 December 2024

Enhancing VANET Security: An Unsupervised Learning Approach for Mitigating False Information Attacks in VANETs

Department of Computer Science, Oklahoma State University, Stillwater, OK 74078, USA
This article belongs to the Special Issue Machine Learning and Cybersecurity—Trends and Future Challenges

Abstract

Vehicular ad hoc networks (VANETs) enable communication among vehicles and between vehicles and infrastructure to provide safety and comfort to the users. Malicious nodes in VANETs may broadcast false information to create the impression of a fake event or road congestion. In addition, several malicious nodes may collude to collectively launch a false information attack to increase the credibility of the attack. Detection of these attacks is critical to mitigate the potential risks they bring to the safety of users. Existing techniques for detecting false information attacks in VANETs use different approaches such as machine learning, blockchain, trust scores, statistical methods, etc. These techniques rely on historical information about vehicles, artificial data used to train the technique, or coordination among vehicles. To address these limitations, we propose a false information attack detection technique for VANETs using an unsupervised anomaly detection approach. The objective of the proposed technique is to detect false information attacks based on only real-time characteristics of the network, achieving high accuracy and low processing delay. The performance evaluation results show that our proposed technique offers 30% lower data processing delay and a 17% lower false positive rate compared to existing approaches in scenarios with high proportions of malicious nodes.

1. Introduction

Vehicular ad hoc networks (VANETs) integrate vehicles, infrastructure, and other road users into a network that enables communication and information exchange among them. These communication capabilities have contributed to various advancements, such as intelligent transportation systems and autonomous driving. VANETs enable efficient traffic management, offering safety and comfort to vehicles and road users. VANETs can also support the dissemination of information about emergency events. In the VANET architecture, the roadside infrastructure typically comprises roadside units (RSUs) deployed at specific points alongside the roads. The vehicles communicate with each other and with the infrastructure using an on-board unit (OBU) installed in the vehicle [,]. The vehicles connected to the network periodically transmit basic safety messages, also called beacon messages. These beacon messages contain information on real-time parameters of vehicles, such as speed, location, acceleration, etc. The vehicles also transmit event-based messages about specific events, such as road congestion or crashes [].
There can be potential malicious vehicles in VANETs that broadcast false information to create impressions of a fake event or road congestion for their selfish benefits. A malicious vehicle may broadcast false information individually, or there may be collusion among attackers, where they collectively launch a false information attack to increase the credibility of the attack []. The security of VANETs from the malicious activities of these nodes is vital considering the possible effects on public safety. However, ensuring the security of VANETs faces challenges arising from the wireless communication involved and the distinctive features of these networks, such as high mobility, intermittent connectivity, changing topology, difficulty in trust evaluation, validation of information shared by the nodes, etc. [,]. Despite these challenges, securing VANETs from false information attacks is essential as such attacks may lead to hazardous consequences due to the actions of vehicles in response to receiving false information [].
The existing techniques in the literature for detecting false information attacks in VANETs use various approaches such as machine learning, blockchain, trust scores, statistical methods, etc. These techniques are computationally expensive, leading to high processing delays, or they depend on roadside infrastructures or coordination among vehicles. However, intermittent connectivity in VANETs may impact the accurate and timely detection of attacks in a coordinated environment due to the loss of connectivity or messages between the coordinating vehicles. Some techniques (discussed in Section 2) use historical information about vehicles in the detection process. Some other techniques (discussed in Section 2) use artificial data to train machine learning or statistical models used in the detection. It is not always feasible to have historical information about vehicles, as in the highly dynamic scenario of VANETs new vehicles may join the network on the fly. Moreover, the use of historical data leads to higher processing delays. Similarly, if artificially generated data do not correspond to the real scenario where a technique needs to work, the detection accuracy can be reduced.
The RSU-based Online Intrusion Detection and Mitigation (RSUOIDM) technique proposed in [] uses historical data in the communication range of RSUs to train a non-parametric anomaly detection model. The RSUs use their trained models to detect false information in newly received information by comparing it with the detection model and computing an anomaly score. The reliance on historical data incurs an initialization delay for the technique. Moreover, if the traffic pattern changes in the communication range of an RSU from the time when the historical data were collected to the time of evaluation, the detection accuracy of this method is reduced. An unsupervised machine learning-based approach is adopted in [] that uses data clustering for false information detection. In this Data Clustering-based False Information Detection (DCFID) technique, the vehicles are clustered into two distinct groups, benign and malicious, based on the similarities in the information transmitted in the beacon messages from the vehicles. This technique offers high detection accuracy. However, it works under the strict assumption that all the malicious vehicles transmit the same false information in their beacon messages; though in real scenarios there may be more than one group of malicious nodes, each group transmitting different false information.
To address the limitations of the existing approaches, we propose a false information attack detection technique using an unsupervised anomaly detection approach. Specifically, we use the distance-based anomaly detection technique in our framework to detect false information based on the real-time characteristics of the network. The proposed technique can be used independently by a vehicle without relying on roadside infrastructure or other vehicles to detect false information broadcast by other vehicles in its communication range. We use an approximation technique in the anomaly detection process and avoid multiple scans over the data to reduce the processing delay. The proposed technique enables fast and accurate detection of false information attacks and offers 30% lower data processing delay and a 17% lower false positive rate compared to the state-of-the-art techniques [,], as validated by the performance evaluation results.
The novelty of the proposed technique comes from the detection of false information attacks based on only real-time characteristics of the network, without depending on either historical or artificial data, or coordination among vehicles. The motivations for the proposed technique are to achieve high detection accuracy and a low data processing delay in false information detection at a high proportion of malicious nodes. The performance of the proposed technique is evaluated using the SUMO and OMNET++ simulators considering up to 40% of malicious nodes in the network. The results show that the proposed technique meets our objectives, offering 30% lower data processing delay and a 17% lower false positive rate in scenarios with high proportions of malicious nodes.
The following are the contributions of this paper:
  • We propose a technique for detecting false information attacks in VANETs using distance-based anomaly detection.
  • We optimize the detection process by using approximations in anomaly detection and by avoiding multiple scans over the data to reduce data processing delays. We design an algorithm for this approximation using data binning.
  • We perform extensive simulations to evaluate the performance of the proposed technique in urban and highway scenarios.
The rest of this paper is organized as follows: Section 2 discusses the recent related work in false information detection in VANETs; Section 3 explains the details of the proposed technique; Section 4 discusses the performance evaluation results; and Section 5 provides directions for future work and conclusions.

3. The Proposed Technique

We discuss the details of our proposed false information detection technique in this section. The concept of distance-based anomalies and the attack model considered in this work are introduced first.
Distance-based anomaly: A data point in a dataset is a distance-based anomaly or distance-based outlier [] if the point does not have at least k other points within a distance R, for user-defined parameters k and R. An example of a distance-based anomaly in two-dimensional space is shown in Figure 1. In this example, the point p is a distance-based anomaly if the value of the parameter k is specified as 4, i.e., k = 4. Here, p has only three other points, excluding p itself, within the specified distance R from p, i.e., the region bounded by the green circle with its center at p. As p has fewer than four (since k = 4) points within the distance R, it is a distance-based anomaly for k = 4. If the value of k is specified to be 3, p would not be a distance-based anomaly for the same distance R, as p has three other points excluding itself within the green circle.
Figure 1. An example of a distance-based anomaly.
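As an illustration (not part of the original formulation), the following minimal Python sketch checks the distance-based anomaly condition for one-dimensional values such as speeds; the function name and the example values are hypothetical.

from typing import Sequence

def is_distance_based_anomaly(values: Sequence[float], idx: int, k: int, R: float) -> bool:
    # A point is a distance-based anomaly if it has fewer than k OTHER points
    # within distance R of it.
    p = values[idx]
    neighbor_count = sum(1 for j, q in enumerate(values) if j != idx and abs(p - q) <= R)
    return neighbor_count < k

# Example: the last value is far from the rest, so it has no neighbors within
# R = 10 and is flagged as an anomaly for k = 4.
speeds = [62.0, 60.5, 61.2, 63.0, 95.0]
print(is_distance_based_anomaly(speeds, 4, k=4, R=10.0))  # True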
Attack model: A malicious node broadcasts a speed value lower than its real speed in the beacon messages [] to create the illusion of traffic congestion or an emergency event such as an accident. A vehicle analyzes beacon messages received from all the vehicles in its communication range to detect false information. It is assumed that the majority of vehicles are honest, as considered in the literature [,].
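For illustration only, the beacon content relevant to this attack model can be sketched in Python as follows; the Beacon class, its field names, and the 20–30 kph fake-speed range (taken from the urban simulation scenario in Section 4.1) are assumptions made for the sketch, not a prescribed message format.

import random
from dataclasses import dataclass

@dataclass
class Beacon:
    vehicle_id: int
    speed: float  # reported speed in km/h

def make_beacon(vehicle_id: int, real_speed: float, malicious: bool,
                fake_speed_range=(20.0, 30.0)) -> Beacon:
    # An honest node reports its real speed; a malicious node under-reports
    # its speed to create the illusion of congestion or an emergency event.
    reported = random.uniform(*fake_speed_range) if malicious else real_speed
    return Beacon(vehicle_id, reported)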

3.1. Overview of the Proposed Technique

In the proposed false information detection technique, a vehicle evaluates the speed values received from all other vehicles in its communication range to identify the malicious vehicles. A vehicle maintains a node list to store the vehicle ID and speed value broadcast by each node. The evaluator vehicle can evaluate the nodes in the node list based on their speed values on demand to detect any potential false information. Vehicles in a region travel at similar speeds as they are in the same traffic conditions and are influenced by the moving patterns of others. Therefore, if a vehicle reports a significantly different speed value compared to the other vehicles in a region, the vehicle is identified as malicious. An evaluating vehicle uses distance-based anomaly detection to find these substantially different speed values by analyzing the beacon messages received from all the vehicles in its communication range in real time. The two parameters k and R required for distance-based anomaly detection (as illustrated in Figure 1) need to be specified.
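A minimal sketch of the node list maintained by an evaluator vehicle is given below; it reuses the hypothetical Beacon class from the previous sketch and simply keeps the most recently reported speed per vehicle ID.

from typing import Dict

node_list: Dict[int, float] = {}  # vehicle ID -> latest reported speed (km/h)

def on_beacon_received(beacon: Beacon) -> None:
    # Overwrite any previous entry so the node list reflects real-time reports only.
    node_list[beacon.vehicle_id] = beacon.speed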

Parameter Selection for Anomaly Detection

The assumptions of our framework are as follows: we assume that the majority of vehicles are honest, and hence the value of k is set to half the number of nodes in the node list. That is, the speed of an honest vehicle is expected to be similar to that of at least half of the nodes in the node list. The distance parameter R, which defines the similarity range, can be adjusted based on the travel scenario. For example, in a highway scenario, a speed difference of up to 10 km per hour can be considered similar, which results in R = 10.
To count the number of points within the specified distance R, referred to hereafter as the neighbor count, an approximation strategy is adopted using a data binning technique that reduces the processing delay. The bin information is evaluated in the next phase to detect malicious nodes. The overall approach is shown in Figure 2. We discuss the data binning and evaluation phases in the next two subsections.
Figure 2. The overall approach of the proposed false information detection technique.

3.2. Data Binning Phase

Distance-based anomaly detection requires computation of all pair-wise distances between the speed values in a node list, which is computationally expensive. To address this issue, the data binning technique is used, which avoids computing all pair-wise distances between the speed values. The binning also benefits the evaluation phase as the binned data are used to collectively evaluate a group of nodes instead of evaluating them one by one.
Each bin contains a node count and a list of nodes belonging to the bin. While scanning each (ID, speed) pair in the node list N_l, the bins are dynamically created based on the speed values of the nodes. For each node in the list, the bin index bin_ind is computed as follows:
bin_ind = ceiling(speed / bin_wid)        (1)
Here, bin_wid (< R) is the bin width parameter, which is discussed in the next subsection. After computing bin_ind, the corresponding bin is created if it does not already exist and the node count for the bin is initialized to one. If the bin already exists, the node count is increased by one. In either case, the node ID is added to the list of node IDs for the bin.
The bin information is used in the evaluation phase to detect the malicious nodes without scanning the node list again. Due to the similarity in the speed values of vehicles in a region, the number of bins created is much smaller compared to the number of nodes in the node list. Therefore, evaluating only the bins reduces the processing time of the evaluation phase, which is discussed in Section 3.3.
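A minimal Python sketch of the binning phase is shown below. It assumes the node list is a dictionary mapping vehicle IDs to their reported speeds; a bin is represented by the list of vehicle IDs assigned to it, so its node count is simply the length of that list.

import math
from collections import defaultdict
from typing import Dict, List

def bin_nodes(node_list: Dict[int, float], bin_wid: float) -> Dict[int, List[int]]:
    # Data binning phase: a single scan over the node list assigns each node
    # to a bin according to Equation (1).
    bins: Dict[int, List[int]] = defaultdict(list)
    for vehicle_id, speed in node_list.items():
        bin_ind = math.ceil(speed / bin_wid)  # Equation (1)
        bins[bin_ind].append(vehicle_id)
    return dict(bins)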

3.3. Evaluation Phase

In the evaluation phase of the proposed technique, an approximation of the neighbor count is made for all the nodes by inspecting the bins. This approximation is made for all the nodes in a bin together, and they are collectively evaluated for maliciousness. If the node count for a bin is at least half of the total number of nodes in the node list, the node IDs in that bin are determined to be honest, since the speeds of all these nodes are within a distance R (as bin_wid < R). Otherwise, the neighbor count is approximated from the adjacent bins. The number of bins, num_bins, used for this approximation is kept small to reduce the processing time. Also, num_bins is considered an odd number for simplicity. The value of the bin width parameter bin_wid is computed based on R and num_bins from Equation (2), the rationale for which is explained in Example 1.
2R = num_bins × bin_wid        (2)
Example 1: Say we want to approximate the neighbor count for the nodes in the i-th bin b_i in Figure 3 with three bins. We can estimate the neighbor count with reference to the midpoint of the bin, i.e., the distance R is extended on either side of the midpoint, as shown in the figure, which leads to 2R = 3 × bin_wid. This is generalized as Equation (2). It can be observed that for the bins at both ends, all possible bins may not exist. The same may happen for bins in the middle, as the bins are dynamically created based on the observed speed values. If any bin does not exist, the neighbor count is approximated with only the bins that exist within the specified distance. Further, depending on the speed values of the nodes in the node list, the best approximation of the neighbor count may not be obtained by considering an equal number of bins on either side. For instance, the neighbor count for b_i in Figure 3 can be obtained by considering one of three possible combinations: (b_{i−2}, b_{i−1}, b_i), (b_{i−1}, b_i, b_{i+1}), (b_i, b_{i+1}, b_{i+2}).
Figure 3. An example of approximating the neighbor count for bin b_i with three bins.
The combination of bins that gives the maximum neighbor count is chosen, so that honest nodes are not incorrectly detected as malicious. This does not benefit the malicious nodes, as they broadcast significantly different speed values compared to their real speed to create illusions of a fake event. Hence, even after considering the maximum neighbor count, they do not meet the evaluation criterion of having sufficient neighbors to be classified as honest. Moreover, as the number of bins considered for approximating the neighbor count is small, the number of combinations of bins to be evaluated is also small. Further, once a combination satisfies the criterion to be evaluated as honest, i.e., the neighbor count reaches at least half the number of nodes in the node list, the remaining combinations of bins need not be evaluated. As such, the evaluation of all combinations of bins does not have a significant impact on the processing time. The overall process of neighbor count approximation for any bin b_i is outlined in Algorithm 1.
Algorithm 1: Neighbor Count Approximation Algorithm
Input: List of bins B_l, bin index i, number of bins num_bins, size of node list N_l.size()
Output: Neighbor count neighbor_count
1:  neighbor_count = 0
2:  for j = i − num_bins + 1 to i
3:      current_count = 0
4:      for k = j to j + num_bins − 1
5:          if bin number k exists in B_l then
6:              current_count = current_count + node_count for bin k
7:          else
8:              continue
9:          end if
10:     end for
11:     if current_count > neighbor_count then
12:         neighbor_count = current_count
13:     end if
14:     if neighbor_count ≥ N_l.size()/2 then
15:         go to step 18
16:     end if
17: end for
18: return neighbor_count
19: end
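The following Python sketch mirrors Algorithm 1, assuming the bin representation from the binning sketch in Section 3.2 (a dictionary mapping bin indices to lists of node IDs). It slides a window of num_bins consecutive bins over all positions that contain bin i, keeps the largest total, and exits early once the honest-node criterion is met.

from typing import Dict, List

def approximate_neighbor_count(bins: Dict[int, List[int]], i: int,
                               num_bins: int, node_list_size: int) -> int:
    # Consider every window of num_bins consecutive bins that contains bin i
    # and keep the largest total node count as the approximate neighbor count.
    neighbor_count = 0
    for j in range(i - num_bins + 1, i + 1):
        current_count = sum(len(bins.get(b, [])) for b in range(j, j + num_bins))
        neighbor_count = max(neighbor_count, current_count)
        if neighbor_count >= node_list_size / 2:
            break  # already enough neighbors to be classified as honest
    return neighbor_count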
After approximating the neighbor count, if the count for a bin is less than half the number of nodes in the node list, the node IDs belonging to that bin are marked as malicious. All the bins are evaluated in this manner, and the detected malicious nodes are added to a malicious node list M_l, which constitutes the output of the detection algorithm. Our proposed technique can be used to detect collusion among malicious nodes by observing the bins that have a substantial neighbor count but do not meet the requirement to be classified as honest. The overall procedure of the proposed technique is shown in Algorithm 2.
Algorithm 2: False Information Detection Algorithm
Input: Node list N_l, distance R, number of bins num_bins
Output: Malicious node list M_l
1:  Compute bin_wid using Equation (2)
2:  for each (ID, speed) pair in N_l                  // data binning starts
3:      Compute bin_ind using Equation (1)
4:      if bin number bin_ind exists then
5:          Increment node_count for bin number bin_ind
6:      else
7:          Create bin number bin_ind
8:          Initialize node_count for bin number bin_ind to 1
9:      end if
10:     Add ID to node_list for bin number bin_ind
11: end for                                           // data binning ends
12: for each bin b_i                                  // evaluation phase starts
13:     if node_count for b_i ≥ N_l.size()/2 then
14:         continue
15:     else
16:         Approximate neighbor_count using Algorithm 1
17:         if neighbor_count ≥ N_l.size()/2 then
18:             continue
19:         else
20:             Add all IDs in node_list of b_i to M_l
21:         end if
22:     end if
23: end for                                           // evaluation phase ends
24: Output M_l
25: end
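A Python sketch of the complete detection procedure in Algorithm 2 is given below; it reuses the bin_nodes and approximate_neighbor_count sketches above and assumes the node list is a dictionary of vehicle IDs to reported speeds.

from typing import Dict, List

def detect_false_information(node_list: Dict[int, float], R: float,
                             num_bins: int = 3) -> List[int]:
    bin_wid = 2 * R / num_bins                     # Equation (2)
    bins = bin_nodes(node_list, bin_wid)           # data binning phase
    half = len(node_list) / 2
    malicious: List[int] = []
    for i, ids in bins.items():                    # evaluation phase
        if len(ids) >= half:
            continue  # the bin alone holds at least half the nodes: honest
        if approximate_neighbor_count(bins, i, num_bins, len(node_list)) >= half:
            continue  # enough neighbors in adjacent bins: honest
        malicious.extend(ids)                      # insufficient neighbors: malicious
    return malicious

For instance, malicious nodes reporting 20–30 kph amid honest reports of 45–65 kph fall into low-speed bins whose windows never reach half the node list, so their IDs end up in the returned list.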
The binning of the node list data and the subsequent evaluation of the binned data enhance the scalability of the proposed technique. It may be noted that the binning of the node list data is carried out by scanning the node list only once. Therefore, even when the number of nodes in the network increases, there is only a marginal increase in the processing time of the data binning phase. Once binning is completed, each bin is collectively evaluated in the evaluation phase. Even when the number of nodes increases significantly, the number of bins does not increase due to the similar speed values of vehicles; only the node counts of the bins increase. Thereafter, as the nodes in each bin are collectively evaluated, the processing time of the evaluation phase does not increase, which maintains the scalability of the proposed technique to larger networks.

3.4. Time Complexity Analysis of Algorithm 2

This subsection analyzes the time complexity of Algorithm 2. The time complexity of the data binning phase is O(n_n), where n_n is the number of nodes in the node list N_l. For the evaluation phase, the worst-case time complexity of computing the neighbor counts for the bins using Algorithm 1 is O(n_b^3), where n_b is the number of bins. However, as the number of bins is significantly smaller than the number of nodes in the node list N_l (n_b ≪ n_n), the time complexity of Algorithm 2 is dominated by that of the data binning phase, which is O(n_n). Hence, the overall time complexity of the proposed Algorithm 2 is O(n_n). This linear time complexity of the algorithm contributes to a low data processing delay in false information detection.

4. Experimental Results

The performance evaluation of the proposed technique is discussed in this section, with the simulation setup, performance metrics used, and the results obtained. As stated in Section 1, the performance of the proposed technique is compared with the RSUOIDM [] and DCFID [] techniques.

4.1. Simulation Setup

The performance evaluation of the proposed technique is carried out in both urban and highway scenarios. The simulations are carried out on a desktop computer with the Ubuntu 22.04.3 LTS operating system, equipped with an Intel 8th Gen i5-8400 Hexa-Core Processor running at 4 GHz, 8 GB of DDR4 RAM, and an NVIDIA GeForce GTX 1050Ti GPU. We use the Veins framework [] based on the SUMO and OMNET++ simulators for our simulation study. SUMO generates traces of vehicle movements, such as speed, location, acceleration, etc. SUMO also supports OpenStreetMap to import real-world road networks for generating simulation scenarios. The communication between vehicles is established using OMNET++, which is also used to measure network performance. Veins bi-directionally couples SUMO and OMNET++ to facilitate online network simulation. To evaluate the performance of our proposed technique, we import two maps of the city of Stillwater, Oklahoma, United States of America: one for the urban scenario and one for the highway scenario. In the urban scenario, vehicles have lower mobility and travel at a speed of 45–65 kph. The vehicles in the highway scenario travel at a speed of 80–110 kph. The honest vehicles in the simulation broadcast their actual speed in the beacon messages, while the malicious vehicles broadcast significantly lower speed values to resemble the situation of fake road congestion. The malicious vehicles in the urban scenario broadcast false speeds in the 20–30 kph range and the malicious vehicles in the highway scenario broadcast false speeds in the 35–50 kph range. We consider 500 vehicles in our simulation and measure the false information detection performance of the proposed technique by varying the proportion of malicious nodes in the 10–40% range. The values of the parameters used in the simulation are shown in Table 2 below.
Table 2. Parameter values used in simulation.

4.2. Performance Metrics

We use the following commonly used metrics to evaluate the performance of the proposed technique.
Data processing time: The time required for an evaluator node or RSU to evaluate the beacon message information to detect malicious nodes.
Accuracy: The fraction of correctly classified (honest and malicious) nodes out of the total number of nodes evaluated.
Precision: The fraction of correctly detected malicious nodes out of the total number of nodes detected as malicious.
Recall: The fraction of correctly detected malicious nodes out of the total number of actual malicious nodes.
F1 score: The harmonic mean of precision and recall that evenly expresses precision and recall in one metric.
False positive rate (FPR): The fraction of honest nodes incorrectly detected as malicious nodes.
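These metrics follow their standard definitions; as a reference sketch (with "malicious" treated as the positive class), they can be computed from confusion-matrix counts as follows.

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    # tp/fn: malicious nodes detected/missed; fp/tn: honest nodes flagged/cleared.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr}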

4.3. Results for the Urban Scenario

The performance evaluation results for the urban scenario are discussed in this subsection with Figure 4.
Figure 4. Results for the urban scenario: (a) Data processing time vs. percentage of malicious nodes; (b) accuracy vs. percentage of malicious nodes; (c) precision vs. percentage of malicious nodes; (d) recall vs. percentage of malicious nodes; (e) F1 score vs. percentage of malicious nodes; (f) FPR vs. percentage of malicious nodes.
Data Processing Time: As observed from Figure 4a, the data processing time of the proposed technique is on average 29% lower than that of the RSUOIDM and DCFID techniques in the urban scenario, owing to the collective evaluation of binned data in our technique (Section 3.3). The data processing times of all the methods remain consistent as the percentage of malicious nodes increases because the number of beacon messages remains the same irrespective of the percentage of malicious nodes. The RSUOIDM technique evaluates the beacon messages from vehicles individually, leading to higher processing times. The clustering process of the DCFID technique also leads to higher processing times, even though the vehicles are evaluated collectively after clustering. Our approach performs the binning process using only a single scan over the node list and then collectively evaluates the binned data.
Accuracy: The accuracy of the proposed technique remains largely consistent in the urban scenario as the proportion of malicious nodes increases, decreasing only slightly when the proportion of malicious nodes exceeds 30%, as seen in Figure 4b. The anomaly detection approach in our proposed technique correctly classifies almost all the nodes using the binning method to achieve this high accuracy. The proposed technique offers higher accuracy than the RSUOIDM and DCFID techniques for all proportions of malicious nodes.
Precision: It is observed from Figure 4c that the precision of our proposed technique decreases slightly when the proportion of malicious nodes increases beyond 30%; however, the value still stays above 0.98. This signifies the correctness of our detection approach. Due to the variations in the vehicle speeds in the simulations, the speed values of some honest nodes do not remain similar to the majority of the honest nodes, and these nodes are incorrectly detected as malicious, resulting in a minor decrease in precision. The proposed technique offers higher precision for all proportions of malicious nodes compared to the RSUOIDM and DCFID techniques.
Recall: The recall value also remains higher than the RSUOIDM and DCFID techniques for all proportions of malicious nodes, which can be observed in Figure 4d. The proposed technique correctly detects almost all the malicious nodes in the network. To create the illusion of a false event, the malicious nodes abruptly lower the speed value in the beacon messages. The binning method in our technique separates these deviating speed values and correctly detects them, exploiting the fact that these values are a minority.
F1 score: The F1 score degrades slightly when the proportion of malicious nodes increases above 30% due to the marginal changes in precision and recall values in these cases. As seen in Figure 4e, the higher F1 score for our proposed technique than the RSUOIDM and DCFID techniques suggests that our technique can successfully detect the malicious nodes in the network without incorrectly classifying the honest nodes to be malicious.
FPR: The FPR of the proposed technique remains low, as can be observed in Figure 4f. The FPR is highest when the proportion of malicious nodes reaches 40%. In the presence of such a high proportion of malicious nodes, our detection technique incorrectly classifies a few honest nodes as malicious, resulting in a slightly higher FPR. However, such high proportions of malicious nodes are unlikely to occur in real-life networks. Overall, our detection technique's correct classification of the honest nodes offers a 17% lower FPR compared to the RSUOIDM and DCFID techniques.

4.4. Results for the Highway Scenario

The performance evaluation results for the highway scenario are discussed in this subsection with Figure 5.
Figure 5. Results for the highway scenario: (a) Data processing time vs. percentage of malicious nodes; (b) accuracy vs. percentage of malicious nodes; (c) precision vs. percentage of malicious nodes; (d) recall vs. percentage of malicious nodes; (e) F1 score vs. percentage of malicious nodes; (f) FPR vs. percentage of malicious nodes.
Data processing time: Our proposed technique offers a 31% lower data processing time on average in the highway scenario in comparison to the RSUOIDM and DCFID techniques as seen in Figure 5a.
As in the urban scenario, the data processing times of the three techniques are independent of the percentage of malicious nodes. The data processing time of our technique is marginally lower in the highway scenario than in the urban scenario because, owing to the high mobility of vehicles on the highway, an evaluating vehicle needs to process fewer beacon messages.
Accuracy: As observed from Figure 5b, the proposed technique offers stable accuracy in the highway scenario as well with increases in the proportion of malicious nodes. Though the accuracy slightly degrades with more than 30% malicious nodes in the network, the accuracy remains marginally better than in the urban scenario. For all proportions of malicious nodes, our technique offers higher accuracy than the RSUOIDM and DCFID techniques by correctly classifying honest and malicious nodes.
Precision: The precision of our proposed technique is slightly better in the highway scenario compared to the urban scenario when the proportion of malicious nodes increases beyond 30%, which can be observed in Figure 5c. The RSUOIDM and DCFID techniques also show better precision in the highway scenario due to the higher deviation between actual speed and false speed values transmitted by vehicles, enabling the techniques to detect false speed values. Overall, the proposed technique offers higher precision in all cases of simulation compared to the RSUOIDM and DCFID techniques.
Recall: Our proposed technique also offers better recall values than the RSUOIDM and DCFID techniques in all proportions of malicious nodes in the highway scenario, which can be observed from Figure 5d. This indicates that our technique correctly detects almost all the malicious nodes in the network, with the binning method accurately separating the deviating speed values.
F1 score: The F1 score of our technique remains higher than the RSUOIDM and DCFID techniques due to the higher precision and recall values, as seen in Figure 5e. This again suggests that in the highway scenario our technique can successfully detect the malicious nodes in the network without incorrectly classifying the honest nodes as being malicious.
FPR: The FPR of the proposed technique remains lower than that of the RSUOIDM and DCFID techniques for all proportions of malicious nodes. On average, our technique offers a 16% lower FPR than the RSUOIDM and DCFID techniques, which can be observed in Figure 5f. When the proportion of malicious nodes increases beyond 30%, our technique incorrectly classifies a few honest nodes as malicious, resulting in a slightly higher FPR, whereas the FPR of the RSUOIDM and DCFID techniques increases steeply beyond 25% malicious nodes.

5. Conclusions

In this paper, we studied the challenges in false information detection in VANETs, such as the reliance on roadside infrastructure and historical or artificial data of vehicles. We proposed a false information detection technique using a distance-based anomaly detection approach to address these challenges. The use of an unsupervised anomaly detection method enables our technique to detect false information without using any historical data of vehicles or any artificial data, as used by the approaches based on supervised learning. The proposed technique uses only the real-time characteristics of the network for detecting false information to offer high detection accuracy and low data processing delay. Simulations were carried out to evaluate the performance of the technique using the Veins framework based on the SUMO and OMNET++ simulators considering up to 40% of malicious nodes in the network. The evaluation results show that the proposed technique offers 30% lower data processing delay and a 17% lower FPR in false information detection compared to the RSUOIDM [] and DCFID [] approaches in scenarios with high proportions of malicious nodes.
In the future, we will extend our unsupervised anomaly detection approach to other security attacks in VANETs where existing works focus on supervised learning approaches or incur high processing delays in malicious node detection.

Author Contributions

Conceptualization, A.B. and A.P.; methodology, A.B.; validation, A.B. and A.P.; formal analysis, A.B.; investigation, A.P.; resources, A.P.; data curation, A.B.; writing—original draft preparation, A.B.; writing—review and editing, A.P.; supervision, A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data supporting the results reported in this article are openly available on our Kaggle repository at https://www.kaggle.com/datasets/abinashborah/vanet-false-information-simulation-data/data (accessed on 23 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Moni, S.S.; Manivannan, D. A Scalable and Distributed Architecture for Secure and Privacy-preserving Authentication and Message Dissemination in VANETs. Internet Things 2021, 13, 100350.
  2. Aman, M.N.; Javaid, U.; Sikdar, B. A Privacy-preserving and Scalable Authentication Protocol for the Internet of Vehicles. IEEE Internet Things J. 2021, 8, 1123–1139.
  3. Bayat, M.; Pournaghi, M.; Rahimi, M.; Barmshoory, M. NERA: A New and Efficient RSU Based Authentication Scheme for VANETs. Wirel. Netw. 2020, 26, 3083–3098.
  4. Yu, Y.; Zeng, X.; Xue, X.; Ma, J. LSTM-based Intrusion Detection System for VANETs: A Time Series Classification Approach to False Message Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23906–23918.
  5. Hasrouny, H.; Samhat, A.E.; Bassil, C.; Laouiti, A. VANET Security Challenges and Solutions: A Survey. Veh. Commun. 2017, 7, 7–20.
  6. Malhi, A.K.; Batra, S.; Pannu, H.S. Security of Vehicular Ad-Hoc Networks: A Comprehensive Survey. Comput. Secur. 2020, 89, 101664.
  7. Haydari, A.; Yilmaz, Y. RSU-Based Online Intrusion Detection and Mitigation for VANET. Sensors 2022, 22, 7612.
  8. Cheong, C.; Li, S.; Cao, Y.; Zhang, X.; Liu, D. False Message Detection in Internet of Vehicle through Machine Learning and Vehicle Consensus. Inf. Process. Manag. 2024, 61, 103827.
  9. Alzahrani, M.; Idris, M.Y.; Ghaleb, F.A.; Budiarto, R. An Improved Robust Misbehavior Detection Scheme for Vehicular Ad Hoc Network. IEEE Access 2022, 10, 111241–111253.
  10. Paranjothi, A.; Atiquzzaman, M. A Statistical Approach for Enhancing Security in VANETs with Efficient Rogue Node Detection Using Fog Computing. Digit. Commun. Netw. 2022, 8, 814–824.
  11. Hua, J.; Zhang, B.; Wang, J.; Shao, X.; Zhu, J. Rogue Node Detection Based on a Fog Network Utilizing Parked Vehicles. Appl. Sci. 2023, 13, 695.
  12. Rehman, A.; Hassan, M.F.; Hooi, Y.K.; Qureshi, M.A.; Shukla, S.; Susanto, E.; Abdel-Aty, A.H. CTMF: Context-aware Trust Management Framework for Internet of Vehicles. IEEE Access 2022, 10, 73685–73701.
  13. Roy, A.; Madria, S.K. BLAME: A Blockchain-assisted Misbehavior Detection and Event Validation in VANETs. In Proceedings of the 22nd IEEE International Conference on Mobile Data Management, Toronto, ON, Canada, 15–18 June 2021; pp. 69–78.
  14. Ahmed, W.; Di, W.; Mukathe, D. A Blockchain-enabled Incentive Trust Management with Threshold Ring Signature Scheme for Traffic Event Validation in VANETs. Sensors 2022, 22, 6715.
  15. Ahmed, W.; Di, W.; Mukathe, D. Privacy-preserving Blockchain-based Authentication and Trust Management in VANETs. IET Netw. 2022, 11, 89–111.
  16. Fernandes, C.P.; Montez, C.; Adriano, D.D.; Boukerche, A.; Wangham, M.S. A Blockchain-based Reputation System for Trusted VANET Nodes. Ad Hoc Netw. 2023, 140, 103071.
  17. Hou, B.; Xin, Y.; Zhu, H.; Yang, Y.; Yang, J. VANET Secure Reputation Evaluation & Management Model Based on Double Layer Blockchain. Appl. Sci. 2023, 13, 5733.
  18. Masood, S.; Saeed, Y.; Ali, A.; Jamil, H.; Samee, N.A.; Alamro, H.; Muthanna, M.S.A.; Khakimov, A. Detecting and Preventing False Nodes and Messages in Vehicular Ad-hoc Networking (VANET). IEEE Access 2023, 11, 93920–93934.
  19. Knorr, E.M.; Ng, R.T.; Tucakov, V. Distance-based Outliers: Algorithms and Applications. VLDB J. 2000, 8, 237–253.
  20. Sommer, C.; German, R.; Dressler, F. Bidirectionally Coupled Network and Road Traffic Simulation for Improved IVC Analysis. IEEE Trans. Mob. Comput. 2011, 10, 3–15.
