1. Introduction
Vehicular networks are built on the wireless communication capabilities of vehicles, infrastructure, and other entities on the road, such as pedestrians, and integrate them into a single network. Wireless communication between vehicles and other entities that may affect or be affected by the vehicles is termed vehicle-to-everything (V2X) communication. V2X communication has contributed to various advancements, including autonomous driving and intelligent transportation systems, and it offers safety and comfort to vehicles as well as to other road users. In addition, vehicular networks facilitate critical emergency communication. In vehicular networks, Road-Side Units (RSUs) deployed at selected points along the roads constitute the roadside infrastructure. The vehicles use their On-Board Unit (OBU) devices to communicate amongst themselves as well as with the infrastructure [1,2]. The vehicles in a network periodically transmit beacon messages, or basic safety messages, carrying their live data, such as speed, position, acceleration, and direction of travel. They can further send messages concerning on-road incidents or emergency situations, like accidents or road congestion [3,4]. Nonetheless, vehicular networks are susceptible to security threats from malicious vehicles that spread false information to fabricate events for their own advantage. Securing vehicular networks from such malicious activities is crucial due to the risks posed to public safety. However, security assurance is challenging due to the fundamental characteristics of vehicular networks, which include wireless communication, fast mobility, sporadic connectivity, dynamic topology, and the difficulty of assessing trust and validating shared information [5,6]. Regardless of these challenges, mitigating false information is indispensable for preventing the dangerous consequences of reactive actions taken by vehicles in response to the false information they receive [7].
The prevailing methods for identifying false information in vehicular networks employ different techniques, including machine learning, trust scores, blockchain, and statistical methods. However, these methods have various limitations, which are discussed in Section 2. Among the existing methods, RSU-based Online Intrusion Detection and Mitigation [8], referred to as RSUOIDM hereafter, leverages historical data transmitted by the vehicles within the vicinity of RSUs and constructs models for anomaly detection. These models are used by the RSUs to analyze newly received information from vehicles by calculating an anomaly score based on the model. However, this method incurs initialization delays due to its reliance on historical data. Additionally, if traffic patterns in the RSUs’ communication range change between the time of historical data collection and evaluation, the detection accuracy of the technique may degrade. Another approach, using data clustering for false information detection [9], referred to as DCFID hereafter, utilizes unsupervised machine learning with data clustering to detect false information. In this approach, vehicles are clustered into benign and malicious groups based on similarities in their beacon message data. While the accuracy of DCFID is high under the assumption that all malicious nodes broadcast identical false beacon message data, malicious nodes can disseminate varied values of false data, which violates this assumption. A recent study [10] proposes an intelligent trust management strategy (ITMS) for mitigating false messages in vehicular networks. In ITMS, a vehicle evaluates messages from other vehicles using a comprehensive trust evaluation based on direct trust that it calculates on its own and indirect trust received from other vehicles. Though this approach offers accurate detection at lower proportions of malicious nodes in the network, its accuracy degrades and a high number of false alarms is generated at higher proportions of malicious nodes. Moreover, vehicles need to depend on each other in the trust evaluation process.
We introduce a false information detection technique for vehicular networks that addresses the limitations of existing methods. The proposed technique leverages an unsupervised pattern-based anomaly detection approach, and thus we call it False Information Mitigation using Pattern-based Anomaly Detection (FIM-PAD). FIM-PAD uses only real-time data transmitted in the network, and it can be used independently by any vehicle to identify false information transmitted in its vicinity. To minimize data processing delays, we use a data binning method in the pattern-based anomaly detection, which avoids repeated scanning of information received from other vehicles. FIM-PAD’s novelty lies in its ability to identify usual patterns between the direction of travel and speed of vehicles and to detect false information solely using real-time network properties, eliminating the need for historical data and cooperation between vehicles. The motivation for FIM-PAD is to achieve high accuracy with minimal delay at high proportions of malicious nodes. We evaluate the performance of FIM-PAD with up to 35% of malicious nodes in simulation scenarios. The results demonstrate that FIM-PAD is capable of fast and accurate false information discovery. On average, it achieves a 38% lower processing delay and a false positive rate that is at least 19% lower in comparison to RSUOIDM [8], DCFID [9], and ITMS [10].
The following are the contributions of the paper:
We propose a technique for detecting false information in vehicular networks, FIM-PAD, using an unsupervised pattern-based anomaly detection method. FIM-PAD learns the usual patterns between the direction of travel and speed of vehicles, and thus identifies malicious vehicles not conforming to the learned patterns. FIM-PAD analyzes vehicles traveling in different directions separately, considering the variations in speeds of vehicles traveling in different directions.
Vehicles can independently detect false information with FIM-PAD using only real-time network information without depending on historical data, infrastructures, or other vehicles.
We reduce processing delays with a data binning method for finding patterns in the beacon message data and use the binned data for anomaly detection, thereby eliminating the need for repeated scans of data.
We carry out comprehensive simulations to compare the performance of FIM-PAD with other existing approaches in urban and highway scenarios with varying proportions of malicious nodes.
The remainder of the paper is structured as follows: Section 2 presents the current related work on false information discovery in vehicular networks; Section 3 describes the specifics of FIM-PAD; Section 4 provides the results of performance assessment; and Section 5 suggests future research plans and concludes the paper.
2. Related Work
In this section, we present an overview of recent studies on false information discovery in vehicular networks.
The approach introduced in [7] creates a time series of traffic parameters and uses a Long Short-Term Memory (LSTM) neural network to distinguish legitimate and fake events. Though the technique offers high accuracy, it trains the neural network with historical traffic information that may not be appropriate for all scenarios. For example, a network trained on data from urban areas may not work well for evaluating vehicles traveling on highways. A further machine learning-based method in [11] combines features of vehicles obtained from signal properties, like received signal strength and signal direction, and uses a Kalman filter algorithm to extract contextual patterns for each vehicle. An artificial neural network for detecting false messages is trained with the innovation errors of the filter, which are the differences between predicted and observed values. The method shows high accuracy; however, it also relies on historical data for training. The ensemble learning approach for falsification detection presented in [12] uses an ensemble-based random forest classifier with randomized search optimization for parameter selection. The experimental evaluations show a high detection accuracy for this technique but do not assess its processing delay. Moreover, this approach requires computationally expensive training of the ensemble-based classifier, and therefore it is not suitable for real-time detection of false information.
The false information detection technique proposed in [13] uses the OBUs on vehicles to create a fog layer that is controlled by a central node, known as the guard node. The central node uses a statistical method to evaluate the speeds of vehicles reported in beacon messages. If the stated speed of a node substantially deviates from the others in the vicinity, the central node marks the node as malicious. Although the technique offers modest latency with high accuracy, its dependence on a central node makes it susceptible to a single point of failure. A further fog computing-based statistical method was proposed in [14], where a fog layer is dynamically formed with vehicles parked beside the road. All fog nodes accumulate data from the beacon messages of neighboring vehicles and calculate the average speed to apply a statistical test in parallel to detect malicious vehicles. Although this method performs well in small networks, its accuracy declines as the number of vehicles increases. Furthermore, the authors of [15] proposed a trust management approach that uses contextual knowledge obtained from messages transmitted by vehicles. This approach applies a statistical anomaly detection technique to detect false information. Though this approach achieves high accuracy in networks with a small number of nodes, its scalability is limited due to its high computational costs in larger networks.
Some studies use blockchain-based approaches for false information detection in vehicular networks. One such approach, presented in [16], authenticates traffic events to detect malicious vehicles by leveraging neighbor evidence and incident reports submitted by each vehicle to RSUs. A blockchain network operates between the RSUs in this approach, with data of vehicles added as blocks by a mining RSU after reaching an agreement with other RSUs. Similarly, the authors of [17] introduced a trust management paradigm using blockchain that incorporates a threshold ring signature method, allowing vehicles to verify message authenticity and reliability anonymously while preserving privacy. This method enables RSUs to block false information and confirm the credibility of the messages. The study in [18] proposed an alternative blockchain-based trust management system that evaluates the reliability of vehicles and their transmitted data to discover false information. When incidents are reported by vehicles to nearby RSUs, the trust model is used to validate them, and RSUs collectively update and store vehicle trust values on the blockchain. A reputation system based on blockchain was presented in [19], where vehicles validate event reports from other vehicles to determine their reputations, which are then stored on a blockchain maintained by RSUs. Another study in [20] presented a reputation assessment and management framework using two parallel blockchains: a reputation chain maintained by vehicles and an event chain maintained by RSUs. This approach calculates the trust scores of vehicles using both direct and indirect trust based on vehicles’ historical reputations and uses the scores to verify the information shared by the vehicles. Even though the blockchain-based methods in [16,17,18,19,20] exhibit high accuracy in false information detection, they are not scalable to large networks due to their high computational overhead.
The false message detection technique outlined in [21] assesses node profiles using a reward–penalty scheme. Vehicles earn rewards for transmitting genuine information and incur penalties for sending false messages. If a message sender’s reward-to-penalty ratio drops under a specified limit, the evaluation procedure is initiated. A message is accepted only if the sender’s reward-to-penalty ratio is above the limit. However, this approach tends to incorrectly classify a high number of genuine messages as false. The technique we propose in this paper, FIM-PAD, addresses the disadvantages of the prevailing methods [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. FIM-PAD uses unsupervised pattern-based anomaly detection on real-time data transmitted within the network to discover false information without depending on any prior information about vehicles or historical traffic data.
3. The Proposed Technique: FIM-PAD
In this section, we present the details of the proposed FIM-PAD technique. We begin by explaining pattern-based anomaly detection and the attack model considered in this study.
Pattern-based anomaly detection identifies data points that deviate from the usual patterns within a dataset [22,23]. Instead of focusing on individual data values, this technique finds anomalies based on their failure to align with the typical patterns observed in the majority of the data in a dataset. Pattern-based anomaly detection fundamentally considers data points showing substantially different features than the normal patterns in the data to be anomalous. In unsupervised pattern-based anomaly detection, the frequent or usual patterns in a dataset are first discovered without using any labels for normal or anomalous data. Then, the data points that do not conform to the learned regular patterns are identified as anomalies [24,25].
Attack model: In the attack model considered in this study, malicious vehicles transmit in their beacon messages a false speed that is lower than their actual speed to create the impression of on-road congestion or an emergency incident [13]. The vehicles evaluate the beacon message data transmitted by the other vehicles in their vicinity to discover false information that does not conform to the usual patterns in the messages. FIM-PAD works under the following assumptions. First, the majority of vehicles in the network are assumed to be honest, as in other works [7,13]. Second, we make the realistic assumption that the speeds of vehicles traveling in opposite directions in an area can differ due to varying traffic densities and on-road events or activities, such as road work and lane closures.
3.1. Overview of FIM-PAD
In FIM-PAD, we use an unsupervised learning approach to first find patterns in the data transmitted by the vehicles in the beacon messages. We then perform a collective evaluation of the vehicles to identify malicious vehicles that do not conform to the identified patterns. Each vehicle maintains a node list $NL$ to store the vehicle $ID$, the direction of travel, and the speed value broadcast by all other vehicles in its transmission range. Hence, each tuple in $NL$ is of the form $\langle ID, d, s \rangle$, where $ID$, $d$, and $s$ denote the $ID$, direction of travel, and speed in the beacon message of a vehicle. An evaluator vehicle finds the association between the direction of travel and speed values transmitted by other vehicles in its vicinity. From these associations, it identifies the patterns exhibited by the majority of the vehicles and classifies the vehicles not conforming to the detected patterns as malicious.
The vehicles in a region traveling in the same direction move with similar speeds under the same traffic situation, and their movements are shaped by the movement of other vehicles in proximity. As such, if a vehicle transmits a substantially dissimilar speed value compared to the other vehicles traveling in the same direction in the same area, the vehicle is considered to be spreading false information. Therefore, such a vehicle is classified as malicious. Vehicles use pattern-based anomaly detection to spot the deviating speed values by evaluating the real-time beacon messages arriving from other vehicles. The overall approach of FIM-PAD is illustrated in Figure 1. To find the usual patterns between the direction of travel and the speed of vehicles, we use approximations with a data binning method to minimize processing delay. The binned data is used in the subsequent pattern mining and detection phase to find normal patterns and identify malicious vehicles. We explain the data binning phase and pattern mining and detection phase in Section 3.2 and Section 3.3, respectively.
3.2. Data Binning Phase
To find the patterns or relationships between the direction of travel and the speed of vehicles, the speed values of vehicles are discretized into bins. The binning also benefits the detection stage, as groups of vehicles are collectively evaluated with the binned information instead of evaluating the vehicles one at a time. An evaluator vehicle creates two sets of bins: one set for the vehicles traveling in the same direction as itself and the other set for the vehicles traveling in the opposite direction. For each bin in both sets, a node count and a list of nodes in that bin are maintained. When a vehicle reads each tuple of the form $\langle ID, d, s \rangle$ in its node list $NL$, the bins are created dynamically based on the speeds of the vehicles. For each vehicle in the node list, the bin index $b$ is calculated as follows:

$$b = \left\lfloor \frac{s}{w} \right\rfloor \qquad (1)$$

where the bin width parameter $w$ is computed as follows. To approximate the range of similar speed values $R$ in the beacon messages of the majority of vehicles traveling in the same direction, a set of successive bins is considered. If the number of bins considered in the set for this approximation is $k$, $w$ is calculated based on $R$ and $k$ using Equation (2), i.e., $k$ bins of equal width $w$ constitute the overall range $R$.

$$w = \frac{R}{k} \qquad (2)$$

A small number is considered for $k$ to reduce the processing time of the pattern mining and detection phase. After computing $b$ for a vehicle, the corresponding bin for the direction of that vehicle’s travel is created if it does not already exist, and the node count for that bin is set to one. If the corresponding bin already exists, its node count is incremented by one. In both cases, the $ID$ of the node is inserted into the list of node $ID$s for that bin. The sets of bins created for either direction are used in the pattern mining and detection phase to identify malicious vehicles without having to scan the node list once more. As the speeds of vehicles in an area are close to each other, only a small number of bins are created. As a result, the collective processing of nodes as bins reduces the processing time of the pattern mining and detection phase, as discussed in Section 3.3.
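The binning phase described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `bin_beacons`, the dictionary layout, the direction labels, and the floor-based bin index (speed divided by bin width, rounded down) are assumptions made for illustration, with the bin width taken as the range parameter divided by the number of bins.

```python
import math

def bin_beacons(node_list, own_direction, speed_range, num_bins):
    """Group beacon tuples (node_id, direction, speed) into speed bins.

    Returns {"same": {...}, "opposite": {...}}, each mapping a bin index to
    {"count": int, "ids": [node_id, ...]}. All names here are hypothetical.
    """
    width = speed_range / num_bins  # equal-width bins spanning the range
    bins = {"same": {}, "opposite": {}}
    for node_id, direction, speed in node_list:
        side = "same" if direction == own_direction else "opposite"
        index = math.floor(speed / width)  # assumed bin-index rule
        # Create the bin on first use, then update its count and ID list.
        entry = bins[side].setdefault(index, {"count": 0, "ids": []})
        entry["count"] += 1
        entry["ids"].append(node_id)
    return bins

# Illustrative beacons only: (node id, direction, speed).
beacons = [("a", "N", 60.0), ("b", "N", 62.0), ("c", "N", 20.0), ("d", "S", 45.0)]
bins = bin_beacons(beacons, own_direction="N", speed_range=15.0, num_bins=3)
```

Because each beacon is placed in its bin in a single pass, the later phase can work on a handful of bin counts rather than rescanning the node list.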
3.3. Pattern Mining and Detection Phase
In the pattern mining and detection phase of FIM-PAD, the bins created in the binning phase are used to find the usual patterns. These patterns are of the form $\langle d, R \rangle$, where $d$ and $R$ respectively denote the direction of travel and a range of similar speeds as defined in Section 3.2. Hence, for an evaluator vehicle, $d$ can take exactly two values: traveling in the same direction as itself and traveling in the opposite direction. $R$ is obtained for each direction from the bins created for the respective direction in the binning phase by finding a set of consecutive bins that cumulatively gives the maximum node count. We call this set of bins frequent bins and denote it by $FB$, as they represent the common pattern between the direction of travel and speed for the majority of the vehicles moving in that direction.
To find $FB$ for each direction, we use a sliding window-based evaluation to obtain the rolling maximum cumulative node count, $C_{max}$, starting with the first bin and considering $k$ bins in each window. The cumulative node count for a window is the sum of the node counts for the bins in the window. Since the bins are constructed as required based on speed values in the beacon messages received from other vehicles, not all consecutive bins may exist. Hence, we need to consider only bins that exist. For the same reason, we need to find the minimum and maximum bin ids $b_{min}$ and $b_{max}$ and find the rolling maximum cumulative node count starting from $b_{min}$ and continuing up to $b_{max}$. An example of finding $C_{max}$ and hence $FB$ is shown in Figure 2 with $k$ = 3 and assuming that all the consecutive bins exist. For simplicity, we show only the node count for each bin in the figure. The overall procedure of finding frequent bins is summarized in Algorithm 1.
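Algorithm 1: Frequent Bins Finding Algorithm
Input: List of bins $BL$, Number of bins $k$
Output: Set of consecutive bins with maximum cumulative node count $FB$
$C_{max} \leftarrow 0$
$b_{min} \leftarrow$ minimum bin id in $BL$
$b_{max} \leftarrow$ maximum bin id in $BL$
for $i \leftarrow b_{min}$ to $b_{max}$
    $C \leftarrow$ node count for bin $i$
    $W \leftarrow \{i\}$
    for $j \leftarrow i + 1$ to $i + k - 1$
        if bin id $j$ exists in $BL$ then
            $W \leftarrow W \cup \{j\}$
            $C = C +$ node count for bin $j$
        else
            Continue
        end if
    end for
    if $C > C_{max}$ then
        $C_{max} \leftarrow C$
        $FB \leftarrow W$
    end if
end for
Return $FB$
end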
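The sliding-window search for frequent bins can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the authors' code: the function name `find_frequent_bins`, the representation of bins as a dictionary from bin index to node count, and the choice to keep the first window that reaches the maximum are all assumptions.

```python
def find_frequent_bins(bin_counts, k):
    """Return (window, count): the existing bins inside the k-wide window of
    consecutive bin indices with the largest cumulative node count.

    bin_counts maps bin index -> node count; missing indices are bins that
    were never created. Names and tie-breaking are hypothetical.
    """
    if not bin_counts:
        return [], 0
    b_min, b_max = min(bin_counts), max(bin_counts)
    best_count, best_window = 0, []
    for start in range(b_min, b_max + 1):
        # Only bins that actually exist contribute to the window.
        window = [b for b in range(start, start + k) if b in bin_counts]
        total = sum(bin_counts[b] for b in window)
        if total > best_count:  # strict comparison keeps the first maximum
            best_count, best_window = total, window
    return best_window, best_count

# Illustrative node counts per bin index (bins 4 to 9 do not exist).
counts = {3: 2, 10: 1, 11: 4, 12: 6, 13: 5, 14: 2}
frequent, c_max = find_frequent_bins(counts, k=3)
```

In this example the window over bins 11 to 13 holds 15 nodes, more than any other three-bin window, so those bins are returned as the frequent bins.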
Once the set of frequent bins $FB$ for each direction is obtained, the bins in this set are considered to constitute $R$ for the corresponding direction. The nodes belonging to these bins constitute the usual pattern of the form $\langle d, R \rangle$ for direction $d$, and hence these nodes are classified as honest. The nodes belonging to the other bins for direction $d$, unless these bins are adjacent to the bins in $FB$, do not conform to this usual pattern. Hence, these nodes are classified as malicious and are inserted into the malicious node set $M$, which FIM-PAD outputs. The two adjacent bins, one on either side of $FB$, are excluded because the speed values of the nodes in these bins are very close to the speed values of the honest nodes. This helps reduce false alarms without benefiting malicious nodes, which transmit speed values in their beacon messages that deviate considerably from those of the honest nodes. The entire evaluation process of FIM-PAD is outlined in Algorithm 2.
Algorithm 2: FIM-PAD Algorithm
Input: Node list $NL$, Range parameter $R$, Number of bins $k$
Output: Set of malicious nodes $M$
Calculate $w$ using Equation (2)
for each tuple $\langle ID, d, s \rangle$ in $NL$ //start of binning phase
    Calculate $b$ using Equation (1)
    if bin id $b$ for direction $d$ exists then
        Increment the node count for bin id $b$
    else
        Create bin id $b$ for direction $d$
        Set the node count for bin id $b$ to 1
    end if
    Add $ID$ to the node list for bin id $b$ for direction $d$
end for //end of binning phase
for each direction $d$ //pattern mining and detection phase starts
    Find $FB$ using Algorithm 1
    for each bin $b$ not in $FB$ and not adjacent to $FB$
        Add all $ID$s in the node list of $b$ to $M$
    end for
end for //pattern mining and detection phase ends
Output $M$
end
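For readers who prefer running code, the end-to-end evaluation might look like the following Python sketch. It is a simplified interpretation under stated assumptions, not the authors' implementation: the function name `detect_malicious`, the floor-based bin index, the width computed as range divided by number of bins, and the reading of "adjacent" as the single bin index just below and just above the frequent bins are all hypothetical choices.

```python
import math
from collections import defaultdict

def detect_malicious(node_list, own_direction, speed_range, k):
    """Return the set of node IDs whose reported speed falls outside the
    frequent bins (and their two neighbouring bins) for their direction."""
    width = speed_range / k
    # Binning phase: direction side -> bin index -> list of node IDs.
    bins = defaultdict(lambda: defaultdict(list))
    for node_id, direction, speed in node_list:
        side = "same" if direction == own_direction else "opposite"
        bins[side][math.floor(speed / width)].append(node_id)

    malicious = set()
    for side_bins in bins.values():
        # Pattern mining: k-wide window with the largest cumulative count.
        best, frequent = 0, []
        for start in range(min(side_bins), max(side_bins) + 1):
            window = [b for b in range(start, start + k) if b in side_bins]
            total = sum(len(side_bins[b]) for b in window)
            if total > best:
                best, frequent = total, window
        # Detection: tolerate the frequent bins plus one bin on either side.
        tolerated = set(frequent) | {min(frequent) - 1, max(frequent) + 1}
        for b, ids in side_bins.items():
            if b not in tolerated:
                malicious.update(ids)
    return malicious

# Illustrative beacons only: honest nodes near 60 (N) and 40 (S), plus
# three nodes reporting much lower speeds.
beacons = [(f"h{i}", "N", s) for i, s in enumerate([58, 60, 61, 62, 63, 64, 66])]
beacons += [("m1", "N", 20.0), ("m2", "N", 22.0)]
beacons += [("s1", "S", 40.0), ("s2", "S", 41.0), ("s3", "S", 43.0), ("m3", "S", 10.0)]
flagged = detect_malicious(beacons, own_direction="N", speed_range=15.0, k=3)
```

Each direction is evaluated separately, so the slower opposite-direction traffic around 40 is not flagged merely for differing from the same-direction traffic around 60.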
3.4. Time Complexity Analysis of FIM-PAD
We analyze the time complexity of FIM-PAD in this subsection. For the data binning phase, the time complexity is $O(n)$; here, $n$ is the number of vehicles in $NL$. For the pattern mining and detection phase, the time complexity for finding $FB$ with Algorithm 1 is $O(m \cdot k)$, where $m$ is the number of bin ids between the minimum and maximum bin ids and $k$ is the number of bins in each window, and the time complexity of finding the malicious nodes after finding $FB$ is $O(m)$. Since the number of bins is insignificant in comparison to the number of vehicles in $NL$, i.e., $m \ll n$, the time complexity of FIM-PAD can be expressed as $O(n)$. This linear time complexity of FIM-PAD results in low data processing delays in false information detection, which is observed in the experimental evaluations in Section 4.
5. Conclusions
In this study, we examined the challenges associated with enhancing the security of vehicular networks, focusing on the detection of false information transmitted by malicious nodes within these networks. These challenges include dependence on roadside infrastructure, past information about vehicles, or historical traffic data to train machine learning models. We presented FIM-PAD (False Information Mitigation using Pattern-based Anomaly Detection), a technique for detecting false information that leverages an unsupervised pattern-based anomaly detection approach. The novelty of FIM-PAD comes from the use of an unsupervised learning approach to discover common patterns between the direction and speed of vehicles and to detect the malicious vehicles that do not agree with the identified patterns. FIM-PAD accounts for the speed variations of vehicles traveling in different directions by analyzing vehicles separately based on their directions. Using FIM-PAD, a vehicle can individually detect false information transmitted in its region using only real-time network data, without depending on past data about vehicles, historical traffic data for the region, or other network entities. Simulation studies with up to 35% of malicious vehicles in the scenarios demonstrate that FIM-PAD, on average, offers a 38% lower data processing delay and a false positive rate (FPR) that is at least 19% lower in comparison to three prevailing techniques, RSUOIDM [8], DCFID [9], and ITMS [10].
As future work, we plan to extend this unsupervised anomaly detection approach to secure vehicular networks against other security attacks. In particular, we intend to focus on attacks for which the solutions in the current literature rely on supervised machine learning methods or incur increased data processing delays in detecting malicious nodes.