Model Optimization and Robustness
To ensure high detection accuracy and generalizability, we rigorously optimized the Random Forest classifier through hyperparameter tuning and feature importance analysis. Grid search was employed to select optimal parameters (e.g., tree depth, estimator count), while Gini impurity metrics identified the most discriminative features. Robustness was validated across varied attack intensities (20–40% malicious nodes) and network scales. This section details the methodology, performance trade-offs, and stability of the model under dynamic VANET conditions, addressing potential overfitting risks observed in SVM-based approaches. Results confirm the model’s reliability for real-world deployment.
The following pseudocode details the training and evaluation process of the attack detection model. This algorithm represents the steps followed for classification, including data splitting, preprocessing, and cross-validation (Figure 5).
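For illustration, a minimal runnable Python sketch of that training and evaluation loop is given below. The synthetic data, class proportions, and parameter values are stand-ins assumed for the example, not the exact experimental artifacts of this work.

# Sketch of the Figure 5 training/evaluation loop.
# Assumptions: numeric features, binary labels (0 = normal, 1 = gray hole);
# make_classification stands in for the extracted VANET feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the 20 extracted VANET features.
X, y = make_classification(n_samples=13000, n_features=20,
                           weights=[0.74, 0.26], random_state=42)

# Scaling is fit inside the pipeline so each fold sees only its training split.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=10, max_depth=10,
                                             random_state=42))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["precision_macro", "recall_macro", "f1_macro"])
for metric, values in scores.items():
    if metric.startswith("test_"):
        print(f"{metric}: {np.mean(values):.4f}")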
Several machine learning algorithms were used to predict the behavior of attacking nodes: kernelized Support Vector Machine (SVM), Random Forest, Logistic Regression, and Naive Bayes classifiers. Multiple measures were used to assess these algorithms, such as precision, recall, and F1-score with macro averaging, as stated in the literature [23,24]. Precision (p) and recall (r) can be computed as:

$$p = \frac{TP}{TP + FP} \tag{1}$$

$$r = \frac{TP}{TP + FN} \tag{2}$$

where TP (TN) stands for true positive (true negative) and FP (FN) for false positive (false negative). The $F_1$ score can be interpreted as a weighted harmonic average of the precision and recall [23]:

$$F_1 = \frac{2\,p\,r}{p + r} \tag{3}$$
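As a worked illustration of Equations (1)–(3) with macro averaging, using the per-class counts reported later in Table 6 and taking each class in turn as positive: for the normal class, $p_A = r_A = 9682/(9682+198) \approx 0.980$; for the adversarial class, $p_B = r_B = 2922/(2922+198) \approx 0.937$; the macro-averaged $F_1$ is therefore $(0.980 + 0.937)/2 \approx 0.958$.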
Table 3 and Table 4 summarize the results of the measurements performed with the classification algorithms under different scaling methods (Standard and Min-Max). In all algorithms, 10-fold stratified cross-validation was used to separate training and test data, with folds drawn randomly, as is regularly done in the literature [22]. To prevent data leakage and ensure the integrity of model evaluation, samples from the same simulation period were kept within the same fold, and data preprocessing was fitted exclusively on the training set of each fold. Both tables show that the best algorithm for classifying node misbehavior under a Gray Hole attack in a VANET is Random Forest. The Random Forest classifier used has 10 estimators with a maximum depth of 10; tuning these two parameters to 15 estimators and a depth of 15, Random Forest achieved 0.9927 in both precision and F1 score.
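For illustration, a leakage-safe version of this protocol can be sketched as follows; the grouping variable (one identifier per simulation period) and the parameter grid are assumptions made for the example, not the exact configuration of this study.

# Sketch: leakage-safe fold construction plus the 10 -> 15 estimator/depth tuning.
# Assumption: "sim_period" labels which simulation period each sample came from.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedGroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(13000, 20))                  # placeholder feature matrix
y = (rng.random(13000) < 0.26).astype(int)        # ~26% malicious labels
sim_period = rng.integers(0, 130, size=13000)     # hypothetical period IDs

# Samples sharing a simulation period never straddle a train/test boundary.
cv = StratifiedGroupKFold(n_splits=10, shuffle=True, random_state=42)

pipe = Pipeline([("scale", MinMaxScaler()),       # fitted on training folds only
                 ("rf", RandomForestClassifier(random_state=42))])

grid = GridSearchCV(pipe,
                    param_grid={"rf__n_estimators": [10, 15],
                                "rf__max_depth": [10, 15]},
                    scoring="f1_macro", cv=cv)
grid.fit(X, y, groups=sim_period)
print(grid.best_params_, round(grid.best_score_, 4))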
Table 5 shows the list of the five best selected features and the corresponding weight of each feature. To find the best feature set for detecting Gray Hole attacks from the 20 extracted features, the Random Forest Regressor class from scikit-learn was used [21]. As Table 5 shows, the transmission rate and the amount of information transmitted in bytes and packets are the most influential features. Incorporating kernel functions into the Support Vector Machine (SVM) model significantly improves its learning capability; however, as seen in Table 4, the predictive performance of the SVM can be low in certain scenarios. Each kernel function has its own strengths and weaknesses in terms of learning and generalization. Specifically, the Radial Basis Function (RBF) kernel stands out for its strong interpolation capabilities and for capturing local properties of the data, but its main limitation is that it is not effective at extracting global features from the entire training set [25]. In fact, overfitting is an inherent risk when using the RBF kernel if corrective measures are not applied [26].
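For illustration, the Gini-based ranking described above can be reproduced with a short scikit-learn sketch; the feature names and data below are hypothetical placeholders for the 20 extracted VANET features.

# Sketch: rank the 20 extracted features by Gini importance and keep the top 5.
# Feature names are hypothetical placeholders (e.g. tx_rate, bytes_sent, ...).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [f"feat_{i:02d}" for i in range(20)]
X, y = make_classification(n_samples=13000, n_features=20, n_informative=5,
                           weights=[0.74, 0.26], random_state=42)

rf = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=42)
rf.fit(X, y)

# feature_importances_ holds the mean decrease in Gini impurity per feature.
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, weight in ranking[:5]:
    print(f"{name}: {weight:.4f}")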
To justify our choice of Random Forest, we first analyzed the limitations of SVM. Given the imbalanced dataset (74% normal vs. 26% malicious), SVM variants suffer from overfitting, as shown in Equations (4)–(15). This theoretical insight supports our empirical results (Table 3 and Table 4), where Random Forest consistently outperforms SVM.
Our contribution lies in identifying the optimal ML algorithm (Random Forest) and the set of features for detecting the Gray Hole attack, rather than modifying the algorithms themselves. The superiority of Random Forest is demonstrated through comparative analysis and the feature importance ranking (Table 5).
Adjusting the gamma scale parameter, as stated in [27], produces no change in RBF performance, so, given the extracted features, the precision limit is about 74.6%. Conducting an analysis similar to that of [26] on the similarity between characteristics yields the following.
Given: a training dataset with m samples $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in \mathbb{R}^n$ is a sample with a two-class label $y_i \in \{-1, +1\}$.

Two known metrics: $d_E(x_i, x_j) = \lVert x_i - x_j \rVert_2$, the Euclidean distance between two different samples, and $d_k(x_i, x_j) = \lvert x_{ik} - x_{jk} \rvert$, the Manhattan distance between two given features from different samples, defined as the absolute difference between $x_i$ and $x_j$ at the k-th feature.

Let $d_M(x_i, x_j) = \max_{k} \lvert x_{ik} - x_{jk} \rvert$, the maximum absolute difference (MAD), which measures the maximum difference across all features for two samples.

It is easy to find [25] that, for all the samples in the dataset, the ratio between the Euclidean distance and the MAD is always between 1 and $\sqrt{n}$, which means:

$$1 \le \frac{d_E(x_i, x_j)}{d_M(x_i, x_j)} \le \sqrt{n} \tag{4}$$
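This bound is easy to verify numerically; the following sketch checks it on synthetic data, assuming only the feature count n = 20 used in this work.

# Sketch: empirically verify 1 <= d_E / d_M <= sqrt(n) on random samples.
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # number of features, as in this work
X = rng.normal(size=(500, n))            # synthetic samples

diffs = X[:, None, :] - X[None, :, :]    # pairwise differences
d_E = np.sqrt((diffs ** 2).sum(axis=2))  # Euclidean distances
d_M = np.abs(diffs).max(axis=2)          # maximum absolute differences (MAD)

iu = np.triu_indices(len(X), k=1)        # distinct pairs only
ratio = d_E[iu] / d_M[iu]
print(ratio.min() >= 1.0, ratio.max() <= np.sqrt(n))   # prints: True True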
Using this result leads to the next affirmation: given any sample $x_i$, there exists a sample $x_j$ such that the ratio is equal to 1, so that $d_E(x_i, x_j)$ is minimal, $d_E(x_i, x_j) = d_M(x_i, x_j)$. Substituting this result into expression (4) gives:

$$d_E(x_i, x_j)^2 \ge d_M(x_i, x_j)^2 \tag{5}$$
The Euclidean distance between training samples $x_i$ and $x_j$ in the feature space of RBF-SVM determines the $(i, j)$-th entry in the learning machine's kernel matrix in the learning process, so:

$$K_{ij} = K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right) \tag{6}$$

Equation (6) can be rewritten based on the Euclidean distance:

$$K_{ij} = \exp\left( -\gamma\, d_E(x_i, x_j)^2 \right) \tag{7}$$
Equations (5) and (7) can be combined to obtain the next result:

$$K_{ij} \le \exp\left( -\gamma\, d_M(x_i, x_j)^2 \right) \tag{8}$$

Accordingly, by expression (8), each entry in the machine's kernel matrix in the learning process is upper bounded by the term $\exp(-\gamma\, d_M(x_i, x_j)^2)$. Choosing $\gamma = 1/n$, as stated in the literature, leads to the expression $\exp(-d_M(x_i, x_j)^2 / n)$ as the upper limit in the training process. It is interesting to see that a large $d_M(x_i, x_j)$ leads to a value of $K_{ij}$ close to zero.
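The practical effect of this bound can be illustrated numerically: when the pairwise distances are large relative to $1/\gamma$, the off-diagonal kernel entries collapse toward zero and the kernel matrix approaches the identity. The data below are synthetic, and the scale is chosen only to exaggerate the effect.

# Sketch: off-diagonal RBF kernel entries vanish when distances are large,
# leaving a near-identity kernel matrix (a symptom of RBF overfitting).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
n = 20
X = rng.normal(scale=5.0, size=(200, n))     # well-separated synthetic samples

K = rbf_kernel(X, gamma=1.0 / n)             # gamma = 1/n, as discussed above
off_diag = K[~np.eye(len(X), dtype=bool)]
print(f"mean off-diagonal entry: {off_diag.mean():.2e}")   # close to zero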
Typically, some of the Lagrange multipliers $\alpha_i$ become zero during optimization [28]. But what happens when almost all kernel entries $K_{ij}$ are zero or near zero? The RBF-SVM classifier will always overfit and will always predict data as the majority class in the training data, causing the RBF kernel to lose generalization.
To prove this, given the training set, let $m_+$ be the number of samples in the training set where $y_i = +1$, $m_-$ be the number of samples in the training set where $y_i = -1$, and $m$ be the total number of samples. As stated in the literature [26,27], the decision function of the SVM is computed as:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i\, y_i\, K(x_i, x) + b \right) \tag{9}$$
If almost all $K(x_i, x)$ are zero or near zero, then the decision function will completely depend on the threshold b. This is clear because b is defined through the weight vector $w = \sum_{i=1}^{m} \alpha_i\, y_i\, \phi(x_i)$, via the Karush–Kuhn–Tucker (KKT) condition [29]:

$$\alpha_j \left[\, y_j \left( w^\top \phi(x_j) + b \right) - 1 \,\right] = 0, \qquad j = 1, \dots, m \tag{10}$$

which yields:

$$b = \frac{1}{m} \sum_{j=1}^{m} \left( y_j - \sum_{i=1}^{m} \alpha_i\, y_i\, K(x_i, x_j) \right) \tag{11}$$
Equation (11), as stated in the literature, is a mean over all samples in the training set. But when $K(x_i, x_j) \approx 0$ for $i \neq j$:

$$b \approx \frac{1}{m} \sum_{j=1}^{m} y_j \tag{12}$$

Using the fact that $y_j \in \{-1, +1\}$, this leads to the next result:

$$b \approx \frac{m_+ - m_-}{m} \tag{13}$$
According to the definition of the sign function [26], the decision function is further simplified as

$$f(x) \approx \operatorname{sign}(b) \tag{14}$$

$$f(x) \approx \operatorname{sign}\!\left( \frac{m_+ - m_-}{m} \right) = \operatorname{sign}\left( m_+ - m_- \right) \tag{15}$$

where the predicted class would be based solely on which class has the most samples in the training set. If there is no majority class, the learning machine cannot determine the class of the input sample. This means that inspecting the Euclidean distances between samples in the training set, even before training, gives a quick indication of whether an RBF-SVM will overfit. The data extracted in this work show a majority of one class (normal behavior) at 74%, while the other (misbehavior) is at 26%. The RBF-SVM results in Table 4 approximate this prediction very closely, consistent with the roughly 74% precision limit noted above: the Euclidean distances between different samples in the training data are large enough to drive the kernel entries toward zero, which leads to overfitting in the RBF-SVM algorithm.
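This failure mode is straightforward to reproduce: with widely separated synthetic samples and an imbalanced 74/26 split, an RBF-SVM degenerates to predicting the majority class, matching the $\operatorname{sign}(m_+ - m_-)$ argument above. All settings below are illustrative.

# Sketch: an RBF-SVM degenerating to the majority class when kernel
# entries vanish, matching the sign(m+ - m-) argument above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 20
X = rng.normal(scale=50.0, size=(1000, n))            # far-apart samples
y = (rng.random(1000) < 0.26).astype(int)             # ~74% / 26% imbalance

clf = SVC(kernel="rbf", gamma=1.0 / n).fit(X, y)
X_test = rng.normal(scale=50.0, size=(200, n))
preds = clf.predict(X_test)
print(np.unique(preds, return_counts=True))           # only the majority class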
Referring to the results of the polynomial kernel (POL-SVM), applying the Taylor series expansion to Equation (6) shows the relationship between the RBF kernel and the polynomial kernel:

$$K(x_i, x_j) = e^{-\gamma \lVert x_i \rVert^2}\, e^{-\gamma \lVert x_j \rVert^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!} \left( x_i^\top x_j \right)^k \tag{16}$$

Each term of the series is a monomial of $x_i$ and $x_j$. Representing the RBF kernel as an inner product of two functions leads to

$$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle \tag{17}$$

$$\phi(x) = e^{-\gamma \lVert x \rVert^2} \left( \sqrt{\frac{(2\gamma)^k}{k!}}\; x^{\otimes k} \right)_{k=0}^{\infty} \tag{18}$$

where each component $x^{\otimes k}$ collects the degree-k monomials of the feature space. Although expressions (17) and (18) show an infinite sum, Taylor's theorem allows a truncation to be carried out to obtain a polynomial of degree $d$; the error introduced by this truncation is bounded in [28]:

$$\phi_d(x) = e^{-\gamma \lVert x \rVert^2} \left( \sqrt{\frac{(2\gamma)^k}{k!}}\; x^{\otimes k} \right)_{k=0}^{d} \tag{19}$$

On the other hand, the polynomial kernel is also a sum of monomials of $x_i$ and $x_j$. More precisely, the characteristics that correspond to the polynomial kernel, $K_{\text{poly}}(x_i, x_j) = (x_i^\top x_j + c)^d$, can also be written as

$$\psi(x) = \left( \sqrt{\binom{d}{k}\, c^{\,d-k}}\; x^{\otimes k} \right)_{k=0}^{d} \tag{20}$$
Paying attention to Equations (19) and (20), although they are similar, there are differences in the scaling of the two polynomials. The most important difference is the scaling that depends on the degree k of the monomial: in the RBF expansion it includes the inverse of the factorial of the degree, so the scaling of higher-degree monomials is smaller and lower-degree monomials dominate. Therefore, the POL-SVM kernel, whose feature space is built from the same monomials, likewise tends to rely more on its lower-degree monomials. It follows that the behavior of the polynomial kernel asymptotically approaches that of the RBF kernel [25,26,28,30], and increasing the order of the polynomial does not affect this fact, since the scaling factor of the added terms approaches zero and has no major impact on performance. In view of this result, the behavior of the POL-SVM algorithm in Table 3 is consistent with these experiments and with the results reported by other authors [28].
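The closeness of the two kernels can be illustrated by comparing the exact RBF value with its truncated Taylor expansion; the vectors and truncation degrees below are arbitrary choices for the example.

# Sketch: truncated Taylor expansion of the RBF kernel vs. the exact value,
# showing how quickly the 1/k! scaling suppresses higher-degree terms.
import math
import numpy as np

rng = np.random.default_rng(4)
n = 20
xi, xj = rng.normal(scale=0.3, size=(2, n))
gamma = 1.0 / n

exact = math.exp(-gamma * np.sum((xi - xj) ** 2))
prefactor = math.exp(-gamma * xi @ xi) * math.exp(-gamma * xj @ xj)

for d in (1, 2, 4, 8):
    truncated = prefactor * sum((2 * gamma) ** k / math.factorial(k)
                                * (xi @ xj) ** k for k in range(d + 1))
    print(f"degree {d}: error = {abs(exact - truncated):.2e}")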
The confusion matrix of the Random Forest model shows high performance in classifying normal and adversarial nodes. Of the 9880 normal nodes (Class A), the model correctly classified 9682 (98% specificity), with 198 errors (false positives). Of the 3120 adversarial nodes (Class B), it correctly identified 2922 (94% recall), with 198 false negatives. This indicates that the model is both accurate and sensitive, with a slight trade-off between false positives (FP) and false negatives (FN), making it well suited to environments where both error types carry a similar cost; see Table 6.
Table 7 compares the performance of the proposed method with recent studies addressing similar attacks in VANETs. Notably, the Random Forest classifier, with only 20 selected features, achieves an F1-score of 0.9927, outperforming previous works such as [4] (F1-score: 0.88) and [3] (accuracy: 0.95). The table also highlights the limitations of other approaches, such as the high computational complexity of hybrid models or their reliance on generic features. These results reinforce the superiority of the proposed model in terms of accuracy, adaptability to intermittent attacks, and computational efficiency, consolidating it as a robust solution for detecting Gray Hole attacks in VANET environments.