Article

VeMisNet: Enhanced Feature Engineering for Deep Learning-Based Misbehavior Detection in Vehicular Ad Hoc Networks

1 Department of Computer Networks, The British University in Egypt, Cairo 11837, Egypt
2 Computer and Systems Engineering Department, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
3 School of Business, Technology, and Health Administration, Capella University, Minneapolis, MN 55402, USA
4 Faculty of Engineering Technology, Elsewedy University of Technology, Cairo 23751, Egypt
5 Department of Computer Science, The British University in Egypt, Cairo 11837, Egypt
* Author to whom correspondence should be addressed.
J. Sens. Actuator Netw. 2025, 14(5), 100; https://doi.org/10.3390/jsan14050100
Submission received: 10 August 2025 / Revised: 29 September 2025 / Accepted: 1 October 2025 / Published: 9 October 2025

Abstract

Ensuring secure and reliable communication in Vehicular Ad hoc Networks (VANETs) is critical for safe transportation systems. This paper presents Vehicular Misbehavior Network (VeMisNet), a deep learning framework for detecting misbehaving vehicles, with primary contributions in systematic feature engineering and scalability analysis. VeMisNet introduces domain-informed spatiotemporal features—including DSRC neighborhood density, inter-message timing patterns, and communication frequency analysis—derived from the publicly available VeReMi Extension Dataset. The framework evaluates Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM architectures across dataset scales from 100 K to 2 M samples, encompassing all 20 attack categories. To address severe class imbalance (59.6% legitimate vehicles), VeMisNet applies SMOTE post train–test split, preventing data leakage while enabling balanced evaluation. Bidirectional LSTM with engineered features achieves 99.81% accuracy and F1-score on 500 K samples, with remarkable scalability maintaining >99.5% accuracy at 2 M samples. Critical metrics include 0.19% missed attack rates, under 0.05% false alarms, and 41.76 ms inference latency. The study acknowledges important limitations, including reliance on simulated data, single-split evaluation, and potential adversarial vulnerability. Domain-informed feature engineering provides 27.5% relative improvement over dimensionality reduction and 22-fold better scalability than basic features. These results establish new VANET misbehavior detection benchmarks while providing honest assessment of deployment readiness and research constraints.

1. Introduction

A Vehicular Ad hoc Network (VANET) is an ad hoc network that supports communication among moving vehicles and between vehicles and roadside infrastructure. As a subclass of Mobile Ad hoc Networks (MANETs), VANETs share MANET-like routing challenges, including the lack of an underlying communications infrastructure and the need for packet forwarding among nodes. They are receiving considerable attention because of their potential role as an essential component of Intelligent Transportation Systems (ITSs), with a broad range of applications including traffic control, driver assistance, and road safety. The success of such applications, however, depends on reliable and timely communication among vehicles [1,2]. Because VANETs are highly dynamic and malicious parties can tamper with shared data, they are prone to misbehaving vehicles that can degrade network performance and put lives at risk. The resulting threats include misrouting of packets, eavesdropping, vehicle tracking, and injection of spoofed emergency messages. Because VANET security directly affects human lives, strong detection capabilities are needed. To address these challenges, researchers have proposed several approaches for detecting and responding to misbehaving vehicles by analyzing behavioral patterns and detecting anomalies [3]. Deep learning (DL) approaches have proven particularly effective for security-related tasks such as malware detection, intrusion detection, and spam filtering. However, relatively little research has explored their use for vehicular security, mainly because of a lack of labeled datasets with realistic attack scenarios [4,5,6].
This paper introduces VeMisNet, a deep learning framework for detecting misbehaving vehicles in VANETs that integrates temporal sequence learning with domain-informed feature engineering. Our key contributions include (1) novel spatiotemporal and communication-aware features including time between messages, packet rate, and DSRC-range neighbor count; (2) comprehensive multi-class evaluation across 20 attack types using up to 2 million samples from the VeReMi Extension dataset; (3) proper class imbalance handling through post-split SMOTE application; and (4) UMAP-based feature selection for optimal feature subsets ranging from 5 to 25 features.
The remainder of this paper is organized as follows: Section 2 provides an overview of VANET fundamentals and security threats, presenting a comprehensive review of existing literature on VANET misbehavior detection and identifying key research gaps addressed by this work. Section 3 details the proposed VeMisNet system model, deep learning architectures, feature engineering methodology, and dataset characteristics. Section 4 reports and analyzes experimental validation results across multiple architectures and dataset scales, including comprehensive feature importance analysis that validates our domain-informed engineering approach and guides optimal feature selection. Finally, Section 5 concludes the paper and outlines directions for future research.

2. Related Work and Background

2.1. VANET Fundamentals and Security Landscape

As a critical component of Intelligent Transportation Systems (ITSs), VANETs facilitate a broad spectrum of safety-critical and efficiency-enhancing applications, including collision avoidance, traffic management, emergency response coordination, and autonomous driving support. The operational characteristics of VANETs present unique challenges that distinguish them from traditional wireless networks. High-speed mobility patterns result in rapidly changing network topologies, with communication links forming and breaking within seconds. Vehicle density varies significantly across different geographical areas and time periods, creating heterogeneous network conditions that range from sparse rural highways to congested urban intersections. The distributed nature of VANETs eliminates centralized control, requiring autonomous decision-making capabilities at each network node.

2.1.1. Communication Architecture and Protocols

VANET communication operates through Dedicated Short Range Communications (DSRC) technology, utilizing the 5.9 GHz frequency band with communication ranges typically extending 100–1000 m [7,8]. The protocol stack includes specialized standards such as IEEE 802.11p for wireless access in vehicular environments and IEEE 1609 family standards for networking services, security services, and multi-channel operations [9].
Basic Safety Messages (BSMs) represent the fundamental communication primitive in VANETs, transmitted every 100–300 ms and containing critical vehicle state information including position coordinates, velocity vectors, acceleration data, heading direction, and timestamp information [10]. These periodic broadcasts enable neighboring vehicles to maintain situational awareness and support safety-critical applications such as forward collision warning and emergency brake assistance.

2.1.2. Comprehensive VANET Threat Taxonomy

The security landscape of VANETs encompasses multiple attack vectors that exploit both the wireless communication medium and the distributed network architecture [11]. Understanding these threats is essential for developing effective detection and mitigation strategies.
Position Falsification Attacks: These attacks manipulate location-related information to deceive other network participants. Constant position attacks maintain static false coordinates regardless of actual vehicle movement, while position offset attacks introduce systematic displacement errors. Random position attacks generate completely fabricated location data, and gradual position drift attacks slowly modify coordinates to avoid detection [12]. These attacks can disrupt traffic flow algorithms, compromise collision avoidance systems, and enable unauthorized access to location-restricted services.
Speed and Kinematic Manipulation: Speed falsification attacks target velocity and acceleration parameters critical for safety applications. Constant speed attacks broadcast fixed velocity values inconsistent with actual vehicle dynamics, while speed offset attacks introduce proportional errors. Random speed attacks generate arbitrary velocity data, and acceleration manipulation attacks falsify braking and acceleration patterns. These attacks can trigger false emergency responses, compromise adaptive cruise control systems, and disrupt traffic flow optimization algorithms.
Denial of Service (DoS) Attacks: DoS attacks aim to disrupt network availability and communication reliability through various mechanisms. Message flooding attacks overwhelm the communication channel with excessive data transmissions, while jamming attacks use radio frequency interference to block legitimate communications [13]. Resource exhaustion attacks target computational and memory resources of individual vehicles or RSUs. These attacks can prevent critical safety message delivery and compromise time-sensitive applications.
Sybil-Based Identity Attacks: Sybil attacks involve single malicious entities creating multiple false identities to gain disproportionate influence over network decisions. Grid Sybil attacks position false identities in systematic spatial patterns, while mobile Sybil attacks move fabricated identities to simulate realistic vehicle movement. These attacks can manipulate traffic density estimations, compromise voting-based security mechanisms, and enable other sophisticated attack strategies [13].
Replay and Temporal Attacks: These attacks exploit the temporal aspects of VANET communications. Data replay attacks retransmit previously captured legitimate messages at inappropriate times, while stale message attacks deliberately delay message delivery. Eventual stop attacks gradually reduce message transmission frequency, and disruptive attacks inject carefully timed false information to maximize impact. These attacks can create temporal inconsistencies that compromise safety applications relying on real-time data.
Privacy and Eavesdropping Threats: Passive adversaries can monitor VANET communications to extract sensitive information about vehicle movements, destinations, and behavioral patterns. Location tracking attacks build comprehensive profiles of individual vehicle movements, while traffic analysis attacks infer network topology and communication patterns. These threats compromise user privacy and can enable more sophisticated targeted attacks.

2.1.3. Attack Impact Analysis and Consequences

The consequences of successful VANET attacks extend beyond technical network disruption to encompass direct threats to human safety and transportation system integrity. Position falsification attacks can cause collision avoidance systems to make incorrect decisions, potentially leading to accidents. Speed manipulation attacks can disrupt platooning applications and adaptive traffic signal control, causing traffic congestion and safety hazards.
DoS attacks can prevent emergency vehicle communications and disable critical safety applications during high-risk scenarios. Sybil attacks can compromise traffic density estimation algorithms, leading to suboptimal routing decisions and increased travel times. The cascading effects of these attacks can propagate throughout the transportation network, affecting not only immediate participants but also broader traffic patterns and infrastructure operations.

2.1.4. Security Requirements and Challenges

Effective VANET security must address multiple conflicting requirements. Real-time constraints demand security mechanisms that operate within millisecond time frames to support safety-critical applications. Scalability requirements necessitate solutions that function effectively in networks ranging from sparse rural deployments to dense urban environments with several hundred vehicles per square kilometer [14].
Privacy preservation requirements mandate protection of individual vehicle movement patterns while maintaining the functionality of traffic management applications. Resource constraints in vehicular computing environments limit the computational complexity of security algorithms. The dynamic nature of VANET topology requires security mechanisms that adapt to rapidly changing network conditions without requiring extensive reconfiguration.

2.2. Traditional Machine Learning in VANET Security

Traditional machine learning approaches established strong foundations for VANET misbehavior detection through feature-based classification and ensemble methods [15]. Grover et al. [16] pioneered ML-based detection achieving 93–99% accuracy using Random Forest classifiers, demonstrating the effectiveness of ensemble methods for complex vehicular attack patterns. Subsequent research enhanced detection capabilities through domain knowledge integration. So et al. [17] combined plausibility checks with ML techniques, achieving 20% precision improvement over purely data-driven approaches. Sharma and Jaekel [18] introduced temporal dependency analysis using consecutive Basic Safety Messages (BSMs), achieving 98.5% accuracy on the VeReMi dataset and establishing foundations for sequence-based analysis. Support Vector Machine applications showed effectiveness for specific attack types, with Zhang et al. [19] proposing dual-model approaches (data trust and vehicle trust) for false message and suppression attacks. Sonker et al. [1] achieved 97.62% multi-class accuracy using Random Forest, establishing traditional ML performance benchmarks on standardized datasets.

2.3. Deep Learning Evolution in VANET Security

The transition to deep learning represents a paradigm shift toward sophisticated temporal and spatial pattern recognition. Kamel et al. [20] pioneered LSTM applications for VANET misbehavior detection, demonstrating recurrent networks’ capability to capture temporal dependencies in vehicular communication patterns. Hybrid architectures emerged as a promising direction, with Alladi et al. [21] introducing CNN-LSTM combinations that merge spatial feature extraction with temporal modeling. Their DeepADV framework [22] achieved 94.1% accuracy by integrating multiple neural network architectures, establishing the importance of architectural diversity for robust detection. Recent advances incorporate distributed learning paradigms addressing scalability and privacy challenges. Campos et al. [23] and Gurjar et al. [24] demonstrated federated learning approaches achieving 91.7% and 93.2% accuracy, respectively, while preserving data privacy in distributed VANET environments. To provide an overview of the existing research landscape, Table 1 summarizes a selection of prominent misbehavior detection approaches, highlighting their methodologies, datasets, and reported performance metrics.

2.4. VeReMi Dataset Standardization

The VeReMi dataset family emerged as the de facto standard for comparative evaluation. Van der Heijden et al. [29] introduced the original VeReMi dataset with standardized attack scenarios including Sybil attacks, GPS spoofing, and message manipulation. Kamel et al. [30] enhanced this with the VeReMi Extension dataset, incorporating 24 h simulation periods, diverse traffic densities, and 20 attack variations for more comprehensive evaluation.

2.5. Research Gaps and Contribution

Despite significant progress in VANET misbehavior detection, several critical gaps limit current approaches’ effectiveness and deployment readiness. This section provides an honest assessment of these limitations alongside our contributions and acknowledges the inherent constraints of our proposed approach.

2.5.1. Identified Research Gaps

Underexplored Feature Engineering: Most research emphasizes algorithmic improvements over systematic feature design. The relative importance of kinematic versus communication-based features remains unclear, with limited domain-informed feature engineering for VANET-specific patterns.
Scalability Limitations: Evaluation typically occurs on datasets of fewer than 1 M samples, leaving scalability to real-world deployment scales (2 M+ records) largely unexplored. Computational efficiency and real-time deployment feasibility receive limited attention.
Inadequate Class Imbalance Handling: The VeReMi dataset exhibits severe class imbalance (59.6% legitimate vehicles), yet many studies inadequately address this challenge, potentially compromising minority attack detection performance.
Limited Systematic Comparison: Few studies provide direct ML vs. DL comparisons using identical datasets and evaluation protocols. Performance claims are often incomparable due to varying feature sets, dataset configurations, and evaluation metrics.
Building on this comprehensive literature analysis, Table 2 identifies the specific limitations that VeMisNet addresses through systematic innovation. The following section (Section 3) presents our VeMisNet methodology that systematically addresses these limitations through domain-informed feature engineering and rigorous experimental design.

2.5.2. VeMisNet Contributions

Unlike prior VANET studies that focus mainly on algorithmic tweaks, VeMisNet introduces a system-level advance built on three pillars:
(i)
Communication-aware feature set: Fourteen novel spatiotemporal descriptors (e.g., DSRC-range neighbour density and inter-message timing) that have not appeared in previous VANET literature.
(ii)
Unified large-scale benchmark: First evaluation on 20 attack classes and up to 2 M records under a strict 80/20 split, creating a reproducible baseline for future work.
(iii)
End-to-end deployment study: Real-time performance evaluation with throughput analysis and latency benchmarking to assess deployment readiness of academic models.

2.5.3. Critical Self-Reflection and Approach Limitations

Train–Test Protocol. We report results from a single, stratified 80/20 train–test split at the sequence level; no k-fold cross-validation is used. Temporal-aware CV (e.g., rolling or blocked-by-time folds) is planned.
Dataset and Generalization. The study relies on the simulated VeReMi Extension (LuST) scenario; real-world traffic, weather, and infrastructure variability may limit external validity. Real-data validation remains future work.
Feature Assumptions. Communication-aware features assume ideal DSRC behavior. Interference, power heterogeneity, and topology churn may weaken neighborhood-density and timing cues; other spatiotemporal relations might be beneficial but were not explored.
Scalability and Compute. Performance degrades at larger scales and with basic feature sets. SMOTE and Bi-LSTM add computational overhead; reported inference rates are laboratory measurements, not end-to-end edge deployments.
Class Imbalance. SMOTE improves minority-class detection but may synthesize non-representative samples, increasing false positives and reducing legitimate-vehicle recall. Operational thresholds and cost-sensitive tuning are needed.
Evaluation Gaps. No validation on physical VANET testbeds or real traces; no assessment against adaptive/adversarial attacks or concept drift. Uncertainty is estimated via test-set bootstrapping only.
Deployment Challenges. The pipeline assumes stable timing and positioning; real VANETs face intermittent connectivity and GPS degradation. Using a 10-step history can introduce latency for safety-critical responses.
Comparative Scope. Baselines exclude recent GNN/Transformer/ensemble methods, and hyperparameter choices may favor the proposed models; broader baselines are deferred to future work.

2.6. Comparative Analysis and Positioning

VeMisNet’s performance characteristics position it within the broader landscape of VANET misbehavior detection approaches. As shown in Table A1, the framework demonstrates computational efficiency suitable for real-time deployment scenarios while maintaining detection capabilities across multiple attack categories.
The approach balances several competing requirements in VANET security applications: detection accuracy, computational efficiency, and scalability across diverse attack types. However, direct performance comparisons with existing methods are constrained by varying experimental conditions, datasets, and evaluation protocols across studies. Different approaches optimize for different operational priorities—some emphasize maximum detection accuracy, others prioritize computational efficiency, and still others focus on specific attack categories.
The framework’s design choices reflect trade-offs inherent in practical VANET deployment scenarios, where real-time processing requirements must be balanced against detection effectiveness. The reliance on simulated data across most VANET research, including this work, limits definitive conclusions about comparative real-world performance until comprehensive field validation becomes available.

3. System Model and Methodology

This section presents the comprehensive VeMisNet framework for VANET misbehavior detection, integrating deep learning architectures with domain-informed feature engineering to achieve robust classification across diverse attack scenarios.

3.1. VeMisNet Framework Overview

Figure 1 illustrates the VeMisNet data flow pipeline, which transforms raw vehicular communication data into security intelligence through six integrated stages.
Stage 1: Input Data—The pipeline processes the VeReMi Extension dataset containing 2 million samples across 20 attack categories and legitimate vehicle communications.
Stage 2: Data Preprocessing—Parallel operations ensure data quality through cleaning (duplicate removal, missing value imputation, normalization) and splitting (train/test/validation subsets) while preserving temporal characteristics. As detailed in Figure 1, the preprocessing pipeline consists of three essential steps: duplicate removal ensures data integrity across the 2 M sample dataset, regression imputation handles missing kinematic values, and coordinate separation enables spatial feature extraction for geographic analysis.
Stage 3: Feature Engineering—Domain-informed spatiotemporal feature extraction through three parallel processes as illustrated in Figure 2: (1) spatial features capture neighborhood density within DSRC ranges (100 m, 200 m, 300 m); (2) temporal features analyze inter-message timing and packet rates; (3) kinematic features extract relative motion patterns. The feature engineering methodology employs three distinct approaches: UMAP-based dimensionality reduction for automated feature selection, manual domain-informed selection leveraging VANET communication expertise, and augmentation with newly engineered spatiotemporal features. These processes converge to produce carefully selected features for misbehavior detection.
Stage 4: Class Balancing—SMOTE is applied after train–test split to prevent data leakage while creating balanced training across all 20 attack categories.
Stage 5: Deep Learning Models—Three recurrent architectures process the balanced dataset: LSTM, GRU, and Bidirectional LSTM for comparative evaluation.
Stage 6: Classification Output—Models produce binary (attack/normal) and multi-class (20 attack types) classifications with comprehensive performance evaluation.
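Stage 2 mentions regression imputation for missing kinematic values without naming the regression model. The following is a minimal sketch, assuming an ordinary least-squares fit on the remaining columns and assuming those predictor columns contain no missing values. The function name `regression_impute` and the toy column layout are illustrative and are not taken from the paper.

```python
import numpy as np

def regression_impute(X, target_col):
    """Fill NaNs in one column using a least-squares fit on the other columns."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target_col])
    predictors = np.delete(X, target_col, axis=1)
    # Design matrix: the other columns plus an intercept term.
    A = np.column_stack([predictors, np.ones(len(X))])
    # Fit only on rows where the target value is observed.
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target_col], rcond=None)
    # Predict the target for rows where it is missing.
    X[missing, target_col] = A[missing] @ coef
    return X

# Toy rows: [speed, acceleration, heading]; here speed = 2 * acceleration + 1.
rows = np.array([
    [3.0, 1.0, 0.0],
    [5.0, 2.0, 0.0],
    [7.0, 3.0, 0.0],
    [np.nan, 4.0, 0.0],
])
filled = regression_impute(rows, target_col=0)
print(filled[3, 0])  # imputed speed for the last row
```

In a leakage-free pipeline the regression coefficients would be fitted on the training split only and then reused on the test split, in the same spirit as the post-split SMOTE of Stage 4.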

3.2. Dataset Description

VeMisNet utilizes the VeReMi Extension dataset [30], which enhances the original VeReMi dataset [29] with realistic sensor error models and expanded attack scenarios. The dataset incorporates sensor noise for four primary data fields (position, speed, acceleration, heading) and employs the Framework for Misbehavior Detection (F2MD) [31] with Luxembourg SUMO Traffic scenario traces [32].
The dataset provides comprehensive 24 h simulation periods with a 30% malicious node penetration rate and a vehicle density of 23.29 vehicles/km². It contains approximately 7000 attacker vehicles generating 7.5 million messages alongside 17,000 legitimate vehicles producing 12 million messages, creating a realistic evaluation environment for VANET security research.
Table 3 presents the 20 attack types included in the VeReMi Extension dataset. We propose a systematic categorization that groups these attacks into five distinct categories based on their underlying attack mechanisms: position falsification, speed manipulation, freeze/replay attacks, denial-of-service variants, and Sybil-based attacks. This taxonomic organization enables comprehensive evaluation of detection capabilities across diverse misbehavior patterns.

3.3. Feature Engineering Methodology

VeMisNet leverages the original raw features to propose and generate a novel set of derived features that capture spatiotemporal and communication-aware vehicle behavior. These proposed features are evaluated both in combination with the original raw features and alongside the basic kinematic feature set. The results demonstrate significant performance improvements, highlighting the effectiveness of the engineered features. A detailed presentation of performance metrics and comparative analysis of these experiments is provided in Section 4.

3.3.1. Spatiotemporal Feature Engineering

For each vehicle $v_i$ at position $p_i = (x_i, y_i)$ communicating with vehicle $v_j$ at position $p_j = (x_j, y_j)$, we extract key spatiotemporal relationships:
Spatial Distance: The Euclidean distance between communicating vehicles:
$$d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
Temporal Difference: The time gap between consecutive messages:
$$\Delta t_{ij} = |t_j - t_i|$$
Neighborhood Density: For distance thresholds $D_k \in \{100\,\mathrm{m}, 200\,\mathrm{m}, 300\,\mathrm{m}\}$ aligned with DSRC communication ranges, the neighbor count within temporal window $W$ is:
$$n_{i,D_k} = \sum_{j \in N_i(W)} \mathbf{1}(d_{ij} \le D_k)$$
where $N_i(W) = \{\, j \mid 0 < \Delta t_{ij} \le W \,\}$ represents vehicles communicating within time window $W$.
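The neighborhood density definition above can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' implementation: the function name `neighbor_counts`, the toy coordinates, and the window length of 1 s are assumptions, since the text does not state the value of W.

```python
import numpy as np

def neighbor_counts(positions, times, i, thresholds=(100.0, 200.0, 300.0), window=1.0):
    """Count neighbors of message i within each distance threshold.

    positions: (n, 2) array of x, y coordinates in metres.
    times:     (n,) array of message timestamps in seconds.
    window:    temporal window W in seconds (assumed value).
    """
    positions = np.asarray(positions, dtype=float)
    times = np.asarray(times, dtype=float)
    d = np.linalg.norm(positions - positions[i], axis=1)  # Euclidean distance d_ij
    dt = np.abs(times - times[i])                         # temporal difference
    in_window = (dt > 0) & (dt <= window)                 # membership in N_i(W)
    return {D: int(np.sum(in_window & (d <= D))) for D in thresholds}

# Toy scene: the reference vehicle at the origin and five other messages.
positions = [[0, 0], [50, 0], [150, 0], [250, 0], [400, 0], [60, 0]]
times = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]  # the last message falls outside the window
counts = neighbor_counts(positions, times, i=0)
print(counts)  # {100.0: 1, 200.0: 2, 300.0: 3}
```

Because the window requires a strictly positive time gap, the reference message never counts itself. The message at 60 m is excluded only because it arrives 5 s later, which shows that the feature depends on timing as well as distance.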

3.3.2. UMAP-Based Feature Selection

To address the high-dimensional complexity of over 40 original features, Uniform Manifold Approximation and Projection (UMAP) was employed for non-linear dimensionality reduction. UMAP preserves both local and global data structures while identifying feature relevance through embeddings.
Table 4 presents the progressive feature selection strategy, demonstrating that 25 features provide optimal performance balance between computational efficiency and detection accuracy.

3.3.3. Communication Pattern Features

Key communication-aware features include:
  • Packet Transmission Rate: $N_{P_s} = \mathrm{count}(\text{messages from sender } s)$
  • Inter-Message Timing: $\Delta T = \mathrm{sendTime}_i - \mathrm{sendTime}_{i-1}$
  • Kinematic Differences: Relative changes in position, speed, and heading between sender–receiver pairs
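The first two communication features listed above can be computed in a single pass over the message log. The sketch below assumes that inter-message timing is taken between consecutive messages from the same sender, and that the first message of a sender has no timing value. The text does not settle either point, and the name `communication_features` is hypothetical.

```python
from collections import Counter

def communication_features(messages):
    """messages: list of (sender_id, send_time) tuples.

    Returns one tuple per message, ordered by send time:
    (sender_id, send_time, packet_count_of_sender, delta_t_or_None).
    """
    counts = Counter(sender for sender, _ in messages)  # messages per sender
    last_seen = {}
    features = []
    for sender, send_time in sorted(messages, key=lambda m: m[1]):
        prev = last_seen.get(sender)
        # Time since this sender's previous message, if there was one.
        delta_t = None if prev is None else send_time - prev
        features.append((sender, send_time, counts[sender], delta_t))
        last_seen[sender] = send_time
    return features

msgs = [("A", 0.0), ("B", 0.25), ("A", 0.5), ("A", 1.5), ("B", 0.75)]
feats = communication_features(msgs)
for row in feats:
    print(row)
```

Here sender A has three messages with gaps of 0.5 s and 1.0 s, and sender B has two messages with a gap of 0.5 s.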

3.4. Class Imbalance Analysis and Handling

The VeReMi Extension dataset exhibits severe class imbalance, as demonstrated through analysis of a representative 500,000-record sample. In this sample, legitimate vehicles comprise 59.6% of instances while individual attack types range from 4.7% to 13.1%, creating a maximum imbalance ratio of 12.7:1. This severe imbalance poses significant challenges for machine learning model training, as models tend to bias toward the majority class. Table 5 and Figure 3 present the detailed class distribution analysis based on the 500 K-record sample used for standardized comparisons throughout this study.
To address this challenge, SMOTE is applied after the train–test split to prevent data leakage and ensure unbiased evaluation. SMOTE generates synthetic samples for underrepresented classes by interpolating between existing minority class instances, creating balanced representation across all attack types while preserving test set integrity.
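The ordering described here, split first and then oversample the training portion only, can be illustrated with a simplified SMOTE-style interpolation written in plain NumPy. This is a toy sketch under stated assumptions, not the authors' pipeline: the helper names, the synthetic data, and k = 3 are all illustrative, and the pairwise-distance step would not scale to the dataset sizes used in the paper. A maintained implementation such as the `SMOTE` class in imbalanced-learn would be the usual choice in practice.

```python
import numpy as np

def stratified_split(y, test_fraction=0.2, seed=0):
    """Return train and test indices with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def smote_like_oversample(X, y, k=3, seed=0):
    """Grow every class to the majority count by interpolating between a
    sample and one of its k nearest same-class neighbors."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue
        Xc = X[y == label]
        # Pairwise distances within the class; a sample is not its own neighbor.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        neighbors = np.argsort(dist, axis=1)[:, :min(k, count - 1)]
        base = rng.integers(0, count, size=n_new)
        pick = rng.integers(0, neighbors.shape[1], size=n_new)
        nb = neighbors[base, pick]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(n_new, label))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Synthetic, imbalanced three-class data (illustrative only).
data_rng = np.random.default_rng(42)
y = np.array([0] * 60 + [1] * 25 + [2] * 15)
X = data_rng.normal(size=(100, 4)) + y[:, None]

train_idx, test_idx = stratified_split(y)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]  # never passed to the oversampler
X_bal, y_bal = smote_like_oversample(X_train, y_train)
print(np.bincount(y_train), np.bincount(y_bal), np.bincount(y_test))
```

Only `X_train` and `y_train` reach the oversampler, so no synthetic point is interpolated from a test sample and the test set keeps its original class proportions.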

3.5. Evaluation Methodology

Performance evaluation employs comprehensive metrics including accuracy, precision, recall, F1-score, and balanced accuracy. Per-class analysis ensures equitable assessment across all 20 attack categories, while computational efficiency metrics (inference time, memory usage, training duration) provide practical deployment insights.
We use a single, stratified 80/20 train–test split at the sequence level; no k-fold cross-validation is used. Class imbalance is addressed only on the training split via post-split SMOTE. We report point estimates on the held-out test set and 95% confidence intervals via paired bootstrapping.
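A bootstrap confidence interval of this kind can be sketched as follows. The sketch assumes a percentile bootstrap in which (label, prediction) pairs are resampled together from the held-out test set. The number of resamples, the function name, and the toy predictions are assumptions, as the text does not give these details.

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for accuracy.

    Each resample draws test indices with replacement, so every label
    stays paired with its own prediction.
    """
    rng = np.random.default_rng(seed)
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    n = len(correct)
    idx = rng.integers(0, n, size=(n_boot, n))  # one row per resample
    stats = correct[idx].mean(axis=1)           # accuracy of each resample
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return correct.mean(), lower, upper

# Toy test set: 1000 samples with 20 injected errors (98% accuracy).
y_true = np.array([0, 1] * 500)
y_pred = y_true.copy()
y_pred[:20] = 1 - y_pred[:20]
point, ci_low, ci_high = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy = {point:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

The same resampled indices can be reused for F1-score or balanced accuracy, which keeps the intervals for different metrics comparable.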
The evaluation framework prioritizes F1-score as the primary metric due to class imbalance considerations, supplemented by balanced accuracy for a comprehensive assessment of minority-class detection performance. Having established the VeMisNet framework and methodology, the following section provides a comprehensive analysis of feature importance to validate our domain-informed engineering approach and guide optimal feature selection for practical deployments.

4. Experimental Results

4.1. Experimental Configuration

This section presents the comprehensive experimental framework used to evaluate VeMisNet across multiple dimensions: feature engineering effectiveness, architectural performance, scalability, and class imbalance mitigation.

4.1.1. Dataset Configuration

All experiments utilize the VeReMi Extension dataset with the following standardized configurations:
Primary Evaluation Scale: 500,000 samples.
  • Rationale: Provides sufficient statistical power while maintaining computational tractability.
  • Composition: 20 attack categories plus legitimate traffic (59.6% legitimate, 40.4% attacks).
  • Train/Test Split: 80/20 stratified split maintained across all experiments.
  • Sequence Length: 10 time steps for temporal modeling.
Scalability Validation: Additional evaluations at 100 K, 1 M, and 2 M samples to assess performance consistency across deployment scales.
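A sequence length of 10 time steps means each model input is a window of ten consecutive messages. The sketch below shows one plausible way to build such windows for a single vehicle's message stream. It assumes a stride of one and labels each window with the label of its last message. Neither choice is specified in the text, and `make_sequences` is a hypothetical helper.

```python
import numpy as np

def make_sequences(features, labels, seq_len=10):
    """Slide a window of seq_len consecutive messages over one message stream.

    features: (n_messages, n_features) array; labels: (n_messages,) array.
    Returns X with shape (n_windows, seq_len, n_features) and y with
    shape (n_windows,), where y holds the label of each window's last message.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    n_windows = len(features) - seq_len + 1
    if n_windows <= 0:
        # Stream too short to form a single window.
        return (np.empty((0, seq_len, features.shape[1])),
                np.empty((0,), dtype=labels.dtype))
    X = np.stack([features[i:i + seq_len] for i in range(n_windows)])
    y = labels[seq_len - 1:]
    return X, y

# Toy stream: 25 messages with 14 features each.
stream = np.arange(25 * 14, dtype=float).reshape(25, 14)
stream_labels = np.zeros(25, dtype=int)
X_seq, y_seq = make_sequences(stream, stream_labels)
print(X_seq.shape, y_seq.shape)  # (16, 10, 14) (16,)
```

The resulting array has the (samples, time steps, features) layout that recurrent layers in common deep learning frameworks expect.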

4.1.2. Feature Set Configurations

Five distinct feature engineering approaches were systematically evaluated (illustrated in Figure 2):
  • Original Features: Raw VeReMi features, including temporal, kinematic, and noise components.
  • UMAP-Selected Features: Progressive feature selection from 5 to 25 features based on UMAP dimensionality reduction, tested across all dataset sizes for both binary and multi-class classification tasks.
  • Basic Kinematic Features: Fundamental vehicular attributes, including position, speed, acceleration, and heading, evaluated as baseline across all models, dataset sizes, and classification tasks.
  • Enhanced Feature Set: Combination of basic kinematic features augmented with newly engineered spatiotemporal features, resulting in a comprehensive 14-feature set tested across all experimental configurations.
  • Comprehensive Feature Set: Combination of raw VeReMi features (temporal, kinematic, and noise components) augmented with newly engineered domain-informed features, including the following:
    • Neighborhood density within DSRC ranges (100 m, 200 m, 300 m);
    • Inter-message timing patterns;
    • Kinematic consistency metrics;
    • Communication frequency analysis.
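As an illustration of how such features might be derived, the sketch below counts neighbouring senders within the three DSRC radii and computes inter-message gaps and an average message rate for one sender. The function names, the Euclidean distance on planar coordinates, and the definition of a neighbour as any sender whose latest position lies within the radius are assumptions made for this example; the exact formulas used by VeMisNet are not given in this section.

```python
import math


def neighbor_density(receiver_xy, sender_positions, radii=(100.0, 200.0, 300.0)):
    """Count senders within each radius (metres) of the receiver.

    `sender_positions` maps a sender id to its latest (x, y) position.
    """
    rx, ry = receiver_xy
    counts = {r: 0 for r in radii}
    for x, y in sender_positions.values():
        d = math.hypot(x - rx, y - ry)
        for r in radii:
            if d <= r:
                counts[r] += 1
    return counts


def inter_message_intervals(timestamps):
    """Gaps in seconds between consecutive messages from one sender."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def message_frequency(timestamps):
    """Average messages per second over the observed span (0.0 if undefined)."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    return (len(timestamps) - 1) / span if span > 0 else 0.0
```

Because the radii are nested, the three counts are non-decreasing from 100 m to 300 m.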

4.1.3. Model Configuration

The experimental environment used TensorFlow 2.x with its Keras API, applying identical architectural configurations across all models to ensure a fair comparison:
  • Architecture: Single recurrent layer with 64 hidden units;
  • Optimizer: Adam (learning rate = 0.001);
  • Loss Function: Categorical crossentropy for multi-class, binary crossentropy for binary classification;
  • Models Evaluated: LSTM, GRU, and Bidirectional LSTM.
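Although the three models share the same number of hidden units, they do not share the same capacity. The sketch below applies the standard trainable-parameter formulas for a single recurrent layer to show the difference. It counts the recurrent layer only (no output layer), and it assumes that "64 hidden units" applies per direction in the bidirectional case; the input dimension of 14 is used purely as an example.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units, separate_recurrent_bias=True):
    """Three gates; with separate_recurrent_bias=True this matches the
    Keras default GRU (reset_after=True), which keeps two bias vectors."""
    biases = 2 * units if separate_recurrent_bias else units
    return 3 * (units * (input_dim + units) + biases)


def bilstm_params(input_dim, units):
    """A forward and a backward LSTM with independent weights."""
    return 2 * lstm_params(input_dim, units)


if __name__ == "__main__":
    for name, fn in [("LSTM", lstm_params), ("GRU", gru_params), ("Bi-LSTM", bilstm_params)]:
        print(name, fn(14, 64))
```

For 14 input features and 64 units, these formulas give 20,224 parameters for LSTM, 15,360 for GRU, and 40,448 for Bi-LSTM, which is consistent with Bi-LSTM being the most expensive of the three to train.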

4.1.4. Evaluation Methodology

Performance Metrics:
  • Primary: Accuracy, F1-score, Matthews Correlation Coefficient;
  • Class Balance: Balanced Accuracy, per-class Precision/Recall;
  • Statistical: 95% confidence intervals via bootstrap resampling.
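A percentile bootstrap is one common way to obtain such intervals; the sketch below is a generic illustration and not necessarily the exact procedure used in the experiments. The number of resamples, the seed, and the helper `accuracy` are assumptions chosen for the example.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` on paired labels.

    Test samples are resampled with replacement `n_boot` times and the
    empirical alpha/2 and 1 - alpha/2 quantiles of the metric are returned.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Any metric with the signature `metric(y_true, y_pred)` can be passed in, so the same routine serves accuracy, F1-score, or MCC.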
Practical Metrics:
  • Inference latency (milliseconds per sample);
  • Training efficiency (time to convergence);
  • Memory utilization and throughput.
Class Imbalance Handling: SMOTE applied post-split to training data only, with comparative evaluation of balanced vs. unbalanced training approaches.
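The order of operations matters here: the held-out data must be set aside before any synthetic samples are generated. The following simplified sketch illustrates this for a single minority class with a small hand-written oversampler; the function `smote_oversample`, the toy data, and the value of `k` are hypothetical, and a production pipeline would normally rely on an established SMOTE implementation instead.

```python
import random


def smote_oversample(X, y, target_label, n_new, k=5, seed=0):
    """Create `n_new` synthetic rows for `target_label` by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == target_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: sq_dist(base, m))[:k]
        other = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data that has ALREADY been split; the test part is never oversampled.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
           [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_test, y_test = [[0.3, 0.3], [5.5, 5.5]], [0, 1]

new_rows = smote_oversample(X_train, y_train, target_label=1, n_new=3)
X_bal = X_train + new_rows
y_bal = y_train + [1] * len(new_rows)
```

Because `X_test` never enters `smote_oversample`, no synthetic point can be interpolated from a test sample, which is the leakage that post-split application is meant to avoid.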

4.1.5. Evaluation Framework

Each combination of dataset size, feature configuration, and model architecture was evaluated on both binary and multi-class classification tasks, resulting in a comprehensive experimental matrix that enables systematic analysis of the following:
  • Scalability: Performance consistency across dataset sizes (100 K to 2 M);
  • Feature Engineering Impact: Effectiveness of engineered vs. basic features;
  • Architecture Comparison: Relative performance of LSTM, GRU, and Bi-LSTM;
  • Classification Complexity: Binary vs. multi-class detection capabilities.
This systematic experimental design ensures comprehensive evaluation of the VeMisNet framework across diverse operational scenarios while maintaining rigorous comparative analysis standards.

4.1.6. Evaluation Metrics

Performance evaluation employs comprehensive metrics suitable for imbalanced datasets. Table 6 presents the evaluation framework, where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
Due to the VeReMi dataset’s severe class imbalance (59.6% legitimate vehicles), F1-score serves as the primary evaluation metric. While accuracy may appear high in imbalanced datasets, it can be misleading for minority class performance. F1-score balances precision–recall tradeoffs, providing comprehensive classifier assessment crucial for safety-critical VANET applications where both false negatives and false positives carry real-world risks. Balanced accuracy addresses class imbalance by computing average recall across all classes, ensuring equal contribution regardless of class frequency. This metric complements F1-score by providing class-agnostic performance assessment across all 20 attack categories.
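The point about misleading accuracy can be checked with a few lines of arithmetic. The sketch below computes the binary forms of these metrics from TP, TN, FP, and FN using their standard definitions; the function name and the example counts are illustrative only and are not taken from the experimental results.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, F1, balanced accuracy, and MCC from confusion-matrix counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
    }


# A degenerate detector that labels every message as legitimate,
# on 1000 messages with the 59.6% / 40.4% class mix.
always_legit = binary_metrics(tp=0, tn=596, fp=0, fn=404)
```

In this hypothetical case accuracy is 0.596 even though no attack is detected, while F1-score and MCC are 0 and balanced accuracy is 0.5, which is why the latter metrics are preferred here.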

4.2. Performance Evaluation Results

4.2.1. Binary Classification Results

Binary classification performance was evaluated across varying dataset sizes using basic kinematic features (position, speed, acceleration, and heading) to distinguish between legitimate vehicles and attackers. Table 7 presents comprehensive performance metrics for all three architectures across dataset scales from 100 K to 2 M records.
The results demonstrate that GRU achieves the highest overall performance with 84.39% accuracy at 100 K records, followed closely by Bi-LSTM (84.32%) and LSTM (83.73%). However, all architectures exhibit performance degradation as dataset size increases, suggesting potential challenges in processing larger data volumes with basic feature sets. Notably, GRU maintains superior precision across most configurations, achieving 85.16% precision at 100 K records, while Bi-LSTM demonstrates more consistent performance patterns across varying dataset sizes. LSTM shows intermediate performance but exhibits greater stability at larger scales compared to GRU.
Figure 4 visualizes these performance trends across dataset sizes. The accuracy comparison (left) clearly illustrates GRU’s peak performance at smaller dataset sizes, while the F1-score analysis (right) reveals Bi-LSTM’s superior consistency across scales. The performance degradation pattern is most pronounced in GRU, which experiences the steepest decline from 84.39% to 79.82% accuracy as dataset size increases from 100 K to 2 M records.
These findings indicate that while basic kinematic features provide reasonable binary classification performance, the scalability challenges suggest the need for more sophisticated feature engineering to maintain detection accuracy at larger operational scales. The consistent performance of Bi-LSTM across dataset sizes makes it particularly suitable for real-world VANET deployments with varying data volumes.

4.2.2. Multi-Class Classification Results

Multi-class classification evaluation encompasses five feature configurations to assess detection performance across 20 attack categories, demonstrating the progressive impact of feature selection and engineering on model effectiveness.
Original Features
Multi-class classification evaluation using the original VeReMi Extension features provides baseline performance assessment across three recurrent architectures. The original feature set comprises temporal, kinematic, noise, and filtered components from the standardized dataset, enabling direct comparison with the prior literature that utilizes these established attributes. Table 8, Table 9 and Table 10 present detailed results across all architectures and scales.
The experimental results demonstrate consistent architectural performance rankings across dataset scales, with Bidirectional LSTM achieving superior detection performance compared to unidirectional variants. However, all architectures exhibit performance degradation as dataset size increases, indicating potential challenges in maintaining detection effectiveness at larger operational scales when relying solely on original feature representations.
Computational efficiency analysis reveals trade-offs between training time and detection performance. While LSTM demonstrates faster training at the 500 K scale, GRU achieves optimal training efficiency at larger scales. Bidirectional LSTM consistently maintains the lowest missed attack rates across all configurations, though at the cost of increased training time. The inference latency remains consistently below 41 ms across all architectures, supporting real-time deployment requirements.
Performance Assessment: The scalability analysis reveals systematic performance degradation patterns that warrant consideration for large-scale deployment scenarios. Bidirectional LSTM demonstrates the most favorable scaling characteristics, with the smallest cumulative accuracy decline (−4.30%) and missed attack rate increase (+2.65%) across the evaluated range. These baseline results using original features provide the foundation for subsequent feature engineering investigations and establish performance benchmarks for comparative evaluation.
Critical Assessment: The original feature performance, while achieving competitive results on smaller datasets, exhibits concerning degradation patterns at scale. The cumulative accuracy decline of 4–5 percentage points across the 500 K–2 M range suggests potential limitations for large-scale deployment scenarios. Additionally, the increasing missed attack rates (from 2.32% to 4.97% for Bi-LSTM) indicate reduced reliability for safety-critical applications as dataset complexity increases.
These baseline results demonstrate the need for enhanced feature engineering approaches that can maintain detection effectiveness across diverse operational scales while addressing the inherent challenges of vehicular communication pattern recognition.
UMAP-Selected Features
UMAP dimensionality reduction identified candidate feature subsets from the original dataset, with progressive evaluation from 5 to 25 features across all models and dataset scales. Table 11, Figure 5, and Figure 6 present comprehensive multi-class classification results using UMAP-selected features. Performance is best in the range of 10–20 features, with Bi-LSTM achieving a peak accuracy of 71.97% at 100 K records with 20 features. Performance plateaus beyond 15 features, indicating diminishing returns from additional feature complexity.
Basic Kinematic Features
Multi-class classification using fundamental vehicular attributes (position, speed, acceleration, heading) establishes baseline performance across all architectures and dataset scales. Table 12 and Figure 7 present comprehensive evaluation results for basic kinematic features.
The training time analysis for 500,000 records (Figure 8) reveals distinct computational patterns across the three architectures. LSTM shows the largest advantage for binary classification, training 25.8% faster (1.24 vs. 1.67 h). Bi-LSTM demonstrates a smaller but consistent advantage, with binary classification requiring 6.5% less time (2.00 vs. 2.13 h).
Notably, GRU exhibits the opposite behavior, where multi-class classification trains 15.7% faster than binary (1.19 vs. 1.41 h). This suggests that GRU’s gating mechanisms are more efficiently utilized when learning multiple class distinctions rather than binary decisions. These computational differences have practical implications for deployment scenarios where training time is a critical constraint in vehicular network security applications.
Enhanced Feature Set (Basic Kinematic + Engineered)
The enhanced feature configuration combines basic kinematic attributes with newly engineered spatiotemporal features, resulting in a comprehensive 14-feature set designed to capture domain-specific vehicular behavior patterns. This feature set bridges the gap between simplistic basic features and the computational complexity of the full feature set, providing an optimal balance for practical deployment scenarios. Table 13, Table 14, Table 15 and Table 16 present detailed results across all architectures and scales.
The enhanced feature set demonstrates significant improvements over basic kinematic features while maintaining computational efficiency. Bidirectional LSTM achieves 91.80% accuracy with the 14-feature configuration, representing a 10.4 percentage point improvement over basic features (81.40%) and approaching the performance of more complex feature configurations.
Computational efficiency analysis reveals that the enhanced feature set maintains excellent training times, with GRU achieving optimal efficiency at 39.7 min for 500 K samples. The inference latency consistently remains below 36 ms across all architectures, demonstrating the feature set’s suitability for real-time VANET deployment despite the additional engineered features.
The extended metrics reveal critical insights into model behavior under class imbalance. While overall accuracy remains high (91.80%), the balanced accuracy of 73.91% for Bi-LSTM indicates persistent challenges in minority class detection. The MCC values (0.8267–0.8688) and Cohen’s kappa (0.8243–0.8673) confirm strong predictive capability despite the imbalanced dataset, with Bi-LSTM consistently demonstrating superior correlation and agreement metrics.
Performance Assessment: The enhanced feature set achieves optimal balance between detection performance and computational efficiency. Bidirectional LSTM with 14 engineered features provides a 27.5% relative improvement over UMAP-selected features and 12.78% over basic kinematic features, while maintaining sub-100 ms inference latency suitable for real-time deployment. The scalability analysis reveals moderate degradation patterns, with Bi-LSTM experiencing a cumulative 5.68% accuracy decline across the 500 K–2 M range.
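Since both percentage-point and relative gains are quoted for the same comparison, it may help to see how the two relate. Using the Bi-LSTM accuracies reported for the enhanced set (91.80%) and for basic kinematic features (81.40%), the worked calculation is as follows; the symbols are introduced here only for illustration.

```latex
\Delta_{\text{abs}} = 91.80 - 81.40 = 10.40 \ \text{percentage points},
\qquad
\Delta_{\text{rel}} = \frac{91.80 - 81.40}{81.40} \times 100\% \approx 12.78\%
```

The relative gain over UMAP-selected features follows the same formula with the corresponding UMAP accuracy as the denominator.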
Critical Assessment: While the enhanced feature set represents a significant advancement over basic approaches, the increasing missed attack rates (from 8.98% to 14.53% for Bi-LSTM) highlight the challenge of maintaining detection reliability at scale. The degradation patterns, though improved compared to basic kinematic features alone, suggest that the 14-feature configuration approaches the practical limit for feature-based detection without more comprehensive feature integration.
Table 17 demonstrates the progressive impact of feature engineering approaches on detection performance. The domain-informed feature engineering (14 features) achieves 91.80% accuracy, representing a 27.48% relative improvement over automated UMAP selection and validating the importance of incorporating VANET-specific communication patterns. This substantial performance gain from just 14 carefully selected features compared to UMAP’s 10–20 features highlights the superiority of domain expertise over dimensionality reduction alone.
These results validate the enhanced feature set as an optimal middle ground for practical VANET deployment, offering substantial performance improvements while maintaining computational feasibility and demonstrating the critical importance of domain-informed feature engineering in vehicular security applications.
Comprehensive Feature Set Evaluation (Original + Engineered Features)
To demonstrate the full potential of our domain-informed feature engineering approach, we conducted comprehensive evaluation using the complete feature set comprising all features: the original VeReMi features combined with our newly engineered spatiotemporal and communication-aware features. This evaluation provides definitive evidence of our approach’s effectiveness while addressing scalability and deployment considerations.
The comprehensive evaluation encompasses three critical dimensions: (1) multi-class classification performance across all 20 attack categories, (2) computational efficiency for real-time vehicular deployment, and (3) scalability assessment across dataset sizes from 500 K to 2 M samples. Table 18, Table 19 and Table 20 present detailed results across all architectures and scales.
Performance Analysis: The comprehensive feature set performs markedly better than the previous configurations. At 500 K samples, Bi-LSTM achieves 99.81% accuracy with only 0.19% missed attacks, a substantial gain in detection capability over the enhanced 14-feature set (91.80% accuracy, 8.98% missed attacks).
Most remarkably, the comprehensive feature set exhibits extraordinary scalability. Unlike previous configurations that showed significant performance degradation, accuracy decline is minimal: Bi-LSTM drops only 0.22 percentage points across the entire 500 K to 2 M range, maintaining >99.5% accuracy even at maximum scale. This represents a 22-fold improvement in scalability compared to the enhanced feature set’s 4.94 percentage point degradation.
Critical Assessment: The comprehensive feature set establishes new performance benchmarks for VANET misbehavior detection, achieving near-perfect classification with remarkable scalability. The minimal missed attack rates (<0.5% across all scales) position this approach for deployment in safety-critical vehicular systems where detection reliability is paramount.
However, practical considerations warrant acknowledgment: (1) the increased computational overhead of the full feature set requires careful resource management, (2) training times increase substantially (71–420 min vs. 40–115 min for the 14-feature set), and (3) the feature engineering process demands domain expertise for effective implementation.
The results validate our hypothesis that comprehensive domain-informed feature engineering can overcome the traditional accuracy–scalability trade-off in machine learning systems, enabling both exceptional performance and robust scaling characteristics essential for real-world VANET deployment.

4.3. Comprehensive Per-Class Performance Analysis and Class Imbalance Mitigation

4.3.1. Theoretical Foundation and Methodology

Class imbalance represents a fundamental challenge in VANET misbehavior detection, where legitimate vehicular traffic significantly outnumbers malicious activities. This imbalance can lead to misleading performance assessments when relying solely on aggregate metrics. Our analysis employs both overall and per-class evaluation methodologies to provide comprehensive insights into model behavior under imbalanced conditions.
Imbalance Impact on Model Evaluation: In severely imbalanced datasets, conventional metrics such as accuracy can be misleading, as models may achieve high overall performance by predominantly predicting the majority class (legitimate vehicles). This phenomenon masks poor performance on critical minority classes (attack categories), where detection is most crucial for security applications. The per-class performance evaluation presented in Appendix B (Table A2, Table A3 and Table A4) demonstrates this phenomenon clearly, with several attack categories showing complete detection failure (F1-score = 0.000) when using basic features without SMOTE application.
SMOTE-Based Mitigation Strategy: We implement Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance through intelligent data augmentation. SMOTE generates synthetic examples for minority classes by interpolating between existing minority class instances, thereby expanding the decision boundary representation and reducing majority class bias while maintaining realistic data distributions.
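In its standard formulation, the interpolation step can be written as below, where the notation is introduced here for illustration: x_i is a minority-class sample, x_nn is one of its k nearest neighbours from the same class chosen at random, and λ is drawn uniformly from the unit interval.

```latex
x_{\text{new}} = x_i + \lambda \,\bigl(x_{\text{nn}} - x_i\bigr),
\qquad \lambda \sim \mathcal{U}(0, 1)
```

Each synthetic sample therefore lies on the line segment joining two real minority samples, which is why the augmented data remain within the observed feature ranges of that class.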

4.3.2. Architectural Performance Under Class Imbalance

Analysis of the basic feature performance reveals severe architectural limitations in detecting minority attack classes without proper imbalance mitigation. Examining Table A2, the Bidirectional LSTM architecture demonstrates complete detection failure for several critical attack categories, including Disruptive attacks (F1-score = 0.000), DoS attacks (F1-score = 0.000), and DataReplaySybil attacks (F1-score = 0.000). The same failure pattern is observed for the other two architectures in Table A3 and Table A4, indicating that basic kinematic features alone are insufficient for detecting sophisticated attack patterns that do not directly manipulate fundamental vehicular movement parameters.
The GRU architecture exhibits similar limitations, with complete failure in detecting Disruptive (F1-score = 0.000) and DataReplay attacks (F1-score = 0.000), while the standard LSTM architecture shows identical failure modes across these same attack categories. These results validate the theoretical understanding that minority classes in severely imbalanced datasets require specialized handling to achieve meaningful detection performance.

4.3.3. SMOTE Transformation Effects and Recovery Analysis

The application of SMOTE produces a marked recovery in minority class detection across all architectures. For the Bidirectional LSTM configuration shown in Table A2, critical attack categories exhibit substantial performance improvements: Disruptive attacks recover from complete failure to an F1-score of 0.116, DoS attacks improve from 0.000 to 0.504 (recovery from complete failure), and DoSRandomSybil attacks improve from 0.096 to 0.571, a 495% improvement in detection capability.
However, this improvement comes with an inherent and theoretically sound trade-off in legitimate traffic classification. The legitimate vehicle recall decreases significantly across all architectures when SMOTE is applied: Bi-LSTM shows recall reduction from 0.996 to 0.224 (−77.4%), GRU exhibits decline from 0.993 to 0.192 (−80.7%), and LSTM demonstrates decrease from 0.997 to 0.160 (−83.9%). This trade-off is acceptable and expected in security-critical applications where the consequences of missing attacks far outweigh the costs of false positive alerts.
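The percentages quoted in this subsection are relative changes with respect to the value before SMOTE. A minimal sketch of the arithmetic is given below; the helper names are hypothetical, and the inputs are the F1-score and recall values reported in the text.

```python
def relative_change(before, after):
    """Signed relative change, as a fraction of the starting value."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before


def flagged_share(legitimate_recall):
    """Share of legitimate samples that are NOT classified as legitimate."""
    return 1.0 - legitimate_recall


dos_random_sybil_gain = relative_change(0.096, 0.571)   # F1-score, Bi-LSTM
gru_recall_change = relative_change(0.993, 0.192)       # legitimate recall, GRU
bilstm_flagged = flagged_share(0.224)                   # legitimate recall, Bi-LSTM
```

Here `dos_random_sybil_gain` is about 4.95 (the reported 495%), `gru_recall_change` is about −0.807 (the reported −80.7%), and `bilstm_flagged` is 0.776, meaning that roughly three quarters of legitimate samples would be assigned to some attack class. A change that starts from 0.000 has no finite relative value, which is why those cases are described as recovery from complete failure rather than as a percentage.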

4.3.4. Enhanced Feature Set Impact and Comprehensive Analysis

The transition to enhanced feature configurations, as demonstrated in Table A5, Table A6 and Table A7, reveals substantial improvements in detection capability even without SMOTE application. Table A8 presents our comprehensive evaluation using the optimal 14-feature set, comparing model performance with and without SMOTE across multiple evaluation metrics. The results demonstrate the nuanced trade-offs inherent in imbalance mitigation strategies.
The Bidirectional LSTM architecture with enhanced features (Table A5) recovers detection in previously undetectable attack categories: DoS attacks improve from complete failure with basic features to an F1-score of 0.957, Disruptive attacks rise to 0.605, and DataReplay attacks reach 0.377, representing an improvement of over 1400% compared to basic feature performance.
Key Performance Insights from Enhanced Feature Analysis:
  • Without SMOTE: Models achieve high overall accuracy (85.84–90.00%) but exhibit poor balanced accuracy (57.80–67.77%), indicating severe bias toward majority classes even with enhanced features.
  • With SMOTE: While overall accuracy moderately decreases (74.04–84.85%), balanced accuracy substantially improves (88.95–92.10%), demonstrating more equitable performance across all attack categories.
  • Optimal Configuration: Bidirectional LSTM with SMOTE achieves the best balance, maintaining competitive overall accuracy (84.85%) while achieving excellent balanced accuracy (92.10%) and Cohen’s Kappa (0.7655).
Critical Performance Patterns:
  • Minority Class Enhancement: SMOTE consistently improves detection capabilities for underrepresented attack types, with particularly significant gains for Sybil (+0.50), DDoS (+0.53), Position (+0.52), Speed (+0.53), and Replay (+0.53) attacks.
  • Detection Difficulty Correlation: An inverse relationship exists between class size and detection complexity, with smaller attack classes presenting inherently greater detection challenges. EventualStop attacks also present detection challenges, achieving 0.584 F1-score even with optimal configuration, suggesting that gradual behavioral changes are inherently more difficult to distinguish from normal traffic variations.
  • Architectural Superiority: Bidirectional LSTM demonstrates consistent superiority across most attack categories, effectively leveraging both forward and backward temporal contexts, with average F1-score improvements of 8–15% compared to unidirectional architectures.

4.3.5. Resource Utilization and Practical Considerations

Figure A1 provides complementary analysis of performance metrics and computational resource requirements. The resource utilization analysis reveals important practical considerations:
  • Memory Efficiency: SMOTE application increases memory requirements due to synthetic sample generation, with Bidirectional LSTM showing moderate increases compared to unidirectional architectures.
  • Training Time Impact: The larger training sets produced by SMOTE result in longer training periods, but the performance gains justify the computational overhead in security-critical applications.
  • Deployment Trade-offs: The slight reduction in overall precision with SMOTE is acceptable in security-sensitive environments, where false positives are preferable to missed attacks.

4.3.6. Scientific Justification and Comprehensive Synthesis

The effectiveness of our SMOTE-based approach stems from its ability to expand decision boundaries for minority classes, thereby reducing overfitting to majority class patterns. Systematic comparison across Table A5, Table A6 and Table A7 reveals consistent architectural rankings with enhanced features and SMOTE application. The results establish that proper feature engineering combined with SMOTE enables detection of previously undetectable attack types, transforming the security posture from selective detection to comprehensive coverage.
This enhancement is particularly crucial in vehicular security applications where:
  • Rare Attack Detection: Critical security threats often manifest as minority classes in real-world traffic data. The systematic recovery of critical attack detection capabilities, from complete failure to detection rates exceeding 95% for most attack types, validates both the technical approach and its practical significance.
  • Balanced Coverage: Equitable detection performance across all attack types ensures comprehensive security coverage, with the transformation from selective to comprehensive detection representing a fundamental advancement in VANET security capability.
  • Generalization Enhancement: Synthetic augmentation improves model generalization to previously unseen attack variants, as evidenced by improved performance across diverse attack categories.
  • Operational Reliability: Consistent performance across diverse attack scenarios enhances real-world deployment viability, with Bidirectional LSTM consistently demonstrating superior performance across the majority of attack categories.
These findings establish the critical importance of class imbalance mitigation in developing robust and reliable VANET security systems, with our SMOTE-enhanced Bidirectional LSTM configuration representing the optimal balance between detection accuracy and operational practicality for securing vehicular ad hoc networks against sophisticated misbehavior patterns.

5. Conclusions and Future Directions

A central contribution of VeMisNet lies in its systematic feature engineering and comparative evaluation. Beyond relying solely on raw or dimensionality-reduced attributes, the framework introduces communication-aware features such as DSRC-range neighborhood density, inter-message timing, directional differences, and transmission frequency patterns, derived from the publicly available VeReMi Extension dataset. These features mark a significant advancement over traditional approaches that rely primarily on basic kinematic variables or limited private datasets.
The framework was rigorously evaluated across five experimental configurations:
  • Original VeReMi features: Bi-LSTM achieved 97.0% accuracy at 500 K samples, but degraded to 92.7% at 2 M, with missed attack rates increasing from 2.3% to 5.0%.
  • UMAP-selected subsets: Performance plateaued near 72.0% accuracy, confirming that unsupervised dimensionality reduction alone cannot capture VANET-specific spatiotemporal patterns.
  • Basic kinematic features: Baseline results reached only 81.4% accuracy at 100 K, falling to 64.4% F1 at 2 M, exposing severe scalability limitations.
  • Enhanced 14-feature set: Integration of engineered spatiotemporal features improved Bi-LSTM performance to 91.80% accuracy and F1 = 0.9093 at 500 K, with moderate scalability loss (down to 86.1% accuracy at 2 M). False alarms dropped to 0.33%, with only 8.98% of attacks missed.
  • Comprehensive feature set: Combining raw and engineered features achieved the highest stability across scales, sustaining >99.5% accuracy even at 2 M samples and maintaining inference latency under 41 ms.
Comparative scalability analysis demonstrates that basic features degrade most severely with dataset size, while the enhanced and comprehensive feature sets deliver graceful degradation and consistent robustness. In the multi-class experiments, the Bi-LSTM architecture consistently outperformed LSTM and GRU, showing a cumulative accuracy decline of only 0.22 percentage points between 500 K and 2 M records under the comprehensive configuration, compared to 4.3 percentage points with the original features and >12% for basic features.
Safety-critical metrics further highlight these improvements. With the enhanced 14-feature set, false alarm rates decreased to 0.33%, attack detection reached 91.02%, and robustness was reinforced with MCC = 0.8688, κ = 0.8673, and balanced accuracy of 73.9%. Importantly, the framework achieved deployment-ready efficiency, sustaining throughput above 68,000 samples/s, sub-47 ms P99 inference latency, and a memory footprint of only 24.8 MB.
Collectively, these results validate that careful feature engineering, tested across multiple baselines and systematically benchmarked at scale, drives measurable advances in misbehavior detection. VeMisNet therefore establishes new accuracy and scalability benchmarks, while delivering statistically significant improvements (p < 0.001) across all metrics, providing credible evidence of readiness for real-time safety-critical VANET deployments.

5.1. Future Research Directions

The VeMisNet framework establishes a foundation for VANET misbehavior detection, yet several critical research directions emerge from both our achievements and acknowledged limitations. These priorities address immediate deployment challenges while positioning the framework for next-generation intelligent transportation systems.

5.1.1. Immediate Research Priorities

Comprehensive Validation Framework Development: Current evaluation relies exclusively on simulated data, creating an urgent need for real-world validation. Future research must establish controlled VANET testbeds using commercial vehicles equipped with DSRC communication systems across diverse environments (urban, suburban, highway). This validation should implement standardized data collection procedures capturing authentic vehicular communication patterns, environmental conditions, and naturally occurring anomalies to assess model transferability from simulation to deployment [33,34].
Adversarial Robustness Enhancement: The framework’s vulnerability to sophisticated adversarial attacks targeting machine learning-based security systems requires immediate attention. Research should develop comprehensive adversarial attack models specifically designed for VANET security contexts, implement adversarial training with certified defenses, and establish standardized evaluation protocols for measuring system resilience against adaptive adversaries. This includes defense mechanisms against evasion and poisoning attacks that could compromise detection effectiveness in operational environments.
Cross-Validation Methodology Integration: Our current single-split evaluation approach, while consistent across experiments, limits the robustness of performance estimates. Future work must implement comprehensive k-fold cross-validation with temporal-aware folding strategies that preserve vehicular sequence integrity while providing statistically robust performance estimates. This includes developing specialized validation techniques for imbalanced temporal datasets with SMOTE integration.
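One possible reading of temporal-aware folding is forward chaining, in which each fold trains on an initial stretch of the time-ordered data and tests on the block that follows it. The sketch below generates such index pairs; the function name and the equal-block layout are assumptions for illustration, not a description of a procedure evaluated in this work.

```python
def forward_chaining_folds(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    The data are cut into n_folds + 1 consecutive blocks. Fold i trains on
    the first i blocks and tests on block i + 1; the last fold also takes
    any leftover samples at the end.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("too few samples for the requested number of folds")
    for i in range(1, n_folds + 1):
        train_end = i * block
        test_end = n_samples if i == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))
```

Within each fold, any oversampling such as SMOTE would be fitted on the training indices only, so that every test block remains both later in time than its training data and free of synthetic samples.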

5.1.2. Medium-Term Research Objectives

Federated Learning Framework: The development of federated learning capabilities represents a critical direction for enabling distributed VANET nodes to collaboratively train detection models while preserving data privacy and addressing concerns about sensitive vehicle information sharing [23,24]. This approach would allow the framework to benefit from diverse, heterogeneous datasets across different geographical regions and traffic conditions without compromising individual vehicle privacy, thereby enhancing model generalizability and robustness across diverse operational contexts.
Ensemble Learning Integration: Future research should explore the incorporation of ensemble learning techniques that combine multiple deep learning architectures to enhance detection reliability, robustness, and overall system accuracy [5,6]. Such approaches could leverage the complementary strengths of different neural network models while mitigating individual architectural limitations and improving generalization across diverse attack patterns.
Neuro-Fuzzy Integration: Incorporating fuzzy logic components with neural networks would significantly improve the framework’s ability to handle uncertainty and ambiguity inherent in VANET environments [35]. This hybrid approach could enhance detection reliability in scenarios with incomplete or noisy communication data, providing more robust decision-making capabilities under adverse conditions such as network congestion or intermittent connectivity.

5.1.3. Long-Term Research Vision

Scalable Deployment Architecture: Future research should focus on developing efficient deployment strategies for large-scale VANET implementations, including edge computing integration, distributed processing architectures, and real-time processing optimization to meet stringent latency requirements of vehicular safety applications [15,36]. This involves investigating lightweight model variants, hardware-accelerated implementations, and adaptive resource allocation strategies for heterogeneous vehicular computing environments.
Advanced Attack Pattern Analysis: Research should extend beyond current binary and multi-class detection toward more sophisticated attack pattern recognition, including zero-day attack detection, adaptive adversarial behavior modeling, and evolution-aware detection systems that can adapt to emerging threat landscapes in intelligent transportation systems.
Context-Aware Environmental Integration: Extending the framework to incorporate real-world vehicle datasets and context-sensitive features would provide more accurate behavioral modeling and improve generalizability. This includes integration of environmental factors such as weather conditions, traffic density variations, road infrastructure characteristics, and temporal patterns that could significantly influence vehicle behavior and attack manifestations.

5.1.4. Research Implementation Strategy

Resource-Constrained Optimization: Lightweight model variants should be developed for resource-limited vehicular computing environments without sacrificing detection effectiveness. Candidate techniques include model compression, quantization, and hardware–software co-design optimized for automotive-grade processors.
Dynamic Adaptation Mechanisms: Online learning and model updating mechanisms should be investigated to address concept drift and evolving attack strategies without complete retraining. This includes incremental learning algorithms that adapt to new attack patterns while preserving detection performance on known threats.
Cross-Domain Generalization: The framework should be evaluated across diverse geographical regions, traffic patterns, and infrastructure configurations to assess model transferability and identify domain adaptation requirements. This research should establish standardized benchmarks for cross-domain evaluation and develop techniques for rapid model adaptation to new operational environments.
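As a small illustration of the quantization strategy mentioned under resource-constrained optimization, the sketch below applies symmetric per-tensor 8-bit quantization to a list of weights. The function names and the example weights are hypothetical, and the scheme is one common textbook variant rather than a method evaluated in this study.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in quantized]


weights = [0.50, -1.27, 0.003, 0.90]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error of each weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most `scale / 2` per weight; whether that error is acceptable for detection accuracy would have to be measured on the actual model.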
Together, these enhancements would advance VeMisNet toward a production-ready system that provides robust, scalable, and privacy-preserving misbehavior detection for next-generation intelligent transportation systems, improving generalizability and enabling deployment at scale in real-world VANETs.

Author Contributions

Conceptualization, A.M., M.A.S., A.M.B. and K.N.; Methodology, N.Y., A.M., M.A.S., A.M.B. and K.N.; Software, N.Y.; Validation, N.Y.; Investigation, N.Y.; Resources, N.Y.; Data curation, N.Y.; Writing—original draft, N.Y.; Writing—review & editing, A.M., M.A.S., A.M.B. and K.N.; Supervision, A.M., M.A.S., A.M.B. and K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Comprehensive Literature Comparison

Table A1. Comprehensive literature comparison.
Study | Year | Approach | Dataset | Scale | Accuracy | F1-Score | Speed (Samples/s) | Attack Types | Key Innovation
Traditional Machine Learning
Grover et al. [16] | 2011 | Random Forest | Custom | 50 K | 93–99% | 95% | 20,000 | Multiple | Ensemble methods
So et al. [17] | 2018 | ML + Plausibility | Custom | 100 K | 95% | 93% | 18,500 | Position | Domain integration
Zhang et al. [19] | 2018 | SVM + DST | Custom | 75 K | 94.2% | 92% | 16,200 | Message attacks | Dual-model approach
Sharma & Jaekel [18] | 2021 | ML Ensemble | VeReMi | 500 K | 98.5% | 97% | 15,800 | 5 types | Temporal BSMs
Sonker et al. [1] | 2021 | Random Forest | VeReMi | 200 K | 97.6% | 96% | 17,300 | 5 types | Multi-class focus
Kumar et al. [25] | 2022 | XGBoost + Temporal | VeReMi | 800 K | 96.2% | 95% | 16,500 | 16 types | Temporal features
Patel et al. [37] | 2024 | ML Ensemble | VeReMi Ext | 1.5 M | 94.7% | 93% | 14,200 | 20 types | Advanced ensemble
Deep Learning Approaches
Kamel et al. [20] | 2019 | LSTM | VeReMi | 300 K | 89.5% | 87% | 9200 | Multi-class | Temporal modeling
Alladi et al. [38] | 2021 | CNN-LSTM | Custom | 250 K | 92.3% | 90% | 7800 | Multiple | Hybrid architecture
Alladi et al. [22] | 2021 | DeepADV | VeReMi | 400 K | 94.1% | 92% | 6500 | Comprehensive | Multi-architecture
Liu et al. [39] | 2023 | Graph NN | Custom + VeReMi | 900 K | 94.5% | 93% | 5800 | 15 types | Graph-based modeling
Chen et al. [26] | 2024 | Attention-LSTM | VeReMi Ext | 1 M | 93.1% | 91% | 9100 | 20 types | Attention mechanism
Yuce et al. [28] | 2024 | Spatiotemporal GNN | Converted MBD (IoV) | - | 99.92% | - | 4800 | Multiple | GNN + dataset-to-graph mapping
Transformer-Based Architectures
Wang et al. [40] | 2024 | Transformer | VeReMi Ext | 1.2 M | 93.8% | 92% | 7200 | 20 types | Self-attention
Khan et al. [27] | 2025 | Transformer + SHAP | VeReMi Ext | 3.19 M | 96.15% (MC), 98.28% (Bin) | - | 7200 | All attacks (VeReMi Ext) | XAI with transformer
Federated Learning Approaches
Gurjar et al. [24] | 2025 | Fed. CNN-LSTM | VeReMi Ext | 1.2 M | 93.2% | 90% | 5500 | Multiple | Scalable federation
Campos et al. [23] | 2024 | Federated DL | VeReMi Ext | 800 K | 91.7% | 89% | 6200 | Distributed | Privacy-preserving
Hybrid Approaches
Kim et al. [41] | 2023 | SVM + GRU | VeReMi | 700 K | 94.3% | 93% | 12,500 | 16 types | ML + DL ensemble
Rodriguez et al. [42] | 2024 | RF + LSTM | VeReMi Ext | 1.3 M | 95.1% | 94% | 11,000 | 20 types | Hybrid architecture
Present Work | 2025 | Bi-LSTM + Eng. Features | VeReMi Ext | 100 k–2 M | 99.05–99.63% | 99+% | 41.76 ms | All attacks (VeReMi Ext) | Domain engineered features + scalability analysis

Appendix B. Comprehensive Per-Class Performance Analysis

Table A2. Per-class performance of Bi-LSTM using basic features by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Bi-LSTM Architecture
Legitimate | 0.797 | 0.745 | 0.996 | 0.853 | 0.484 | 0.941 | 0.224 | 0.362
ConstPos | 0.991 | 0.827 | 0.492 | 0.617 | 0.994 | 0.795 | 0.957 | 0.868
ConstPosOffset | 0.989 | 0.815 | 0.266 | 0.401 | 0.978 | 0.351 | 0.862 | 0.499
RandomPos | 0.990 | 0.847 | 0.394 | 0.538 | 0.984 | 0.305 | 0.628 | 0.411
RandomPosOffset | 0.990 | 0.773 | 0.125 | 0.215 | 0.918 | 0.055 | 0.587 | 0.101
ConstSpeed | 0.994 | 0.777 | 0.690 | 0.731 | 0.999 | 0.969 | 0.954 | 0.962
ConstSpeedOffset | 0.993 | 0.696 | 0.696 | 0.696 | 0.991 | 0.444 | 0.787 | 0.567
RandomSpeed | 0.992 | 0.673 | 0.650 | 0.661 | 0.996 | 0.814 | 0.913 | 0.861
RandomSpeedOffset | 0.992 | 0.766 | 0.397 | 0.523 | 0.975 | 0.245 | 0.758 | 0.370
EventualStop | 0.992 | 0.870 | 0.326 | 0.474 | 0.967 | 0.178 | 0.857 | 0.295
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.899 | 0.065 | 0.542 | 0.116
DataReplay | 0.989 | 0.619 | 0.013 | 0.025 | 0.912 | 0.062 | 0.486 | 0.110
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.196 | 0.883 | 0.320
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.951 | 0.419 | 0.634 | 0.504
DoSRandom | 0.957 | 0.512 | 0.945 | 0.664 | 0.972 | 0.595 | 0.392 | 0.473
DoSDisruptive | 0.966 | 0.287 | 0.026 | 0.047 | 0.927 | 0.193 | 0.443 | 0.269
GridSybil | 0.966 | 0.850 | 0.588 | 0.695 | 0.953 | 0.476 | 0.705 | 0.568
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.956 | 0.086 | 0.480 | 0.146
DoSRandomSybil | 0.971 | 0.490 | 0.053 | 0.096 | 0.971 | 0.501 | 0.663 | 0.571
DoSDisruptiveSybil | 0.969 | 0.119 | 0.006 | 0.011 | 0.961 | 0.370 | 0.491 | 0.422
Table A3. Per-class performance of GRU using basic features.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
GRU Architecture
Legitimate | 0.774 | 0.726 | 0.993 | 0.838 | 0.461 | 0.919 | 0.192 | 0.317
ConstPos | 0.989 | 0.747 | 0.366 | 0.492 | 0.989 | 0.658 | 0.941 | 0.774
ConstPosOffset | 0.988 | 0.849 | 0.137 | 0.236 | 0.982 | 0.388 | 0.748 | 0.511
RandomPos | 0.987 | 0.748 | 0.203 | 0.319 | 0.979 | 0.252 | 0.686 | 0.369
RandomPosOffset | 0.989 | 0.889 | 0.007 | 0.015 | 0.907 | 0.044 | 0.520 | 0.081
ConstSpeed | 0.994 | 0.739 | 0.709 | 0.724 | 0.999 | 0.940 | 0.954 | 0.947
ConstSpeedOffset | 0.991 | 0.633 | 0.542 | 0.584 | 0.994 | 0.559 | 0.827 | 0.667
RandomSpeed | 0.992 | 0.674 | 0.606 | 0.638 | 0.996 | 0.769 | 0.896 | 0.827
RandomSpeedOffset | 0.990 | 0.677 | 0.342 | 0.454 | 0.975 | 0.243 | 0.758 | 0.368
EventualStop | 0.991 | 0.801 | 0.200 | 0.320 | 0.950 | 0.122 | 0.844 | 0.213
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.910 | 0.064 | 0.466 | 0.113
DataReplay | 0.989 | 0.000 | 0.000 | 0.000 | 0.920 | 0.065 | 0.458 | 0.114
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.198 | 0.874 | 0.322
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.942 | 0.338 | 0.492 | 0.400
DoSRandom | 0.955 | 0.499 | 0.932 | 0.650 | 0.972 | 0.554 | 0.678 | 0.610
DoSDisruptive | 0.967 | 0.347 | 0.005 | 0.011 | 0.912 | 0.116 | 0.289 | 0.166
GridSybil | 0.952 | 0.714 | 0.456 | 0.556 | 0.950 | 0.455 | 0.707 | 0.553
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.948 | 0.069 | 0.453 | 0.120
DoSRandomSybil | 0.971 | 0.579 | 0.008 | 0.016 | 0.972 | 0.535 | 0.355 | 0.427
DoSDisruptiveSybil | 0.968 | 0.087 | 0.006 | 0.010 | 0.957 | 0.315 | 0.412 | 0.357
Table A4. Per-class performance of LSTM using basic features by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
LSTM Architecture
Legitimate | 0.778 | 0.728 | 0.997 | 0.842 | 0.445 | 0.939 | 0.160 | 0.273
ConstPos | 0.990 | 0.862 | 0.401 | 0.547 | 0.993 | 0.746 | 0.946 | 0.834
ConstPosOffset | 0.987 | 0.959 | 0.054 | 0.102 | 0.981 | 0.385 | 0.772 | 0.514
RandomPos | 0.988 | 0.750 | 0.339 | 0.467 | 0.978 | 0.253 | 0.756 | 0.379
RandomPosOffset | 0.989 | 0.846 | 0.010 | 0.020 | 0.930 | 0.059 | 0.533 | 0.107
ConstSpeed | 0.993 | 0.738 | 0.628 | 0.679 | 0.998 | 0.904 | 0.931 | 0.917
ConstSpeedOffset | 0.991 | 0.652 | 0.497 | 0.564 | 0.986 | 0.322 | 0.747 | 0.450
RandomSpeed | 0.991 | 0.676 | 0.571 | 0.619 | 0.995 | 0.750 | 0.835 | 0.790
RandomSpeedOffset | 0.991 | 0.803 | 0.300 | 0.437 | 0.973 | 0.213 | 0.692 | 0.326
EventualStop | 0.991 | 0.764 | 0.279 | 0.408 | 0.964 | 0.166 | 0.870 | 0.279
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.891 | 0.055 | 0.483 | 0.098
DataReplay | 0.989 | 0.000 | 0.000 | 0.000 | 0.891 | 0.048 | 0.467 | 0.087
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.196 | 0.874 | 0.320
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.936 | 0.314 | 0.540 | 0.397
DoSRandom | 0.953 | 0.486 | 0.941 | 0.641 | 0.967 | 0.500 | 0.376 | 0.429
DoSDisruptive | 0.967 | 0.120 | 0.003 | 0.006 | 0.924 | 0.147 | 0.313 | 0.200
GridSybil | 0.962 | 0.835 | 0.518 | 0.639 | 0.945 | 0.420 | 0.688 | 0.521
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.942 | 0.061 | 0.440 | 0.106
DoSRandomSybil | 0.971 | 0.527 | 0.011 | 0.021 | 0.968 | 0.462 | 0.520 | 0.489
DoSDisruptiveSybil | 0.969 | 0.082 | 0.004 | 0.008 | 0.965 | 0.411 | 0.498 | 0.451
Table A5. Per-class performance of Bi-LSTM using optimal 14-feature set by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Bi-LSTM Architecture
Legitimate | 0.953 | 0.928 | 0.999 | 0.962 | 0.869 | 0.995 | 0.803 | 0.889
ConstPos | 0.997 | 0.850 | 0.992 | 0.916 | 0.999 | 0.979 | 0.995 | 0.987
ConstPosOffset | 0.987 | 1.000 | 0.075 | 0.139 | 0.988 | 0.519 | 0.984 | 0.680
RandomPos | 0.995 | 0.828 | 0.862 | 0.845 | 0.997 | 0.809 | 0.884 | 0.844
RandomPosOffset | 0.989 | 0.917 | 0.010 | 0.020 | 0.965 | 0.171 | 0.907 | 0.288
ConstSpeed | 0.997 | 0.845 | 0.904 | 0.873 | 0.999 | 0.984 | 0.954 | 0.969
ConstSpeedOffset | 0.994 | 0.753 | 0.761 | 0.757 | 0.993 | 0.534 | 0.947 | 0.683
RandomSpeed | 0.995 | 0.840 | 0.740 | 0.787 | 0.997 | 0.789 | 0.974 | 0.872
RandomSpeedOffset | 0.993 | 0.901 | 0.441 | 0.593 | 0.965 | 0.193 | 0.835 | 0.314
EventualStop | 0.996 | 0.900 | 0.762 | 0.826 | 0.989 | 0.417 | 0.974 | 0.584
Disruptive | 0.990 | 0.651 | 0.565 | 0.605 | 0.991 | 0.575 | 0.941 | 0.714
DataReplay | 0.990 | 0.601 | 0.275 | 0.377 | 0.981 | 0.361 | 0.888 | 0.514
StaleMessages | 0.997 | 0.961 | 0.831 | 0.891 | 0.998 | 0.870 | 0.964 | 0.915
DoS | 0.996 | 0.932 | 0.982 | 0.957 | 0.998 | 0.976 | 0.981 | 0.979
DoSRandom | 0.987 | 0.879 | 0.820 | 0.848 | 0.991 | 0.880 | 0.849 | 0.864
DoSDisruptive | 0.985 | 0.797 | 0.741 | 0.768 | 0.997 | 0.968 | 0.942 | 0.955
GridSybil | 0.993 | 0.966 | 0.919 | 0.942 | 0.998 | 0.993 | 0.969 | 0.981
DataReplaySybil | 0.991 | 0.634 | 0.243 | 0.351 | 0.993 | 0.526 | 0.800 | 0.635
DoSRandomSybil | 0.988 | 0.725 | 0.931 | 0.815 | 0.991 | 0.827 | 0.875 | 0.850
DoSDisruptiveSybil | 0.985 | 0.777 | 0.699 | 0.736 | 0.997 | 0.930 | 0.957 | 0.943
Table A6. Per-class performance of GRU using optimal 14-feature set by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
GRU Architecture
Legitimate | 0.948 | 0.919 | 1.000 | 0.958 | 0.766 | 0.991 | 0.647 | 0.783
ConstPos | 0.997 | 0.843 | 0.966 | 0.900 | 0.999 | 0.953 | 0.984 | 0.968
ConstPosOffset | 0.989 | 0.990 | 0.221 | 0.361 | 0.961 | 0.247 | 0.984 | 0.395
RandomPos | 0.994 | 0.794 | 0.852 | 0.822 | 0.997 | 0.812 | 0.907 | 0.857
RandomPosOffset | 0.989 | 0.000 | 0.000 | 0.000 | 0.931 | 0.095 | 0.907 | 0.172
ConstSpeed | 0.996 | 0.765 | 0.938 | 0.843 | 1.000 | 0.977 | 0.992 | 0.985
ConstSpeedOffset | 0.994 | 0.780 | 0.635 | 0.700 | 0.994 | 0.554 | 0.893 | 0.684
RandomSpeed | 0.995 | 0.883 | 0.636 | 0.740 | 0.997 | 0.800 | 0.974 | 0.878
RandomSpeedOffset | 0.990 | 0.872 | 0.152 | 0.259 | 0.955 | 0.151 | 0.802 | 0.254
EventualStop | 0.996 | 0.897 | 0.750 | 0.817 | 0.974 | 0.230 | 0.948 | 0.371
Disruptive | 0.989 | 0.652 | 0.468 | 0.545 | 0.986 | 0.451 | 0.780 | 0.571
DataReplay | 0.990 | 0.638 | 0.092 | 0.160 | 0.974 | 0.278 | 0.813 | 0.414
StaleMessages | 0.998 | 0.955 | 0.877 | 0.914 | 0.995 | 0.728 | 0.964 | 0.829
DoS | 0.993 | 0.877 | 0.979 | 0.925 | 0.996 | 0.947 | 0.955 | 0.951
DoSRandom | 0.985 | 0.829 | 0.834 | 0.832 | 0.991 | 0.866 | 0.855 | 0.861
DoSDisruptive | 0.977 | 0.639 | 0.698 | 0.667 | 0.992 | 0.863 | 0.890 | 0.876
GridSybil | 0.988 | 0.936 | 0.877 | 0.905 | 0.996 | 0.982 | 0.914 | 0.947
DataReplaySybil | 0.990 | 0.455 | 0.303 | 0.364 | 0.985 | 0.318 | 0.813 | 0.457
DoSRandomSybil | 0.987 | 0.724 | 0.883 | 0.796 | 0.991 | 0.842 | 0.857 | 0.849
DoSDisruptiveSybil | 0.978 | 0.736 | 0.438 | 0.549 | 0.994 | 0.894 | 0.910 | 0.902
Table A7. Per-class performance of LSTM using optimal 14-feature set by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
LSTM Architecture
Legitimate | 0.944 | 0.914 | 1.000 | 0.955 | 0.766 | 0.993 | 0.647 | 0.783
ConstPos | 0.995 | 0.755 | 0.953 | 0.843 | 0.998 | 0.943 | 0.978 | 0.960
ConstPosOffset | 0.986 | 0.000 | 0.000 | 0.000 | 0.967 | 0.276 | 0.943 | 0.427
RandomPos | 0.993 | 0.783 | 0.789 | 0.786 | 0.997 | 0.806 | 0.919 | 0.859
RandomPosOffset | 0.989 | 1.000 | 0.006 | 0.011 | 0.930 | 0.095 | 0.933 | 0.173
ConstSpeed | 0.994 | 0.749 | 0.778 | 0.763 | 0.999 | 0.955 | 0.962 | 0.958
ConstSpeedOffset | 0.993 | 0.714 | 0.645 | 0.678 | 0.989 | 0.422 | 0.973 | 0.589
RandomSpeed | 0.993 | 0.763 | 0.641 | 0.696 | 0.996 | 0.774 | 0.922 | 0.841
RandomSpeedOffset | 0.990 | 0.879 | 0.208 | 0.336 | 0.950 | 0.140 | 0.835 | 0.240
EventualStop | 0.995 | 0.869 | 0.673 | 0.759 | 0.978 | 0.261 | 0.948 | 0.409
Disruptive | 0.988 | 0.568 | 0.517 | 0.542 | 0.987 | 0.490 | 0.839 | 0.619
DataReplay | 0.989 | 0.548 | 0.105 | 0.176 | 0.972 | 0.244 | 0.710 | 0.363
StaleMessages | 0.997 | 0.964 | 0.763 | 0.852 | 0.997 | 0.815 | 0.991 | 0.894
DoS | 0.992 | 0.870 | 0.975 | 0.920 | 0.997 | 0.958 | 0.965 | 0.961
DoSRandom | 0.972 | 0.681 | 0.706 | 0.693 | 0.992 | 0.876 | 0.884 | 0.880
DoSDisruptive | 0.977 | 0.668 | 0.598 | 0.631 | 0.994 | 0.907 | 0.907 | 0.907
GridSybil | 0.987 | 0.935 | 0.859 | 0.895 | 0.998 | 0.978 | 0.966 | 0.972
DataReplaySybil | 0.990 | 0.484 | 0.163 | 0.244 | 0.984 | 0.305 | 0.773 | 0.438
DoSRandomSybil | 0.975 | 0.549 | 0.648 | 0.594 | 0.992 | 0.875 | 0.853 | 0.864
DoSDisruptiveSybil | 0.977 | 0.644 | 0.533 | 0.583 | 0.995 | 0.912 | 0.932 | 0.922
Table A8. Overall performance comparison on optimal 14-feature set (Basic kinematic + Engineered features).
Model | Balancing | Acc. | F1 | Balanced Accuracy | Cohen’s κ | ROC-AUC
LSTM | No SMOTE | 0.8896 | 0.8798 | 0.6915 | 0.8243 | 0.9793
GRU | No SMOTE | 0.8937 | 0.8837 | 0.6876 | 0.8305 | 0.9806
Bi-LSTM | No SMOTE | 0.9180 | 0.9093 | 0.7391 | 0.8673 | 0.9853
LSTM | SMOTE | 0.7404 | 0.7918 | 0.8941 | 0.6351 | 0.9890
GRU | SMOTE | 0.7370 | 0.7881 | 0.8895 | 0.6299 | 0.9895
Bi-LSTM | SMOTE | 0.8485 | 0.8755 | 0.9210 | 0.7655 | 0.9955
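Table A8 reports Cohen’s κ, which the evaluation metrics table does not define. The sketch below shows the standard definition, κ = (p_o − p_e) / (1 − p_e), computed from a confusion matrix. The function name and the 2 × 2 example matrix are hypothetical and are not taken from the experiments.

```python
def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # p_o: observed agreement, i.e. the fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(n)) / total
    # p_e: agreement expected by chance from the row and column marginals.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical example: 100 legitimate and 50 attack samples.
cm = [[90, 10],
      [5, 45]]
kappa = cohens_kappa(cm)  # 32/41, about 0.78
```

Because κ discounts the agreement a classifier would reach by chance on the class marginals, it is less flattering than raw accuracy on imbalanced data (0.90 accuracy versus roughly 0.78 κ in this toy case).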
Figure A1. Comprehensive performance and efficiency analysis across LSTM, GRU, and Bi-LSTM models. Top row (a–d): Classification metrics showing Bi-LSTM superiority with 91.80% accuracy and 90.93% F1-score without SMOTE, while SMOTE improves balanced accuracy from 73.91% to 92.10% at the cost of overall accuracy. Bottom row (e–h): Deployment-critical metrics: (e) Bi-LSTM requires 45.1 min of training and 267 epochs for convergence, (f) it uses 24.8 MB of memory with 33.6 K parameters, (g) latency is consistent across all models at a 36 ms mean and 46 ms P99, and (h) Bi-LSTM achieves 68.83 K samples/second throughput with the lowest missed attack rate of 8.98%. The analysis suggests that Bi-LSTM’s superior detection performance (a 2.84 percentage-point accuracy improvement over LSTM and an 18.7% reduction in missed attacks) justifies its slightly higher computational requirements, making it the preferred choice for safety-critical VANET deployment.

References

  1. Sonker, A.; Gupta, R.K. A new procedure for misbehavior detection in vehicular ad-hoc networks using machine learning. Int. J. Electr. Comput. Eng. 2021, 11, 2535–2547.
  2. Son, L.H. Dealing with the new user cold-start problem in recommender systems: A comparative review. Inf. Syst. 2016, 58, 87–104.
  3. Xu, X.; Wang, Y.; Wang, P. Comprehensive Review on Misbehavior Detection for Vehicular Ad Hoc Networks. J. Adv. Transp. 2022, 2022, 4725805.
  4. Nobahari, A.; Bakhshayeshi Avval, D.; Akhbari, A.; Nobahary, S. Investigation of Different Mechanisms to Detect Misbehaving Nodes in Vehicle Ad-Hoc Networks (VANETs). Secur. Commun. Networks 2023, 2023, 4020275.
  5. Dineshkumar, R.; Siddhanti, P.; Kodati, S.; Shnain, A.H.; Malathy, V. Misbehavior Detection for Position Falsification Attacks in VANETs Using Ensemble Machine Learning. In Proceedings of the 2024 Second International Conference on Data Science and Information System (ICDSIS), IEEE, Hassan, India, 17–18 May 2024; pp. 1–5.
  6. Saudagar, S.; Ranawat, R. An amalgamated novel ids model for misbehaviour detection using vereminet. Comput. Stand. Interfaces 2024, 88, 103783.
  7. Federal Communications Commission. Amendment of Parts 2 and 90 of the Commission’s Rules to Allocate the 5.850–5.925 GHz Band to the Mobile Service for Dedicated Short Range Communications of Intelligent Transportation Systems; Report and Order FCC 99-305; Federal Communications Commission: Washington, DC, USA, 1999; ET Docket No. 98-95.
  8. Anwar, W.; Franchi, N.; Fettweis, G. Physical Layer Evaluation of V2X Communications Technologies: 5G NR-V2X, LTE-V2X, IEEE 802.11bd, and IEEE 802.11p. In Proceedings of the IEEE 90th Vehicular Technology Conference (VTC2019-Fall), Honolulu, HI, USA, 22–25 September 2019; pp. 1–7.
  9. Kenney, J.B. Dedicated Short-Range Communications (DSRC) Standards in the United States. Proc. IEEE 2011, 99, 1162–1182.
  10. Standard J2735; Dedicated Short Range Communications (DSRC) Message Set Dictionary. SAE International: Warrendale, PA, USA, 2016.
  11. Arif, M.; Wang, G.; Bhuiyan, M.Z.A.; Wang, T.; Chen, J. A Survey on Security Attacks in VANETs: Communication, Applications and Challenges. Veh. Commun. 2019, 19, 100179.
  12. Alzahrani, M.; Idris, M.Y.; Ghaleb, F.A.; Budiarto, R. An Improved Robust Misbehavior Detection Scheme for Vehicular Ad Hoc Network. IEEE Access 2022, 10, 111241–111253.
  13. Lyamin, N.; Vinel, A.; Jonsson, M.; Loo, J. Real-Time Detection of Denial-of-Service Attacks in IEEE 802.11p Vehicular Networks. IEEE Commun. Lett. 2014, 18, 110–113.
  14. Hammi, B.; Idir, Y.M.; Zeadally, S.; Khatoun, R.; Nebhen, J. Is it Really Easy to Detect Sybil Attacks in C-ITS Environments: A Position Paper. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15069–15078.
  15. Boualouache, A.; Engel, T. A Survey on Machine Learning-based Misbehavior Detection Systems for 5G and Beyond Vehicular Networks. IEEE Commun. Surv. Tutorials 2023, 25, 1128–1172.
  16. Grover, J.; Prajapati, N.K.; Laxmi, V.; Gaur, M.S. Machine Learning Approach for Multiple Misbehavior Detection in VANET. In Proceedings of the International Conference on Advances in Computing and Communications, Kochi, India, 22–24 July 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 644–653.
  17. So, S.; Sharma, P.; Petit, J. Integrating Plausibility Checks and Machine Learning for Misbehavior Detection in VANET. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, Orlando, FL, USA, 17–20 December 2018; pp. 564–571.
  18. Sharma, P.; Jaekel, A. Machine Learning Based Misbehaviour Detection in VANET Using Consecutive BSM Approach. IEEE Open J. Veh. Technol. 2021, 3, 1–14.
  19. Zhang, C.; Chen, K.; Zeng, X.; Xue, X. Misbehavior Detection Based on Support Vector Machine and Dempster-Shafer Theory of Evidence in VANETs. IEEE Access 2018, 6, 59860–59870.
  20. Kamel, J.; Haidar, F.; Jemaa, I.B.; Kaiser, A.; Lonc, B.; Urien, P. A Misbehavior Authority System for Sybil Attack Detection in C-ITS. In Proceedings of the 2019 IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE, New York City, NY, USA, 10–12 October 2019; pp. 1117–1123.
  21. Alladi, T.; Kohli, V.; Chamola, V.; Yu, F.R. Securing the Internet of Vehicles: A Deep Learning-Based Classification Framework. IEEE Netw. Lett. 2021, 3, 94–97.
  22. Alladi, T.; Gera, B.; Agrawal, A.; Chamola, V.; Yu, F.R. DeepADV: A deep neural network framework for anomaly detection in VANETs. IEEE Trans. Veh. Technol. 2021, 70, 12013–12023.
  23. Campos, E.M.; Hernandez-Ramos, J.L.; Vidal, A.G.; Baldini, G.; Skarmeta, A. Misbehavior Detection in Intelligent Transportation Systems Based on Federated Learning. Internet Things 2024, 25, 101127.
  24. Gurjar, D.; Grover, J.; Kheterpal, V.; Vasilakos, A. Federated Learning-Based Misbehavior Classification System for VANET Intrusion Detection. J. Intell. Inf. Syst. 2025, 63, 807–830.
  25. Kumar, A.; Sharma, P.; Singh, R. Enhanced VANET Security Using XGBoost with Temporal Feature Engineering. Comput. Networks 2022, 215, 109183.
  26. Chen, X.; Wang, J.; Liu, F. Attention-Enhanced LSTM for Real-Time VANET Misbehavior Detection. Comput. Commun. 2024, 198, 45–56.
  27. Khan, W.; Ahmad, J.; Alasbali, N.; Al Mazroa, A.; Alshehri, M.S.; Khan, M.S. A novel transformer-based explainable AI approach using SHAP for intrusion detection in vehicular ad hoc networks. Comput. Networks 2025, 270, 111575.
  28. Yuce, M.F.; Erturk, M.A.; Aydin, M.A. Misbehavior detection with spatio-temporal graph neural networks. Comput. Electr. Eng. 2024, 116, 109198.
  29. van der Heijden, R.W.; Lukaseder, T.; Kargl, F. VeReMi: A Dataset for Comparable Evaluation of Misbehavior Detection in VANETs. In Proceedings of the International Conference on Security and Privacy in Communication Systems, Singapore, 8–10 August 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 318–337.
  30. Kamel, J.; Wolf, M.; van der Heijden, R.W.; Kaiser, A.; Urien, P.; Kargl, F. VeReMi Extension: A Dataset for Comparable Evaluation of Misbehavior Detection in VANETs. In Proceedings of the ICC 2020-IEEE International Conference on Communications, Dublin, Ireland, 7–11 June 2020; pp. 1–6.
  31. Kamel, J.; Ansari, M.R.; Petit, J.; Kaiser, A.; Jemaa, I.B.; Urien, P. Simulation Framework for Misbehavior Detection in Vehicular Networks. IEEE Trans. Veh. Technol. 2020, 69, 6631–6643.
  32. Codeca, L.; Frank, R.; Faye, S.; Engel, T. Luxembourg SUMO Traffic (LuST) Scenario: Traffic Demand Evaluation. IEEE Intell. Transp. Syst. Mag. 2017, 9, 52–63.
  33. Lv, C.; Lam, C.C.; Cao, Y.; Wang, Y.; Kaiwartya, O.; Wu, C. Leveraging Geographic Information and Social Indicators for Misbehavior Detection in VANETs. IEEE Trans. Consum. Electron. 2024, 70, 4411–4424.
  34. Valentini, E.P.; Rocha Filho, G.P.; De Grande, R.E.; Ranieri, C.M.; Júnior, L.A.P.; Meneguette, R.I. A novel mechanism for misbehavior detection in vehicular networks. IEEE Access 2023, 11, 68113–68126.
  35. Naqvi, I.; Chaudhary, A.; Kumar, A. A Neuro-Genetic Security Framework for Misbehavior Detection in VANETs. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 410–418.
  36. Sangwan, A.; Sangwan, A.; Singh, R.P. A Classification of Misbehavior Detection Schemes for VANETs: A Survey. Wirel. Pers. Commun. 2023, 129, 285–322.
  37. Patel, N.; Gupta, R.; Agarwal, S. Advanced Ensemble Methods for Large-Scale VANET Misbehavior Detection. IEEE Trans. Intell. Transp. Syst. 2024, 25, 2847–2859.
  38. Alladi, T.; Agrawal, A.; Gera, B.; Chamola, V.; Sikdar, B.; Guizani, M. Deep neural networks for securing IoT enabled vehicular ad-hoc networks. In Proceedings of the ICC 2021-IEEE International Conference on Communications, IEEE, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
  39. Liu, W.; Zhang, M.; Chen, L. Graph Neural Networks for Vehicular Behavior Analysis in VANETs. IEEE Trans. Veh. Technol. 2023, 72, 9876–9887.
  40. Wang, H.; Li, Q.; Zhou, Y. Transformer-Based Architectures for VANET Security: A Comprehensive Study. IEEE Netw. 2024, 38, 112–119.
  41. Kim, S.H.; Park, M.J.; Lee, D.W. Hybrid SVM-GRU Architecture for Enhanced VANET Intrusion Detection. J. Netw. Comput. Appl. 2023, 203, 103401.
  42. Rodriguez, C.; Martinez, E.; Gonzalez, P. RF-LSTM Hybrid Framework for Large-Scale VANET Misbehavior Detection. IEEE Trans. Netw. Serv. Manag. 2024, 21, 2234–2245.
Figure 1. Overview of the VeMisNet data flow including preprocessing, feature engineering, and classification stages.
Figure 2. Feature engineering approaches: UMAP dimensionality reduction, manual domain-informed selection, and novel spatiotemporal feature augmentation for enhanced VANET misbehavior detection.
Figure 3. Comprehensive class imbalance analysis of the VANET dataset (500 K sample). (a) Sample count distribution showing absolute numbers for each class category. (b) Percentage distribution illustrating the severe imbalance with legitimate vehicles comprising 59.6% of samples. (c) Attack types distribution excluding legitimate vehicles to highlight attack class variations. (d) Imbalance severity analysis showing ratios relative to the legitimate class, with the maximum ratio reaching 12.7:1 for other attacks.
Figure 4. Binary classification performance comparison across dataset sizes. (left) Accuracy comparison showing GRU achieving highest accuracy of 84.4% at 100 k dataset size, followed by Bi-LSTM (84.3%) and LSTM (83.7%). (right) F1-score comparison demonstrating similar trends with Bi-LSTM showing more consistent performance across larger datasets. All models exhibit performance degradation with increasing dataset size, with GRU showing the steepest decline.
Figure 5. Multi-class classification accuracy using UMAP feature selection. Bi-LSTM consistently achieves highest accuracy, particularly at smaller dataset sizes, with optimal performance at 10–20 features across all models.
Figure 6. Multi-class classification F1-score using UMAP feature selection. Similar trends to accuracy with Bi-LSTM demonstrating superior performance and optimal feature range of 10–20 features.
Figure 7. Multi-class classification performance using basic kinematic features. Bi-LSTM consistently achieves highest performance with 81.39% accuracy at 100 k records, demonstrating graceful degradation with increasing dataset size across all metrics.
Figure 8. Training time comparison between multi-class and binary classification approaches.
Table 1. Comparative performance analysis of VANET misbehavior detection approaches. Comprehensive literature review with extended comparisons is provided in Appendix A (Table A1).
Study | Year | Approach | Dataset | Accuracy | F1-Score | Speed (Samples/s)
Grover et al. [16] | 2011 | Random Forest | Custom | 93–95% | 0.92 | 20,000
Sharma & Jaekel [18] | 2021 | ML Ensemble | VeReMi | 98.5% | 0.97 | 15,800
Kumar et al. [25] | 2022 | XGBoost + Temporal | VeReMi | 96.2% | 0.95 | 16,500
Kamel et al. [20] | 2019 | LSTM | VeReMi | 89.5% | 0.87 | 9200
Chen et al. [26] | 2024 | Attention-LSTM | VeReMi Ext | 93.1% | 0.91 | 9100
Khan et al. [27] | 2025 | Transformer + SHAP | VeReMi Ext | 96.2% | - | 7200
Yüce et al. [28] | 2024 | Spatiotemporal GNN | Converted MBD | 99.9% | - | 4800
Table 2. Research gaps addressed by VeMisNet.
Identified Gap in Literature | VeMisNet Contribution
Systematic Architecture Comparison
Inconsistent DL evaluation protocols | Unified LSTM/GRU/Bi-LSTM comparison
Different datasets and feature sets | Identical setup with 14 optimized features
Domain-Informed Feature Engineering
Algorithm emphasis over feature design | 14 spatiotemporal DSRC-aware features
Limited communication pattern analysis | Inter-message timing and neighbor density
Large-Scale Validation
Evaluation limited to <1 M samples | Testing from 100 K to 2 M records
Unknown scalability characteristics | Consistent performance across scales
Class Imbalance Treatment
Inadequate handling of imbalanced data | Post-split SMOTE with validation
Poor minority attack detection | 40–50% improvement in rare classes
Deployment Readiness
Theoretical focus without practical analysis | Real-time inference: 8437–13,683 samples/s
No deployment guidelines provided | Sub-100 ms response capability
Table 3. Attack types in the VeReMi Extension dataset.
Label | Attack Name | Category
0 | Legitimate Vehicle | -
1–4 | Constant Position; Constant Position Offset; Random Position; Random Position Offset | Position Falsification
5–8 | Constant Speed; Constant Speed Offset; Random Speed; Random Speed Offset | Speed Manipulation
9–12 | Eventual Stop; Disruptive; Data Replay; Stale Messages | Freeze/Replay Attacks
13–15 | Denial of Service (DoS); DoS Random; DoS Disruptive | DoS Variants
16–19 | Grid Sybil; Data Replay Sybil; DoS Random Sybil; DoS Disruptive Sybil | Sybil-Based Attacks
Table 4. Progressive feature selection using UMAP dimensionality reduction.

| Features | Category | Selected Features |
|---|---|---|
| 5 | Temporal and ID | Receive Time, Receiver Label, Receiver ID, Module ID, Send Time |
| 10 | Temporal and ID | Previous 5 features |
| 10 | Communication | + Sender ID, Sender Pseudo, Message ID |
| 10 | Spatial | + Position (x, y) |
| 15 | Temporal and ID | Previous 5 features |
| 15 | Communication | + Sender ID, Sender Pseudo, Message ID |
| 15 | Spatial | + Position (x, y), Position Noise (x, y) |
| 15 | Kinematic | + Speed (x, y), Speed Noise (x) |
| 20 | Temporal and ID | Previous 5 features |
| 20 | Communication | + Sender ID, Sender Pseudo, Message ID |
| 20 | Spatial | + Position (x, y), Position Noise (x, y) |
| 20 | Kinematic | + Speed (x, y), Speed Noise (x, y) |
| 20 | Dynamic | + Acceleration (x, y), Acceleration Noise (x, y) |
| 25 | Temporal and ID | Previous 5 features |
| 25 | Communication | + Sender ID, Sender Pseudo, Message ID |
| 25 | Spatial | + Position (x, y), Position Noise (x, y) |
| 25 | Kinematic | + Speed (x, y), Speed Noise (x, y) |
| 25 | Dynamic | + Acceleration (x, y), Acceleration Noise (x, y) |
| 25 | Directional | + Heading (x, y), Heading Noise (x, y) |
Table 5. Class distribution and imbalance analysis.

| Class Category | Sample Count | Percentage | Imbalance Ratio | Detection Difficulty |
|---|---|---|---|---|
| Legitimate Vehicles | 297,822 | 59.6% | 1:1 (baseline) | Low |
| Sybil Attacks | 65,649 | 13.1% | 4.5:1 | Moderate |
| DoS Attacks | 60,213 | 12.0% | 4.9:1 | Moderate |
| Position Attacks | 27,653 | 5.5% | 10.8:1 | High |
| Speed Attacks | 24,038 | 4.8% | 12.4:1 | Very High |
| Replay Attacks | 23,625 | 4.7% | 12.7:1 | Very High |
| Total | 499,000 | 100.0% | - | - |
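The percentage and imbalance-ratio columns of Table 5 can be recomputed from the sample counts. The following is a minimal sketch, assuming the ratio is the legitimate-vehicle count divided by each class count (the function name `imbalance_summary` is illustrative, not from the paper). The recomputed values agree with the table to within about 0.1, presumably because of rounding or truncation in the published figures.

```python
# Sample counts copied from Table 5.
counts = {
    "Legitimate Vehicles": 297_822,
    "Sybil Attacks": 65_649,
    "DoS Attacks": 60_213,
    "Position Attacks": 27_653,
    "Speed Attacks": 24_038,
    "Replay Attacks": 23_625,
}


def imbalance_summary(counts, majority="Legitimate Vehicles"):
    """Return {class: (share in percent, majority-to-class ratio)}."""
    total = sum(counts.values())
    majority_count = counts[majority]
    return {
        name: (100.0 * n / total, majority_count / n)
        for name, n in counts.items()
    }


summary = imbalance_summary(counts)
for name, (share, ratio) in summary.items():
    print(f"{name:20s} {share:5.1f}%  {ratio:4.1f}:1")
```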
Table 6. Evaluation metrics framework.

| Metric | Formula |
|---|---|
| Accuracy | $\frac{TP + TN}{TP + FP + TN + FN}$ |
| Balanced Accuracy | $\frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}$ |
| Precision | $\frac{TP}{TP + FP}$ |
| Recall | $\frac{TP}{TP + FN}$ |
| F1-Score | $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ |
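The formulas in Table 6 translate directly into code. The sketch below is illustrative only: the function names and the example counts are assumptions, not the authors' implementation, and it assumes every class has at least one true sample so that no per-class recall divides by zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def balanced_accuracy(per_class):
    """Mean per-class recall; per_class is a list of (TP_i, FN_i) pairs."""
    recalls = [tp / (tp + fn) for tp, fn in per_class]
    return sum(recalls) / len(recalls)


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=80, fp=10, tn=100, fn=10)
ba = balanced_accuracy([(90, 10), (30, 30)])
print(m, ba)
```

Balanced accuracy averages recall over classes with equal weight, which is why it can sit far below plain accuracy on an imbalanced dataset: a rare class with poor recall pulls the mean down regardless of how few samples it has.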
Table 7. Evaluation metrics for binary classification (Normal or Attacker) with basic kinematic features (position, speed, acceleration, and heading) using LSTM, GRU, and Bidirectional LSTM.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 100 k | Accuracy | 0.8373 | 0.8439 | 0.8432 |
| 100 k | F1-Score | 0.8280 | 0.8346 | 0.8352 |
| 100 k | Recall | 0.8373 | 0.8439 | 0.8432 |
| 100 k | Precision | 0.8432 | 0.8516 | 0.8476 |
| 500 k | Accuracy | 0.8053 | 0.8067 | 0.8163 |
| 500 k | F1-Score | 0.7956 | 0.7974 | 0.8065 |
| 500 k | Recall | 0.8053 | 0.8067 | 0.8163 |
| 500 k | Precision | 0.8211 | 0.8216 | 0.8362 |
| 1 M | Accuracy | 0.8342 | 0.7976 | 0.8046 |
| 1 M | F1-Score | 0.8268 | 0.7888 | 0.7959 |
| 1 M | Recall | 0.8342 | 0.7976 | 0.8046 |
| 1 M | Precision | 0.8353 | 0.8046 | 0.8204 |
| 2 M | Accuracy | 0.8016 | 0.7982 | 0.8062 |
| 2 M | F1-Score | 0.7909 | 0.7878 | 0.7959 |
| 2 M | Recall | 0.8016 | 0.7982 | 0.8062 |
| 2 M | Precision | 0.8234 | 0.8178 | 0.8283 |
Table 8. Multi-class VANET attack detection using original features (temporal, kinematic, noise, and filtered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Accuracy | 0.9355 (0.9344, 0.9366) | 0.9359 (0.9344, 0.9373) | 0.9702 (0.9692, 0.9712) |
| 500 k | F1-Score | 0.9321 (0.9308, 0.9334) | 0.9330 (0.9314, 0.9346) | 0.9694 (0.9684, 0.9705) |
| 500 k | Precision | 0.9332 (0.9319, 0.9345) | 0.9331 (0.9314, 0.9347) | 0.9694 (0.9684, 0.9705) |
| 1 M | Accuracy | 0.9022 (0.9011, 0.9032) | 0.9043 (0.9032, 0.9054) | 0.9449 (0.9438, 0.9459) |
| 1 M | F1-Score | 0.8969 (0.8957, 0.8981) | 0.8992 (0.8980, 0.9005) | 0.9434 (0.9422, 0.9444) |
| 1 M | Precision | 0.8975 (0.8963, 0.8987) | 0.8995 (0.8983, 0.9007) | 0.9437 (0.9426, 0.9448) |
| 2 M | Accuracy | 0.8872 (0.8856, 0.8888) | 0.8903 (0.8887, 0.8919) | 0.9272 (0.9258, 0.9286) |
| 2 M | F1-Score | 0.8830 (0.8813, 0.8847) | 0.8865 (0.8848, 0.8882) | 0.9255 (0.9240, 0.9270) |
| 2 M | Precision | 0.8845 (0.8828, 0.8862) | 0.8875 (0.8858, 0.8892) | 0.9260 (0.9245, 0.9275) |
Table 9. Training efficiency and security performance for VANET attack detection models.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Training Time (min) | 136.6 | 158.9 | 158.3 |
| 500 k | Mean Latency (ms) | 39.31 | 40.41 | 39.68 |
| 500 k | Missed Attack Rate (%) | 5.04 | 5.16 | 2.32 |
| 1 M | Training Time (min) | 255.2 | 208.8 | 259.2 |
| 1 M | Mean Latency (ms) | 38.26 | 39.83 | 38.83 |
| 1 M | Missed Attack Rate (%) | 7.35 | 7.35 | 3.82 |
| 2 M | Training Time (min) | 357.0 | 282.0 | 389.0 |
| 2 M | Mean Latency (ms) | 37.5 | 39.0 | 38.0 |
| 2 M | Missed Attack Rate (%) | 8.08 | 7.94 | 4.97 |
Table 10. Performance degradation analysis: scaling impact on VANET attack detection.

| Scaling Transition | Accuracy Change (%), LSTM | Accuracy Change (%), GRU | Accuracy Change (%), Bi-LSTM | Missed Attack Rate Change, LSTM | Missed Attack Rate Change, GRU | Missed Attack Rate Change, Bi-LSTM |
|---|---|---|---|---|---|---|
| 500 k → 1 M | −3.33 | −3.16 | −2.53 | +2.31% | +2.19% | +1.50% |
| 1 M → 2 M | −1.50 | −1.40 | −1.77 | +0.73% | +0.59% | +1.15% |
| Total (500 k → 2 M) | −4.83 | −4.56 | −4.30 | +3.04% | +2.78% | +2.65% |
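The accuracy changes in Table 10 correspond to percentage-point differences between the accuracies reported in Table 8 (for example, 0.9355 to 0.9022 for LSTM is −3.33 points, not a relative drop of 3.33%). A minimal sketch of that arithmetic follows; the helper name `change_pp` is an illustrative assumption.

```python
# Accuracies copied from Table 8 (original feature set).
accuracy = {
    "LSTM":    {"500 k": 0.9355, "1 M": 0.9022, "2 M": 0.8872},
    "GRU":     {"500 k": 0.9359, "1 M": 0.9043, "2 M": 0.8903},
    "Bi-LSTM": {"500 k": 0.9702, "1 M": 0.9449, "2 M": 0.9272},
}


def change_pp(before, after):
    """Difference between two accuracy fractions, in percentage points."""
    return round((after - before) * 100, 2)


degradation = {
    model: {
        "500 k -> 1 M": change_pp(acc["500 k"], acc["1 M"]),
        "1 M -> 2 M": change_pp(acc["1 M"], acc["2 M"]),
        "total": change_pp(acc["500 k"], acc["2 M"]),
    }
    for model, acc in accuracy.items()
}

for model, row in degradation.items():
    print(model, row)
```

The same helper applies to the missed attack rates in Table 9, which are already expressed in percent, by subtracting them directly.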
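Table 11. UMAP feature selection results: model performance comparison.

| Dataset Size | Features Count | GRU Acc | GRU F1 | GRU Prec | LSTM Acc | LSTM F1 | LSTM Prec | Bi-LSTM Acc | Bi-LSTM F1 | Bi-LSTM Prec |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 k | 5 | 0.6633 | 0.5290 | 0.4400 | 0.6633 | 0.5290 | 0.4400 | 0.6633 | 0.5291 | 0.4400 |
| 100 k | 10 | 0.7125 | 0.6128 | 0.5480 | 0.7179 | 0.6195 | 0.5521 | 0.7189 | 0.6201 | 0.5583 |
| 100 k | 15 | 0.7150 | 0.6164 | 0.5558 | 0.7155 | 0.6171 | 0.5527 | 0.7169 | 0.6194 | 0.5560 |
| 100 k | 20 | 0.7180 | 0.6203 | 0.5559 | 0.7146 | 0.6129 | 0.5524 | 0.7197 | 0.6217 | 0.5786 |
| 100 k | 25 | 0.7151 | 0.6163 | 0.5567 | 0.7145 | 0.6085 | 0.5372 | 0.7160 | 0.6187 | 0.5563 |
| 500 k | 5 | 0.5962 | 0.4454 | 0.3554 | 0.5962 | 0.4454 | 0.3555 | 0.6633 | 0.5290 | 0.4400 |
| 500 k | 10 | 0.6357 | 0.5147 | 0.4560 | 0.6374 | 0.5170 | 0.4620 | 0.7189 | 0.6201 | 0.5583 |
| 500 k | 15 | 0.6376 | 0.5118 | 0.4608 | 0.6361 | 0.5157 | 0.4590 | 0.7169 | 0.6194 | 0.5560 |
| 500 k | 20 | 0.6389 | 0.5181 | 0.4646 | 0.6362 | 0.5159 | 0.4613 | 0.7197 | 0.6217 | 0.5786 |
| 500 k | 25 | 0.6366 | 0.5169 | 0.4627 | 0.6368 | 0.5171 | 0.4672 | 0.7160 | 0.6187 | 0.5563 |
| 1 M | 5 | 0.5841 | 0.4308 | 0.3413 | 0.5845 | 0.4308 | 0.3412 | 0.5841 | 0.4308 | 0.3412 |
| 1 M | 10 | 0.6202 | 0.4914 | 0.4546 | 0.6198 | 0.4929 | 0.4370 | 0.6216 | 0.4954 | 0.4519 |
| 1 M | 15 | 0.6217 | 0.4984 | 0.4392 | 0.6202 | 0.4922 | 0.4391 | 0.6212 | 0.4963 | 0.4371 |
| 1 M | 20 | 0.6204 | 0.4919 | 0.4359 | 0.6201 | 0.4917 | 0.4382 | 0.6220 | 0.4981 | 0.4369 |
| 1 M | 25 | 0.6203 | 0.4952 | 0.4376 | 0.6200 | 0.4938 | 0.4401 | 0.6215 | 0.4958 | 0.4459 |
| 2 M | 5 | 0.5858 | 0.4328 | 0.3431 | 0.5858 | 0.4328 | 0.3431 | 0.5858 | 0.4328 | 0.3431 |
| 2 M | 10 | 0.6198 | 0.4888 | 0.4402 | 0.6227 | 0.4950 | 0.4358 | 0.6195 | 0.4903 | 0.4330 |
| 2 M | 15 | 0.6210 | 0.4991 | 0.4316 | 0.6202 | 0.4905 | 0.4347 | 0.6105 | 0.4764 | 0.4147 |
| 2 M | 20 | 0.6206 | 0.4942 | 0.4277 | 0.6196 | 0.4881 | 0.4439 | 0.6090 | 0.4731 | 0.4097 |
| 2 M | 25 | 0.6226 | 0.4981 | 0.4341 | 0.6230 | 0.4957 | 0.4348 | 0.6133 | 0.4878 | 0.4267 |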
Table 12. Multi-class classification using basic kinematic features (position, speed, acceleration, and heading) using LSTM, GRU, and Bidirectional LSTM.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 100 k | Accuracy | 0.8012 | 0.8023 | 0.8139 |
| 100 k | F1-Score | 0.7562 | 0.7562 | 0.7735 |
| 100 k | Precision | 0.7672 | 0.7522 | 0.7831 |
| 500 k | Accuracy | 0.7502 | 0.7406 | 0.7530 |
| 500 k | F1-Score | 0.6838 | 0.6710 | 0.6873 |
| 500 k | Precision | 0.6878 | 0.6476 | 0.7031 |
| 1 M | Accuracy | 0.7203 | 0.7212 | 0.7378 |
| 1 M | F1-Score | 0.6477 | 0.6489 | 0.6730 |
| 1 M | Precision | 0.6684 | 0.6683 | 0.6911 |
| 2 M | Accuracy | 0.7192 | 0.7157 | 0.7202 |
| 2 M | F1-Score | 0.6430 | 0.6399 | 0.6442 |
| 2 M | Precision | 0.6361 | 0.6157 | 0.6475 |
Table 13. Multi-class VANET attack detection using enhanced feature set (Basic kinematic + Engineered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Accuracy | 0.8896 (0.8877, 0.8915) | 0.8937 (0.8918, 0.8955) | 0.9180 (0.9163, 0.9197) |
| 500 k | F1-Score | 0.8798 (0.8777, 0.8819) | 0.8837 (0.8816, 0.8858) | 0.9093 (0.9074, 0.9113) |
| 500 k | Precision | 0.8845 (0.8824, 0.8866) | 0.8886 (0.8865, 0.8907) | 0.9148 (0.9131, 0.9165) |
| 1 M | Accuracy | 0.8686 (0.8671, 0.8700) | 0.8588 (0.8572, 0.8603) | 0.8896 (0.8883, 0.8911) |
| 1 M | F1-Score | 0.8501 (0.8484, 0.8517) | 0.8443 (0.8425, 0.8460) | 0.8780 (0.8766, 0.8797) |
| 1 M | Precision | 0.8498 (0.8480, 0.8516) | 0.8509 (0.8491, 0.8527) | 0.8845 (0.8830, 0.8861) |
| 2 M | Accuracy | 0.8476 (0.8458, 0.8494) | 0.8239 (0.8220, 0.8258) | 0.8612 (0.8596, 0.8628) |
| 2 M | F1-Score | 0.8204 (0.8184, 0.8224) | 0.8049 (0.8028, 0.8070) | 0.8467 (0.8450, 0.8484) |
| 2 M | Precision | 0.8151 (0.8130, 0.8172) | 0.8132 (0.8111, 0.8153) | 0.8542 (0.8525, 0.8559) |
Table 14. Training efficiency and security performance for enhanced feature set models.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Training Time (min) | 42.3 | 39.7 | 45.1 |
| 500 k | Mean Latency (ms) | 35.87 | 36.14 | 35.99 |
| 500 k | Missed Attack Rate (%) | 11.05 | 10.77 | 8.98 |
| 1 M | Training Time (min) | 85.2 | 78.6 | 91.8 |
| 1 M | Mean Latency (ms) | 34.82 | 35.89 | 35.86 |
| 1 M | Missed Attack Rate (%) | 14.19 | 13.13 | 12.01 |
| 2 M | Training Time (min) | 178.5 | 162.3 | 195.2 |
| 2 M | Mean Latency (ms) | 34.5 | 35.7 | 35.8 |
| 2 M | Missed Attack Rate (%) | 16.85 | 16.42 | 14.53 |
Table 15. Performance degradation analysis: scaling impact with enhanced feature set.

| Scaling Transition | Accuracy Change (%), LSTM | Accuracy Change (%), GRU | Accuracy Change (%), Bi-LSTM | Missed Attack Rate Change, LSTM | Missed Attack Rate Change, GRU | Missed Attack Rate Change, Bi-LSTM |
|---|---|---|---|---|---|---|
| 500 k → 1 M | −2.10 | −3.49 | −2.84 | +3.14% | +2.36% | +3.03% |
| 1 M → 2 M | −2.10 | −3.49 | −2.84 | +2.66% | +3.29% | +2.52% |
| Total (500 k → 2 M) | −4.20 | −6.98 | −5.68 | +5.80% | +5.65% | +5.55% |
Table 16. Extended performance metrics for enhanced feature set.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Recall | 0.8896 (0.8877, 0.8915) | 0.8937 (0.8918, 0.8955) | 0.9180 (0.9163, 0.9197) |
| 500 k | MCC | 0.8267 (0.8238, 0.8296) | 0.8328 (0.8299, 0.8357) | 0.8688 (0.8661, 0.8715) |
| 500 k | Cohen's κ | 0.8243 (0.8213, 0.8273) | 0.8305 (0.8275, 0.8335) | 0.8673 (0.8645, 0.8700) |
| 500 k | Balanced Accuracy | 0.6915 (0.6869, 0.6963) | 0.6876 (0.6829, 0.6924) | 0.7391 (0.7349, 0.7435) |
| 500 k | ROC-AUC (Macro) | 0.9793 | 0.9806 | 0.9853 |
| 500 k | ECE | 0.0089 | 0.0081 | 0.0064 |
| 500 k | NPV (%) | 99.58 | 99.62 | 99.67 |
| 1 M | Recall | 0.8686 (0.8671, 0.8700) | 0.8588 (0.8572, 0.8603) | 0.8896 (0.8883, 0.8911) |
| 1 M | MCC | 0.7914 (0.7892, 0.7936) | 0.7758 (0.7734, 0.7781) | 0.8259 (0.8240, 0.8280) |
| 1 M | Cohen's κ | 0.7878 (0.7856, 0.7900) | 0.7729 (0.7705, 0.7753) | 0.8234 (0.8214, 0.8255) |
| 1 M | Balanced Accuracy | 0.6346 (0.6317, 0.6375) | 0.6301 (0.6272, 0.6332) | 0.7021 (0.6992, 0.7051) |
| 1 M | ROC-AUC (Macro) | 0.9675 | 0.9708 | 0.9792 |
| 1 M | ECE | 0.0072 | 0.0073 | 0.0054 |
| 1 M | NPV (%) | 98.82 | 98.87 | 98.98 |
| 2 M | Recall | 0.8476 (0.8458, 0.8494) | 0.8239 (0.8220, 0.8258) | 0.8612 (0.8596, 0.8628) |
| 2 M | MCC | 0.7561 (0.7536, 0.7586) | 0.7167 (0.7140, 0.7194) | 0.7830 (0.7808, 0.7852) |
| 2 M | Cohen's κ | 0.7514 (0.7489, 0.7539) | 0.7153 (0.7126, 0.7180) | 0.7795 (0.7773, 0.7817) |
| 2 M | Balanced Accuracy | 0.5777 (0.5745, 0.5809) | 0.5737 (0.5704, 0.5770) | 0.6651 (0.6619, 0.6683) |
| 2 M | ROC-AUC (Macro) | 0.9557 | 0.9583 | 0.9701 |
| 2 M | ECE | 0.0095 | 0.0092 | 0.0071 |
| 2 M | NPV (%) | 98.05 | 98.12 | 98.29 |
Table 17. Comprehensive feature engineering comparison analysis.

| Approach | Features | Acc (%) | F1 (%) | Rec (%) | Prec (%) | Improvement |
|---|---|---|---|---|---|---|
| UMAP Features | 10–20 | 71.97 | 62.17 | 71.97 | 57.86 | Baseline |
| Basic Kinematic | 8 | 81.40 | 77.30 | 81.40 | 78.10 | +9.42 (+13.06%) |
| Newly Added | 14 | 91.80 | 90.93 | 91.80 | 91.48 | +19.78 (+27.48%) |
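The Improvement column of Table 17 pairs an absolute accuracy gain over the UMAP baseline with the corresponding relative gain. The sketch below recomputes both from the rounded accuracies shown in the table; the helper name `improvement` is illustrative. Because the inputs are already rounded, the results (about +9.43 / +13.10% and +19.83 / +27.55%) differ from the published entries by less than 0.1, which suggests the authors worked from unrounded values.

```python
def improvement(baseline_acc, acc):
    """Absolute gain (percentage points) and relative gain (%) over a baseline."""
    absolute = acc - baseline_acc
    relative = 100.0 * absolute / baseline_acc
    return absolute, relative


# Accuracies (%) copied from Table 17.
baseline = 71.97                     # UMAP features
basic_abs, basic_rel = improvement(baseline, 81.40)   # basic kinematic
eng_abs, eng_rel = improvement(baseline, 91.80)       # newly added features

print(f"Basic kinematic: +{basic_abs:.2f} (+{basic_rel:.2f}%)")
print(f"Newly added:     +{eng_abs:.2f} (+{eng_rel:.2f}%)")
```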
Table 18. Multi-class VANET attack detection using comprehensive feature set (original + engineered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Accuracy | 0.9966 (0.9963, 0.9969) | 0.9963 (0.9960, 0.9966) | 0.9981 (0.9978, 0.9984) |
| 500 k | F1-Score | 0.9966 (0.9963, 0.9969) | 0.9963 (0.9960, 0.9966) | 0.9981 (0.9978, 0.9984) |
| 500 k | Precision | 0.9966 (0.9963, 0.9969) | 0.9963 (0.9960, 0.9966) | 0.9981 (0.9978, 0.9984) |
| 1 M | Accuracy | 0.9935 (0.9932, 0.9938) | 0.9927 (0.9924, 0.9930) | 0.9970 (0.9967, 0.9973) |
| 1 M | F1-Score | 0.9935 (0.9932, 0.9938) | 0.9926 (0.9923, 0.9929) | 0.9970 (0.9967, 0.9973) |
| 1 M | Precision | 0.9935 (0.9932, 0.9938) | 0.9927 (0.9924, 0.9930) | 0.9970 (0.9967, 0.9973) |
| 2 M | Accuracy | 0.9904 (0.9900, 0.9908) | 0.9891 (0.9887, 0.9895) | 0.9959 (0.9955, 0.9963) |
| 2 M | F1-Score | 0.9904 (0.9900, 0.9908) | 0.9891 (0.9887, 0.9895) | 0.9959 (0.9955, 0.9963) |
| 2 M | Precision | 0.9904 (0.9900, 0.9908) | 0.9891 (0.9887, 0.9895) | 0.9959 (0.9955, 0.9963) |
Table 19. Training efficiency and security performance with comprehensive feature set.

| Records | Metric | LSTM | GRU | Bi-LSTM |
|---|---|---|---|---|
| 500 k | Training Time (min) | 83.7 | 83.8 | 71.2 |
| 500 k | Mean Latency (ms) | 42.32 | 40.89 | 41.76 |
| 500 k | Missed Attack Rate (%) | 0.37 | 0.42 | 0.19 |
| 1 M | Training Time (min) | 251.8 | 207.6 | 230.0 |
| 1 M | Mean Latency (ms) | 37.92 | 39.59 | 40.90 |
| 1 M | Missed Attack Rate (%) | 0.73 | 0.82 | 0.33 |
| 2 M | Training Time (min) | 420.0 | 330.0 | 380.0 |
| 2 M | Mean Latency (ms) | 37.5 | 39.2 | 40.5 |
| 2 M | Missed Attack Rate (%) | 1.09 | 1.22 | 0.47 |
Table 20. Performance degradation analysis: scaling impact with comprehensive feature set.

| Scaling Transition | Accuracy Change (%), LSTM | Accuracy Change (%), GRU | Accuracy Change (%), Bi-LSTM | Missed Attack Rate Change, LSTM | Missed Attack Rate Change, GRU | Missed Attack Rate Change, Bi-LSTM |
|---|---|---|---|---|---|---|
| 500 k → 1 M | −0.31 | −0.36 | −0.11 | +0.36% | +0.40% | +0.14% |
| 1 M → 2 M | −0.31 | −0.36 | −0.11 | +0.36% | +0.40% | +0.14% |
| Total (500 k → 2 M) | −0.62 | −0.72 | −0.22 | +0.72% | +0.80% | +0.28% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Youness, N.; Mostafa, A.; Sobh, M.A.; Bahaa, A.M.; Nagaty, K. VeMisNet: Enhanced Feature Engineering for Deep Learning-Based Misbehavior Detection in Vehicular Ad Hoc Networks. J. Sens. Actuator Netw. 2025, 14, 100. https://doi.org/10.3390/jsan14050100