Journal of Sensor and Actuator Networks
  • Article
  • Open Access

9 October 2025

VeMisNet: Enhanced Feature Engineering for Deep Learning-Based Misbehavior Detection in Vehicular Ad Hoc Networks

1 Department of Computer Networks, The British University in Egypt, Cairo 11837, Egypt
2 Computer and Systems Engineering Department, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
3 School of Business, Technology, and Health Administration, Capella University, Minneapolis, MN 55402, USA
4 Faculty of Engineering Technology, Elsewedy University of Technology, Cairo 23751, Egypt

Abstract

Ensuring secure and reliable communication in Vehicular Ad hoc Networks (VANETs) is critical for safe transportation systems. This paper presents Vehicular Misbehavior Network (VeMisNet), a deep learning framework for detecting misbehaving vehicles, with primary contributions in systematic feature engineering and scalability analysis. VeMisNet introduces domain-informed spatiotemporal features—including DSRC neighborhood density, inter-message timing patterns, and communication frequency analysis—derived from the publicly available VeReMi Extension Dataset. The framework evaluates Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM architectures across dataset scales from 100 K to 2 M samples, encompassing all 20 attack categories. To address severe class imbalance (59.6% legitimate vehicles), VeMisNet applies SMOTE post train–test split, preventing data leakage while enabling balanced evaluation. Bidirectional LSTM with engineered features achieves 99.81% accuracy and F1-score on 500 K samples, with remarkable scalability maintaining >99.5% accuracy at 2 M samples. Critical metrics include 0.19% missed attack rates, under 0.05% false alarms, and 41.76 ms inference latency. The study acknowledges important limitations, including reliance on simulated data, single-split evaluation, and potential adversarial vulnerability. Domain-informed feature engineering provides 27.5% relative improvement over dimensionality reduction and 22-fold better scalability than basic features. These results establish new VANET misbehavior detection benchmarks while providing honest assessment of deployment readiness and research constraints.

1. Introduction

A Vehicular Ad hoc Network (VANET) is an ad hoc network that supports communication among moving vehicles and between vehicles and roadside infrastructure. As a subclass of Mobile Ad hoc Networks (MANETs), VANETs share MANET-like routing challenges, including the lack of an underlying communications infrastructure and the need for packet forwarding among nodes. They are receiving considerable attention because of their potential role as an essential component of Intelligent Transportation Systems (ITSs), with a broad range of applications including traffic control, driver assistance, and road safety. The success of such applications, however, depends on reliable and timely communication among vehicles [,]. Because VANETs are highly dynamic and malicious parties can tamper with shared data, they are prone to misbehaving vehicles that can degrade network performance and put lives at risk. The resulting threats include misrouting of packets, eavesdropping, vehicle tracking, and injection of spoofed emergency messages. Because VANET security directly affects human safety, strong detection capabilities are needed. To address these challenges, researchers have proposed several approaches for detecting and responding to misbehaving vehicles by analyzing behavioral patterns and detecting anomalies []. Deep learning (DL) approaches have proven particularly effective for security-related tasks, including malware detection, intrusion detection, and spam filtering. However, little research has explored their use for vehicular security, mainly because of a lack of labeled datasets with realistic attack scenarios [,,].
This paper introduces VeMisNet, a deep learning framework for detecting misbehaving vehicles in VANETs that integrates temporal sequence learning with domain-informed feature engineering. Our key contributions include (1) novel spatiotemporal and communication-aware features including time between messages, packet rate, and DSRC-range neighbor count; (2) comprehensive multi-class evaluation across 20 attack types using up to 2 million samples from the VeReMi Extension dataset; (3) proper class imbalance handling through post-split SMOTE application; and (4) UMAP-based feature selection for optimal feature subsets ranging from 5 to 25 features.
The remainder of this paper is organized as follows: Section 2 provides an overview of VANET fundamentals and security threats, presenting a comprehensive review of existing literature on VANET misbehavior detection and identifying key research gaps addressed by this work. Section 3 details the proposed VeMisNet system model, deep learning architectures, feature engineering methodology, and dataset characteristics. Section 4 reports and analyzes experimental validation results across multiple architectures and dataset scales, including comprehensive feature importance analysis that validates our domain-informed engineering approach and guides optimal feature selection. Finally, Section 5 concludes the paper and outlines directions for future research.

3. System Model and Methodology

This section presents the comprehensive VeMisNet framework for VANET misbehavior detection, integrating deep learning architectures with domain-informed feature engineering to achieve robust classification across diverse attack scenarios.

3.1. VeMisNet Framework Overview

Figure 1 illustrates the VeMisNet data flow pipeline, which transforms raw vehicular communication data into security intelligence through six integrated stages.
Figure 1. Overview of the VeMisNet data flow including preprocessing, feature engineering, and classification stages.
Stage 1: Input Data—The pipeline processes the VeReMi Extension dataset containing 2 million samples across 20 attack categories and legitimate vehicle communications.
Stage 2: Data Preprocessing—Parallel operations ensure data quality through cleaning (duplicate removal, missing value imputation, normalization) and splitting (train/test/validation subsets) while preserving temporal characteristics. As detailed in Figure 1, the preprocessing pipeline consists of three essential steps: duplicate removal ensures data integrity across the 2 M sample dataset, regression imputation handles missing kinematic values, and coordinate separation enables spatial feature extraction for geographic analysis.
Stage 3: Feature Engineering—Domain-informed spatiotemporal feature extraction through three parallel processes as illustrated in Figure 2: (1) spatial features capture neighborhood density within DSRC ranges (100 m, 200 m, 300 m); (2) temporal features analyze inter-message timing and packet rates; (3) kinematic features extract relative motion patterns. The feature engineering methodology employs three distinct approaches: UMAP-based dimensionality reduction for automated feature selection, manual domain-informed selection leveraging VANET communication expertise, and augmentation with newly engineered spatiotemporal features. These processes converge to produce carefully selected features for misbehavior detection.
Figure 2. Feature engineering approaches: UMAP dimensionality reduction, manual domain-informed selection, and novel spatiotemporal feature augmentation for enhanced VANET misbehavior detection.
Stage 4: Class Balancing—SMOTE is applied after train–test split to prevent data leakage while creating balanced training across all 20 attack categories.
Stage 5: Deep Learning Models—Three recurrent architectures process the balanced dataset: LSTM, GRU, and Bidirectional LSTM for comparative evaluation.
Stage 6: Classification Output—Models produce binary (attack/normal) and multi-class (20 attack types) classifications with comprehensive performance evaluation.

3.2. Dataset Description

VeMisNet utilizes the VeReMi Extension dataset [], which enhances the original VeReMi dataset [] with realistic sensor error models and expanded attack scenarios. The dataset incorporates sensor noise for four primary data fields (position, speed, acceleration, heading) and employs the Framework for Misbehavior Detection (F2MD) [] with Luxembourg SUMO Traffic scenario traces [].
The dataset provides comprehensive 24 h simulation periods with a 30% malicious node penetration rate and a vehicle density of 23.29 vehicles/km². It contains approximately 7000 attacker vehicles generating 7.5 million messages alongside 17,000 legitimate vehicles producing 12 million messages, creating a realistic evaluation environment for VANET security research.
Table 3 presents the 20 attack types included in the VeReMi Extension dataset. We propose a systematic categorization that groups these attacks into five distinct categories based on their underlying attack mechanisms: position falsification, speed manipulation, freeze/replay attacks, denial-of-service variants, and Sybil-based attacks. This taxonomic organization enables comprehensive evaluation of detection capabilities across diverse misbehavior patterns.
Table 3. Attack types in the VeReMi Extension dataset.

3.3. Feature Engineering Methodology

VeMisNet leverages the original raw features to propose and generate a novel set of derived features that capture spatiotemporal and communication-aware vehicle behavior. These proposed features are evaluated both in combination with the original raw features and alongside the basic kinematic feature set. The results demonstrate significant performance improvements, highlighting the effectiveness of the engineered features. A detailed presentation of performance metrics and a comparative analysis of these experiments are provided in Section 4.

3.3.1. Spatiotemporal Feature Engineering

For each vehicle $v_i$ at position $p_i = (x_i, y_i)$ communicating with vehicle $v_j$ at position $p_j = (x_j, y_j)$, we extract key spatiotemporal relationships:
Spatial Distance: The Euclidean distance between communicating vehicles:
$$d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
Temporal Difference: The time gap between consecutive messages:
$$\Delta t_{ij} = |t_j - t_i|$$
Neighborhood Density: For distance thresholds $D_k \in \{100\,\text{m}, 200\,\text{m}, 300\,\text{m}\}$ aligned with DSRC communication ranges, the neighbor count within temporal window $W$ is:
$$n_{i,D_k} = \sum_{j \in N_i(W)} \mathbb{1}(d_{ij} \le D_k)$$
where $N_i(W) = \{\, j \mid 0 < \Delta t_{ij} \le W \,\}$ represents vehicles communicating within time window $W$.
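The neighborhood density definition above can be illustrated with a minimal sketch. The message layout `(vehicle_id, time_s, x_m, y_m)`, the function name `neighbor_counts`, and the choice to count messages rather than unique senders are assumptions made for illustration; the paper does not specify its implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical message layout: (vehicle_id, time_s, x_m, y_m).
Message = Tuple[int, float, float, float]

DSRC_THRESHOLDS_M = (100.0, 200.0, 300.0)


def neighbor_counts(ref: Message, others: List[Message], window_s: float,
                    thresholds=DSRC_THRESHOLDS_M) -> Dict[float, int]:
    """Count messages within each distance threshold whose time gap to
    `ref` satisfies 0 < |t_j - t_i| <= window_s."""
    _, t_i, x_i, y_i = ref
    counts = {d: 0 for d in thresholds}
    for _vid, t_j, x_j, y_j in others:
        dt = abs(t_j - t_i)
        if not (0 < dt <= window_s):
            continue  # outside the temporal window W
        dist = math.hypot(x_j - x_i, y_j - y_i)  # Euclidean distance d_ij
        for d in thresholds:
            if dist <= d:
                counts[d] += 1
    return counts


ref = (1, 10.0, 0.0, 0.0)
others = [
    (2, 10.5, 30.0, 40.0),   # 50 m away, inside the window
    (3, 10.2, 0.0, 250.0),   # 250 m away, inside the window
    (4, 10.0, 10.0, 0.0),    # zero time gap, excluded by 0 < dt
    (5, 13.0, 5.0, 5.0),     # 3 s gap, outside a 1 s window
]
result = neighbor_counts(ref, others, window_s=1.0)
print(result)
```

If one vehicle sends several messages inside the window, this sketch counts each of them; deduplicating by sender would be needed to count distinct neighbors.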

3.3.2. UMAP-Based Feature Selection

To address the high-dimensional complexity of over 40 original features, Uniform Manifold Approximation and Projection (UMAP) was employed for non-linear dimensionality reduction. UMAP preserves both local and global data structures while identifying feature relevance through embeddings.
Table 4 presents the progressive feature selection strategy, demonstrating that 25 features provide optimal performance balance between computational efficiency and detection accuracy.
Table 4. Progressive feature selection using UMAP dimensionality reduction.

3.3.3. Communication Pattern Features

Key communication-aware features include:
  • Packet Transmission Rate: $NP_s = \mathrm{count}(\text{messages from sender } s)$
  • Inter-Message Timing: $\Delta T = \mathrm{sendTime}_i - \mathrm{sendTime}_{i-1}$
  • Kinematic Differences: Relative changes in position, speed, and heading between sender–receiver pairs
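As a rough illustration of the first two features in this list, the sketch below derives a per-sender message count and the gaps between consecutive send times. The input layout `(sender_id, send_time_s)` and the function name `communication_features` are hypothetical, not taken from the dataset schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def communication_features(messages: List[Tuple[str, float]]) -> Dict[str, dict]:
    """Compute per-sender packet count and inter-message gaps.

    `messages` holds (sender_id, send_time_s) pairs in any order.
    """
    times = defaultdict(list)
    for sender, send_time in messages:
        times[sender].append(send_time)

    features = {}
    for sender, ts in times.items():
        ts.sort()
        # Delta T = sendTime_i - sendTime_{i-1} for consecutive messages
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        features[sender] = {"packet_count": len(ts), "inter_message_gaps": gaps}
    return features


msgs = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("a", 2.0)]
feats = communication_features(msgs)
print(feats)
```

Dividing `packet_count` by the observation duration would turn the count into a rate; the source formula gives only the count, so that step is left out here.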

3.4. Class Imbalance Analysis and Handling

The VeReMi Extension dataset exhibits severe class imbalance, as demonstrated through analysis of a representative 500,000-record sample. In this sample, legitimate vehicles comprise 59.6% of instances while individual attack types range from 4.7% to 13.1%, creating a maximum imbalance ratio of 12.7:1. This severe imbalance poses significant challenges for machine learning model training, as models tend to bias toward the majority class. Table 5 and Figure 3 present the detailed class distribution analysis based on the 500 K-record sample used for standardized comparisons throughout this study.
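The quoted maximum imbalance ratio follows from the reported percentages, taking the legitimate share over the smallest attack share; the same calculation for the largest attack share gives the minimum ratio:

$$\frac{59.6\%}{4.7\%} \approx 12.7, \qquad \frac{59.6\%}{13.1\%} \approx 4.5$$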
Table 5. Class distribution and imbalance analysis.
Figure 3. Comprehensive class imbalance analysis of the VANET dataset (500 K sample). (a) Sample count distribution showing absolute numbers for each class category. (b) Percentage distribution illustrating the severe imbalance with legitimate vehicles comprising 59.6% of samples. (c) Attack types distribution excluding legitimate vehicles to highlight attack class variations. (d) Imbalance severity analysis showing ratios relative to the legitimate class, with the maximum ratio reaching 12.7:1 for other attacks.
To address this challenge, SMOTE is applied after the train–test split to prevent data leakage and ensure unbiased evaluation. SMOTE generates synthetic samples for underrepresented classes by interpolating between existing minority class instances, creating balanced representation across all attack types while preserving test set integrity.
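The ordering described above (split first, oversample the training portion only) can be sketched as follows. This is a simplified, pure-Python stand-in for SMOTE written for illustration, not the authors' code and not the imbalanced-learn implementation; the function names, `k=3`, and the toy data are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly test_frac of each class held out."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_frac))
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return train_idx, test_idx


def smote_like(X, y, k=3, seed=0):
    """Oversample each class to the majority count by interpolating between
    a sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two points
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(r, base)),
            )[:k]
            other = rng.choice(neighbours)
            lam = rng.random()
            X_out.append([a + lam * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy data: 20 majority rows (class 0) and 10 minority rows (class 1).
X = [[float(i), float(i % 3)] for i in range(30)]
y = [0] * 20 + [1] * 10

train_idx, test_idx = stratified_split(y)            # split first
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_bal, y_bal = smote_like(X_train, y_train)           # oversample training only
print(len(test_idx), y_bal.count(0), y_bal.count(1))
```

Because synthetic rows are generated only from training rows, no interpolated point can carry information from the held-out test rows.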

3.5. Evaluation Methodology

Performance evaluation employs comprehensive metrics including accuracy, precision, recall, F1-score, and balanced accuracy. Per-class analysis ensures equitable assessment across all 20 attack categories, while computational efficiency metrics (inference time, memory usage, training duration) provide practical deployment insights.
We use a single, stratified 80/20 train–test split at the sequence level; no k-fold cross-validation is used. Class imbalance is addressed only on the training split via post-split SMOTE. We report point estimates on the held-out test set and 95% confidence intervals via paired bootstrapping.
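A percentile bootstrap on the held-out predictions is one way to obtain such intervals. The sketch below assumes that "paired" means resampling (true label, predicted label) pairs together; the resample count, the seed, and the function name `bootstrap_accuracy_ci` are illustrative choices rather than details from the paper.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (point_estimate, lower, upper) for accuracy using a
    percentile bootstrap over (y_true, y_pred) pairs."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_resamples):
        # Resample indices with replacement so labels and predictions stay paired.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper


y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100          # a classifier that always predicts class 1
point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy={point:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With a single train–test split, an interval of this kind reflects sampling variability of the test set only, not variability across different splits or training runs.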
The evaluation framework prioritizes F1-score as the primary metric due to class imbalance considerations, supplemented by balanced accuracy for a comprehensive assessment of minority class detection performance. Having established the VeMisNet framework and methodology, the following section provides comprehensive analysis of feature importance to validate our domain-informed engineering approach and guide optimal feature selection for practical deployments.

4. Experimental Results

4.1. Experimental Configuration

This section presents the comprehensive experimental framework used to evaluate VeMisNet across multiple dimensions: feature engineering effectiveness, architectural performance, scalability, and class imbalance mitigation.

4.1.1. Dataset Configuration

All experiments utilize the VeReMi Extension dataset with the following standardized configurations:
Primary Evaluation Scale: 500,000 samples.
  • Rationale: Provides sufficient statistical power while maintaining computational tractability.
  • Composition: 20 attack categories plus legitimate traffic (59.6% legitimate, 40.4% attacks).
  • Train/Test Split: 80/20 stratified split maintained across all experiments.
  • Sequence Length: 10 time steps for temporal modeling.
Scalability Validation: Additional evaluations at 100 K, 1 M, and 2 M samples to assess performance consistency across deployment scales.
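The 10-step sequence length implies that per-message feature rows are grouped into fixed-length windows before being fed to the recurrent models. A minimal sketch of one way to do this is shown below; the stride of 1, the use of the last row's label as the sequence label, and the function name `make_sequences` are assumptions, since the text does not describe the windowing procedure.

```python
from typing import List, Sequence, Tuple


def make_sequences(rows: Sequence[Sequence[float]], labels: Sequence[int],
                   seq_len: int = 10) -> Tuple[List[List[List[float]]], List[int]]:
    """Build overlapping windows of `seq_len` consecutive rows.

    `rows` is assumed to be time-ordered messages from a single sender.
    Each window is labelled with the label of its final row.
    """
    X, y = [], []
    for end in range(seq_len, len(rows) + 1):
        X.append([list(r) for r in rows[end - seq_len:end]])
        y.append(labels[end - 1])
    return X, y


rows = [[float(i)] for i in range(12)]   # 12 messages, 1 feature each
labels = list(range(12))
X_seq, y_seq = make_sequences(rows, labels)
print(len(X_seq), len(X_seq[0]), y_seq)
```

Overlapping windows share rows, so a split performed at the sequence level should keep windows drawn from the same stretch of messages on the same side of the split.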

4.1.2. Feature Set Configurations

Five distinct feature engineering approaches were systematically evaluated (illustrated in Figure 2):
  • Original Features: Raw VeReMi features, including temporal, kinematic, and noise components.
  • UMAP-Selected Features: Progressive feature selection from 5 to 25 features based on UMAP dimensionality reduction, tested across all dataset sizes for both binary and multi-class classification tasks.
  • Basic Kinematic Features: Fundamental vehicular attributes, including position, speed, acceleration, and heading, evaluated as baseline across all models, dataset sizes, and classification tasks.
  • Enhanced Feature Set: Combination of basic kinematic features augmented with newly engineered spatiotemporal features, resulting in a comprehensive 14-feature set tested across all experimental configurations.
  • Comprehensive Feature Set Evaluation: Combination of Raw VeReMi features, including temporal, kinematic, and noise components, augmented with newly engineered domain-informed features, including the following:
    • Neighborhood density within DSRC ranges (100 m, 200 m, 300 m);
    • Inter-message timing patterns;
    • Kinematic consistency metrics;
    • Communication frequency analysis.

4.1.3. Model Configuration

The experimental environment utilized TensorFlow 2.x with its Keras API, implementing identical architectural configurations across all models to ensure fair comparison:
  • Architecture: Single recurrent layer with 64 hidden units;
  • Optimizer: Adam (learning rate = 0.001);
  • Loss Function: Categorical crossentropy for multi-class, binary crossentropy for binary classification;
  • Models Evaluated: LSTM, GRU, and Bidirectional LSTM.

4.1.4. Evaluation Methodology

Performance Metrics:
  • Primary: Accuracy, F1-score, Matthews Correlation Coefficient;
  • Class Balance: Balanced Accuracy, per-class Precision/Recall;
  • Statistical: 95% confidence intervals via bootstrap resampling.
Practical Metrics:
  • Inference latency (milliseconds per sample);
  • Training efficiency (time to convergence);
  • Memory utilization and throughput.
Class Imbalance Handling: SMOTE applied post-split to training data only, with comparative evaluation of balanced vs. unbalanced training approaches.

4.1.5. Evaluation Framework

Each combination of dataset size, feature configuration, and model architecture was evaluated on both binary and multi-class classification tasks, resulting in a comprehensive experimental matrix that enables systematic analysis of the following:
  • Scalability: Performance consistency across dataset sizes (100 K to 2 M);
  • Feature Engineering Impact: Effectiveness of engineered vs. basic features;
  • Architecture Comparison: Relative performance of LSTM, GRU, and Bi-LSTM;
  • Classification Complexity: Binary vs. multi-class detection capabilities.
This systematic experimental design ensures comprehensive evaluation of the VeMisNet framework across diverse operational scenarios while maintaining rigorous comparative analysis standards.

4.1.6. Evaluation Metrics

Performance evaluation employs comprehensive metrics suitable for imbalanced datasets. Table 6 presents the evaluation framework, where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
Table 6. Evaluation metrics framework.
Due to the VeReMi dataset’s severe class imbalance (59.6% legitimate vehicles), F1-score serves as the primary evaluation metric. While accuracy may appear high in imbalanced datasets, it can be misleading for minority class performance. F1-score balances precision–recall tradeoffs, providing comprehensive classifier assessment crucial for safety-critical VANET applications where both false negatives and false positives carry real-world risks. Balanced accuracy addresses class imbalance by computing average recall across all classes, ensuring equal contribution regardless of class frequency. This metric complements F1-score by providing class-agnostic performance assessment across all 20 attack categories.
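To make the contrast between accuracy and the imbalance-aware metrics concrete, the sketch below computes them from confusion-matrix counts using the standard textbook definitions, which are assumed (not confirmed) to match Table 6. The counts are made up for illustration, with attacks as the positive class.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts (positive = attack)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # attack detection rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # recall of the normal class
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Illustrative counts: 10 attacks (half missed) among 100 samples.
m = binary_metrics(tp=5, tn=89, fp=1, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In this toy case accuracy is 0.94 even though half of the attacks are missed, while recall, F1-score, and balanced accuracy are all visibly lower. For the multi-class setting, balanced accuracy generalizes to the mean of per-class recall.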

4.2. Performance Evaluation Results

4.2.1. Binary Classification Results

Binary classification performance was evaluated across varying dataset sizes using basic kinematic features (position, speed, acceleration, and heading) to distinguish between legitimate vehicles and attackers. Table 7 presents comprehensive performance metrics for all three architectures across dataset scales from 100 K to 2 M records.
Table 7. Evaluation metrics for binary classification—Normal or Attacker—with basic kinematic features (position, speed, acceleration, and heading) using LSTM, GRU, and Bidirectional LSTM.
The results demonstrate that GRU achieves the highest overall performance with 84.39% accuracy at 100 K records, followed closely by Bi-LSTM (84.32%) and LSTM (83.73%). However, all architectures exhibit performance degradation as dataset size increases, suggesting potential challenges in processing larger data volumes with basic feature sets. Notably, GRU maintains superior precision across most configurations, achieving 85.16% precision at 100 K records, while Bi-LSTM demonstrates more consistent performance patterns across varying dataset sizes. LSTM shows intermediate performance but exhibits greater stability at larger scales compared to GRU.
Figure 4 visualizes these performance trends across dataset sizes. The accuracy comparison (left) clearly illustrates GRU’s peak performance at smaller dataset sizes, while the F1-score analysis (right) reveals Bi-LSTM’s superior consistency across scales. The performance degradation pattern is most pronounced in GRU, which experiences the steepest decline from 84.39% to 79.82% accuracy as dataset size increases from 100 K to 2 M records.
Figure 4. Binary classification performance comparison across dataset sizes. (left) Accuracy comparison showing GRU achieving highest accuracy of 84.4% at 100 K dataset size, followed by Bi-LSTM (84.3%) and LSTM (83.7%). (right) F1-score comparison demonstrating similar trends with Bi-LSTM showing more consistent performance across larger datasets. All models exhibit performance degradation with increasing dataset size, with GRU showing the steepest decline.
These findings indicate that while basic kinematic features provide reasonable binary classification performance, the scalability challenges suggest the need for more sophisticated feature engineering to maintain detection accuracy at larger operational scales. The consistent performance of Bi-LSTM across dataset sizes makes it particularly suitable for real-world VANET deployments with varying data volumes.

4.2.2. Multi-Class Classification Results

Multi-class classification evaluation encompasses three distinct feature engineering approaches to assess detection performance across 20 attack categories, demonstrating the progressive impact of feature selection and engineering on model effectiveness.
Original Features
Multi-class classification evaluation using the original VeReMi Extension features provides baseline performance assessment across three recurrent architectures. The original feature set comprises temporal, kinematic, noise, and filtered components from the standardized dataset, enabling direct comparison with the prior literature that utilizes these established attributes. Table 8, Table 9 and Table 10 present detailed results across all architectures and scales.
Table 8. Multi-class VANET attack detection using original features (temporal, kinematic, noise, and filtered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.
Table 9. Training efficiency and security performance for VANET attack detection models.
Table 10. Performance degradation analysis: scaling impact on VANET attack detection.
The experimental results demonstrate consistent architectural performance rankings across dataset scales, with Bidirectional LSTM achieving superior detection performance compared to unidirectional variants. However, all architectures exhibit performance degradation as dataset size increases, indicating potential challenges in maintaining detection effectiveness at larger operational scales when relying solely on original feature representations.
Computational efficiency analysis reveals trade-offs between training time and detection performance. While LSTM demonstrates faster training at the 500 K scale, GRU achieves optimal training efficiency at larger scales. Bidirectional LSTM consistently maintains the lowest missed attack rates across all configurations, though at the cost of increased training time. The inference latency remains consistently below 41 ms across all architectures, supporting real-time deployment requirements.
Performance Assessment: The scalability analysis reveals systematic performance degradation patterns that warrant consideration for large-scale deployment scenarios. Bidirectional LSTM demonstrates the most favorable scaling characteristics, with the smallest cumulative accuracy decline (−4.30%) and missed attack rate increase (+2.65%) across the evaluated range. These baseline results using original features provide the foundation for subsequent feature engineering investigations and establish performance benchmarks for comparative evaluation.
Critical Assessment: The original feature performance, while achieving competitive results on smaller datasets, exhibits concerning degradation patterns at scale. The cumulative accuracy decline of 4–5 percentage points across the 500 K–2 M range suggests potential limitations for large-scale deployment scenarios. Additionally, the increasing missed attack rates (from 2.32% to 4.97% for Bi-LSTM) indicate reduced reliability for safety-critical applications as dataset complexity increases.
These baseline results demonstrate the need for enhanced feature engineering approaches that can maintain detection effectiveness across diverse operational scales while addressing the inherent challenges of vehicular communication pattern recognition.
UMAP-Selected Features
UMAP dimensionality reduction identified optimal feature subsets from the original dataset, with progressive evaluation from 5 to 25 features across all models and dataset scales. Table 11, Figure 5, and Figure 6 present comprehensive multi-class classification results using UMAP-selected features. UMAP feature selection yields its best performance at 10–20 features, with Bi-LSTM achieving peak accuracy of 71.97% at 100 K records with 20 features. Performance plateaus beyond 15 features, indicating diminishing returns from additional feature complexity.
Table 11. UMAP feature selection results: model performance comparison.
Figure 5. Multi-class classification accuracy using UMAP feature selection. Bi-LSTM consistently achieves highest accuracy, particularly at smaller dataset sizes, with optimal performance at 10–20 features across all models.
Figure 6. Multi-class classification F1-score using UMAP feature selection. Similar trends to accuracy with Bi-LSTM demonstrating superior performance and optimal feature range of 10–20 features.
Basic Kinematic Features
Multi-class classification using fundamental vehicular attributes (position, speed, acceleration, heading) establishes baseline performance across all architectures and dataset scales. Table 12 and Figure 7 present comprehensive evaluation results for basic kinematic features.
Table 12. Multi-class classification using basic kinematic features (position, speed, acceleration, and heading) using LSTM, GRU, and Bidirectional LSTM.
Figure 7. Multi-class classification performance using basic kinematic features. Bi-LSTM consistently achieves highest performance with 81.39% accuracy at 100 K records, demonstrating graceful degradation with increasing dataset size across all metrics.
The training time analysis for 500,000 records (Figure 8) reveals distinct computational patterns across the three architectures. LSTM shows the largest advantage for binary classification, training 25.8% faster (1.24 vs. 1.67 h). Bi-LSTM demonstrates a smaller but consistent advantage, with binary classification requiring 6.5% less time (2.00 vs. 2.13 h).
Figure 8. Training time comparison between multi-class and binary classification approaches.
Notably, GRU exhibits the opposite behavior, where multi-class classification trains 15.7% faster than binary (1.19 vs. 1.41 h). This suggests that GRU’s gating mechanisms are more efficiently utilized when learning multiple class distinctions rather than binary decisions. These computational differences have practical implications for deployment scenarios where training time is a critical constraint in vehicular network security applications.
Enhanced Feature Set (Basic Kinematic + Engineered)
The enhanced feature configuration combines basic kinematic attributes with newly engineered spatiotemporal features, resulting in a comprehensive 14-feature set designed to capture domain-specific vehicular behavior patterns. This feature set bridges the gap between simplistic basic features and the computational complexity of the full feature set, providing an optimal balance for practical deployment scenarios. Table 13, Table 14, Table 15 and Table 16 present detailed results across all architectures and scales.
Table 13. Multi-class VANET attack detection using enhanced feature set (Basic kinematic + Engineered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.
Table 14. Training efficiency and security performance for enhanced feature set models.
Table 15. Performance degradation analysis: scaling impact with enhanced feature set.
Table 16. Extended performance metrics for enhanced feature set.
The enhanced feature set demonstrates significant improvements over basic kinematic features while maintaining computational efficiency. Bidirectional LSTM achieves 91.80% accuracy with the 14-feature configuration, representing a 10.4 percentage point improvement over basic features (81.40%) and approaching the performance of more complex feature configurations.
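The distinction between a percentage-point gain and a relative gain matters when comparing feature sets. Using the two Bi-LSTM accuracies quoted above:

$$\text{absolute gain} = 91.80 - 81.40 = 10.40 \text{ percentage points}, \qquad \text{relative gain} = \frac{91.80 - 81.40}{81.40} \approx 12.78\%$$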
Computational efficiency analysis reveals that the enhanced feature set maintains excellent training times, with GRU achieving optimal efficiency at 39.7 min for 500 K samples. The inference latency consistently remains below 36 ms across all architectures, demonstrating the feature set’s suitability for real-time VANET deployment despite the additional engineered features.
The extended metrics reveal critical insights into model behavior under class imbalance. While overall accuracy remains high (91.80%), the balanced accuracy of 73.91% for Bi-LSTM indicates persistent challenges in minority class detection. The MCC values (0.8267–0.8688) and Cohen’s kappa (0.8243–0.8673) confirm strong predictive capability despite the imbalanced dataset, with Bi-LSTM consistently demonstrating superior correlation and agreement metrics.
Performance Assessment: The enhanced feature set achieves optimal balance between detection performance and computational efficiency. Bidirectional LSTM with 14 engineered features provides a 27.5% relative improvement over UMAP-selected features and 12.78% over basic kinematic features, while maintaining sub-100 ms inference latency suitable for real-time deployment. The scalability analysis reveals moderate degradation patterns, with Bi-LSTM experiencing a cumulative 5.68% accuracy decline across the 500 K–2 M range.
Critical Assessment: While the enhanced feature set represents a significant advancement over basic approaches, the increasing missed attack rates (from 8.98% to 14.53% for Bi-LSTM) highlight the challenge of maintaining detection reliability at scale. The degradation patterns, though improved compared to original features alone, suggest that the 14-feature configuration approaches the practical limit for feature-based detection without more comprehensive feature integration.
Table 17 demonstrates the progressive impact of feature engineering approaches on detection performance. The domain-informed feature engineering (14 features) achieves 91.80% accuracy, representing a 27.48% relative improvement over automated UMAP selection and validating the importance of incorporating VANET-specific communication patterns. This substantial performance gain from just 14 carefully selected features compared to UMAP’s 10–20 features highlights the superiority of domain expertise over dimensionality reduction alone.
Table 17. Comprehensive feature engineering comparison analysis.
These results validate the enhanced feature set as an optimal middle ground for practical VANET deployment, offering substantial performance improvements while maintaining computational feasibility and demonstrating the critical importance of domain-informed feature engineering in vehicular security applications.
Comprehensive Feature Set Evaluation (Original + Engineered Features)
To demonstrate the full potential of our domain-informed feature engineering approach, we conducted comprehensive evaluation using the complete feature set comprising all features: the original VeReMi features combined with our newly engineered spatiotemporal and communication-aware features. This evaluation provides definitive evidence of our approach’s effectiveness while addressing scalability and deployment considerations.
The comprehensive evaluation encompasses three critical dimensions: (1) multi-class classification performance across all 20 attack categories, (2) computational efficiency for real-time vehicular deployment, and (3) scalability assessment across dataset sizes from 500 K to 2 M samples. Table 18, Table 19 and Table 20 present detailed results across all architectures and scales.
Table 18. Multi-class VANET attack detection using comprehensive feature set (original + engineered) using LSTM, GRU, and Bidirectional LSTM with 95% confidence intervals.
Table 19. Training efficiency and security performance with comprehensive feature set.
Table 20. Performance degradation analysis: scaling impact with comprehensive feature set.
Performance Analysis: The comprehensive feature set clearly outperforms the previous configurations. At 500 K samples, Bi-LSTM achieves 99.81% accuracy with only 0.19% missed attacks, a substantial gain in detection capability over the enhanced 14-feature set (91.80% accuracy, 8.98% missed attacks).
Most remarkably, the comprehensive feature set exhibits extraordinary scalability. Unlike previous configurations that showed significant performance degradation, accuracy decline is minimal: Bi-LSTM drops only 0.22 percentage points across the entire 500 K to 2 M range, maintaining >99.5% accuracy even at maximum scale. This represents a 22-fold improvement in scalability compared to the enhanced feature set’s 4.94 percentage point degradation.
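The "22-fold" figure is the ratio of the two accuracy drops, both expressed in percentage points. A small sketch of that arithmetic, using only the numbers stated in the paragraph above (the variable names are illustrative):

```python
enhanced_drop = 4.94       # percentage points lost, enhanced 14-feature set
comprehensive_drop = 0.22  # percentage points lost, comprehensive feature set

fold = enhanced_drop / comprehensive_drop
print(f"degradation ratio: {fold:.1f}x")
```

The ratio compares absolute drops in percentage points, not relative declines, so it should be read alongside the absolute accuracies at each scale.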
Critical Assessment: The comprehensive feature set establishes new performance benchmarks for VANET misbehavior detection, achieving near-perfect classification with remarkable scalability. The minimal missed attack rates (<0.5% across all scales) position this approach for deployment in safety-critical vehicular systems where detection reliability is paramount.
However, practical considerations warrant acknowledgment: (1) increased computational overhead with the full feature set requires careful resource management, (2) training times increase proportionally (71–420 min vs. 40–115 min for the 14-feature set), and (3) the feature engineering process demands domain expertise for effective implementation.
The results validate our hypothesis that comprehensive domain-informed feature engineering can overcome the traditional accuracy–scalability trade-off in machine learning systems, enabling both exceptional performance and robust scaling characteristics essential for real-world VANET deployment.

4.3. Comprehensive Per-Class Performance Analysis and Class Imbalance Mitigation

4.3.1. Theoretical Foundation and Methodology

Class imbalance represents a fundamental challenge in VANET misbehavior detection, where legitimate vehicular traffic significantly outnumbers malicious activities. This imbalance can lead to misleading performance assessments when relying solely on aggregate metrics. Our analysis employs both overall and per-class evaluation methodologies to provide comprehensive insights into model behavior under imbalanced conditions.
Imbalance Impact on Model Evaluation: In severely imbalanced datasets, conventional metrics such as accuracy can be misleading, as models may achieve high overall performance by predominantly predicting the majority class (legitimate vehicles). This phenomenon masks poor performance on critical minority classes (attack categories), where detection is most crucial for security applications. The per-class performance evaluation presented in Appendix B (Table A2, Table A3 and Table A4) demonstrates this phenomenon clearly, with several attack categories showing complete detection failure (F1-score = 0.000) when using basic features without SMOTE application.
SMOTE-Based Mitigation Strategy: We implement Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance through intelligent data augmentation. SMOTE generates synthetic examples for minority classes by interpolating between existing minority class instances, thereby expanding the decision boundary representation and reducing majority class bias while maintaining realistic data distributions.
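The interpolation step and the split-before-oversampling order can be illustrated with a small pure-Python sketch. It is a simplified, binary-class stand-in for SMOTE under stated assumptions, not the authors' pipeline: `smote_like` and `split_then_oversample` are hypothetical helpers, the data are synthetic, and a deterministic every-fourth-sample hold-out replaces a real randomized or stratified split.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = sorted((m for m in minority if m is not x),
                        key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

def split_then_oversample(X, y, minority_label, every=4):
    """Hold out every `every`-th sample first, then oversample the training
    part only, so no synthetic point is derived from a test sample."""
    test_idx = set(range(0, len(X), every))
    X_train = [x for i, x in enumerate(X) if i not in test_idx]
    y_train = [c for i, c in enumerate(y) if i not in test_idx]
    X_test = [x for i, x in enumerate(X) if i in test_idx]
    y_test = [c for i, c in enumerate(y) if i in test_idx]

    minority = [x for x, c in zip(X_train, y_train) if c == minority_label]
    deficit = (len(X_train) - len(minority)) - len(minority)
    if len(minority) >= 2 and deficit > 0:
        synthetic = smote_like(minority, deficit)
        X_train = X_train + synthetic
        y_train = y_train + [minority_label] * len(synthetic)
    return X_train, y_train, X_test, y_test

# Synthetic toy data: 40 majority samples (label 0) and 8 minority samples (label 1).
X = [[0.1 * i, 0.05 * i] for i in range(40)] + [[5 + 0.1 * i, 5 - 0.1 * i] for i in range(8)]
y = [0] * 40 + [1] * 8
X_train, y_train, X_test, y_test = split_then_oversample(X, y, minority_label=1)
print(y_train.count(0), y_train.count(1), len(X_test))
```

The test set keeps its original imbalance, so metrics computed on it still reflect the class distribution a deployed detector would see.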

4.3.2. Architectural Performance Under Class Imbalance

Analysis of the basic feature performance reveals severe architectural limitations in detecting minority attack classes without proper imbalance mitigation. Examining Table A2, the Bidirectional LSTM architecture demonstrates complete detection failure for several critical attack categories, including Disruptive attacks (F1-score = 0.000), DoS attacks (F1-score = 0.000), and DataReplaySybil attacks (F1-score = 0.000). This systematic failure pattern is consistently observed across all three architectures in Table A3 and Table A4, indicating that basic kinematic features alone are insufficient for detecting sophisticated attack patterns that do not directly manipulate fundamental vehicular movement parameters.
The GRU architecture exhibits similar limitations, with complete failure in detecting Disruptive (F1-score = 0.000) and DataReplay attacks (F1-score = 0.000), while the standard LSTM architecture shows identical failure modes across these same attack categories. These results validate the theoretical understanding that minority classes in severely imbalanced datasets require specialized handling to achieve meaningful detection performance.

4.3.3. SMOTE Transformation Effects and Recovery Analysis

The application of SMOTE substantially improves minority class detection across all architectures. For the Bidirectional LSTM configuration shown in Table A2, critical attack categories show clear gains: Disruptive attacks recover from complete failure to a 0.116 F1-score, DoS attacks improve from 0.000 to 0.504 F1-score (recovery from complete failure), and DoSRandomSybil attacks rise from 0.096 to 0.571 F1-score, a 495% relative improvement in detection capability.
However, this improvement comes with an inherent and theoretically sound trade-off in legitimate traffic classification. The legitimate vehicle recall decreases significantly across all architectures when SMOTE is applied: Bi-LSTM shows recall reduction from 0.996 to 0.224 (−77.4%), GRU exhibits decline from 0.993 to 0.192 (−80.7%), and LSTM demonstrates decrease from 0.997 to 0.160 (−83.9%). This trade-off is acceptable and expected in security-critical applications where the consequences of missing attacks far outweigh the costs of false positive alerts.
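The per-class F1-scores and the relative gains quoted in this subsection follow directly from the precision and recall columns of Table A2. A brief sketch of that arithmetic, with illustrative helper names and input values copied from the Bi-LSTM rows of Table A2:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def relative_change(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0

# Bi-LSTM with basic features, values taken from Table A2.
dos_random_sybil_gain = relative_change(0.096, 0.571)  # F1 without vs. with SMOTE
disruptive_f1_smote = f1(0.065, 0.542)                 # precision, recall with SMOTE
legitimate_f1_smote = f1(0.941, 0.224)                 # legitimate class with SMOTE

print(f"DoSRandomSybil F1 gain: {dos_random_sybil_gain:.0f}%")
print(f"Disruptive F1 (SMOTE): {disruptive_f1_smote:.3f}")
print(f"Legitimate F1 (SMOTE): {legitimate_f1_smote:.3f}")
```

The legitimate-class F1 shows the cost side of the trade-off: precision stays high, but the low recall pulls the harmonic mean down.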

4.3.4. Enhanced Feature Set Impact and Comprehensive Analysis

The transition to enhanced feature configurations, as demonstrated in Table A5, Table A6 and Table A7, reveals substantial improvements in detection capability even without SMOTE application. Table A8 presents our comprehensive evaluation using the optimal 14-feature set, comparing model performance with and without SMOTE across multiple evaluation metrics. The results demonstrate the nuanced trade-offs inherent in imbalance mitigation strategies.
The Bidirectional LSTM architecture with enhanced features (Table A5) demonstrates remarkable recovery in previously undetectable attack categories: DoS attacks improve from complete failure with basic features to 0.957 F1-score, Disruptive attacks enhance to 0.605 F1-score, and DataReplay attacks achieve 0.377 F1-score, representing over 1400% improvement compared to basic feature performance.
Key Performance Insights from Enhanced Feature Analysis:
  • Without SMOTE: Models achieve high overall accuracy (88.96–91.80% in Table A8) but exhibit poor balanced accuracy (68.76–73.91%), indicating severe bias toward majority classes even with enhanced features.
  • With SMOTE: While overall accuracy decreases (73.70–84.85%), balanced accuracy substantially improves (88.95–92.10%), demonstrating more equitable performance across all attack categories.
  • Optimal Configuration: Bidirectional LSTM with SMOTE achieves the best balance, maintaining competitive overall accuracy (84.85%) while achieving excellent balanced accuracy (92.10%) and Cohen’s Kappa (0.7655).
Critical Performance Patterns:
  • Minority Class Enhancement: SMOTE consistently improves detection capabilities for underrepresented attack types, with particularly significant gains for Sybil (+0.50), DDoS (+0.53), Position (+0.52), Speed (+0.53), and Replay (+0.53) attacks.
  • Detection Difficulty Correlation: An inverse relationship exists between class size and detection complexity, with smaller attack classes presenting inherently greater detection challenges. EventualStop attacks also present detection challenges, achieving 0.584 F1-score even with optimal configuration, suggesting that gradual behavioral changes are inherently more difficult to distinguish from normal traffic variations.
  • Architectural Superiority: Bidirectional LSTM demonstrates consistent superiority across most attack categories, effectively leveraging both forward and backward temporal contexts, with average F1-score improvements of 8–15% compared to unidirectional architectures.

4.3.5. Resource Utilization and Practical Considerations

Figure A1 provides complementary analysis of performance metrics and computational resource requirements. The resource utilization analysis reveals important practical considerations:
  • Memory Efficiency: SMOTE application increases memory requirements due to synthetic sample generation, with Bidirectional LSTM showing moderate increases compared to unidirectional architectures.
  • Training Time Impact: Enhanced dataset sizes from SMOTE result in longer training periods, but the performance gains justify the computational overhead in security-critical applications.
  • Deployment Trade-offs: The slight reduction in overall precision with SMOTE is acceptable in security-sensitive environments, where false positives are preferable to missed attacks.

4.3.6. Scientific Justification and Comprehensive Synthesis

The effectiveness of our SMOTE-based approach stems from its ability to expand decision boundaries for minority classes, thereby reducing overfitting to majority class patterns. Systematic comparison across Table A5, Table A6 and Table A7 reveals consistent architectural rankings with enhanced features and SMOTE application. The results establish that proper feature engineering combined with SMOTE enables detection of previously undetectable attack types, transforming the security posture from selective detection to comprehensive coverage.
This enhancement is particularly crucial in vehicular security applications where:
  • Rare Attack Detection: Critical security threats often manifest as minority classes in real-world traffic data. The systematic recovery of critical attack detection capabilities, from complete failure to detection rates exceeding 95% for most attack types, validates both the technical approach and its practical significance.
  • Balanced Coverage: Equitable detection performance across all attack types ensures comprehensive security coverage, with the transformation from selective to comprehensive detection representing a fundamental advancement in VANET security capability.
  • Generalization Enhancement: Synthetic augmentation improves model generalization to previously unseen attack variants, as evidenced by improved performance across diverse attack categories.
  • Operational Reliability: Consistent performance across diverse attack scenarios enhances real-world deployment viability, with Bidirectional LSTM consistently demonstrating superior performance across the majority of attack categories.
These findings establish the critical importance of class imbalance mitigation in developing robust and reliable VANET security systems, with our SMOTE-enhanced Bidirectional LSTM configuration representing the optimal balance between detection accuracy and operational practicality for securing vehicular ad hoc networks against sophisticated misbehavior patterns.

5. Conclusions and Future Directions

A central contribution of VeMisNet lies in its systematic feature engineering and comparative evaluation. Beyond relying solely on raw or dimensionality-reduced attributes, the framework introduces communication-aware features such as DSRC-range neighborhood density, inter-message timing, directional differences, and transmission frequency patterns, derived from the publicly available VeReMi Extension dataset. These features mark a significant advancement over traditional approaches that rely primarily on basic kinematic variables or limited private datasets.
The framework was rigorously evaluated across five experimental configurations:
  • Original VeReMi features: Bi-LSTM achieved 97.0% accuracy at 500 K samples, but degraded to 92.7% at 2 M, with missed attack rates increasing from 2.3% to 5.0%.
  • UMAP-selected subsets: Performance plateaued near 72.0% accuracy, confirming that unsupervised dimensionality reduction alone cannot capture VANET-specific spatiotemporal patterns.
  • Basic kinematic features: Baseline results reached only 81.4% accuracy at 100 K, falling to 64.4% F1 at 2 M, exposing severe scalability limitations.
  • Enhanced 14-feature set: Integration of engineered spatiotemporal features improved Bi-LSTM performance to 91.80% accuracy and F1 = 0.9093 at 500 K, with moderate scalability loss (down to 86.1% accuracy at 2 M). False alarms dropped to 0.33%, with only 8.98% of attacks missed.
  • Comprehensive feature set: Combining raw and engineered features achieved the highest stability across scales, sustaining >99.5% accuracy even at 2 M samples with an inference latency of 41.76 ms.
Comparative scalability analysis demonstrates that basic features degrade most severely with dataset size, while the enhanced and comprehensive feature sets deliver graceful degradation and consistent robustness. The Bi-LSTM architecture consistently outperformed LSTM and GRU across all experiments, showing a cumulative accuracy decline of only 0.22 percentage points between 500 K and 2 M records under the comprehensive configuration (versus 4.3 points with the original VeReMi features), compared to >12% for basic features.
Safety-critical metrics further highlight these improvements: false alarm rates decreased to 0.33%, attack detection reached 91.02%, and robustness was reinforced with MCC = 0.8688, κ = 0.8673, and balanced accuracy of 73.9%. Importantly, the framework achieved deployment-ready efficiency, sustaining throughput above 68,000 samples/s, sub-47 ms P99 inference latency, and a memory footprint of only 24.8 MB.
Collectively, these results validate that careful feature engineering, tested across multiple baselines and systematically benchmarked at scale, drives measurable advances in misbehavior detection. VeMisNet therefore establishes new accuracy and scalability benchmarks, while delivering statistically significant improvements (p < 0.001) across all metrics, providing credible evidence of readiness for real-time safety-critical VANET deployments.

5.1. Future Research Directions

The VeMisNet framework establishes a foundation for VANET misbehavior detection, yet several critical research directions emerge from both our achievements and acknowledged limitations. These priorities address immediate deployment challenges while positioning the framework for next-generation intelligent transportation systems.

5.1.1. Immediate Research Priorities

Comprehensive Validation Framework Development: Current evaluation relies exclusively on simulated data, creating an urgent need for real-world validation. Future research must establish controlled VANET testbeds using commercial vehicles equipped with DSRC communication systems across diverse environments (urban, suburban, highway). This validation should implement standardized data collection procedures capturing authentic vehicular communication patterns, environmental conditions, and naturally occurring anomalies to assess model transferability from simulation to deployment.
Adversarial Robustness Enhancement: The framework’s vulnerability to sophisticated adversarial attacks targeting machine learning-based security systems requires immediate attention. Research should develop comprehensive adversarial attack models specifically designed for VANET security contexts, implement adversarial training with certified defenses, and establish standardized evaluation protocols for measuring system resilience against adaptive adversaries. This includes defense mechanisms against evasion and poisoning attacks that could compromise detection effectiveness in operational environments.
Cross-Validation Methodology Integration: Our current single-split evaluation approach, while consistent across experiments, limits the robustness of performance estimates. Future work must implement comprehensive k-fold cross-validation with temporal-aware folding strategies that preserve vehicular sequence integrity while providing statistically robust performance estimates. This includes developing specialized validation techniques for imbalanced temporal datasets with SMOTE integration.
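One possible reading of "temporal-aware folding" is to assign whole vehicles to folds and keep each vehicle's messages in time order, so that no message sequence is split between training and test data. The sketch below illustrates that idea only; the record layout `(vehicle_id, timestamp, features, label)` and the function `vehicle_grouped_folds` are assumptions, not part of the published framework.

```python
from collections import defaultdict

def vehicle_grouped_folds(records, k=5):
    """Yield (train, test) record lists for k folds.

    records: iterable of (vehicle_id, timestamp, features, label) tuples.
    Whole vehicles are assigned to folds, and each vehicle's messages stay
    in timestamp order, so no sequence is split across train and test.
    """
    by_vehicle = defaultdict(list)
    for rec in records:
        by_vehicle[rec[0]].append(rec)
    for msgs in by_vehicle.values():
        msgs.sort(key=lambda r: r[1])

    vehicles = sorted(by_vehicle)               # deterministic vehicle order
    folds = [vehicles[i::k] for i in range(k)]  # round-robin fold assignment

    for test_vehicles in folds:
        held_out = set(test_vehicles)
        train = [r for v in vehicles if v not in held_out for r in by_vehicle[v]]
        test = [r for v in test_vehicles for r in by_vehicle[v]]
        # Any oversampling (e.g. SMOTE) would be applied to `train` only here.
        yield train, test

# Hypothetical data: 10 vehicles, 4 out-of-order messages each.
records = [(f"veh{v}", t, [float(v), float(t)], v % 2)
           for v in range(10) for t in (3, 1, 2, 0)]
folds = list(vehicle_grouped_folds(records, k=5))
print([(len(train), len(test)) for train, test in folds])
```

Grouping by vehicle trades some fold-size balance for sequence integrity; a stricter temporal scheme would also require every test message to be later than every training message.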

5.1.2. Medium-Term Research Objectives

Federated Learning Framework: The development of federated learning capabilities represents a critical direction for enabling distributed VANET nodes to collaboratively train detection models while preserving data privacy and addressing concerns about sensitive vehicle information sharing. This approach would allow the framework to benefit from diverse, heterogeneous datasets across different geographical regions and traffic conditions without compromising individual vehicle privacy, thereby enhancing model generalizability and robustness across diverse operational contexts.
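As a rough illustration of the aggregation step such a framework might use, the sketch below averages client parameters weighted by local sample counts, in the style of federated averaging (FedAvg). The paper does not specify an aggregation rule, so this choice, the function `federated_average`, and the toy parameter vectors are all assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client parameter vectors (FedAvg-style).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical parameters from three vehicles or roadside units after local training.
client_params = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
client_sizes = [100, 100, 200]
global_params = federated_average(client_params, client_sizes)
print(global_params)
```

Only parameters leave each client in this scheme; the raw messages used for local training stay on the vehicle.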
Ensemble Learning Integration: Future research should explore the incorporation of ensemble learning techniques that combine multiple deep learning architectures to enhance detection reliability, robustness, and overall system accuracy. Such approaches could leverage the complementary strengths of different neural network models while mitigating individual architectural limitations and improving generalization across diverse attack patterns.
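One simple way to combine several architectures is soft voting, where the class-probability outputs of each model are averaged before taking the arg-max. The sketch below is one hypothetical option among many: the function `soft_vote` and the probability vectors standing in for LSTM, GRU, and Bi-LSTM outputs are illustrative only.

```python
def soft_vote(prob_lists, weights=None):
    """Average class-probability vectors from several models.

    Returns (predicted_class_index, averaged_probabilities).
    """
    m = len(prob_lists)
    weights = weights or [1.0 / m] * m
    n_classes = len(prob_lists[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists))
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical softmax outputs of three models for one message (3 classes).
lstm_p = [0.6, 0.3, 0.1]
gru_p = [0.2, 0.5, 0.3]
bilstm_p = [0.1, 0.7, 0.2]
label, avg = soft_vote([lstm_p, gru_p, bilstm_p])
print(label, [round(p, 3) for p in avg])
```

Non-uniform `weights` could favour the model with the best validation performance, at the cost of an extra hyperparameter to tune.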
Neuro-Fuzzy Integration: Incorporating fuzzy logic components with neural networks would significantly improve the framework’s ability to handle uncertainty and ambiguity inherent in VANET environments. This hybrid approach could enhance detection reliability in scenarios with incomplete or noisy communication data, providing more robust decision-making capabilities under adverse conditions such as network congestion or intermittent connectivity.

5.1.3. Long-Term Research Vision

Scalable Deployment Architecture: Future research should focus on developing efficient deployment strategies for large-scale VANET implementations, including edge computing integration, distributed processing architectures, and real-time processing optimization to meet stringent latency requirements of vehicular safety applications. This involves investigating lightweight model variants, hardware-accelerated implementations, and adaptive resource allocation strategies for heterogeneous vehicular computing environments.
Advanced Attack Pattern Analysis: Research should extend beyond current binary and multi-class detection toward more sophisticated attack pattern recognition, including zero-day attack detection, adaptive adversarial behavior modeling, and evolution-aware detection systems that can adapt to emerging threat landscapes in intelligent transportation systems.
Context-Aware Environmental Integration: Extending the framework to incorporate real-world vehicle datasets and context-sensitive features would provide more accurate behavioral modeling and improve generalizability. This includes integration of environmental factors such as weather conditions, traffic density variations, road infrastructure characteristics, and temporal patterns that could significantly influence vehicle behavior and attack manifestations.

5.1.4. Research Implementation Strategy

Resource-Constrained Optimization: Development of lightweight model variants suitable for deployment in resource-limited vehicular computing environments while maintaining detection effectiveness. This includes model compression techniques, quantization strategies, and hardware–software co-design approaches optimized for automotive-grade processors.
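To make the quantization idea concrete, the following sketch applies symmetric per-tensor int8 quantization to a short weight vector and measures the round-trip error. It is a generic illustration of the technique, not a method evaluated in this work; the function names and the example weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantisation: w ~ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return [v * scale for v in q]

layer_weights = [0.50, -1.27, 0.013, 0.0]  # hypothetical float weights of one layer
q, scale = quantize_int8(layer_weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(layer_weights, restored))
print(q, scale, max_error)
```

Each weight then needs one byte instead of four, and the worst-case rounding error per weight is bounded by half the scale; whether detection accuracy survives that error would have to be measured on the actual model.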
Dynamic Adaptation Mechanisms: Investigation of online learning and model updating mechanisms to address concept drift and evolving attack strategies without requiring complete retraining. This includes developing incremental learning algorithms that can adapt to new attack patterns while preserving detection performance on known threats.
Cross-Domain Generalization: Evaluation across diverse geographical regions, traffic patterns, and infrastructure configurations to assess model transferability and identify domain adaptation requirements. This research should establish standardized benchmarks for cross-domain evaluation and develop techniques for rapid model adaptation to new operational environments.
These enhancements would collectively advance VeMisNet toward a production-ready system capable of providing robust, scalable, and privacy-preserving misbehavior detection for next-generation intelligent transportation systems. The integration of these future directions would enhance system generalizability, enable deployment at scale, and further improve the robustness and security of VANETs in real-world operational environments.

Author Contributions

Conceptualization, A.M., M.A.S., A.M.B. and K.N.; Methodology, N.Y., A.M., M.A.S., A.M.B. and K.N.; Software, N.Y.; Validation, N.Y.; Investigation, N.Y.; Resources, N.Y.; Data curation, N.Y.; Writing—original draft, N.Y.; Writing—review & editing, A.M., M.A.S., A.M.B. and K.N.; Supervision, A.M., M.A.S., A.M.B. and K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Comprehensive Literature Comparison

Table A1. Comprehensive literature comparison.
Study | Year | Approach | Dataset | Scale | Accuracy | F1-Score | Speed (samples/s) | Attack Types | Key Innovation
Traditional Machine Learning
Grover et al. | 2011 | Random Forest | Custom | 50 K | 93–99% | 95% | 20,000 | Multiple | Ensemble methods
So et al. | 2018 | ML + Plausibility | Custom | 100 K | 95% | 93% | 18,500 | Position | Domain integration
Zhang et al. | 2018 | SVM + DST | Custom | 75 K | 94.2% | 92% | 16,200 | Message attacks | Dual-model approach
Sharma & Jaekel | 2021 | ML Ensemble | VeReMi | 500 K | 98.5% | 97% | 15,800 | 5 types | Temporal BSMs
Sonker et al. | 2021 | Random Forest | VeReMi | 200 K | 97.6% | 96% | 17,300 | 5 types | Multi-class focus
Kumar et al. | 2022 | XGBoost + Temporal | VeReMi | 800 K | 96.2% | 95% | 16,500 | 16 types | Temporal features
Patel et al. | 2024 | ML Ensemble | VeReMi Ext | 1.5 M | 94.7% | 93% | 14,200 | 20 types | Advanced ensemble
Deep Learning Approaches
Kamel et al. | 2019 | LSTM | VeReMi | 300 K | 89.5% | 87% | 9200 | Multi-class | Temporal modeling
Alladi et al. | 2021 | CNN-LSTM | Custom | 250 K | 92.3% | 90% | 7800 | Multiple | Hybrid architecture
Alladi et al. | 2021 | DeepADV | VeReMi | 400 K | 94.1% | 92% | 6500 | Comprehensive | Multi-architecture
Liu et al. | 2023 | Graph NN | Custom + VeReMi | 900 K | 94.5% | 93% | 5800 | 15 types | Graph-based modeling
Chen et al. | 2024 | Attention-LSTM | VeReMi Ext | 1 M | 93.1% | 91% | 9100 | 20 types | Attention mechanism
Yuce et al. | 2024 | Spatiotemporal GNN | Converted MBD (IoV) | – | 99.92% | – | 4800 | Multiple | GNN + dataset-to-graph mapping
Transformer-Based Architectures
Wang et al. | 2024 | Transformer | VeReMi Ext | 1.2 M | 93.8% | 92% | 7200 | 20 types | Self-attention
Khan et al. | 2025 | Transformer + SHAP | VeReMi Ext | 3.19 M | 96.15% (MC), 98.28% (Bin) | – | 7200 | All attacks (VeReMi Ext) | XAI with transformer
Federated Learning Approaches
Gurjar et al. | 2025 | Fed. CNN-LSTM | VeReMi Ext | 1.2 M | 93.2% | 90% | 5500 | Multiple | Scalable federation
Campos et al. | 2024 | Federated DL | VeReMi Ext | 800 K | 91.7% | 89% | 6200 | Distributed | Privacy-preserving
Hybrid Approaches
Kim et al. | 2023 | SVM + GRU | VeReMi | 700 K | 94.3% | 93% | 12,500 | 16 types | ML + DL ensemble
Rodriguez et al. | 2024 | RF + LSTM | VeReMi Ext | 1.3 M | 95.1% | 94% | 11,000 | 20 types | Hybrid architecture
Present Work | 2025 | Bi-LSTM + Eng. Features | VeReMi Ext | 100 K–2 M | 99.05–99.63% | 99+% | 41.76 ms | All attacks (VeReMi Ext) | Domain engineered features + scalability analysis

Appendix B. Comprehensive Per-Class Performance Analysis

Table A2. Per-class performance of Bi-LSTM using basic features by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Legitimate | 0.797 | 0.745 | 0.996 | 0.853 | 0.484 | 0.941 | 0.224 | 0.362
ConstPos | 0.991 | 0.827 | 0.492 | 0.617 | 0.994 | 0.795 | 0.957 | 0.868
ConstPosOffset | 0.989 | 0.815 | 0.266 | 0.401 | 0.978 | 0.351 | 0.862 | 0.499
RandomPos | 0.990 | 0.847 | 0.394 | 0.538 | 0.984 | 0.305 | 0.628 | 0.411
RandomPosOffset | 0.990 | 0.773 | 0.125 | 0.215 | 0.918 | 0.055 | 0.587 | 0.101
ConstSpeed | 0.994 | 0.777 | 0.690 | 0.731 | 0.999 | 0.969 | 0.954 | 0.962
ConstSpeedOffset | 0.993 | 0.696 | 0.696 | 0.696 | 0.991 | 0.444 | 0.787 | 0.567
RandomSpeed | 0.992 | 0.673 | 0.650 | 0.661 | 0.996 | 0.814 | 0.913 | 0.861
RandomSpeedOffset | 0.992 | 0.766 | 0.397 | 0.523 | 0.975 | 0.245 | 0.758 | 0.370
EventualStop | 0.992 | 0.870 | 0.326 | 0.474 | 0.967 | 0.178 | 0.857 | 0.295
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.899 | 0.065 | 0.542 | 0.116
DataReplay | 0.989 | 0.619 | 0.013 | 0.025 | 0.912 | 0.062 | 0.486 | 0.110
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.196 | 0.883 | 0.320
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.951 | 0.419 | 0.634 | 0.504
DoSRandom | 0.957 | 0.512 | 0.945 | 0.664 | 0.972 | 0.595 | 0.392 | 0.473
DoSDisruptive | 0.966 | 0.287 | 0.026 | 0.047 | 0.927 | 0.193 | 0.443 | 0.269
GridSybil | 0.966 | 0.850 | 0.588 | 0.695 | 0.953 | 0.476 | 0.705 | 0.568
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.956 | 0.086 | 0.480 | 0.146
DoSRandomSybil | 0.971 | 0.490 | 0.053 | 0.096 | 0.971 | 0.501 | 0.663 | 0.571
DoSDisruptiveSybil | 0.969 | 0.119 | 0.006 | 0.011 | 0.961 | 0.370 | 0.491 | 0.422
Table A3. Per-class performance of GRU using basic features.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Legitimate | 0.774 | 0.726 | 0.993 | 0.838 | 0.461 | 0.919 | 0.192 | 0.317
ConstPos | 0.989 | 0.747 | 0.366 | 0.492 | 0.989 | 0.658 | 0.941 | 0.774
ConstPosOffset | 0.988 | 0.849 | 0.137 | 0.236 | 0.982 | 0.388 | 0.748 | 0.511
RandomPos | 0.987 | 0.748 | 0.203 | 0.319 | 0.979 | 0.252 | 0.686 | 0.369
RandomPosOffset | 0.989 | 0.889 | 0.007 | 0.015 | 0.907 | 0.044 | 0.520 | 0.081
ConstSpeed | 0.994 | 0.739 | 0.709 | 0.724 | 0.999 | 0.940 | 0.954 | 0.947
ConstSpeedOffset | 0.991 | 0.633 | 0.542 | 0.584 | 0.994 | 0.559 | 0.827 | 0.667
RandomSpeed | 0.992 | 0.674 | 0.606 | 0.638 | 0.996 | 0.769 | 0.896 | 0.827
RandomSpeedOffset | 0.990 | 0.677 | 0.342 | 0.454 | 0.975 | 0.243 | 0.758 | 0.368
EventualStop | 0.991 | 0.801 | 0.200 | 0.320 | 0.950 | 0.122 | 0.844 | 0.213
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.910 | 0.064 | 0.466 | 0.113
DataReplay | 0.989 | 0.000 | 0.000 | 0.000 | 0.920 | 0.065 | 0.458 | 0.114
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.198 | 0.874 | 0.322
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.942 | 0.338 | 0.492 | 0.400
DoSRandom | 0.955 | 0.499 | 0.932 | 0.650 | 0.972 | 0.554 | 0.678 | 0.610
DoSDisruptive | 0.967 | 0.347 | 0.005 | 0.011 | 0.912 | 0.116 | 0.289 | 0.166
GridSybil | 0.952 | 0.714 | 0.456 | 0.556 | 0.950 | 0.455 | 0.707 | 0.553
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.948 | 0.069 | 0.453 | 0.120
DoSRandomSybil | 0.971 | 0.579 | 0.008 | 0.016 | 0.972 | 0.535 | 0.355 | 0.427
DoSDisruptiveSybil | 0.968 | 0.087 | 0.006 | 0.010 | 0.957 | 0.315 | 0.412 | 0.357
Table A4. Per-class performance of LSTM using basic features by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Legitimate | 0.778 | 0.728 | 0.997 | 0.842 | 0.445 | 0.939 | 0.160 | 0.273
ConstPos | 0.990 | 0.862 | 0.401 | 0.547 | 0.993 | 0.746 | 0.946 | 0.834
ConstPosOffset | 0.987 | 0.959 | 0.054 | 0.102 | 0.981 | 0.385 | 0.772 | 0.514
RandomPos | 0.988 | 0.750 | 0.339 | 0.467 | 0.978 | 0.253 | 0.756 | 0.379
RandomPosOffset | 0.989 | 0.846 | 0.010 | 0.020 | 0.930 | 0.059 | 0.533 | 0.107
ConstSpeed | 0.993 | 0.738 | 0.628 | 0.679 | 0.998 | 0.904 | 0.931 | 0.917
ConstSpeedOffset | 0.991 | 0.652 | 0.497 | 0.564 | 0.986 | 0.322 | 0.747 | 0.450
RandomSpeed | 0.991 | 0.676 | 0.571 | 0.619 | 0.995 | 0.750 | 0.835 | 0.790
RandomSpeedOffset | 0.991 | 0.803 | 0.300 | 0.437 | 0.973 | 0.213 | 0.692 | 0.326
EventualStop | 0.991 | 0.764 | 0.279 | 0.408 | 0.964 | 0.166 | 0.870 | 0.279
Disruptive | 0.986 | 0.000 | 0.000 | 0.000 | 0.891 | 0.055 | 0.483 | 0.098
DataReplay | 0.989 | 0.000 | 0.000 | 0.000 | 0.891 | 0.048 | 0.467 | 0.087
StaleMessages | 0.995 | 0.957 | 0.614 | 0.748 | 0.957 | 0.196 | 0.874 | 0.320
DoS | 0.954 | 0.000 | 0.000 | 0.000 | 0.936 | 0.314 | 0.540 | 0.397
DoSRandom | 0.953 | 0.486 | 0.941 | 0.641 | 0.967 | 0.500 | 0.376 | 0.429
DoSDisruptive | 0.967 | 0.120 | 0.003 | 0.006 | 0.924 | 0.147 | 0.313 | 0.200
GridSybil | 0.962 | 0.835 | 0.518 | 0.639 | 0.945 | 0.420 | 0.688 | 0.521
DataReplaySybil | 0.990 | 0.000 | 0.000 | 0.000 | 0.942 | 0.061 | 0.440 | 0.106
DoSRandomSybil | 0.971 | 0.527 | 0.011 | 0.021 | 0.968 | 0.462 | 0.520 | 0.489
DoSDisruptiveSybil | 0.969 | 0.082 | 0.004 | 0.008 | 0.965 | 0.411 | 0.498 | 0.451
Table A5. Per-class performance of Bi-LSTM using optimal 14-feature set by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Legitimate | 0.953 | 0.928 | 0.999 | 0.962 | 0.869 | 0.995 | 0.803 | 0.889
ConstPos | 0.997 | 0.850 | 0.992 | 0.916 | 0.999 | 0.979 | 0.995 | 0.987
ConstPosOffset | 0.987 | 1.000 | 0.075 | 0.139 | 0.988 | 0.519 | 0.984 | 0.680
RandomPos | 0.995 | 0.828 | 0.862 | 0.845 | 0.997 | 0.809 | 0.884 | 0.844
RandomPosOffset | 0.989 | 0.917 | 0.010 | 0.020 | 0.965 | 0.171 | 0.907 | 0.288
ConstSpeed | 0.997 | 0.845 | 0.904 | 0.873 | 0.999 | 0.984 | 0.954 | 0.969
ConstSpeedOffset | 0.994 | 0.753 | 0.761 | 0.757 | 0.993 | 0.534 | 0.947 | 0.683
RandomSpeed | 0.995 | 0.840 | 0.740 | 0.787 | 0.997 | 0.789 | 0.974 | 0.872
RandomSpeedOffset | 0.993 | 0.901 | 0.441 | 0.593 | 0.965 | 0.193 | 0.835 | 0.314
EventualStop | 0.996 | 0.900 | 0.762 | 0.826 | 0.989 | 0.417 | 0.974 | 0.584
Disruptive | 0.990 | 0.651 | 0.565 | 0.605 | 0.991 | 0.575 | 0.941 | 0.714
DataReplay | 0.990 | 0.601 | 0.275 | 0.377 | 0.981 | 0.361 | 0.888 | 0.514
StaleMessages | 0.997 | 0.961 | 0.831 | 0.891 | 0.998 | 0.870 | 0.964 | 0.915
DoS | 0.996 | 0.932 | 0.982 | 0.957 | 0.998 | 0.976 | 0.981 | 0.979
DoSRandom | 0.987 | 0.879 | 0.820 | 0.848 | 0.991 | 0.880 | 0.849 | 0.864
DoSDisruptive | 0.985 | 0.797 | 0.741 | 0.768 | 0.997 | 0.968 | 0.942 | 0.955
GridSybil | 0.993 | 0.966 | 0.919 | 0.942 | 0.998 | 0.993 | 0.969 | 0.981
DataReplaySybil | 0.991 | 0.634 | 0.243 | 0.351 | 0.993 | 0.526 | 0.800 | 0.635
DoSRandomSybil | 0.988 | 0.725 | 0.931 | 0.815 | 0.991 | 0.827 | 0.875 | 0.850
DoSDisruptiveSybil | 0.985 | 0.777 | 0.699 | 0.736 | 0.997 | 0.930 | 0.957 | 0.943
Table A6. Per-class performance of GRU using optimal 14-feature set by attack type.
Attack Type | Acc (No SMOTE) | Prec (No SMOTE) | Rec (No SMOTE) | F1 (No SMOTE) | Acc (SMOTE) | Prec (SMOTE) | Rec (SMOTE) | F1 (SMOTE)
Legitimate | 0.948 | 0.919 | 1.000 | 0.958 | 0.766 | 0.991 | 0.647 | 0.783
ConstPos | 0.997 | 0.843 | 0.966 | 0.900 | 0.999 | 0.953 | 0.984 | 0.968
ConstPosOffset | 0.989 | 0.990 | 0.221 | 0.361 | 0.961 | 0.247 | 0.984 | 0.395
RandomPos | 0.994 | 0.794 | 0.852 | 0.822 | 0.997 | 0.812 | 0.907 | 0.857
RandomPosOffset | 0.989 | 0.000 | 0.000 | 0.000 | 0.931 | 0.095 | 0.907 | 0.172
ConstSpeed | 0.996 | 0.765 | 0.938 | 0.843 | 1.000 | 0.977 | 0.992 | 0.985
ConstSpeedOffset | 0.994 | 0.780 | 0.635 | 0.700 | 0.994 | 0.554 | 0.893 | 0.684
RandomSpeed | 0.995 | 0.883 | 0.636 | 0.740 | 0.997 | 0.800 | 0.974 | 0.878
RandomSpeedOffset | 0.990 | 0.872 | 0.152 | 0.259 | 0.955 | 0.151 | 0.802 | 0.254
EventualStop | 0.996 | 0.897 | 0.750 | 0.817 | 0.974 | 0.230 | 0.948 | 0.371
Disruptive | 0.989 | 0.652 | 0.468 | 0.545 | 0.986 | 0.451 | 0.780 | 0.571
DataReplay | 0.990 | 0.638 | 0.092 | 0.160 | 0.974 | 0.278 | 0.813 | 0.414
StaleMessages | 0.998 | 0.955 | 0.877 | 0.914 | 0.995 | 0.728 | 0.964 | 0.829
DoS | 0.993 | 0.877 | 0.979 | 0.925 | 0.996 | 0.947 | 0.955 | 0.951
DoSRandom | 0.985 | 0.829 | 0.834 | 0.832 | 0.991 | 0.866 | 0.855 | 0.861
DoSDisruptive | 0.977 | 0.639 | 0.698 | 0.667 | 0.992 | 0.863 | 0.890 | 0.876
GridSybil | 0.988 | 0.936 | 0.877 | 0.905 | 0.996 | 0.982 | 0.914 | 0.947
DataReplaySybil | 0.990 | 0.455 | 0.303 | 0.364 | 0.985 | 0.318 | 0.813 | 0.457
DoSRandomSybil | 0.987 | 0.724 | 0.883 | 0.796 | 0.991 | 0.842 | 0.857 | 0.849
DoSDisruptiveSybil | 0.978 | 0.736 | 0.438 | 0.549 | 0.994 | 0.894 | 0.910 | 0.902
Table A7. Per-class performance of LSTM using optimal 14-feature set by attack type.

| Attack Type (LSTM Architecture) | No SMOTE Acc | No SMOTE Prec | No SMOTE Rec | No SMOTE F1 | SMOTE Acc | SMOTE Prec | SMOTE Rec | SMOTE F1 |
|---|---|---|---|---|---|---|---|---|
| Legitimate | 0.944 | 0.914 | 1.000 | 0.955 | 0.766 | 0.993 | 0.647 | 0.783 |
| ConstPos | 0.995 | 0.755 | 0.953 | 0.843 | 0.998 | 0.943 | 0.978 | 0.960 |
| ConstPosOffset | 0.986 | 0.000 | 0.000 | 0.000 | 0.967 | 0.276 | 0.943 | 0.427 |
| RandomPos | 0.993 | 0.783 | 0.789 | 0.786 | 0.997 | 0.806 | 0.919 | 0.859 |
| RandomPosOffset | 0.989 | 1.000 | 0.006 | 0.011 | 0.930 | 0.095 | 0.933 | 0.173 |
| ConstSpeed | 0.994 | 0.749 | 0.778 | 0.763 | 0.999 | 0.955 | 0.962 | 0.958 |
| ConstSpeedOffset | 0.993 | 0.714 | 0.645 | 0.678 | 0.989 | 0.422 | 0.973 | 0.589 |
| RandomSpeed | 0.993 | 0.763 | 0.641 | 0.696 | 0.996 | 0.774 | 0.922 | 0.841 |
| RandomSpeedOffset | 0.990 | 0.879 | 0.208 | 0.336 | 0.950 | 0.140 | 0.835 | 0.240 |
| EventualStop | 0.995 | 0.869 | 0.673 | 0.759 | 0.978 | 0.261 | 0.948 | 0.409 |
| Disruptive | 0.988 | 0.568 | 0.517 | 0.542 | 0.987 | 0.490 | 0.839 | 0.619 |
| DataReplay | 0.989 | 0.548 | 0.105 | 0.176 | 0.972 | 0.244 | 0.710 | 0.363 |
| StaleMessages | 0.997 | 0.964 | 0.763 | 0.852 | 0.997 | 0.815 | 0.991 | 0.894 |
| DoS | 0.992 | 0.870 | 0.975 | 0.920 | 0.997 | 0.958 | 0.965 | 0.961 |
| DoSRandom | 0.972 | 0.681 | 0.706 | 0.693 | 0.992 | 0.876 | 0.884 | 0.880 |
| DoSDisruptive | 0.977 | 0.668 | 0.598 | 0.631 | 0.994 | 0.907 | 0.907 | 0.907 |
| GridSybil | 0.987 | 0.935 | 0.859 | 0.895 | 0.998 | 0.978 | 0.966 | 0.972 |
| DataReplaySybil | 0.990 | 0.484 | 0.163 | 0.244 | 0.984 | 0.305 | 0.773 | 0.438 |
| DoSRandomSybil | 0.975 | 0.549 | 0.648 | 0.594 | 0.992 | 0.875 | 0.853 | 0.864 |
| DoSDisruptiveSybil | 0.977 | 0.644 | 0.533 | 0.583 | 0.995 | 0.912 | 0.932 | 0.922 |
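Several rows in the per-class tables pair a per-class accuracy near 0.99 with very low recall and F1 (for example, ConstPosOffset for LSTM without SMOTE: accuracy 0.986, precision, recall, and F1 all 0.000). That pattern is what one would expect if each class is scored one-vs-rest, so that the many correctly rejected samples of other classes dominate accuracy. The sketch below illustrates this under that assumption; the scoring convention, the function name `one_vs_rest_metrics`, and the toy labels are illustrative and are not taken from the paper.

```python
def one_vs_rest_metrics(y_true, y_pred, cls):
    """Accuracy, precision, recall and F1 with `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    tn = len(pairs) - tp - fp - fn

    acc = (tp + tn) / len(pairs)
    # Guard against empty denominators: report 0.0 when undefined.
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return acc, prec, rec, f1


# Toy data (hypothetical): 1000 messages, 10 from a rare attack class,
# and a classifier that never predicts that class.
y_true = ["Legitimate"] * 990 + ["RareAttack"] * 10
y_pred = ["Legitimate"] * 1000

acc, prec, rec, f1 = one_vs_rest_metrics(y_true, y_pred, "RareAttack")
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```

In this toy case the rare class is never detected, yet its one-vs-rest accuracy is still 0.990, which is why recall and F1 are the more informative columns for minority attack types.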
Table A8. Overall performance comparison on optimal 14-feature set (Basic kinematic + Engineered features).

| Model | Balancing | Acc. | F1 | Balanced Accuracy | Cohen’s κ | ROC-AUC |
|---|---|---|---|---|---|---|
| LSTM | No SMOTE | 0.8896 | 0.8798 | 0.6915 | 0.8243 | 0.9793 |
| GRU | No SMOTE | 0.8937 | 0.8837 | 0.6876 | 0.8305 | 0.9806 |
| Bi-LSTM | No SMOTE | 0.9180 | 0.9093 | 0.7391 | 0.8673 | 0.9853 |
| LSTM | SMOTE | 0.7404 | 0.7918 | 0.8941 | 0.6351 | 0.9890 |
| GRU | SMOTE | 0.7370 | 0.7881 | 0.8895 | 0.6299 | 0.9895 |
| Bi-LSTM | SMOTE | 0.8485 | 0.8755 | 0.9210 | 0.7655 | 0.9955 |
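Table A8 shows overall accuracy and Cohen's κ falling under SMOTE while balanced accuracy rises. The following minimal sketch shows how these three metrics are conventionally computed from a confusion matrix and why they can move in opposite directions. The helper `summary_from_confusion` and both two-class confusion matrices are hypothetical illustrations, not the authors' evaluation code or data.

```python
def summary_from_confusion(cm):
    """cm[i][j] = count of samples with true class i predicted as class j.

    Returns (accuracy, balanced_accuracy, cohen_kappa).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                        # true counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted counts

    accuracy = sum(cm[i][i] for i in range(k)) / n
    # Balanced accuracy: unweighted mean of per-class recall.
    recalls = [cm[i][i] / row_tot[i] for i in range(k) if row_tot[i] > 0]
    balanced_accuracy = sum(recalls) / len(recalls)
    # Cohen's kappa: agreement beyond what the marginals give by chance.
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, balanced_accuracy, kappa


# Hypothetical 2-class data: 900 legitimate, 100 attack samples.
cm_majority_biased = [[900, 0], [60, 40]]    # few false alarms, many misses
cm_rebalanced = [[780, 120], [10, 90]]       # more false alarms, few misses

acc_a, bal_a, kappa_a = summary_from_confusion(cm_majority_biased)
acc_b, bal_b, kappa_b = summary_from_confusion(cm_rebalanced)
print(f"majority-biased: acc={acc_a:.3f} bal_acc={bal_a:.3f} kappa={kappa_a:.3f}")
print(f"rebalanced:      acc={acc_b:.3f} bal_acc={bal_b:.3f} kappa={kappa_b:.3f}")
```

In the toy example the rebalanced classifier recovers more of the minority class, which raises balanced accuracy, but the extra false alarms on the large majority class lower overall accuracy and κ. This is the same direction of change as in Table A8. The sketch does not handle the degenerate case where chance agreement equals 1.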
Figure A1. Comprehensive performance and efficiency analysis across LSTM, GRU, and Bi-LSTM models. Top row (a–d): classification metrics showing that Bi-LSTM performs best, with 91.80% accuracy and 90.93% F1-score without SMOTE, while SMOTE improves balanced accuracy from 73.91% to 92.10% at the cost of overall accuracy. Bottom row (e–h): deployment-critical metrics showing that (e) Bi-LSTM requires 45.1 min of training and 267 epochs to converge, (f) it uses 24.8 MB of memory with 33.6 K parameters, (g) all models maintain a consistent 36 ms mean latency with a 46 ms P99, and (h) Bi-LSTM achieves a throughput of 68.83 K samples/second with the lowest missed attack rate of 8.98%. The analysis indicates that Bi-LSTM’s superior detection performance (a 2.84 percentage-point accuracy improvement over LSTM and an 18.7% reduction in missed attacks) justifies its slightly higher computational requirements, making it the preferred choice among the three models for safety-critical VANET deployment.
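The accuracy figures quoted in the caption can be cross-checked against Table A8. The short sketch below recomputes the gaps in percentage points. Reading the caption's 2.84% as the Bi-LSTM versus LSTM gap without SMOTE is an inference from the table values, and the names `TABLE_A8` and `pp_diff` are illustrative.

```python
# Values copied from Table A8, expressed as fractions.
TABLE_A8 = {
    ("LSTM", "No SMOTE"): {"acc": 0.8896, "bal_acc": 0.6915},
    ("GRU", "No SMOTE"): {"acc": 0.8937, "bal_acc": 0.6876},
    ("Bi-LSTM", "No SMOTE"): {"acc": 0.9180, "bal_acc": 0.7391},
    ("Bi-LSTM", "SMOTE"): {"acc": 0.8485, "bal_acc": 0.9210},
}


def pp_diff(a, b):
    """Difference a - b in percentage points, for fractions a and b."""
    return round((a - b) * 100, 2)


bi = TABLE_A8[("Bi-LSTM", "No SMOTE")]
bi_smote = TABLE_A8[("Bi-LSTM", "SMOTE")]

gain_vs_lstm = pp_diff(bi["acc"], TABLE_A8[("LSTM", "No SMOTE")]["acc"])
gain_vs_gru = pp_diff(bi["acc"], TABLE_A8[("GRU", "No SMOTE")]["acc"])
smote_bal_gain = pp_diff(bi_smote["bal_acc"], bi["bal_acc"])
smote_acc_cost = pp_diff(bi["acc"], bi_smote["acc"])

print("Bi-LSTM vs LSTM accuracy gap (pp):", gain_vs_lstm)
print("Bi-LSTM vs GRU accuracy gap (pp):", gain_vs_gru)
print("SMOTE balanced-accuracy gain for Bi-LSTM (pp):", smote_bal_gain)
print("SMOTE accuracy cost for Bi-LSTM (pp):", smote_acc_cost)
```

With the Table A8 values, the Bi-LSTM accuracy gap is 2.84 points over LSTM and 2.43 points over GRU, and SMOTE trades 6.95 points of Bi-LSTM accuracy for 18.19 points of balanced accuracy.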

