1. Introduction
Traditional mechanical vehicles enabled faster and more efficient transportation in earlier decades. However, smart cities have emerged with the continuous growth and development of cities and their infrastructures. This concept features interconnected smart infrastructure that collects and shares data among various services to enable intelligent and appropriate operational decisions [
1]. A new generation of vehicles has emerged, namely Connected and Autonomous Vehicles (CAVs), offering enhanced comfort for passengers. These vehicles rely on vehicular networks (VANETs) to make optimal driving decisions by analyzing data from onboard sensors and integrating information wirelessly exchanged with other connected vehicles [
2,
3].
Despite these advantages, the reliance of CAVs on constant communication makes them prime targets for cyber-attacks. Vulnerabilities in communication interfaces may be exploited to infiltrate control systems or manipulate sensor data, potentially leading to accidents or loss of life. As a result, robust cybersecurity measures have become essential to safeguard vehicular environments.
Modern vehicles rely on Electronic Control Units (ECUs), sensors, and communication protocols such as FlexRay, LIN, and the widely used Controller Area Network (CAN) bus [
4,
5,
6,
7]. While CAN is favored for its real-time responsiveness and cost-effectiveness, it lacks native security features such as confidentiality, authenticity, and access control. This absence makes vehicles increasingly vulnerable to intrusions at both the in-vehicle and network communication levels [
8,
9].
To counter these threats, two main defense mechanisms are commonly explored: cryptographic techniques and intrusion detection systems (IDS). While cryptography offers strong protection, it imposes high computational overhead. IDS, on the other hand, provide lightweight, real-time monitoring and early detection of suspicious activity. However, traditional IDS approaches (e.g., signature-based methods) are limited in detecting novel and evolving attacks [
10,
11,
12].
Ensemble learning has recently emerged as a highly promising approach to enhance the accuracy and robustness of intrusion detection systems [13,14]. Yet, its application in CAV environments is still constrained by strict real-time requirements and resource limitations.
This study addresses these challenges by proposing a novel hybrid ensemble intrusion detection system tailored for CAN-based automotive networks. The system integrates complementary machine learning models with advanced optimization and adaptive mechanisms to achieve high detection accuracy, low false positives, and efficient resource usage.
The contributions of this paper are summarized as follows:
- 1.
Development of a hybrid ensemble intrusion detection system that achieves 99.995% accuracy with an extremely low false positive rate of 0.00001%.
- 2.
Introduction of an adaptive optimization framework that dynamically balances detection accuracy with resource constraints, enabling suitability for real-time deployment in vehicles.
- 3.
Demonstration of the system’s robustness against multiple attack types using large-scale real-world automotive datasets, showing its potential scalability to modern vehicular environments.
The remainder of this paper is organized as follows:
Section 2 reviews recent advances in intrusion detection for vehicular networks, highlighting both strengths and limitations that motivate our hybrid approach.
Section 3 describes the proposed framework.
Section 4 presents the experimental results and discussion. Finally,
Section 5 concludes the paper and outlines future research directions.
2. Related Work
This section reviews key contributions from prior studies on intrusion detection techniques to enhance automotive cybersecurity. Existing solutions can broadly be categorized into two main groups: cryptographic systems and machine learning based IDS. Among the ML-based approaches, detection methods are typically classified into three categories: signature-based, anomaly-based, and specification-based. Anomaly-based detection has emerged as the most prevalent and widely adopted method for securing vehicle applications.
Modern vehicles are equipped with hundreds of sensors that continuously generate vast volumes of data, which may also include inputs from other vehicles and infrastructure components. ML-based IDS are particularly favored because of their ability to process and analyze this high-dimensional data to differentiate between normal and abnormal behavior, enabling the detection and prediction of attacks. Machine learning models have shown the ability to detect even previously unidentified (zero-day) attacks [
15].
To further illustrate the advancements in this area, Abdallah, et al. [
9] conducted a comprehensive survey of machine learning algorithms applied to intrusion detection in connected and autonomous vehicles. Their work provides a detailed taxonomy of ML-based IDS and underscores the potential of machine learning to strengthen the cybersecurity of future automotive systems. The taxonomy they proposed is presented in
Table 1. They categorize machine learning techniques into two main types: supervised and unsupervised learning. Supervised learning relies on labeled datasets for training and evaluation, while unsupervised learning does not. This paper focuses on supervised machine learning approaches for IDS. Based on their evaluation using four widely recognized datasets, Abdallah, et al. [9] demonstrate that supervised learning algorithms offer robust and promising classification performance.
Recent discussions in the field have focused on leveraging supervised machine-learning techniques to automate intrusion detection in vehicular networks. Kalkan and Sahingoz [
16] explored this approach using an automotive hacking dataset. Their study was structured into two models: the first involved merging multiple sub-datasets including DoS, fuzzy, spoofing RPM, and spoofing gear into a single dataset comprising 16.5 million instances, while the second model processed each sub-dataset individually to extract detailed insights. They employed a range of classification algorithms, including Logistic Regression, Naive Bayes, Adaptive Boosting, Random Forest, Bagging Tree, and Artificial Neural Networks. Among these, the Decision Tree algorithm demonstrated the best performance in the combined dataset model, achieving a classification accuracy of 97%.
The authors of [16] also used two datasets in different ways. Because the two datasets have a similar structure, the same processing steps were applied to both: appropriate headers were first added to each dataset, and unnecessary columns, such as timestamps, were removed. Normal messages were then labeled 1 and injected messages 0. The authors used the following classification algorithms in their proposed model: Support Vector Machine and K-Nearest Neighbor. K-Nearest Neighbor performed best, with an accuracy of 93.5%, followed by Support Vector Machine, with an accuracy of 93.3%.
Song, et al. [
17] proposed using Convolutional Neural Networks (CNNs) to detect DoS and spoofing attacks. The model was only tested offline. Their proposal is based on directly feeding CAN traffic without any preprocessing. The model, however, cannot be used in existing vehicles because, as its authors stated, it is difficult to use over the Internet.
Alfardus and Rawat [
25] detected various types of cyberattacks, including Denial of Service (DoS) attacks, fuzzy attacks, and spoofing, utilizing the CAR hacking dataset. The algorithms employed for classification within each subset of the dataset included K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). Similarly, Amato, et al. [
26] proposed a classification model based on Neural Networks (NN) and Multilayer Perceptron (MLP) for attack detection using the same dataset. Their findings indicated that MLP outperformed NN, particularly when more than three hidden layers were incorporated into the model. The MLP technique yielded the highest classification accuracy, achieving 93.3%.
Recent research proposed by Aloqaily, et al. [
24] has focused on enhancing the security of autonomous and connected vehicles by developing effective intrusion detection systems using machine learning techniques. Seven supervised learning algorithms, including Decision Trees, Random Forests, Naive Bayes, Logistic Regression, XGBoost, LightGBM, and Multi-Layer Perceptrons, were applied to real automotive network datasets containing various cyberattacks, including spoofing, DoS, and fuzzy attacks. The primary objective was to accurately classify malicious and benign messages within in-vehicle communications. Among the evaluated models, Random Forest and LightGBM delivered the best results, achieving 99.9% accuracy while maintaining low computational overhead. These findings highlight the effectiveness of ensemble and gradient-boosting techniques in securing vehicle communication systems against complex and diverse intrusion scenarios.
Importantly, intrusion detection in VANETs is especially critical in safety-sensitive applications such as vehicle platooning and motion planning at unsignalized intersections, where compromised messages can cause cascading failures or collisions. Recent studies have emphasized the need for robust security in these scenarios [
27,
28], reinforcing the importance of developing real-time intrusion detection systems tailored to autonomous vehicle environments.
In summary, prior studies demonstrate the growing adoption of supervised ML techniques for intrusion detection in vehicular networks, with promising performance across multiple datasets. However, challenges remain regarding real-time applicability, false positive reduction, and adaptation to resource-constrained in-vehicle environments. These limitations motivate the hybrid ensemble framework proposed in this paper.
3. Proposed Approach
The Controller Area Network (CAN) is the fundamental communication protocol in modern vehicles. It is a message-based protocol that supports real-time communication between Electronic Control Units (ECUs). CAN is a multipurpose serial bus standard, originally designed by Robert Bosch GmbH in 1983, that supports a multi-master approach in which nodes transmit using a broadcast mechanism. In this architecture, all nodes receive all messages but process only the relevant ones, based on message identifiers. Although this design works well in real-time control systems, it has serious security vulnerabilities. Messages are unauthenticated: any node can transmit messages with any identifier, and every node sees every message. In addition, there is no built-in encryption, so messages are transmitted in plaintext, and the limited bandwidth of 1 Mbps or less makes the network especially susceptible to DoS attacks.
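The broadcast behavior described above can be illustrated with a minimal sketch. It is a simplified model rather than a CAN implementation: the class names (`CanFrame`, `Node`, `Bus`) and the identifier `0x316` are hypothetical and chosen only for illustration. The point is that a frame carries no sender field or authentication tag, so a receiver that filters by identifier cannot tell a legitimate frame from an injected one.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class CanFrame:
    """Simplified CAN data frame: an identifier and up to 8 payload bytes.

    There is deliberately no sender field and no authentication tag.
    """
    can_id: int
    data: bytes


@dataclass
class Node:
    """Hypothetical ECU that accepts only the identifiers it is interested in."""
    name: str
    accepted_ids: Set[int]
    inbox: List[CanFrame] = field(default_factory=list)

    def on_frame(self, frame: CanFrame) -> None:
        # Acceptance filtering looks at the identifier only.
        if frame.can_id in self.accepted_ids:
            self.inbox.append(frame)


class Bus:
    """Broadcast medium: every attached node is offered every frame."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def attach(self, node: Node) -> None:
        self.nodes.append(node)

    def transmit(self, frame: CanFrame) -> None:
        for node in self.nodes:
            node.on_frame(frame)


bus = Bus()
dashboard = Node("dashboard", accepted_ids={0x316})
attacker = Node("attacker", accepted_ids=set())
bus.attach(dashboard)
bus.attach(attacker)

# One legitimate frame and one spoofed frame that reuses the same identifier.
bus.transmit(CanFrame(0x316, bytes([0x05, 0x20])))
bus.transmit(CanFrame(0x316, bytes([0xFF, 0xFF])))  # could come from any node
```

In this sketch the dashboard accepts both frames, which is why detection has to rely on traffic behavior rather than on message origin.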
This research introduces a novel hybrid intrusion detection approach for CAN bus networks. At its core is an adaptive multi-model framework that merges complementary machine learning models such as Random Forest, XGBoost, and LightGBM. Coupled with automated feature selection tailored to CAN bus traffic patterns, this framework implements dynamic weight adjustment based on real-time performance metrics. A hybrid Particle Swarm-Genetic Algorithm architecture provides real-time optimization, continuously tuning hyperparameters under computational constraints while ensuring resource-aware model selection and adaptation.
The proposed system incorporates three key innovations:
- 1.
An adaptive ensemble learning framework that integrates multiple complementary machine learning models:
Random Forest for robust pattern recognition
XGBoost for handling imbalanced attack classes
LightGBM for efficient processing of high-dimensional features
- 2.
A sophisticated optimization layer utilizing Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) to:
Control model hyperparameters dynamically
Select features for real-time processing
Balance detection accuracy against computational efficiency
- 3.
A dynamic adaptation mechanism that:
Tunes model weights based on performance feedback
Implements a sliding window approach for continuous learning
Maintains system responsiveness under varying computational loads
These innovations address critical gaps in existing intrusion detection systems. In the experiments reported in Section 4, the system outperforms the compared solutions, achieving a detection accuracy of 99.995% with minimal false positive rates. It operates within the real-time constraints of automotive networks while adapting to novel attack patterns through continuous learning mechanisms. Additionally, the system optimizes resource utilization for deployment on resource-constrained automotive ECUs, making it practically viable for real-world implementation.
The presented comprehensive approach is a substantial step beyond classical solutions based on a single model. The particular strength of this architecture is the retention of high detection levels that satisfy the challenging performance requirements of automotive systems. The approach’s adaptability guarantees robust performance under various operational conditions, and its resource-sensitive characteristics are compatible with deployment to different automotive platforms. With this integrated approach, the system provides a robust defense mechanism against known and emerging security threats in connected vehicle networks.
3.1. Preprocessing and Data Acquisition
The database provided by the Hacking and Countermeasures Research Laboratory (HCRL) was used [
29]. This dataset encompasses four distinct types of attacks on the YF Sonata: fuzzy, DoS, RPM spoofing, and replay attacks. The data were collected via an OBD-II connector and provide a real-world view of normal vehicle operation and of some malicious activities.
To ensure data quality and compatibility with the machine learning models, a rigorous preprocessing pipeline was implemented:
- 1.
Missing Value Handling: A multiple imputation technique was applied to identify and correct missing data in key columns (DATA3–DATA7 and attack_type), while preserving the statistical properties of the dataset.
- 2.
Data Type Conversion: To standardize and permit subsequent analysis, hexadecimal values were converted to decimal using a custom-built parser.
- 3.
Categorical Variable Encoding: A novel hierarchical encoding scheme was employed for categorical variables, taking into account the inherent relationships between different types of attacks.
- 4.
Feature Selection: A combination of correlation analysis and mutual information criteria was applied to select the ten most informative attributes, thereby reducing model complexity while maintaining performance.
- 5.
Data Partitioning: The dataset was stratified and divided into training (70%), validation (15%), and test (15%) subsets, ensuring that attack types were represented in all partitions.
- 6.
Normalization: Normalization factors were minimized at each successive feature and dynamically adjusted according to the statistical properties of the preceding feature.
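Two of the preprocessing steps above, hexadecimal-to-decimal conversion and the stratified 70/15/15 partition, can be sketched as follows. This is a minimal illustration using only the standard library, not the custom parser used in this work; the function names, the fixed seed, and the toy label list are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def parse_hex_bytes(fields: Sequence[str]) -> List[int]:
    """Convert hexadecimal byte strings such as '0f' to decimal integers."""
    return [int(value, 16) for value in fields]


def stratified_split(
    labels: Sequence[str],
    fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Return row indices for train/val/test, splitting each class separately."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        parts["train"] += indices[:n_train]
        parts["val"] += indices[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to the test subset.
        parts["test"] += indices[n_train + n_val:]
    return parts


# Toy example: 80 normal messages and 20 DoS messages (hypothetical counts).
labels = ["normal"] * 80 + ["dos"] * 20
split = stratified_split(labels)
payload = parse_hex_bytes(["00", "0f", "ff"])
```

Splitting per class keeps the attack proportion the same in every subset, which is the property the partitioning step relies on.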
To provide greater clarity regarding the datasets used, we included
Table 2, which summarizes class distribution, dataset sources, and preprocessing operations applied prior to model training and evaluation. Preprocessing included handling missing values, converting data types, and encoding categorical variables. Additionally, class imbalance was addressed through stratified sampling during cross-validation and class weighting in the loss function. This ensured balanced representation across classes during training and improved the robustness of model evaluation. We employed 10-fold cross-validation throughout all experiments to validate the performance metrics and ensure generalizability. Together, these datasets encompass a diverse range of traffic patterns and attack scenarios, making the evaluation representative of real-world vehicular environments.
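The class weighting and stratified cross-validation mentioned above can be sketched as follows. The inverse-frequency weighting formula (total samples divided by the number of classes times the class count) is a common convention and an assumption here; the text does not specify the exact weighting used. The function names and the toy label counts are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def stratified_fold_ids(labels: Sequence[str], k: int = 10) -> List[int]:
    """Assign each sample to one of k folds, round-robin within its class."""
    seen: Counter = Counter()
    fold_ids = []
    for label in labels:
        fold_ids.append(seen[label] % k)
        seen[label] += 1
    return fold_ids


# Toy example: 90 normal messages and 10 attack messages (hypothetical counts).
labels = ["normal"] * 90 + ["attack"] * 10
weights = balanced_class_weights(labels)
folds = stratified_fold_ids(labels, k=10)
```

With these counts the minority class receives a weight of 5.0 and the majority class a weight of about 0.56, and each of the ten folds contains nine normal samples and one attack sample.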
3.2. Advanced Feature Engineering
To enhance the discriminative power of the models, a sophisticated feature engineering process was developed, consisting of the following components.
Evolutionary Feature Selection: Xue, et al. [30] described a genetic algorithm-based feature selection method. This approach evolves the feature subset dynamically, adapting to new attack patterns as they appear. The fitness function combines classification performance with feature set complexity to steer the search toward a favorable trade-off between the two.
Deep Feature Extraction: A customized autoencoder architecture was designed by leveraging recent advances in representation learning. This unsupervised deep learning model captures complex, non-linear relationships in CAN bus data, which may reveal hidden indicators of malicious behaviors.
Temporal Pattern Analysis: A novel sliding window approach was introduced to capture temporal dependencies in the CAN message sequence. Once fed with time series, this technique computes various statistical measures (e.g., entropy, Lyapunov exponent) over different time scales, allowing the discovery of time-based anomalies.
Frequency Domain Transformation: A Fast Fourier Transform (FFT) was applied to the CAN message data, guided by signal processing principles, to identify periodic patterns that may correspond to specific attack types.
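The sliding window idea from the temporal pattern analysis can be sketched with one of the named statistics, Shannon entropy, computed over windows of CAN identifiers. The window length, the step, and the toy trace are assumptions; the Lyapunov exponent and the FFT features are not covered by this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(symbols: Sequence[int]) -> float:
    """Shannon entropy in bits of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(ids: Sequence[int], window: int, step: int = 1) -> List[float]:
    """Entropy of the identifier distribution in each full window."""
    return [
        shannon_entropy(ids[start:start + window])
        for start in range(0, len(ids) - window + 1, step)
    ]


# Toy trace: four identifiers cycling normally, then a flood of one identifier.
normal = [0x100, 0x200, 0x300, 0x400] * 4
flood = [0x000] * 16
entropies = sliding_entropy(normal + flood, window=8, step=8)
```

In this toy trace the entropy is 2 bits while the four identifiers alternate and drops to zero during the flood, which is the kind of time-based anomaly the windowed statistics are meant to expose.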
3.3. Hybrid Model Architecture for Intrusion Detection
The proposed architecture integrates multiple machine learning paradigms within a structured, hierarchical system, as illustrated in
Figure 1. A detailed explanation of each component and its interactions is provided below.
The hybridization is designed to exploit the complementary strengths of these algorithms. For illustration, Random Forest detects attack patterns robustly, whereas XGBoost yields better results when separating attack patterns. The PSO-GA optimization layer continually refines the ensemble weights and identifies the configuration that maximizes performance while satisfying real-time constraints. Results demonstrate that this hybridization approach outperforms single-algorithm solutions in both accuracy and computational time, achieving improvements of up to a factor of ten. The design also takes into account recent developments in automotive cybersecurity. Zhang, et al. [31] implemented a deep learning-based intrusion detection system that achieved 98.7% accuracy but required enormous computational resources. Zhang, et al. [32] developed a lightweight signature-based approach that could not detect novel attacks. Using rule-based and machine-learning techniques, Zhang and Ma [33] attained an accuracy of 99.1% at the cost of extremely high false positive rates.
3.3.1. Base Models and Mathematical Formulation
Given a training dataset $D = \{(x_j, y_j)\}_{j=1}^{n}$, where feature vectors $x_j$ represent CAN bus messages and labels $y_j \in \{0, 1\}$ indicate normal (0) or attack (1) states, three complementary base models are employed:
- 1.
Random Forest (RF): Provides robust pattern recognition through ensemble decision trees:
$$f_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} h_b(x)$$
where $B$ denotes the total number of trees, $h_b(x)$ is the prediction of the $b$-th decision tree, and $f_{RF}(x)$ is the overall Random Forest output.
- 2.
XGBoost: Handles imbalanced attack classes through gradient boosting:
$$f_{XGB}(x) = \sum_{k=1}^{K} f_k(x)$$
where each $f_k$ is optimized to minimize:
$$\mathcal{L} = \sum_{j=1}^{n} l(y_j, \hat{y}_j) + \sum_{k=1}^{K} \Omega(f_k)$$
Here, $l$ denotes the training loss that measures the discrepancy between the true label $y_j$ and the model prediction $\hat{y}_j$ (e.g., logistic loss in the binary case), while $\Omega$ is a regularization term that penalizes model complexity (e.g., constraints on tree structure/leaf weights) to prevent overfitting. These definitions are applied consistently wherever the objective and regularization are referenced in the optimization layer.
- 3.
LightGBM: Efficiently processes high-dimensional features using a leaf-wise growth strategy:
$$f_{LGB}(x) = \sum_{m=1}^{M_{LGB}} g_m(x)$$
where trees $g_m$ are built using Gradient-based One-Side Sampling.
3.3.2. Ensemble Integration Layer
The base models are integrated through a dynamic weighted voting mechanism as shown in Equation (4):
$$\hat{y}(x) = \sum_{i=1}^{M} w_i f_i(x) \tag{4}$$
where $M$ is the number of base learners in the ensemble (here $M = 3$: Random Forest, XGBoost, and LightGBM), $f_i(x)$ is the prediction of the $i$-th base learner for input $x$, and the weights $w_i$ are continuously updated using Equation (5), where $w_i^{(t)}$ is the weight of the $i$-th base learner at iteration $t$, $\eta_i$ is the adaptive scaling factor that controls the learning rate of weight adjustment for the $i$-th learner, and $ACC_i^{(t)}$ is the accuracy of the $i$-th learner measured on the validation set at iteration $t$. These definitions ensure that weight updates are performance-driven and responsive to base learner effectiveness over time.
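The weighted voting and the performance-driven weight update can be sketched as follows. The voting step is a plain weighted average of per-model attack probabilities. The update rule, however, is an assumption: it uses an exponentiated, normalized update driven by validation accuracy, which is one common choice and not necessarily the form of Equation (5). The scaling factor `eta`, the 0.5 decision threshold, and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_vote(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-model attack probabilities."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total


def update_weights(
    weights: Sequence[float],
    accuracies: Sequence[float],
    eta: float = 5.0,
) -> List[float]:
    """Assumed update: scale each weight by exp(eta * accuracy), then normalize."""
    scaled = [w * math.exp(eta * acc) for w, acc in zip(weights, accuracies)]
    norm = sum(scaled)
    return [s / norm for s in scaled]


# Three base learners (e.g., Random Forest, XGBoost, LightGBM) start out equal.
weights = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical validation accuracies measured at the current iteration.
weights = update_weights(weights, accuracies=[0.990, 0.999, 0.995])
# Hypothetical attack probabilities predicted for one CAN message.
score = weighted_vote([0.2, 0.9, 0.7], weights)
is_attack = score >= 0.5
```

After the update, the learner with the highest validation accuracy carries the largest weight, so its vote counts most in the combined score.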
This is further refined through a two-level stacking ensemble:
Level-1: $z_i = f_i(x)$ for $i = 1, \dots, M$, where each $z_i$ is the meta-feature representation generated by the $i$-th base model.
Figure 2 illustrates the overall architecture of the ensemble-based intrusion detection system.
The left side shows the data flow from raw CAN bus data through the preprocessing and feature extraction phases. The processed features were chosen because they are effective for different types of intrusion detection. They feed three complementary base models: Random Forest, XGBoost, and LightGBM. The middle part shows the Level-1 integration layer, where meta-features are created and model weights are adjusted dynamically according to model performance. These elements feed the Level-2 meta-learner, which generates the final classification outcome. To maximize detection accuracy, a performance feedback loop (dashed line) continually updates the weight-adjustment mechanism. The mathematical notation shown in
Figure 2 underlies the two-level ensemble technique and the weight update mechanism, which let the system adapt to new risks.
3.3.3. Optimization Layer
The optimization layer is responsible for refining ensemble weights and the hyperparameters of the base learners and meta-learner, ensuring that the system achieves high accuracy while meeting real-time constraints. To achieve this, three complementary techniques were integrated: Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Bayesian Optimization. GA was selected for its ability to explore a wide solution space through crossover and mutation, PSO for its efficiency in converging toward high-quality solutions, and Bayesian Optimization for strategically balancing exploration and exploitation using probabilistic surrogate models.
- 1.
Genetic Algorithm (GA): Optimizes model selection and ensemble weights using chromosomes encoded as shown in Equation (6) below:
where $b_i$ encodes the model selection decision for the $i$-th base learner, while $w_i$ denotes its corresponding ensemble weight.
Evaluated by fitness function (Equation (7)):
where ACC is accuracy, TPR is true positive rate, FPR is false positive rate, and T is computation time.
- 2.
Particle Swarm Optimization (PSO): Fine-tunes hyperparameters by updating particle velocities and positions, as shown in Equations (8) and (9):
where
pbest is the particle’s best position and
gbest is the global best position.
- 3.
Bayesian Optimization: Optimizes meta-parameters through the acquisition function (Equation (10)), which balances exploitation (μ) and exploration (σ).
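The three optimization components above can be sketched as small functions. All specific forms here are assumptions: the fitness is a linear combination of ACC, TPR, FPR, and T with hypothetical coefficients, the PSO step uses the textbook velocity and position update with an inertia term, and the acquisition function is an upper-confidence-bound form with a hypothetical trade-off parameter `kappa`. They are not necessarily the forms of Equations (7) to (10).

```python
import random
from typing import List, Sequence, Tuple


def fitness(
    acc: float,
    tpr: float,
    fpr: float,
    time_s: float,
    coeffs: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 0.1),
) -> float:
    """Assumed GA fitness: reward ACC and TPR, penalize FPR and computation time."""
    a, b, c, d = coeffs
    return a * acc + b * tpr - c * fpr - d * time_s


def pso_step(
    position: Sequence[float],
    velocity: Sequence[float],
    pbest: Sequence[float],
    gbest: Sequence[float],
    rng: random.Random,
    inertia: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
) -> Tuple[List[float], List[float]]:
    """Textbook PSO update: pull each coordinate toward pbest and gbest."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        v_next = inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


def ucb_acquisition(mu: float, sigma: float, kappa: float = 2.0) -> float:
    """Assumed acquisition: predicted mean (exploitation) plus scaled uncertainty."""
    return mu + kappa * sigma


rng = random.Random(42)
# A particle at 0.0 is pulled toward personal and global bests located at 1.0.
pos, vel = pso_step([0.0], [0.0], pbest=[1.0], gbest=[1.0], rng=rng)
```

A faster or more accurate configuration scores higher under `fitness`, a particle moves toward the best known positions under `pso_step`, and a candidate with higher uncertainty is favored for evaluation under `ucb_acquisition`.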
To account for the sensitivity of GA and PSO to initialization and parameter settings, each optimization experiment was repeated independently with randomized initial values. The final ensemble weights were obtained by averaging across multiple runs, and convergence stability was confirmed, showing less than 0.02% variance in detection accuracy. Furthermore, Bayesian Optimization was employed as a complementary technique to approximate near-optimal solutions and to cross-validate the heuristic outcomes of GA and PSO. This layered strategy reduces the risk of local optima and ensures that the reported performance is both stable and practically close to the optimal attainable under real-time ECU constraints. In particular, across repeated runs with randomized initialization, the final ensemble’s detection accuracy was consistently within ≤0.02% of the best configuration identified by Bayesian Optimization. This indicates that the empirical gap to the estimated optimum is negligible under our constraints.
The specific roles of GA, PSO, and Bayesian Optimization within the proposed hybrid IDS are summarized in
Table 3, providing a concise overview of how each technique contributes to accuracy, efficiency, and generalization. As illustrated in
Figure 3, the optimization workflow produces an ensemble configuration optimized for both accuracy and efficiency. However, real-time operation requires additional adaptability, addressed in the following subsection.
3.3.4. Dynamic Model Selection
While the optimization layer ensures that the ensemble is trained with robust weights and tuned hyperparameters, real-time environments require additional adaptability. To address this, a dynamic model selection strategy was incorporated. This mechanism ensures that the system can adjust to varying resource constraints and evolving operational conditions in connected and autonomous vehicles.
A multi-armed bandit (MAB) approach was adopted to guide runtime model selection. The selection probability for each model is determined by:
$$P_i = \frac{\exp(UCB_i / \tau)}{\sum_{j} \exp(UCB_j / \tau)}, \qquad UCB_i = \bar{r}_i + \sqrt{\frac{2 \ln n}{n_i}}$$
where $UCB_i$ represents the upper confidence bound for each model, balancing exploration of less-used models and exploitation of high-performing ones; $n$ is the total number of trials, $n_i$ the number of times model $i$ was selected, and $\bar{r}_i$ the average reward of model $i$. The parameter $\tau$ controls the trade-off between exploration and exploitation.
This formulation ensures that high-performing models are prioritized, while less frequently used models are still periodically tested. As a result, the system dynamically adapts model usage to maintain detection accuracy while satisfying latency and resource constraints.
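The runtime selection can be sketched as a softmax over upper confidence bounds. The UCB1 exploration bonus, the temperature value, and the reward statistics below are assumptions made for illustration; only the roles of the total trial count, the per-model selection counts, the average rewards, and τ follow the description above.

```python
import math
import random
from typing import List, Sequence


def ucb_scores(avg_reward: Sequence[float], pulls: Sequence[int]) -> List[float]:
    """Average reward plus an exploration bonus that shrinks with use (UCB1 form)."""
    total = sum(pulls)
    scores = []
    for reward, n_i in zip(avg_reward, pulls):
        if n_i == 0:
            scores.append(math.inf)  # untried models are explored first
        else:
            scores.append(reward + math.sqrt(2.0 * math.log(total) / n_i))
    return scores


def selection_probabilities(scores: Sequence[float], tau: float = 0.1) -> List[float]:
    """Softmax over scores with temperature tau; untried models share all the mass."""
    untried = [math.isinf(s) for s in scores]
    if any(untried):
        share = 1.0 / sum(untried)
        return [share if u else 0.0 for u in untried]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / tau) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def choose_model(probs: Sequence[float], rng: random.Random) -> int:
    """Sample a model index according to the selection probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical statistics for three models after 100 trials.
probs = selection_probabilities(ucb_scores([0.99, 0.95, 0.97], [50, 5, 45]))
chosen = choose_model(probs, random.Random(0))
```

In this example the second model has been tried only five times, so its exploration bonus dominates and it receives the highest selection probability despite its lower average reward.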
In combination, the optimization and dynamic selection layers provide a robust foundation for adaptive intrusion detection, ensuring both high accuracy and efficient resource utilization in connected and autonomous vehicles.
3.4. Training and Optimization Process
The training phase of the proposed hybrid IDS incorporates advanced strategies designed to enhance adaptability, robustness, and efficiency under real-world conditions. These strategies complement the optimization layer described in
Section 3.3 and ensure that the system can generalize to diverse vehicle types and evolving cyber threats.
- 1.
Transfer Learning: A novel transfer learning approach enables rapid adaptation of pre-trained models to new vehicle types or emerging attack scenarios. This is achieved by aligning feature distributions between the source and target domains through a custom loss function.
- 2.
Incremental Learning: To cope with continuously evolving attack patterns, an incremental learning framework allows the system to integrate new data without requiring complete retraining. This reduces downtime and preserves model knowledge across updates.
- 3.
Adversarial Training: Robustness is further improved by incorporating adversarial training [
34]. By generating adversarial examples during training, the models are hardened against evasion attempts, ensuring resilience against sophisticated attacks.
- 4.
Multi-Objective Optimization: Finally, a multi-objective optimization framework guides the training process by balancing competing objectives: maximizing detection accuracy, minimizing false positives, and reducing computational complexity. To handle this complex trade-off, the NSGA-III algorithm was employed [35].
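The trade-off handled by the multi-objective step can be illustrated with the Pareto dominance test that non-dominated sorting methods such as NSGA-III build on. The sketch below is not NSGA-III itself: it only filters the non-dominated candidates and omits reference-point selection, crossover, and mutation. The objective tuple and the candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple

# (error rate, false positive rate, latency in ms), all to be minimized.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates: List[Objectives] = [
    (0.0010, 0.0005, 2.0),
    (0.0005, 0.0005, 5.0),
    (0.0010, 0.0010, 2.5),  # worse than the first candidate in two objectives
    (0.0020, 0.0001, 1.0),
]
front = pareto_front(candidates)
```

The third candidate is dropped because the first one matches or beats it everywhere; the remaining three represent different compromises among accuracy, false positives, and latency.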
Together, these training and optimization mechanisms establish a comprehensive learning pipeline that not only maximizes detection performance but also guarantees adaptability, robustness, and resource efficiency—key requirements for deployment in connected and autonomous vehicles.
To enhance system performance and adaptability, we integrated three complementary optimization techniques across different stages of our hybrid intrusion detection framework. Genetic Algorithm (GA) is employed for feature selection and ensemble weight tuning to balance detection accuracy, false positive rate (FPR), and computational efficiency. Particle Swarm Optimization (PSO) is applied for hyperparameter tuning of individual classifiers, reducing complexity and training overhead. Bayesian Optimization is used to fine-tune the meta-learner parameters, effectively managing the exploration–exploitation tradeoff. This layered optimization strategy ensures robust, well-generalized performance tailored to real-time automotive environments and scalable across diverse attack scenarios.
3.5. Real-Time Implementation Strategy
To ensure that the proposed hybrid IDS can be deployed effectively in automotive systems, a real-time implementation strategy was designed. This strategy addresses the practical constraints of connected and autonomous vehicles, ensuring that the system is not only accurate but also efficient and adaptable in real-world conditions.
- 1.
Hardware Acceleration: The computational overhead is reduced by tailoring the models for execution on automotive-grade GPUs and FPGA accelerators, employing model pruning and quantization to improve speed and efficiency.
- 2.
Distributed Detection Framework: A lightweight, distributed detection scheme enables collaboration among ECUs through a gossip-based protocol, allowing efficient information sharing and decentralized intrusion detection.
- 3.
Adaptive Resource Allocation: Resources are dynamically allocated in response to changing operating conditions, ensuring that system performance remains stable under varying computational loads.
- 4.
Continual Learning Pipeline: A continual learning mechanism allows the IDS to adapt over time by integrating new knowledge and safely updating models. This ensures resilience against evolving attack patterns while minimizing the risk of performance degradation.
Collectively, these mechanisms form a holistic deployment framework that bridges advanced algorithmic design (
Section 3.3.3 and
Section 3.3.4) with real-time applicability. By combining acceleration, distributed detection, adaptive allocation, and continual learning, the system demonstrates that robust intrusion detection can be achieved within the resource-constrained environments of connected and autonomous vehicles. The framework is designed to scale through distributed detection across ECUs and adaptive resource allocation, ensuring that as network complexity and traffic volume grow, tasks are balanced without degrading performance.
To ensure a fair and transparent comparison, all baseline models and the proposed hybrid system were trained and evaluated under the same hardware and software configuration.
Table 4 summarizes the full experimental environment, including dataset partitioning and evaluation settings. This consistent setup minimizes hardware-related performance bias and enhances the reproducibility of the reported results.
4. Results and Discussion
This section evaluates the proposed hybrid intrusion detection system experimentally, assessing its detection capability and computational efficiency under the timing constraints of automotive networks. The simulation parameters, algorithm configurations, and performance metrics are presented to validate the system’s ability to detect attacks under different attack scenarios. The experimental setup evaluates the framework using the CAR-Hacking Dataset (CARDAD), which contains real-world CAN bus traffic from modern vehicles under various attacks and normal operating conditions. The dataset comprises 43.2 million CAN messages captured over 147 h of operation.
4.1. Performance Evaluation of the Hybrid Model
The proposed hybrid intrusion detection system outperformed both the baseline models and existing state-of-the-art approaches on several metrics.
Table 5 compares the hybrid adaptive system against the individual base models and a standard stacking ensemble.
The individual base models and the stacking ensemble reached an accuracy of at most 99.883%, whereas the hybrid adaptive system reached 99.995%. This improvement over the next best performer, the stacking ensemble, is statistically significant (p < 0.001, paired test). In addition, the false positive rate (FPR) decreased to 0.00001%, which addresses a critical problem for automotive intrusion detection systems: false alarms can prompt unnecessary interventions. The detection rate of 99.99% shows that the system can recognize the different types of attacks. Together, the high detection rate and low FPR indicate that the system reliably separates normal vehicle operation from malicious activity.
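The three metrics discussed above can be related to one another through a confusion matrix. The sketch below assumes the standard definitions (accuracy over all messages, detection rate as recall on attack messages, FPR as the share of normal messages flagged as attacks); the `ConfusionCounts` type and the counts themselves are hypothetical and are not taken from Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attack messages flagged as attacks
    fp: int  # normal messages flagged as attacks
    tn: int  # normal messages passed as normal
    fn: int  # attack messages that were missed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all messages that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def detection_rate(c: ConfusionCounts) -> float:
    """Share of attack messages that were detected (recall)."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of normal messages that were wrongly flagged."""
    return c.fp / (c.fp + c.tn)


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=9_990, fp=2, tn=89_998, fn=10)

print(f"accuracy       = {accuracy(counts):.5%}")
print(f"detection rate = {detection_rate(counts):.5%}")
print(f"FPR            = {false_positive_rate(counts):.5%}")
```

Because normal traffic usually dominates CAN captures, accuracy alone can look high even when attacks are missed, which is why the detection rate and FPR are reported alongside it.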
4.2. Robustness to Many Types of Attack
To assess the robustness of the system, its performance was evaluated against different attack categories.
Table 6 presents the detection rates for the various attack types.
Detection rates are high for all attack types, including novel variants, which indicates that the system generalizes and is robust against a wide variety of cyber threats. The detection rate stayed above 99.98% for known attack types and was 99.980% for novel attack variants. Our evaluation further confirmed that detection performance remained stable with larger datasets and higher message loads, demonstrating that the system can scale effectively with increasing network complexity and vehicle numbers.
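A per-category breakdown of this kind can be computed by grouping attack messages by type and taking the detected share within each group. The sketch below is illustrative only: `detection_rate_by_type` is a hypothetical helper, and the records are made-up outcomes rather than values from Table 6.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def detection_rate_by_type(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the detected share per attack type.

    Each record is (attack_type, was_detected) for one attack message.
    """
    detected: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for attack_type, was_detected in records:
        total[attack_type] += 1
        detected[attack_type] += int(was_detected)
    return {t: detected[t] / total[t] for t in total}


# Hypothetical labelled outcomes for illustration only.
records = (
    [("DoS", True)] * 4
    + [("fuzzy", True)] * 3 + [("fuzzy", False)]
    + [("spoofing", True)] * 2
)

rates = detection_rate_by_type(records)
weakest = min(rates, key=rates.get)  # category with the lowest detection rate
```

Reporting the weakest category alongside the average helps confirm that a high overall detection rate is not hiding one poorly detected attack type.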
Although deep learning architectures like LSTM and CNN-LSTM have shown promising results in prior studies [22,23], their computational complexity and latency make them less suitable for deployment in real-time automotive ECUs. Our framework prioritizes real-time performance and low false positives, directly addressing these limitations. Similarly, lightweight IDS, such as rule-based or threshold-based models, are computationally efficient but often suffer from reduced accuracy and higher false positive rates. In contrast, the proposed hybrid ensemble system integrates dynamic model selection with multi-objective optimization (GA, PSO, and Bayesian Optimization), achieving a superior balance between accuracy and efficiency. Its modular design and optimization-driven tuning enable selective pruning or component scaling to meet ECU constraints without sacrificing detection performance, making it well-suited for safety-critical CAV environments.
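The accuracy and efficiency trade-off described above can be made concrete with a Pareto-dominance filter over already evaluated configurations. This sketch does not reproduce the GA, PSO, or Bayesian Optimization search itself; it only shows how non-dominated candidates might be kept once such a search has produced them. The `Candidate` type, the configuration names, their scores, and the 1.0 ms budget are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical evaluated configurations for illustration only.
candidates = [
    Candidate("config_a", 0.9990, 0.9),
    Candidate("config_b", 0.9985, 0.4),
    Candidate("config_c", 0.9980, 0.6),  # dominated by config_b
    Candidate("config_d", 0.9999, 1.5),
]

front = pareto_front(candidates)

# Pick the most accurate non-dominated configuration within an assumed 1.0 ms budget.
LATENCY_BUDGET_MS = 1.0
chosen = max((c for c in front if c.latency_ms <= LATENCY_BUDGET_MS),
             key=lambda c: c.accuracy)
```

Filtering to the Pareto front first and applying the ECU latency budget afterwards keeps the final choice explicit about which objective is being traded away.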
4.3. Ablation Study
An ablation study was conducted to evaluate the contribution of each component in the hybrid architecture. The results of this study are presented in Table 7. As shown in Table 7, the proposed hybrid ensemble achieves an average processing time of 0.5 ms per CAN message, which demonstrates that the system can operate under real-time constraints in automotive ECUs.
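As a rough sanity check on the real-time claim, the per-message processing time can be converted into a sustainable message rate and compared with the average message rate implied by the dataset figures reported in this section. The sketch assumes strictly sequential processing on a single core and uses long-run averages only, so it says nothing about bursts such as flooding attacks; the function names are hypothetical.

```python
PROCESSING_TIME_MS = 0.5       # average per-message processing time (from the text)
DATASET_MESSAGES = 43_200_000  # CAN messages in the dataset (from the text)
DATASET_HOURS = 147            # capture duration in hours (from the text)


def max_throughput(processing_time_ms: float) -> float:
    """Messages per second one sequential worker can sustain."""
    return 1000.0 / processing_time_ms


def average_arrival_rate(messages: int, hours: float) -> float:
    """Long-run average number of messages per second in the capture."""
    return messages / (hours * 3600)


capacity = max_throughput(PROCESSING_TIME_MS)                    # 2000.0 messages per second
arrival = average_arrival_rate(DATASET_MESSAGES, DATASET_HOURS)  # about 81.6 messages per second
utilization = arrival / capacity                                 # about 0.04

print(f"capacity    = {capacity:.0f} msg/s")
print(f"arrival     = {arrival:.1f} msg/s")
print(f"utilization = {utilization:.1%}")
```

Under these assumptions the average load uses roughly 4% of the processing capacity, which leaves headroom for traffic peaks, although peak rates would need to be measured separately.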
Results from the ablation study provide insights into the contributions of each major component of the hybrid system. Several important conclusions can be drawn from this analysis:
1. Evolutionary Feature Selection (EFS): Removing the EFS component decreased accuracy from 99.995% to 99.495%, a difference of 0.5 percentage points (95% CI: 0.3–0.7%, based on repeated trials). Furthermore, the false positive rate (FPR) increased from 0.00001% to 0.00004%, a fourfold increase. EFS therefore appears important for refining the feature set so that the model can focus on the most relevant characteristics of the data. By improving accuracy and reducing the FPR, EFS contributes significantly to separating normal from anomalous CAN bus traffic.
2. Stacking Ensemble: Among the ablated components, removing the Stacking Ensemble had the most pronounced effect on the system’s detection performance. Accuracy dropped from 99.995% to 99.90%. More notably, the detection rate decreased from 99.99% to 98.79%, a difference of 1.2 percentage points (95% CI: 0.9–1.5%, based on repeated trials). This substantial drop underlines the need for the Stacking Ensemble to detect complex patterns that a single model might miss. The system appears to rely critically on the ensemble’s ability to combine the strengths of multiple models to obtain high detection rates across different types of attacks.
3. Dynamic Model Selection (DMS): While accuracy and detection rate were only marginally affected by the removal of DMS (decreases of 0.005% and 0.01%, respectively), the average processing time increased significantly from 0.5 ms to 0.7 ms, a 40% increase (95% CI: 35–45%, based on repeated trials). This result shows the role of DMS in achieving real-time performance. DMS dynamically chooses the model that provides the best trade-off between computational overhead and accuracy based on the current computational load and threat landscape, so that the system retains its accuracy at low computational overhead. This is especially important in the highly resource-constrained environment of automotive systems.

The ablation study reveals that each component of the hybrid system contributes uniquely to overall performance. EFS optimizes the feature set, which improves accuracy and decreases false positives. The Stacking Ensemble significantly improves the detection rate, possibly because of the complementary strengths of the different models. DMS keeps the system running in real time in automotive environments. These findings support a hybrid approach that combines multiple advanced techniques over simpler models for intrusion detection in automotive networks. The components work together to produce a system that is accurate, efficient, and adaptable to the dynamic cyber threats facing connected vehicles.
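The load- and threat-aware behaviour attributed to DMS can be sketched as a budgeted selection rule: pick the most accurate model whose latency fits a per-message budget that shrinks with CPU load and relaxes when the threat level rises. This is a minimal sketch under stated assumptions, not the selection algorithm used in this work; `select_model`, the budget formula, the `base_budget_ms` parameter, and the model profiles are all hypothetical.

```python
from typing import List, NamedTuple


class ModelProfile(NamedTuple):
    name: str
    accuracy: float    # offline validation accuracy
    latency_ms: float  # average per-message inference time


def select_model(profiles: List[ModelProfile], cpu_load: float,
                 threat_level: float, base_budget_ms: float = 1.0) -> ModelProfile:
    """Pick the most accurate model that fits the current latency budget.

    cpu_load in [0, 1] shrinks the budget; threat_level in [0, 1] relaxes it so
    that heavier models stay eligible when an attack is suspected. The budget
    formula is an assumption made for this sketch.
    """
    budget_ms = base_budget_ms * (1.0 - cpu_load) * (1.0 + threat_level)
    eligible = [p for p in profiles if p.latency_ms <= budget_ms]
    if not eligible:
        # Nothing fits: fall back to the fastest model rather than stop detecting.
        return min(profiles, key=lambda p: p.latency_ms)
    return max(eligible, key=lambda p: p.accuracy)


# Hypothetical model profiles for illustration only.
profiles = [
    ModelProfile("light", 0.9900, 0.2),
    ModelProfile("medium", 0.9980, 0.5),
    ModelProfile("full_stack", 0.9999, 0.9),
]

idle = select_model(profiles, cpu_load=0.0, threat_level=0.0)
busy = select_model(profiles, cpu_load=0.6, threat_level=0.0)
busy_alert = select_model(profiles, cpu_load=0.6, threat_level=0.5)
saturated = select_model(profiles, cpu_load=0.95, threat_level=0.0)
```

With these made-up profiles, an idle ECU runs the full stack, a busy one drops to the light model, and a raised threat level moves the busy ECU back up to the medium model, which mirrors the trade-off between processing time and accuracy observed in the ablation.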
More importantly, these results point to potential areas for future optimization. For example, DMS substantially reduces processing time, but improving its decision-making algorithm could also increase accuracy and detection rates. Similarly, the strong effect of the Stacking Ensemble on detection rates suggests that more advanced ensemble methods, or an optimized combination of base models, could yield further gains.
5. Conclusions and Future Work
This work identified critical CAV cybersecurity challenges and proposed a novel hybrid intrusion detection technique for in-vehicle networks. Its main contribution to automotive cybersecurity is a hybrid adaptive system capable of detecting multiple types of cyber-attacks on the CAN bus network with an unprecedented accuracy of 99.995%. Combined with a false positive rate of 0.00001%, this level of accuracy is an order of magnitude better than existing state-of-the-art methods and addresses one of the most basic defects in the automotive cybersecurity field: false alarms, which may quickly lead to unnecessary and potentially dangerous interventions. The system’s resilience against evolving cyber threats is demonstrated by its rapid adaptation to new attack patterns, with an average adaptation time of 2.3 s. Despite the complexity of the full hybrid architecture, high computational efficiency was maintained, with CAN messages processed in an average of 0.5 ms. The system also showed high robustness across different kinds of attacks (DoS, fuzzy, and spoofing), with detection rates consistently above 99.98% and a 99.980% detection rate for novel attack variants.
These results have implications for the future of automotive cybersecurity. The system’s high accuracy and low false positive rate could significantly enhance the safety and security of CAVs and thereby support their adoption and public trust. Its adaptability and efficiency make real-world deployment viable across many vehicle types and models, meeting the varied needs of the automotive industry. In addition, the system’s robustness against various attack types, including novel attack variants, establishes it as a proactive defense mechanism against cyber threats in the automotive domain. Although the results are promising, several limitations were identified that could guide future work. These include testing on a wider range of vehicles to ensure broad applicability, conducting longitudinal studies to evaluate long-term adaptability, further optimization for resource-constrained environments, addressing privacy issues associated with continuous data collection and analysis, and research into seamless integration with existing automotive security architectures. Furthermore, because the Stacking Ensemble substantially affects the detection rate, other ensemble techniques or a more refined combination of base models could lead to even better results.
Author Contributions
Conceptualization, A.A., E.E.A. and A.B.; methodology, A.A. and E.E.A.; software, M.A. and H.M.; validation, E.E.A., E.A. and A.B.; formal analysis, E.E.A.; investigation, A.B., E.E.A. and M.A.; resources, E.E.A.; data curation, A.A.; writing (original draft preparation), A.A., E.E.A. and A.B.; writing (review and editing), M.A., E.A. and H.M.; visualization, E.A.; supervision, A.A. and E.E.A.; project administration, E.E.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Song, H.; Srinivasan, R.; Sookoor, T.; Jeschke, S. Smart Cities: Foundations, Principles, and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2017.
2. Jabbarpour, M.R.; Nabaei, A.; Zarrabi, H. Intelligent guardrails: An iot application for vehicle traffic congestion reduction in smart city. In Proceedings of the 2016 IEEE International Conference on Internet of Things (Ithings) and IEEE Green Computing and Communications (Greencom) and IEEE Cyber, Physical and Social Computing (CPSCOM) and IEEE Smart Data (Smartdata), Chengdu, China, 15–18 December 2016; pp. 7–13.
3. Yang, Q.; Fu, S.; Wang, H.; Fang, H. Machine-Learning-Enabled Cooperative Perception for Connected Autonomous Vehicles: Challenges and Opportunities. IEEE Netw. 2021, 35, 96–101.
4. Cherif, M.O. Optimization of v2v and v2i Communications in an Operated Vehicular Network. Ph.D. Thesis, University of Technology of Compiègne, Compiègne, France, 2010.
5. Bedretchuk, J.P.; García, S.A.; Igarashi, T.N.; Canal, R.; Spengler, A.W.; Gracioli, G. Low-Cost Data Acquisition System for Automotive Electronic Control Units. Sensors 2023, 23, 2319.
6. Zurawski, R. Embedded Systems Handbook 2-Volume Set; CRC Press: Boca Raton, FL, USA, 2018.
7. Jo, H.J.; Choi, W. A Survey of Attacks on Controller Area Networks and Corresponding Countermeasures. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6123–6141.
8. Petrov, T.; Pocta, P.; Roman, J.; Buzna, L.; Dado, M. A Feasibility Study of Privacy Ensuring Emergency Vehicle Approaching Warning System. Appl. Sci. 2019, 10, 298.
9. Abdallah, E.E.; Aloqaily, A.; Fayez, H. Identifying Intrusion Attempts on Connected and Autonomous Vehicles: A Survey. Procedia Comput. Sci. 2023, 220, 307–314.
10. Loukas, G.; Karapistoli, E.; Panaousis, E.; Sarigiannidis, P.; Bezemskij, A.; Vuong, T. A taxonomy and survey of cyber-physical intrusion detection approaches for vehicles. Ad Hoc Netw. 2019, 84, 124–147.
11. Kang, M.-J.; Kang, J.-W. Intrusion Detection System Using Deep Neural Network for In-Vehicle Network Security. PLoS ONE 2016, 11, e0155781.
12. Wu, W.; Li, R.; Xie, G.; An, J.; Bai, Y.; Zhou, J.; Li, K. A Survey of Intrusion Detection for In-Vehicle Networks. IEEE Trans. Intell. Transp. Syst. 2019, 21, 919–933.
13. Otoum, S.; Kantarci, B.; Mouftah, H.T. On the Feasibility of Deep Learning in Sensor Network Intrusion Detection. IEEE Netw. Lett. 2019, 1, 68–71.
14. Dietterich, T.G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
15. Limbasiya, T.; Teng, K.Z.; Chattopadhyay, S.; Zhou, J. A systematic survey of attack detection and prevention in Connected and Autonomous Vehicles. Veh. Commun. 2022, 37, 100515.
16. Kalkan, S.C.; Sahingoz, O.K. In-Vehicle Intrusion Detection System on Controller Area Network with Machine Learning Models. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6.
17. Song, H.M.; Woo, J.; Kim, H.K. In-vehicle network intrusion detection using deep convolutional neural network. Veh. Commun. 2020, 21, 100198.
18. Lin, H.-C.; Wang, P.; Chao, K.-M.; Lin, W.-H.; Chen, J.-H. Using Deep Learning Networks to Identify Cyber Attacks on Intrusion Detection for In-Vehicle Networks. Electronics 2022, 11, 2180.
19. Alshammari, A.; Zohdy, M.A.; Debnath, D.; Corser, G. Classification Approach for Intrusion Detection in Vehicle Systems. Wirel. Eng. Technol. 2018, 9, 79–94.
20. Ahmed, I.; Jeon, G.; Ahmad, A. Deep Learning-Based Intrusion Detection System for Internet of Vehicles. IEEE Consum. Electron. Mag. 2021, 12, 117–123.
21. Moulahi, T.; Zidi, S.; Alabdulatif, A.; Atiquzzaman, M. Comparative Performance Evaluation of Intrusion Detection Based on Machine Learning in In-Vehicle Controller Area Network Bus. IEEE Access 2021, 9, 99595–99605.
22. Basavaraj, D.; Tayeb, S. Towards a Lightweight Intrusion Detection Framework for In-Vehicle Networks. J. Sens. Actuator Netw. 2022, 11, 6.
23. He, Q.; Meng, X.; Qu, R.; Xi, R. Machine Learning-Based Detection for Cyber Security Attacks on Connected and Autonomous Vehicles. Mathematics 2020, 8, 1311.
24. Aloqaily, A.; Abdallah, E.E.; AbuZaid, H.; Abdallah, A.E.; Al-Hassan, M. Supervised Machine Learning for Real-Time Intrusion Attack Detection in Connected and Autonomous Vehicles: A Security Paradigm Shift. Informatics 2025, 12, 4.
25. Alfardus, A.; Rawat, D.B. Intrusion Detection System for CAN Bus In-Vehicle Network based on Machine Learning Algorithms. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021; pp. 944–949.
26. Amato, F.; Coppolino, L.; Mercaldo, F.; Moscato, F.; Nardone, R.; Santone, A. CAN-Bus Attack Detection With Deep Learning. IEEE Trans. Intell. Transp. Syst. 2021, 22, 5081–5090.
27. Viadero-Monasterio, F.; Meléndez-Useros, M.; Jiménez-Salas, M.; Boada, B.L. Robust Adaptive Control of Heterogeneous Vehicle Platoons in the Presence of Network Disconnections With a Novel String Stability Guarantee. IEEE Trans. Intell. Veh. 2025, 1–13.
28. Viadero-Monasterio, F.; Meléndez-Useros, M.; Zhang, N.; Zhang, H.; Boada, B.L.; Boada, M.J.L. Motion Planning and Robust Output-Feedback Trajectory Tracking Control for Multiple Intelligent and Connected Vehicles in Unsignalized Intersections. IEEE Trans. Veh. Technol. 2025, 1–13.
29. Seo, E.; Song, H.M.; Kim, H.K. GIDS: GAN based Intrusion Detection System for In-Vehicle Network. In Proceedings of the 16th Annual Conference on Privacy, Security and Trust, Belfast, UK, 28–30 August 2018; pp. 1–6.
30. Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evol. Comput. 2015, 20, 606–626.
31. Zhang, L.; Liu, K.; Xie, X.; Bai, W.; Wu, B.; Dong, P. A data-driven network intrusion detection system using feature selection and deep learning. J. Inf. Secur. Appl. 2023, 78, 103606.
32. Zhang, Y.; Chen, J.; Wang, S.; Ma, K.; Hu, S. Lightweight Anonymous Authentication and Key Agreement Protocol for a Smart Grid. Energies 2024, 17, 4550.
33. Zhang, L.; Ma, D. A Hybrid Approach Toward Efficient and Accurate Intrusion Detection for In-Vehicle Networks. IEEE Access 2022, 10, 10852–10866.
34. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27.
35. Deb, K.; Jain, H. An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part I: Solving Problems with Box Constraints. IEEE Trans. Evol. Comput. 2013, 18, 577–601.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).