Computers
  • Article
  • Open Access

3 October 2025

ZDBERTa: Advancing Zero-Day Cyberattack Detection in Internet of Vehicle with Zero-Shot Learning

1 Department of Computer Engineering, University of Engineering and Technology, Taxila 47050, Pakistan
2 Department of Computer Engineering, HITEC University Taxila, Taxila 47080, Pakistan
3 School of Computing, College of Science, Engineering and Technology, University of South Africa (UNISA), Pretoria 0003, South Africa
4 Technology Innovation Research Group, School of Information Technology, Whitecliffe College of Arts and Design, Whitecliffe, Wellington 6145, New Zealand

Abstract

The Internet of Vehicles (IoV) is becoming increasingly vulnerable to zero-day (ZD) cyberattacks, which often bypass conventional intrusion detection systems. To mitigate this challenge, this study proposes the Zero-Day Bidirectional Encoder Representations from Transformers approach (ZDBERTa), a zero-shot learning (ZSL)-based framework for ZD attack detection, evaluated on the CICIoV2024 dataset. Unlike conventional AI models, ZSL enables the classification of attack types not encountered during the training phase. Two dataset variants are formed: Variant 1, created through synthetic traffic generation using a mixture of pattern-based, crossover, and mutation techniques, and Variant 2, augmented with a Generative Adversarial Network (GAN). To replicate realistic zero-day conditions, denial-of-service (DoS) attacks were omitted during training and introduced only at testing. The proposed ZDBERTa incorporates a Byte-Pair Encoding (BPE) tokenizer, a multi-layer transformer encoder, and a classification head for prediction, enabling the model to capture semantic patterns and identify previously unseen threats. The experimental results show that ZDBERTa achieves 86.677% accuracy on Variant 1, highlighting the complexity of zero-day detection, while accuracy improves to 99.315% on Variant 2, underscoring the effectiveness of GAN-based augmentation. To the best of our knowledge, this is the first research to explore ZD detection within CICIoV2024, contributing a novel direction toward resilient IoV cybersecurity.

1. Introduction

The Internet of Things (IoT) has emerged as a central paradigm in modern computing, enabling interconnected devices to collect, process, and share data through various communication frameworks. By combining sensing, communication, and interaction capabilities, IoT systems create opportunities for automation and efficiency across diverse domains. However, this increased connectivity also introduces new vulnerabilities, making security a fundamental concern in their deployment. To address these challenges, advanced computational approaches are increasingly being explored, particularly those grounded in Artificial Intelligence (AI) [1].
AI plays a vital role in enhancing the reliability and security of IoT environments [2]. It encompasses methods that replicate human-like intelligence, such as learning from data, recognizing patterns, and processing language. Among these, machine learning and deep learning have gained significant traction in cybersecurity applications. While machine learning applies data-driven algorithms for prediction and decision making, deep learning leverages layered neural architectures to uncover complex patterns in network activity. The integration of these techniques provides IoT systems with greater adaptability and resilience, laying the foundation for their extension into more specialized domains, including the Internet of Vehicles (IoV).
The IoV builds upon IoT principles by embedding connectivity within transportation systems, allowing vehicles to communicate with one another as well as with infrastructure. This connectivity improves safety, navigation, and efficiency but simultaneously exposes vehicles to cyber threats. Common risks such as denial-of-service (DoS) and spoofing attacks undermine communication reliability and can directly endanger passenger safety [3]. These threats highlight the limitations of conventional defense mechanisms and point toward the need for intelligent, adaptive solutions driven by AI.
One of the most pressing issues in this regard is the rise of zero-day attacks, which exploit previously unknown vulnerabilities [4]. Traditional defense mechanisms often fail to identify these intrusions, allowing attackers to remain undetected for extended periods, sometimes exceeding 300 days. Such attacks result in severe financial and operational consequences, emphasizing the necessity for proactive detection strategies. Here, advanced AI models [5], particularly deep learning [6], have demonstrated promise in recognizing subtle anomalies within network traffic [7]. Nevertheless, their dependence on pre-labeled datasets restricts their ability to detect unseen threats.
To overcome this limitation, zero-shot learning (ZSL) has been introduced as a powerful extension of deep learning approaches. Unlike conventional models, ZSL enables the classification of attack types not previously encountered during training [8].
In the context of IoV, this capability holds significant potential for identifying zero-day attacks, thereby addressing one of the most critical gaps in smart and secure IoV. The integration of ZSL into IoV thus represents a forward-looking approach, combining the strengths of ZSL and AI to build more secure, resilient, and adaptive infrastructures for the future IoV.
Thus, in this work we explore ZSL for zero-day attack detection in IoV using the recent CICIoV2024 dataset and achieve promising results. Important acronyms and their definitions are listed in Table 1.
Table 1. List of acronyms.

1.1. List of Contributions

In this work, we made the following key contributions:
  • We propose a ZSL-based technique for zero-day (ZD) attack detection in IoV, leveraging Generative Adversarial Networks (GANs) and Large Language Models (LLMs).
  • To address the class imbalance issue in the CICIoV2024 dataset, we evaluate four different techniques—pattern-based recognition, crossover, mutation, and GAN-based generation—and find GANs to be the most effective solution.
  • We propose ZDBERTa, a specialized model for zero-day attack detection in IoV. It combines a Byte-Pair Encoding (BPE) tokenizer, a multi-layer transformer encoder as its backbone, and a classification head (pooling, dense layer, and activation function) for final predictions. By incorporating semantic knowledge, ZDBERTa detects previously unseen attacks, supporting robust cybersecurity defense.
  • We conduct an extensive performance evaluation, measuring accuracy, precision, recall, and F1-score, showing that our proposed framework outperforms state-of-the-art methods.
  • We are the first to utilize ZDBERTa for zero-day attack detection in the CICIoV2024 dataset, establishing a new benchmark for AI-driven intrusion detection in IoV systems.

1.2. Organization of the Work

The structure of this paper is organized as follows:
  • Section 2 provides the related literature of AI-based attack detection in IoV.
  • Section 3 presents the details of the proposed methodology along with the proposed ZDBERTa.
  • Section 4 discusses the implementation and results.
  • Section 5 concludes the paper and highlights future research directions.

3. Proposed Methodology

This section describes the proposed methodology: the dataset and its issues, data preprocessing, the synthetically generated dataset variants, binary-to-text conversion, and the proposed ZDBERTa.

3.1. CICIoV2024 Dataset

The CICIoV2024 dataset investigates vulnerabilities and attack scenarios in the CAN bus of a 2019 Ford vehicle, encompassing both benign and malicious traffic. It captures normal driving behavior along with emulated cyberattacks, such as denial of service (DoS) and spoofing. These attacks manipulate crucial parameters including engine RPM, steering wheel position, gas pedal position, and vehicle speed. By offering such diverse traffic patterns, the dataset serves as a practical resource for developing and testing intrusion detection techniques in vehicular networks, helping identify vulnerabilities and providing a benchmark to improve protection against automotive cyber threats. The subclasses of CICIoV2024 are illustrated in Figure 1.
Figure 1. CICIoV2024 dataset.

3.2. Data Preprocessing

The data preprocessing involved the following steps that can be seen in Figure 2, applied consistently across all dataset files, including spoofing_GAS, spoofing_RPM, spoofing_SPEED, spoofing_STEERING_WHEEL, DoS, and BENIGN:
Figure 2. Preprocessing steps.
  • Feature Refinement: Irrelevant attributes, such as category and specific_class, were eliminated from each dataset to retain only the essential features required for analysis.
  • Label Encoding: The label column originally contained categorical entries (e.g., ATTACK and BENIGN). These were transformed into numerical form as follows:
    ATTACK → 1
    BENIGN → 0
  • Redundancy Check: Each class in the dataset was inspected for duplicate entries. Duplicates were removed, and only unique samples were preserved. The results of this process are summarized in Table 6.
    Table 6. Unique samples in datasets.
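The three preprocessing steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' pipeline: the `DATA_0`/`DATA_1` feature names and the sample rows are hypothetical, and the function is assumed to be applied to one dataset file at a time.

```python
# Columns removed during feature refinement (named in the text above).
DROPPED_COLUMNS = ("category", "specific_class")
# Label encoding stated in the text: ATTACK -> 1, BENIGN -> 0.
LABEL_MAP = {"ATTACK": 1, "BENIGN": 0}


def preprocess(rows):
    """Drop unused columns, encode labels, and keep only unique samples."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Feature refinement: keep everything except the dropped columns.
        cleaned = {k: v for k, v in row.items() if k not in DROPPED_COLUMNS}
        # Label encoding: map the categorical label to 0/1.
        cleaned["label"] = LABEL_MAP[cleaned["label"]]
        # Redundancy check: a hashable key identifies duplicate rows.
        key = tuple(sorted(cleaned.items()))
        if key not in seen:
            seen.add(key)
            unique_rows.append(cleaned)
    return unique_rows


# Hypothetical rows; the feature names are assumptions for illustration only.
rows = [
    {"DATA_0": 1, "DATA_1": 0, "label": "ATTACK", "category": "SPOOFING", "specific_class": "GAS"},
    {"DATA_0": 1, "DATA_1": 0, "label": "ATTACK", "category": "SPOOFING", "specific_class": "GAS"},
    {"DATA_0": 0, "DATA_1": 0, "label": "BENIGN", "category": "BENIGN", "specific_class": "BENIGN"},
]
print(preprocess(rows))
```

The duplicate spoofing row is dropped, which mirrors how very few unique samples can remain for some subclasses after the redundancy check.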

3.3. Synthetic Data Generation

After extracting the unique samples from the CICIoV2024 dataset, the number of available records for some subclasses was very limited. To overcome this imbalance and ensure sufficient training data, synthetic data was generated in two different variants.

3.3.1. Variant 1: Pattern, Crossover, and Mutation

In the first approach, synthetic samples were created using a combination of three strategies:
  • Pattern-based generation: This technique identifies underlying patterns in the available data and produces new samples that follow similar distributions. The method first learns the feature activation probabilities and pairwise dependencies from the original dataset. Each feature is then generated through Bernoulli sampling according to these probabilities. To reflect real-world variability and avoid simple duplication, a small, controlled amount of noise is injected by randomly flipping bits. In other words, Algorithm 1 operates like a probabilistic storyteller: it learns which flags or protocol bits tend to be active and which combinations of features co-occur, and then synthesizes new samples that mimic this behavior while introducing slight randomness for diversity. In Algorithm 1, $D$ represents the original dataset containing binary-valued features, and $N$ denotes the number of synthetic samples to be generated. Each feature is denoted by $f_j$, with activation probability $p_j = P(f_j = 1)$, the likelihood of observing a "1" in that feature. The dependency between a pair of features $(f_j, f_k)$ is modeled using the correlation coefficient $\mathrm{corr}(f_j, f_k)$. A newly generated synthetic sample is represented as $x_i$, where $i \in \{1, \dots, N\}$, and the generated samples together form the synthetic dataset $D_{\text{synthetic}}$. To ensure variability and limit overfitting, feature values are flipped with a small probability $\epsilon$. In this way, diversity was created from limited samples while the synthetic data preserved realistic binary patterns.
  • Crossover: Inspired by genetic algorithms, crossover combines attributes from two or more existing samples to generate new ones, introducing diversity while preserving original data characteristics. As shown in Figure 3, two parent feature vectors (Parent A and Parent B) exchange segments at a randomly selected crossover point k, creating two offspring (Offspring A and Offspring B). This operator, known as single-point crossover and formally described in Algorithm 2, recombines meaningful feature patterns into new, diverse synthetic samples.
    Figure 3. Method of single-point crossover operator.
Algorithm 1 Pattern-Based Synthetic Data Generation
Require: $D$: original dataset with binary features
Require: $N$: number of synthetic samples to generate
Ensure: $D_{\text{synthetic}}$: synthetic dataset
 1: Preprocess dataset $D$ (remove duplicates, handle missing values if any)
 2: For each feature $f_j$, compute activation probability $p_j = P(f_j = 1)$
 3: For feature pairs $(f_j, f_k)$, compute correlation $\mathrm{corr}(f_j, f_k)$
 4: for $i = 1$ to $N$ do
 5:    Initialize empty sample $x_i$
 6:    for each feature $f_j$ do
 7:       Generate $f_j \sim \mathrm{Bernoulli}(p_j)$
 8:    end for
 9:    Adjust correlated features using $\mathrm{corr}(f_j, f_k)$
10:    Introduce controlled noise (flip bits with probability $\epsilon$)
11:    Store $x_i$ in $D_{\text{synthetic}}$
12: end for
13: return $D_{\text{synthetic}}$
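The core of Algorithm 1 can be sketched as follows. This is a simplified illustration under stated assumptions: it covers the probability estimation, Bernoulli sampling, and noise steps, but omits the correlation adjustment (step 9) because the text does not specify how correlated features are adjusted. The function names, the default `epsilon`, and the toy dataset are hypothetical.

```python
import random


def activation_probabilities(dataset):
    """Step 2: estimate p_j = P(f_j = 1) for every binary feature."""
    n_features = len(dataset[0])
    return [sum(row[j] for row in dataset) / len(dataset) for j in range(n_features)]


def generate_pattern_based(dataset, n_samples, epsilon=0.01, seed=0):
    """Bernoulli sampling plus bit-flip noise (step 9 of Algorithm 1 is omitted)."""
    rng = random.Random(seed)
    probs = activation_probabilities(dataset)
    synthetic = []
    for _ in range(n_samples):
        # Step 7: draw each feature as f_j ~ Bernoulli(p_j).
        sample = [1 if rng.random() < p else 0 for p in probs]
        # Step 10: flip each bit independently with small probability epsilon.
        sample = [1 - bit if rng.random() < epsilon else bit for bit in sample]
        synthetic.append(sample)
    return synthetic


# Toy dataset with two unique samples, echoing the scarcity described above.
original = [[1, 0, 1], [1, 0, 0]]
print(activation_probabilities(original))
print(generate_pattern_based(original, n_samples=5))
```

With only two source samples, features that are constant in the data (here the first two) stay constant in the output unless noise flips them, which illustrates why this method tends to over-represent frequent patterns.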
Algorithm 2 Single-Point Crossover
Require: Parent A, Parent B: binary feature vectors
Ensure: Offspring A, Offspring B: recombined samples
 1: Randomly choose crossover point $k$ such that $1 \le k < |\text{Parent A}|$
 2: Offspring A[1:k] ← Parent A[1:k]
 3: Offspring A[k+1:end] ← Parent B[k+1:end]
 4: Offspring B[1:k] ← Parent B[1:k]
 5: Offspring B[k+1:end] ← Parent A[k+1:end]
 6: return Offspring A, Offspring B
  • Mutation: Mutation slightly modifies feature values in existing samples to create novel variations, which helps cover a broader feature space and reduces the risk of overfitting. The strategy uses the bit-flip mutation operator. As shown in Algorithm 3, a random mutation point k is selected in both Parent A and Parent B, and with a small probability m the bit at that point is flipped (0 → 1 or 1 → 0). The resulting vectors (Offspring A and Offspring B) are illustrated in Figure 4. This simple operator helps prevent overfitting, avoids exact duplication of parent samples, and ensures that rare variations (such as zero-day attack indicators) are represented in the synthetic dataset.
    Figure 4. Method of bit-flip mutation operator.
Used alone, each method has a weakness. Pattern-based generation tends to over-represent frequent feature combinations and under-represent rare but realistic ones, while crossover and mutation offer limited novelty and risk producing unrealistic hybrids when applied to very small pools. A dataset built from any single method is therefore less varied, more biased, and potentially easier for classifiers to memorize than to generalize from. For this reason, the outputs of the three methods were mixed, yielding a richer and more diverse synthetic dataset.
Algorithm 3 Bit-Flip Mutation
Require: Parent A, Parent B: binary feature vectors; $m$: mutation probability
Ensure: Offspring A, Offspring B: mutated samples
 1: Copy Parent A into Offspring A
 2: Copy Parent B into Offspring B
 3: Randomly choose mutation point $k$ such that $1 \le k \le |\text{Parent A}|$
 4: Generate random number $r \in [0, 1]$
 5: if $r < m$ then
 6:    Flip Offspring A[k] (0 → 1 or 1 → 0)
 7:    Flip Offspring B[k] (0 → 1 or 1 → 0)
 8: end if
 9: return Offspring A, Offspring B
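Algorithms 2 and 3 translate almost directly into Python. The sketch below is illustrative rather than the authors' code: it uses 0-based list slicing in place of the 1-based indices of the pseudocode, assumes both parents have the same length (at least two features), and reads the mutation test as "flip when r < m", consistent with flipping with probability m.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Algorithm 2: swap the tails of two equal-length vectors after point k."""
    k = rng.randint(1, len(parent_a) - 1)  # 1 <= k < |Parent A|
    offspring_a = parent_a[:k] + parent_b[k:]
    offspring_b = parent_b[:k] + parent_a[k:]
    return offspring_a, offspring_b


def bit_flip_mutation(parent_a, parent_b, m, rng):
    """Algorithm 3: with probability m, flip the bit at one shared random position."""
    offspring_a, offspring_b = list(parent_a), list(parent_b)
    k = rng.randrange(len(parent_a))  # 0-based index of the mutation point
    if rng.random() < m:
        offspring_a[k] = 1 - offspring_a[k]
        offspring_b[k] = 1 - offspring_b[k]
    return offspring_a, offspring_b


rng = random.Random(42)
a, b = [1, 1, 1, 1], [0, 0, 0, 0]
print(single_point_crossover(a, b, rng))
print(bit_flip_mutation(a, b, m=0.1, rng=rng))
```

Because crossover only rearranges bits that already exist in the parents and mutation changes at most one position, neither operator alone can move far from a pool of two or three unique samples, which matches the limited-novelty concern raised above.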

3.3.2. Variant 2: GAN-Based Data Generation

GAN Architecture

The second variant employed a GAN. GANs consist of two neural networks: a generator G that produces synthetic samples and a discriminator D that evaluates their authenticity. Through adversarial training, the generator learns the distribution of the real data and produces highly realistic synthetic records. The GAN architecture is shown in Figure 5. Several attack classes in our dataset contain extremely few unique samples (Spoofing Gas: two samples; Spoofing Steering Wheel: three samples). Such scarcity creates two primary risks for GAN-based data generation: (i) memorization/replication, where the generator reproduces training samples instead of synthesizing novel ones, and (ii) mode collapse, where the generator outputs a narrow set of nearly identical samples. Both effects reduce the utility of synthetic data for training robust detectors and can produce over-optimistic evaluation results if not handled properly.
To mitigate the challenges of generating synthetic data from very limited unique samples, we incorporated several mechanisms into our GAN framework. First, Gaussian noise was injected both into the latent input vector $z \sim \mathcal{N}(0, 1)$ and into intermediate generator layers, encouraging exploration of a broader feature space and reducing direct memorization. Additionally, dropout was applied within the generator's hidden layers to prevent reliance on a few dominant activations, thereby promoting distributed representation learning. To further ensure diversity, a minibatch discrimination module was employed in the discriminator, which penalized identical or near-identical outputs and improved intra-batch variability. For classes with extremely few real samples, we also applied controlled oversampling by introducing minor bit-flip perturbations to real samples prior to GAN training. This enriched the training signal while preserving the realism of the data.
Finally, training was carefully monitored using diversity metrics such as uniqueness ratio and pairwise Hamming distance, with early stopping applied when replication began to increase. Collectively, these mechanisms acted as safeguards against mode collapse and over-memorization, enabling the generation of more diverse and high-fidelity samples even with very limited original data. The GAN was implemented as a neural network due to the tabular and binary nature of the features:
Figure 5. Architecture of GAN-based synthetic data generation.
  • Generator (G): Input $z \sim \mathcal{N}(0, 1)$ of dimension 64; two hidden layers (128 and 64 neurons) with ReLU activation; output layer with sigmoid activation producing binary-like feature vectors.
  • Discriminator (D): Input dimension equal to feature vector size; two hidden layers (128 and 64 neurons) with LeakyReLU activation; output layer with sigmoid activation producing probability of "real" vs. "synthetic."
  • Optimization: Both G and D were trained with the Adam optimizer ($\eta_G = 0.0002$, $\eta_D = 0.0002$, $\beta_1 = 0.5$, $\beta_2 = 0.999$).
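The diversity monitoring mentioned above (uniqueness ratio and pairwise Hamming distance) can be illustrated with a short sketch. The exact definitions used during training are not given in the text, so the two functions below are plausible readings and should be treated as assumptions: the uniqueness ratio is taken as the fraction of distinct samples in a batch, and the Hamming metric as the mean number of differing bits over all sample pairs.

```python
from itertools import combinations


def uniqueness_ratio(samples):
    """Fraction of generated samples that are distinct (1.0 means no repeats)."""
    return len({tuple(s) for s in samples}) / len(samples)


def mean_pairwise_hamming(samples):
    """Average number of differing bits over all unordered sample pairs."""
    pairs = list(combinations(samples, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


# Hypothetical batch of binarized generator outputs (needs at least two samples).
batch = [[0, 0, 1], [0, 0, 1], [1, 1, 1]]
print(uniqueness_ratio(batch))       # 2 distinct samples out of 3
print(mean_pairwise_hamming(batch))  # pair distances 0, 2, 2
```

A falling uniqueness ratio or a Hamming distance approaching zero would signal replication or mode collapse, which is the condition under which early stopping is described as being applied.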

3.3.3. Training Procedure

Training followed the standard adversarial process in Algorithm 4: in each iteration, D was updated to better distinguish real from synthetic samples, and G was updated to fool D. Convergence was monitored using the discriminator loss and the diversity of generated samples. Adversarial training between the generator and discriminator encouraged samples that were both realistic (fidelity) and diverse (variation), even from a very small pool of training samples. Noise injection, minibatch discrimination, and feature-level regularization acted as safeguards against mode collapse and direct memorization of the two or three original samples. As a result, the GAN was able to produce diverse, high-fidelity samples even for classes with very limited original data, thereby enhancing the robustness of downstream models.
Algorithm 4 Training Procedure for GAN-Based Data Generation
Require: Generator $G$, Discriminator $D$, learning rates $\eta_G$, $\eta_D$, noise prior $p_z(z)$, real data distribution $p_{\text{data}}(x)$
Ensure: Trained $G$ and $D$
 1: Initialize $G$ and $D$ with random weights
 2: repeat
 3:    Update Discriminator $D$:
 4:    Sample minibatch of real data $\{x_1, \dots, x_m\} \sim p_{\text{data}}(x)$
 5:    Sample minibatch of noise $\{z_1, \dots, z_m\} \sim p_z(z)$
 6:    Generate fake samples $\{G(z_1), \dots, G(z_m)\}$
 7:    Compute loss $L_D = -\frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x_i) + \log\left(1 - D(G(z_i))\right) \right]$
 8:    Update $D \leftarrow D - \eta_D \nabla_{\theta_D} L_D$
 9:    Update Generator $G$:
10:    Sample minibatch of noise $\{z_1, \dots, z_m\} \sim p_z(z)$
11:    Generate fake samples $\{G(z_1), \dots, G(z_m)\}$
12:    Compute loss $L_G = -\frac{1}{m} \sum_{i=1}^{m} \log D(G(z_i))$
13:    Update $G \leftarrow G - \eta_G \nabla_{\theta_G} L_G$
14: until convergence or max epochs
15: return Trained generator $G$

3.4. Binary-to-Text Encoding Scheme

Following the construction of the synthetic datasets, the next step involved applying Base64 encoding to the textual inputs. This transformation converts raw data into a standardized ASCII representation, ensuring that special characters and non-printable elements are consistently preserved across all samples. Such encoding is particularly useful when preparing inputs for transformer-based models like RoBERTa, as it guarantees a uniform format prior to tokenization. By performing this step, potential parsing issues were minimized, and the dataset was made fully compatible with subsequent RoBERTa preprocessing.
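A minimal sketch of this encoding step is shown below using Python's standard `base64` module. How each record is serialized to text before encoding is not specified above, so joining the bits into a string of "0"/"1" characters is an assumption made for illustration, as are the function names.

```python
import base64


def row_to_base64(bits):
    """Serialize a binary feature vector as text, then Base64-encode it."""
    text = "".join(str(b) for b in bits)  # e.g. [0, 1, 1, 0] -> "0110"
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def base64_to_row(encoded):
    """Invert row_to_base64 to confirm that no information is lost."""
    text = base64.b64decode(encoded).decode("ascii")
    return [int(ch) for ch in text]


encoded = row_to_base64([0, 1, 1, 0])
print(encoded)                 # MDExMA==
print(base64_to_row(encoded))  # [0, 1, 1, 0]
```

The output uses only the Base64 alphabet (letters, digits, "+", "/", and "=" padding), so every sample reaches the tokenizer in the same ASCII-only form.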

3.5. Train–Test Split

The dataset was divided into training and testing sets as shown in Figure 6. A total of 82,500 samples were used, with 62,500 for training and 20,000 for testing. The training set consists of 30,000 benign samples and 32,500 spoofing samples (equally distributed across GAS, RPM, Steering Wheel, and Speed). The testing set contains 8000 benign samples, 8000 spoofing samples (2000 per subclass), and 4000 DoS samples.
Figure 6. Train–test split.
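The split can be cross-checked with a short tally. The per-subclass training count is derived here from the stated equal distribution of 32,500 spoofing samples over four subclasses; the dictionary keys are illustrative labels rather than names taken from the dataset files.

```python
SPOOFING_SUBCLASSES = ("GAS", "RPM", "STEERING_WHEEL", "SPEED")

# Training set: benign plus spoofing only (DoS is withheld as the zero-day class).
train = {"BENIGN": 30_000}
train.update({name: 32_500 // len(SPOOFING_SUBCLASSES) for name in SPOOFING_SUBCLASSES})

# Test set: benign, 2000 samples per spoofing subclass, and the unseen DoS class.
test = {"BENIGN": 8_000, "DoS": 4_000}
test.update({name: 2_000 for name in SPOOFING_SUBCLASSES})

train_total = sum(train.values())
test_total = sum(test.values())
zero_day_share = test["DoS"] / test_total

print(train["GAS"])              # 8125 samples per spoofing subclass
print(train_total, test_total)   # 62500 20000
print(train_total + test_total)  # 82500
print(zero_day_share)            # 0.2
```

Under these counts, DoS traffic makes up 20% of the test set and is entirely absent from training, which is what makes the evaluation a zero-day scenario.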

Synthetic Dataset Variants

To ensure consistency, the same train–test distribution was maintained for both synthetic data generation techniques. Two separate datasets were created:
  • Variant 1: Generated using a combination of pattern-based generation, crossover, and mutation strategies.
  • Variant 2: Generated using a Generative Adversarial Network (GAN), which learns the real data distribution to produce realistic synthetic samples.
This resulted in two synthetic datasets with identical sample distribution, allowing for a fair comparison of the two generation methods.

3.6. Proposed ZDBERTa

The complete proposed methodology can be visualized in Figure 7.
Figure 7. Proposed methodology: Workings of ZDBERTa.

3.6.1. BERT

Bidirectional Encoder Representations from Transformers (BERT) [28] was the first model to capture bidirectional context using a transformer encoder, unlike ELMo (bi-LSTMs) or GPT (left-to-right). BERT-base consists of 12 encoder layers with multi-head attention, pretrained on BookCorpus and Wikipedia via two tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). It employs WordPiece tokenization (30k-token vocabulary) and uses the special tokens [CLS], [SEP], [MASK], and [PAD].

3.6.2. RoBERTa

RoBERTa (Robustly Optimized BERT Pretraining Approach) [29] retains BERT's architecture but optimizes training. Key changes include removing NSP, applying dynamic masking, using larger mini-batches, adopting a 50k-token byte-level BPE vocabulary, and training on a much larger corpus (160 GB vs. 16 GB). These adjustments enable it to consistently outperform BERT on NLP benchmarks.

Architectural Comparison

BERT and RoBERTa share the same transformer encoder design; their differences lie in training strategies. RoBERTa drops NSP, employs dynamic masking, uses larger datasets and batches, and adopts byte-level BPE, leading to stronger performance without altering the core architecture.

3.6.3. ZDBERTa Method

  • Tokenization: Input text is first processed using a byte-level Byte-Pair Encoding (BBPE) tokenizer. This converts text into subword tokens, allowing effective handling of rare and unseen words.
  • Embedding Layer:
    • Each token is converted into a numerical vector using RoBERTa’s embedding layer.
    • Positional embeddings are added so that the model can understand the order of tokens.
  • Transformer Encoder Layers:
    • Multiple transformer layers process the input representations.
    • Each layer employs multi-head self-attention to capture dependencies between tokens.
    • Feed-forward networks are applied to learn higher-level abstract features.
    • The final output is a contextualized representation of the entire sequence.
  • CLS Token: After processing through all transformer layers, the [CLS] token embedding captures the complete contextual information of the input text and acts as a fixed-size feature vector representing the entire input sequence.
  • Pooling Layer:
    • From the encoder’s final hidden states, a condensed representation is derived.
    • Common approaches include using the [CLS] token embedding or mean pooling.
    • This pooled vector serves as a summary of the input sequence.
  • Dense (Fully Connected) Layer:
    • The pooled vector is passed through a fully connected dense layer.
    • This maps the high-dimensional representation into a lower-dimensional, task-specific space.
  • Activation Function:
    • A non-linear activation function, e.g., ReLU, is applied.
    • This introduces non-linearity, which enhances the model’s ability to learn complex patterns.
  • Output Layer:
    • The final classification layer generates prediction probabilities.
    • Sigmoid activation is used for binary classification.
    • Example classes: 0 = Benign and 1 = Attack.
  • Zero-Day Attack Detection (Zero-Shot Learning):
    • During training, the model is exposed only to benign and known attack classes (e.g., spoofing).
    • At test time, unseen attack classes (e.g., DoS) are introduced using semantic descriptions.
    • RoBERTa, in combination with zero-shot classification, leverages its learned representations and semantic understanding to recognize and classify previously unseen zero-day attacks.
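The classification head described above (pooling, dense layer, ReLU, and sigmoid output) can be illustrated with a toy forward pass. This is a dependency-free sketch, not the ZDBERTa implementation: the encoder outputs are replaced by a hypothetical 2-token, 2-dimensional matrix, the weights are hand-picked rather than learned, mean pooling is chosen among the two pooling options listed, and the 0.5 decision threshold is an assumption.

```python
import math


def mean_pool(hidden_states):
    """Average token vectors into one fixed-size sequence vector."""
    n_tokens = len(hidden_states)
    return [sum(column) / n_tokens for column in zip(*hidden_states)]


def dense(vector, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(hidden_states, w1, b1, w2, b2, threshold=0.5):
    """Pooling -> dense -> ReLU -> output -> sigmoid; returns (probability, label)."""
    pooled = mean_pool(hidden_states)
    features = relu(dense(pooled, w1, b1))
    prob = sigmoid(dense(features, [w2], [b2])[0])
    return prob, int(prob >= threshold)  # 0 = Benign, 1 = Attack


# Hypothetical encoder output (2 tokens, hidden size 2) and hand-picked weights.
hidden = [[1.0, 2.0], [3.0, 4.0]]
prob, label = classify(hidden, w1=[[1.0, 0.0], [0.0, -1.0]], b1=[0.0, 0.0],
                       w2=[1.0, 1.0], b2=-1.0)
print(round(prob, 3), label)
```

Because the head outputs a single attack probability, an unseen attack type such as DoS is flagged whenever its representation falls on the attack side of the learned boundary, without needing a dedicated DoS class.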

4. Implementation Setup, Evaluation Results, and Discussion

In this section, we discuss the implementation setup, evaluation parameters, and results of both variants. Then, we also provide a comparison with the state-of-the-art methods.

4.1. Hardware and Training Configuration

For experimentation, the CICIoV2024 dataset was employed, where the training set only included benign traffic and spoofing-based attack samples. All experiments were carried out on a computing environment with the specifications summarized in Table 7.
Table 7. Hardware and Training Configuration.
This configuration was chosen to simulate realistic constraints in terms of both hardware resources and available training data, ensuring a fair and reproducible evaluation of the proposed ZDBERTa framework.

4.2. Experiment Design

To evaluate the effectiveness of the proposed ZDBERTa model, we conducted two experiments using synthetically generated datasets. The experiments were designed to assess the model’s ability to detect both known and zero-day attacks under different data generation strategies. In both cases, we followed a standardized pipeline for preprocessing, model training, and evaluation.

4.2.1. Experiment 1

In the first experiment, the dataset was generated using a combination of pattern-based generation, crossover, and mutation strategies. We named it Variant 1. After preparing the datasets, we implemented the proposed ZDBERTa model. The process began with tokenization, where input text was converted into subword tokens using a byte-level BPE tokenizer to effectively handle rare and unseen words. Next, each token was mapped into numerical vectors through RoBERTa’s embedding layer, with positional embeddings added to preserve token order. These embeddings were then passed through multiple transformer encoder layers, which applied multi-head self-attention to capture dependencies between tokens and feed-forward networks to learn higher-level abstract features. The [CLS] token embedding was then used as a fixed-size sequence representation, followed by a pooling step (using either the [CLS] token or mean pooling) to obtain a condensed vector. This pooled vector was passed through a dense layer to reduce dimensionality, after which a ReLU activation function was applied to enable the learning of complex patterns. The final output classification layer, activated with a sigmoid function, produced probabilities for binary classification (0 = Benign, 1 = Attack). For zero-day attack detection, the model was trained only on benign and known attack classes (e.g., spoofing), and during testing, an unseen attack class (e.g., DoS) was introduced using semantic descriptions, enabling ZDBERTa to leverage RoBERTa’s contextual representations for recognizing novel attacks. Finally, we evaluated the model’s performance using accuracy, precision, recall, and F1-score, providing a comprehensive assessment of its effectiveness in detecting both known and unseen attacks.

4.2.2. Experiment 2

The second experiment followed the same training–testing protocol as Experiment 1 but employed a GAN to synthesize realistic and diverse attack samples. Variant 2 of the dataset is used in this experiment. The ZDBERTa model was applied using the identical pipeline (tokenization, embedding, transformer encoding, pooling, dense + ReLU, and sigmoid classifier). For zero-day evaluation, the model was again trained on benign and known classes, while an unseen attack (DoS) was introduced during testing. The evaluation metrics (accuracy, precision, recall, and F1-score) remained consistent to ensure comparability with Experiment 1.

4.3. Evaluation Parameters

To measure the effectiveness of the proposed approach, standard classification metrics were applied. The model’s predictions were compared against ground-truth labels to determine the frequency of correct and incorrect classifications. This comparison yields four fundamental outcomes:
  • True Positives (TP): Instances correctly identified as attacks.
  • True Negatives (TN): Instances correctly identified as benign traffic.
  • False Positives (FP): Benign samples that were incorrectly labeled as attacks.
  • False Negatives (FN): Attack samples that were mistakenly classified as benign.
Based on these outcomes, four widely used performance measures were computed: accuracy, precision, recall, and the F1-score. These provide a comprehensive view of the model’s capability in distinguishing between normal and malicious traffic.
  • Accuracy: The overall proportion of correct predictions, considering both benign and attack classes:
    $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
  • Precision: Indicates how many of the samples predicted as attacks were actually attacks:
    $\text{Precision} = \frac{TP}{TP + FP}$
  • Recall: Measures how many real attacks were successfully detected:
    $\text{Recall} = \frac{TP}{TP + FN}$
  • F1-score: Provides a balanced measure by combining precision and recall through their harmonic mean:
    $\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
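The four measures can be computed directly from the confusion-matrix counts, as in the sketch below. The function names and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 attacks caught, 2 missed, 1 false alarm, 9 benign passed.
print(classification_metrics(tp=8, tn=9, fp=1, fn=2))
```

As a consistency check on reported figures, `f1_from(0.99931, 0.77835)` evaluates to roughly 0.8751, in line with the 87.510% F1-score reported for Variant 1 from its precision and recall. High precision combined with lower recall, as in that case, means few false alarms but a meaningful share of missed attacks.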

4.4. Results and Discussion

This section presents the experimental results obtained from the two synthetic dataset variants. Each subsection reports the performance of the ZDBERTa models in terms of precision, accuracy, recall, and F1-score.

4.4.1. Variant 1: Pattern, Crossover, and Mutation-Based Dataset

For Variant 1, which uses a pattern, crossover, and mutation-based dataset, ZDBERTa achieves a precision of 99.931%, an accuracy of 86.677%, recall of 77.835%, and an F1-score of 87.510% (Table 8).
Table 8. Performance of ZDBERTa on Variant 1 dataset.

4.4.2. Variant 2: GAN-Based Dataset

On Variant 2, which leverages GAN-based synthetic data to simulate realistic attack scenarios, the model exhibits a significant improvement in performance, achieving an accuracy of 99.315%, precision of 99.957%, recall of 98.901%, and F1-score of 99.427% (Table 9). The GAN-based augmentation provides more diverse and representative attack instances, enabling ZDBERTa to generalize better and detect nearly all attack types with minimal false positives. The superior performance of GAN-based synthetic data is attributed to the adversarial training mechanism that enforces both realism and diversity. Unlike the simpler approaches in Variant 1 (pattern-, crossover-, and mutation-based generation), GANs continuously refine the generator through feedback from the discriminator. This iterative competition compels the generator to capture high-order feature dependencies and complex correlations that simpler probabilistic or rule-based methods cannot fully model.
Table 9. Performance of ZDBERTa on Variant 2 dataset.
The results indicate that a model trained on Variant 1, while computationally efficient to produce, may not fully generalize to unseen zero-day attacks because the pattern-, crossover-, and mutation-based generators cannot capture deeper latent structures. In contrast, Variant 2, built with GAN-based data, improved classifier robustness and provided stronger out-of-distribution generalization, making it more effective in simulating realistic zero-day conditions.
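To illustrate why crossover- and mutation-based generation tends to stay close to the seed data, the sketch below recombines and perturbs byte-level records. It is a hypothetical illustration, not the authors' generator: the 8-byte record layout, the function names (`crossover`, `mutate`, `synthesize`), the mutation rate, and the seed records are all assumptions.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover: take a prefix of `a` and the matching suffix of `b`."""
    point = rng.randrange(1, len(a))  # cut strictly inside the record
    return a[:point] + b[point:]


def mutate(record, rng, rate=0.1):
    """Replace each byte with a uniform random value in 0..255 with probability `rate`."""
    return [rng.randrange(256) if rng.random() < rate else byte for byte in record]


def synthesize(parents, n, seed=0):
    """Create `n` synthetic records by crossing two random parents, then mutating."""
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children


# Hypothetical 8-byte payloads; the values are illustrative, not taken from CICIoV2024.
parents = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [12, 34, 56, 78, 90, 12, 34, 56],
]
synthetic = synthesize(parents, n=5)
for row in synthetic:
    print(row)
```

Each child copies field values from two parents and perturbs a few bytes independently, so dependencies between fields are never modeled explicitly. A GAN generator, by contrast, is trained against a discriminator that can penalize implausible combinations of fields.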

4.5. Comparison with SOTA and Discussion

The experimental evaluation demonstrates the effectiveness of ZDBERTa across the two synthetic dataset variants and in comparison with state-of-the-art (SOTA) approaches. Table 10 compares ZDBERTa with existing SOTA methods on the CICIoV2024 dataset. ZDBERTa outperforms prior approaches, including LR, RF, DNN, and various zero-day-aware models, achieving an accuracy of 99.315% on Variant 2. The comparison highlights the advantage of combining a transformer-based architecture with zero-shot learning for detecting complex, previously unseen IoV attacks. Overall, these results support the robustness of ZDBERTa for practical IoV intrusion detection tasks.
Table 10. ZDBERTa comparison with SOTA.

5. Conclusions and Future Insights

This work introduced ZDBERTa, a zero-shot learning (ZSL)-based framework for detecting zero-day attacks in the IoV using the CICIoV2024 dataset. The model integrates a Byte-Pair Encoding (BPE) tokenizer, a multi-layer transformer encoder, and a classification head, enabling it to incorporate semantic knowledge for identifying previously unseen attacks. To simulate a realistic zero-day scenario, DoS attacks were excluded during training and introduced only at the testing stage. Results show that ZDBERTa achieved an accuracy of 86.677% on the synthetically generated Variant 1 dataset, highlighting its effectiveness in handling the inherent challenges of zero-day detection. To further enhance performance, a GAN was employed to construct the Variant 2 dataset, leading to a substantial improvement with an accuracy of 99.315%. Comparative analysis against state-of-the-art approaches demonstrates that ZDBERTa provides superior adaptability and robustness in detecting novel attack patterns. To the best of our knowledge, this is the first study to address zero-day attack detection in CICIoV2024, offering a new direction for IoV cybersecurity.
While ZDBERTa demonstrates strong potential for zero-day attack detection in IoV networks, several avenues remain for future research. First, extending the evaluation beyond the CICIoV2024 dataset to real-world vehicular traffic will help validate its generalizability under diverse operational environments. Second, although GAN-based augmentation substantially improved performance, exploring alternative generative models such as Variational Autoencoders (VAEs) or diffusion-based approaches may produce richer synthetic traffic patterns. Third, optimizing ZDBERTa for deployment on resource-constrained IoV devices through lightweight architectures or knowledge distillation can enhance its practical applicability. Finally, integrating explainable AI mechanisms to interpret ZDBERTa’s predictions could increase transparency, trust, and adoption in safety-critical transportation systems.

Author Contributions

Conceptualization, S.A.; Methodology, A.M. and S.A.; Software, A.M.; Validation, S.A. and M.H.Y.; Formal analysis, M.H.Y. and M.A.A.; Investigation, A.M.; Resources, M.H.Y. and M.A.A.; Writing—original draft, A.M. and S.A.; Writing — review & editing, S.A., M.H.Y. and M.A.A.; Visualization, A.M.; Supervision, S.A., M.H.Y. and M.A.A.; Project administration, S.A., M.H.Y. and M.A.A.; Funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Technology Innovation Research Group, School of Information Technology, Whitecliffe, Wellington 6145, New Zealand, grant number RG-002.

Data Availability Statement

The original data presented in the study are openly available in CICIoV2024 at https://www.unb.ca/cic/datasets/iov-dataset-2024.html (accessed on 20 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Menon, U.V.; Babu Kumaravelu, V.; Kumar, C.V.; Rammohan, A.; Chinnadurai, S.; Venkatesan, R.; Hai, H.; Selvaprabhu, P. AI-Powered IoT: A Survey on Integrating Artificial Intelligence with IoT for Enhanced Security, Efficiency, and Smart Applications. IEEE Access 2025, 13, 50296–50339. [Google Scholar] [CrossRef]
  2. Ali, J.; Kumar Singh, S.; Jiang, W.; Alenezi, A.M.; Islam, M.; Ibrahim Daradkeh, Y.; Mehmood, A. A deep dive into cybersecurity solutions for AI-driven IoT-enabled smart cities in advanced communication networks. Comput. Commun. 2025, 229, 108000. [Google Scholar] [CrossRef]
  3. Khezri, E.; Hassanzadeh, H.; Yahya, R.O.; Mir, M. Security Challenges in Internet of Vehicles (IoV) for ITS: A Survey. Tsinghua Sci. Technol. 2025, 30, 1700–1723. [Google Scholar] [CrossRef]
  4. Xu, B.; Zhao, J.; Wang, B.; He, G. Detection of zero-day attacks via sample augmentation for the Internet of Vehicles. Veh. Commun. 2025, 52, 100887. [Google Scholar] [CrossRef]
  5. Guo, Y. A review of Machine Learning-based zero-day attack detection: Challenges and future directions. Comput. Commun. 2023, 198, 175–185. [Google Scholar] [CrossRef]
  6. Arun, A.; Nair, A.S.; Sreedevi, A.G. Zero Day Attack Detection and Simulation through Deep Learning Techniques. In Proceedings of the 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 18–19 January 2024; pp. 852–857. [Google Scholar] [CrossRef]
  7. Ali, S.; Rehman, S.U.; Imran, A.; Adeem, G.; Iqbal, Z.; Kim, K.I. Comparative Evaluation of AI-Based Techniques for Zero-Day Attacks Detection. Electronics 2022, 11, 3934. [Google Scholar] [CrossRef]
  8. Sarhan, M.; Layeghy, S.; Gallagher, M.; Portmann, M. From zero-shot machine learning to zero-day attack detection. Int. J. Inf. Secur. 2023, 22, 947–959. [Google Scholar] [CrossRef]
  9. Neto, E.C.P.; Taslimasa, H.; Dadkhah, S.; Iqbal, S.; Xiong, P.; Rahman, T.; Ghorbani, A.A. CICIoV2024: Advancing realistic IDS approaches against DoS and spoofing attack in IoV CAN bus. Internet Things 2024, 26, 101209. [Google Scholar] [CrossRef]
  10. Amirudin, N.; Abdulkadir, S.J. Comparative Study of Machine Learning Algorithms using the CICIoV2024 Dataset. Platform J. Sci. Technol. 2024, 7, 1–8. [Google Scholar] [CrossRef]
  11. Jin, F.; Chen, M.; Zhang, W.; Yuan, Y.; Wang, S. Intrusion detection on internet of vehicles via combining log-ratio oversampling, outlier detection and metric learning. Inf. Sci. 2021, 579, 814–831. [Google Scholar] [CrossRef]
  12. Gao, Y.; Wu, H.; Song, B.; Jin, Y.; Luo, X.; Zeng, X. A Distributed Network Intrusion Detection System for Distributed Denial of Service Attacks in Vehicular Ad Hoc Network. IEEE Access 2019, 7, 154560–154571. [Google Scholar] [CrossRef]
  13. Yang, L.; Moubayed, A.; Hamieh, I.; Shami, A. Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
  14. Zhang, T.; Zhu, Q. Distributed Privacy-Preserving Collaborative Intrusion Detection Systems for VANETs. IEEE Trans. Signal Inf. Process. Netw. 2018, 4, 148–161. [Google Scholar] [CrossRef]
  15. Ullah, I.; Mahmoud, Q.H. A Technique for Generating a Botnet Dataset for Anomalous Activity Detection in IoT Networks. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 134–140. [Google Scholar] [CrossRef]
  16. Rosay, A.; Carlier, F.; Leroux, P. Feed-forward neural network for Network Intrusion Detection. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–6. [Google Scholar] [CrossRef]
  17. Lee, H.; Jeong, S.H.; Kim, H.K. OTIDS: A Novel Intrusion Detection System for In-vehicle Network by Using Remote Frame. In Proceedings of the 2017 15th Annual Conference on Privacy, Security and Trust (PST), Calgary, AB, Canada, 28–30 August 2017; pp. 57–5709. [Google Scholar] [CrossRef]
  18. Yu, T.; Hua, G.; Wang, H.; Yang, J.; Hu, J. Federated-LSTM based Network Intrusion Detection Method for Intelligent Connected Vehicles. In Proceedings of the ICC 2022—IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 4324–4329. [Google Scholar] [CrossRef]
  19. Liu, H.; Zhang, S.; Zhang, P.; Zhou, X.; Shao, X.; Pu, G.; Zhang, Y. Blockchain and Federated Learning for Collaborative Intrusion Detection in Vehicular Edge Computing. IEEE Trans. Veh. Technol. 2021, 70, 6073–6084. [Google Scholar] [CrossRef]
  20. Shu, J.; Zhou, L.; Zhang, W.; Du, X.; Guizani, M. Collaborative Intrusion Detection for VANETs: A Deep Learning-Based Distributed SDN Approach. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4519–4530. [Google Scholar] [CrossRef]
  21. Kumar, R.; Kumar, P.; Tripathi, R.; Gupta, G.P.; Kumar, N.; Hassan, M.M. A Privacy-Preserving-Based Secure Framework Using Blockchain-Enabled Deep-Learning in Cooperative Intelligent Transport System. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16492–16503. [Google Scholar] [CrossRef]
  22. Pour, M.S.; Mangino, A.; Friday, K.; Rathbun, M.; Bou-Harb, E.; Iqbal, F.; Samtani, S.; Crichigno, J.; Ghani, N. On data-driven curation, learning, and analysis for inferring evolving internet-of-Things (IoT) botnets in the wild. Comput. Secur. 2020, 91, 101707. [Google Scholar] [CrossRef]
  23. Almutlaq, S.; Derhab, A.; Hassan, M.M.; Kaur, K. Two-Stage Intrusion Detection System in Intelligent Transportation Systems Using Rule Extraction Methods From Deep Neural Networks. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15687–15701. [Google Scholar] [CrossRef]
  24. Driss, M.; Almomani, I.; e Huma, Z.; Ahmad, J. A federated learning framework for cyberattack detection in vehicular sensor networks. Complex Intell. Syst. 2022, 8, 4221–4235. [Google Scholar] [CrossRef]
  25. Korba, A.A.; Boualouache, A.; Ghamri-Doudane, Y. Zero-X: A Blockchain-Enabled Open-Set Federated Learning Framework for Zero-Day Attack Detection in IoV. IEEE Trans. Veh. Technol. 2024, 73, 12399–12414. [Google Scholar] [CrossRef]
  26. Xu, B.; Wang, B.; Chen, X.; Zhao, J.; He, G. A CGAN-based Few-shot Method for Zero-day Attack Detection in the Internet of Vehicles. In Proceedings of the 2023 Eleventh International Conference on Advanced Cloud and Big Data (CBD), Danzhou, China, 18–19 December 2023; pp. 98–103. [Google Scholar] [CrossRef]
  27. Yang, L.; Moubayed, A.; Shami, A. MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles. IEEE Internet Things J. 2022, 9, 616–632. [Google Scholar] [CrossRef]
  28. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers). Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  29. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]