Article

A Systematic Ablation Study of GAN-Based Minority Augmentation for Intrusion Detection on UWF-ZeekData22

1 Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA
2 Department of Mathematics and Statistics, The University of West Florida, Pensacola, FL 32514, USA
3 Department of Cybersecurity, The University of West Florida, Pensacola, FL 32514, USA
* Author to whom correspondence should be addressed.
Electronics 2026, 15(6), 1291; https://doi.org/10.3390/electronics15061291
Submission received: 24 February 2026 / Revised: 16 March 2026 / Accepted: 17 March 2026 / Published: 19 March 2026
(This article belongs to the Special Issue Intelligent Solutions for Network and Cyber Security)

Abstract

Generative adversarial networks (GANs) are increasingly applied to mitigate extreme class imbalance in intrusion detection systems, yet reported improvements often obscure the roles of augmentation intensity and adversarial stability. This paper presents a controlled ablation study that isolates the impact of adversarial objective choice, augmentation ratio, and training duration on GAN-based minority data augmentation for highly imbalanced tabular cybersecurity data. Using the UWF-ZeekData22 dataset, nine MITRE ATT&CK tactic-versus-benign classification tasks are evaluated under augmentation ratios of 0.25 and 0.50 and training durations of 400 and 800 epochs. Four GAN variants—Vanilla GAN, Conditional GAN (cGAN), WGAN, and WGAN-GP—are assessed using stratified cross-validation and five classical classifiers representing diverse inductive biases. The results reveal consistent structural patterns. Moderate augmentation (r = 0.25) with controlled training (400 epochs) yields the most stable and reliable improvement in minority recall. Wasserstein-based objectives demonstrate superior stability under aggressive augmentation and prolonged training, while conditional GANs frequently exhibit recall collapse in ultra-sparse regimes. Increasing augmentation volume does not uniformly improve performance and may introduce distributional overlaps that degrade linear and margin-based classifiers. Tree-based classifiers remain largely invariant once sufficient minority density is achieved. These findings demonstrate that adversarial calibration is more important than architectural complexity for improving the detection of rare attacks. The study provides practical guidance for designing robust GAN-based augmentation pipelines under extreme cybersecurity class imbalance.

1. Introduction

Network intrusion detection systems (IDSs) increasingly rely on supervised machine learning models trained on network flow telemetry. However, in operational environments, such data are characterized by extreme class imbalance, in which benign traffic dominates, whereas many cyberattack behaviors occur rarely but pose high operational risk. As a result, classifiers trained on imbalanced data often achieve deceptively high overall accuracy while failing to detect rare attack tactics, yielding unacceptably high false-negative rates [1,2,3]. This phenomenon is closely related to the base-rate fallacy in intrusion detection, in which rare events are systematically misclassified despite high aggregate accuracy [4].
Numerous strategies have been proposed to mitigate class imbalance in intrusion detection, including cost-sensitive learning, ensemble methods, and data-level resampling techniques [1,2]. Classical oversampling approaches, such as SMOTE and its variants, generate synthetic minority samples via interpolation [5,6]. Most recently, generative adversarial networks (GANs) have attracted increasing attention due to their capacity to learn complex minority-class distributions and generate synthetic samples for augmentation [7]. While several studies report improved detection performance using GAN-generated data [8,9], these improvements are often presented at an aggregated level, without isolating how adversarial objective design, augmentation intensity, and training dynamics influence downstream classifier behavior.
GAN-based augmentation is not a monolithic intervention. Its effectiveness depends on several interacting factors, including the adversarial loss formulation, augmentation ratio, and training duration. Theoretical and empirical studies have demonstrated that GAN stability and convergence behavior vary significantly across objective functions and training schemes [10,11,12,13,14,15]. In highly imbalanced tabular cybersecurity data, these factors can determine whether augmentation improves minority recall or instead introduces distributional overlap that degrades classifier generalization. However, most prior work evaluates GAN augmentation holistically, limiting insight into which design choices are critical for stability under extreme sparsity.
This paper presents a systematic ablation study of GAN-based minority data augmentation for intrusion detection. Rather than proposing new generative architectures, commonly used GAN variants—Vanilla GAN [7], Conditional GAN (cGAN) [16], Wasserstein GAN (WGAN) [10], and Wasserstein GAN with Gradient Penalty (WGAN-GP) [11]—are used as controlled experimental instruments to analyze adversarial stability under extreme class imbalance. Using the UWF-ZeekData22 dataset [17,18], multiple MITRE ATT&CK tactics are modeled as independent binary classification problems. This study evaluates how adversarial objective formulation, augmentation ratio (0.25 vs. 0.50), and training duration (400 vs. 800 epochs) influence downstream detection performance across five classical classifier families, including Support Vector Machines (SVM) [19], k-Nearest Neighbors (KNN) [20], Random Forests (RF) [21], and related statistical learning models [22,23].
Understanding these interactions is critical for both research reproducibility and operational deployment. In real-world IDS environments, excessive false positives impose high operational costs, while missed attacks pose serious security risks [3,4]. Uncontrolled or overly aggressive data augmentation can distort minority-class structure, increase class overlap, and degrade recall. By analyzing performance patterns consistently across adversarial objectives, augmentation intensities, and training durations, the study provides empirically grounded guidance for designing robust GAN-based augmentation pipelines for tabular cybersecurity data.
The main contributions of this work are as follows:
  • A structured ablation framework that isolates the impact of adversarial objective choice (Vanilla GAN, cGAN, WGAN, WGAN-GP) under extreme class imbalance.
  • A systematic evaluation of augmentation-ratio sensitivity (0.25 vs. 0.50) across multiple MITRE ATT&CK tactics and classifier families.
  • An empirical analysis of training-duration effects (400 vs. 800 epochs) on adversarial stability and downstream recall.
  • Identification of consistent failure modes—including conditional recall collapse and smoothing under heavy augmentation—that are obscured in aggregated performance reporting.
The remainder of this paper is organized as follows. Section 2 reviews related work on GAN variants, tabular GANs, and GAN-based approaches in cybersecurity. Section 3 describes the dataset and preprocessing steps. Section 4 presents the ablation dimensions used to analyze GAN-based data augmentation. Section 5 details the experimental design. Section 6 reports the results across GAN variants. Section 7 discusses the implications of the findings, Section 8 concludes the manuscript, and Section 9 provides directions for future research.

2. Related Works

2.1. Class Imbalance in Intrusion Detection

Class imbalance has long been recognized as a fundamental challenge in intrusion detection systems, where benign network traffic dominates and attack instances, particularly sophisticated or targeted behaviors, occur infrequently. Early work highlighted the base-rate fallacy, demonstrating that even detectors with high nominal accuracy can become ineffective in low-prevalence settings due to excessive false positives and missed detections [4]. Subsequent studies have consistently shown that standard supervised learning algorithms trained on imbalanced data tend to favor majority classes, resulting in poor recall for rare but operationally critical attacks [1,2].
To mitigate this issue, prior research has explored cost-sensitive learning, ensemble-based methods, and data-level resampling strategies [1,2]. While these approaches can improve minority-class detection in some scenarios, their effectiveness is often constrained by the heterogeneity, high dimensionality, and evolving nature of network traffic data. These limitations have motivated increasing interest in data-driven augmentation techniques that aim to enrich minority-class representation directly.

2.2. Oversampling and Synthetic Data Generation

Traditional oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants, generate synthetic samples through interpolation in feature space [5,24,25]. Although effective in certain domains, these methods may produce unrealistic or ambiguous samples when applied to complex, high-dimensional tabular data such as network telemetry, where feature dependencies, mixed data types, and discrete attributes are common, a limitation widely documented in imbalanced learning studies [6].
Several extensions to SMOTE have been proposed to improve synthetic sample generation in imbalanced datasets [26,27,28,29]. For example, Adaptive Synthetic Sampling (ADASYN) [29] focuses on generating synthetic samples in regions where the minority class is more difficult to learn, thereby adapting the distribution of generated samples based on the learning difficulty of minority instances. Another widely used technique is Borderline-SMOTE [30], which concentrates on minority samples located near the decision boundary between classes. Rather than oversampling the entire minority distribution, Borderline-SMOTE identifies minority instances that are surrounded by majority-class neighbors and generates synthetic samples in these critical boundary regions to strengthen class separation [30,31].
Additionally, hybrid techniques combining oversampling with outlier detection have been explored. One such approach integrates Local Outlier Factor (LOF) with SMOTE [26,27,28] to identify noisy or anomalous samples before generating synthetic data. This method, often referred to as SMOTE-LOF, attempts to improve the quality of generated samples by reducing the influence of outliers during the oversampling process [27,28].
While these interpolation-based techniques can improve classification performance in many imbalanced learning problems, they remain limited by their reliance on local linear relationships between samples. In cybersecurity contexts, naïve synthetic oversampling can inadvertently amplify noise, distort traffic semantics, or increase class overlap, thereby degrading classifier robustness. These limitations have prompted interest in generative models that can learn richer minority-class distributions. Among these, generative adversarial networks (GANs) have emerged as a flexible framework for data synthesis, offering the ability to approximate complex data manifolds through adversarial training between a generator and a discriminator [7].

2.3. GANs for Intrusion Detection and Data Augmentation

GANs have been applied to intrusion detection in multiple roles, including attack traffic generation, adversarial training, and minority-class data augmentation. Several studies report improvements in detection performance when GAN-generated samples are used to balance training data, particularly for rare or underrepresented attack categories [8,9].
Conditional GANs (cGANs) extend the original GAN formulation by incorporating label information to guide sample generation [16], enabling class-specific synthesis. Wasserstein-based objectives, including WGAN and WGAN-GP, have been proposed to improve training stability and convergence by reformulating the adversarial loss in terms of the Wasserstein distance and enforcing Lipschitz continuity constraints [10,11]. More broadly, theoretical and empirical analyses have highlighted convergence challenges and stability trade-offs in adversarial training [12,13].
Despite these reported gains, most GAN-based intrusion detection studies evaluate a single configuration or emphasize architectural novelty rather than methodological analysis. Performance is typically reported at an aggregate level, making it difficult to determine how individual pipeline choices—such as augmentation scale, training data composition, or regularization strategies—contribute to observed improvements. Consequently, the reproducibility and generalizability of GAN-based augmentation approaches in intrusion detection remain open questions.

2.4. Sensitivity and Failure Modes in GAN Training

Beyond cybersecurity, prior work has demonstrated that GAN performance is highly sensitive to hyperparameters, regularization schemes, and data composition. Large-scale empirical studies show that different GAN objectives can achieve comparable performance when carefully tuned, whereas poorly chosen configurations may lead to mode collapse, training instability, or the introduction of distributional artifacts [32]. Analyses of GAN evaluation metrics further indicate that improvements in adversarial loss or visual fidelity do not necessarily translate into improved downstream task performance [33].
From a methodological standpoint, ablation studies are widely recognized as essential for isolating causal factors in machine learning systems and ensuring reliable empirical conclusions. However, systematic ablation analyses that disentangle adversarial objective choice, augmentation intensity, and training duration remain relatively limited in the context of tabular data and extreme class imbalance.

2.5. Positioning of This Study

In contrast to prior studies, this work treats commonly used GAN formulations—Vanilla GAN [7], cGAN [16], WGAN [10], and WGAN-GP [11]—as experimental instruments within a controlled ablation framework. The objective is not to identify a universally superior adversarial loss, but to analyze how key augmentation design choices behave across different GAN instantiations under extreme class imbalance.
By systematically isolating factors such as augmentation ratio, training data composition, conditioning mechanisms, discriminator regularization, and synthetic sample integration strategies, this study complements existing GAN-based intrusion detection research with a data-centric and methodological perspective. The resulting insights are intended to support reproducible experimentation and inform the practical deployment of GAN-based augmentation pipelines in intrusion detection systems.
To better contextualize the contribution of prior work on class-imbalance handling and GAN-based intrusion detection, Table 1 summarizes representative studies, their techniques, datasets, and primary contributions.

3. Dataset Characteristics and Preparation

3.1. Overview of UWF-ZeekData22

This study uses the UWF-ZeekData22 dataset [17,18], which consists of Zeek-derived network flow records annotated with MITRE ATT&CK tactic labels. The dataset is designed to reflect operational network environments and exhibits extreme class imbalance, with benign traffic and low-level reconnaissance activity dominating, while advanced attack behaviors occur rarely.
Such an imbalance poses a challenging setting for intrusion detection research and motivates the use of data-centric augmentation strategies to improve minority-class representation without distorting majority-class behavior.

3.2. Distribution of ATT&CK Tactics

Attack behaviors in the dataset are unevenly distributed across MITRE ATT&CK tactics. Reconnaissance-related activity accounts for the majority of malicious traffic [17,37]. Credential Access [38], Privilege Escalation [39], Exfiltration [40], Lateral Movement [41], Resource Development [42], Initial Access [43], Persistence [44], Defense Evasion [45], and Discovery [46] appear only infrequently.
Several tactics occur fewer than ten times across the entire dataset [17,18]. This extreme sparsity renders conventional supervised learning unreliable and provides a natural testbed for examining the behavior of GAN-based augmentation under severe data scarcity.

3.3. Class-Specific Data Construction

To enable controlled analysis, each ATT&CK tactic is treated independently and formulated as a binary classification problem. For a given tactic, all available samples corresponding to that tactic are retained as the minority class, while benign traffic constitutes the majority class.
GANs are trained exclusively on minority-class samples for each tactic. No benign data are used during generative training, ensuring that learned distributions reflect only the intrinsic characteristics of the targeted attack behavior. This class-specific design avoids contamination from majority-class patterns and enables consistent comparison across ablation dimensions.

3.4. Data Reduction Strategy

The complete UWF-ZeekData22 dataset comprises more than 18 million network flow records and exhibits extreme class imbalance across MITRE ATT&CK tactics [17,18]. Benign traffic and reconnaissance-related activity dominate the dataset, while several advanced attack tactics are represented by only a very small number of labeled samples.
To support repeated GAN training and systematic classifier evaluation under practical computational constraints, a reduced yet representative working dataset is constructed. All available samples corresponding to rare and underrepresented ATT&CK tactics are retained in full. To limit dataset scale while preserving realistic imbalance characteristics, the two dominant classes—benign traffic and reconnaissance activity—are downsampled using reproducible stratified sampling procedures.
This reduction strategy prevents high-frequency classes from overwhelming both generative and discriminative learning processes, while maintaining sufficient diversity to reflect operational network conditions. Because the objective of GAN-based augmentation in this study is to model minority-class feature distributions rather than leverage the absolute abundance of benign traffic, the resulting dataset remains highly imbalanced yet well-suited for controlled generative modeling and downstream evaluation. The final working dataset contains approximately 400,000 samples.
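The reduction step described above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: records are assumed to be Python dicts with a label field, and the per-class caps (which classes to downsample, and to what size) are hypothetical parameters.

```python
import random
from collections import defaultdict

def stratified_downsample(records, label_key, caps, seed=0):
    """Downsample dominant classes to a per-class cap while keeping rare
    classes intact, using a fixed seed for reproducibility.

    records   : list of dicts representing flow records
    label_key : dict key holding the class label
    caps      : {label: max_count} for classes to reduce; all other
                classes are retained in full
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    reduced = []
    for label, recs in by_label.items():
        if label in caps and len(recs) > caps[label]:
            reduced.extend(rng.sample(recs, caps[label]))  # downsample dominant class
        else:
            reduced.extend(recs)  # keep rare tactic samples in full
    return reduced
```

With a fixed seed, repeated runs produce the same working dataset, which is what makes the downstream ablation comparisons reproducible.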

3.5. Feature Preprocessing and Representation

Prior to model training, features serving primarily as identifiers or contextual metadata (e.g., connection IDs, timestamps, IP addresses) are removed to reduce noise and dimensionality. Boolean attributes are converted to numeric form, and missing values are handled using feature-aware imputation strategies.
Categorical features are encoded using label-based encoding schemes that are applied consistently across real and synthetic data. Feature scaling is performed using Min–Max normalization to map all features to the range [−1, 1], a range commonly adopted to promote numerical stability in GAN training [7].
All preprocessing operations are performed exclusively within the training portion of each cross-validation fold, and the resulting transformations are applied unchanged to validation data to prevent information leakage.
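A minimal sketch of the fold-local scaling discipline, assuming NumPy arrays; the class name is illustrative. The scaler is fitted only on the training fold, and the same fitted parameters are then applied unchanged to validation data:

```python
import numpy as np

class MinMaxSymmetric:
    """Min-max scaler to [-1, 1], fitted on the training fold only so that
    validation data are transformed with training statistics (no leakage)."""

    def fit(self, X):
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.range_ = np.where(span == 0, 1.0, span)  # guard constant features
        return self

    def transform(self, X):
        return 2.0 * (X - self.min_) / self.range_ - 1.0
```

Note that validation values outside the training range map outside [-1, 1]; this is the expected behavior when transformations are frozen after fitting on the training fold.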

3.6. Data Splits and Evaluation Context

For each ATT&CK tactic, GANs are trained using only minority-class samples from the training set. Synthetic samples are generated after GAN training is complete and are incorporated exclusively into training folds during downstream classifier evaluation.
Classifier performance is assessed using a consistent cross-validation pipeline to ensure reliable model evaluation and prevent optimistic bias [47]. Both generative and discriminative components are evaluated under identical data partitions and experimental conditions.

4. Ablation Dimensions

A GAN-based data augmentation pipeline comprises multiple interacting components whose individual effects are often obscured when models are evaluated solely in an end-to-end manner. In intrusion detection under extreme class imbalance, these interactions can substantially affect the stability of adversarial training, the quality of synthetic samples, and downstream classification performance. To disentangle these effects, we conduct a systematic ablation study in which a controlled set of architectural and training-related parameters is varied independently while all other factors are held constant.
Each ablation is instantiated using four widely adopted GAN formulations—Vanilla GAN, Conditional GAN (cGAN), Wasserstein GAN (WGAN), and Wasserstein GAN with Gradient Penalty (WGAN-GP)—which are treated as experimental instruments rather than competing proposals. Unless otherwise stated, all GAN variants share the same multilayer perceptron architecture (two hidden layers with 128 units each), optimizer configuration, and preprocessing pipeline to ensure comparability across ablation settings. Synthetic samples are generated exclusively from minority-class data and incorporated only into training folds during cross-validation to prevent data leakage and biased performance estimates [47].

4.1. Augmentation Ratio

The augmentation ratio determines the number of synthetic minority samples generated relative to the majority-class size and serves as a primary mechanism for increasing minority-class density during training. For each ATT&CK tactic, the number of synthetic samples is defined as follows:
N_syn = r · |D_maj|
where N_syn denotes the number of generated synthetic minority samples, r is the augmentation ratio, and |D_maj| represents the size of the post-downsampling majority-class subset used in the final dataset.
Based on the experimental design, augmentation ratios
r ∈ {0.25, 0.50}
are evaluated under corresponding moderate and aggressive augmentation regimes. These settings are selected to increase minority-class density from extremely sparse levels to levels sufficient for stable adversarial training, while enabling analysis of classifier robustness as the pressure from synthetic data increases. To prevent uncontrolled dataset growth, the number of synthetic samples per class is optionally capped at a specified maximum, ensuring that augmentation remains proportional and comparable across minority classes.
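The sizing rule above, including the optional cap, reduces to a few lines. This sketch is illustrative; the function name and the sample cap value used in the test are assumptions, not values from the paper:

```python
def synthetic_sample_count(majority_size, ratio, cap=None):
    """Compute N_syn = r * |D_maj|, optionally capped to bound dataset growth.

    majority_size : size of the post-downsampling majority-class subset
    ratio         : augmentation ratio r (e.g., 0.25 or 0.50)
    cap           : optional maximum number of synthetic samples per class
    """
    n_syn = int(ratio * majority_size)
    return n_syn if cap is None else min(n_syn, cap)
```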

4.2. GAN Variant (Adversarial Objective)

A central ablation dimension in this study is the adversarial objective used to train the generator–discriminator pair. Four commonly used GAN formulations are evaluated:
  • Vanilla GAN, employing the standard minimax loss;
  • Conditional GAN (cGAN), incorporating label conditioning during generation;
  • Wasserstein GAN (WGAN), using the Wasserstein distance with weight clipping;
  • Wasserstein GAN with Gradient Penalty (WGAN-GP), enforcing the Lipschitz constraint via a gradient penalty.
These variants are not treated as competing models but as alternative instantiations through which the behavior of other ablation dimensions can be observed. This design allows assessment of whether observed effects are consistent across adversarial objectives or specific to particular formulations.

4.3. Training Duration

Training duration directly influences adversarial convergence behavior and the utility of generated samples. To examine sensitivity to training time, each GAN variant is trained for two epoch counts:
epochs ∈ {400, 800}
The lower setting captures early or partial convergence behavior, while the higher setting reflects extended training that may improve distributional alignment or, alternatively, exacerbate issues such as mode collapse or over-smoothing depending on the adversarial objective [12,13]. This ablation evaluates whether increased training duration yields consistent downstream benefits across GAN variants and augmentation regimes.

4.4. Latent Space Configuration

The dimensionality of the latent space controls the diversity and expressiveness of generated samples. In this study, the dimensionality of the latent noise vector is fixed at
dim(z) = 32
providing sufficient representational capacity while avoiding unnecessary variability in the ablation analysis. Latent vectors are sampled from a standard normal distribution, and this configuration is held constant across all GAN variants and training settings to isolate the effects of other ablation dimensions.

4.5. Regularization and Network Stability

To promote stable adversarial training while maintaining comparability, fixed dropout rates are applied to both the generator and discriminator networks. Dropout is set to 0.3 for both components across all GAN variants and training configurations.
For Wasserstein-based models, additional stability mechanisms are applied in accordance with standard formulations. Weight clipping is used for WGAN [10], while WGAN-GP employs a gradient penalty with a coefficient λ_GP = 10 [11]. All other architectural and optimization parameters, including batch size (64), critic update frequency, and learning rates, are held constant across ablation settings.
Table 2 summarizes the architectural configuration and training hyperparameters used across all GAN variants in the ablation study.
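The WGAN-GP gradient penalty referenced above can be sketched in PyTorch as follows. This is a generic implementation of the standard penalty from Gulrajani et al. [11], not the authors' code; the discriminator is assumed to be any `nn.Module` mapping feature vectors to a scalar score:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """WGAN-GP regularizer: penalizes (||grad_xhat D(xhat)||_2 - 1)^2 on
    random interpolates xhat between real and generated minority samples."""
    batch = real.size(0)
    eps = torch.rand(batch, 1, device=real.device)  # per-sample mixing weight
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_out = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```

During training, this term is added to the critic loss in place of the weight clipping used by plain WGAN.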

4.6. Data Scaling and Output Activation

All experiments employ a consistent feature-scaling strategy that maps numerical features to the range [−1, 1]. Generator output activation functions are selected to match this scaling regime, ensuring that synthetic samples lie within a valid feature range and can be integrated directly with real data during augmentation. This alignment between preprocessing and generative output supports stable adversarial training [7].

4.7. Rationale for the Ablation Design

By restricting the ablation space to experimentally grounded dimensions and evaluating them consistently across multiple GAN formulations, this study isolates the factors that most strongly influence the success or failure of GAN-based minority augmentation. Rather than optimizing for peak performance, the objective is to identify systematic patterns in classifier sensitivity, false-negative behavior, and robustness under increased augmentation and training pressure.
This controlled, data-centric approach supports reproducible experimentation and provides practical guidance for deploying GAN-based augmentation pipelines in tabular intrusion detection settings.

5. Experimental Methodology

This section describes the experimental methodology used to conduct a controlled ablation study of GAN-based minority data augmentation for intrusion detection. Rather than benchmarking models in an end-to-end manner, the methodology isolates the effects of individual architectural and training-related factors by systematically varying them while holding all other components constant. The experimental design emphasizes interpretability, reproducibility, and causal attribution of observed performance differences.

5.1. Binary Task Construction and Target Classes

Each experiment is formulated as a binary classification task, in which a single MITRE ATT&CK tactic is treated as the positive (minority) class, and all benign traffic (“none”) constitutes the negative (majority) class. The set of evaluated tactics includes Discovery, Credential Access, Privilege Escalation, Exfiltration, Lateral Movement, Resource Development, Defense Evasion, Initial Access, and Persistence [28,29,30,31,32,33,34,35,36]. These tactics are selected due to their extreme class imbalance and their operational relevance in network intrusion detection.
For each tactic, all available minority-class samples are retained, GAN-based augmentation is applied exclusively to the minority class, while majority-class samples are never used during generative training and are included only during downstream classifier evaluation.

5.2. GAN Training Protocol

GANs are trained once per configuration and per minority class using only the training data, rather than being retrained independently within each cross-validation fold. This design choice reduces stochastic variability and ensures that observed differences in downstream performance are attributable to ablation dimensions rather than repeated generative retraining.
All GAN variants share a common multilayer perceptron architecture consisting of two hidden layers with 128 units each. The complete architectural configuration and training hyperparameters used across the ablation experiments are summarized in Table 2. Training is performed using the Adam optimizer with a fixed learning rate and batch size. Architectural differences are limited to those required by the adversarial objective, such as conditioning mechanisms, weight clipping, or the enforcement of a gradient penalty. Training duration is treated as an explicit ablation factor, with GANs trained for either 400 or 800 epochs.
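The shared backbone can be sketched as below, assuming PyTorch. The feature dimensionality, activation choices, and LeakyReLU slope are illustrative assumptions; only the two hidden layers of 128 units, the 0.3 dropout, and the Tanh output (matching the [-1, 1] scaling) follow the configuration stated in the text:

```python
import torch
import torch.nn as nn

def make_mlp_pair(latent_dim=32, feature_dim=20, hidden=128, dropout=0.3):
    """Shared two-hidden-layer MLP backbone used by all GAN variants.
    The discriminator head is left linear so it can also serve as a
    Wasserstein critic; variant-specific pieces (label conditioning,
    weight clipping, gradient penalty) are added separately."""
    generator = nn.Sequential(
        nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, feature_dim), nn.Tanh())  # outputs in [-1, 1]
    discriminator = nn.Sequential(
        nn.Linear(feature_dim, hidden), nn.LeakyReLU(0.2), nn.Dropout(dropout),
        nn.Linear(hidden, hidden), nn.LeakyReLU(0.2), nn.Dropout(dropout),
        nn.Linear(hidden, 1))
    return generator, discriminator
```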

5.3. Synthetic Sample Generation and Augmentation Application

Synthetic minority samples are generated exclusively from the trained generator after GAN training is complete. For each MITRE ATT&CK tactic, the number of synthetic samples is determined using the augmentation ratio defined in Section 4.1, computed relative to the size of the post-downsampling majority-class subset.
In accordance with the ablation design, the augmentation ratios r ∈ {0.25, 0.50} are evaluated under corresponding moderate and aggressive augmentation regimes. These ratios are applied consistently across GAN variants and minority classes to enable controlled comparison.
Synthetic samples are incorporated exclusively into the training portion of each dataset split during classifier evaluation. No synthetic data are included in validation folds, ensuring strict separation between training and evaluation data and preventing information leakage.
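The training-fold-only integration rule can be sketched as follows, assuming NumPy arrays; the function name and label convention are illustrative:

```python
import numpy as np

def augment_training_fold(X_tr, y_tr, X_syn, minority_label=1, seed=0):
    """Append GAN-generated minority rows to the training fold only, then
    shuffle. Validation folds are never passed through this function, so
    no synthetic data can leak into evaluation."""
    X_aug = np.vstack([X_tr, X_syn])
    y_aug = np.concatenate([y_tr, np.full(len(X_syn), minority_label)])
    idx = np.random.default_rng(seed).permutation(len(y_aug))
    return X_aug[idx], y_aug[idx]
```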

5.4. Cross-Validation and Classifier Evaluation

Downstream evaluation is performed using stratified cross-validation to assess classifier behavior under augmented and non-augmented conditions.
Five classical machine learning classifiers are evaluated for each ablation setting:
  • Logistic Regression [22];
  • Support Vector Machine (SVM) [19];
  • k-Nearest Neighbor (KNN) [20];
  • Decision Tree [22];
  • Random Forest [21].
Classifier hyperparameters are held constant across all experiments to prevent confounding effects. Identical cross-validation partitions are reused across ablation configurations, ensuring that the observed performance differences are attributable to augmentation and training factors rather than variations in data splits.
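Reusing identical partitions amounts to materializing the stratified folds once and iterating over the same index lists for every ablation configuration. A minimal sketch with scikit-learn (the function name and seed are illustrative):

```python
from sklearn.model_selection import StratifiedKFold

def fixed_partitions(X, y, n_splits=5, seed=42):
    """Materialize stratified folds once so every ablation configuration
    (GAN variant, augmentation ratio, epoch count) is evaluated on
    identical train/validation splits."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return list(skf.split(X, y))  # list of (train_idx, val_idx) pairs
```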

5.5. Performance Metrics

Classifier performance is evaluated using confusion-matrix-derived metrics, including accuracy, balanced accuracy, precision, recall, and macro-average F1-score. Emphasis is placed on recall and false-negative behavior, as false negatives correspond to undetected attack events in intrusion detection contexts [4].
Performance metrics are aggregated across cross-validation folds to provide stable estimates of classifier behavior. Classification outcomes are summarized using the 2 × 2 confusion matrix

    TP  FN
    FP  TN

where rows correspond to true labels and columns to predicted labels.
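The reported metrics follow directly from the confusion counts. A small helper (the function name is illustrative) makes the emphasis on recall explicit, since every false negative is an undetected attack:

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive recall, precision, and balanced accuracy from confusion counts.
    Recall is emphasized: FN corresponds to undetected attack events."""
    recall = tp / (tp + fn)           # true positive rate on the attack class
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)      # true negative rate on benign traffic
    return {
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": 0.5 * (recall + specificity),
    }
```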

5.6. Reproducibility Controls

To ensure reproducibility, all sources of randomness are controlled by fixing random seeds across data preparation, GAN training, and classifier evaluation. Preprocessing pipelines, feature encoders, and normalization parameters are reused consistently across experiments, and identical ablation configurations are applied across all GAN variants and minority classes.

5.7. Computational Environment

All experiments were conducted using a Python-based (3.12) machine learning stack. GAN training and data generation are implemented in PyTorch (2.9.0), while classifier evaluation is performed using scikit-learn. GPU acceleration is used where available to improve computational efficiency but does not affect model architectures, optimization objectives, hyperparameter settings, or evaluation protocols.
When GPU resources are unavailable, equivalent CPU-based implementations are used. This study focuses on methodological sensitivity rather than computational performance; therefore, experimental outcomes are expected to be invariant to the underlying computing platform.
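The platform-invariance claim rests on the standard PyTorch pattern of moving the same model and tensors to whichever device is available, without touching architectures or hyperparameters. A minimal sketch (the string fallback for torch-free environments is an assumption):

```python
# Sketch: select GPU when available, otherwise fall back to CPU. Models and
# tensors are simply moved to `device`; objectives, hyperparameters, and
# evaluation protocols are identical on either platform.
try:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
except ImportError:
    device = "cpu"   # pure scikit-learn runs without PyTorch installed
```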

6. Results

This section presents a systematic comparison of four GAN architectures—Vanilla GAN, Conditional GAN (cGAN), Wasserstein GAN (WGAN), and Wasserstein GAN with Gradient Penalty (WGAN-GP)—across nine MITRE ATT&CK tactics, using the experimental framework described in Section 5. Results are evaluated through downstream classifier performance and confusion-matrix-based metrics to assess how different adversarial objectives and augmentation regimes influence minority-class detection under extreme class imbalance [3,48].
Performance is reported across five classifier families (Logistic Regression, SVM, KNN, Decision Tree, and Random Forest), with particular emphasis on recall and false-negative behavior. In intrusion detection contexts, false negatives correspond to undetected attack events and therefore represent a critical operational risk [4].

6.1. Ablation Study: GAN Variant, Augmentation Ratio, and Training Duration

Ablation analysis is essential for understanding how adversarial objectives and training dynamics influence both synthetic utility and downstream classifier behavior [32]. Rather than evaluating GAN-based augmentation as a monolithic intervention, this study isolates the effects of GAN formulation, augmentation ratio, and training duration under controlled experimental conditions.
Specifically, this analysis examines how different GAN variants—Vanilla GAN, cGAN, WGAN, and WGAN-GP—interact with two augmentation regimes (r = 0.25 and r = 0.50) and two training durations (400 and 800 epochs) to influence classifier performance when trained on GAN-augmented data.
The design enables assessment of three central questions:
  • How adversarial objective choice influences a classifier’s ability to detect rare attack behavior under an increasing level of synthetic data augmentation,
  • Whether higher augmentation pressure consistently improves minority-class recall or introduces instability and false-positive trade-offs,
  • Whether stability-oriented formulations, such as Wasserstein-based losses and gradient penalty mechanisms [10,11], yield more robust classifier performance under aggressive augmentation regimes.
GANs are trained once per configuration and per minority class, and results are aggregated across stratified cross-validation folds using confusion-matrix-derived metrics [49]. The tables in this section enable detailed inspection of changes in true-positive sensitivity, false-negative rates, and robustness as functions of GAN variant, augmentation ratio, and training duration.
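The resulting experimental grid can be enumerated as a Cartesian product. The sketch below lists only four of the nine tactics for brevity; the variant names and dictionary keys are illustrative labels, not identifiers from the authors' code.

```python
from itertools import product

# Sketch of the ablation grid: each GAN is trained once per
# (variant, ratio, epochs, tactic) configuration.
gan_variants = ["vanilla", "cgan", "wgan", "wgan-gp"]
ratios = [0.25, 0.50]
epochs = [400, 800]
tactics = ["Discovery", "Credential Access",
           "Privilege Escalation", "Exfiltration"]   # 4 of the 9 tactics

configs = [
    {"gan": g, "ratio": r, "epochs": e, "tactic": t}
    for g, r, e, t in product(gan_variants, ratios, epochs, tactics)
]
# 4 variants x 2 ratios x 2 durations x 4 tactics = 64 GAN trainings here;
# the full study covers all 9 tactics.
```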

6.1.1. Effects of Augmentation Ratio

To assess sensitivity to synthetic data volume, classifier performance is compared under two augmentation regimes. The moderate setting (r = 0.25) increases minority-class density while maintaining a strong presence of real samples, whereas the aggressive setting (r = 0.50) introduces substantially higher synthetic pressure.
Comparing these regimes reveals how different GAN formulations and classifier families respond to increasing levels of synthetic minority data, highlighting conditions under which augmentation improves recall versus scenarios where excessive augmentation degrades discrimination by increasing class overlap.
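One way the ratio r could translate into a synthetic sample count is sketched below. The paper does not spell out this formula, so the sketch assumes r denotes the target minority-to-majority ratio after augmentation; the function name and example counts are hypothetical.

```python
# Hypothetical sketch: number of synthetic minority samples needed to reach
# a target minority-to-majority ratio r (an assumed definition of r).
def synthetic_count(n_minority: int, n_majority: int, r: float) -> int:
    target = int(r * n_majority)
    return max(target - n_minority, 0)   # real samples are never removed

# Example: 100 real minority flows against 10,000 benign flows.
moderate = synthetic_count(100, 10_000, 0.25)    # 2,500 target - 100 real
aggressive = synthetic_count(100, 10_000, 0.50)  # 5,000 target - 100 real
```

Under this assumed definition, doubling r roughly doubles the synthetic pressure on the classifier, which is the contrast the two regimes are designed to expose.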

6.1.2. Effects of Training Duration

To examine sensitivity to the duration of adversarial training, each GAN variant is trained for either 400 or 800 epochs. The lower setting captures early or partial convergence behavior, while the higher setting reflects extended training that may improve distributional alignment or, alternatively, exacerbate instability phenomena such as mode collapse or over-smoothing depending on the adversarial objective [12,33].
Analyzing performance across these settings provides insight into whether longer training consistently benefits minority-class detection or whether gains depend on the interaction between GAN formulation, augmentation pressure, and downstream classifier characteristics.

6.2. Per-Tactic Results

6.2.1. Discovery

Discovery [17,46] is a minority class in the UWF-ZeekData22 dataset, more prevalent than the other minority classes but still severely underrepresented relative to dominant traffic and constituting only a small fraction of overall network activity.
Ablation at r = 0.25 (400 Epochs)
Under a moderate augmentation ratio (r = 0.25) and 400 training epochs, classifier performance for the Discovery tactic is largely stable across GAN variants, indicating limited sensitivity to the adversarial objective at this augmentation level (Table 3). Observable differences are confined to linear classifiers, while non-linear and tree-based models exhibit near-identical behavior.
For logistic regression, differences across GAN variants are small but systematic. The Vanilla GAN yields 118 false negatives and 194 false positives, whereas cGAN reduces these to 76 and 132, respectively, indicating improved recall and precision. WGAN further reduces false positives to 77 with a comparable false-negative count (91), whereas WGAN-GP increases both false negatives (126) and false positives (167), consistent with mild over-smoothing effects.
SVM performance is effectively invariant across GAN variants, with zero false positives in all cases and false negatives tightly clustered around 2082–2087. KNN exhibits similarly stable behavior, with false negatives consistently around 632–633, indicating preservation of local neighborhood structure under moderate augmentation.
Tree-based classifiers achieve near-perfect performance across all variants. Decision Trees incur at most two or three false negatives for cGAN and WGAN-GP, with no false positives, while Random Forest produces zero false negatives and zero false positives in all cases.
Overall, results at r = 0.25 and 400 epochs show that conditional and Wasserstein objectives yield modest, classifier-specific benefits for linear models, primarily through false-positive reduction, while the impact of adversarial objective choice remains minimal for SVM, KNN, and tree-based classifiers under moderate augmentation.
Ablation at r = 0.50 (400 Epochs)
Increasing the ratio to r = 0.50 substantially increases the minority-class density for the Discovery tactic and yields consistently strong classifier performance across GAN variants (Table 4). At this augmentation level, sensitivity to adversarial objectives is further reduced.
For Logistic Regression, higher augmentation sustains true-positive detection across all GAN variants, and differences between adversarial objectives are reflected in false-negative and false-positive behavior. Both cGAN and WGAN reduce false negatives relative to the Vanilla GAN, with WGAN achieving the lowest false-negative (66) and false-positive (75) counts. In contrast, WGAN-GP exhibits increased false negatives (134) and false positives (276), consistent with mild degradation under aggressive augmentation.
SVM performance remains effectively invariant across all GAN variants, with zero false positives in all cases and false negatives confined to a narrow range of approximately 2081–2090, indicating minimal impact of increased augmentation on margin-based decision boundaries.
KNN behavior is identical across all GAN variants, with 631 false negatives and 188 false positives in each configuration. This uniformity suggests that once sufficient minority density is achieved, local neighborhood structure stabilizes and becomes insensitive to adversarial training objectives.
Tree-based classifiers continue to exhibit near-perfect performance. Decision Trees incur at most one false negative for cGAN, WGAN, and WGAN-GP, with no false positives, while Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.50 and 400 epochs indicate that increased synthetic augmentation reduces sensitivity to GAN formulation. While cGAN and WGAN provide modest benefits for linear classifiers, class balancing is the dominant factor, with margin-, distance-, and tree-based classifiers exhibiting stable performance once sufficient minority representation is achieved.
Ablation at r = 0.25 (800 Epochs)
Extending training to 800 epochs at an augmentation ratio of r = 0.25 yields largely stable classifier performance for the Discovery tactic, with only modest sensitivity to the adversarial objective (Table 5). Overall trends closely mirror those observed at 400 epochs, indicating diminishing returns for extended training under moderate augmentation.
For Logistic Regression, differences across GAN variants remain small but systematic. False negatives range from 60 to 117, and false positives from 76 to 138, with WGAN achieving the lowest false-negative count (60). Vanilla GAN and cGAN exhibit higher false negatives, with WGAN-GP introducing additional false positives and false negatives, suggesting no additional benefit from gradient-penalty regularization under extended training.
SVM performance remains largely invariant, with false negatives clustered around 2081 for most variants and zero false positives in all but the cGAN case, which introduces three false positives. These results indicate that increasing training duration does not materially affect margin-based separability at this augmentation level.
KNN behavior is effectively identical across all GAN variants, with false negatives consistently between 632 and 636, and false positives fixed at 197. This uniformity suggests that local neighborhood structure stabilizes once sufficient minority density is achieved, regardless of the training duration or adversarial objective.
Tree-based classifiers continue to demonstrate near-perfect performance. Decision Trees incur at most two false negatives and two false positives for cGAN, while all other variants yield zero errors. Random Forest achieves zero false negatives and zero false positives across all configurations.
Overall, results at r = 0.25 and 800 epochs indicate that extending training beyond 400 epochs provides limited additional benefit. While WGAN offers a modest recall advantage for linear classifiers, class balancing remains the dominant factor; margin-, distance-, and tree-based classifiers exhibit stable performance once adequate minority representation is achieved.
Ablation at r = 0.50 (800 Epochs)
At a higher augmentation ratio (r = 0.50) and extended training (800 epochs), classifier behavior for the Discovery tactic is largely stable, with limited sensitivity to the adversarial objective (Table 6). Overall trends indicate diminishing returns for increasing both augmentation and training duration.
For Logistic Regression, variation across GAN variants is more pronounced than for other classifiers. The Vanilla GAN yields 135 false negatives and 182 false positives, whereas the cGAN improves recall and precision, yielding 70 false negatives and 129 false positives. WGAN and WGAN-GP show intermediate performance, with false negatives ranging from 108 to 124 and false positives ranging from 140 to 142, indicating a stable but less favorable outcome than cGAN.
SVM performance remains largely invariant, with false negatives ranging from 2081 to 2227 and zero false positives in all but the cGAN case, which introduces three false positives. KNN exhibits nearly identical behavior across all GAN variants, with false negatives ranging from 631 to 636 and false positives fixed at 188, confirming that the local neighborhood structure is unaffected by increased augmentation and extended training.
Tree-based classifiers continue to demonstrate near-perfect performance. Decision Trees incur at most two false negatives and no false positives, while Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.50 and 800 epochs indicate that increasing the augmentation ratio and training duration provide limited additional benefit. While cGAN offers modest gains for Logistic Regression through label conditioning, class balancing remains the dominant factor, and classifier choice outweighs GAN choice once sufficient minority representation is achieved.
Figure 1 provides a visual summary of recall performance for the Discovery tactic across GAN variants and training configurations. Decision Tree and Random Forest consistently achieve near-perfect recall across all augmentation ratios and training durations, indicating strong robustness to the choice of generative model. Logistic Regression and KNN also maintain high recall with minimal variation between GAN variants. In contrast, SVM exhibits lower recall across all configurations, suggesting greater sensitivity to class imbalance even after augmentation.

6.2.2. Credential Access

Credential Access [17,38] is one of the least represented classes in the UWF-ZeekData22 dataset, resulting in a significant imbalance compared with benign “none” traffic. The class’s small size makes it an ideal test case for evaluating the impact of GAN-based augmentation.
Ablation at r = 0.25 (400 Epochs)
At a moderate augmentation ratio (r = 0.25), classifier performance for Credential Access shows greater sensitivity to the adversarial objective than observed for the Discovery tactic (Table 7), particularly for linear models.
For logistic regression, all GAN variants increase true-positive detections, but substantial differences emerge in false-negative and false-positive behavior. The Vanilla GAN yields 308 false negatives and 143 false positives, indicating residual class overlap. In contrast, WGAN achieves the best balance, reducing false negatives to 74 and false positives to four, while WGAN-GP exhibits similar performance (81 false negatives and six false positives). By comparison, cGAN performs poorly for this classifier, producing a very high false-negative count (845) despite a low false-positive count (13), indicating severe recall degradation under conditional generation. These results suggest that Wasserstein-based objectives provide more reliable augmentation for linear classifiers under extreme sparsity, whereas conditioning can distort the minority-class manifold in this setting.
SVM performance remains largely stable across GAN variants, with false negatives ranging from 31 to 43 and false positives remaining low (8–10) for Vanilla GAN, WGAN, and WGAN-GP. cGAN introduces a noticeably higher false-positive count (88), indicating increased boundary ambiguity despite maintaining high recall.
KNN behavior is similarly stable, with false negatives ranging from 10 to 14 and false positives from 12 to 14 across GAN variants, suggesting limited sensitivity to the choice of adversarial objective at this augmentation level.
Tree-based classifiers continue to exhibit near-perfect performance. Decision Trees incur at most three false negatives and two false positives for cGAN and only two false negatives for WGAN and WGAN-GP, while Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.25 and 400 epochs indicate that WGAN and WGAN-GP provide the cleanest augmentation for linear classifiers under extreme data sparsity, whereas cGAN may substantially degrade recall. For SVM, KNN, and tree-based models, performance remains consistently strong across GAN variants, indicating that, at moderate augmentation levels, class balancing outweighs differences in fine-grained adversarial objectives.
Ablation at r = 0.50 (400 Epochs)
Increasing the augmentation ratio to r = 0.50 further stabilizes classifier performance for Credential Access and reduces sensitivity to the adversarial objective for most classifiers (Table 8).
For Logistic Regression, all GAN variants achieve high true-positive counts, but differences persist in recall-precision trade-offs. The Vanilla GAN exhibits residual overlap (319 false negatives, 190 false positives). cGAN substantially degrades recall, producing 959 false negatives despite only 13 false positives. In contrast, WGAN yields the strongest balance, reducing false negatives to 68 and false positives to five, while WGAN-GP performs slightly worse (96 false negatives, nine false positives) but still clearly outperforms Vanilla GAN and cGAN. These results indicate that at higher augmentation levels, Wasserstein-based objectives provide the most reliable linear separability, whereas conditional generation can severely suppress recall.
SVM performance remains largely invariant across GAN variants, with false negatives confined to a narrow range (31–48) and low false-positive counts (8–10), indicating minimal sensitivity to GAN choice once sufficient minority density is achieved. KNN exhibits similarly stable behavior, with false negatives ranging from eight to 13 and false positives ranging from 13 to 15, showing no systematic dependence on the adversarial objective.
Tree-based classifiers continue to demonstrate near-perfect performance. Decision Trees incur at most four false negatives and two false positives (cGAN), while WGAN and WGAN-GP introduce only a single false negative with no false positives. Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.50 and 400 epochs indicate that increasing the augmentation ratio improves stability and reduces sensitivity to GAN architecture for most classifiers. WGAN continues to provide the cleanest augmentations for Logistic Regression, while for SVM, KNN, and tree-based models, class balancing dominates, and differences among GAN variants largely vanish.
Ablation at r = 0.25 (800 Epochs)
Extending GAN training to 800 epochs at an augmentation ratio of r = 0.25 exposes pronounced differences in stability across GAN variants for the Credential Access tactic, particularly for linear and distance-based classifiers (Table 9).
In logistic regression, extended training amplifies divergence across adversarial objectives. The Vanilla GAN degrades substantially, producing 669 false negatives and 742 false positives, whereas the cGAN collapses in recall, yielding 4229 false negatives despite only eight false positives. In contrast, Wasserstein-based models remain stable: WGAN yields 218 false negatives and 33 false positives, and WGAN-GP further improves stability with 178 false negatives and only seven false positives, achieving the lowest combined error among all variants.
SVM performance remains stable for Vanilla GAN, WGAN, and WGAN-GP, with false negatives between 31 and 40 and false positives between eight and 12. However, cGAN exhibits instability, producing 936 false negatives and 70 false positives, indicating sensitivity to extended conditional training. A similar pattern is observed for KNN: Vanilla GAN, WGAN, and WGAN-GP maintain low error rates (12–19 false negatives, 7–14 false positives), whereas cGAN introduces substantial errors (433 false negatives, 33 false positives), suggesting distortion of the local neighborhood structure.
Tree-based classifiers remain robust across all GAN variants. Decision Trees incur at most two false negatives and no false positives, while Random Forest achieves zero false negatives and zero false positives in all cases.
Overall, results at r = 0.25 and 800 epochs demonstrate that extended training can destabilize Vanilla GAN and cGAN under moderate augmentation, particularly for linear and distance-based classifiers. In contrast, Wasserstein-based objectives remain robust, with WGAN-GP providing the most stable performance for linear models. These findings indicate that under moderate augmentation, architectural stability and regularization are more critical than increasing training duration, and that prolonged training without appropriate constraints can be detrimental.
Ablation at r = 0.50 (800 Epochs)
When both the augmentation ratio and the training duration are increased (r = 0.50, epochs = 800), pronounced differences in stability emerge across GAN variants for the Credential Access tactic (Table 10).
For Logistic Regression, the choice of adversarial objective has a strong impact. The Vanilla GAN becomes unstable, producing 779 false negatives and 928 false positives, whereas the cGAN collapses in recall, yielding 6210 false negatives despite only 18 false positives. In contrast, Wasserstein-based models remain stable: WGAN yields 220 false negatives and 15 false positives, and WGAN-GP further improves performance with 197 false negatives and only seven false positives, achieving the strongest recall-precision balance among all variants.
SVM performance remains stable for Vanilla GAN, WGAN, and WGAN-GP, with false negatives between 31 and 38 and false positives between eight and 12. However, cGAN again exhibits instability, producing 1595 false negatives and 181 false positives, indicating the sensitivity of margin-based classifiers to extended conditional training at higher augmentation ratios.
A similar pattern is observed for KNN. Vanilla GAN, WGAN, and WGAN-GP maintain low error rates (8–10 false negatives, 8–14 false positives), whereas cGAN degrades substantially, producing 445 false negatives and 93 false positives, suggesting disruption of local neighborhood structure under prolonged conditional generation.
Tree-based classifiers remain fully invariant. Decision Tree and Random Forest achieve zero false negatives and zero false positives across all GAN variants, confirming robustness to augmentation volume, training duration, and adversarial objectives once sufficient minority representation is achieved.
Overall, results at r = 0.50 and 800 epochs show that extended training under high augmentation can destabilize Vanilla GAN and cGAN, particularly for linear and distance-based classifiers. In contrast, Wasserstein-based objectives, especially WGAN-GP, provide superior stability and recall due to gradient-penalty regularization. In this setting, architectural robustness outweighs augmentation volume in determining downstream classification performance.
Figure 2 summarizes recall performance for the Credential Access task across GAN variants and experimental configurations. Decision Tree and Random Forest maintain near-perfect recall across all settings, indicating strong robustness to the choice of generative model. KNN and SVM also achieve consistently high recall with minimal variation across GAN variants. In contrast, Logistic Regression shows a noticeable decrease in recall when using cGAN, particularly at higher augmentation ratios, suggesting increased sensitivity to synthetic-sample distribution.

6.2.3. Privilege Escalation

Privilege Escalation [17,39] is another minor class in the UWF-ZeekData22 dataset that exhibits severe class imbalance, representing an exceedingly small share of overall traffic.
Ablation at r = 0.25 (400 Epochs)
Under the moderate augmentation ratio (r = 0.25) and 400 training epochs, classifier performance for the Privilege Escalation tactic remains highly stable across all GAN variants, with only minor variations in misclassification counts (Table 11). Overall, sensitivity to the adversarial objective is limited at this augmentation level.
For Logistic Regression, all GAN variants achieve near-perfect classification. The Vanilla GAN yields five false negatives and one false positive, while WGAN and WGAN-GP further reduce false positives to zero, with seven false negatives each. The cGAN introduces a slight increase in false negatives (nine) but maintains zero false positives. Overall, differences among variants are minimal, indicating that linear separability is largely preserved across all adversarial objectives at this ratio.
For SVM, performance is similarly consistent across GAN variants. The Vanilla GAN produces 53 false negatives and eight false positives, while cGAN reduces false negatives to 31 and false positives to 10. WGAN and WGAN-GP further reduce false negatives to 11 and false positives to three. These results indicate a slight numerical advantage for Wasserstein-based objectives, although the absolute differences are small.
For KNN, classification outcomes are nearly identical across all GAN variants. False negatives range from nine to 12, and false positives from five to nine, indicating that distance-based classification is largely insensitive to the adversarial objective at this level of augmentation.
Tree-based classifiers demonstrate near-perfect robustness. Decision Trees incur at most one false negative with no false positives, while Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.25 and 400 epochs indicate that Privilege Escalation detection is largely insensitive to GAN formulation once modest augmentation is applied. Performance is dominated by class balancing effects, with all classifier families exhibiting stable and reliable behavior across adversarial objectives.
Ablation at r = 0.50 (400 Epochs)
When the augmentation ratio is increased to r = 0.50 while maintaining 400 training epochs, classifier performance for the Privilege Escalation tactic remains highly stable across GAN variants, with only minor fluctuations in misclassification counts (Table 12).
For Logistic Regression, all GAN variants achieve near-perfect performance. The Vanilla GAN produces six false negatives and no false positives, whereas the cGAN slightly increases the number of false negatives to 12 while maintaining zero false positives. WGAN and WGAN-GP exhibit similarly low false-negative counts (eight and nine, respectively); however, WGAN introduces a small number of false positives (eight), whereas WGAN-GP maintains none. Overall, doubling the augmentation ratio does not meaningfully affect linear separability for this tactic.
SVM behavior is consistent across adversarial objectives. False negatives decrease from 68 (Vanilla GAN) to 52 (cGAN) and further to 11 for both WGAN and WGAN-GP, with corresponding false positives reduced to three for Wasserstein-based objectives. While this reflects a slight numerical advantage for WGAN and WGAN-GP, absolute differences remain small.
KNN performance is similarly stable, with false negatives between five and nine and false positives between five and 12 across all GAN variants, indicating preservation of local neighborhood structure even under heavier augmentation.
Tree-based classifiers remain near-perfect. Decision Trees incur at most one false negative, with no false positives, while Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.50 and 400 epochs confirm that Privilege Escalation classification is robust to increased oversampling. Although Wasserstein-based objectives yield marginal improvements for SVM, class balancing remains the dominant factor, and all classifiers, particularly tree-based models, maintain stable, near-perfect performance.
Ablation at r = 0.25 (800 Epochs)
Extending training to 800 epochs while maintaining a moderate augmentation ratio (r = 0.25) results in largely stable classifier performance for the Privilege Escalation tactic, with only modest architecture-dependent differences (Table 13).
For Logistic Regression, all variants remain near-perfect. The Vanilla GAN produces seven false negatives and 21 false positives, while WGAN and WGAN-GP exhibit lower combined error (five false negatives with eight false positives for WGAN; eight false negatives with three false positives for WGAN-GP). In contrast, cGAN increases false negatives to 48 despite low false positives (19), indicating reduced recall under prolonged conditional training.
SVM shows a similar pattern. The Vanilla GAN yields 26 false negatives and 13 false positives, whereas WGAN and WGAN-GP reduce false negatives to 12 and 11, respectively, with only three false positives. cGAN again degrades performance, producing 224 false negatives, reflecting sensitivity to extended conditional training.
KNN performance remains stable for Vanilla GAN, WGAN, and WGAN-GP (6–13 false negatives, 5–9 false positives), while cGAN introduces a higher false-negative count (26), suggesting distortion of local neighborhood structure.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most one false negative with no false positives, and Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.25 and 800 epochs indicate that extended training provides modest benefits for Wasserstein-based objectives, particularly for linear and margin-based classifiers, while cGAN shows reduced recall with prolonged training. As in previous configurations, tree-based models remain insensitive to GAN formulation.
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), overall performance on Privilege Escalation remains strong, but clearer architectural differences emerge between linear and margin-based classifiers (Table 14).
For Logistic Regression, all variants remain near-perfect. The Vanilla GAN produces seven false negatives and eight false positives, while cGAN increases false negatives to 43 with the same false-positive count. In contrast, WGAN and WGAN-GP maintain the strongest stability, with seven and 10 false negatives, respectively, and zero false positives, indicating improved preservation of minority-class structure under heavy oversampling.
SVM differences are more pronounced. The Vanilla GAN yields 46 false negatives and 13 false positives, whereas the cGAN degrades substantially, yielding 349 false negatives. By comparison, WGAN and WGAN-GP remain stable, each producing only 11 false negatives and three false positives, confirming the robustness of Wasserstein-based objectives under prolonged training and higher augmentation.
KNN behavior remains consistent for Vanilla GAN, WGAN, and WGAN-GP (5–9 false negatives, 5–12 false positives), while cGAN increases false negatives to 29, suggesting distortion of neighborhood structure during extended conditional training.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most two false negatives with no false positives, and Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.50 and 800 epochs indicate that high augmentation pressure amplifies architectural differences. Conditional GANs show instability under prolonged training, particularly for linear and margin-based classifiers, whereas WGAN and WGAN-GP consistently provide the most stable and reliable augmentation. These findings reinforce the suitability of Wasserstein-based objectives for aggressive oversampling regimes.
Figure 3 summarizes recall performance for the Privilege Escalation task across GAN variants and experimental configurations. Decision Tree, Random Forest, Logistic Regression, and KNN consistently achieve near-perfect recall across all settings, indicating strong separability of this attack class. SVM shows slightly lower recall in some configurations, particularly with cGAN, suggesting higher sensitivity to the distribution of generated samples.

6.2.4. Exfiltration

The Exfiltration [17,40] class represents malicious behavior in which threat actors extract sensitive data from the network. Although this class is extremely rare in the UWF-ZeekData22 dataset, Exfiltration exhibits clear behavioral patterns that can be effectively learned when GAN augmentation provides sufficient synthetic data.
Ablation at r = 0.25 (400 Epochs)
Under moderate augmentation (r = 0.25) and 400 epochs, Exfiltration detection remains strong across all GAN variants, with only minor differences in misclassification patterns (Table 15).
For Logistic Regression, performance is near-perfect for all variants. The Vanilla GAN yields seven false negatives and six false positives, while cGAN reduces false negatives to five with zero false positives. WGAN further reduces false negatives to four but increases false positives to 23, and WGAN-GP shows a similar pattern (five false negatives, 30 false positives), indicating slightly broader synthetic coverage under Wasserstein-based training.
SVM performance is similarly stable. The Vanilla GAN produces seven false positives and seven false negatives, while WGAN and WGAN-GP reduce false negatives to six with only three false positives. cGAN increases false negatives to 20, though false positives remain low (six), suggesting modest sensitivity to conditional generation.
KNN results follow a comparable trend. The Vanilla GAN yields five false negatives and seven false positives, while WGAN and WGAN-GP reduce false negatives to three with five false positives. cGAN slightly increases false positives (13), though overall differences remain small.
Tree-based classifiers demonstrate near-perfect robustness. Decision Trees incur at most two false negatives and a single false positive, while Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.25 and 400 epochs indicate that all GAN variants are effective for Exfiltration detection, with only incremental differences. Wasserstein-based objectives offer modest reductions in false negatives for SVM and KNN, while cGAN slightly improves precision for Logistic Regression. However, performance is driven primarily by class balancing rather than by adversarial objective selection, particularly for tree-based models.
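For readers reproducing the pipeline, the synthetic-sample budget implied by an augmentation ratio can be sketched as follows. This sketch assumes r denotes the target minority-to-majority size ratio after augmentation, which is our reading of the experimental setup; the helper function and the class counts are hypothetical, not the UWF-ZeekData22 values.

```python
def synthetic_needed(n_minority, n_majority, r):
    """Number of GAN-generated samples required so that the
    minority-to-majority size ratio reaches r (assumed definition).
    Returns 0 if the minority class already meets the target."""
    target = int(round(r * n_majority))
    return max(0, target - n_minority)

# Hypothetical class sizes for illustration only.
n_benign, n_attack = 100_000, 50
for r in (0.25, 0.50):
    n_syn = synthetic_needed(n_attack, n_benign, r)
    print(f"r={r}: generate {n_syn} synthetic minority samples "
          f"({n_attack + n_syn} minority vs {n_benign} benign)")
```

Under this reading, moving from r = 0.25 to r = 0.50 roughly doubles the synthetic volume, which is the "augmentation pressure" contrasted across the ablation configurations.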
Ablation at r = 0.50 (400 Epochs)
At a higher augmentation ratio (r = 0.50) and 400 epochs, Exfiltration detection remains strong across all classifiers, with only minor differences among GAN variants (Table 16).
For Logistic Regression, all GAN variants achieve near-perfect performance. The Vanilla GAN, WGAN, and WGAN-GP each produce six false negatives, while cGAN reports seven. False positives remain low, ranging from 10 (WGAN) to 31 (Vanilla GAN), indicating that increased augmentation does not substantially degrade linear separability.
SVM results are similarly consistent. The Vanilla GAN, WGAN, and WGAN-GP each yield six false negatives, with three false positives for the Wasserstein-based models compared to seven for the Vanilla GAN. In contrast, cGAN increases false negatives to 29 and false positives to nine, suggesting reduced margin separability under conditional generation.
KNN performance remains stable across variants. WGAN and WGAN-GP achieve the lowest false-negative counts (three) with five false positives, while Vanilla GAN and cGAN show slightly higher error rates. These differences are modest but indicate a marginally improved preservation of neighborhoods under Wasserstein objectives.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most two false negatives and three false positives across variants, while Random Forest achieves zero false negatives and zero false positives in all cases.
Overall, results at r = 0.50 and 400 epochs indicate that increasing the augmentation ratio provides limited additional benefit for Exfiltration detection. Differences among GAN variants are subtle, with Wasserstein-based objectives showing slightly cleaner behavior for SVM and KNN, while cGAN exhibits mild recall degradation. Tree-based classifiers remain largely insensitive to GAN choice.
Ablation at r = 0.25 (800 Epochs)
When training is extended to 800 epochs at a moderate augmentation ratio (r = 0.25), clearer differences in stability emerge across GAN variants for the Exfiltration tactic (Table 17).
For Logistic Regression, the Vanilla GAN maintains strong performance (seven false negatives, seven false positives), whereas cGAN shows degraded recall with 55 false negatives. WGAN improves recall (10 false negatives), and WGAN-GP achieves the lowest false-negative count (four), but introduces more false positives (28), indicating a recall-precision trade-off under prolonged gradient-penalized training.
SVM results further emphasize the advantage of Wasserstein-based objectives. The Vanilla GAN produces 61 false negatives, and the cGAN increases this to 182. In contrast, WGAN and WGAN-GP reduce false negatives to 20 and seven, respectively, with only three false positives each, demonstrating improved margin stability during extended training.
KNN behavior remains generally stable, though Wasserstein-based models again show the lowest false-negative counts. The Vanilla GAN and cGAN produce 13 and 25 false negatives, respectively, while WGAN reduces this to nine, and WGAN-GP to four, indicating better preservation of local neighborhood structure.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most three false negatives and one false positive, and Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.25 and 800 epochs indicate that extended training primarily benefits Wasserstein-based objectives, particularly for recall-sensitive linear and margin-based classifiers. WGAN-GP provides the strongest recall but may introduce additional false positives, whereas WGAN offers a more balanced trade-off. In contrast, cGAN exhibits instability under prolonged training.
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), architectural differences become more pronounced for Exfiltration, particularly in linear and margin-based classifiers (Table 18).
For Logistic Regression, the Vanilla GAN remains stable (six false negatives and three false positives), whereas cGAN shows clear recall degradation with 55 false negatives. WGAN improves recall (10 false negatives), and WGAN-GP achieves the lowest false-negative count (four) but introduces more false positives (25), indicating a recall-precision trade-off under gradient-penalized training.
SVM differences are substantial. The Vanilla GAN yields 83 false negatives, whereas the cGAN yields 313 false negatives. In contrast, WGAN and WGAN-GP maintain strong stability, producing 23 and seven false negatives, respectively, with only three false positives each. This confirms the robustness of Wasserstein-based objectives under heavy augmentation and extended training.
KNN shows a similar pattern. The Vanilla GAN produces three false negatives and 11 false positives, whereas the cGAN increases the number of false negatives to 44. WGAN and WGAN-GP maintain lower false-negative counts (six and three, respectively), indicating better preservation of neighborhood structure.
Tree-based classifiers remain invariant. Decision Trees incur at most one false negative and no false positives, and Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.50 and 800 epochs indicate that high augmentation pressure amplifies architectural differences. Conditional GANs underperform consistently due to recall degradation, while WGAN and WGAN-GP provide the most stable augmentation. Although WGAN-GP achieves the highest recall, it may introduce a modest increase in false positives. As observed throughout this study, tree-based classifiers remain largely insensitive to the choice of GAN once adequate minority representation is achieved.
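The per-classifier false-negative and false-positive counts reported throughout this section are standard confusion-matrix tallies for the binary attack-versus-benign task. The following self-contained sketch shows the computation on toy labels (not dataset values), with the minority attack class encoded as the positive label:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Confusion-matrix tallies for a binary attack-vs-benign task,
    treating the minority attack class as the positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1          # attack correctly flagged
        elif t != positive and p == positive:
            fp += 1          # benign flow falsely flagged
        elif t == positive and p != positive:
            fn += 1          # attack missed (recall loss)
        else:
            tn += 1          # benign correctly passed
    return tp, fp, fn, tn

# Toy labels: six true attacks, one missed (FN) and one benign flow flagged (FP).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(f"FN={fn} FP={fp} recall={recall:.3f} precision={precision:.3f}")
```

Minority recall, the headline metric of the ablation, is TP / (TP + FN), so the "recall collapse" reported for cGAN corresponds directly to large false-negative counts.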
Figure 4 summarizes recall performance for the Exfiltration detection task across GAN variants and experimental configurations. All classifiers achieve near-perfect recall across all settings, indicating that the Exfiltration class is highly separable in the feature space. Minor variations are observed for SVM when using the cGAN variant, but overall classifier performance remains stable regardless of the augmentation ratio or training duration.

6.2.5. Lateral Movement

Lateral Movement [17,41] is one of the most underrepresented classes in the UWF-ZeekData22 dataset, characterized by an exceptionally small number of instances that represent only a negligible fraction of the overall traffic.
Ablation at r = 0.25 (400 Epochs)
Under moderate augmentation (r = 0.25) and 400 training epochs, Lateral Movement exhibits greater sensitivity to adversarial objective choice than previously analyzed tactics, particularly for linear classifiers (Table 19).
For Logistic Regression, differences across GAN variants are pronounced. The Vanilla GAN produces 84 false negatives and 345 false positives, indicating substantial class overlap. cGAN collapses recall, increasing false negatives to 711 despite reducing false positives to nine. In contrast, WGAN and WGAN-GP provide the most balanced performance, reducing false negatives to 74 and 62, respectively, with low false positives (11 and seven), highlighting the stabilizing effect of Wasserstein-based objectives on linear decision boundaries under extreme sparsity.
SVM performance remains strong overall, but Wasserstein variants again show improved stability. The Vanilla GAN yields 30 false negatives and 12 false positives, while WGAN and WGAN-GP reduce false negatives to four, with 10 and six false positives, respectively. cGAN slightly reduces the false-negative rate to 23 but does not outperform Wasserstein-based methods.
KNN remains relatively robust, though small recall variations appear. The Vanilla GAN produces zero false negatives but 12 false positives, whereas cGAN, WGAN, and WGAN-GP introduce minor false-negative counts (7–9) with comparable false positives, indicating modest sensitivity to synthetic distribution changes.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most one false negative with no false positives, and Random Forest achieves zero false negatives and zero false positives across all GAN variants.
Overall, results at r = 0.25 and 400 epochs indicate that Lateral Movement detection is more sensitive to the choice of adversarial objective than earlier tactics. cGAN exhibits severe recall degradation under extreme sparsity, whereas WGAN and WGAN-GP consistently provide the most stable augmentation. As in prior cases, tree-based models remain robust once moderate class balancing is achieved.
Ablation at r = 0.50 (400 Epochs)
Increasing the augmentation ratio to r = 0.50 while maintaining 400 training epochs amplifies architectural differences for Lateral Movement, particularly for linear classifiers (Table 20).
For Logistic Regression, divergence across GAN variants is observed. The Vanilla GAN produces 88 false negatives and 363 false positives, indicating substantial class overlap under heavier oversampling. cGAN collapses recall, increasing false negatives to 917 despite reducing false positives to 62. In contrast, WGAN and WGAN-GP provide the most stable balance, reducing false negatives to 69 and 62, respectively, with low false positives (13 and seven), demonstrating improved linear separability under the Wasserstein-based objective.
SVM performance remains strong overall, but Wasserstein variants again show a clear advantage. The Vanilla GAN yields 36 false negatives and 13 false positives, and cGAN performs similarly (37 false negatives, nine false positives). WGAN and WGAN-GP substantially reduce false negatives to four and eight, respectively, with 10 and eight false positives, indicating tighter margin separation under Wasserstein training.
KNN exhibits modest sensitivity, with false negatives ranging from seven to 10 and false positives ranging from nine to 14. WGAN achieves the lowest false-negative count, though overall differences remain limited.
Tree-based classifiers remain largely robust. Decision Trees incur at most one to two false positives with no false negatives, and Random Forest maintains two false negatives with zero false positives across all variants.
Overall, results at r = 0.50 and 400 epochs confirm that Lateral Movement detection is particularly sensitive to adversarial objective choice for linear classifiers. cGAN exhibits severe recall degradation, whereas WGAN and WGAN-GP consistently provide the most stable performance. Tree-based models remain comparatively robust once sufficient minority representation is achieved.
Ablation at r = 0.25 (800 Epochs)
Extending training to 800 epochs at a moderate augmentation ratio (r = 0.25) amplifies architectural differences for Lateral Movement, particularly for linear and margin-based classifiers (Table 21).
For Logistic Regression, instability becomes pronounced. The Vanilla GAN produces 321 false negatives and 371 false positives, while the cGAN collapses in recall with 2919 false negatives. In contrast, WGAN and WGAN-GP substantially improve stability, reducing false negatives to 112 and 84, respectively. WGAN maintains a cleaner balance (18 false positives) compared to WGAN-GP (138 false positives), indicating that the gradient penalty broadens synthetic support under prolonged training.
SVM results further highlight the advantage of Wasserstein-based objectives. The Vanilla GAN yields 61 false negatives, and the cGAN deteriorates sharply with 669 false negatives. WGAN achieves near-ideal separation (two false negatives, two false positives), while WGAN-GP remains stable (three false negatives, 10 false positives), confirming superior margin preservation under Wasserstein training.
KNN shows a similar pattern. The Vanilla GAN produces eight false negatives and four false positives, whereas cGAN introduces substantial degradation with 280 false negatives. WGAN and WGAN-GP maintain low false-negative counts (7–9) with modest false positives, indicating stronger neighborhood consistency.
Tree-based classifiers remain effectively invariant. Decision Trees incur at most one false negative, and Random Forest achieves zero false negatives and zero false positives across all variants.
Overall, results at r = 0.25 and 800 epochs demonstrate that extended training magnifies architectural stability differences for Lateral Movement detection. cGAN consistently underperforms due to severe recall degradation, whereas WGAN and WGAN-GP provide substantially more stable augmentation. As previously observed, tree-based models remain robust, whereas adversarial objective choice primarily affects linear and margin-based models during prolonged training.
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), architectural differences are strongly amplified for Lateral Movement, particularly for linear and margin-based classifiers (Table 22).
For Logistic Regression, instability becomes pronounced. The vanilla GAN produces 205 false negatives and 633 false positives, while the cGAN collapses in recall with 4353 false negatives. In contrast, Wasserstein-based models improve stability: WGAN reduces false negatives to 165 but still introduces 277 false positives, whereas WGAN-GP achieves the strongest balance with 120 false negatives and 45 false positives, indicating superior stability under heavy augmentation and prolonged training.
SVM results show even clearer separation. The Vanilla GAN yields 74 false negatives, whereas cGAN degrades sharply, yielding 1145 false negatives and 166 false positives. By comparison, WGAN and WGAN-GP achieve near-ideal performance, with only 2–3 false negatives and 10–12 false positives, confirming strong margin preservation.
KNN follows a similar trend. The Vanilla GAN produces 16 false negatives, and the cGAN increases this to 246. WGAN and WGAN-GP maintain low false-negative counts (5–6) with moderate false positives (12), indicating better preservation of local neighborhood structure.
Tree-based classifiers remain largely invariant. Decision Trees achieve zero false negatives and false positives for most variants, with at most one false positive under cGAN, and Random Forest maintains zero false negatives and only two false positives across all configurations.
Overall, results at r = 0.50 and 800 epochs demonstrate that aggressive augmentation strongly penalizes Vanilla GAN and cGAN in extremely sparse regimes. In contrast, WGAN and particularly WGAN-GP provide the most stable and balanced augmentation for Lateral Movement detection. As observed throughout the study, tree-based classifiers remain relatively insensitive when sufficient minority samples are present.
Figure 5 summarizes recall performance for the Lateral Movement detection task across GAN variants and experimental configurations. Decision Tree and Random Forest maintain perfect or near-perfect recall across all settings, indicating strong robustness to the augmentation strategy. KNN and SVM also achieve consistently high recall with minimal variation across GAN variants. Logistic Regression shows a noticeable reduction in recall when using the cGAN variant, suggesting sensitivity to the distribution of generated samples.

6.2.6. Resource Development

Resource Development [17,42] is another rare minority class in the UWF-ZeekData22 dataset, with only a handful of observed instances, representing a small portion of the overall network traffic and highlighting the extreme class imbalance.
Ablation at r = 0.25 (400 Epochs)
Under moderate augmentation (r = 0.25) and 400 training epochs, Resource Development shows generally stable behavior across GAN variants, though linear and margin-based classifiers remain sensitive to adversarial objective choice (Table 23).
For Logistic Regression, differences are pronounced. The Vanilla GAN produces 116 false negatives and 12 false positives, whereas the cGAN collapses recall, producing 879 false negatives. In contrast, WGAN and WGAN-GP substantially improve stability, reducing false negatives to 55 and 51, respectively, with low false positives (12 and 10), demonstrating improved linear separability under Wasserstein-based objectives.
SVM follows a similar trend. The Vanilla GAN yields 12 false negatives and eight false positives, whereas the cGAN increases the number of false negatives to 26. WGAN and WGAN-GP achieve the lowest misclassification, reducing false negatives to four and three, respectively, with 10 false positives, indicating tighter margin separation under Wasserstein training.
KNN performance remains stable across variants, with false negatives between three and seven and false positives between nine and 10, suggesting minimal sensitivity to adversarial objective choice.
Tree-based classifiers remain largely invariant. Decision Trees incur at most 2–3 false negatives with no false positives, and Random Forest reports zero false positives and only 1–3 false negatives across all variants.
Overall, results at r = 0.25 and 400 epochs indicate that Resource Development detection benefits from Wasserstein-based augmentation for linear and margin-based classifiers, while cGAN shows notable recall degradation. As observed with earlier tactics, tree-based models remain comparatively insensitive once moderate class balancing is achieved.
Ablation at r = 0.50 (400 Epochs)
When the augmentation ratio increases to r = 0.50 (Table 24), architectural differences become more pronounced, particularly for linear classifiers.
For Logistic Regression, instability is evident for simpler objectives. The Vanilla GAN produces 145 false negatives (FN) and 13 false positives (FP), while cGAN collapses recall with 1021 FN. In contrast, WGAN and WGAN-GP substantially improve stability, reducing FN to 56 and 72, respectively, with low FP (13 and 12). These results confirm that Wasserstein-based objectives better preserve linear separability under heavier sampling.
For SVM, performance remains strong but again favors Wasserstein variants. The Vanilla GAN yields 12 FN and 12 FP, and the cGAN increases FN to 28. WGAN and WGAN-GP achieve the lowest FN (five and three) with 10 FP, indicating tighter margin separation.
KNN remains largely unaffected, with FN between five and seven and FP between eight and 12 across all variants, suggesting minimal sensitivity to adversarial objective choice.
Tree-based models remain invariant. Decision Tree achieves perfect classification across all GANs, while Random Forest reports zero FP and only 1–2 FN in all cases.
Overall, at r = 0.50, increasing augmentation primarily exposes cGAN through recall collapse in Logistic Regression. WGAN and WGAN-GP provide consistently balanced augmentation, whereas tree-based classifiers remain largely insensitive to GAN choice once sufficient minority representation is achieved.
Ablation at r = 0.25 (800 Epochs)
Under extended training (800 epochs) with moderate augmentation (Table 25), architectural stability differences become pronounced, especially for linear and margin-based classifiers.
For Logistic Regression, prolonged training destabilizes simpler objectives. The Vanilla GAN produces 713 false negatives and 444 false positives, whereas the cGAN collapses recall with 3331 false negatives. In contrast, WGAN and WGAN-GP substantially improve stability, reducing false negatives to 121 and 133, respectively. However, WGAN-GP introduces more false positives (135) than WGAN (83), indicating slightly broader synthetic support under the gradient penalty.
For SVM, Wasserstein-based models again dominate. The Vanilla GAN yields 153 false negatives, and cGAN increases false negatives to 168. In contrast, WGAN and WGAN-GP substantially improve stability at longer training durations.
For KNN, conditional generation degrades recall (46 false negatives for Vanilla, 117 for cGAN), whereas WGAN and WGAN-GP maintain much lower false-negative counts (5 and 8, respectively), demonstrating better neighborhood preservation.
Tree-based models remain largely invariant. Decision Tree reports at most two false negatives across all variants, and Random Forest maintains 0–1 false negatives with zero false positives.
Overall, at r = 0.25 and 800 epochs, extending training severely penalizes Vanilla GAN and cGAN, particularly for Logistic Regression and KNN. In contrast, WGAN and WGAN-GP provide stable augmentation, with WGAN offering the best FN-FP balance. Tree-based classifiers remain robust regardless of GAN choice.
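One simple way to quantify the FN–FP balance noted above is an unweighted total-error tally. The sketch below applies it to the WGAN and WGAN-GP Logistic Regression counts reported for this configuration (Table 25); the FN+FP score is our illustrative choice rather than a metric used in this study, and an operational IDS would typically weight false negatives more heavily.

```python
def total_error(fn, fp):
    """Unweighted FN+FP tally; one of many ways to score FN-FP balance."""
    return fn + fp

# Logistic Regression counts reported at r = 0.25, 800 epochs (Table 25).
variants = {
    "WGAN": (121, 83),
    "WGAN-GP": (133, 135),
}
best = min(variants, key=lambda name: total_error(*variants[name]))
for name, (fn, fp) in variants.items():
    print(f"{name}: FN={fn} FP={fp} total={total_error(fn, fp)}")
print("best balance:", best)
```

Under this tally WGAN scores lower total error than WGAN-GP, consistent with the balance observation above; a cost-weighted score (e.g., penalizing each FN several times more than an FP) could reverse such rankings when recall is paramount.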
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), architectural stability differences are pronounced, particularly for linear and distance-based classifiers (Table 26).
For Logistic Regression, Vanilla GAN becomes highly unstable (1063 false negatives and 444 false positives), while cGAN collapses recall (3991 false negatives). In contrast, WGAN and WGAN-GP remain substantially more stable, reducing false negatives to 106 and 136, respectively. WGAN-GP introduces a slightly higher false-positive count (146) than WGAN (130), suggesting broader synthetic coverage under the gradient penalty.
For SVM, the Wasserstein-based objectives clearly dominate. Vanilla GAN and cGAN yield 208–219 false negatives, whereas WGAN and WGAN-GP reduce the false-negative count to four and three (12 false positives each), confirming strong margin preservation under prolonged training.
For KNN, similar trends appear. Vanilla GAN (53 false negatives) and cGAN (165 false negatives) degrade substantially, while WGAN and WGAN-GP reduce false negatives to 4–5, with modest false positives (10–12), indicating superior neighborhood stability.
Tree-based models remain invariant. Decision Tree achieves perfect classification across all variants, and Random Forest maintains zero false positives with at most one false negative.
Overall, at r = 0.50 and 800 epochs, heavy augmentation combined with prolonged training severely destabilizes Vanilla GAN and cGAN for Resource Development detection. In contrast, WGAN and WGAN-GP consistently yield stable augmentation, with WGAN achieving the best balance between low false positives and low false negatives. Tree-based classifiers remain largely unaffected by GAN choice.
Figure 6 summarizes recall performance for the Resource Development detection task across GAN variants and experimental configurations. Decision Tree and Random Forest consistently achieve perfect or near-perfect recall across all settings, demonstrating robustness to the choice of generative model. KNN and SVM also maintain high recall with only minor variations across configurations. Logistic Regression shows reduced recall when trained with cGAN-generated samples, particularly at higher augmentation ratios, indicating sensitivity to differences in the synthetic data distribution.

6.2.7. Defense Evasion

Defense Evasion [17,45] is a very rare minority class in the UWF-ZeekData22 dataset, represented by a single observed instance, which accounts for an almost negligible fraction of total network traffic and underscores the severe class imbalance.
Ablation at r = 0.25 (400 Epochs)
Under moderate augmentation (r = 0.25, 400 epochs), Defense Evasion is highly separable across most GAN variants, with differences mainly affecting linear models (Table 27).
For Logistic Regression, Vanilla GAN achieves near-perfect performance (two false negatives and zero false positives). In contrast, cGAN collapses recall (836 false negatives). WGAN and WGAN-GP remain stable, with only five and three false negatives and minimal false positives (2–5), confirming the stabilizing effect of Wasserstein objectives.
For SVM, performance is almost perfect. Vanilla GAN records zero false negatives and zero false positives, while cGAN introduces minor errors (11 false negatives, eight false positives). WGAN and WGAN-GP maintain zero false negatives and only six false positives.
For KNN, all GAN variants perform strongly. Vanilla GAN achieves zero false negatives and only three false positives, whereas cGAN slightly degrades (seven false negatives, 10 false positives). WGAN and WGAN-GP maintain low errors (≤3 false negatives, ≤5 false positives).
Tree-based models remain invariant. Decision Tree achieves perfect classification, and Random Forest reports only a single false negative across all variants.
Overall, Defense Evasion is inherently separable at this ratio. Vanilla GAN, WGAN, and WGAN-GP perform exceptionally well, whereas cGAN shows recall collapse. Wasserstein-based objectives provide the most stable augmentation, while tree-based classifiers remain unaffected by GAN choice.
Ablation at r = 0.50 (400 Epochs)
Under heavier augmentation (r = 0.50, 400 epochs), Defense Evasion remains highly separable, with architectural differences affecting Logistic Regression (Table 28).
For Logistic Regression, the Vanilla GAN achieves near-perfect performance (three false negatives and zero false positives). In contrast, cGAN collapses recall (976 false negatives and 16 false positives). WGAN and WGAN-GP maintain stability, producing only 3–5 false negatives and 4–5 false positives, demonstrating robustness under increased synthetic pressure.
For SVM, classification is nearly flawless across variants. Vanilla GAN and WGAN-GP produce zero false negatives, while cGAN introduces a small number (13). False positives remain minimal across variants (≤8).
For KNN, all variants perform strongly. Vanilla GAN records zero false negatives and only five false positives, while cGAN introduces a single false negative and 10 false positives. WGAN and WGAN-GP exhibit similarly low misclassification rates, indicating preserved neighborhood structure.
Tree-based models remain invariant. Decision Tree achieves perfect classification except for one false negative under cGAN, and Random Forest reports only a single false negative across all variants.
Overall, increasing the augmentation ratio does not degrade Vanilla GAN or Wasserstein-based models. Defense Evasion is inherently separable once minority density is sufficient. However, cGAN again exhibits severe recall degradation, while WGAN and WGAN-GP provide the most stable and reliable augmentation. Tree-based classifiers remain largely unaffected by GAN choice.
Ablation at r = 0.25 (800 Epochs)
Under extended training (800 epochs, r = 0.25), Defense Evasion reveals clear stability differences across GAN variants (Table 29).
For Logistic Regression, Vanilla GAN remains near-perfect (one false negative and zero false positives). In contrast, cGAN collapses (4431 false negatives and 611 false positives). Both WGAN and WGAN-GP remain highly stable, with five and two false negatives, respectively, and no false positives.
For SVM, Vanilla GAN achieves perfect classification, with zero false negatives and zero false positives. WGAN and WGAN-GP maintain minimal errors (≤1 false negative, ≤6 false positives), whereas cGAN degrades substantially (633 false negatives).
For KNN, Vanilla GAN, WGAN, and WGAN-GP maintain strong performance (≤1 false negative, minimal false positives), while cGAN again deteriorates (509 false negatives, 24 false positives), indicating distortion of local neighborhood structure.
Tree-based models remain invariant. Decision Tree records at most two false negatives and no false positives, and Random Forest shows only a single false negative across all variants.
Overall, prolonged training at moderate augmentation does not destabilize Vanilla GAN or Wasserstein-based models. However, cGAN consistently collapses across classifiers. These results reinforce that Wasserstein objectives provide stable long-duration training, while conditional generation without additional constraints is prone to severe recall degradation. Tree-based classifiers remain unaffected.
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), Defense Evasion exposes clear architectural stability differences (Table 30).
For Logistic Regression, Vanilla GAN remains stable (one false negative, zero false positives). In contrast, cGAN collapses severely, producing 5841 false negatives and 541 false positives. Both WGAN and WGAN-GP remain highly robust, with four and three false negatives, respectively, and no false positives.
For SVM, Vanilla GAN achieves perfect classification, with zero false negatives and zero false positives. WGAN and WGAN-GP maintain near-perfect separation, introducing only 5–6 false positives with zero false negatives. In contrast, cGAN degrades sharply, producing 1092 false negatives and 132 false positives, confirming instability under heavy oversampling.
For KNN, Vanilla GAN and the Wasserstein variants maintain strong performance, with zero false negatives and low false positives (≤10). In contrast, cGAN again underperforms, producing 673 false negatives and 32 false positives, indicating disruption of neighborhood structure.
Tree-based models remain invariant. Decision Tree shows at most two false negatives and zero false positives, while Random Forest records one false negative and zero false positives across all variants.
Overall, at higher augmentation and extended training, Vanilla GAN and the Wasserstein-based models remain stable, whereas cGAN consistently collapses across linear, margin-based, and distance-based classifiers. These findings reinforce the necessity of Wasserstein-based objectives for stability under extreme synthetic pressure, while tree-based classifiers remain largely unaffected.
Figure 7 summarizes recall performance for the Defense Evasion detection task across GAN variants and experimental configurations. Decision Tree, Random Forest, KNN, and SVM achieve near-perfect recall across all settings, indicating strong separability of this attack class. Logistic Regression shows noticeable reduction in recall when trained with cGAN-generated samples, particularly at higher augmentation ratios and training durations. Overall, classifier performance remains highly stable regardless of augmentation strategy.

6.2.8. Initial Access

Initial Access [17,43] is an exceptionally rare minority class in the UWF-ZeekData22 dataset, represented by a single record, accounting for a negligible portion of overall traffic.
Ablation at r = 0.25 (400 Epochs)
At a moderate augmentation ratio (0.25) and 400 epochs (Table 31), Initial Access detection remains highly stable across most GAN variants.
For Logistic Regression, Vanilla GAN performs strongly, producing only five false negatives and zero false positives. In contrast, cGAN degrades substantially, with 1058 false negatives and 133 false positives. WGAN and WGAN-GP slightly improve recall, reducing false negatives to four while keeping false positives very low (five and four, respectively), demonstrating better stability than cGAN and marginal gains over Vanilla GAN.
For SVM, the Vanilla GAN achieves perfect classification with zero false positives and zero false negatives. WGAN and WGAN-GP maintain near-perfect performance, introducing only six false positives and zero false negatives, while cGAN underperforms, yielding 36 false negatives and eight false positives, indicating conditional instability even at modest ratios.
For KNN, all GAN variants perform similarly, with false negatives ≤ 5 and low false-positive rates, suggesting that neighborhood structure is preserved across objectives.
Tree-based models are effectively invariant. The Decision Tree shows only minor errors under cGAN, and the Random Forest records a single false negative.
Overall, at r = 0.25 and 400 epochs, WGAN and WGAN-GP provide slight recall improvements, cGAN underperforms for linear and margin-based classifiers, and tree-based models remain robust.
Ablation at r = 0.50 (400 Epochs)
At a higher augmentation ratio (0.50) and 400 epochs (Table 32), divergence appears primarily in Logistic Regression, while other classifiers remain stable.
For Logistic Regression, Vanilla GAN remains strong, producing only two false negatives and zero false positives. In contrast, cGAN degrades sharply, with 1224 false negatives and 19 false positives. Both WGAN and WGAN-GP restore stability, reducing false negatives to two and five, respectively, with only a small number of false positives (seven and four), confirming the advantage of Wasserstein-based objectives under increased augmentation pressure.
For SVM, performance remains near-perfect across Vanilla GAN, WGAN, and WGAN-GP, with false positives limited to at most 6. In contrast, cGAN introduces 44 false negatives, indicating reduced margin stability.
For KNN, all variants perform well: false negatives remain ≤ 2, while false positives range from six to 11, with cGAN exhibiting the highest false-positive count. This suggests minor neighborhood overlap.
Tree-based models remain invariant. Decision Tree shows a single false negative under cGAN, while Random Forest produces at most one false negative across variants, with no false positives.
Overall, at r = 0.50 and 400 epochs, WGAN and WGAN-GP provide the most stable augmentation for Initial Access, whereas cGAN consistently underperforms, particularly for linear classifiers. Increasing the minority ratio does not degrade performance for margin-based, distance-based, or tree-based classifiers.
Ablation at r = 0.25 (800 Epochs)
At r = 0.25 with extended training (800 epochs), performance remains highly stable for Vanilla GAN, WGAN, and WGAN-GP, while cGAN exhibits pronounced instability (Table 33).
For Logistic Regression, Vanilla GAN, WGAN, and WGAN-GP all achieve near-perfect classification, each producing at most three false negatives and zero false positives. In contrast, cGAN collapses, generating 4902 false negatives and 22 false positives, indicating severe recall degradation under prolonged training.
For SVM, the same pattern emerges. Vanilla GAN and the Wasserstein variants yield perfect separation, with zero false negatives and zero false positives. Conversely, cGAN performs poorly, introducing 927 false negatives and 37 false positives.
For KNN, error rates remain minimal for Vanilla GAN, WGAN, and WGAN-GP, with false negatives ≤ 3 and false positives ≤ 5. In contrast, cGAN degrades sharply, producing 514 false negatives and 29 false positives, reflecting disrupted neighborhood structure.
Tree-based models remain invariant. Decision Tree records only a single false negative under cGAN, and Random Forest shows at most one false negative with no false positives.
Overall, under moderate augmentation with extended training, Vanilla GAN and Wasserstein-based objectives remain stable and reliable, whereas cGAN consistently collapses across all classifiers. These results further underscore the importance of adversarial stability during prolonged training for Initial Access detection.
Ablation at r = 0.50 (800 Epochs)
In the most aggressive configuration (r = 0.50, 800 epochs; Table 34), clear architectural differences emerge for Initial Access.
For Logistic Regression, the Vanilla GAN, WGAN, and WGAN-GP remain near-perfect, producing at most two false negatives and no false positives, indicating preserved linear separability despite heavy oversampling. In contrast, cGAN collapses, producing 6406 false negatives and 19 false positives.
For SVM, Vanilla GAN and the Wasserstein variants maintain strong performance, with false negatives ≤ 1 and false positives ≤ 6, whereas cGAN degrades substantially, yielding 1654 false negatives and 153 false positives, reflecting severe margin distortion.
For KNN, the same pattern holds. Vanilla GAN, WGAN, and WGAN-GP remain stable, producing at most two false negatives and at most eleven false positives, while cGAN introduces 428 false negatives and 54 false positives, indicating a disrupted neighborhood structure.
Tree-based models remain largely invariant. Decision Tree and Random Forest show perfect or near-perfect classification for non-conditional GANs, with cGAN introducing only two false negatives.
Overall, at high augmentation and extended training, Vanilla GAN and Wasserstein-based objectives remain stable and accurate, whereas cGAN consistently collapses across linear, margin-based, and distance-based classifiers. These results reinforce that Wasserstein training—particularly WGAN-GP—provides the most reliable behavior under extreme augmentation pressure.
Figure 8 summarizes recall performance for the Initial Access detection task across GAN variants and experimental configurations. Decision Tree and Random Forest consistently achieve perfect or near-perfect recall across all settings, indicating strong robustness to the augmentation strategy. KNN and SVM also maintain high recall with minimal variation across GAN variants. Logistic Regression shows reduced recall when trained with cGAN-generated samples, particularly at higher augmentation ratios and training durations, suggesting sensitivity to differences in the synthetic data distribution.

6.2.9. Persistence

Persistence [17,44] is an extremely rare minority class in the UWF-ZeekData22 dataset, represented by a single observed instance, comprising a negligible fraction of overall network traffic and reflecting the severe class imbalance.
Ablation at r = 0.25 (400 Epochs)
Under moderate augmentation (r = 0.25, 400 epochs), Table 35 shows that Persistence detection remains highly stable for non-conditional GAN variants but degrades under conditional generation.
For Logistic Regression, Vanilla GAN, WGAN, and WGAN-GP all achieve near-perfect performance, each producing fewer than six false negatives and at most four false positives, indicating strong linear separability. In contrast, cGAN collapses, producing 1172 false negatives and 19 false positives.
For SVM, performance is consistently strong across all non-conditional GAN variants. GAN, WGAN, and WGAN-GP achieve zero false negatives and at most six false positives. cGAN again underperforms, introducing 28 false negatives and eight false positives, reflecting reduced margin separability.
For KNN, all variants perform well. GAN, WGAN, and WGAN-GP maintain false negatives ≤ 4 and false positives ≤ 5, while cGAN produces slightly higher errors (five false negatives and 12 false positives).
Tree-based models remain largely invariant. Decision Trees and Random Forests show perfect or near-perfect classification for non-conditional GANs, with cGAN introducing at most two false negatives.
Overall, at moderate augmentation and short training, Vanilla GAN, WGAN, and WGAN-GP provide stable and faithful synthetic samples, whereas cGAN exhibits reduced stability across linear, margin-based, and distance-based classifiers.
Ablation at r = 0.50 (400 Epochs)
Under heavier augmentation (r = 0.50, 400 epochs), Table 36 shows that Persistence detection remains highly stable for Vanilla GAN and Wasserstein objectives, while cGAN degrades markedly.
For Logistic Regression, Vanilla GAN, WGAN, and WGAN-GP maintain near-perfect performance, each producing at most five false negatives and at most five false positives, indicating preserved linear separability despite doubled synthetic density. In contrast, cGAN collapses, producing 1348 false negatives and 127 false positives, reflecting substantial class overlap under conditional oversampling.
For SVM, non-conditional GAN variants (GAN, WGAN, and WGAN-GP) achieve perfect recall (FN = 0) with very small false-positive counts (≤6), whereas cGAN increases misclassification, with 32 false negatives and eight false positives, confirming reduced margin stability.
For KNN, non-conditional variants maintain very low errors with false negatives ≤ 1 and false positives ≤ 6, while cGAN increases misclassification (nine false negatives and 14 false positives), indicating neighborhood distortion at higher ratios.
Tree-based models remain effectively invariant. Decision Tree achieves perfect classification, and Random Forest records at most one false positive across all variants.
Overall, increasing the augmentation ratio amplifies instability in cGAN, whereas Vanilla GAN and Wasserstein-based objectives scale reliability with synthetic data volume, maintaining strong class fidelity across classifier families.
Ablation at r = 0.25 (800 Epochs)
Under extended training (800 epochs) with moderate augmentation (r = 0.25), Table 37 shows that Persistence remains stable for non-conditional GANs but deteriorates sharply for cGAN.
For Logistic Regression, Vanilla GAN, WGAN, and WGAN-GP all maintain near-perfect performance, with false negatives ≤ 5 and zero false positives, indicating that longer training does not destabilize these objectives. In contrast, cGAN collapses, producing 2209 false negatives and 11 false positives, reflecting severe recall degradation.
For SVM, Vanilla GAN, WGAN, and WGAN-GP achieve perfect recall (FN = 0) with very small false-positive counts (≤6), whereas cGAN introduces 136 false negatives and nine false positives, confirming instability under prolonged conditional training.
For KNN, non-conditional variants remain robust, with false negatives ≤ 1 and false positives ≤ 5, indicating preservation of local neighborhood structure. However, cGAN increases errors, yielding 82 false negatives and eight false positives, indicating disruption of neighborhood structure.
Tree-based models remain effectively invariant. Decision Tree achieves perfect classification, and Random Forest records at most one false positive across variants.
Overall, extending training at a moderate ratio amplifies instability in cGAN, whereas Vanilla GAN and Wasserstein-based objectives remain stable and reliable for Persistence detection.
Ablation at r = 0.50 (800 Epochs)
Under the most aggressive configuration (r = 0.50, 800 epochs), Table 38 shows that Persistence clearly separates stable from unstable adversarial objectives.
For Logistic Regression, Vanilla GAN, WGAN, and WGAN-GP remain highly stable, each producing at most three false negatives and no false positives, demonstrating that minority-class structure is preserved under heavy augmentation. In contrast, cGAN collapses, producing 2592 false negatives and 14 false positives, indicating severe recall loss.
For SVM, non-conditional GAN variants achieve perfect recall (FN = 0) with only minor false-positive counts (≤6), while cGAN degrades markedly, producing 211 false negatives and 14 false positives, confirming instability under sustained conditional training.
For KNN, Vanilla GAN, WGAN, and WGAN-GP maintain a zero false-negative rate, with low false-positive counts (≤7), whereas cGAN increases errors, with 123 false negatives and nine false positives, reflecting neighborhood distortion.
Tree-based classifiers remain effectively invariant. Decision Tree records zero false negatives and at most two false positives, and Random Forest produces at most one false positive across variants.
Overall, at higher augmentation and extended training, cGAN consistently fails, while Vanilla GAN and especially Wasserstein-based objectives remain stable. Tree-based classifiers remain insensitive to GAN choice, augmentation, and training duration.
Figure 9 summarizes recall performance for the Persistence detection task across GAN variants and experimental configurations. Decision Tree and Random Forest consistently achieve perfect or near-perfect recall across all settings, demonstrating robustness to the augmentation strategy. KNN and SVM also maintain high recall with minimal variations across GAN variants. Logistic Regression exhibits a noticeable reduction in recall when trained with cGAN-generated samples, particularly at higher augmentation ratios and longer training durations, indicating sensitivity to differences in the generated data distribution.

6.3. Training Time Results

Ablation Training

Ablation experiments were conducted on Google Colab Pro using the same hardware configuration described in Section 5.7, with modified hyperparameters [32,50], including alternative noise modes, adjusted generator and discriminator dropout rates, and extended epoch settings. These ablation-specific modifications altered the computational profile of each GAN variant. Table 39 reports the observed training time per variant under the ablation configuration.
Under this setup, the Vanilla GAN required 26 min. The cGAN incurred the highest computational cost in this scenario, completing training in 35 min. The additional conditioning mechanism increases model complexity by incorporating label information into both the generator and discriminator networks [16]. Both WGAN and WGAN-GP required 29 min and exhibited nearly identical runtimes under the ablation configuration.
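The added cost of conditioning can be made concrete with a small sketch. The fragment below is illustrative only—array shapes and the one-hot scheme are assumptions, not the study's exact implementation—and shows how label information is concatenated with the noise vector before the generator input, which is the source of cGAN's extra parameters and runtime:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def conditional_generator_input(noise_dim, num_classes, labels, rng):
    """Build a cGAN generator input: noise z concatenated with a one-hot
    label vector, so generation is steered toward the requested class.
    The discriminator input is conditioned analogously."""
    z = rng.standard_normal((len(labels), noise_dim))
    y = one_hot(np.asarray(labels), num_classes)
    # Resulting shape: (batch, noise_dim + num_classes)
    return np.concatenate([z, y], axis=1)

rng = np.random.default_rng(0)
batch = conditional_generator_input(noise_dim=64, num_classes=2, labels=[1, 1, 0], rng=rng)
print(batch.shape)  # (3, 66)
```

The extra `num_classes` input dimensions propagate through both networks, which is consistent with the longer cGAN training time reported above.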
Overall, these results indicate that training-time sensitivity under ablation conditions depends on the interaction between architectural complexity and the hyperparameter configuration, rather than on the adversarial objective alone. While computational costs are modest across variants, conditional generation exhibits the greatest sensitivity to extended training and regularization adjustments in this experimental setting.

6.4. Summary of Results

Across nine MITRE ATT&CK tactics and multiple ablation configurations, the experimental results reveal consistent and interpretable patterns regarding GAN objective choice, augmentation ratio, and training duration.
First, Wasserstein-based objectives (WGAN and WGAN-GP) demonstrate the most stable and reliable behavior across extreme imbalance conditions. Under moderate augmentation (r = 0.25), both variants consistently reduce false negatives for linear and margin-based classifiers while maintaining low false-positive rates. Under aggressive augmentation (r = 0.50) and extended training (800 epochs), WGAN and WGAN-GP remain stable for nearly all tactics, particularly for Logistic Regression, SVM, and KNN. WGAN-GP often achieves the lowest false-negative counts but occasionally introduces slightly higher false positives, suggesting broader synthetic support under gradient-penalized training.
Second, the Vanilla GAN performs competitively in moderately imbalanced regimes and remains stable for several tactics under prolonged training. However, under extreme sparsity combined with high augmentation pressure, it occasionally exhibits increased overlap between synthetic minority samples and benign traffic, reflected in elevated false-positive counts in certain tactics (e.g., Lateral Movement and Resource Development).
Third, the conditional GAN (cGAN) consistently exhibits instability under extended training and higher augmentation ratios. Across multiple tactics—including Credential Access, Lateral Movement, Resource Development, Defense Evasion, Initial Access, and Persistence—cGAN frequently produces substantial recall degradation, with false-negative counts increasing by orders of magnitude relative to non-conditional variants. This pattern is most pronounced under the combined stress of high augmentation (r = 0.50) and long training (800 epochs), indicating sensitivity of conditional objectives to oversaturation in extremely sparse tabular regimes.
Fourth, classifier sensitivity varies systematically:
  • Logistic Regression is the most sensitive to GAN choice, clearly reflecting differences in synthetic distribution alignment.
  • SVM demonstrates strong robustness but still reveals instability under conditional generation.
  • KNN is moderately sensitive, particularly to neighborhood distortion under unstable GAN objectives.
  • Decision Tree and Random Forest remain largely invariant across all configurations once sufficient minority density is achieved, often reaching near-perfect performance regardless of GAN variants.
Fifth, increasing the augmentation ratio from 0.25 to 0.50 generally improves recall stability for robust GAN objectives but amplifies instability in weaker ones. Similarly, extending training from 400 to 800 epochs benefits Wasserstein-based models while exacerbating collapse in conditional formulations.
Overall, the results demonstrate that adversarial objective choice is the dominant factor under extreme imbalance, particularly when augmentation pressure and training duration increase. Wasserstein-based objectives provide the most consistent performance across tactics and classifiers, whereas conditional generation without additional stabilization mechanisms is highly susceptible to recall collapse in sparse cybersecurity settings. Tree-based classifiers remain largely unaffected by GAN selection once minority samples reach sufficient density.

7. Discussions

The results in Section 6 show that GAN-based minority augmentation can substantially improve intrusion detection under extreme class imbalance, but that its effectiveness depends strongly on the augmentation regime and training duration rather than on adversarial complexity alone. Across the controlled ablation study—varying GAN formulation, augmentation ratio (r ∈ {0.25, 0.50}), training duration (400 vs. 800 epochs), and classifier family—clear, repeatable patterns emerge regarding when augmentation improves minority detection and when it destabilizes downstream learning.
Rather than identifying a universally superior GAN, the findings emphasize the dominant role of data-centric factors, particularly minority density and the interaction between synthetic pressure and classifier geometry. Architectural stability mechanisms matter most under aggressive settings (higher r and/or longer training), where synthetic samples exert a stronger influence on decision boundaries.

7.1. Ablation Results

7.1.1. Architectural Comparison (Vanilla GAN vs. cGAN vs. WGAN vs. WGAN-GP)

Across the nine evaluated MITRE ATT&CK tactics (Discovery, Credential Access, Privilege Escalation, Exfiltration, Lateral Movement, Resource Development, Defense Evasion, Initial Access, and Persistence), results show consistent behavioral differences among GAN variants. These differences reflect trade-offs in training stability and downstream classifier compatibility, and they become most visible as the augmentation ratio and epoch count increase.
Vanilla GAN
Vanilla GAN provides a strong baseline across many tactics. In moderate regimes, it often produces classifier-compatible synthetic samples and yields stable performance for Logistic Regression, SVM, and KNN, whereas tree-based models remain largely invariant once minority density is sufficiently high.
However, Section 6 shows that Vanilla GAN can become less reliable under aggressive settings for ultra-sparse tactics (notably Lateral Movement and Resource Development), where false positives may increase and linear separability can degrade, particularly at 800 epochs and/or r = 0.50. Overall, Vanilla GAN is dependable under moderate augmentation but less robust under strong synthetic pressure.
Conditional GAN (cGAN)
cGAN is the least stable variant in Section 6. Although conditioning is intended to guide generation, the results show that cGAN frequently collapses in recall as augmentation pressure or training duration increases.
Across multiple tactics—including Credential Access, Lateral Movement, Resource Development, Defense Evasion, Initial Access, and Persistence—cGAN produces very large increases in false negatives for Logistic Regression and often destabilizes SVM and KNN under 800 epochs and/or r = 0.50. These patterns are consistent with conditional overfitting or mode concentration when conditioning is applied to extremely sparse real minority samples.
While cGAN occasionally performs acceptably under lighter settings, Section 6 indicates that its behavior is not consistent enough to be considered robust under extreme imbalance.
WGAN (Wasserstein GAN)
WGAN demonstrates consistently improved stability relative to Vanilla GAN and cGAN, aligned with smoother Wasserstein gradients. In Section 6, WGAN repeatedly avoids catastrophic failures and preserves downstream separability across both ratios and epoch settings.
The advantage is most visible for margin-based and linear models, where WGAN frequently reduces false negatives compared with Vanilla GAN in sparse tactics (e.g., Credential Access, Exfiltration, Lateral Movement, Resource Development). However, these gains are often incremental, and in many moderate settings, WGAN performs similarly to Vanilla GAN, indicating stability rather than universal dominance.
WGAN-GP (Wasserstein GAN with Gradient Penalty)
WGAN-GP is generally the most robust architecture under aggressive regimes (higher r and/or 800 epochs), consistently preventing the severe recall collapse observed in cGAN and improving stability for difficult sparse tactics (Credential Access, Lateral Movement, Resource Development, Initial Access, Persistence) through gradient-penalty regularization.
Section 6 also suggests a recurring trade-off where WGAN-GP can occasionally yield slightly higher false positives than WGAN in some settings, consistent with broader synthetic support. Importantly, this does not appear as systematic degradation; rather, it is a stability-overlap trade-off. Overall, WGAN-GP provides the strongest robustness when the synthetic pressure is high.
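For reference, the gradient-penalty regularization underlying WGAN-GP's robustness can be written out explicitly. The formulation below is the standard WGAN-GP critic objective, restated here for context; the value of λ and the interpolate sampling follow the usual formulation, not hyperparameters specific to this study:

```latex
% Standard WGAN-GP critic loss: Wasserstein terms plus a penalty that
% drives the critic's gradient norm toward 1 along interpolates \hat{x}
% between real samples x and synthetic samples \tilde{x}.
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\!\left[D(\tilde{x})\right]
    - \mathbb{E}_{x \sim P_r}\!\left[D(x)\right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},\quad \epsilon \sim U[0,1].
```

The penalty term enforces a soft 1-Lipschitz constraint on the critic, which is the mechanism credited above with preventing recall collapse under prolonged training.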

7.2. Effects of Augmentation Ratio (r = 0.25 vs. r = 0.50)

7.2.1. Ratio = 0.25 (Moderate Synthetic Pressure)

Across Section 6, r = 0.25 is a consistently stable augmentation level. It increases minority density enough to support learning while limiting synthetic dominance. Under this regime, non-conditional variants (Vanilla GAN, WGAN, WGAN-GP) typically maintain compact minority structure, leading to stable Logistic Regression and SVM behavior and coherent KNN neighborhoods.
Tree-based models frequently achieve near-perfect results once even moderate density is achieved, indicating that at r = 0.25, improvements are often driven primarily by class balancing rather than fine-grained GAN differences.

7.2.2. Ratio = 0.50 (Higher Synthetic Pressure)

Increasing r to 0.50 often further reduces minority sparsity, but Section 6 shows that the effect is architecture- and tactic-dependent rather than uniformly beneficial. For stable generators (WGAN/WGAN-GP, and often Vanilla GAN), performance is commonly maintained and sometimes improved, particularly in recall-sensitive cases.
However, r = 0.50 consistently amplifies instability in cGAN, resulting in a substantial increase in false negatives across several tactics and classifiers. For some tactics, higher r yields diminishing returns once density is already adequate, especially for tree-based models that are already near-perfect at r = 0.25.

7.2.3. Ratio Takeaway

Overall, Section 6 supports the conclusion that r = 0.25 is a strong default for stability, while r = 0.50 is viable when paired with robust objectives (especially WGAN/WGAN-GP) but carries a higher risk for unstable variants (especially cGAN). The primary determinant at higher r is whether the GAN objective can preserve minority structure without drift or collapse.
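The practical difference between the two regimes can be sketched as a sample-budget calculation. The helper below is illustrative and assumes r denotes the target minority-to-majority size ratio after augmentation; the study's exact definition of r may differ:

```python
import math

def synthetic_samples_needed(n_minority, n_majority, r):
    """Number of GAN-generated samples required so that the augmented
    minority class reaches a target minority-to-majority ratio r.
    Assumes r = (n_minority + n_synthetic) / n_majority."""
    target = math.ceil(r * n_majority)
    # Never generate a negative count if the class is already dense enough.
    return max(0, target - n_minority)

# Example: an ultra-sparse tactic with 1 real record against 9000 benign flows.
print(synthetic_samples_needed(1, 9000, 0.25))  # 2249
print(synthetic_samples_needed(1, 9000, 0.50))  # 4499
```

Doubling r roughly doubles the synthetic volume in such ultra-sparse regimes, which is why weaker objectives are exposed so sharply at r = 0.50: nearly every minority sample the classifier sees is synthetic.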

7.3. Effects of Training Duration (400 vs. 800 Epochs)

7.3.1. 400 Epochs

Training at 400 epochs is generally stable across tactics for non-conditional GANs and produces consistent gains without strongly amplifying drift. In Section 6, many tactics already achieve stable performance at 400 epochs once augmentation is applied, particularly for SVM/KNN and tree-based models.

7.3.2. 800 Epochs

Extending training to 800 epochs produces architecture-dependent effects. In Section 6, WGAN and, in particular, WGAN-GP tend to tolerate longer training well and sometimes improve minority recall for sensitive classifiers. In contrast, Vanilla GAN exhibits mild degradation for ultra-sparse tactics under aggressive conditions, whereas cGAN frequently collapses during extended training, producing large false-negative spikes across multiple tactics.
Thus, longer training is not uniformly beneficial; it is most effective when the adversarial objective is sufficiently stable to improve minority modeling without overfitting or collapse.

7.4. Classifier Sensitivity Patterns

Section 6 shows consistent differences in how classifiers respond to synthetic-data fidelity:
  • Logistic Regression is the most sensitive and serves as a strong diagnostic of drift/overlap. It sharply exposes cGAN collapse and differentiates WGAN vs. WGAN-GP trade-offs.
  • SVM is generally robust once class density is adequate, but still shows clear degradation when synthetic distributions become unstable (most notably under cGAN in aggressive regimes).
  • KNN responds primarily to neighborhood coherence; it remains stable under WGAN/WGAN-GP but can degrade when synthetic samples disrupt local structure (again, most visible under cGAN).
  • Decision Tree and Random Forest are the most robust. Across most tactics, once augmentation increases density, these models are often near-perfect and relatively insensitive to GAN variants, ratio, and epochs.
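The per-classifier error counts reported throughout Section 6 reduce to simple confusion-matrix tallies. A minimal helper for the attack-versus-benign setting is sketched below; this is an illustrative fragment, not the study's evaluation code:

```python
def minority_error_counts(y_true, y_pred, positive=1):
    """Count false negatives, false positives, and minority-class recall
    for a binary attack(1)-vs-benign(0) task, mirroring the quantities
    reported in the ablation tables."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"FN": fn, "FP": fp, "recall": recall}

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(minority_error_counts(y_true, y_pred))  # {'FN': 1, 'FP': 1, 'recall': 0.75}
```

Under extreme imbalance, false negatives dominate the recall denominator, which is why a few thousand FNs (as under cGAN collapse) translate into near-zero minority recall even when overall accuracy stays high.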

7.5. Summary

Taken together, the ablation results in Section 6 support the following conclusions:
  • cGAN is consistently the least reliable under extreme sparsity, often collapsing in recall as r and/or epochs increase.
  • WGAN and WGAN-GP are the most stable across tactics and aggressive settings, with WGAN-GP providing the strongest robustness under high synthetic pressure.
  • Vanilla GAN is a competitive baseline, especially under moderate settings, but less robust than Wasserstein-based objectives for ultra-sparse tactics under heavy pressure.
  • r = 0.25 is the most consistently stable regime, while r = 0.50 can help or be neutral when stability is maintained, but strongly exposes weakness in unstable objectives.
  • Classifier choice matters: LR/SVM/KNN reflect synthetic fidelity, while tree-based models often saturate once density becomes sufficient.

7.6. Limitations

Despite the strong and consistent patterns observed in this study, several limitations should be acknowledged.
First, GAN training under extreme class imbalance remains inherently unstable in ultra-sparse regimes. Although Wasserstein-based objectives demonstrated improved robustness relative to Vanilla GAN and cGAN, performance remained sensitive to augmentation ratio and training duration. As shown in Section 6, increasing the synthetic ratio or extending adversarial training can amplify the architectural weakness, leading to distributional drift, over-smoothing, or recall collapse, particularly for conditional objectives. These findings highlight that minority-manifold fidelity remains difficult to preserve when real samples are extremely limited, and stability does not always guarantee optimal downstream separability.
Second, the experimental analysis is conducted exclusively on the UWF-ZeekData22 dataset, which reflects realistic network telemetry derived from Zeek logs and aligned with multiple MITRE ATT&CK tactics. While the dataset provides a strong testbed for extreme class imbalance in structured network telemetry, the observed architectural and classifier-sensitivity patterns should be interpreted within the context of this type of data. The results are therefore expected to generalize most naturally to other intrusion detection datasets derived from similar Zeek-based network telemetry, where feature structure and traffic semantics are comparable. However, they may not generalize uniformly to cybersecurity datasets constructed from fundamentally different data sources, such as NetFlow summaries, host-based logs, or alert-centric datasets with different feature representations and attack taxonomies. External validation across heterogeneous telemetry sources would therefore be necessary to confirm the broader applicability of the findings. Nevertheless, several observed patterns—such as the relative stability of Wasserstein-based objectives and the sensitivity of linear classifiers to synthetic distribution drift—are consistent with theoretical expectations regarding adversarial training and decision-boundary geometry, suggesting that similar dynamics may emerge in related network telemetry datasets, although empirical validation remains necessary.
Third, the downstream evaluation focuses on classical machine learning classifiers (LR, SVM, KNN, DT, and RF). This choice enables controlled analysis of how the quality of synthetic distributions affects different decision-boundary mechanisms and provides clear diagnostic insight into recall-precision trade-offs. However, it does not capture interactions between GAN-based augmentation and deep neural intrusion detection models that are increasingly used in modern IDS. Deep architectures, such as CNN- or LSTM-based models, may respond differently to synthetic manifold expansion due to their representation-learning capabilities. Therefore, evaluating GAN-based augmentation in conjunction with deep IDS architectures remains an important direction for future research.
Fourth, the GAN architectures were evaluated under aligned architectural templates and harmonized hyperparameter baselines to ensure fair cross-objective comparison. While this controlled design isolates the effects of adversarial objective choice (Vanilla, conditional, Wasserstein, gradient-penalized Wasserstein), it does not represent exhaustive architecture tuning. Certain variants, particularly cGAN, might benefit from alternative conditioning mechanisms, stronger regularization, or modified discriminator–generator ratios. Therefore, the reported results should be interpreted as comparative behavioral analysis under consistent experimental control, rather than absolute bounds on achievable performance.
Finally, the study indirectly evaluates synthetic quality via downstream classifier performance (e.g., false positives, false negatives, and recall stability) rather than relying solely on standalone generative metrics. Although this decision aligns with the applied objective of improving minority-class detection, future work could incorporate additional distributional diagnostics to assess synthetic manifold fidelity. In particular, explicit validation of logical relationships between network telemetry features could further verify that the generated samples preserved domain-consistent behavior rather than merely matching statistical distributions. While unrealistic feature combinations would typically degrade classifier performance by introducing noise into the training process, future work could incorporate domain-aware constraints or feature-consistency checks to directly assess semantic validity in generated network records.
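As a concrete illustration of the feature-consistency checks suggested above, a validator for synthetic flow-style records might enforce simple logical invariants. The field names and rules below are hypothetical assumptions chosen for illustration; they are not derived from the UWF-ZeekData22 schema:

```python
def is_consistent_flow(record):
    """Reject synthetic records that violate basic logical invariants of
    network flow telemetry. Field names and rules are illustrative
    assumptions, not the UWF-ZeekData22 schema."""
    checks = [
        record.get("duration", 0) >= 0,    # no negative durations
        record.get("orig_bytes", 0) >= 0,  # byte counts are non-negative
        record.get("resp_bytes", 0) >= 0,
        record.get("orig_pkts", 0) >= 0,
        # a flow with zero packets should not carry payload bytes
        not (record.get("orig_pkts", 0) == 0 and record.get("orig_bytes", 0) > 0),
    ]
    return all(checks)

good = {"duration": 1.2, "orig_bytes": 300, "resp_bytes": 120, "orig_pkts": 4}
bad = {"duration": -0.5, "orig_bytes": 300, "resp_bytes": 120, "orig_pkts": 0}
print(is_consistent_flow(good), is_consistent_flow(bad))  # True False
```

Such checks could be applied as a post-generation filter, discarding semantically invalid samples before augmentation, complementing the purely statistical fidelity assessment used here.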

8. Conclusions

This study investigated the role of generative data augmentation in addressing extreme class imbalance in tabular intrusion detection. Using the UWF-ZeekData22 dataset, nine MITRE ATT&CK tactics were modeled as independent binary classification tasks. For each tactic, four GAN variants—Vanilla GAN, Conditional GAN (cGAN), WGAN, and WGAN-GP—were trained exclusively on minority-class samples and evaluated under controlled ablation settings spanning augmentation ratios (0.25 and 0.50), training durations (400 and 800 epochs), and five classical classifiers. The reported behavior should therefore be interpreted within the context of this dataset and experimental configuration rather than as universally generalizable properties of GAN-based augmentation.
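The ablation design can be summarized as a small configuration grid. The interpretation of the augmentation ratio as a fraction of the majority-class size is an assumption, though it is consistent with the confusion-matrix row totals reported later (e.g., r = 0.25 against 200,000 benign flows adds 50,000 synthetic minority records).

```python
# Sketch of the per-tactic ablation grid: 4 GAN variants x 2 ratios x 2
# training durations = 16 configurations, each scored with 5 classifiers.
from itertools import product

def synthetic_count(n_majority: int, ratio: float) -> int:
    # Assumed semantics of the augmentation ratio: number of synthetic
    # minority samples generated relative to the majority-class size.
    return int(ratio * n_majority)

gan_variants = ["Vanilla GAN", "cGAN", "WGAN", "WGAN-GP"]
ratios = [0.25, 0.50]
epochs = [400, 800]

grid = list(product(gan_variants, ratios, epochs))  # 16 configurations per tactic
```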
Within the UWF-ZeekData22 dataset, the results reveal consistent structural patterns rather than isolated performance gains. Minority augmentation substantially reduces false negatives across most tactics, particularly for linear and margin-based classifiers, confirming that increasing minority density improves learnability in sparse regimes. However, improvements are not uniform across GAN formulations. Wasserstein-based objectives (WGAN and WGAN-GP) demonstrate the most stable behavior under aggressive augmentation and extended training, maintaining recall while limiting catastrophic drift. In contrast, cGAN frequently exhibits recall collapse under high augmentation pressure or prolonged optimization and is sensitive to conditional overfitting in extremely sparse settings. Vanilla GAN provides competitive performance in moderate regimes but shows mild instability when augmentation becomes aggressive.
Classifier sensitivity further clarifies these dynamics. Logistic Regression consistently acts as the most sensitive indicator of synthetic distribution fidelity, exposing subtle drift and boundary distortion. SVM and KNN show moderate sensitivity, particularly when the margin structure is affected. Decision Tree and Random Forest remain largely invariant once minority density reaches a sufficient threshold, indicating that class balancing alone often suffices for tree-based ensembles. These findings collectively demonstrate that GAN-based augmentation is effective when synthetic manifold expansion is controlled and structurally coherent, but performance depends on adversarial stability rather than on the synthetic volume alone.
The ablation results also show that moderate augmentation (r = 0.25) with controlled training duration (400 epochs) consistently provides the most reliable balance between recall improvement and distributional fidelity. Increasing the augmentation ratio or the number of training epochs yields diminishing returns and can exacerbate architectural weaknesses. Thus, calibration of augmentation intensity and the choice of adversarial objective are important in ultra-sparse cybersecurity contexts. Future work should validate these observations across additional intrusion detection datasets with different traffic characteristics, feature representations, and attack distributions to further assess the generalizability of the observed stability patterns.

9. Future Work

Several research directions emerge from these findings. First, future work should incorporate explicit distributional diagnostics—such as divergence metrics, manifold geometry analysis, and drift quantification—to complement classifier-based evaluation and more precisely characterize synthetic fidelity under extreme class imbalance. Second, alternative generative paradigms beyond the standard GAN objective should be explored to assess whether they provide improved stability in ultra-sparse tabular regimes. Third, robustness under continual learning settings warrants investigation. Real-world intrusion detection systems operate under evolving attack distributions, and understanding how GAN-based augmentation interacts with shifting distributions is important for operational deployment. Finally, broader validation across additional cybersecurity datasets is necessary to evaluate generalizability and identify dataset-specific behaviors not captured in the present analysis.
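As one concrete example of such a distributional diagnostic, a per-feature two-sample Kolmogorov–Smirnov statistic between real and synthetic minority values gives a simple, scale-free drift measure. The implementation below is a generic sketch, not part of this study's pipeline.

```python
import numpy as np

def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    # Two-sample Kolmogorov-Smirnov statistic: the maximum vertical gap
    # between the two empirical CDFs, evaluated at every observed value.
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Illustrative use: compare one real feature column against its synthetic
# counterpart; values near 0 indicate close agreement, values near 1 drift.
rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=5000)
synthetic = rng.normal(0.4, 1.0, size=5000)
drift = ks_statistic(real, synthetic)
```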

Author Contributions

Conceptualization, A.D., S.S.B., D.M., and S.C.B.; methodology, A.D. and S.S.B.; software, A.D.; validation, S.S.B., D.M., and S.C.B.; formal analysis, A.D. and S.S.B.; investigation, A.D.; resources, S.S.B., D.M., and S.C.B.; data curation, A.D. and D.M.; writing—original draft preparation, A.D.; writing—review and editing, S.S.B., D.M., and S.C.B.; visualization, A.D.; supervision, S.S.B., D.M., and S.C.B.; project administration, S.S.B., D.M., and S.C.B.; funding acquisition, S.S.B., D.M., and S.C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets are available at https://datasets.uwf.edu/data/ (accessed on 12 March 2026).

Acknowledgments

This research was partially supported by the Askew Institute at the University of West Florida.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IDS  Intrusion Detection System
ATT&CK  Adversarial Tactics, Techniques, and Common Knowledge
GAN  Generative Adversarial Network
SMOTE  Synthetic Minority Oversampling Technique
D  Discriminator
G  Generator
LR  Logistic Regression
SVM  Support Vector Machine
KNN  k-Nearest Neighbor
DT  Decision Tree
RF  Random Forest
TN  True Negative
FN  False Negative
FP  False Positive
TP  True Positive
cGAN  Conditional GAN
WGAN  Wasserstein GAN
WGAN-GP  Wasserstein GAN with Gradient Penalty
XAI  Explainable AI

References

  1. Chawla, N.V.; Japkowicz, N.; Kotcz, A. Editorial: Special Issue on Learning from Imbalanced Data Sets. SIGKDD Explor. Newsl. 2004, 6, 1–6. [Google Scholar] [CrossRef]
  2. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
  3. Sommer, R.; Paxson, V. Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 305–316. [Google Scholar] [CrossRef]
  4. Axelsson, S. The base-rate fallacy and the difficulty of intrusion detection. In Proceedings of the 6th ACM Conference on Computer and Communications Security (CCS), Singapore, 2–4 November 1999; pp. 1–7. [Google Scholar] [CrossRef]
  5. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  6. Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Cham, Switzerland, 2018. [Google Scholar]
  7. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  8. Lin, Z.; Shi, Y.; Xue, Z. IDSGAN: Generative adversarial networks for attack generation against intrusion detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 1200–1205. [Google Scholar] [CrossRef]
  9. Sabuhi, M.; Zhou, M.; Bezemer, C.-P.; Musilek, P. Applications of generative adversarial networks in anomaly detection: A systematic literature review. IEEE Access 2021, 9, 161003–161029. [Google Scholar] [CrossRef]
  10. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. In Proceedings of the ICML, Sydney, Australia, 10 August 2017. [Google Scholar]
  11. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. In Proceedings of the NeurIPS, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  12. Mescheder, L.; Geiger, A.; Nowozin, S. Which training methods for GANs actually converge? In Proceedings of the ICML, Stockholm, Sweden, 14 July 2018. [Google Scholar] [CrossRef]
  13. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. In Proceedings of the NeurIPS, Barcelona, Spain, 5–10 December 2016. [Google Scholar] [CrossRef]
  14. Arora, S.; Zhang, Y. Do GANs actually learn the distribution? In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar] [CrossRef]
  15. Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv 2017, arXiv:1701.04862. [Google Scholar] [CrossRef]
  16. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
  17. Bagui, S.S.; Mink, D.; Bagui, S.C.; Ghosh, T.; Plenkers, R.; McElroy, T. Introducing UWF-ZeekData22: A comprehensive network traffic dataset based on the MITRE ATT&CK framework. Data 2023, 8, 18. [Google Scholar] [CrossRef]
  18. University of West Florida. UWF-ZeekData22 Dataset. Available online: https://datasets.uwf.edu (accessed on 12 March 2026).
  19. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  20. Cover, T.M.; Hart, P.E. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  21. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  22. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009. [Google Scholar]
  23. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  24. Bagui, S.S.; Mink, D.; Bagui, S.C.; Subramaniam, S. Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data. Comput. Spec. Issue Big Data Anal. Cyber Crime Investig. Prev. 2023, 12, 204. [Google Scholar] [CrossRef]
  25. Bagui, S.S.; Mink, D.; Bagui, S.; Subramaniam, S. Using Resampling to Classify Rare Attack Tactics in UWF-ZeekData22. Knowledge 2024, 4, 96–119. [Google Scholar] [CrossRef]
  26. Insan, H.; Prasetiyowati, S.S.; Sibaroni, Y. SMOTE-LOF and Borderline-SMOTE performance to overcome imbalanced data and outliers on classification. In Proceedings of the 3rd International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), Denpasar, Bali, Indonesia, 13–15 December 2023; pp. 136–141. [Google Scholar] [CrossRef]
  27. Asniar; Maulidevi, N.U.; Surendro, K. SMOTE-LOF for noise identification in imbalanced data classification. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 3413–3423. [Google Scholar] [CrossRef]
  28. Breunig, M.M.; Kriegel, H.-P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. SIGMOD Rec. 2000, 29, 93–104. [Google Scholar] [CrossRef]
  29. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Hong Kong, China, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
  30. Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Advances in Intelligent Computing (ICIC), Hefei, China, 23–26 August 2005; pp. 878–887. [Google Scholar] [CrossRef]
  31. Lee, T.; Kim, M.; Kim, S.-P. Data augmentation effects using borderline-SMOTE on classification of a P300-based BCI. In Proceedings of the 8th International Winter Conference on Brain-Computer Interface (BCI), Gangwon, Republic of Korea, 26–28 February 2020; pp. 1–4. [Google Scholar] [CrossRef]
  32. Lucic, M.; Kurach, K.; Michalski, M.; Gelly, S.; Bousquet, O. Are GANs created equal? A large-scale study. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 3–8 December 2018. [Google Scholar] [CrossRef]
  33. Borji, A. Pros and cons of GAN evaluation measures. Comput. Vis. Image Underst. 2019, 179, 41–65. [Google Scholar] [CrossRef]
  34. Zhao, X.; Fok, K.W.; Thing, V.L.L. Enhancing network intrusion detection performance using generative adversarial networks. Comput. Secur. 2024, 145, 104005. [Google Scholar] [CrossRef]
  35. Agrawal, G.; Kaur, A.; Myneni, S. A Review of Generative Models in Generating Synthetic Attack Data for Cybersecurity. Electronics 2024, 13, 322. [Google Scholar] [CrossRef]
  36. Ndayipfukamiye, T.; Ding, J.; Sarwatt, D.S.; Philipo, A.G.; Ning, H. Adversarial Defense in Cybersecurity: A Systematic Review of GANs for Threat Detection and Mitigation. arXiv 2025, arXiv:2509.20411. [Google Scholar] [CrossRef]
  37. MITRE ATT&CK. Reconnaissance (TA0043). Available online: https://attack.mitre.org/tactics/TA0043/ (accessed on 12 March 2026).
  38. MITRE ATT&CK. Credential Access (TA0006). Available online: https://attack.mitre.org/tactics/TA0006/ (accessed on 12 March 2026).
  39. MITRE ATT&CK. Privilege Escalation (TA0004). Available online: https://attack.mitre.org/tactics/TA0004/ (accessed on 12 March 2026).
  40. MITRE ATT&CK. Exfiltration (TA0010). Available online: https://attack.mitre.org/tactics/TA0010/ (accessed on 12 March 2026).
  41. MITRE ATT&CK. Lateral Movement (TA0008). Available online: https://attack.mitre.org/tactics/TA0008/ (accessed on 12 March 2026).
  42. MITRE ATT&CK. Resource Development (TA0042). Available online: https://attack.mitre.org/tactics/TA0042/ (accessed on 12 March 2026).
  43. MITRE ATT&CK. Initial Access (TA0001). Available online: https://attack.mitre.org/tactics/TA0001/ (accessed on 12 March 2026).
  44. MITRE ATT&CK. Persistence (TA0003). Available online: https://attack.mitre.org/tactics/TA0003/ (accessed on 12 March 2026).
  45. MITRE ATT&CK. Defense Evasion (TA0005). Available online: https://attack.mitre.org/tactics/TA0005/ (accessed on 12 March 2026).
  46. MITRE ATT&CK. Discovery (TA0007). Available online: https://attack.mitre.org/tactics/TA0007/ (accessed on 12 March 2026).
  47. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the IJCAI, Montreal, QC, Canada, 20–25 August 1995. [Google Scholar]
  48. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  49. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting. Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
  50. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 13 December 2019. [Google Scholar] [CrossRef]
Figure 1. Recall heatmaps summarizing classifier performance for the Discovery detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 2. Recall heatmaps summarizing classifier performance for the Credential Access detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 3. Recall heatmaps summarizing classifier performance for the Privilege Escalation detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 4. Recall heatmaps summarizing classifier performance for the Exfiltration detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 5. Recall heatmaps summarizing classifier performance for the Lateral Movement detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 6. Recall heatmaps summarizing classifier performance for the Resource Development detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 7. Recall heatmaps summarizing classifier performance for the Defense Evasion detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 8. Recall heatmaps summarizing classifier performance for the Initial Access detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Figure 9. Recall heatmaps summarizing classifier performance for the Persistence detection task across GAN variants and experimental settings. Rows correspond to classifiers, and columns correspond to GAN variants. Panels show the four evaluated configurations: (a) r = 0.25, 400 epochs; (b) r = 0.50, 400 epochs; (c) r = 0.25, 800 epochs; (d) r = 0.50, 800 epochs.
Table 1. Comparison of related work on class-imbalance handling in intrusion detection.
Study | Year | Technique | Dataset | Key Contribution
Chawla et al. [5] | 2002 | SMOTE | Various | Introduced synthetic oversampling using interpolation between minority samples
He et al. [29] | 2008 | ADASYN | Various | Adaptive oversampling focusing on difficult minority regions
Han et al. [30] | 2005 | Borderline-SMOTE | Various | Generates samples near decision boundaries to improve minority-class learnability
Insan et al. [26] | 2023 | SMOTE-LOF, Borderline-SMOTE | Various | Combines oversampling with outlier detection to improve sample quality
Zhao et al. [34] | 2024 | Vanilla GAN, WGAN, cGAN | CIC-IDS2017 | GAN-based data augmentation to improve intrusion detection performance
Agrawal et al. [35] | 2024 | GAN-based generative learning (survey) | NSL-KDD, UNSW-NB15, CIC-IDS2017 | Survey of GAN-based synthetic attack data generation
Ndayipfukamiye et al. [36] | 2025 | WGAN-GP, cGAN, hybrid GANs | Various | Systematic review of GAN-based adversarial defense techniques
This work | 2026 | Vanilla GAN, cGAN, WGAN, WGAN-GP | UWF-ZeekData22 | Systematic ablation analysis of GAN architectures and augmentation strategies
Table 2. Architecture and hyperparameters used in the GAN ablation study.
Category | Parameter | Value
GAN variants | Models evaluated | Vanilla GAN, cGAN, WGAN, WGAN-GP
Network architecture | Type | Multilayer perceptron
Hidden layers | Number | 2
Hidden units per layer | Size | 128
Latent space | Dimension (z) | 32
Latent distribution | Noise type | Normal
Data preprocessing | Feature scaling | [−1, 1]
Regularization | Generator dropout | 0.3
Regularization | Discriminator dropout | 0.3
Training | Batch size | 64
Training | Epochs | 400, 800
Augmentation | Ratios | 0.25, 0.50
Evaluation | Cross-validation | Stratified 5-fold
WGAN specific | Critic updates | 5
WGAN specific | Weight clipping | 0.01
WGAN-GP specific | Gradient penalty coefficient | 10
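The Lipschitz-related settings in Table 2 can be illustrated with a minimal sketch: WGAN clips every critic weight into [−0.01, 0.01] after each update, whereas WGAN-GP instead adds a gradient penalty (coefficient 10) on interpolated samples. The helper below is illustrative and is not the training code used in this study.

```python
import numpy as np

CLIP = 0.01  # critic weight-clipping bound (Table 2, "WGAN specific")

def clip_critic_weights(weights):
    # WGAN enforces an approximate 1-Lipschitz critic by clipping every
    # parameter tensor into [-CLIP, CLIP] after each optimizer step.
    # WGAN-GP drops clipping in favor of a gradient penalty term
    # (coefficient 10 in Table 2) evaluated on real/fake interpolations.
    return [np.clip(w, -CLIP, CLIP) for w in weights]
```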
Table 3. Discovery. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.25.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 51,968 118 194 199,806 | 50,004 2082 0 200,000 | 51,453 633 197 199,803 | 52,086 0 0 200,000 | 52,086 0 0 200,000
cGAN | 52,010 76 132 199,868 | 49,999 2087 0 200,000 | 51,454 632 197 199,803 | 52,084 2 0 200,000 | 52,086 0 0 200,000
WGAN | 51,995 91 77 199,923 | 50,005 2081 0 200,000 | 51,454 632 197 199,803 | 52,086 0 0 200,000 | 52,086 0 0 200,000
WGAN-GP | 51,960 126 167 199,833 | 50,005 2081 0 200,000 | 51,454 632 197 199,803 | 52,083 3 0 200,000 | 52,086 0 0 200,000
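Reading each four-number cell in these tables as a flattened 2 × 2 confusion matrix in (TP, FN, FP, TN) order (an assumption, but one consistent with the row sums: 51,968 + 118 = 52,086 augmented minority records, and 194 + 199,806 = 200,000 benign records), minority recall follows directly:

```python
# Minority recall from a flattened confusion-matrix cell, assuming the
# (TP, FN, FP, TN) ordering inferred from the row totals in Table 3.
def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

# Vanilla GAN + Logistic Regression cell from Table 3:
r = recall(51_968, 118)   # approximately 0.9977
```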
Table 4. Discovery. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.50.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 101,860 226 254 199,746 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,086 0 0 200,000 | 102,086 0 0 200,000
cGAN | 102,004 82 214 199,786 | 99,997 2089 0 200,000 | 101,455 631 188 199,812 | 102,085 1 0 200,000 | 102,086 0 0 200,000
WGAN | 102,020 66 75 199,925 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,085 1 0 200,000 | 102,086 0 0 200,000
WGAN-GP | 101,952 134 276 199,724 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,085 1 0 200,000 | 102,086 0 0 200,000
Table 5. Discovery. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 800 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.25.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 52,008 78 76 199,924 | 50,005 2081 0 200,000 | 51,454 632 197 199,803 | 52,086 0 0 200,000 | 52,086 0 0 200,000
cGAN | 51,969 117 76 199,924 | 49,906 2180 3 199,997 | 51,450 636 197 199,803 | 52,084 2 2 199,998 | 52,086 0 0 200,000
WGAN | 52,026 60 123 199,877 | 50,005 2081 0 200,000 | 51,454 632 197 199,803 | 52,086 0 0 200,000 | 52,086 0 0 200,000
WGAN-GP | 52,005 81 138 199,862 | 50,005 2081 0 200,000 | 51,454 632 197 199,803 | 52,086 0 0 200,000 | 52,086 0 0 200,000
Table 6. Discovery. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 800 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.50.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 101,951 135 182 199,818 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,086 0 0 200,000 | 102,086 0 0 200,000
cGAN | 102,016 70 129 199,871 | 99,859 2227 3 199,997 | 101,450 636 188 199,812 | 102,084 2 0 200,000 | 102,086 0 0 200,000
WGAN | 101,978 108 140 199,860 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,085 1 0 200,000 | 102,086 0 0 200,000
WGAN-GP | 101,962 124 142 199,858 | 100,005 2081 0 200,000 | 101,455 631 188 199,812 | 102,085 1 0 200,000 | 102,086 0 0 200,000
Table 7. Credential Access. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.25.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,723 308 143 199,857 | 50,000 31 8 199,992 | 50,021 10 12 199,988 | 50,031 0 0 200,000 | 50,031 0 0 200,000
cGAN | 49,186 845 13 199,987 | 49,988 43 88 199,992 | 50,018 13 14 199,986 | 50,028 3 2 199,998 | 50,031 0 0 200,000
WGAN | 49,957 74 4 199,996 | 50,000 31 10 199,990 | 50,017 14 14 199,986 | 50,029 2 0 200,000 | 50,031 0 0 200,000
WGAN-GP | 49,950 81 6 199,994 | 50,000 31 10 199,990 | 50,019 12 14 199,986 | 50,029 2 0 200,000 | 50,031 0 0 200,000
Table 8. Credential Access. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.50.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,712 319 190 199,810 | 99,999 32 10 199,990 | 100,023 8 14 199,986 | 100,031 0 0 200,000 | 100,031 0 0 200,000
cGAN | 99,072 959 13 199,987 | 99,983 48 8 199,992 | 100,022 9 15 199,985 | 100,027 4 2 199,998 | 100,031 0 0 200,000
WGAN | 99,963 68 5 199,995 | 100,000 31 10 199,990 | 100,018 13 13 199,987 | 100,030 1 0 200,000 | 100,031 0 0 200,000
WGAN-GP | 99,935 96 9 199,991 | 99,999 32 10 199,990 | 100,020 11 13 199,987 | 100,030 1 0 200,000 | 100,031 0 0 200,000
Table 9. Credential Access. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 800 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.25.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,362 669 742 199,258 | 49,991 40 8 199,992 | 50,012 19 7 199,993 | 50,031 0 0 200,000 | 50,031 0 0 200,000
cGAN | 45,802 4229 8 199,992 | 49,095 936 70 199,930 | 49,598 433 33 199,967 | 50,029 2 0 200,000 | 50,031 0 0 200,000
WGAN | 49,813 218 33 199,967 | 50,000 31 12 199,988 | 50,017 14 14 199,986 | 50,031 0 0 200,000 | 50,031 0 0 200,000
WGAN-GP | 49,853 178 7 199,993 | 49,999 32 12 199,988 | 50,019 12 14 199,986 | 50,030 1 0 200,000 | 50,031 0 0 200,000
Table 10. Credential Access. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 800 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.50.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,252 779 928 199,072 | 99,993 38 8 199,992 | 100,021 10 8 199,992 | 100,031 0 0 200,000 | 100,031 0 0 200,000
cGAN | 93,821 6210 18 199,982 | 98,436 1595 181 199,819 | 99,586 445 93 199,907 | 100,031 0 0 200,000 | 100,031 0 0 200,000
WGAN | 99,811 220 15 199,985 | 100,000 31 12 199,988 | 100,023 8 14 199,986 | 100,031 0 0 200,000 | 100,031 0 0 200,000
WGAN-GP | 99,834 197 7 199,993 | 100,000 31 12 199,988 | 100,023 8 13 199,987 | 100,031 0 0 200,000 | 100,031 0 0 200,000
Table 11. Privilege Escalation. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.25.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,008 5 1 199,999 | 49,960 53 8 199,992 | 50,004 9 7 199,993 | 50,013 0 0 200,000 | 50,012 1 0 200,000
cGAN | 50,004 9 0 200,000 | 49,982 31 10 199,990 | 50,002 11 9 199,991 | 50,013 0 1 199,999 | 50,013 0 0 200,000
WGAN | 50,006 7 0 200,000 | 50,002 11 3 199,997 | 50,003 10 5 199,995 | 50,013 0 0 200,000 | 50,013 0 0 200,000
WGAN-GP | 50,006 7 0 200,000 | 50,002 11 3 199,997 | 50,001 12 5 199,995 | 50,013 0 1 199,999 | 50,013 0 0 200,000
Table 12. Privilege Escalation. Feature scaling: [−1, 1] | Latent dimension (z): 32 | Noise type: normal | Epochs: 400 | Batch size: 64 | G dropout: 0.3 | D dropout: 0.3 | Ratio: 0.50.
Model | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,007 6 0 200,000 | 99,945 68 8 199,992 | 100,008 5 9 199,991 | 100,013 0 0 200,000 | 100,013 0 0 200,000
cGAN | 100,001 12 0 200,000 | 99,961 52 10 199,990 | 100,008 5 12 199,988 | 100,012 1 0 200,000 | 100,013 0 0 200,000
WGAN | 100,005 8 8 199,992 | 100,002 11 3 199,997 | 100,007 6 5 199,995 | 100,012 1 0 200,000 | 100,013 0 0 200,000
WGAN-GP | 100,004 9 0 200,000 | 100,002 11 3 199,997 | 100,004 9 5 199,995 | 100,012 1 0 200,000 | 100,013 0 0 200,000
Table 13. Privilege Escalation. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,006 7 21 199,979 | 49,987 26 13 199,987 | 50,007 6 9 199,991 | 50,013 0 0 200,000 | 50,013 0 0 200,000
cGAN | 49,968 48 19 199,981 | 49,789 224 12 199,988 | 49,987 26 9 199,991 | 50,013 0 1 199,999 | 50,013 0 0 200,000
WGAN | 50,008 5 8 199,992 | 50,001 12 3 199,997 | 50,003 10 5 199,995 | 50,012 1 0 200,000 | 50,013 0 0 200,000
WGAN-GP | 50,005 8 3 199,997 | 50,002 11 3 199,997 | 50,000 13 5 199,995 | 50,013 0 0 200,000 | 50,013 0 0 200,000
Table 14. Privilege Escalation. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,006 7 8 199,992 | 99,967 46 13 199,987 | 100,008 5 12 199,988 | 100,013 0 0 200,000 | 100,013 0 0 200,000
cGAN | 99,970 43 8 199,992 | 99,664 349 12 199,988 | 99,984 29 12 199,988 | 100,011 2 0 200,000 | 100,013 0 0 200,000
WGAN | 100,006 7 0 200,000 | 100,002 11 3 199,997 | 100,006 7 5 199,995 | 100,013 0 0 200,000 | 100,013 0 0 200,000
WGAN-GP | 100,003 10 0 200,000 | 100,002 11 3 199,997 | 100,004 9 5 199,995 | 100,013 0 0 200,000 | 100,013 0 0 200,000
Table 15. Exfiltration. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,000 7 6 199,994 | 50,000 7 7 199,993 | 50,002 5 7 199,993 | 50,007 0 0 200,000 | 50,007 0 0 200,000
cGAN | 50,002 5 0 200,000 | 49,987 20 6 199,994 | 50,003 4 13 199,987 | 50,005 2 0 200,000 | 50,006 1 0 200,000
WGAN | 50,003 4 23 199,977 | 50,001 6 3 199,997 | 50,004 3 5 199,995 | 50,005 2 1 199,999 | 50,007 0 0 200,000
WGAN-GP | 50,002 5 30 199,970 | 50,001 6 3 199,997 | 50,004 3 5 199,995 | 50,005 2 0 199,999 | 50,007 0 0 200,000
Table 16. Exfiltration. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,001 6 31 199,969 | 100,001 6 7 199,993 | 99,998 9 7 199,993 | 100,007 0 0 200,000 | 100,007 0 0 200,000
cGAN | 100,000 7 16 199,984 | 99,978 29 9 199,991 | 100,003 4 11 199,989 | 100,005 2 0 200,000 | 100,007 0 0 200,000
WGAN | 100,001 6 10 199,990 | 100,001 6 3 199,997 | 100,004 3 5 199,995 | 100,007 0 3 199,997 | 100,007 0 0 200,000
WGAN-GP | 100,001 6 14 199,986 | 100,001 6 3 199,997 | 100,004 3 5 199,995 | 100,007 0 3 199,997 | 100,007 0 0 200,000
Table 17. Exfiltration. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,000 7 7 199,993 | 49,946 61 10 199,990 | 50,004 3 13 199,987 | 50,007 0 0 200,000 | 50,007 0 0 200,000
cGAN | 49,952 55 7 199,993 | 49,825 182 12 199,988 | 49,982 25 13 199,987 | 50,004 3 0 200,000 | 50,005 2 0 200,000
WGAN | 49,997 10 7 199,993 | 49,987 20 3 199,997 | 49,998 9 9 199,991 | 50,005 2 1 199,999 | 50,007 0 0 200,000
WGAN-GP | 50,003 4 28 199,972 | 50,000 7 3 199,997 | 50,003 4 5 199,995 | 50,006 1 0 200,000 | 50,007 0 0 200,000
Table 18. Exfiltration. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,001 6 3 199,997 | 99,924 83 13 199,987 | 100,004 3 11 199,989 | 100,007 0 0 200,000 | 100,007 0 0 200,000
cGAN | 99,952 55 3 199,997 | 99,694 313 15 199,985 | 99,963 44 11 199,989 | 100,006 1 0 200,000 | 100,007 0 0 200,000
WGAN | 99,997 10 3 199,997 | 99,984 23 3 199,997 | 100,001 6 9 199,991 | 100,007 0 0 200,000 | 100,007 0 0 200,000
WGAN-GP | 100,003 4 25 199,975 | 100,000 7 3 199,997 | 100,004 3 5 199,995 | 100,006 1 0 200,000 | 100,007 0 0 200,000
Table 19. Lateral Movement. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,920 84 345 199,655 | 49,974 30 12 199,988 | 49,998 0 12 199,988 | 50,004 0 0 200,000 | 50,004 0 0 200,000
cGAN | 49,293 711 9 199,991 | 49,981 23 9 199,991 | 49,997 7 7 199,993 | 50,003 1 0 200,000 | 50,004 0 0 200,000
WGAN | 49,930 74 11 199,989 | 50,000 4 10 199,990 | 49,995 9 7 199,993 | 50,004 0 0 200,000 | 50,004 0 0 200,000
WGAN-GP | 49,942 62 7 199,993 | 50,000 4 6 199,994 | 49,995 9 7 199,993 | 50,004 0 0 200,000 | 50,004 0 0 200,000
Table 20. Lateral Movement. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,916 88 363 199,637 | 99,968 36 13 199,987 | 99,994 10 14 199,986 | 100,004 0 0 200,000 | 100,002 2 0 200,000
cGAN | 99,087 917 62 199,938 | 99,967 37 9 199,991 | 99,995 9 10 199,990 | 100,003 1 0 200,000 | 100,002 2 0 200,000
WGAN | 99,935 69 13 199,987 | 100,000 4 10 199,990 | 99,997 7 9 199,991 | 100,002 2 0 200,000 | 100,002 2 0 200,000
WGAN-GP | 99,942 62 7 199,993 | 100,000 4 8 199,992 | 99,995 9 9 199,991 | 100,002 2 0 200,000 | 100,002 2 0 200,000
Table 21. Lateral Movement. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,683 321 371 199,629 | 49,943 61 9 199,991 | 49,996 8 4 199,996 | 50,004 0 0 200,000 | 50,004 0 0 200,000
cGAN | 47,085 2919 83 199,917 | 49,335 669 54 199,946 | 49,724 280 12 199,988 | 50,003 1 0 200,000 | 50,004 0 0 200,000
WGAN | 49,892 112 18 199,982 | 50,002 2 2 199,988 | 49,997 7 9 199,991 | 50,004 0 0 200,000 | 50,004 0 0 200,000
WGAN-GP | 49,920 84 138 199,862 | 50,001 3 10 199,990 | 49,995 9 9 199,991 | 50,004 0 0 200,000 | 50,004 0 0 200,000
Table 22. Lateral Movement. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,799 205 633 199,367 | 99,930 74 8 199,992 | 99,988 16 7 199,993 | 100,004 0 0 200,000 | 100,002 2 0 200,000
cGAN | 95,651 4353 149 199,851 | 98,859 1145 166 199,834 | 99,758 246 19 199,981 | 100,003 1 0 200,000 | 100,002 2 0 200,000
WGAN | 99,839 165 277 199,723 | 100,002 2 12 199,988 | 99,999 5 12 199,988 | 100,004 0 0 200,000 | 100,002 2 0 200,000
WGAN-GP | 99,884 120 45 199,955 | 100,001 3 10 199,990 | 99,998 6 12 199,988 | 100,004 0 0 200,000 | 100,002 2 0 200,000
Table 23. Resource Development. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,887 116 12 199,988 | 49,991 12 8 199,992 | 50,000 3 10 199,990 | 50,003 0 0 200,000 | 50,001 2 0 200,000
cGAN | 49,124 879 13 199,987 | 49,977 26 8 199,992 | 49,996 7 9 199,991 | 50,001 2 0 200,000 | 50,000 3 0 200,000
WGAN | 49,948 55 12 199,988 | 49,999 4 10 199,990 | 49,999 4 10 199,990 | 50,001 2 0 200,000 | 50,002 1 0 200,000
WGAN-GP | 49,952 51 10 199,990 | 50,000 3 10 199,990 | 49,999 4 10 199,990 | 50,000 3 0 200,000 | 50,002 1 0 200,000
Table 24. Resource Development. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,858 145 13 199,987 | 99,991 12 12 199,988 | 99,996 7 12 199,988 | 100,003 0 0 200,000 | 100,002 1 0 200,000
cGAN | 98,982 1021 15 199,985 | 99,975 28 8 199,992 | 99,996 7 12 199,988 | 100,003 0 0 200,000 | 100,001 2 0 200,000
WGAN | 99,947 56 13 199,987 | 99,998 5 10 199,990 | 99,998 5 8 199,992 | 100,003 0 0 200,000 | 100,002 1 0 200,000
WGAN-GP | 99,931 72 12 199,988 | 100,000 3 10 199,990 | 99,997 6 8 199,992 | 100,003 0 0 200,000 | 100,002 1 0 200,000
Table 25. Resource Development. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,290 713 444 199,556 | 49,850 153 9 199,991 | 49,957 46 8 199,992 | 50,003 0 0 200,000 | 50,003 0 0 200,000
cGAN | 46,672 3331 15 199,985 | 49,835 168 10 199,990 | 49,886 117 8 199,992 | 50,002 1 0 200,000 | 50,002 1 0 200,000
WGAN | 49,882 121 83 199,917 | 49,999 4 12 199,988 | 49,998 5 10 199,990 | 50,003 0 0 200,000 | 50,002 1 0 200,000
WGAN-GP | 49,870 133 135 199,865 | 49,999 4 12 199,988 | 49,995 8 10 199,990 | 50,001 2 0 200,000 | 50,002 1 0 200,000
Table 26. Resource Development. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 98,940 1063 444 199,556 | 99,795 208 11 199,989 | 99,950 53 153 199,985 | 100,003 0 0 200,000 | 100,003 0 0 200,000
cGAN | 96,012 3991 27 199,973 | 99,784 219 11 199,989 | 99,838 165 9 199,991 | 100,003 0 0 200,000 | 100,002 1 0 200,000
WGAN | 99,897 106 130 199,870 | 99,999 4 12 199,988 | 99,999 4 10 199,990 | 100,003 0 0 200,000 | 100,002 1 0 200,000
WGAN-GP | 99,869 134 146 199,854 | 100,000 3 12 199,988 | 99,998 5 12 199,988 | 100,003 0 0 200,000 | 100,002 1 0 200,000
Table 27. Defense Evasion. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,999 2 0 200,000 | 50,001 0 0 200,000 | 50,001 0 3 199,997 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 49,165 836 18 199,982 | 49,990 11 8 199,992 | 49,994 7 10 199,990 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN | 49,996 5 2 199,998 | 50,001 0 6 199,994 | 50,000 1 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,998 3 5 199,995 | 50,001 0 6 199,994 | 49,998 3 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
Table 28. Defense Evasion. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,998 3 0 200,000 | 100,001 0 0 200,000 | 100,001 0 5 199,995 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 99,025 976 16 199,984 | 99,988 13 8 199,992 | 100,000 1 10 199,990 | 100,000 1 0 200,000 | 100,000 1 0 200,000
WGAN | 99,998 3 4 199,996 | 100,001 0 6 199,994 | 100,000 1 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,996 5 5 199,995 | 100,001 0 6 199,994 | 100,001 0 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
Table 29. Defense Evasion. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,000 1 0 200,000 | 50,001 0 0 200,000 | 50,001 0 3 199,997 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 45,570 4431 611 199,389 | 49,368 633 13 199,987 | 49,492 509 24 199,976 | 49,999 2 0 200,000 | 50,000 1 0 200,000
WGAN | 49,996 5 0 200,000 | 50,000 1 4 199,996 | 50,000 1 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,999 2 0 200,000 | 50,000 1 6 199,994 | 50,001 0 7 199,993 | 50,001 0 0 200,000 | 50,000 1 0 200,000
Table 30. Defense Evasion. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,000 1 0 200,000 | 100,001 0 0 200,000 | 100,001 0 3 199,997 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 94,160 5841 541 199,459 | 98,909 1092 132 199,868 | 99,328 673 32 199,968 | 99,999 2 0 200,000 | 100,000 1 0 200,000
WGAN | 99,997 4 0 200,000 | 100,001 0 5 199,995 | 100,001 0 10 199,990 | 100,001 0 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,998 3 0 200,000 | 100,001 0 6 199,994 | 100,001 0 10 199,990 | 100,001 0 0 200,000 | 100,000 1 0 200,000
Table 31. Initial Access. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,996 5 0 200,000 | 50,001 0 0 200,000 | 50,001 0 4 199,996 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 48,943 1058 133 199,867 | 49,965 36 8 199,992 | 49,989 12 9 199,991 | 50,000 10 0 200,000 | 50,000 1 0 200,000
WGAN | 49,997 4 5 199,995 | 50,001 0 6 199,994 | 49,999 2 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,997 4 4 199,996 | 50,001 0 6 199,994 | 49,999 2 5 199,995 | 50,000 1 0 200,000 | 50,000 1 0 200,000
Table 32. Initial Access. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,999 2 0 200,000 | 100,001 0 0 200,000 | 100,001 0 6 199,994 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 98,777 1224 19 199,981 | 99,957 44 9 199,991 | 99,992 0 11 199,989 | 100,000 1 0 200,000 | 100,000 1 0 200,000
WGAN | 99,999 2 7 199,993 | 100,001 0 6 199,994 | 99,999 2 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,996 5 4 199,996 | 100,001 0 6 199,994 | 100,000 1 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
Table 33. Initial Access. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 50,000 1 0 200,000 | 50,001 0 0 200,000 | 50,001 0 3 199,997 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 45,099 4902 22 199,978 | 49,074 927 37 199,963 | 49,487 514 29 199,971 | 50,000 1 0 200,000 | 50,000 1 0 200,000
WGAN | 49,999 2 0 200,000 | 50,001 0 0 199,994 | 50,001 0 7 199,993 | 50,000 1 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,998 3 0 200,000 | 50,001 0 0 200,000 | 50,001 0 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
Table 34. Initial Access. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 100,000 1 0 200,000 | 100,001 0 0 200,000 | 100,001 0 0 200,000 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 93,595 6406 19 199,981 | 98,347 1654 153 199,847 | 99,573 428 54 199,946 | 99,999 2 0 200,000 | 99,999 2 0 200,000
WGAN | 100,000 1 0 200,000 | 100,000 1 6 199,994 | 99,999 2 11 199,989 | 100,000 1 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,999 2 0 200,000 | 100,001 0 3 199,997 | 100,001 0 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
Table 35. Persistence. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,998 3 0 200,000 | 50,001 0 0 200,000 | 50,001 0 4 199,996 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 48,829 1172 19 199,981 | 49,973 28 8 199,992 | 49,996 5 12 199,988 | 49,999 2 0 200,000 | 50,000 1 0 200,000
WGAN | 49,995 6 0 200,000 | 50,001 0 6 199,994 | 49,998 3 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,996 5 4 199,996 | 50,001 0 4 199,996 | 50,001 0 4 199,996 | 50,001 0 0 200,000 | 50,000 1 0 200,000
Table 36. Persistence. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 400|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,996 5 0 200,000 | 100,001 0 0 200,000 | 100,001 0 6 199,994 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 98,653 1348 127 199,873 | 99,969 32 8 199,992 | 99,992 9 14 199,986 | 100,001 0 0 200,000 | 100,001 0 0 200,000
WGAN | 99,998 3 4 199,996 | 100,001 0 6 199,994 | 100,000 1 4 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,998 3 5 199,995 | 100,001 0 5 199,995 | 100,000 1 6 199,994 | 100,001 0 0 200,000 | 100,000 1 0 200,000
Table 37. Persistence. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.25.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 49,997 4 0 200,000 | 50,001 0 0 200,000 | 50,001 0 3 199,997 | 50,001 0 0 200,000 | 50,000 1 0 200,000
cGAN | 47,792 2209 11 199,989 | 49,865 136 9 199,991 | 49,919 82 8 199,992 | 50,000 1 0 200,000 | 50,000 1 0 200,000
WGAN | 49,996 5 0 200,000 | 50,001 0 6 199,994 | 50,000 1 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
WGAN-GP | 49,999 2 0 200,000 | 50,001 0 4 199,996 | 50,001 0 5 199,995 | 50,001 0 0 200,000 | 50,000 1 0 200,000
Table 38. Persistence. Feature scaling: [−1, 1]|Noise sensitivity(z): 32|Noise mode: normal|Epochs: 800|Batch size: 64|G dropout: 0.3|D dropout: 0.3|Ratio: 0.50.
Variant | Logistic Regression | SVM | KNN | Decision Tree | Random Forest
GAN | 99,999 2 0 200,000 | 100,001 0 0 200,000 | 100,001 0 3 199,997 | 100,001 0 0 200,000 | 100,000 1 0 200,000
cGAN | 97,409 2592 14 199,986 | 99,790 211 14 199,986 | 99,878 123 9 199,991 | 99,999 2 0 200,000 | 100,000 1 0 200,000
WGAN | 99,998 3 0 200,000 | 100,001 0 6 199,994 | 100,001 0 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
WGAN-GP | 99,998 3 0 200,000 | 100,001 0 6 199,994 | 100,001 0 7 199,993 | 100,001 0 0 200,000 | 100,000 1 0 200,000
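Across Tables 10–38, each minority total equals the handful of real attack rows plus ratio × 200,000 synthetic rows (e.g., 31 real Credential Access flows plus 100,000 generated rows give the 100,031 minority total at r = 0.50 in Table 10), which suggests the augmentation ratio is defined against the benign class size. A minimal sketch of that apparent mapping, with a hypothetical function name (an inference from the table totals, not the authors' published code):

```python
# Sketch of the apparent ratio-to-volume mapping: the number of GAN-generated
# minority rows appears to be ratio * n_majority, appended to the real
# minority samples. Function name and rounding are assumptions.

def synthetic_rows(ratio: float, n_majority: int) -> int:
    """Synthetic minority rows implied by an augmentation ratio."""
    return int(round(ratio * n_majority))

n_real = 31              # real Credential Access flows (per Table 10 totals)
n_benign = 200_000       # benign rows in the evaluation split
n_fake = synthetic_rows(0.50, n_benign)
print(n_fake, n_real + n_fake)  # → 100000 100031, matching Table 10's minority total
```

The same arithmetic reproduces the 50,013 / 100,013 Privilege Escalation and 50,007 / 100,007 Exfiltration totals at r = 0.25 and r = 0.50.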
Table 39. Ablation training time per variant.
Variant | Time (min) *
GAN | 26
cGAN | 35
WGAN | 29
WGAN-GP | 29
* The reported time is the wall-clock execution time for a single run, measured directly from the notebook cell execution time. No averaging across multiple runs was performed.
Debelie, A.; Bagui, S.S.; Bagui, S.C.; Mink, D. A Systematic Ablation Study of GAN-Based Minority Augmentation for Intrusion Detection on UWF-ZeekData22. Electronics 2026, 15, 1291. https://doi.org/10.3390/electronics15061291