6.1. Downstream Classifier Performance
To evaluate how different adversarial objectives affect classifier performance, each GAN variant (Vanilla GAN, cGAN, WGAN, and WGAN-GP) was trained separately for each minority ATT&CK tactic and used to augment the training set. Tables 2 to 10 summarize the resulting confusion matrices for all five classifiers. Each entry presents the aggregated confusion matrix values (TPs, FNs, FPs, and TNs) across all cross-validation folds, using the same preprocessing, data splits, and classifier hyperparameters as described in Section 5.
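The aggregation described above, in which per-fold counts are summed into a single confusion matrix, can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in this study: the `Confusion` class, the helper names, and the toy fold data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper, not from the study)."""
    tp: int = 0
    fn: int = 0
    fp: int = 0
    tn: int = 0

    def __add__(self, other):
        # Element-wise sum, so per-fold matrices can be accumulated.
        return Confusion(self.tp + other.tp, self.fn + other.fn,
                         self.fp + other.fp, self.tn + other.tn)


def confusion_from_labels(y_true, y_pred, positive=1):
    """Count TP/FN/FP/TN for one fold, treating `positive` as the minority tactic."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, fp, tn)


def aggregate(fold_results):
    """Sum the confusion matrices of all (y_true, y_pred) fold pairs."""
    total = Confusion()
    for y_true, y_pred in fold_results:
        total = total + confusion_from_labels(y_true, y_pred)
    return total


# Toy data: two folds of four samples each (illustrative only).
toy_folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
]
total = aggregate(toy_folds)
```

Summing counts before computing any ratio weights every test sample equally, which differs from averaging per-fold metrics when fold sizes or class counts vary.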
6.1.1. Discovery
The results for the Discovery tactic (Table 3) reveal clear differences in classifier behavior across adversarial training objectives. Because each GAN variant is trained independently on Discovery samples prior to classifier evaluation and all downstream models are evaluated using identical preprocessing and stratified cross-validation splits, observed performance differences can be attributed to how each adversarial objective shapes the synthetic minority-class distribution.
For logistic regression, all GAN variants substantially improve minority-class recognition; however, notable differences emerge across objectives. The Vanilla GAN and cGAN achieve the most balanced performance, exhibiting low counts of false negatives and false positives. Specifically, the Vanilla GAN produces 79 false negatives and 90 false positives, whereas the cGAN achieves the lowest false-negative count (48) and maintains a low false-positive count (78). This suggests that label conditioning enables more accurate reconstruction of minority-class decision boundaries, improving recall without sacrificing precision. In contrast, WGAN introduces a higher number of false positives (138) despite comparable recall, suggesting that Wasserstein-based training may produce broader or more diffuse feature distributions that reduce linear separability. WGAN-GP increases both error types relative to the Vanilla GAN and cGAN (115 false negatives and 100 false positives), suggesting that gradient-penalized regularization may oversmooth the minority distribution under extreme sparsity, leading to reduced boundary sharpness.
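One way to make the notion of "balanced" performance concrete is a Pareto comparison of the (false negative, false positive) pairs reported above for logistic regression on Discovery. The sketch below uses only counts stated in the text; WGAN is omitted because only its false-positive count is given. The function names are hypothetical and the comparison is an illustration, not part of the study's methodology.

```python
def dominates(a, b):
    """True if error pair `a` is no worse than `b` on both counts and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(results):
    """Names of variants whose (FN, FP) pair is not dominated by any other variant."""
    front = []
    for name, counts in results.items():
        dominated = any(dominates(other, counts)
                        for other_name, other in results.items()
                        if other_name != name)
        if not dominated:
            front.append(name)
    return sorted(front)


# (false negatives, false positives), logistic regression, Discovery tactic.
discovery_lr = {
    "Vanilla GAN": (79, 90),
    "cGAN": (48, 78),
    "WGAN-GP": (115, 100),
}

front = pareto_front(discovery_lr)
```

Among these three variants, cGAN has fewer errors of both types than either alternative, so it is the only non-dominated entry.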
For SVM, the Vanilla GAN, cGAN, and WGAN yield near-identical performance, with minimal false positives and comparable false-negative counts. Once the minority class is sufficiently densified through augmentation, the SVM’s margin-based decision function becomes largely insensitive to the specific adversarial objective. This indicates that, beyond a certain density threshold, improvements in synthetic sample quality have a diminishing impact on margin-based classifiers. In contrast, WGAN-GP exhibits substantially poorer performance, producing 1026 false positives and 6864 false negatives, indicating severe degradation in margin separability. This behavior suggests that gradient-penalized training distorts class boundaries when the underlying minority distribution is highly sparse.
A similar pattern was observed for KNN. The Vanilla GAN, cGAN, and WGAN produce nearly identical results, with false-negative counts between 640 and 647 and 197 false positives in each case. This reflects KNN’s reliance on local neighborhood structure, where sufficiently clustered synthetic samples produce stable classification behavior regardless of the generative objective. However, WGAN-GP introduces additional errors, yielding 279 false positives and 1749 false negatives, suggesting reduced neighborhood purity. This suggests that overly smooth synthetic distributions can blur local structure, negatively impacting distance-based classifiers.
For the Decision Tree and Random Forest classifiers, no misclassifications were observed across all GAN variants. This indicates that once the Discovery class is sufficiently represented, tree-based models effectively isolate the minority class through hierarchical partitioning. Their invariance across GAN architectures suggests that these models rely primarily on feature-level separability rather than on fine-grained distributional differences in the synthetic data.
Overall, the Discovery results reveal a consistent pattern: the choice of adversarial objective primarily affects linear and distance-based classifiers, while tree-based models remain robust once sufficient minority representation is achieved. Among the evaluated approaches, cGAN provides the most balanced improvement in recall and precision, suggesting that conditional generation is particularly effective in preserving class-specific structure under moderate sparsity. In contrast, WGAN-GP introduces significant degradation for non-tree classifiers, suggesting that strong regularization can be detrimental when training data is extremely limited. These findings highlight a key trade-off between distribution smoothness and the preservation of discriminative structure, which is critical for effective minority-class augmentation in intrusion detection systems.
6.1.2. Credential Access
The results for Credential Access (Table 4) demonstrate pronounced performance differences across GAN variants, particularly for classifiers sensitive to severe class imbalance. Because each GAN architecture is trained independently on Credential Access samples prior to classifier evaluation and all downstream models are evaluated under identical preprocessing and stratified cross-validation conditions, observed differences reflect how each adversarial objective shapes the synthetic minority-class distribution.
For logistic regression, conditioning provides a substantial advantage. The Vanilla GAN produces numerous errors, with 3374 false negatives and 2545 false positives, indicating poor alignment between the synthetic samples and a linear decision boundary. In contrast, the cGAN sharply reduces both error types, yielding only 90 false negatives and 158 false positives. This indicates that label conditioning enables the generator to better capture class-specific feature distributions, thereby significantly improving linear separability under extreme imbalance. WGAN exhibits performance nearly identical to that of cGAN, suggesting that once minority structure is sufficiently captured, the benefit of the Wasserstein objective is marginal for linear classifiers. By comparison, WGAN-GP performs substantially worse, producing the highest false-positive count (3155) and an elevated number of false negatives (4376), indicating that gradient-penalized regularization may overly smooth the synthetic distribution, increasing overlap between minority samples and benign traffic and degrading the clarity of the decision boundary.
For SVM and KNN, both cGAN and WGAN achieve almost error-free classification, with only a small number of misclassifications (≤5 false positives and ≤42 false negatives). This suggests that once the minority class is sufficiently densified, both margin-based (SVM) and distance-based (KNN) classifiers reach a performance plateau, beyond which further improvements in generative fidelity have limited impact. In contrast, WGAN-GP again degrades performance, producing 609 FPs for SVM and 152 FPs for KNN, along with increased false negatives, indicating reduced class separability and distortion of local neighborhood structure under gradient-penalized training.
For Decision Tree and Random Forest classifiers, no misclassifications were observed across all GAN variants. Once Credential Access samples are augmented to sufficient density, tree-based models robustly isolate the minority class through hierarchical splitting and remain largely invariant to differences in adversarial objectives. This robustness suggests that tree-based models rely primarily on feature-level partitioning rather than precise distributional fidelity, making them less sensitive to variations in synthetic data generation. As in the Discovery case, this behavior reflects effective class separability rather than data leakage, because synthetic samples are generated exclusively from the training data and the trained generators are reused across fixed cross-validation splits.
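The leakage argument above rests on synthetic samples never entering a test fold. The sketch below illustrates that general principle with a per-fold split in which augmentation is applied to the training portion only. It is an assumption-laden illustration, not the authors' pipeline: `jitter_generator` is a hypothetical stand-in for a trained GAN generator (it merely resamples training rows with small Gaussian noise), and fitting the sampler inside each fold is one conservative option rather than the reuse strategy described in the text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def jitter_generator(minority_rows, n_new, rng, scale=0.01):
    """HYPOTHETICAL stand-in for a GAN generator: resample rows and add noise."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        synthetic.append([value + rng.gauss(0.0, scale) for value in base])
    return synthetic


def augmented_splits(X, y, k=5, minority=1, seed=0):
    """Yield (train_X, train_y, test_X, test_y); synthetic rows go to training only."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # The sampler sees training-fold minority rows only, never test rows.
        minority_rows = [row for row, label in zip(train_X, train_y)
                         if label == minority]
        n_majority = len(train_y) - len(minority_rows)
        n_new = max(0, n_majority - len(minority_rows))
        synthetic = jitter_generator(minority_rows, n_new, rng)
        yield (train_X + synthetic,
               train_y + [minority] * len(synthetic),
               [X[i] for i in test_idx],
               [y[i] for i in test_idx])


# Toy data: 40 majority rows and 10 minority rows with two features each.
toy_X = [[float(i), float(i % 7)] for i in range(50)]
toy_y = [0] * 40 + [1] * 10
splits = list(augmented_splits(toy_X, toy_y, k=5))
```

In this sketch every test fold consists solely of original rows, so any perfect score a classifier reaches is measured on real samples.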
Overall, the Credential Access results indicate that conditional generation yields the most consistent improvements for linear classifiers, highlighting the importance of incorporating class-specific information when modeling highly sparse attack behaviors. While WGAN performs comparably under this setting, WGAN-GP introduces systematic performance degradation across multiple models, reinforcing the observation that strong regularization can be detrimental when training data is extremely limited. Tree-based ensemble methods remain largely unaffected by the choice of GAN once class imbalance is mitigated, demonstrating that sufficient minority representation is often more critical than the specific generative objective for these models.
6.1.3. Privilege Escalation
The results for Privilege Escalation (Table 5) indicate that the choice of adversarial objective has a measurable impact on linear and distance-based classifiers, while tree-based models remain largely unaffected once sufficient minority-class augmentation is achieved. As in previous experiments, all GAN variants were trained independently on Privilege Escalation samples prior to classifier evaluation, and all downstream models were evaluated under identical preprocessing and cross-validation conditions.
For Logistic Regression, the Vanilla GAN yields the most favorable balance between recall and precision, with only 27 false negatives and 15 false positives. In comparison, both cGAN and WGAN yield a higher false-negative count (130) and more false positives (71), indicating a modest reduction in both recall and precision relative to the Vanilla GAN. This suggests that, for this tactic, simpler adversarial objectives may better preserve the original minority-class structure, whereas conditioning or Wasserstein-based training may introduce slight distributional shifts that reduce linear separability. WGAN-GP performs substantially worse, producing 5762 false negatives and 3533 false positives, indicating that gradient-penalized regularization significantly increases class overlap and degrades boundary definition under sparse conditions.
A similar pattern was observed for SVM. The Vanilla GAN achieves the lowest error counts, with 136 false negatives and 15 false positives. In contrast, both cGAN and WGAN exhibit increased error counts, producing 1561 false negatives and 566 false positives. This behavior suggests that margin-based classifiers are sensitive to subtle distortions in the synthetic feature distribution, where even small shifts introduced by conditioning or Wasserstein objectives can adversely affect margin placement. WGAN-GP again underperforms, generating a sharp rise in misclassifications (8666 false negatives and 927 false positives), indicating severe degradation in margin separability due to overly smoothed or distorted synthetic samples.
For KNN, the Vanilla GAN yields the lowest error rates, with only 5 false negatives and 13 false positives, suggesting that it preserves local neighborhood structure well. Both cGAN and WGAN introduce a modest increase in error, while WGAN-GP substantially degrades performance, yielding 1528 false negatives and 467 false positives. This pattern indicates that local neighborhood integrity is best preserved under simpler generative objectives, while more complex regularization can blur local structure and reduce neighborhood purity.
For Decision Tree and Random Forest, classification performance is effectively perfect across all GAN variants. Decision Tree has at most a single false positive for WGAN and none for the other variants, while Random Forest achieves zero false positives and zero false negatives across all cases. This confirms that tree-based models are robust to variations in synthetic data distribution, as long as key discriminative feature thresholds are preserved. As in prior tactics, this behavior reflects effective class separability rather than data leakage, as synthetic samples are generated exclusively from training data and evaluated using fixed cross-validation splits.
Overall, for Privilege Escalation, the Vanilla GAN provides the most consistent augmentation for linear and distance-based classifiers under the evaluated configuration. While cGAN and WGAN preserve reasonable recall, they introduce additional classification errors relative to the Vanilla GAN, and WGAN-GP consistently underperforms. These results highlight that increased model complexity does not necessarily translate to improved performance under extreme data sparsity and may instead introduce instability or unnecessary smoothing. Tree-based classifiers remain robust across all GANs once class imbalance is mitigated, reinforcing the observation that sufficient minority representation is more critical than the specific adversarial objective for these models.
6.1.4. Exfiltration
The results for Exfiltration (Table 6) demonstrate that classifier performance is strongly influenced by the adversarial objective, particularly for linear and distance-based models under extreme class imbalance. As in previous experiments, all GAN variants were trained independently on Exfiltration samples prior to classifier evaluation and evaluated under identical preprocessing and stratified cross-validation conditions.
For logistic regression, the Vanilla GAN yields a reasonable balance between recall and precision, producing 115 false negatives and 36 false positives. Introducing conditional generation via cGAN reduces false positives to 16 but increases false negatives to 215, indicating a trade-off in which precision improves at the expense of recall. This suggests that conditioning introduces a more conservative decision boundary that reduces false alarms but may fail to capture the full diversity of minority-class patterns. The WGAN further reduces false positives to 6, but at a substantial cost to recall, yielding 942 false negatives. This behavior suggests that Wasserstein-based training may prioritize boundary sharpness over coverage of the minority class, leading to the underrepresentation of rare patterns. In contrast, WGAN-GP achieves the lowest false-negative count (7) but introduces 126 false positives, indicating a strong recall bias accompanied by increased overlap with benign traffic due to oversmoothing of the synthetic distribution.
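The recall versus precision trade-off described above can be made explicit by converting the reported error counts into the two metrics. The text does not state the number of true positives for this tactic, so the sketch below assumes a hypothetical positive-class size (`ASSUMED_POSITIVES`); the resulting values are illustrative only. The recall ordering depends only on the false-negative counts, while the precision values, and potentially their ordering for small class sizes, depend on the assumed figure.

```python
# ASSUMPTION: number of real Exfiltration positives in the evaluation data.
# This value is hypothetical and is NOT taken from the study.
ASSUMED_POSITIVES = 20_000

# (false negatives, false positives), logistic regression, as reported in the text.
exfiltration_lr = {
    "Vanilla GAN": (115, 36),
    "cGAN": (215, 16),
    "WGAN": (942, 6),
    "WGAN-GP": (7, 126),
}


def recall_precision(fn, fp, positives=ASSUMED_POSITIVES):
    """Return (recall, precision) given error counts and the positive-class size."""
    tp = positives - fn           # every positive is either detected or missed
    recall = tp / (tp + fn)       # share of real attacks that were detected
    precision = tp / (tp + fp)    # share of raised alerts that were real attacks
    return recall, precision


metrics = {name: recall_precision(fn, fp)
           for name, (fn, fp) in exfiltration_lr.items()}

# Rank the variants from best to worst on each metric.
by_recall = sorted(metrics, key=lambda name: metrics[name][0], reverse=True)
by_precision = sorted(metrics, key=lambda name: metrics[name][1], reverse=True)
```

Under this assumption the two rankings are exact reverses of each other: WGAN-GP leads on recall and trails on precision, while WGAN does the opposite, which mirrors the trade-off described in the paragraph.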
For SVM, the Vanilla GAN exhibits limited recall, producing 1274 false negatives and 520 false positives. cGAN substantially improves recall relative to the Vanilla GAN, reducing false negatives to 604, whereas WGAN results in 6101 false negatives, indicating weaker recall than both. However, cGAN retains a higher false-positive count (126), whereas WGAN yields a much cleaner decision boundary with only 15 false positives. This highlights a clear trade-off between margin sensitivity and boundary precision, with cGAN improving minority-class detection while WGAN favors stricter separation at the cost of missed detections. In contrast, WGAN-GP performs poorly, producing 18,511 false negatives and 456 false positives, indicating severe distortion of margin structure and instability under gradient-penalized training.
A similar pattern was observed for KNN, which is sensitive to local neighborhood density. The Vanilla GAN performs well, with 169 false negatives and 17 false positives, while cGAN improves both recall and precision, reducing false negatives to 70 and false positives to 11. This suggests that conditional generation effectively enhances local cluster coherence for this tactic. In contrast, WGAN increases false negatives to 550 and false positives to 186, and WGAN-GP further degrades performance, yielding 710 false negatives and 489 false positives. These results indicate that overly smooth or diffuse synthetic distributions disrupt local neighborhood structure, reducing classification reliability for distance-based methods.
For the Decision Tree and Random Forest classifiers, no misclassifications were observed for Vanilla GAN, cGAN, and WGAN, and only a single false positive was observed for WGAN-GP in the Decision Tree model. Random Forest achieved zero false positives and zero false negatives across all GAN variants. This confirms that, as in prior tactics, tree-based models are robust to variations in synthetic data distribution and largely invariant to the choice of adversarial objective once sufficient minority representation is achieved, relying primarily on feature thresholding rather than distributional precision.
Overall, the Exfiltration results indicate that no single GAN objective dominates across all classifiers. cGAN improves recall for margin- and distance-based models, while WGAN produces the cleanest decision boundary for SVM at the cost of recall. WGAN-GP favors recall at the expense of precision for logistic regression and degrades performance for SVM and KNN under the evaluated configuration. These findings highlight a fundamental trade-off between minority-class coverage and boundary sharpness, where different GAN objectives emphasize different aspects of the data distribution. Tree-based classifiers remain robust across GAN choices, underscoring that adversarial objective selection is most critical for linear and neighborhood-based models in highly imbalanced cybersecurity detection tasks.
6.1.5. Lateral Movement
The results for Lateral Movement (Table 7) show pronounced differences across adversarial objectives, particularly for linear and distance-based classifiers, reflecting the attack’s extreme sparsity in the original dataset. As in prior experiments, all GAN variants were trained independently on Lateral Movement samples prior to classifier evaluation and evaluated under identical preprocessing and stratified cross-validation conditions.
For Logistic Regression, the Vanilla GAN yields relatively weak performance, producing 353 false negatives and 649 false positives, indicating poor linear separability despite augmentation. This suggests that the Vanilla GAN is unable to adequately capture the underlying structure of extremely sparse minority samples. In contrast, both cGAN and WGAN substantially improve performance, reducing false negatives to six and false positives to two. These results indicate that incorporating either conditioning or the Wasserstein objective significantly enhances the alignment of synthetic samples with the true minority-class distribution under extreme sparsity. In comparison, WGAN-GP degrades performance, yielding 303 false negatives and 1059 false positives, indicating that gradient-penalized regularization may introduce excessive smoothing, leading to increased class overlap and reduced separability.
A similar pattern was observed for SVM. The Vanilla GAN exhibits limited recall, producing 1600 false negatives and 121 false positives. Both cGAN and WGAN achieve very low error rates, reducing false negatives to six and false positives to two, indicating strong margin separability once the minority class is sufficiently densified. This demonstrates that for extremely rare attack types, improving minority density substantially impacts margin-based classifiers, provided that synthetic samples preserve class structure. In contrast, WGAN-GP performs substantially worse, yielding 3679 false negatives and 1755 false positives, indicating severe distortion of the margin due to overly smoothed or poorly structured synthetic data.
For KNN, the Vanilla GAN produces moderate errors, with 46 false negatives and 47 false positives. Both cGAN and WGAN achieve almost error-free classification, with only two false negatives and four false positives, indicating well-preserved local neighborhood structure. This suggests that both conditioning and Wasserstein objectives effectively reconstruct local feature relationships when sufficient structure is learned. WGAN-GP again underperforms cGAN and WGAN, producing 25 false negatives and 45 false positives, consistent with degraded local neighborhood coherence due to overly diffuse synthetic samples.
For the Decision Tree classifier, no misclassifications were observed across all GAN variants. Random Forest similarly achieves very low error rates, with only two false negatives for the Vanilla GAN, cGAN, and WGAN and no misclassifications under WGAN-GP. This further confirms that tree-based models are resilient to variation in synthetic data distributions, as long as key feature thresholds defining the minority class are preserved. As in prior tactics, tree-based models remain largely invariant to the choice of adversarial objective once sufficient minority-class augmentation is achieved.
Overall, the Lateral Movement results indicate that cGAN and WGAN are particularly effective at stabilizing classification performance for extremely rare attack types, especially for linear and distance-based classifiers. In contrast, the Vanilla GAN provides insufficient augmentation in this setting, and WGAN-GP consistently underperforms under the evaluated configuration. These findings highlight that, under extreme sparsity, the ability to accurately reconstruct minority-class structure is more critical than enforcing strong regularization or distributional smoothness. Tree-based classifiers remain robust regardless of the choice of GAN once class imbalance is mitigated.
6.1.6. Resource Development
The results for Resource Development (Table 8) show that classifier performance is strongly influenced by the choice of adversarial objective, particularly for linear and margin-based classifiers, reflecting the extreme sparsity of this tactic in the original dataset. As in prior experiments, all GAN variants were trained independently on Resource Development samples before classifier evaluation and were evaluated under identical preprocessing and stratified cross-validation conditions.
For logistic regression, the Vanilla GAN yields moderate performance, producing 28 false negatives and 89 false positives, indicating limited linear separability even after augmentation. This suggests that the Vanilla GAN struggles to adequately reconstruct the minority-class feature distribution under extreme sparsity. In contrast, both cGAN and WGAN substantially improve performance, reducing false negatives to one and false positives to zero. These results indicate that conditioning and the Wasserstein objective each enable more accurate modeling of sparse class-specific structure, leading to near-perfect linear separability. In comparison, WGAN-GP performs substantially worse, yielding 4489 false negatives and 1572 false positives, indicating that gradient-penalized regularization introduces excessive smoothing and increases overlap between minority and majority classes.
A similar pattern was observed for SVM. The Vanilla GAN exhibits limited recall, with 505 false negatives and 37 false positives. Both cGAN and WGAN markedly improve performance, reducing false negatives to 14 and false positives to 6, yielding very low error rates. This demonstrates that margin-based classifiers benefit significantly from improved minority density when synthetic samples preserve structural consistency. In contrast, WGAN-GP again underperforms, generating 8572 false negatives and 293 false positives, indicating severe degradation of margin separability due to distorted synthetic distributions.
For KNN, the Vanilla GAN already performs well, with 20 false negatives and 6 false positives. Both cGAN and WGAN further reduce classification error, producing only one false negative and two false positives, indicating improved neighborhood purity. This suggests that both conditioning and Wasserstein objectives enhance local cluster coherence for this tactic. WGAN-GP introduces noticeably higher errors, with 49 false negatives and 82 false positives, indicating distortion of local density relationships due to overly smooth or diffuse synthetic samples.
For Decision Tree and Random Forest classifiers, no misclassifications were observed across all GAN variants. Once the Resource Development class is sufficiently augmented, tree-based models robustly isolate the minority class through hierarchical splitting and remain largely invariant to the choice of adversarial objective. This confirms that tree-based models depend primarily on feature-threshold separability rather than on precise distributional modeling. As in previous tactics, this behavior reflects effective class separability rather than data leakage.
Overall, the Resource Development results indicate that cGAN and WGAN provide the most consistent improvements for extremely rare attack types, particularly for linear, margin-based, and distance-based classifiers. The Vanilla GAN shows limited improvement under severe sparsity, whereas WGAN-GP consistently underperforms across the evaluated configurations. These findings reinforce that accurate reconstruction of minority-class structure is more critical than increased model complexity or regularization strength in highly sparse cybersecurity settings. Tree-based classifiers remain robust regardless of the choice of GAN once class imbalance is mitigated.
6.1.7. Defense Evasion
The results for Defense Evasion (Table 9) show consistent, strong classification performance across all GAN variants. Given the extreme sparsity of this tactic in the original dataset, synthetic augmentation appears sufficient to produce a learnable minority-class representation regardless of the specific adversarial objective employed. As in prior experiments, all GAN variants were trained independently on Defense Evasion samples and evaluated under identical preprocessing and cross-validation conditions.
For logistic regression, all GAN variants achieve very low error rates. The Vanilla GAN, cGAN, and WGAN each produce only one false negative and zero false positives, indicating strong linear separability following augmentation. WGAN-GP introduces a slight degradation, with two false negatives and zero false positives, but overall performance remains high. This suggests that, for this tactic, even simple generative models are sufficient to reconstruct the minority-class structure once minimal representation is achieved, and that linear classifiers benefit primarily from class densification rather than from subtle differences in adversarial objectives.
For SVM, classification is effectively error-free across all GAN variants. The Vanilla GAN and WGAN-GP achieve zero false negatives and zero false positives, while cGAN and WGAN introduce only two false positives each, with no false negatives. This indicates that margin-based classifiers reach a saturation point once a minimal level of minority density is achieved: beyond it, improvements in generative modeling provide negligible benefit, and the classifier becomes largely insensitive to the choice of GAN architecture.
A similar pattern was observed for KNN. The Vanilla GAN and WGAN-GP produce three false positives and no false negatives, while cGAN introduces eight false positives with no false negatives. WGAN yields the highest error among the variants for KNN, producing 80 false positives and 1 false negative, indicating some degradation in local neighborhood structure. This suggests that certain adversarial objectives may still introduce minor distortions in local density, though these effects are insufficient to significantly affect classification outcomes, and overall performance remains strong across all variants.
For Decision Tree classifiers, no misclassifications were observed for any GAN variant. Random Forest also achieves near-error-free performance: both the Vanilla GAN and WGAN introduce a single false negative, while cGAN and WGAN-GP achieve zero misclassifications. As in previous tactics, tree-based classifiers remain largely invariant to the adversarial objective once sufficient minority-class augmentation is achieved. This further reinforces that tree-based models rely on clear feature thresholds rather than precise distributional fidelity.
Overall, the Defense Evasion results indicate that all GAN variants are effective at stabilizing classifier performance for this extremely rare attack type once basic class balance is achieved. Differences among adversarial objectives are marginal and classifier-dependent, with tree-based models exhibiting consistent robustness across all augmentation strategies. These findings suggest that, for certain attack types, the primary challenge lies in achieving sufficient representation rather than optimizing the generative objective, highlighting diminishing returns in model complexity once separability is established.
6.1.8. Initial Access
The results for Initial Access (Table 10) show consistently strong classification performance across all GAN variants. Even with minimal synthetic augmentation, all classifiers achieve low error rates, indicating that this attack type is readily learned once class imbalance is mitigated. As in prior experiments, all GAN variants were trained independently on Initial Access samples and evaluated under identical preprocessing and cross-validation conditions.
For logistic regression, all GAN variants perform exceptionally well. The Vanilla GAN and WGAN-GP each produce one false negative and zero false positives, while cGAN and WGAN achieve error-free classification with zero false negatives and zero false positives. These results indicate strong linear separability of the augmented minority class across all adversarial objectives. This suggests that the underlying feature distribution of this tactic is inherently well-structured, requiring only minimal augmentation to become linearly separable.
For SVM, performance is error-free across all GAN variants, with zero false negatives and zero false positives. Once synthetic augmentation is applied, margin-based classification appears insensitive to the choice of adversarial objective for this tactic. This indicates that the classifier reaches a saturation point, beyond which further improvements in synthetic data quality do not translate into measurable performance gains.
A similar trend was observed for KNN. The Vanilla GAN and WGAN-GP produce only three false positives and no false negatives, while cGAN and WGAN introduce slightly more false positives (seven) but still no false negatives. These differences are minor, and the overall performance remains high across all variants. This suggests that the local neighborhood structures are sufficiently well-defined after augmentation, making KNN robust to minor variations in synthetic sample generation.
For Decision Tree classifiers, no misclassifications were observed across all GAN variants. Random Forest also achieves near-error-free performance, with each GAN variant producing a single false negative and zero false positives. As with prior tactics, tree-based ensemble models remain robust once sufficient augmentation of the minority class is achieved. This further confirms that tree-based methods rely primarily on feature-level separability rather than precise distributional fidelity.
Overall, the Initial Access results indicate that GAN-based augmentation is sufficient to stabilize classifier performance for this tactic regardless of the adversarial objective used. Differences among Vanilla GAN, cGAN, WGAN, and WGAN-GP are minimal, and classifier performance remains consistently high across linear, margin-based, distance-based, and tree-based models. These findings highlight diminishing returns in model complexity: achieving basic class balance is sufficient, and more advanced generative objectives provide little additional benefit.
6.1.9. Persistence
The results for Persistence (Table 11) show consistently strong classification performance across all GAN variants. Once GAN-based augmentation is applied, all classifiers achieve very low error rates, indicating that this attack type is readily learned regardless of the specific adversarial objective used. As in prior experiments, all GAN variants were trained independently on Persistence samples and evaluated under identical preprocessing and cross-validation conditions.
For Logistic Regression, all GAN variants perform exceptionally well. The Vanilla GAN, cGAN, and WGAN each produce only two false negatives and zero false positives, while WGAN-GP achieves error-free classification. These results indicate strong linear separability of the augmented minority class across all adversarial objectives, suggesting that the Persistence tactic exhibits a well-defined feature structure that becomes readily separable with minimal augmentation.
For SVM, classification performance is effectively error-free across all GAN variants. The Vanilla GAN and WGAN-GP produce no false negatives or false positives, while cGAN and WGAN each introduce only two false positives and no false negatives. These differences are minimal and do not materially affect overall performance. This indicates that margin-based classifiers quickly reach a performance ceiling once sufficient minority representation is achieved.
A similar pattern is observed for KNN. The Vanilla GAN and WGAN-GP produce three false positives and no false negatives, whereas cGAN and WGAN introduce slightly more false positives (seven), while still producing no false negatives. Despite these minor differences, the overall classification performance remains high across all variants. This suggests that local neighborhood structures are well preserved across all GAN objectives once basic class density is established.
For the Decision Tree classifier, no misclassifications were observed for any GAN variant. Random Forest also achieves near-error-free performance, with each GAN variant producing only one false negative and no false positives. As with prior tactics, tree-based models remain robust once sufficient augmentation of the minority class is achieved. This is consistent with tree-based models relying primarily on clear feature thresholds rather than on subtle distributional differences in synthetic data.
Overall, the Persistence results indicate that GAN-based augmentation is sufficient to stabilize classifier performance for this tactic, regardless of the adversarial objective used. Differences among Vanilla GAN, cGAN, WGAN, and WGAN-GP are negligible, and classifier behavior remains consistent across linear, margin-based, distance-based, and tree-based models. These findings further support the observation that, for certain attack types, achieving sufficient minority representation is more important than optimizing the generative model’s complexity, leading to diminishing returns from more advanced GAN variants.
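The claim that differences among GAN variants are negligible for Persistence can be checked directly from the error counts quoted in the paragraphs above. The following is a minimal sketch that transcribes those counts and reports, for each classifier, the spread (maximum minus minimum) in total errors across the four GAN variants. The function and variable names are illustrative and not part of the original study; the counts are copied from the prose, not recomputed from the dataset.

```python
# Error counts (false negatives, false positives) for the Persistence tactic,
# transcribed from the prose above. Order of entries follows GANS.
GANS = ["Vanilla GAN", "cGAN", "WGAN", "WGAN-GP"]

PERSISTENCE_ERRORS = {
    "Logistic Regression": [(2, 0), (2, 0), (2, 0), (0, 0)],
    "SVM":                 [(0, 0), (0, 2), (0, 2), (0, 0)],
    "KNN":                 [(0, 3), (0, 7), (0, 7), (0, 3)],
    "Decision Tree":       [(0, 0), (0, 0), (0, 0), (0, 0)],
    "Random Forest":       [(1, 0), (1, 0), (1, 0), (1, 0)],
}


def error_spread(errors):
    """Return {classifier: max - min of total errors (FN + FP) across GAN variants}."""
    spread = {}
    for clf, counts in errors.items():
        totals = [fn + fp for fn, fp in counts]
        spread[clf] = max(totals) - min(totals)
    return spread


if __name__ == "__main__":
    for clf, s in error_spread(PERSISTENCE_ERRORS).items():
        print(f"{clf:20s} spread in total errors across GAN variants: {s}")
```

On these counts, the largest spread is four misclassifications (KNN), and the tree-based classifiers show no spread at all, which matches the qualitative reading given above.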
6.3. Summary of Results
Section 6 presented a comprehensive empirical evaluation of four GAN architectures, namely Vanilla GAN, Conditional GAN (cGAN), Wasserstein GAN (WGAN), and Wasserstein GAN with Gradient Penalty (WGAN-GP), for generating minority-class cyberattack data for the UWF-ZeekData22 dataset. Performance was assessed across nine MITRE ATT&CK tactics using five classical machine learning classifiers, with consistent preprocessing, training, and evaluation protocols and a fixed augmentation ratio of 0.15. Notably, several attack types in the dataset contain fewer than 10 samples, and in some cases only a single instance, placing the problem in an extreme-sparsity or few-shot learning regime.
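To give a sense of scale for the extreme-sparsity regime, the sketch below computes how many synthetic samples a fixed augmentation ratio would call for. It rests on an assumption that is not confirmed by the text here: that the ratio denotes the target minority-to-majority sample ratio after augmentation. The function names and the sample counts in the usage note are hypothetical.

```python
import math


def synthetic_samples_needed(n_majority: int, n_minority: int, ratio: float = 0.15) -> int:
    """Synthetic minority samples needed to reach the target ratio.

    ASSUMPTION: `ratio` is the target (real + synthetic minority) / majority
    ratio. The study may define its augmentation ratio differently.
    """
    # Small epsilon guards against floating-point results such as 1500.0000000001.
    target_minority = math.ceil(ratio * n_majority - 1e-9)
    return max(0, target_minority - n_minority)


def synthetic_share(n_minority: int, n_synthetic: int) -> float:
    """Fraction of the augmented minority class that is synthetic."""
    total = n_minority + n_synthetic
    return n_synthetic / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 10,000 majority samples and a single real minority sample.
    n_syn = synthetic_samples_needed(n_majority=10_000, n_minority=1)
    print(n_syn, round(synthetic_share(1, n_syn), 4))
```

Under this assumed definition, a tactic with a single real instance would end up with a minority class that is almost entirely synthetic, which is why the fidelity of the generated samples matters so much in this setting.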
Across most tactics and classifiers, Vanilla GAN provided a strong and reliable baseline. Despite its simpler adversarial objective, Vanilla GAN frequently achieved a favorable balance between recall and false-positive rate for linear and distance-based classifiers. This behavior is particularly evident for moderately rare tactics, such as Discovery, Privilege Escalation, and Persistence, where Vanilla GAN often matched or exceeded the performance of more complex architectures under the evaluated configuration. This indicates that under limited data, simpler models that preserve the observed feature structure can be more effective than complex models that attempt to learn the full data distribution.
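The balance between recall and false-positive rate referred to above follows directly from the aggregated confusion-matrix counts (TPs, FNs, FPs, and TNs) summed across cross-validation folds. The following is a minimal sketch of that calculation. The per-fold counts are hypothetical and chosen only for illustration, and the class and function names are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts for the minority (attack) class."""
    tp: int
    fn: int
    fp: int
    tn: int

    def __add__(self, other: "Confusion") -> "Confusion":
        return Confusion(self.tp + other.tp, self.fn + other.fn,
                         self.fp + other.fp, self.tn + other.tn)


def aggregate(folds):
    """Sum per-fold confusion matrices into one aggregated matrix."""
    total = Confusion(0, 0, 0, 0)
    for fold in folds:
        total = total + fold
    return total


def _safe_div(num: int, den: int) -> float:
    return num / den if den else 0.0


def recall(c: Confusion) -> float:
    return _safe_div(c.tp, c.tp + c.fn)


def precision(c: Confusion) -> float:
    return _safe_div(c.tp, c.tp + c.fp)


def false_positive_rate(c: Confusion) -> float:
    return _safe_div(c.fp, c.fp + c.tn)


# HYPOTHETICAL per-fold counts for one classifier and one GAN variant.
FOLDS = [
    Confusion(tp=19, fn=1, fp=0, tn=980),
    Confusion(tp=20, fn=0, fp=1, tn=979),
    Confusion(tp=20, fn=0, fp=0, tn=980),
    Confusion(tp=19, fn=1, fp=2, tn=978),
    Confusion(tp=20, fn=0, fp=0, tn=980),
]

if __name__ == "__main__":
    total = aggregate(FOLDS)
    print(total)
    print(f"recall={recall(total):.4f} "
          f"precision={precision(total):.4f} "
          f"fpr={false_positive_rate(total):.6f}")
```

Because the counts are summed before the ratios are taken, folds containing very few minority samples do not dominate the result, which is relevant when some tactics have only a handful of real instances.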
Conditional GANs (cGANs) exhibited highly variable behavior across tactics. In some cases, conditional generation reduced false negatives for extremely sparse attack types; however, performance was inconsistent across classifiers and tactics, with noticeable degradation in certain linear and distance-based models. This variability suggests that conditioning mechanisms require sufficient data density to be effective and may introduce instability when applied to highly sparse or poorly defined class distributions. Because cGAN performance was sensitive to the structure of the minority-class data, its reliability as a uniform augmentation strategy in this setting is limited.
Wasserstein GANs (WGANs) demonstrated comparatively consistent performance across attack types and classifiers. In many cases, WGANs achieved low false-negative rates for linear and margin-based classifiers while maintaining controlled false-positive levels. This consistency is in line with the stabilizing effect of the Wasserstein objective, which yields smoother gradients and can improve training robustness under limited data. It also suggests that, under the evaluated augmentation setting, the Wasserstein objective produced synthetic samples that preserved class structure more reliably than the conditional objective.
WGAN-GP exhibited mixed and often degraded performance across several tactics, particularly for linear and distance-based classifiers. While WGAN-GP occasionally produced smoother synthetic distributions, this smoothing often increased overlap between minority and majority classes, reducing discriminative separability. This indicates that strong regularization may be detrimental in extreme-sparsity settings, where preserving sharp feature distinctions is more important than enforcing smooth distributions.
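For reference, the regularization discussed above can be written out explicitly. The block below gives the critic losses in their standard textbook form, not in this paper's own notation: $D$ is the critic, $P_r$ and $P_g$ are the real and generated distributions, and $\lambda$ is the penalty weight. The value of $\lambda$ used in the experiments is not stated in this section.

```latex
\begin{align}
L_{\text{WGAN}} &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
                 - \mathbb{E}_{x \sim P_r}\big[D(x)\big], \\
L_{\text{WGAN-GP}} &= L_{\text{WGAN}}
                 + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
                   \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
\hat{x} &= \epsilon x + (1 - \epsilon)\tilde{x}, \qquad \epsilon \sim U[0, 1].
\end{align}
```

The penalty term pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples. When a tactic has very few real samples, those interpolates cover only a narrow part of the feature space, which is one plausible reading of why the smoothing described above can blur class boundaries.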
Classifier sensitivity varied substantially across model families. Tree-based classifiers (Decision Tree and Random Forest) were largely insensitive to the choice of GAN architecture once sufficient minority-class augmentation was applied and frequently achieved near-error-free performance across tactics. This suggests that these models rely primarily on feature-level partitioning and thresholding rather than precise distributional fidelity. In contrast, Logistic Regression, SVM, and KNN were more sensitive to differences in synthetic data distributions, making them effective indicators of the influence of adversarial objectives on minority-class separability.
Overall, the results indicate that the selection of GAN architecture has a meaningful impact on downstream classifier behavior in imbalanced intrusion detection tasks, particularly for linear and distance-based models. While simple adversarial objectives often prove effective, architectures that emphasize stable distributional alignment exhibit more consistent behavior across diverse attack types. A key finding across all experiments is that, in extreme-sparsity settings, the effectiveness of GAN-based augmentation depends more on preserving discriminative structure and calibrating the augmentation appropriately than on architectural complexity. These findings motivate a deeper discussion of stability, classifier sensitivity, and practical trade-offs, which are addressed in the following section.