4.1. Experimental Setup
Datasets: We evaluate our method on seven widely used benchmark datasets: MNIST [39], SVHN [40], CIFAR-10 [41], CIFAR-100 [41], Tiny ImageNet [40], Imagenette [42], and ImageNet [43]. MNIST consists of 60,000 training and 10,000 testing grayscale images of handwritten digits, each with a resolution of 28×28. SVHN contains 73,257 training and 26,032 testing RGB images of street-view house numbers at 32×32 resolution, spanning 10 digit classes. CIFAR-10 and CIFAR-100 both comprise 32×32 natural color images; CIFAR-10 covers 10 general object categories, while CIFAR-100 offers a more challenging setting with 100 fine-grained classes. Tiny ImageNet contains 200 categories, each with 500 training images resized to 64×64 resolution. Imagenette is a curated subset of ImageNet consisting of approximately 13,000 images from 10 selected classes, with all images resized to a common resolution. ImageNet (ILSVRC 2012) is a large-scale benchmark comprising over 1.2 million labeled training images and 50,000 validation images across 1,000 object categories, providing a realistic and challenging evaluation setting for high-capacity models.
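As a concrete reference, these benchmarks can be prepared with torchvision; the sketch below is a minimal example under assumed preprocessing (the directory paths for Tiny ImageNet and Imagenette, and the Imagenette resize size, are placeholders rather than the paper's exact pipeline).

```python
import torchvision.transforms as T
from torchvision import datasets

# Minimal sketch; preprocessing is an assumption, not the paper's pipeline.
to_tensor = T.ToTensor()

mnist    = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
svhn     = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
cifar10  = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)

# Tiny ImageNet and Imagenette ship as image folders; the paths below are
# placeholders, and the Imagenette resize size is an assumed value.
tiny_imagenet = datasets.ImageFolder("data/tiny-imagenet-200/train",
                                     transform=to_tensor)
imagenette = datasets.ImageFolder(
    "data/imagenette2/train",
    transform=T.Compose([T.Resize((224, 224)), T.ToTensor()]),
)
```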
Adversarial Attacks: To evaluate the robustness of the proposed framework, we test it against a range of adversarial attacks, including FGSM [5], PGD [8], C&W [19], MIFGSM [29], DeepFool [44], and AutoAttack [45]. All attacks are implemented following widely adopted settings. For FGSM, PGD, MIFGSM, and AutoAttack, the perturbation limit is ε = 8/255 under the ℓ∞ norm. The step size for PGD and MIFGSM is 2/255, with 20 and 5 steps, respectively. For the C&W attack, we set the learning rate to 0.01 and run it for 1,000 steps. DeepFool uses 50 steps and an overshoot of 0.02. On the ImageNet dataset, the perturbation bound is reduced to ε = 4/255, with the number of attack steps set to 10 and 50.
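For concreteness, the following sketch shows one way to instantiate these attack settings with the torchattacks library; the library choice and the `model` handle are our assumptions, while the hyperparameters mirror those listed above.

```python
import torchattacks  # third-party library; one possible implementation choice

def build_attacks(model, eps=8 / 255, alpha=2 / 255):
    """Instantiate the evaluated attacks with the settings reported above."""
    return {
        "FGSM":     torchattacks.FGSM(model, eps=eps),
        "PGD20":    torchattacks.PGD(model, eps=eps, alpha=alpha, steps=20),
        "MIFGSM":   torchattacks.MIFGSM(model, eps=eps, alpha=alpha, steps=5),
        "CW":       torchattacks.CW(model, lr=0.01, steps=1000),
        "DeepFool": torchattacks.DeepFool(model, steps=50, overshoot=0.02),
        "AA":       torchattacks.AutoAttack(model, norm="Linf", eps=eps),
    }

# On ImageNet the budget is reduced and 10-/50-step PGD is used, e.g.:
#   torchattacks.PGD(model, eps=4 / 255, alpha=2 / 255, steps=10)
```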
Baselines: To ensure a fair and comprehensive evaluation, we compare our method with a diverse set of baselines, including both randomized and non-randomized adversarial defenses. For randomized defenses, we consider Additive Noise [13] and its multiplicative noise injection variant, as well as Random Bits [46], RPF [47], CTRW [48], and RN [36]. For non-randomized baselines, we include several adversarial training-based methods: RobustWRN [49], AWP [50], SAT [51], LLR [52], and RobNet [53]. These baselines cover a broad range of defense strategies and enable rigorous comparison across different threat models.
4.2. Results on CIFAR
To demonstrate the effectiveness of our proposed method, we first perform evaluations under six different adversarial attacks. Beyond the standard baseline of a deterministic classifier with adversarial training (denoted as AT), we include two additional noise-based baselines: additive noise injection and a stronger variant, multiplicative noise injection, which fuses noise into the feature maps by element-wise multiplication. Additive noise is sampled from a standard Gaussian distribution, while multiplicative noise is likewise drawn from a Gaussian distribution. Furthermore, we also compare our results with the latest defense algorithms, including RPF, CTRW, and RN. Although these noise-based baselines are simple, they achieve satisfactory adversarial robustness within our defense scheme. The results are summarized in Table 2.
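As a reference for the two noise-based baselines, a minimal sketch of feature-map noise injection is given below; where exactly the layer sits in the backbone, and the unit-mean parameterization of the multiplicative variant, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Sketch of the additive / multiplicative noise baselines.

    Noise is drawn afresh at every forward pass and fused into the
    feature maps, so the network's decision boundary is randomized.
    """

    def __init__(self, mode: str = "additive"):
        super().__init__()
        self.mode = mode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        noise = torch.randn_like(x)  # standard Gaussian, N(0, 1)
        if self.mode == "additive":
            return x + noise         # Additive Noise baseline
        # Multiplicative baseline: fuse by element-wise product; centering
        # the factor at 1 (an assumption) preserves the expected feature scale.
        return x * (1.0 + noise)
```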
On the CIFAR-10 dataset, the noise injection methods show clear gains over the AT baseline. Additive Noise achieves a 5.45% increase in PGD20 robustness over AT (from 52.16% to 57.61%), while Multiplicative Noise further improves this to 59.49% under PGD20 and 63.78% under AutoAttack, accompanied by modest gains in clean accuracy. The RPF method, leveraging geometric constraints, elevates PGD20 robustness to 61.27% and AutoAttack accuracy to 64.38%, signaling the advantage of structured randomness. The CTRW method achieves a significant PGD20 robustness of 69.48% and an AutoAttack accuracy of 73.56%, marking an important advance. The RN method demonstrates strong resistance to multi-step attacks, achieving 74.55% robustness against PGD20. Our proposed approach, DSGN, consistently surpasses all of these competitors: it achieves a superior clean accuracy of 92.21% and attains 80.43% robust accuracy against PGD20, a 5.88% improvement over the strongest prior method, RN (74.55%). Furthermore, under AutoAttack, DSGN achieves 84.52% robustness, a 20.14% improvement over RPF (64.38%). Crucially, our model is the only one to maintain over 80% robustness across multiple attack types, including FGSM (82.32%), PGD20 (80.43%), and MIFGSM (80.54%), while preserving superior clean accuracy, effectively mitigating the common trade-off between robustness and accuracy.
The performance disparity becomes even more pronounced on the more challenging CIFAR-100 dataset, highlighting the limitations of prior work in high-complexity settings. The AT baseline struggles significantly, achieving only 28.71% robustness under PGD20 and 24.48% under AutoAttack. Both the Additive and Multiplicative noise injection methods yield moderate improvements, yet their AutoAttack accuracies remain below 39%. The RPF method pushes the robustness envelope further, reaching 42.88% AutoAttack accuracy. The CTRW method achieves 42.01% under PGD20 and 45.58% under AutoAttack. The RN method provides a PGD20 robustness of 47.70%, but its AutoAttack robustness drops to 39.01%. Our method, DSGN, sets a new standard with 56.34% robust accuracy under AutoAttack and 48.23% under PGD20, an improvement of more than 10% over RPF under AutoAttack. Equally important is clean accuracy, where DSGN attains 67.71%, substantially outperforming all prior baselines, such as RPF (56.88%). This result underscores our model's ability to circumvent the traditional robustness–accuracy trade-off even on complex datasets with higher class cardinality. Collectively, these findings validate the superiority of our DSGN method across all evaluated metrics and datasets, demonstrating state-of-the-art performance in both clean accuracy and adversarial robustness.
We further evaluate the adversarial robustness of our method on CIFAR-10 using the WideResNet-34-10 architecture and compare it against a variety of randomized baselines, as summarized in Table 3. Consistent with the observations on ResNet-18, our method outperforms all competing baselines across four commonly used adversarial attacks. Compared with AT, our method achieves significantly higher robust accuracy, with a 27.18% improvement under PGD20 and a 29.68% gain under AutoAttack. Similarly, compared to additive noise injection, which achieves 60.55% accuracy under AutoAttack, our method improves robustness by 21.37%. This suggests that our approach provides a more stable and effective defense even under stronger attack settings. In addition, our method surpasses advanced randomized defenses such as Random Bits, Multiplicative Noise, RPF, and CTRW, demonstrating both higher clean and robust accuracy. For instance, compared to CTRW, which achieves 74.23% robust accuracy under AutoAttack, our method improves robust accuracy by 7.69%. Furthermore, even against the most competitive recent baseline, RN, our method achieves consistent improvements across all attack scenarios. Specifically, we observe a 1.36% improvement under AutoAttack and a 1.25% boost under PGD20.
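For concreteness, the robust accuracies reported in Tables 2 and 3 can be measured with an evaluation loop of the following shape; this is a generic sketch (the device handling and the attack calling convention follow torchattacks and are assumptions).

```python
import torch

def robust_accuracy(model, attack, loader, device="cuda"):
    """Fraction of test samples classified correctly after the attack runs."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = attack(images, labels)  # crafts adversarial examples (needs grads)
        with torch.no_grad():
            preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total
```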
4.3. Evaluation under Stronger Attacks
In addition to the standard evaluation of adversarial robustness presented in Table 2, we further evaluate the performance of the various defense methods under stronger attack settings on CIFAR-10 and CIFAR-100, as shown in Figure 2. We mainly consider two scenarios: increasing the number of PGD attack steps (from 20 to 100) and increasing the magnitude of the perturbation ε. We compare our method with Overfit (denoted as AT), RPF, Additive Noise (denoted as Add), and Multiplicative Noise (denoted as Mult). For the scenario with increased PGD steps, Figure 2a,c show the robustness of the different methods on CIFAR-10 and CIFAR-100, respectively. Unlike the deterministic adversarial training baseline, the randomized methods exhibit stronger resilience as the number of attack iterations grows. Among these, our method consistently achieves the highest robust accuracy, maintaining 80.0% on CIFAR-10 and 48.0% on CIFAR-100 under PGD100. In contrast, RPF attains 60.6% and 39.0%, while the Additive and Multiplicative noise injection methods perform even lower.
Although all methods degrade as the perturbation size ε increases, our method consistently maintains the highest robust accuracy across all settings, as shown in Figure 2b,d. On CIFAR-10, our method achieves 58.4% robust accuracy at a moderately enlarged perturbation budget, clearly outperforming RPF (36.2%) as well as the additive and multiplicative noise baselines. Even under the strongest perturbation evaluated, our method still reaches 36.6% accuracy, the best among all evaluated methods. A similar trend is observed on CIFAR-100: at the largest ε, our method maintains 18.6% robust accuracy, surpassing RPF (16.6%) and the other baselines.
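The two sweeps in Figure 2 amount to re-running this evaluation with a varying number of PGD steps or a varying budget ε, roughly as sketched below; the grids shown are illustrative assumptions, and `robust_accuracy` refers to the helper sketched earlier.

```python
import torchattacks

# `robust_accuracy` is the evaluation helper sketched in Section 4.2.

def sweep_pgd_steps(model, loader, steps_grid=(20, 40, 60, 80, 100)):
    """Figure 2a,c style sweep: robustness vs. number of PGD steps."""
    return {s: robust_accuracy(
                model,
                torchattacks.PGD(model, eps=8 / 255, alpha=2 / 255, steps=s),
                loader)
            for s in steps_grid}

def sweep_epsilon(model, loader, eps_grid=(8 / 255, 12 / 255, 16 / 255)):
    """Figure 2b,d style sweep: robustness vs. perturbation budget."""
    return {e: robust_accuracy(
                model,
                torchattacks.PGD(model, eps=e, alpha=2 / 255, steps=20),
                loader)
            for e in eps_grid}
```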
4.4. State-of-the-Art Comparison
We also compare our proposed DSGN with leading defense methods to highlight its effectiveness. The evaluation is conducted in two widely used settings: WideResNet-34-10 on CIFAR-10 and ResNet-50 on ImageNet. The results are summarized in Table 4. Specifically, we report robust accuracies on CIFAR-10 under PGD20 and AutoAttack (AA), and on ImageNet under PGD10 and PGD50. Our method consistently outperforms all existing approaches across both datasets and evaluation settings. On CIFAR-10, it achieves robust accuracies of 83.31% under PGD20 and 84.92% under AA, surpassing the previously strongest baseline, RN, by 0.32% and 16.22%, respectively. On ImageNet, our method attains robust accuracies of 74.26% and 72.27% under PGD10 and PGD50, representing clear improvements over both CTRW and RN. In particular, our method outperforms RPF by 17.70% under PGD10 and 16.86% under PGD50. These results demonstrate the strong robustness and scalability of our method, especially under more challenging attack scenarios and on large-scale datasets.
4.5. Generalization Analysis
We conduct comprehensive experiments on diverse datasets and network architectures to assess the generalization capability of the proposed method. The experimental results are summarized in Table 5 and Table 6. In Table 5, we focus on evaluating the robustness of our method across six widely used image classification datasets: MNIST, SVHN, CIFAR-10, CIFAR-100, Tiny ImageNet, and Imagenette. All experiments are conducted using ResNet-18 as the backbone, except for MNIST, which uses LeNet. We report accuracy under clean, FGSM, and PGD20 settings. Compared to the baseline (no defense), our method achieves significantly improved robustness across all datasets, particularly under strong PGD attacks. For instance, on SVHN and CIFAR-10, our method achieves over 80% accuracy under PGD20, while the baseline model suffers severe degradation. These results indicate that our method maintains strong generalization across datasets of varying complexity and scale. In Table 6, we investigate the impact of different network architectures on our method. We evaluate several mainstream models, including VGG19, GoogLeNet, DenseNet121, and various ResNet variants (ResNet-32/44/56). As the results show, our method consistently outperforms the baseline under adversarial settings for all architectures. Notably, it maintains high robustness even on deeper networks such as ResNet-56, achieving 69.1% accuracy under PGD20, compared to only 1.2% for the no-defense baseline. This suggests that our method generalizes effectively across networks of varying depths and design paradigms. In summary, we evaluate the generalization ability of our method from two perspectives: datasets and architectures. Across datasets, our method demonstrates strong robustness over a wide range of data distributions; across architectures, it adapts well to different network types and to varying widths and depths. These results validate that our method is a highly generalizable defense against adversarial attacks.