Effects of Composite Cross-Entropy Loss on Adversarial Robustness
Abstract
1. Introduction
- We introduce an auxiliary cross-entropy term as a second component of the objective loss function. For this second cross-entropy loss, a target classification is assigned in place of the ground-truth label. Three target classifications were tested in our experiments: the false classification label, the all-one label, and the model's own prediction. The impact of this auxiliary loss on the feature distribution and on adversarial robustness was investigated (a minimal sketch of the resulting composite loss follows this list).
- We provide an analysis of the influence of the auxiliary cross-entropy loss on the training process. The gradients generated by this auxiliary loss take various forms and affect the final solution in different ways: some variants encourage the model to learn a more spread-out, structured feature space, while others compress the distances between feature points and improve robustness.
- The specific target classes were also integrated into adversarial training algorithms, and model robustness was improved by training on the resulting targeted adversarial examples. The target labels used for adversary generation are the most likely incorrect label, the all-one label, a specific fixed classification, and the model's prediction (for this last target, the cross-entropy loss amounts to the negative Shannon entropy, which can be interpreted as the KL divergence between the model's prediction and the all-one class). The performance of these target-adversarially trained models was evaluated and compared with a non-target adversarially trained model.
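To make the composite objective concrete, here is a minimal PyTorch-style sketch of one way such a loss could be assembled. The function name, the weighting factor `lam`, and the handling of the conditional false-label target are illustrative assumptions, not the paper's exact formulation; the paper studies several variants that differ in which target is used and how the auxiliary term is weighted.

```python
import torch
import torch.nn.functional as F

def composite_cross_entropy(logits, y_true, target_mode="all_one", lam=0.1):
    """Standard cross-entropy plus a weighted auxiliary cross-entropy against a
    special target. Illustrative sketch: `lam`, the mode names, and the handling
    of the conditional target are assumptions, not the paper's exact notation."""
    ce = F.cross_entropy(logits, y_true)           # standard term, kept dominant
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()
    if target_mode == "prediction":
        # CE of the prediction with itself is its Shannon entropy; p must not
        # be detached here, otherwise the term contributes no gradient at all
        aux = -(p * log_p).sum(dim=1).mean()
    else:
        if target_mode == "false_label":
            pred = logits.argmax(dim=1)
            t = F.one_hot(pred, num_classes=logits.size(1)).float()
            # one plausible reading of the conditional vector of Equation (3):
            # keep the target only on currently misclassified samples
            t = t * (pred != y_true).float().unsqueeze(1)
        elif target_mode == "all_one":
            t = torch.ones_like(p)                 # "random-directed" target
        else:
            raise ValueError(target_mode)
        aux = -(t * log_p).sum(dim=1).mean()       # auxiliary cross-entropy
    # |lam| < 1 keeps the standard term dominant; a negative lam pushes the
    # prediction away from the target instead of toward it
    return ce + lam * aux
```

A typical call might be `loss = composite_cross_entropy(model(x), labels, target_mode="all_one", lam=0.1)`, with `lam` chosen small enough that the auxiliary term does not destabilize training.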
2. Method
2.1. Materials
2.2. Composite Cross-Entropy Loss
- False classification label: At the beginning of the training process, the model misclassifies a large proportion of the data. To encourage faster learning from these incorrectly classified samples, the target classification can incorporate the falsely classified label, thereby increasing the loss between the prediction and the misclassification. This target class label can be represented as a conditional vector (see Equation (3)), where $\hat{y}$ denotes the one-hot-encoded label of the current model classification.
- All-one label: The second special target classification is the all-one class label, in which every element of the vector is one, denoted here as $\mathbf{1}$; it can be interpreted as a random-directed adversarial label. Model robustness is enhanced by reducing the loss between the prediction and this all-one target. In adversarial training, the loss associated with the correct label is first increased (or the loss for an incorrect label decreased) to generate adversaries, and the model is then trained to classify these adversaries correctly by minimizing the loss with respect to the true label [8]. Reducing the loss for the all-one class thus resembles the targeted adversary-generation step. However, since the parameter space is optimized simultaneously by the separate components of the composite loss, the learning process from these "adversaries" must remain dominated by the standard cross-entropy; the weight assigned to the cross-entropy with respect to the all-one class should therefore be smaller than that of the standard cross-entropy to avoid destabilizing training.
- Model's prediction: Another special target classification is the model's own prediction $p$. The cross-entropy of the prediction with itself is the Shannon entropy [17]. Shannon entropy can be written in an alternative form, namely the negative Kullback–Leibler (KL) divergence between the prediction and the all-one classification (see Equation (4)); the identity is written out below.
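In symbols, this is a standard calculation: with $p$ the predicted probability vector and $\mathbf{1}$ the all-one vector,

$$
\mathrm{KL}(p \,\|\, \mathbf{1}) = \sum_i p_i \log \frac{p_i}{1} = \sum_i p_i \log p_i = -H(p),
\qquad \text{hence} \qquad
H(p) = -\mathrm{KL}(p \,\|\, \mathbf{1}).
$$

Minimizing this entropy term therefore drives the prediction toward a one-hot vector, i.e., away from the uniform (all-one) direction.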
2.3. Categories of Composite Cross-Entropy Loss
2.4. Target Adversarial Training
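As an illustration of how the target labels listed in Section 1 can drive adversary generation, the following is a minimal, hypothetical PyTorch-style sketch of an iterative FGSM step toward a chosen target (in the spirit of i-FGSM [18]); the function name, step sizes, and projection scheme are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def targeted_ifgsm(model, x, target, eps=8/255, alpha=2/255, steps=10):
    """Iterative FGSM toward a chosen target label. `target` holds class
    indices; recent PyTorch also accepts a dense float target (e.g., an
    all-one vector) in F.cross_entropy. Illustrative sketch only."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # targeted attack: descend the loss w.r.t. the target label
        x_adv = (x_adv - alpha * grad.sign()).detach()
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```

The most likely incorrect label can be read off the clean prediction as its runner-up class, e.g. `target = model(x).topk(2, dim=1).indices[:, 1]`; adversarial training then minimizes the standard cross-entropy of the model on these targeted adversaries with respect to the true labels [8].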
2.5. Evaluation Metrics
3. Results
3.1. Sparser Feature Distribution
3.2. Incorporation into Adversarial Training
4. Discussion
4.1. Trade-Off in Robustness and Feature Distribution
4.2. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
References
1. Jamin, A.; Humeau-Heurtier, A. (Multiscale) Cross-Entropy Methods: A Review. Entropy 2020, 22, 45.
2. Mao, A.; Mohri, M.; Zhong, Y. Cross-Entropy Loss Functions: Theoretical Analysis and Applications. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 23803–23828. Available online: https://proceedings.mlr.press/v202/mao23b.html (accessed on 18 March 2025).
3. Nar, K.; Ocal, O.; Sastry, S.S.; Ramchandran, K. Cross-Entropy Loss Leads To Poor Margins. September 2018. Available online: https://openreview.net/forum?id=ByfbnsA9Km (accessed on 25 March 2025).
4. Pang, T.; Du, C.; Dong, Y.; Zhu, J. Towards Robust Detection of Adversarial Examples. arXiv 2018, arXiv:1706.00633.
5. Wan, W.; Zhong, Y.; Li, T.; Chen, J. Rethinking Feature Distribution for Loss Functions in Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9117–9126. Available online: https://openaccess.thecvf.com/content_cvpr_2018/html/Wan_Rethinking_Feature_Distribution_CVPR_2018_paper.html (accessed on 25 March 2025).
6. Mustafa, A.; Khan, S.; Hayat, M.; Goecke, R.; Shen, J.; Shao, L. Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3385–3394. Available online: https://openaccess.thecvf.com/content_ICCV_2019/html/Mustafa_Adversarial_Defense_by_Restricting_the_Hidden_Space_of_Deep_Neural_ICCV_2019_paper.html (accessed on 25 March 2025).
7. Ding, N.; Arabian, H.; Möller, K. Feature space separation by conformity loss driven training of CNN. IFAC J. Syst. Control 2024, 28, 100260.
8. Ding, N.; Möller, K. Adversarial training with borderline samples. J. Supercomput. 2025, 81, 1025.
9. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572.
10. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
11. Müller, R.; Kornblith, S.; Hinton, G.E. When Does Label Smoothing Help? arXiv 2020, arXiv:1906.02629.
12. Twinanda, A.P.; Shehata, S.; Mutter, D.; Marescaux, J.; de Mathelin, M.; Padoy, N. EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos. IEEE Trans. Med. Imaging 2017, 36, 86–97.
13. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. Available online: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html (accessed on 19 March 2025).
14. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946.
15. Ding, N.; Möller, K. Minimally Distorted Adversarial Images with a Step-Adaptive Iterative Fast Gradient Sign Method. AI 2024, 5, 922–937.
16. Derivation of the Gradient of the Cross-Entropy Loss. Available online: https://jmlb.github.io/ml/2017/12/26/Calculate_Gradient_Softmax/ (accessed on 19 March 2025).
17. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
18. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial Machine Learning at Scale. arXiv 2017, arXiv:1611.01236.
19. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083.
Class | Surgical Tool | Number of Frames |
---|---|---|
1 | Grasper | 23,507 |
2 | Bipolar | 3222 |
3 | Hook | 44,887 |
4 | Scissors | 1483 |
5 | Clipper | 2647 |
6 | Irrigator | 2899 |
7 | Bag | 1545 |
[Table: the eight composite cross-entropy loss functions (indices 1–8) and their corresponding gradient expressions.]
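Each composite term in the table above is a cross-entropy against some target vector $t$, so its gradient with respect to the logits $z$ follows the standard softmax cross-entropy form (cf. [16]):

$$
\frac{\partial}{\partial z_j}\Big(-\sum_i t_i \log p_i\Big) = p_j \sum_i t_i - t_j,
\qquad p = \mathrm{softmax}(z),
$$

which reduces to $p_j - t_j$ for a normalized target ($\sum_i t_i = 1$) and to $K p_j - 1$ for the all-one target over $K$ classes.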
Index | Composite Cross-Entropy | PGD Attack | Accuracy | Feature Distance (Mean) | Feature Distance (SD)
---|---|---|---|---|---
0 | (Lr = 0.001) | 0% | 92.61% | 5.68 | 2.25
1 | | 2.57% | 91.53% | 11.84 | 4.62
2 | | 4.02% | 93.50% | 11.88 | 4.25
3 | | 1.68% | 92.55% | 11.40 | 4.43
4 | | 5.93% | 92.38% | 11.97 | 4.49
5 | | 17.73% | 91.39% | 0.75 | 0.33
5(2) | | 20.03% | 92.07% | 0.74 | 0.31
6 | | 9.74% | 91.05% | 0.68 | 0.32
6(2) | | 31.09% | 91.38% | 0.66 | 0.29
7 | | 21.27% | 91.19% | 0.73 | 0.35
8 | | 7.20% | 92.39% | 17.05 | 6.46
Index | Composite Cross-Entropy | PGD Attack | Accuracy | Feature Distance (Mean) | Feature Distance (SD)
---|---|---|---|---|---
1 | | 0% | 90.74% | 6.51 | 2.70
6 | | 4.94% | 88.83% | 0.67 | 0.29
7 | | 3% | 90.15% | 0.77 | 0.36
Training Objective Function | PGD Attack | Accuracy | Feature Distance (Mean) | Feature Distance (SD)
---|---|---|---|---
 | 80.47% | 91.66% | 7.91 | 3.33
 | 80.75% | 90.93% | 6.49 | 2.90
 | 80.76% | 91.05% | 6.77 | 2.86
Adversary Generation Function | PGD Attack | Accuracy | Feature Distance (Mean) | Feature Distance (SD)
---|---|---|---|---
 | 77.54% | 91.60% | 12.49 | 5.16
(i-FGSM) | 80.04% | 91.46% | 7.82 | 3.54
 | 79.84% | 91.29% | 8.15 | 3.48
 | 80.47% | 91.08% | 6.94 | 3.16
 | 57.27% | 88.64% | 11.77 | 4.88
 | 74.59% | 90.79% | 10.95 | 4.62