Effects of Adversarial Training on the Safety of Classification Models
Abstract
1. Introduction
- In [13], the authors proposed a method for understanding the behavior of AI by visualizing the activation of each layer of neural network models.
- RQ1. Can classification models appropriately classify adversarial examples after adversarial training?
- RQ2. Do the results for RQ1 also hold for safety-related datasets?
- RQ3. Does adversarial training affect safety regardless of the model used for training?
- RQ4. At what ratio should adversarial examples be mixed with normal data to achieve a balance between safety and accuracy?
1. Adversarial training can improve safety. In the experimental results, safety, which is defined as the successful classification of adversarial examples, improved by 20–40%.
2. Adversarial training can be used to verify the safety of AI. The experimental results show that adversarial training improves safety, which means that including a certain number of adversarial examples in the training dataset can serve as a verification method for ensuring safety.
3. Adversarial training can be used as a factor to verify safety regardless of the model structure. An improvement in safety was observed for every model structure tested.
4. We propose a process for adjusting the adversarial-to-normal ratio in training datasets according to preferential requirements. Experiments with different inclusion ratios show that the ratio of adversarial examples plays a key role in the tradeoff between accuracy and safety.
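The idea of adjusting the adversarial-to-normal ratio in a training dataset can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: the function name `mix_training_set` and its parameters are hypothetical, and it assumes that a share of the normal samples is replaced by adversarial counterparts so that the dataset size stays constant.

```python
import random


def mix_training_set(normal, adversarial, normal_parts, adversarial_parts, seed=0):
    """Build a training set with a given normal:adversarial ratio.

    Assumption (not from the paper): adversarial[i] is the adversarial
    counterpart of normal[i], and mixing replaces samples in place.
    """
    if len(normal) != len(adversarial):
        raise ValueError("expected one adversarial counterpart per normal sample")
    if normal_parts < 0 or adversarial_parts < 0 or normal_parts + adversarial_parts == 0:
        raise ValueError("ratio parts must be non-negative and not both zero")

    n = len(normal)
    # Number of positions that will hold an adversarial sample.
    n_adv = round(n * adversarial_parts / (normal_parts + adversarial_parts))

    # Choose the replaced positions reproducibly.
    rng = random.Random(seed)
    adv_positions = set(rng.sample(range(n), n_adv))

    return [adversarial[i] if i in adv_positions else normal[i] for i in range(n)]


# Toy data: non-negative integers stand in for normal samples,
# negative integers for their adversarial counterparts.
normal_samples = list(range(10))
adversarial_samples = [-(i + 1) for i in range(10)]
mixed = mix_training_set(normal_samples, adversarial_samples, 8, 2)
```

With a ratio of 10:0 the function returns the normal data unchanged, which corresponds to training without adversarial examples.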
2. Related Work
2.1. Studies That Directly Address the NFRs of AI
2.1.1. Fairness of AI
2.1.2. Security of AI
2.1.3. Transparency of AI
2.1.4. Safety of AI
2.2. Studies That Indirectly Address the NFRs of AI
2.2.1. Explainable AI (XAI)
2.2.2. Adversarial Examples and Attacks
“a good response to adversarial examples is an important safety issue in AI”.
3. Overview of Experiments
3.1. Definition of Safety
“a measure of whether a model responds appropriately to data with untrained features or to data that has been adversarially attacked to obtain incorrect results from the model”.
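As a rough illustration of how such a measure could be computed, the sketch below reports accuracy as the fraction of clean inputs classified correctly and safety as the fraction of adversarially perturbed inputs classified correctly. The helper names and the toy threshold classifier are hypothetical, and the sketch covers only the adversarial-attack part of the definition, not data with untrained features.

```python
def fraction_correct(predict, inputs, labels):
    """Return the fraction of inputs whose predicted label matches the true label."""
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    if not inputs:
        raise ValueError("at least one sample is required")
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(inputs)


def accuracy_and_safety(predict, clean_inputs, adversarial_inputs, labels):
    """Accuracy is measured on clean inputs, safety on adversarial inputs."""
    accuracy = fraction_correct(predict, clean_inputs, labels)
    safety = fraction_correct(predict, adversarial_inputs, labels)
    return accuracy, safety


# Toy one-dimensional classifier (hypothetical): class 1 if the input is at least 0.5.
def toy_predict(x):
    return int(x >= 0.5)


labels = [0, 0, 1, 1]
clean = [0.1, 0.4, 0.6, 0.9]
# Each clean input shifted by 0.2 toward the decision boundary.
adversarial = [0.3, 0.6, 0.4, 0.7]

accuracy, safety = accuracy_and_safety(toy_predict, clean, adversarial, labels)
```

In this toy case every clean input is classified correctly, while two of the four perturbed inputs cross the decision boundary, so safety is lower than accuracy.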
3.2. Overview of Our Experiments
4. Methodology
4.1. Datasets and Models
4.1.1. Datasets
4.1.2. Models
4.2. FGSM
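For orientation, the fast gradient sign method perturbs an input by a step of size epsilon in the direction of the sign of the loss gradient with respect to that input. The sketch below applies this to a binary logistic-regression model, chosen only because its input gradient has a closed form. It is an assumption-laden illustration: the model, the function names, and the clipping of features to [0, 1] are hypothetical and do not come from this paper's experimental setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict_probability(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm_logistic(x, y, weights, bias, epsilon):
    """One-step FGSM perturbation of x for a binary logistic-regression model.

    Assumes features lie in [0, 1] and y is 0 or 1.
    """
    p = predict_probability(x, weights, bias)
    # Gradient of the cross-entropy loss with respect to each input feature.
    grad = [(p - y) * w for w in weights]
    # Move each feature by epsilon in the direction that increases the loss,
    # then clip back to the valid feature range.
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical model and input, for illustration only.
weights, bias = [2.0, -1.0], -0.5
x, y = [0.6, 0.2], 1
x_adv = fgsm_logistic(x, y, weights, bias, epsilon=0.1)
p_clean = predict_probability(x, weights, bias)
p_adv = predict_probability(x_adv, weights, bias)
```

Here the perturbation lowers the predicted probability of the true class, which is the effect an attacker is after. For a deep network the same step would use a gradient obtained by automatic differentiation rather than the closed form used here.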
4.3. Methodology
5. Results and Discussion
5.1. Results from Experiments with CIFAR-10
5.2. Results from Experiments with CIFAR-100
5.3. GTSRB Results
5.4. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Bird, S.; Dudík, M.; Edgar, R.; Horn, B.; Lutz, R.; Milan, V.; Sameki, M.; Wallach, H.; Walker, K. Fairlearn: A Toolkit for Assessing and Improving Fairness in AI; Microsoft Tech Report MSR-TR-2020-32; 2020; pp. 142–149.
- Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; Zemel, R. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, Cambridge, MA, USA, 8–10 January 2012; pp. 214–226.
- Feldman, M.; Friedler, S.A.; Moeller, J.; Scheidegger, C.; Venkatasubramanian, S. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 259–268.
- Tramer, F.; Atlidakis, V.; Geambasu, R.; Hsu, D.; Hubaux, J.P.; Humbert, M.; Lin, H. Fairtest: Discovering unwarranted associations in data-driven applications. In Proceedings of the IEEE European Symposium on Security and Privacy, Paris, France, 26–28 April 2017.
- Zhang, J.; Harman, M. Ignorance and Prejudice. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021.
- Zemel, R.; Wu, Y.; Swersky, K.; Pitassi, T.; Dwork, C. Learning fair representations. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 325–333.
- Mei, S.; Zhu, X. The security of latent dirichlet allocation. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 9–12 May 2015; pp. 681–689.
- Mei, S.; Zhu, X. Using machine teaching to identify optimal training-set attacks on machine learners. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
- Barreno, M.; Nelson, B.; Joseph, A.D.; Tygar, J.D. The security of machine learning. Mach. Learn. 2010, 2, 121–148.
- Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete problems in AI safety. arXiv 2016, arXiv:1606.06565.
- Juric, M.; Sandic, A.; Brcic, M. AI safety: State of the field through quantitative lens. In Proceedings of the 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 28 September–2 October 2020; pp. 1254–1259.
- Leike, J.; Martic, M.; Krakovna, V.; Ortega, P.A.; Everitt, T.; Lefrancq, A.; Legg, S. AI safety gridworlds. arXiv 2017, arXiv:1711.09883.
- Yosinski, J.; Clune, J.; Nguyen, A.; Fuchs, T.; Lipson, H. Understanding neural networks through deep visualization. arXiv 2015, arXiv:1506.06579.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; Yu, B. Interpretable machine learning: Definitions, methods, and applications. arXiv 2019, arXiv:1901.04592.
- Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Zieba, K. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316.
- Levinson, J.; Askeland, J.; Becker, J.; Dolson, J.; Held, D.; Kammel, S.; Thrun, S. Towards fully autonomous driving: Systems and algorithms. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium, Baden-Baden, Germany, 5–9 June 2011; pp. 163–168.
- Vieira, S.; Pinaya, W.H.; Mechelli, A. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neurosci. Biobehav. Rev. 2017, 74, 58–75.
- Ramsundar, B.; Kearnes, S.; Riley, P.; Webster, D.; Konerding, D.; Pande, V. Massively multitask networks for drug discovery. arXiv 2015, arXiv:1502.02072.
- Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923.
- Krause, J.; Perer, A.; Ng, K. Interacting with predictions: Visual inspection of black-box machine learning models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; pp. 5686–5697.
- Tan, J.; Ung, M.; Cheng, C.; Greene, C.S. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. In Pacific Symposium on Biocomputing Co-Chairs; World Scientific: Singapore, 2014; pp. 132–143.
- Pesapane, F.; Volonté, C.; Codari, M.; Sardanelli, F. Artificial intelligence as a medical device in radiology: Ethical and regulatory issues in Europe and the United States. Insights Into Imaging 2018, 9, 743–753.
- Miller, D.D.; Brown, E.W. Artificial intelligence in medical practice: The question to the answer? Am. J. Med. 2018, 131, 129–133.
- Fu, K.; Cheng, D.; Tu, Y.; Zhang, L. Credit card fraud detection using convolutional neural networks. In International Conference on Neural Information Processing; Springer: Cham, Switzerland, 2016; pp. 483–490.
- Samek, W.; Wiegand, T.; Müller, K.R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv 2017, arXiv:1708.08296.
- Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
- Kurakin, A.; Goodfellow, I.; Bengio, S.; Dong, Y.; Liao, F.; Liang, M.; Abe, M. Adversarial attacks and defences competition. In The NIPS’17 Competition: Building Intelligent Systems; Springer: Cham, Switzerland, 2018; pp. 195–231.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
- Goodfellow, I.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017; pp. 506–519.
- Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824.
- Aryal, K.; Gupta, M.; Abdelsalam, M. A Survey on Adversarial Attacks for Malware Analysis. arXiv 2021, arXiv:2111.08223.
- Kimmell, J.C.; Abdelsalam, M.; Gupta, M. Analyzing Machine Learning Approaches for Online Malware Detection in Cloud. In Proceedings of the 2021 IEEE International Conference on Smart Computing (SMARTCOMP), Irvine, CA, USA, 23–27 August 2021; pp. 189–196.
- McDole, A.; Abdelsalam, M.; Gupta, M.; Mittal, S. Analyzing CNN Based Behavioural Malware Detection Techniques on Cloud IaaS. Cloud Comput. 2020, 2020, 12403.
- Kimmel, J.C.; Mcdole, A.D.; Abdelsalam, M.; Gupta, M.; Sandhu, R. Recurrent Neural Networks Based Online Behavioural Malware Detection Techniques for Cloud Infrastructure. IEEE Access 2021, 9, 68066–68080.
- Poon, H.-K.; Yap, W.-S.; Tee, Y.-K.; Lee, W.-K.; Goi, B.-M. Hierarchical gated recurrent neural network with adversarial and virtual adversarial training on text classification. Neural Netw. 2019, 119, 299–312.
- Terzi, M.; Susto, G.A.; Chaudhari, P. Directional adversarial training for cost sensitive deep learning classification applications. Eng. Appl. Artif. Intell. 2020, 91, 103550.
- Dong, X.; Zhu, Y.; Zhang, Y.; Fu, Z.; Xu, D.; Yang, S.; Melo, G. Leveraging adversarial training in self-learning for cross-lingual text classification. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 11–15 July 2020; pp. 1541–1544.
- Ajunwa, I.; Friedler, S.; Scheidegger, C.E.; Venkatasubramanian, S. Hiring by Algorithm: Predicting and Preventing Disparate Impact. 2016. Available online: http://tagteam.harvard.edu/hub_feeds/3180/feed_items/2163401 (accessed on 17 January 2022).
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 2012, 32, 323–332.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.

| Population Ratio | Metric | LeNet (0.05) | ResNet18 (0.05) | VGG16 (0.05) | LeNet (0.1) | ResNet18 (0.1) | VGG16 (0.1) |
|---|---|---|---|---|---|---|---|
| 10:0 | Accuracy | 63 | 53 | 78 | 63 | 53 | 78 |
| 10:0 | Safety | 18 | 33 | 33 | 11 | 27 | 27 |
| 8:2 | Accuracy | 60 | 51 | 73 | 60 | 50 | 74 |
| 8:2 | Safety | 45 | 35 | 56 | 38 | 31 | 53 |
| 7:3 | Accuracy | 58 | 49 | 72 | 57 | 49 | 69 |
| 7:3 | Safety | 45 | 36 | 56 | 41 | 34 | 54 |
| 6:4 | Accuracy | 58 | 48 | 70 | 58 | 47 | 70 |
| 6:4 | Safety | 47 | 36 | 56 | 45 | 34 | 56 |

| Population Ratio | Metric | LeNet (0.05) | ResNet18 (0.05) | VGG16 (0.05) | LeNet (0.1) | ResNet18 (0.1) | VGG16 (0.1) |
|---|---|---|---|---|---|---|---|
| 10:0 | Accuracy | 27 | 19 | 40 | 27 | 19 | 40 |
| 10:0 | Safety | 15 | 7 | 10 | 4 | 6 | 7 |
| 8:2 | Accuracy | 26 | 17 | 33 | 25 | 17 | 35 |
| 8:2 | Safety | 15 | 9 | 20 | 12 | 8 | 21 |
| 7:3 | Accuracy | 26 | 17 | 33 | 21 | 16 | 29 |
| 7:3 | Safety | 15 | 10 | 22 | 12 | 9 | 20 |
| 6:4 | Accuracy | 24 | 15 | 30 | 22 | 16 | 26 |
| 6:4 | Safety | 18 | 10 | 22 | 15 | 10 | 20 |

| Population Ratio | Metric | LeNet (0.05) | ResNet18 (0.05) | VGG16 (0.05) | LeNet (0.1) | ResNet18 (0.1) | VGG16 (0.1) |
|---|---|---|---|---|---|---|---|
| 10:0 | Accuracy | 85 | 71 | 93 | 85 | 71 | 93 |
| 10:0 | Safety | 17 | 33 | 40 | 17 | 33 | 40 |
| 8:2 | Accuracy | 76 | 63 | 90 | 73 | 60 | 85 |
| 8:2 | Safety | 57 | 49 | 75 | 58 | 42 | 70 |
| 7:3 | Accuracy | 75 | 60 | 86 | 70 | 58 | 81 |
| 7:3 | Safety | 63 | 50 | 76 | 65 | 45 | 72 |
| 6:4 | Accuracy | 75 | 58 | 87 | 65 | 54 | 77 |
| 6:4 | Safety | 66 | 52 | 79 | 65 | 46 | 73 |
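
One way to read such a table against a stated preference is sketched below. This is a hypothetical selection rule, not the authors' process: it takes the accuracy and safety values from the LeNet column under 0.05 in the table directly above and picks the ratio with the highest safety among those that still meet a required minimum accuracy. The names `RESULTS` and `pick_ratio` are illustrative.

```python
# (accuracy, safety) per population ratio, copied from the LeNet column
# under 0.05 in the table directly above.
RESULTS = {
    "10:0": (85, 17),
    "8:2": (76, 57),
    "7:3": (75, 63),
    "6:4": (75, 66),
}


def pick_ratio(results, min_accuracy):
    """Return the ratio with the highest safety whose accuracy is at least min_accuracy.

    Ties on safety are broken in favor of higher accuracy.
    Returns None when no ratio meets the accuracy requirement.
    """
    eligible = {
        ratio: (acc, safety)
        for ratio, (acc, safety) in results.items()
        if acc >= min_accuracy
    }
    if not eligible:
        return None
    return max(eligible, key=lambda ratio: (eligible[ratio][1], eligible[ratio][0]))


strict_choice = pick_ratio(RESULTS, min_accuracy=76)
relaxed_choice = pick_ratio(RESULTS, min_accuracy=75)
```

Relaxing the accuracy requirement by a single point moves the choice from 8:2 to 6:4 in this column, which shows how sensitive the selected ratio can be to the stated preference.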

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, H.; Han, J. Effects of Adversarial Training on the Safety of Classification Models. Symmetry 2022, 14, 1338. https://doi.org/10.3390/sym14071338