# Diversity Adversarial Training against Adversarial Attack on Deep Neural Networks


## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Adversarial Attacks

#### 2.2. Defense Methods against Adversarial Attacks

## 3. Methodology

**FGSM:** The FGSM method creates ${x}^{*}$ under an ${L}_{\infty}$ constraint:

$$ {x}^{*} = x + \epsilon \cdot \mathrm{sign}\left({\nabla}_{x} J\left(x, y\right)\right) $$

where $J\left(x, y\right)$ is the model's loss on input $x$ with original class $y$.

**I-FGSM:** I-FGSM is an extension of FGSM. Instead of applying the full perturbation $\epsilon$ in a single step, a smaller step $\alpha$ is applied at each iteration, and the result is clipped so that it stays within $\epsilon$ of the original sample:

$$ {x}^{*}_{0} = x, \qquad {x}^{*}_{i+1} = \mathrm{clip}_{x,\epsilon}\left({x}^{*}_{i} + \alpha \cdot \mathrm{sign}\left({\nabla}_{x} J\left({x}^{*}_{i}, y\right)\right)\right) $$
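
As an illustration, the two update rules above can be sketched with NumPy on a plain linear softmax classifier. The linear model and its analytic input gradient are stand-in assumptions for this sketch; the paper applies these attacks to convolutional local models.

```python
# FGSM and I-FGSM sketched on a linear softmax classifier.
# NOTE: the linear model and its analytic input-gradient are illustrative
# assumptions; the paper applies these attacks to CNN local models.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return W.T @ (p - onehot)

def fgsm(W, b, x, y, eps=0.1):
    """Single-step attack: move eps along the sign of the input gradient."""
    return np.clip(x + eps * np.sign(input_gradient(W, b, x, y)), 0.0, 1.0)

def ifgsm(W, b, x, y, eps=0.1, alpha=0.02, steps=10):
    """Iterative attack: small alpha-steps, clipped to the eps L_inf ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_gradient(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

Both attacks keep the perturbation inside the ${L}_{\infty}$ ball of radius $\epsilon$; at the same budget, the iterative variant typically finds a stronger perturbation.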

**DeepFool:** The DeepFool approach creates an adversarial example with less distortion from the original sample; it generates ${x}^{*}$ by iteratively linearizing the classifier around the current point. Because the neural network is nonlinear, this method is more computationally involved than FGSM.

**CW:** The fourth method is the Carlini and Wagner (CW) attack, which can generate adversarial examples with a 100% attack success rate by minimizing a different objective function; in its ${L}_{2}$ form:

$$ \underset{\delta}{\min} \; {\parallel \delta \parallel}_{2}^{2} + c \cdot f\left(x + \delta\right) $$

where $f$ is a surrogate loss that is minimized only when $x + \delta$ is misclassified, and the constant $c$ trades off distortion against attack success.

**Algorithm 1** Diversity adversarial training

**Input:**

- ${x}_{j}\in X$ ▹ original training dataset
- $y\in Y$ ▹ original class
- $t$ ▹ validation data
- ${M}_{L}$ ▹ local model
- ${M}_{T}$ ▹ target model
- fgs, ifgs, dp, cw ▹ FGSM, I-FGSM, DeepFool, CW methods

**Diversity adversarial training** (${M}_{L}$, $x$, fgs, ifgs, dp, cw):

1. ${X}^{*}\leftarrow {X}^{*}\cup$ Generation of adversarial examples (${M}_{L}$, $x$, fgs)
2. ${X}^{*}\leftarrow {X}^{*}\cup$ Generation of adversarial examples (${M}_{L}$, $x$, ifgs)
3. ${X}^{*}\leftarrow {X}^{*}\cup$ Generation of adversarial examples (${M}_{L}$, $x$, dp)
4. ${X}^{*}\leftarrow {X}^{*}\cup$ Generation of adversarial examples (${M}_{L}$, $x$, cw)
5. Train the target model ${M}_{T}\leftarrow (X, Y) + ({X}^{*}, Y)$
6. Record the accuracy of the target model ${M}_{T}\left(t\right)$
7. return ${M}_{T}$
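
A minimal Python sketch of Algorithm 1 follows. The `fit`/`score` model interface and the attack callables are assumptions for illustration; they stand in for the training and generation steps above.

```python
def diversity_adversarial_training(local_model, target_model, X, Y,
                                   t_X, t_Y, attacks):
    """Train target_model on the original data plus adversarial examples
    crafted against local_model by every attack method in `attacks`."""
    X_adv, Y_adv = [], []
    for attack in attacks:                  # e.g. [fgsm, ifgsm, deepfool, cw]
        for x, y in zip(X, Y):
            X_adv.append(attack(local_model, x, y))
            Y_adv.append(y)                 # adversarial samples keep the original label
    # (X, Y) + (X*, Y): original and adversarial data are combined for training
    target_model.fit(list(X) + X_adv, list(Y) + Y_adv)
    accuracy = target_model.score(t_X, t_Y)  # record accuracy on validation data t
    return target_model, accuracy
```

Each attack contributes one adversarial copy of the training set, so with four methods the target model sees the original data plus four adversarial variants of it.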

## 4. Experiment Setup

#### 4.1. Datasets

#### 4.2. Model Configuration

#### 4.2.1. Local Model

#### 4.2.2. Holdout Model

#### 4.2.3. Target Model

#### 4.3. Generation of Adversarial Examples from Local Model

## 5. Experimental Results

## 6. Discussion

**Assumption:** The diversity training method assumes a black-box attack; that is, the attacker has no information about the target model. In this study, the attacker generates various adversarial examples using the holdout model, which is known to the attacker, and then transfers them to the target model as a black-box attack.

**Generation of diverse adversarial examples:** Unlike conventional adversarial training, the diversity training approach trains the target model with a variety of adversarial examples. These are generated against the local model known to the defender, using the FGSM, I-FGSM, DeepFool, and CW methods, each of which introduces a different degree of distortion; a detailed comparison revealed that the CW method generates adversarial examples with the smallest distortion. The samples produced by all four methods are then used to additionally train the target model, yielding a model that is more robust against an unknown adversarial attack.
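
The distortion comparison above can be made concrete by measuring the average ${L}_{2}$ perturbation per method; a small helper sketch (the batch layout is an assumption):

```python
import numpy as np

def mean_l2_distortion(X, X_adv):
    """Average L2 norm of the perturbation over a batch of examples."""
    return float(np.mean([np.linalg.norm(np.asarray(xa) - np.asarray(x))
                          for x, xa in zip(X, X_adv)]))
```

Computing this once per attack method is one way to verify the observation that CW yields the smallest distortion.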

**Model configuration:** In this paper, there are three types of models: the target model, the local model, and the holdout model. These models are constructed with different architectures, as shown in Table 1, Table 2, and Table 3, respectively. From the local model, which is known to the defender, various adversarial examples are generated with the FGSM, I-FGSM, DeepFool, and CW methods. The target model is the final inference model, which is a black box to the attacker; it is trained with these adversarial examples to become robust against unknown adversarial attacks. The attacker, in turn, uses adversarial examples generated from the holdout model to make the target model misclassify, which is known as a transfer attack.
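
The transfer-attack evaluation described here can be sketched as follows; the `predict` interface and the attack callable are illustrative assumptions.

```python
def transfer_attack_success_rate(holdout_model, target_model, X, Y, attack):
    """Black-box transfer attack: the attacker crafts adversarial examples on
    the holdout model it knows, then we count how often the unseen target
    model misclassifies them."""
    n_fooled = 0
    for x, y in zip(X, Y):
        x_adv = attack(holdout_model, x, y)  # crafted without target access
        if target_model.predict(x_adv) != y:
            n_fooled += 1
    return n_fooled / len(X)
```

A robust target model keeps this rate low even though the attacker's examples transfer from a different architecture.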

**Dataset:** Our experiments were conducted on the MNIST and Fashion-MNIST datasets, both of which contain 28 × 28 grayscale images. There was, however, a performance difference between them: because Fashion-MNIST images such as T-shirts and shirts resemble one another more closely than digit images do, the Fashion-MNIST test images tend to show lower accuracy than those of MNIST. Nevertheless, for both datasets, even after training with various adversarial samples, the diversity training method maintained almost the same accuracy as the model trained only on the original training data.

**Defense considerations:** Adversarial training is a simple and effective defense because it requires no separate module for the target model. Unlike existing adversarial training methods, the diversity training method is robust against unknown adversarial attacks because it generates various adversarial samples and includes them in the model's training data. It does not require a large number of local models: like existing adversarial training methods, it generates its adversarial examples from a single local model. Even when the attacker produces diverse adversarial samples, the presented approach remains robust against the various attacks. Finally, on the original data, the diversity training method maintains an accuracy similar to that of the existing adversarial training method.

**Applications:** One potential application of the diversity training method is autonomous vehicles, where an attacker could present a modified road sign, an adversarial example crafted so that the vehicle's classifier misreads it. Our method can serve as a defense that correctly identifies such modified road signs. Similarly, in medical systems, revised adversarial examples pose a risk of treatment misjudgment for a patient; the diversity training method can be used in such systems to increase the correct classification rate of the medical treatment.

**Limitations and future work:** The diversity training method generates adversarial samples with the FGSM, I-FGSM, DeepFool, and CW methods, but other generation methods exist. In addition, like the basic adversarial training method, it uses a single local model; ensemble adversarial training built on several local models instead of one would be an interesting research topic. In terms of evaluation, analyzing how the trained model's decision boundary places original samples and unknown adversarial examples would also be an interesting topic.

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A

Layer Type | MNIST Shape |
---|---|
Convolution + ReLU | [3, 3, 32] |
Convolution + ReLU | [3, 3, 32] |
Max pooling | [2, 2] |
Convolution + ReLU | [3, 3, 64] |
Convolution + ReLU | [3, 3, 64] |
Max pooling | [2, 2] |
Fully connected + ReLU | [200] |
Fully connected + ReLU | [200] |
Softmax | [10] |

Parameter | MNIST Model | CIFAR10 Model |
---|---|---|
Learning rate | 0.1 | 0.1 |
Momentum | 0.9 | 0.9 |
Delay rate | - | 10 (decay 0.0001) |
Dropout | 0.5 | 0.5 |
Batch size | 128 | 128 |
Epochs | 50 | 200 |

**Figure A1.**For MNIST, the failure rate for adversarial examples generated with the holdout model using the without method, baseline method, and diversity training method.

**Table A3.**Target model architecture [41] for CIFAR10.

Layer Type | CIFAR10 Shape |
---|---|
Convolution + ReLU | [3, 3, 64] |
Convolution + ReLU | [3, 3, 64] |
Max pooling | [2, 2] |
Convolution + ReLU | [3, 3, 128] |
Convolution + ReLU | [3, 3, 128] |
Max pooling | [2, 2] |
Convolution + ReLU | [3, 3, 256] |
Convolution + ReLU | [3, 3, 256] |
Convolution + ReLU | [3, 3, 256] |
Convolution + ReLU | [3, 3, 256] |
Max pooling | [2, 2] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Max pooling | [2, 2] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Convolution + ReLU | [3, 3, 512] |
Max pooling | [2, 2] |
Fully connected + ReLU | [4096] |
Fully connected + ReLU | [4096] |
Softmax | [10] |

**Figure A2.**For Fashion-MNIST, the failure rate for adversarial examples generated with the holdout model using the without method, baseline method, and diversity training method.

## References

- Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. **2015**, 61, 85–117.
- Tanzi, L.; Vezzetti, E.; Moreno, R.; Aprato, A.; Audisio, A.; Massè, A. Hierarchical fracture classification of proximal femur X-ray images using a multistage Deep Learning approach. Eur. J. Radiol. **2020**, 133, 109373.
- El Asnaoui, K.; Chawki, Y. Using X-ray images and deep learning for automated detection of coronavirus disease. J. Biomol. Struct. Dyn. **2020**, 1–12.
- Barreno, M.; Nelson, B.; Joseph, A.D.; Tygar, J. The security of machine learning. Mach. Learn. **2010**, 81, 121–148.
- Kwon, H.; Yoon, H.; Park, K.W. Selective Poisoning Attack on Deep Neural Networks. Symmetry **2019**, 11, 892.
- Kwon, H. Detecting Backdoor Attacks via Class Difference in Deep Neural Networks. IEEE Access **2020**, 8, 191049–191056.
- Kwon, H.; Yoon, H.; Park, K.W. Multi-targeted backdoor: Identifying backdoor attack for multiple deep neural networks. IEICE Trans. Inf. Syst. **2020**, 103, 883–887.
- Kwon, H.; Lee, J. AdvGuard: Fortifying Deep Neural Networks against Optimized Adversarial Example Attack. IEEE Access **2020**.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
- Fawzi, A.; Fawzi, O.; Frossard, P. Analysis of classifiers’ robustness to adversarial perturbations. Mach. Learn. **2018**, 107, 481–508.
- Shen, S.; Jin, G.; Gao, K.; Zhang, Y. Ape-gan: Adversarial perturbation elimination with gan. arXiv **2017**, arXiv:1707.05474.
- Meng, D.; Chen, H. Magnet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147.
- Xu, W.; Evans, D.; Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. In Proceedings of the Network and Distributed System Security Symposium (NDSS 2018), San Diego, CA, USA, 18–21 February 2018.
- Kwon, H.; Yoon, H.; Park, K.W. Acoustic-decoy: Detection of adversarial examples through audio modification on speech recognition system. Neurocomputing **2020**, 417, 357–370.
- Kwon, H.; Yoon, H.; Park, K.W. POSTER: Detecting audio adversarial example through audio modification. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2521–2523.
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 23–25 May 2016; pp. 582–597.
- Kwon, H.; Kim, Y.; Park, K.W.; Yoon, H.; Choi, D. Advanced ensemble adversarial example on unknown deep neural network classifiers. IEICE Trans. Inf. Syst. **2018**, 101, 2485–2500.
- He, W.; Wei, J.; Chen, X.; Carlini, N.; Song, D. Adversarial example defense: Ensembles of weak defenses are not strong. In Proceedings of the 11th USENIX Workshop on Offensive Technologies (WOOT 17), Vancouver, BC, Canada, 14–15 August 2017.
- Carlini, N.; Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; pp. 3–14.
- Goodfellow, I.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv **2016**, arXiv:1607.02533.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2574–2582.
- McDaniel, P.; Papernot, N.; Celik, Z.B. Machine learning in adversarial settings. IEEE Secur. Priv. **2016**, 14, 68–72.
- LeCun, Y.; Cortes, C.; Burges, C.J. MNIST Handwritten Digit Database; AT&T Labs: Florham Park, NJ, USA, 2010; Volume 2. Available online: http://yann.Lecun.Com/exdb/mnist (accessed on 5 March 2021).
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv **2017**, arXiv:1708.07747.
- Kwon, H.; Yoon, H.; Choi, D. Restricted evasion attack: Generation of restricted-area adversarial example. IEEE Access **2019**, 7, 60908–60919.
- Kwon, H.; Kim, Y.; Yoon, H.; Choi, D. Random untargeted adversarial example on deep neural network. Symmetry **2018**, 10, 738.
- Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbrucken, Germany, 21–24 March 2016; pp. 372–387.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
- Agarap, A.F. Deep learning using rectified linear units (relu). arXiv **2018**, arXiv:1803.08375.
- Kingma, D.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. **2014**, 15, 1929–1958.
- Athalye, A.; Engstrom, L.; Ilyas, A.; Kwok, K. Synthesizing robust adversarial examples. In Proceedings of the International Conference on Machine Learning, PMLR, Vienna, Austria, 25–31 July 2018; pp. 284–293.
- Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Query-efficient black-box adversarial examples (superceded). arXiv **2017**, arXiv:1712.07113.
- Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57.
- Xiao, C.; Li, B.; Zhu, J.Y.; He, W.; Liu, M.; Song, D. Generating adversarial examples with adversarial networks. arXiv **2018**, arXiv:1801.02610.
- Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 421–436.
- Hamdia, K.M.; Zhuang, X.; Rabczuk, T. An efficient optimization approach for designing machine learning models based on genetic algorithm. Neural Comput. Appl. **2020**, 1–11.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014; pp. 2672–2680.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.

**Figure 1.** Overview of the diversity training scheme: the target model ${M}_{T}$ is trained with adversarial examples obtained from various generation methods using the local model ${M}_{L}$. A, B, C, and D indicate the fast gradient sign method (FGSM), iterative FGSM (I-FGSM), DeepFool, and Carlini and Wagner (CW) methods, respectively.

**Figure 2.**In MNIST, adversarial examples generated by various generation methods for the local model.

**Figure 3.**In Fashion-MNIST, adversarial examples generated by various generation methods for the local model.

**Figure 4.**For MNIST, the attack success rate for adversarial examples generated with the holdout model using the without method, baseline method, and diversity training method.

**Figure 5.**For Fashion-MNIST, the attack success rate for adversarial examples generated with the holdout model using the without method, baseline method, and diversity training method.

**Figure 6.**The accuracy of original test data for the without method, baseline method, and diversity training method in MNIST and Fashion-MNIST.

**Table 1.**Local model architecture for MNIST and Fashion-MNIST. Conv. means convolutional layer. FC indicates fully connected layer.

Layer Type | Shape |
---|---|
Conv. with ReLU [30] | [3, 3, 32] |
Conv. with ReLU | [3, 3, 32] |
Max pooling | [2, 2] |
Conv. with ReLU | [3, 3, 64] |
Conv. with ReLU | [3, 3, 64] |
Max pooling | [2, 2] |
FC with ReLU | [200] |
FC with ReLU | [200] |
Softmax | [10] |

Parameter | MNIST | Fashion-MNIST |
---|---|---|
Optimizer | Adam [31] | Adam |
Learning rate | 0.1 | 0.1 |
Momentum | 0.9 | 0.85 |
Delay rate | - | - |
Dropout [32] | 0.5 | 0.5 |
Batch size | 128 | 128 |
Epochs | 20 | 20 |

**Table 3.**Holdout model architecture for MNIST and Fashion-MNIST. Conv. means convolutional layer. FC indicates fully connected layer.

Layer Type | Shape |
---|---|
Conv. + ReLU | [3, 3, 128] |
Conv. + ReLU | [3, 3, 64] |
Max pooling | [2, 2] |
Conv. + ReLU | [3, 3, 64] |
Conv. + ReLU | [3, 3, 64] |
Max pooling | [2, 2] |
FC + ReLU | [128] |
Softmax | [10] |

**Table 4.**Target model architecture for MNIST and Fashion-MNIST. Conv. means convolutional layer. FC indicates the fully connected layer.

Layer Type | Shape |
---|---|
Conv. with ReLU | [5, 5, 64] |
Conv. with ReLU | [5, 5, 64] |
Max pooling | [2, 2] |
Conv. with ReLU | [5, 5, 64] |
Max pooling | [2, 2] |
FC with ReLU | [128] |
Softmax | [10] |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kwon, H.; Lee, J.
Diversity Adversarial Training against Adversarial Attack on Deep Neural Networks. *Symmetry* **2021**, *13*, 428.
https://doi.org/10.3390/sym13030428
