Adversarial examples are one of the most intriguing topics in modern deep learning. Imperceptible perturbations to the input can fool robust models. In relation to this problem, attack and defense methods are being developed almost on a daily basis. In parallel, efforts are being made to simply pointing out when an input image is an adversarial example. This can help prevent potential issues, as the failure cases are easily recognizable by humans. The proposal in this work is to study how chaos theory methods can help distinguish adversarial examples from regular images. Our work is based on the assumption that deep networks behave as chaotic systems, and adversarial examples are the main manifestation of it (in the sense that a slight input variation produces a totally different output). In our experiments, we show that the Lyapunov exponents (an established measure of chaoticity), which have been recently proposed for classification of adversarial examples, are not robust to image processing transformations that alter image entropy. Furthermore, we show that entropy can complement Lyapunov exponents in such a way that the discriminating power is significantly enhanced. The proposed method achieves 65% to 100% accuracy detecting adversarials with a wide range of attacks (for example: CW, PGD, Spatial, HopSkip) for the MNIST dataset, with similar results when entropy-changing image processing methods (such as Equalization, Speckle and Gaussian noise) are applied. This is also corroborated with two other datasets, Fashion-MNIST and CIFAR 19. These results indicate that classifiers can enhance their robustness against the adversarial phenomenon, being applied in a wide variety of conditions that potentially matches real world cases and also other threatening scenarios.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited