Article

ShuffleDetect: Detecting Adversarial Images against Convolutional Neural Networks

1 Robert Bosch GmbH, 013937 Bucharest, Romania
2 Faculty of Science, Technology and Medicine, University of Luxembourg, L-4364 Esch-sur-Alzette, Luxembourg
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 4068; https://doi.org/10.3390/app13064068
Submission received: 3 February 2023 / Revised: 28 February 2023 / Accepted: 20 March 2023 / Published: 22 March 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Recently, convolutional neural networks (CNNs) have become the main drivers in many image recognition applications. However, they are vulnerable to adversarial attacks, which can lead to disastrous consequences. This paper introduces ShuffleDetect as a new and efficient unsupervised method for the detection of adversarial images against trained convolutional neural networks. Its main feature is to split an input image into non-overlapping patches, then swap the patches according to permutations, and count the number of permutations for which the CNN classifies the unshuffled input image and the shuffled image into different categories. The image is declared adversarial if and only if the proportion of such permutations exceeds a certain threshold value. A series of 8 targeted or untargeted attacks was applied on 10 diverse and state-of-the-art ImageNet-trained CNNs, leading to 9500 relevant clean and adversarial images. We assessed the performance of ShuffleDetect intrinsically and compared it with another detector. Experiments show that ShuffleDetect is an easy-to-implement, very fast, and near memory-free detector that achieves high detection rates and low false positive rates.

1. Introduction

Convolutional neural networks (CNNs) trained on large sets of examples are dominant tools for object recognition [1]. Although CNNs are capable of accurately classifying new images into object categories, they can nevertheless be deceived by adversarial attacks [2], whose strategies generally consist of altering inputs with perturbations that lead to classification errors.
These attacks can be classified in terms of the amount of information that the attackers have at their disposal. Gradient-based attacks (e.g., [3,4,5,6]) require information about the CNN’s architecture and weights. Transfer-based attacks (e.g., [7,8,9]) require less insider knowledge about the CNN but query the CNN for a set of inputs, and the collected information is used to create a substitute model, similar to the targeted CNN. This substitute model is attacked by gradient-based methods, leading to adversarial images that also fool the target CNN. Score-based attacks (see [10]) are even less demanding. They do not have access to the training data, model architecture, or CNN parameters. They only make use of the CNN’s predicted output probabilities for all or a subset of object classes.
Ideally, security issues posed by adversarial attacks are prevented by methods that detect malicious input images, potentially exclude them from further processing by the CNN, and alert the user. Such detectors may be tailor-made for a specific type of attack or applied efficiently to a large variety of attacks. Their performance is assessed by a series of indicators that measure how far their outputs can be trusted, as well as the memory overhead, time, and complexity required to complete their tasks.
These detectors can be classified as supervised and unsupervised. On the one hand, supervised techniques have knowledge of adversarial images, and may attempt to reinforce CNNs by adding adversarial images to the training set [4]. These techniques are particularly effective when attacks are known in advance. On the other hand, unsupervised techniques [11,12,13,14] operate without prior access to adversarial images. Instead, they apply transformations to the input image and analyze the consistency of predictions between the input image and its transformed versions. These techniques operate on the premise that CNNs maintain consistent predictions for clean images.
This paper introduces ShuffleDetect as a new unsupervised method for the detection of adversarial images; it is simple to implement and works efficiently against adversarial images created by a series of 8 different attacks applied to 10 different ImageNet-trained CNNs.
To summarize, given an image I, ShuffleDetect assesses whether I (resized to a square N × N image, if necessary, to fit the CNN’s input size) is adversarial or not for a given CNN C. Firstly, the algorithm extracts the dominating category Dom_C(I) in which C classifies I. Secondly, the algorithm essentially “splits” I into non-overlapping patches of equal size s × s. Thirdly, for each permutation σ in a set of t permutations of these patches, the algorithm creates a shuffled image sh_σ(I, s) and requests from C the dominating category Dom_C(sh_σ(I, s)) in which C classifies the shuffled image. Lastly, the algorithm compares each outcome with Dom_C(I). The detector classifies the input image I as “adversarial” if and only if the proportion of the t permutations σ for which the dominating categories of I and sh_σ(I, s) differ exceeds a certain threshold value R_th.
The remainder of this paper is organized as follows. Section 2 provides an overview of how CNNs perform image classification, defines the attack scenarios and adversarial image requisites, and fixes some concepts and notations used throughout the article. Section 3 is devoted to related works, provides the topography of detection methods, and lists the main evaluation criteria used to assess their performances. The design of the ShuffleDetect method is detailed in Section 4, where the pseudo-code of the  ShuffleDetect C , R t h , t  algorithm is also given explicitly.
To evaluate the reliability of our ShuffleDetect method, we tested it against a large set of adversarial attacks deceiving a significant series of CNNs. Section 5 lists the 10 selected CNNs trained on ImageNet, as well as the reasons for their choices, the 100 clean ancestor images, and the specific scenarios used in our experiments. Section 6 lists the 8 attacks that are considered in this paper, seven of which are “white-box”, while one is “black-box”. Whenever applicable, we performed both the targeted and untargeted versions of the attacks. A total of 15,000 attack runs led to 9580 relevant adversarial images: 2975 adversarial images for the targeted scenario and 6505 adversarial images for the untargeted scenario, as described in Section 7.
Section 8 specifies the parameters used by our detector for images handled by CNNs trained on ImageNet. This section essentially amounts to measuring the outcomes of  ShuffleDetect σ C  individually for each permutation  σ , each CNN  C , each clean image, and each image adversarial for  C , obtained by each attack for each scenario. The results lead to the selection of candidates for the threshold value  R t h . The performance of the detector  ShuffleDetect C , R t h , t  is then assessed in Section 9 against the indicators given in Section 3 for the candidate values of  R t h . Beyond this intrinsic performance assessment, ShuffleDetect is compared with the well-known detector Feature Squeezing in Section 10.
Section 11 summarizes our findings, specifies our recommendations for the values of the parameters relevant to ShuffleDetect, and indicates some directions for future work.
Additional figures, tables, and relevant data are provided in the Appendix, including the original clean images, the permutations used, and individual performances of ShuffleDetect per CNN per attack per scenario.
Algorithms and experiments were implemented using Python 3.8 [15] with NumPy 1.19 [16] and PyTorch 1.9 [17] (including in particular the Adversarial Robustness Toolbox Python library used in Section 6). In addition, we used Maple 2022 to create the permutations used in Section 8 and Section 9. The main computations were performed on nodes using Nvidia Tesla V100 GPUs, which are part of the IRIS HPC Cluster at the University of Luxembourg [18].

2. CNNs and Adversarial Images

A CNN, which is expected to perform image classification, is first trained on a large dataset S of images. Training consists of sorting the given images into a finite set of predefined categories. The categories c_1, …, c_ℓ, their number ℓ, and the images used in the process are associated with S, and are common to any CNN trained on S. The training of a CNN consists of two phases. Firstly, the CNN is given the training images and, for each training image, a vector of length ℓ, where each real-valued component assesses the probability that the training image represents an object in the corresponding category. Secondly, the CNN is challenged against a validation set of images that assesses its ability to sort images accurately. Once trained, a CNN can be exposed to an arbitrary image and perform its classification according to the ℓ categories.
An important, albeit technical, issue involves the sizes of the images. While the sizes of the images of  S  are arbitrary and may vary from one image to another, a CNN handles images of a fixed input size. Therefore, a resizing process is usually necessary to adapt a given image to the input size of the CNN before classification. To simplify the notation, we consider that this resizing process has been performed, and the input size handled by the CNN is square (Section 5 specifies which resizing function is used in the experiments). We also often identify image  I  with its resized version, which fits the input size of the CNN.
Image classification and label values. Concretely, given an input image I, the trained CNN produces a classification output vector
o_I = (o_I[1], …, o_I[ℓ]),
where 0 ≤ o_I[i] ≤ 1 for 1 ≤ i ≤ ℓ, and ∑_{i=1}^{ℓ} o_I[i] = 1. Each component o_I[i] defines the c_i-label value measuring the probability that image I belongs to the category c_i. Consequently, the CNN classifies image I as belonging to the category c_k if k = argmax_{1 ≤ i ≤ ℓ}(o_I[i]). One denotes (c_k, o_I[k]) this outcome, and Dom_C(I) = c_k the dominating category in which C classifies I. The higher the label value o_I[k], the higher the confidence that I represents an object of the category c_k.
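The argmax extraction above can be sketched as follows; the category names and probability values are illustrative, not taken from the paper.

```python
# Hypothetical sketch: extracting the dominating category Dom_C(I) from a
# CNN's classification output vector o_I, here a plain list of probabilities
# summing to 1. Categories and values below are made up for illustration.

def dominating_category(o_I, categories):
    """Return (c_k, o_I[k]) where k = argmax_i o_I[i]."""
    k = max(range(len(o_I)), key=lambda i: o_I[i])
    return categories[k], o_I[k]

categories = ["cat", "dog", "frog"]
o_I = [0.15, 0.70, 0.15]                 # components sum to 1
label, confidence = dominating_category(o_I, categories)
```

In a real pipeline, `o_I` would be the softmax output of the CNN for the (resized) input image.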
Adversarial image requisites. Assume that we are given  C  a trained CNN,  c a  a category among the possible categories, and  A  an image classified by  C  as belonging to  c a , with  τ a  its  c a -label value.
For any attack scenario that we consider in this paper (namely the target or the untargeted scenario, as made precise below), we assume that the attack aims at creating a new adversarial image D(A), which remains so close to the clean ancestor image A that a human would not be able to distinguish between D(A) and A. The quantity ϵ(A, D(A)), which bounds the maximum amplitude allowed for the modification of each individual pixel value of A to obtain D(A), numerically assesses this human perception.
In the untargeted scenario, C is only required to classify the adversarial image D(A) as belonging to any category c ≠ c_a. In the target scenario, one selects, a priori, a target category c_t ≠ c_a. One would expect the adversarial image D(A) to be classified by C as belonging to the target category c_t, without any requirements on the c_t-label value beyond it being strictly dominant among all label values (this coincides with the concept of a good enough adversarial image introduced in [19]; see [19] for variants of the target scenario involving τ-strong adversarial images).
Throughout the remainder of this article, any attack leading to the creation of adversarial images will be referred to as  a t k .

3. Related Works and Evaluation Criteria

As pointed out in the Introduction (Section 1), addressing the security issues posed by adversarial attacks often requires some warning that an attack is indeed taking place. The role of detectors is key in this process because their principal role is to decide whether an image is clean or not. Such detectors can be categorized into two groups: supervised and unsupervised detectors (see [20]).
Supervised detectors are designed and trained with images known to be adversarial and obtained from one or more attacks. In contrast, unsupervised detectors require no prior access to adversarial images and are, therefore, not limited to any particular type of attack. This suggests that unsupervised methods, which are more resource-efficient because they do not require any training for new attacks, may be more robust against new adversarial attacks than supervised methods.
Numerous detection methods from both categories have been introduced (some of which aim at detecting adversarial images for ImageNet-trained CNNs). One can mention the following four detection methods referred to in [20]: the supervised LID [21], the unsupervised NIC [22], ANR [14], and FS [13].
The supervised Local intrinsic dimensionality (LID) method extracts intermediate layer activations from the CNN when fed with either clean or adversarial inputs. At each layer, the activations stemming from the image (clean or adversarial) and the activations stemming from a limited number of clean neighbors of the image are used to compute the local intrinsic dimensionality. The authors of [21] found that adversarial images tend to have higher local intrinsic dimensionality values. This property is exploited using the extracted values as features to train a binary classifier that declares an image as clean or adversarial.
The network invariant approach (NIC) is an unsupervised method that declares an image to be adversarial if it is out-of-distribution, and clean if it is in distribution. This notion refers to the distribution observed for the ImageNet training set, which consists of only clean images, for each CNN layer activation. For a given image, one obtains a collection of layer-level declarations, indicating whether the image is in distribution or not for that particular layer. The detector’s final declaration is an aggregation of all the layer-level declarations.
The adaptive noise reduction (ANR) algorithm is an unsupervised method that uses scalar quantization and smoothing spatial image filters to squeeze input images. The detector compares the categories predicted by the CNN for an image and for its squeezed version. If these categories are not identical, the image is considered to be adversarial.
The feature squeezing (FS) algorithm is an unsupervised method that applies depth reduction to an image color bit, a median image filter for local smoothing, and a variant of the Gaussian kernel for non-local spatial smoothing, leading to a squeezed image. The detector compares the output vectors predicted by the CNN for squeezed and unsqueezed images. The  L 1  distance between the two vectors is measured, and if it exceeds a certain threshold, the image is considered adversarial.
Remark. Ideally, we would compare ShuffleDetect with well-known detectors, among which NIC, LID, ANR, and FS are introduced above. However, our attempt to do so faced several highly challenging issues, among which the following: the code of most of these detectors is not available; the claimed performances concern CNNs different from ours (Inception V3 trained on ImageNet, for instance) or CNNs trained on datasets other than ImageNet (such as CIFAR10 or MNIST, which also implies that these CNNs use images of smaller sizes than ours); the attacks used are not systematically and clearly documented; and the definitions of the performance indicators vary from one paper to another. A thorough comparison would require implementing all relevant alternative detectors essentially from scratch and challenging them under the same conditions as ShuffleDetect. We do not undertake this complete task here and keep it for future work. Nevertheless, we provide in Section 10 a limited comparison between ShuffleDetect and FS.
Evaluation criteria. In the present paper, the performance of the detector is evaluated with the following indicators [20]:
  • Detection rate (DR) represents the percentage of adversarial images that are correctly identified as such by the detector.
  • False positive rate (FPR) represents the percentage of clean ancestor images that are identified as adversarial by the detector.
  • Complexity refers to the time required to train a supervised detector.
  • Overhead refers to the overall memory and computation resources necessary to use the detector (supervised or not). It depends on the number of parameters and size of the architecture of the detector, when applicable.
  • Inference time latency is the amount of time required by the detector to run on an image. If the method is supervised, the inference time latency does not take into account the time needed to train the detector (this part is already taken into account in the Complexity measurements).
  • Precision, Recall, and F1 scores used to quantify the detection performance are defined by the following formulae:
    Precision = TP / (TP + FP)
    Recall = TP / (TP + FN)
    F1 = (2 × Recall × Precision) / (Recall + Precision)
    where TP (true positive) is the number of correctly detected adversarial images, FN (false negative) is the number of adversarial images that escaped the detector, and FP (false positive) is the number of clean images declared adversarial by the detector. These formulae are pertinent whenever the number of clean images is equal to the number of adversarial images created by a given attack for a given CNN. This aspect is taken into account in Section 9.
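These three scores can be computed directly from the confusion counts; the following minimal sketch mirrors the formulae above (the TP/FP/FN values used in the test are illustrative, not results from the paper).

```python
# Precision, Recall, and F1 from the detector's confusion counts:
#   tp: adversarial images correctly detected
#   fp: clean images wrongly declared adversarial
#   fn: adversarial images that escaped the detector

def detection_scores(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1
```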

4. ShuffleDetect

The general goal of the shuffling process is to interchange different parts of an image. We noticed in [23] that if one shuffles a clean image, CNNs usually classify the shuffled image into the same category as the unshuffled clean image. We also noticed that the situation differs for an adversarial image: CNNs usually tend to classify the shuffled adversarial image into a category that is no longer the same as that of the unshuffled adversarial image, at least for images created by the two attacks of [23] (which are considered again in Section 6).
These findings, valid for the two attacks, led to the detection method exposed below, which is based on the assumption that shuffling affects the adversarial noise more than it affects the image’s original components, whatever the attack.
Shuffling an image. One is given an image I of fixed (square) size n × n fitting the CNN’s input size, and an integer s such that patches of size s × s create a partition, in the mathematical meaning of the term, or a grid, in the more visual meaning of the term, of I. This latter condition requires that s divides n, since the number of patches is the integer N_s = (n/s)². It is convenient in practice to label the patch P_{i,j}, positioned in the i-th column and j-th row of the grid, as P_k, where k = (n/s)(i − 1) + j for 1 ≤ i, j ≤ n/s (see Table A2 in Appendix B for the example actually used in our experiments).
The set of possible scrambles of an image of size n × n is essentially parametrized by the symmetric group S_{N_s} of permutations of N_s letters, since S_{N_s} operates on the set of N_s patches. Indeed, a permutation σ ∈ S_{N_s} can be represented as a finite product of cycles, each of the form (k_1, k_2, …, k_M), these cycles having pairwise disjoint supports. Each cycle symbolizes that the M patches P_{k_1}, P_{k_2}, …, P_{k_M}, associated with k_1, k_2, …, k_M, respectively, are rotated in a circular way: P_{k_1} takes the position of P_{k_2}, and so on, until P_{k_M} takes the position of P_{k_1}.
The group  S N s  is of order  N s ! , and is non-trivial provided s is a strict divisor of n, which we assume from now on.
Given  σ S N s , one denotes by  s h σ ( I , s )  the image obtained from  I  by swapping its patches according to  σ . Both the unshuffled image  I  and the shuffled image  s h σ ( I , s )  are given to the CNN for classification. Figure 1 illustrates the process with the partition of an image into 4 patches  P 1 , P 2 , P 3 , P 4 : The permutation  σ S 4 , selected among the altogether  4 ! = 24  elements of  S 4 , is defined as the product of two cycles of length 2, which actually amounts to interchanging the patches on the diagonals.
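The patch-splitting and swapping just described can be sketched as follows for a grayscale image stored as an n × n nested list. As an assumption for brevity, the permutation σ is encoded as a plain list `sigma` sending the patch at grid cell k to grid cell `sigma[k]`, and patches are indexed row-major (an equivalent relabeling of the paper's column-then-row indexing of P_k).

```python
# A minimal sketch of sh_sigma(I, s): split an n x n image into (n/s)^2
# patches of size s x s and rearrange them according to a permutation.

def shuffle_patches(img, s, sigma):
    n = len(img)
    assert n % s == 0, "s must divide n"
    g = n // s                               # grid is g x g, N_s = g * g
    # extract the patches in row-major grid order
    patches = []
    for r in range(g):
        for c in range(g):
            patches.append([row[c*s:(c+1)*s] for row in img[r*s:(r+1)*s]])
    # place patch k at grid cell sigma[k]
    out = [[0] * n for _ in range(n)]
    for k, patch in enumerate(patches):
        r, c = divmod(sigma[k], g)
        for dr in range(s):
            out[r*s + dr][c*s:(c+1)*s] = patch[dr]
    return out
```

For instance, a 4 × 4 image with s = 2 splits into 4 patches, and `sigma = [3, 2, 1, 0]` interchanges the patches on the diagonals, as in the example of Figure 1.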
ShuffleDetect. To some extent, the global design of the algorithm ShuffleDetect mimics the design of classical probabilistic primality tests (such as those of Fermat, Solovay–Strassen, or Miller–Rabin, see [24], chapter 7 for instance), where the validity of an equation, which should be satisfied if a given integer p is a prime, is checked for a series of rounds until either one has gained confidence (parameterized by the number of rounds) that p is probably a prime or the equation is not satisfied for one of the rounds, in which case one knows that p is not a prime. In our context, the equation, which assesses the detection of whether image  I  is adversarial or not, consists of comparing the dominating categories, given by a given CNN  C , before and after shuffling, for many permutations.
With consistent notations, one round of this detection method for image I is as follows. One picks at random a permutation σ ∈ S_{N_s}, with σ ≠ id. Unless all of the patches of I addressed by σ are identical (which happens, a fortiori, if all N_s patches of I are identical, for instance when I is absolutely monochrome throughout all its pixels), σ ≠ id ensures that I ≠ sh_σ(I, s). The output of ShuffleDetect for I for the specific permutation σ, denoted by ShuffleDetect_σ^C(I), is:
1 if Dom_C(I) ≠ Dom_C(sh_σ(I, s)), and 0 if Dom_C(I) = Dom_C(sh_σ(I, s)).
The image I is said to be σ-adversarial if ShuffleDetect_σ^C(I) = 1, and σ-clean if ShuffleDetect_σ^C(I) = 0.
For the full ShuffleDetect algorithm, written as ShuffleDetect_{C, R_th, t}(I) for the considered CNN C and image I, one chooses a fixed number t ∈ [1, N_s!] of rounds. For obvious practical reasons, t should remain relatively small, in particular far smaller than N_s!. Then one selects at random t pairwise distinct permutations σ_1, …, σ_t ∈ S_{N_s}, with σ_r ≠ id for all 1 ≤ r ≤ t. One performs the successive t rounds ShuffleDetect_{σ_1}^C(I), …, ShuffleDetect_{σ_t}^C(I).
The threshold ratio R_th is fixed as a percentage at will. For any number t of permutations, the threshold ratio defines the integer s_th = t · R_th, which is the number of permutations such that R_th ≈ s_th / t.
The algorithm ShuffleDetect_{C, R_th, t} declares image I:
  • as “adversarial” for C if I is σ-adversarial for at least s_th of the t permutations σ_1, …, σ_t,
  • and as “clean” otherwise.
In more algorithmic terms,  ShuffleDetect C , R t h , t  on image  I  works as described in the pseudo-code Algorithm 1. The user decides on the CNN  C , the degree of trust  R t h , and the number of permutations t that index the rounds of the loop composing Steps (7) to (13). Once these parameters are chosen, Steps (1), (2), (3) are essentially setups, defined by the choices made for the parameters  C , R t h , t , while Steps (4), (5), (6) are essentially an initializing phase, depending only on  I  and  C . The choice of  R t h  clearly determines the values of the indicators assessing the performance of the ShuffleDetect method (see Section 8 for a discussion on this issue and a recommended value).
Algorithm 1 ShuffleDetect_{C, R_th, t}(I) pseudo-code
  1: Compute and store t permutations σ_1, …, σ_t
  2: Select the size s for the patches
  3: Compute the integer s_th = t · R_th
  4: From C, obtain the classification output vector o_I
  5: Extract Dom_C(I)
  6: Set N = 0
  7: For i from 1 to t, run ShuffleDetect_{σ_i}^C(I) as follows:
  8:   Create sh_{σ_i}(I, s)
  9:   From C, obtain the classification output vector o_{sh_{σ_i}(I, s)}
 10:   Extract Dom_C(sh_{σ_i}(I, s))
 11:   Compare Dom_C(I) and Dom_C(sh_{σ_i}(I, s))
 12:   Output 0 if they match, and 1 if they do not; in the latter case, set N := N + 1
 13: end
 14: Output “adversarial” if N ≥ s_th, and “clean” otherwise
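The pseudo-code above can be condensed into a short executable sketch, with the CNN C abstracted as a `classify` function returning a dominating category and the shuffling step abstracted as a `shuffle` function (any implementation of sh_σ(I, s) will do). Rounding t · R_th up to an integer is our assumption here; the paper only specifies that s_th is an integer.

```python
import math

# A minimal, self-contained sketch of Algorithm 1. `classify`, `shuffle`,
# and `sigmas` are caller-supplied stand-ins for the CNN C, the patch
# shuffler sh_sigma(I, s), and the t chosen permutations, respectively.

def shuffle_detect(img, classify, shuffle, sigmas, s, r_th):
    """Return 'adversarial' or 'clean' following ShuffleDetect_{C, R_th, t}."""
    t = len(sigmas)
    s_th = math.ceil(t * r_th)          # threshold count of disagreeing rounds
    dom = classify(img)                 # Dom_C(I)
    n_disagree = 0
    for sigma in sigmas:
        if classify(shuffle(img, s, sigma)) != dom:
            n_disagree += 1             # this round declared I sigma-adversarial
    return "adversarial" if n_disagree >= s_th else "clean"
```

Note that only dominating categories are compared, never their label values, in line with Remark (1) below.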
Remarks. (1) Note that the process of comparing dominant categories does not require a precise assessment of their actual label values. Even in the case where an image is considered  σ -clean for a given permutation  σ , it is likely that, although the same category dominates both in the unshuffled and shuffled versions of the image, its label values differ strongly between both images.
(2) Although there is some flexibility a priori in setting the value of parameter s at will, there are choices that turn out to be more appropriate for a given CNN’s input size (see Section 8 for the choice of s and its rationale for the experiments performed in this paper).
(3) When assessing many images of the same size, even if one fixes the number (t) of rounds once and for all, which is convenient in practice, there is still some flexibility in when to select the permutations. One option is to “reset” the random choice of t permutations for each image to be tested. Another option is to proceed to the choice of the permutations at the same time as one chooses the value t, so that both t and the set of t random permutations  σ 1 , , σ t  are decided once for all images to test. There are pros and cons for both options, the former being (slightly) more time-consuming and (slightly) more memory-consuming but less biased, the latter saving time, allowing for an easier comparison and reproduction of the experiments, but providing a possible security leak because an attacker may ultimately guess what the t-selected permutations are and adapt to them accordingly. See Section 8 for the choices made in our experiments.
(4) Although there are theoretical measures and bounds of the proportion of composite numbers declared probably primes after t rounds of a probabilistic test, there is no such thing regarding the proportion of adversarial images that are declared clean after t rounds of ShuffleDetect. Therefore, for the time being, our choice of parameters is purely experimental.
(5) One can generalize the ShuffleDetect method thanks to the group of symmetries that preserve the square, namely the (non-abelian) dihedral group  D 8  of order 8. Indeed, with consistent notations, and since each patch is a square, one could add to the action of a cycle  ( k 1 , k 2 , , k M )  of a permutation a randomly chosen sequence of elements  γ k 1 , γ k 2 , , γ k M D 8 , which will act on the respective corresponding patches as well. We do not further explore this direction here, and stick to the exposed design of ShuffleDetect, which actually amounts to taking the identity for all symmetries  γ k j D 8 .

5. The CNNs, the Scenarios, the Ancestor Images

The selection of CNNs used in our experiments followed three criteria involving practicality, stability, and comparability. First, we required the availability of the pre-trained versions of the CNNs in the PyTorch [17] library. Moreover, we required the CNNs to have stable architectures. Finally, to allow comparisons, despite their diversity in terms of architecture (number of layers, number of parameters, etc.), we required all CNNs to have the same image input size, and for this input size to be square (note that this latter requirement is fulfilled by most CNNs in general).
This led us to select the following 10 well-known CNNs, trained on ImageNet [25], and with an input size of  224 × 224 , namely  C 1 =  VGG16 [26],  C 2 =  VGG19 [26],  C 3 =  ResNet50 [27],  C 4 =  ResNet101 [27] and  C 5 =  ResNet152 [27],  C 6 =  DenseNet121 [28],  C 7 =  DenseNet169 [28],  C 8 =  DenseNet201 [28],  C 9 =  MobileNet [29], and  C 10 =  MNASNet [30].
Then, from the 1000 categories of ImageNet, we picked at random 10 ancestor classes and 10 corresponding target classes, as shown in Table 1.
For each of the 10 ancestor classes (1 ≤ p ≤ 10), we randomly selected 10 (1 ≤ q ≤ 10) ancestor images A_q^p from the ImageNet validation set, classified as belonging to c_a^p by the 10 CNNs. Whenever necessary, these ancestor images were resized to the CNNs’ common input size 224 × 224 thanks to the bilinear interpolation function [31]. Figure A1 and Table A1 in Appendix A present the 100 ancestor images A_q^p and their original sizes.
Starting with these 100 ancestor images, for each of the 10 CNNs listed above, the attacks described in Section 6 were aimed at creating adversarial images either for the target scenario (c_a^p, c_t^p) of Table 1 (all CNNs produced negligible c_t^p-label values for the ancestors as a starting point) or for the untargeted scenario (in which case it does not matter which category c ≠ c_a^p becomes dominant).

6. The 8 Attacks

This section presents the main features of the attacks employed in this paper and provides the chosen values for their parameters. Except for the EA attack, all attacks were applied using the Adversarial Robustness Toolbox (ART) [32], which is a Python library that includes several attack methods. ART functions and parameters used are specified in italics.

6.1. EA

The EA [19] is a black-box evolutionary algorithm-based attack that creates an initial population consisting of copies of the ancestor X and modifies their pixels over generations. The goal of the EA is encoded in its fitness function fit(Ind) = o_{Ind}[c_t], where Ind is a population individual and o_{Ind}[c_t] is the individual’s c_t probability given by the CNN. The population size is set to 40, the magnitude by which a pixel can be mutated in one generation is α = 1/255, the maximum mutation magnitude is ϵ = 8/255, and the maximum number of generations is N = 10,000. We ran both the targeted and untargeted versions of this attack. In the targeted case, for all CNNs, the threshold that dictates the adversarial image’s minimum c_t probability was set to meet the good enough requirements of [19].

6.2. FGSM

FGSM [4], a white-box attack, is a one-step algorithm that calculates the gradient of the loss function J(X, y) with respect to the input X to find the direction in which to modify X. In its untargeted version, the adversarial image is
X^adv = X + ϵ · sign(∇_X J(X, c_a)),
while in its targeted version, it is
X^adv = X − ϵ · sign(∇_X J(X, c_t)).
In the above equations, ϵ is the perturbation size, defined in the implementation by eps = 2/255, and ∇ is the gradient operator, as used in [4]. We use the FastGradientMethod function with the default value eps_step = 0.01. We ran with both targeted = True and targeted = False, corresponding to targeted and untargeted attacks, respectively.
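The untargeted update above can be sketched on a flat list of pixel values in [0, 1]; `grad` stands for the gradient ∇_X J(X, c_a), which in practice comes from backpropagation through the CNN, and the clipping to the valid pixel range is a standard implementation detail not spelled out in the equation.

```python
# A minimal sketch of the untargeted FGSM step: move each pixel by eps in
# the direction that increases the loss, then clip back into [0, 1].

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_untargeted(x, grad, eps):
    return [min(1.0, max(0.0, xi + eps * sign(gi)))
            for xi, gi in zip(x, grad)]
```

The targeted version only flips the sign of the step, descending the loss toward the target category c_t.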

6.3. BIM

BIM [3], a white-box attack, is an iterative version of FGSM. The adversarial image X_0^adv is initialized with X and is gradually updated for a given number of steps N, as follows:
X_{ℓ+1}^adv = Clip_ϵ { X_ℓ^adv + α · sign(∇_X J_C(X_ℓ^adv, c_a)) }
in its untargeted version and
X_{ℓ+1}^adv = Clip_ϵ { X_ℓ^adv − α · sign(∇_X J_C(X_ℓ^adv, c_t)) }
in its targeted version, where α is the step size at each iteration and ϵ (which coincides with the ART parameter eps) is the maximum perturbation magnitude of X^adv = X_N^adv. We use the BasicIterativeMethod function with the default values eps_step = 0.01, max_iter = int(eps × 255 × 1.25), and eps = 1/255. We ran with both targeted = True and targeted = False, corresponding to targeted and untargeted attacks, respectively.
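The iterative update can be sketched as follows; `grad_fn` stands for the CNN's loss gradient at the current iterate, and Clip_ϵ is realized by projecting each pixel back into the ϵ-ball around the original pixel and then into [0, 1].

```python
# A minimal sketch of the untargeted BIM loop on a flat list of pixels.

def sign(v):
    return (v > 0) - (v < 0)

def bim_untargeted(x, grad_fn, eps, alpha, n_iter):
    adv = list(x)                                   # X_0^adv = X
    for _ in range(n_iter):
        # one FGSM-like step of magnitude alpha
        step = [a + alpha * sign(g) for a, g in zip(adv, grad_fn(adv))]
        # Clip_eps: keep each pixel within eps of the original, then in [0, 1]
        adv = [min(1.0, max(0.0, min(xi + eps, max(xi - eps, si))))
               for xi, si in zip(x, step)]
    return adv
```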

6.4. PGD Inf

Reference [33], a white-box attack, is similar to the BIM attack, with the difference that the image at the first attack iteration is not initialized with X, but rather with a random point within an $L_\infty$-ball around X. The distance between X and $X^{adv}$ is measured in the $L_\infty$ norm, and the $\epsilon$ parameter represents the maximum perturbation magnitude. We use the ProjectedGradientDescent function with $norm = inf$ and the default values $eps\_step = 0.01$, $batch\_size = 1$, and $eps = 1/255$. We run with both $targeted = True$ and $targeted = False$, corresponding to targeted and untargeted attacks, respectively.
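The only algorithmic difference from BIM is the random initialization, which can be sketched as (function name ours):

```python
import numpy as np

def pgd_random_start(x, eps, rng=None):
    """Initialize the first PGD iterate at a uniformly random point
    inside the L_inf eps-ball around x; the BIM loop then runs as usual."""
    rng = rng or np.random.default_rng()
    x0 = x + rng.uniform(-eps, eps, size=x.shape)
    return np.clip(x0, 0.0, 1.0)
```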

6.5. PGD $L_1$

Reference [33], a white-box attack, is similar to PGD Inf, with the difference that the $L_\infty$ norm is replaced with $L_1$. We use the ProjectedGradientDescent function with $norm = 1$ and the default values $eps\_step = 4$, $batch\_size = 1$, and $eps = 30$. We run with both $targeted = True$ and $targeted = False$, corresponding to targeted and untargeted attacks, respectively.

6.6. PGD $L_2$

Reference [33], a white-box attack, is similar to PGD Inf, with the difference that the $L_\infty$ norm is replaced with $L_2$. We use the ProjectedGradientDescent function with $norm = 2$ and the default values $eps\_step = 0.1$, $batch\_size = 1$, and $eps = 1$. We run with both $targeted = True$ and $targeted = False$, corresponding to targeted and untargeted attacks, respectively.

6.7. CW Inf

Reference [5], a white-box attack, solves the following optimization problem in its untargeted version:
$$\min_\delta \|\delta\|_\infty + c \cdot g(x), \quad \text{such that } x \in [0,1]^n,$$
where
$$g(x) = \max\left(Z(x)_a - \max_{i \neq a} Z(x)_i,\; 0\right)$$
and $Z(x)$ is the pre-softmax classification output, with a the index of the ancestor class. The measure used to evaluate the difference between the ancestor X and the adversarial $X^{adv}$ is $L_\infty$. We use the CarliniLInfMethod function with the default values of the parameters. We run with both $targeted = True$ and $targeted = False$, corresponding to targeted and untargeted attacks, respectively.
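The term $g(x)$ can be computed directly from the logits; a minimal sketch (function name ours):

```python
import numpy as np

def cw_untargeted_g(z, a):
    """g(x) for the untargeted CW objective: positive while the ancestor
    class `a` still has the largest pre-softmax score, and exactly zero
    once some other class overtakes it (i.e. the classification flips)."""
    others = np.delete(z, a)
    return max(float(z[a] - others.max()), 0.0)
```

Because $g(x)$ vanishes as soon as the label flips, the remaining $\|\delta\|_\infty$ term then drives the optimizer toward the smallest perturbation that keeps the flip.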

6.8. DeepFool

Reference [34], a white-box attack, is an untargeted attack that calculates the minimum perturbation $\delta^*$ with which to modify X such that its classification label changes. For an affine classifier $f(x) = w^T x + b$ with decision boundary $F = \{x : f(x) = 0\}$, this minimum perturbation is $\delta^* = -f(X)\, w / \|w\|_2^2$. In the general case, the attack iteratively solves the linearized optimization problem
$$\arg\min_{\delta_l} \|\delta_l\|_2 \quad \text{such that } f(x_l) + \nabla f(x_l)^T \delta_l = 0.$$
The algorithm stops immediately after the label changes, and $X^{adv} = X + \delta^*$. We use the DeepFool function with the default values of the parameters.
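For the affine binary case above, the closed-form step can be sketched as (function name ours):

```python
import numpy as np

def deepfool_linear_step(x, w, b):
    """Minimal L2 perturbation moving x onto the boundary of the affine
    classifier f(x) = w.x + b, i.e. delta* = -f(x) * w / ||w||_2^2."""
    f = float(np.dot(w, x) + b)
    return -f * w / float(np.dot(w, w))
```

The perturbation is the orthogonal projection of x onto the hyperplane $F$; for a non-linear classifier, DeepFool applies this step repeatedly to the local linearization until the label flips.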
The seven attacks EA, FGSM, BIM, PGD Inf, PGD L1, PGD L2, and CW Inf are used in both the targeted and the untargeted scenarios, while DeepFool is used only in the untargeted scenario. Apart from the black-box EA attack, all of them are white-box attacks.

7. The Adversarial Images Obtained by the 8 Attacks

For each CNN $C_k$ introduced in Section 5, we run each of the 8 attacks $atk$ given in Section 6, in the untargeted scenario and, whenever applicable, in the targeted scenario, on the (potentially resized) 100 ancestor images $A_q^p$ referred to in Section 5 and pictured in Figure A1, Appendix A. A successful attack in the untargeted scenario results in the image $D_k^{atk,untarget}(A_q^p)$, adversarial for $C_k$ in that specific scenario. Mutatis mutandis, a successful targeted attack results in an adversarial image $D_k^{atk,target}(A_q^p)$.
Since there are 8 untargeted and 7 targeted attacks, this amounts to $(8 + 7)$ attacks $\times$ 10 CNNs $\times$ 10 ancestor classes $\times$ 10 images per ancestor class. Out of these 15,000 attack runs, 9746 were successful. More precisely, 6727 of the 8000 untargeted attacks were successful, and 3019 of the 7000 targeted attempts were successful, as detailed in Table 2.
Clearly, the number of successful attacks should be statistically relevant. We define this condition as satisfied if an attack succeeds in at least $35\%$ of the cases for a given CNN (this value appears to be a reasonable trade-off based on the experiments leading to Table 2). This leads us to disregard the targeted attacks performed by FGSM and CW Inf for all CNNs, as well as all attacks (untargeted and targeted) performed by PGD L1, and the untargeted attack of FGSM on $C_1$. The remaining 9480 statistically relevant successful attacks are listed in Table 3. The corresponding 2975 adversarial images for the targeted scenario and 6505 adversarial images for the untargeted scenario are considered in the subsequent experiments.

8. Parameters and Experiments Performed on $\mathrm{ShuffleDetect}_\sigma^C$

In what follows, we essentially consider $\mathrm{ShuffleDetect}_\sigma^C$ for each individual permutation $\sigma$, each CNN $C$, each clean image, and each image that is adversarial against $C$. Altogether, the method is applied to all (resized if necessary) ancestors $A_q^p$ on the one hand, and to all 2975 successful adversarial images $D_k^{atk,target}(A_q^p)$ and all 6505 successful adversarial images $D_k^{atk,untarget}(A_q^p)$ that compose Table 3 on the other hand. The ShuffleDetect parameters are specified below.
Size of patches, number of permutations, and  Ψ C ( t , s , Ω ) . Firstly, we selected  s = 56  based on experiments detailed in [23]. Indeed, Table 4, extracted from [23], shows the average outcome for  2 × 437  adversarial images obtained from 84 common ancestor images (of size  224 × 224 ) for the same 10 CNNs considered here. The shuffling process was performed in [23] only for one permutation  σ  per value of s (hence  t = 1  in this case) to obtain the values of Table 4.
The experiments performed in [23] show that among the four considered possibilities, $s = 56$ provides an optimal balance between the proportion of clean ancestors that are correctly declared "clean" ($67.6\%$) and the proportion of adversarial images that are correctly declared "adversarial" ($99.6\%$ for the adversarial images created by the EA, and $85.9\%$ for those created by BIM) by our method. The choice of $s = 56$ being made, there are consequently $4^2 = 16$ patches of size $56 \times 56$, and the symmetric group $S_{16}$ has $16! > 2 \times 10^{13}$ different permutations.
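The patch-shuffling step $sh_\sigma(I, s)$ can be sketched as follows (a minimal numpy version; names are ours):

```python
import numpy as np

def shuffle_patches(img, s, perm):
    """Split `img` (H x W x C, with H and W divisible by s) into
    non-overlapping s x s patches in row-major order, and rearrange them
    so that destination patch i receives source patch perm[i]."""
    h, w = img.shape[:2]
    cols = w // s
    patches = [img[r*s:(r+1)*s, c*s:(c+1)*s]
               for r in range(h // s) for c in range(cols)]
    out = np.empty_like(img)
    for i, src in enumerate(perm):
        r, c = divmod(i, cols)
        out[r*s:(r+1)*s, c*s:(c+1)*s] = patches[src]
    return out
```

For the $224 \times 224$ images and $s = 56$ used here, this yields 16 patches permuted by an element of $S_{16}$; the pixel content of the image is preserved, only its spatial arrangement changes.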
Secondly, to keep the computations manageable, we selected at random 100 permutations (they are given in Table A3 in Appendix B). For  1 t 100 , one defines  P t  as the set of the first t permutations. One has  P t 1 P t 2  if  t 1 t 2 . In particular, the first permutation  σ 1  is common to all sets  P t , the second permutation  σ 2  is common to all sets  P t  for  t 2 , etc.
Given a set  Ω  of images and  C  a CNN, one defines the function  Ψ C ( t , s , Ω )  as the proportion of images in  Ω  declared  σ -adversarial for s out of the first t permutations. In other words, for t and s such that  1 s t 100 , one has:
$$\Psi_C(t, s, \Omega) = \frac{\#\{I \in \Omega \text{ such that } \mathrm{ShuffleDetect}_\sigma^C(I) = 1 \text{ for at least } s \text{ permutations } \sigma \in P_t\}}{\#\{I \in \Omega\}}.$$
Geometrically,  C  and  Ω  being fixed,  Ψ C ( t , s , Ω )  defines a discrete surface. For a given  C , this function provides an assessment of the FPR value of the ShuffleDetect method for  C  by choosing for  Ω  a set of images known to be clean. This function also provides an assessment of the DR value by choosing for  Ω  a set of images known to be adversarial for  C .
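Given the per-permutation Boolean outcomes, $\Psi_C(t, s, \Omega)$ reduces to a counting operation; a minimal sketch, assuming the outcomes are stored in a Boolean matrix (names ours):

```python
import numpy as np

def psi(det, t, s):
    """det[i, j] is True iff image i is declared sigma-adversarial for
    the (j+1)-th permutation; columns are ordered so that det[:, :t]
    matches the nested set P_t of the first t permutations."""
    counts = det[:, :t].sum(axis=1)        # detections per image in P_t
    return float((counts >= s).mean())     # proportion reaching s
```

The nesting $P_{t_1} \subseteq P_{t_2}$ for $t_1 \leq t_2$ is what allows a single matrix, computed once for all 100 permutations, to yield $\Psi$ for every pair $(t, s)$.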
As already stated in Section 4, the actual values of FPR and DR are determined by the choice of the threshold ratio  R t h . Its value is fixed as a consequence of the experiments performed on clean images, on the one hand, and adversarial images, on the other hand.
Assessment of the clean images. In the first step, we take for $\Omega$ the set $\Omega_{clean}$ of 100 clean ancestors $A_q^p$ represented in Figure A1 (Appendix A). For $C = C_k$, one computes $\mathrm{ShuffleDetect}_\sigma^{C_k}(A_q^p)$ for all 100 permutations $\sigma \in P_{100}$. This leads to the 10 histograms represented in Figure A2 in Appendix C. An example of the outcomes is illustrated in Figure 2a for $C_1$ = VGG16, where each vertical bar gives the number of clean images classified as adversarial for the number of permutations indicated on the x-axis, out of the 100 possible permutations. The notations $[a, b]$ and $(a, b]$ indicate that the number of permutations is between a and b, with both included in the former case and a excluded in the latter case. The average outcome (mutatis mutandis) over the 10 CNNs is shown in Figure 2b.
Over the 100 clean images, on average, over the 10 CNNs, an image is declared adversarial by  34.7 %  of the 100 considered permutations, as indicated in Figure 2b. Table 5 shows that this percentage varies between  26.4 %  (for  C 7 ) and  44.4 %  (for  C 9 ).
The last row of Table 5 provides an estimate of a realistic FPR, which serves as a lower bound, or an "incompressible" FPR, whatever the choice of the parameter $R_{th}$. On average over the 10 CNNs, $\Psi_C(100, 91, \Omega_{clean}) = 15\%$, and its value varies between $8\%$ (for $C_8$) and $23\%$ (for $C_1$ and $C_9$). In this context, we noticed that some individual clean images were declared adversarial by $\mathrm{ShuffleDetect}_\sigma^C$ for all CNNs $C$ by a large number (and, therefore, proportion) of permutations $\sigma$. Indeed, the 7 clean images $A_3^9$, $A_6^5$, $A_6^9$, $A_9^1$, $A_9^5$, $A_9^6$, and $A_9^7$ are declared adversarial for all CNNs by more than 91 permutations. Whatever the threshold ratio $R_{th}$, these 7 images contribute substantially to the FPR of $\mathrm{ShuffleDetect}^{C, R_{th}, t}$ for each specific CNN individually, and a fortiori to the FPR average taken over all CNNs.
Assessment of the adversarial images. In the second step, for $C = C_k$, we take for $\Omega$ the set $\Omega_{adv,k}^{scenario}$ of adversarial images $D_k^{atk,scenario}(A_q^p)$ as per Table 3. One computes the values of $\mathrm{ShuffleDetect}_\sigma^{C_k}$ for these images for all 100 permutations $\sigma \in P_{100}$, and one defines
$$s_{min}^{scenario}(k, atk) = \max_{1 \leq i \leq 100}\{i \;;\; \Psi_{C_k}(100, i, \Omega_{adv,k}^{scenario}) = M\%\},$$
which captures the largest index guaranteeing that $\Psi_{C_k}(100, s_{min}^{scenario}(k, atk), \Omega_{adv,k}^{scenario}) = M\%$, where $M\%$ is the maximum possible detection rate of adversarial images created by the given attack on the given CNN. Clearly $M\% = 100\%$ if there are no adversarial images $D_k^{atk,scenario}(A_q^p)$ for which
$$\mathrm{Dom}_{C_k}(D_k^{atk,scenario}(A_q^p)) = \mathrm{Dom}_{C_k}(sh_\sigma(D_k^{atk,scenario}(A_q^p), 56))$$
for all 100 permutations $\sigma$. While this eventuality does not occur in our experiments with the targeted scenario, we shall see that it does occur in the untargeted scenario for many attacks and many CNNs.
We proceed first with the targeted scenario. This leads to the 40 histograms (obtained from the 4 targeted attacks performed on the 10 CNNs) represented in Figure A3, Figure A4, Figure A5 and Figure A6 in Appendix C. Figure 3 shows the average behavior, over the 10 CNNs, of the ShuffleDetect method for all 100 permutations on the adversarial images created by each targeted attack. Note that the y-axis indicates the average number of adversarial images for a given attack, all CNNs taken together, as derived from Table 3. For example, there are $580/10 = 58$ adversarial images on average for the targeted scenario for BIM.
Table 6 details the outcomes for each CNN individually and provides an assessment of the detection rate DR of the ShuffleDetect method for the detection of adversarial images for the targeted scenario created by each of the four attacks. More precisely, for each $C_k$ and each targeted attack $atk$, the table first provides the average percentage s of the 100 permutations $\sigma$ for which an adversarial image $D_k^{atk,target}(A_q^p)$ is declared $\sigma$-adversarial. Consistently with the assessments performed on the clean images, Table 6 then provides $\Psi_{C_k}(100, 91, \Omega_{adv,k}^{target})$. Note that the number of elements of $\Omega_{adv,k}^{target}$ used to compute these values is equal to the corresponding value $\beta$ from Table 3. Finally, Table 6 provides the values of $M\%$ and of $s_{min}^{target}(k, atk)$.
We proceed with the untargeted scenario. This leads to the 69 histograms (derived from the 6 untargeted attacks performed on the 10 CNNs, and from the FGSM untargeted attack performed on 9 CNNs) represented in Figure A7, Figure A8, Figure A9, Figure A10, Figure A11, Figure A12 and Figure A13 in Appendix C. Figure 4 provides the average behavior, over the 10 (or 9 in the case of FGSM) CNNs, of the ShuffleDetect method for all 100 permutations on the adversarial images created by each untargeted attack. Note that the y-axis indicates the average number of adversarial images for a given attack, all relevant CNNs taken together, derived from Table 3. For instance, there are $750/9 = 83.3$ adversarial images on average for the untargeted scenario for FGSM.
With notation consistent with the targeted case already handled, Table 7 details the outcome for each CNN individually and provides an assessment of the DR of the ShuffleDetect method applied to adversarial images for the untargeted scenario created by each of the seven attacks. The number of elements of $\Omega_{adv,k}^{untarget}$ used to compute the values of $\Psi_{C_k}(100, 91, \Omega_{adv,k}^{untarget})$ is equal to the corresponding value $\alpha$ from Table 3.

9. Intrinsic Performance of $\mathrm{ShuffleDetect}^{C, R_{th}, t}$

Indicators and performance. Since  ShuffleDetect C , R t h , t  is an unsupervised detector, the complexity criterion does not apply. The values of the remaining indicators do depend on the number t of permutations to be considered, and most of them are determined by the selected threshold ratio  R t h .
To assess the inference time latency, the creation of $t = 100$ permutations (Step 1 of Algorithm 1) took $0.064$ s using the command SymmetricGroup(16) and 100 calls of the command RandomElement in Maple 2022 (this timing could certainly be optimized). Running $\mathrm{ShuffleDetect}_\sigma^C$ for a single permutation $\sigma$ (Steps 8 to 12 of Algorithm 1) takes $0.0784$ s/permutation on average (over the 100 considered permutations, over all 10 CNNs, and over 100 random clean images, which suffice to assess this average). The time required by Steps 2, 3, 6, and 14 (all of which are outside the loop over the t permutations) is negligible. The overall inference time latency of ShuffleDetect performed on an image with $t = 100$ permutations amounts to $0.064 + (1 + 100) \times 0.0784 \approx 7.982$ s/image on average. On the one hand, the prediction process performed by the CNN (once in Steps 4 and 5 for the unshuffled image, and $t = 100$ times in Steps 9 and 10 for the shuffled images) accounts for $98.02\%$ of this time consumption. On the other hand, the shuffling process (Step 8, called $t = 100$ times) accounts for $1.98\%$. See Appendix B, Table A4 for detailed information on all CNNs.
One should take into account two positive aspects of the proposed detector. Firstly, the $0.064$ s consumed by the creation of the 100 permutations can be amortized over several calls of the detector for different input images. Secondly, the tasks performed iteratively (Steps 7 to 13) can easily be distributed; thus, apart from the time required for the creation of the 100 permutations, the algorithm would require only $0.0784$ s, plus some minor time for gathering the distributed information and performing the final computation and comparison.
The Overhead is very limited (and can be optimized). Algorithm 1 shows that the "permanent storage" is limited to the $t = 100$ permutations expressed as products of cycles as in Table A3, Appendix B (which can simply be looked up if the permutations are computed once for all images to handle, as we do in our experiments), the integer $s_{th}$, and the extracted dominating category $\mathrm{Dom}_C(I)$ (which amounts to a number among the 1000 categories of ImageNet in our case). The "incremental storage" consists of the value 0 or 1 as $\sigma$ progresses through the t permutations, hence (at most) t such Boolean values if one wants to keep the whole information. A memory-saving alternative is to keep only the updated count N as $\sigma$ progresses through the t permutations. The "ephemeral storage" (deleted after each run) is composed of the running images $sh_\sigma(I, s)$ and of $\mathrm{Dom}_C(sh_\sigma(I, s))$. The computational resources are essentially limited to the creation of the t permutations (to be done once at the beginning, as recommended), to $1 + t$ calls to the CNN for the classification of $I$ and of the shuffled images, and to the creation of the (at most) t shuffled images $sh_\sigma(I, s)$. Finding the dominant category, as is necessary once for $I$ and (at most) t times for the shuffled images, amounts to looking for the largest value in the classification output, which is immediate in a set of 1000 values, as is the case here.
Note that in what precedes, we write "at most" several times since one could stop the loop before its natural end. This is the case when, after some rounds, the running count reaches a value such that the remaining rounds, whatever happens, can no longer change the decision: either the threshold ratio $R_{th}$ is already reached, or it can no longer be reached.
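This early-exit logic can be sketched as follows; `is_flipped` is a hypothetical callback standing for one CNN query on a shuffled image, and all names are ours:

```python
import math

def shuffle_detect_early_stop(is_flipped, t, r_th):
    """Run the permutation loop with early exit. `is_flipped(j)` returns
    True iff the CNN classifies the j-th shuffled image differently from
    the unshuffled input. Returns 1 (adversarial) or 0 (clean)."""
    s_th = math.ceil(r_th * t)
    n = 0
    for j in range(t):
        n += is_flipped(j)
        if n >= s_th:
            return 1                     # threshold already reached
        if n + (t - 1 - j) < s_th:
            return 0                     # threshold no longer reachable
    return 0
```

In the best case, a clearly adversarial image is flagged after only $s_{th}$ CNN queries instead of t, and a clearly clean image is released as soon as the remaining permutations cannot make up the deficit.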
The specific value chosen for $R_{th}$ clearly impacts the different indicators of the ShuffleDetect algorithm (foremost FPR and DR). To summarize, the smaller the $R_{th}$, the higher the DR and the FPR. However, our experiments show that a high $R_{th}$ leads to a very good DR and a moderate FPR. A caveat to this statement is that the situation differs according to the "targeted" or "untargeted" nature of the attack, as we shall see now.
For targeted attacks, Table 6 together with Figure A2 led us to consider (for  t = 100  permutations) four choices for the value for  R t h :
  • R t h = 51 %  matches the requirement that most permutations declare an image as adversarial.
  • R t h = 54 %  is motivated by the fact that the smallest  s m i n t a r g e t ( k , a t k )  among the 100 permutations and the four targeted attacks is  54 .
  • $R_{th} = 87\%$ is motivated by the fact that the average of the $s_{min}^{target}(k, atk)$ among the 100 permutations and the four targeted attacks is 87.
  • R t h = 91 %  as a demanding ratio compromise.
For untargeted attacks, Table 7 together with Figure A7, Figure A8, Figure A9, Figure A10, Figure A11, Figure A12 and Figure A13 (Appendix C) show that (for $t = 100$ permutations) using $s_{min}^{untarget}(k, atk)$ is irrelevant for the selection of $R_{th}$. Indeed, $s_{min}^{untarget}(k, atk)$ is usually small. More precisely, $s_{min}^{untarget}(k, atk) \leq 35$ in all cases, and is 9.8 on average, as opposed to what occurs in the targeted scenario, where $s_{min}^{target}(k, atk) \geq 54$ in all cases, and is 87 on average. Therefore, we limit the selection of $R_{th}$ to two values:
  • R t h = 51 %  for the same reason as for the target case.
  • $R_{th} = 91\%$ because it makes sense to keep the same demanding $R_{th}$ value for the detector independently of the attack scenario, hence the same value as for the targeted case.
The values of FP and FPR depend only on the value of $R_{th}$ (since no attack is considered for their computation) and on the CNN. Note that $\mathrm{FPR} = \mathrm{FP}/100$ for $t = 100$ permutations. One writes $\mathrm{FP}_{avg}$ and $\mathrm{FPR}_{avg}$ for their respective average values over the 10 considered CNNs. Table 8 provides the corresponding values for $R_{th} = 51\%$, $54\%$, $87\%$, and $91\%$ (the four values used in the targeted scenario, which include the two values used in the untargeted scenario) for $t = 100$ permutations.
Remark. The number of adversarial images against each CNN, created either by targeted or untargeted attacks, is in all cases strictly less than the number of clean ancestor images from which these attacks started. As mentioned at the end of Section 3, this imbalance should be taken into account to obtain a fair comparison basis and sound values for the indicators (what we measure is the performance of the detector, not of the attack). Therefore, in Table 9, Table 10, Table 11 and Table 12, the clean images selected are those that correspond to the adversarial images obtained from them. For instance, since the EA-targeted attack succeeded in creating "only" 91 images adversarial against $C_1$, we consider only the exact 91 clean images from which these adversarial images were obtained to assess the FP value.
For targeted attacks, Table 9 and Table 10 show (for $t = 100$ permutations) the DR (which coincides with $\Psi_{C_k}(100, 100 \cdot R_{th}, \Omega_{adv,k}^{target})$), TP, FN, precision, recall, and F1 score values per CNN per targeted attack for each of the four selected values of $R_{th}$, as well as their average values.
For untargeted attacks, Table 11 and Table 12 show (for $t = 100$ permutations) the DR (which coincides with $\Psi_{C_k}(100, 100 \cdot R_{th}, \Omega_{adv,k}^{untarget})$), TP, FP, FN, precision, recall, and F1 score values per CNN per untargeted attack for each selected value of $R_{th}$, as well as their average values.
Conclusion for the intrinsic performance of ShuffleDetect. Regarding targeted attacks, Table 9 and Table 10 show that the difference in values of the indicators obtained when  R t h = 0.54  versus  0.51  (respectively,  0.87  versus  0.91 ) remains marginal.
If one knows the nature of “targeted” and “untargeted” attacks, and/or if one knows which specific attack to expect, one can choose the most appropriate threshold ratio value  R t h . However, one rarely has access to this intelligence in practice.
Consequently, it makes sense to consider a priori only  R t h = 0.51  or  R t h = 0.91  whatever the attack (hence, a fortiori whatever its targeted or untargeted nature). Table 13 provides the values of all indicators per CNN on average over the 4 targeted attacks. It also provides the worst  F 1  value as a proxy of the worst case for our detector. Similar information for untargeted attacks is given in Table 14.
Table 13 and Table 14 show that our detector achieves very good results. For instance (when both $R_{th} = 0.51$ and $0.91$ are considered), for the two highly significant indicators, namely the detection rate and the $F_1$ value:
  • For all targeted attacks, the detection rate is at least 98.55, the $F_1$ value is at least 0.76, and the average values of these indicators are 99.67 and 0.87, respectively.
  • For untargeted attacks, the detection rate is at least 51.23, the $F_1$ value is at least 0.60, and the average values are 76.77 and 0.75, respectively.
Recall that a defender does not know the nature (targeted or untargeted) of the attack he is exposed to. For the sake of completeness, Table 15 provides the values of all indicators per CNN, averaged over all attacks, targeted and untargeted, for the two values $R_{th} = 0.51$ and $0.91$.
Now, as a defender, it is wise to consider the values of the indicators given in Table 14 for untargeted attacks, since one is then "on the safe side" for targeted attacks as well.
A remaining issue is whether one can achieve results such as those given in Table 14 (allowing one to be "on the safe side", as pointed out above), say for DR, precision, recall, and F1, with fewer than 100 permutations. For instance, for $C_1$, can one achieve (DR, precision, recall, F1) = (77.5, 0.7, 0.8, 0.7) with fewer than 100 permutations? Doing so would clearly speed up the process (see Table A4 to assess the time savings per spared permutation).
We performed a series of tests with increasing numbers of permutations, aimed at reaching indicator values such as those of Table 14. More precisely, we fixed the indicator values to those of Table 14, added permutations one by one (following their numbering, as given in Table A3), and stopped when those fixed indicator values were achieved. Note that the minimal number of permutations with which it makes sense, from a mathematical point of view, to start this process depends on the value of $R_{th}$.
For  R t h = 0.51 , it makes sense to consider  t 3 , while for  R t h = 0.91 , it makes sense to consider  t 12 . Therefore, for each CNN  C , for each attack  a t k , targeted or untargeted accordingly, starting with the first 3 permutations for  R t h = 0.51  (respectively, the first 12 permutations for  R t h = 0.91 ), we added the subsequent permutations whenever appropriate, and stopped the process when the minimal number  t o p t i m a l , C , a t k  of permutations fulfilling the above criteria was achieved. Table 16 provides the outcome of this experiment.
Finally, which value of $R_{th}$ do we prefer? We considered the DR indicator as the most significant one for making this choice. With this indicator, we concluded that the "democratic" value $R_{th} = 0.51$ is an appropriate and reasonable choice for most applications of ShuffleDetect. In terms of the number of permutations, one can use the number $t_{optimal,C} = \max_{atk}\{t_{optimal,C,atk}\}$, defined for $R_{th} = 0.51$ in Table 16, according to the CNN $C$ considered. This value is adequate for the 4 relevant targeted attacks and 7 untargeted attacks studied here. However, especially in view of the low time and memory price to pay for additional permutations, we consider that a defender who uses 100 permutations is better prepared against unknown attacks. Refinements in this regard remain possible, especially since ShuffleDetect operates on the defender's side: the defender knows which CNNs to protect and can adapt accordingly.

10. Performance Comparison of ShuffleDetect and Feature Squeezer (FS)

To assess the extrinsic performance of ShuffleDetect, we compared it with the FS detector [13]. We selected this detector since, like ShuffleDetect, it is an unsupervised detector that also presents no significant complexity issues. The comparison between ShuffleDetect and FS is performed only on the detection rate, and not on the other indicators mentioned in Section 3, since, for instance, the value of FPR in [13] is determined by the behavior of FS relative to another detector (MagNet [35]); hence, it is not an intrinsic value, unlike what we compute in Section 9 for ShuffleDetect. The comparisons of the detectors are therefore performed on the 9480 images of Table 3, adversarial against the 10 considered CNNs (Section 7).
In our experiments, we used multiple squeezers for FS as suggested in [13] (we keep their notations in what follows). The  L 1  norm is used to measure the difference between the prediction by the CNN of the input image and the prediction of the squeezed input image:
$$score_{x, x_{squeezed}} = \|g(x) - g(x_{squeezed})\|_{L_1},$$
where x is the input image and  g ( x )  is the classification vector of the CNN according to the different categories. Multiple feature squeezers are combined in the FS detector. In practice, one computes the maximum distance:
$$score_{joint}(x) = \max\left(score_{x, x_{sq1}},\; score_{x, x_{sq2}},\; score_{x, x_{sq3}}\right).$$
The values of the parameters of the FS squeezers are chosen as the optimal values recommended in [13]:
  • Color depth reduction: the image color depth is decreased to 5 bits.
  • Median smoothing: the filter size is set to  2 × 2 .
  • Non-local means: the search window size is set to  11 × 11 , the patch size is set to  3 × 3 , and the filter strength is set to 4.
  • The threshold is set to  1.2128 .
The image is declared by the FS detector as adversarial if  s c o r e j o i n t ( x ) 1.2128  and is declared clean otherwise.
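The joint score and the resulting decision can be sketched as follows (function names ours; the threshold is the recommended 1.2128):

```python
import numpy as np

def fs_score_joint(g_x, squeezed_preds):
    """Maximum L1 distance between the CNN's prediction vector for the
    input and its predictions for each squeezed version of the input."""
    g_x = np.asarray(g_x)
    return max(float(np.abs(g_x - np.asarray(g_s)).sum())
               for g_s in squeezed_preds)

def fs_is_adversarial(g_x, squeezed_preds, threshold=1.2128):
    """FS decision rule: adversarial iff the joint score reaches the threshold."""
    return fs_score_joint(g_x, squeezed_preds) >= threshold
```

The intuition is that squeezing barely changes the prediction on a clean image, while it tends to destroy the carefully crafted perturbation of an adversarial one, producing a large prediction shift.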
For  ShuffleDetect C , R t h , t , consistently with the outcomes of Section 9, we set  t = 100 R t h = 0.51  for all CNNs in the experiments (note that the size  s × s  of the patches is kept to  56 × 56  for the images considered here).
Table 17 compares the detection rates of ShuffleDetect and FS on the 9480 adversarial images referred to above. For the 2975 adversarial images of the targeted scenario, both detectors demonstrate high success rates. Although FS achieves a DR over $92\%$, it is outperformed by ShuffleDetect, which achieves $100\%$ in all cases. For the 6505 adversarial images of the untargeted scenario, the success rates of both detectors decline. FS achieves slightly better results than ShuffleDetect for DeepFool and CW Inf, and significantly better results for PGD Inf, BIM, and PGD L2; it is outperformed by ShuffleDetect, slightly for FGSM and very significantly for EA. Regarding the overall performance (see the last row of Table 17), ShuffleDetect achieves a higher success rate than FS on average (both scenarios and all CNNs considered).

11. Conclusions

In this paper, we presented ShuffleDetect as a new unsupervised method for the detection of adversarial images against trained CNNs. We provided a complete design and recommendations for the selection of the values of its parameters. Given a CNN and an image, potentially resized to fit the CNN's input size, the steps that compose this new detection method are fairly simple. During the initiation phase, the dominant category in which the CNN sorts the input image is recorded, the image is split into non-overlapping patches (of fixed size, depending on the CNN's input size), and a fixed set of appropriate permutations is selected at random. Then a loop is performed over the successive permutations: the patches are shuffled with the running permutation, and the dominant category in which the CNN sorts the shuffled image is compared with the outcome for the unshuffled image, leading to a Boolean value. Finally, one assesses the proportion of permutations for which the CNN classifies the shuffled image into a different category than the unshuffled input image. ShuffleDetect declares the image adversarial if this proportion exceeds a threshold value $R_{th}$ and declares the image clean otherwise.
Our extensive experiments with 10 diverse and state-of-the-art CNNs, trained on ImageNet with images usually resized to  224 × 224 , with 8 attacks (one ’black-box’ and seven ’white-box’), and with 9500 clean and adversarial images for the targeted or untargeted scenario led us to recommend a size of  56 × 56  for the altogether 16 patches, and the “democratic” value  R t h = 0.51 . Although running ShuffleDetect with 100 permutations is perfectly feasible and could be considered a safe option, a smaller number of permutations, varying between 12 and 68 according to the considered CNN, may also lead to a satisfactory detection rate. Additionally, if the defender has more information about the type of attack expected, the number of permutations can be fine-tuned accordingly. This said, and since this type of knowledge occurs rarely, we recommend taking at least 100 permutations.
Apart from the time needed for the creation of a fixed set of permutations, namely $0.064$ s to obtain 100 permutations, which can be performed once and for all (at least in our implementation), the intrinsic performance of ShuffleDetect on our computers shows an inference time latency of $0.0784$ s per image per permutation. The algorithm can easily be parallelized, so that the time required for a complete run can be significantly less than the $7.982$ s/image needed when this task is not distributed. Of this time, the classification of the images by the CNN accounts for $98.02\%$, while the shuffling process itself requires only $1.98\%$. The overhead is very limited, since its main part, the "permanent storage", is essentially required only for the 100 permutations (the storage per permutation is a sequence of groups of distinct integers between 1 and 16 in our case) and for the dominating category of the unshuffled image. Among the main indicators of a detector, the most relevant ones are the false positive rate and the detection rate. With $R_{th} = 0.51$ and 100 permutations, averaged over all considered CNNs and images, ShuffleDetect achieves an average FPR of $32.7\%$, an average DR of $100\%$ for the adversarial images obtained by targeted attacks, and of $87.79\%$ for those obtained by untargeted attacks. Our study also provides the scores of the other relevant indicators, i.e., TP, FP, FN, precision, recall, and F1.
A thorough comparison with other detectors requires overcoming the difficult challenges outlined in Section 3 so that sound comparisons can be made under the same conditions. We performed this task for one detector, namely FS, and showed that, on average, ShuffleDetect achieves better detection rates than FS.
Independent of the outcome of any comparison with other detectors, ShuffleDetect could serve as a first line of defense before more sophisticated, and more time- and overhead-consuming, detection methods are applied.
As a potential area for future research, it would be worthwhile to assess the effectiveness of ShuffleDetect on CNNs trained on the CIFAR-10 and MNIST datasets. This could help determine the optimal patch size as a ratio of the image size. It would also be beneficial to explore the optimal patch size for images containing very small or very large objects.

Author Contributions

All authors: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank Uli Sorger for the fruitful discussions on this subject.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. The 100 ancestor images  A q p  used in the experiments.  A q p , pictured in the qth row and pth column ( 1 p , q 10 ), is randomly chosen from the ImageNet validation set of the ancestor category  c a q  specified on the left of the qth row.
Table A1. The original sizes  ( h × w )  of the 100 ancestor images  A q p  before resizing with the bilinear interpolation function.
Ancestor images  A q p  and their original sizes  ( h × w ) :

| c_a^q | q | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 | p = 7 | p = 8 | p = 9 | p = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| abacus | 1 | (206, 250) | (960, 1280) | (262, 275) | (598, 300) | (377, 500) | (501, 344) | (375, 500) | (448, 500) | (500, 500) | (150, 200) |
| acorn | 2 | (374, 500) | (500, 469) | (375, 500) | (500, 375) | (500, 500) | (500, 500) | (375, 500) | (374, 500) | (461, 500) | (333, 500) |
| baseball | 3 | (398, 543) | (240, 239) | (180, 240) | (333, 500) | (262, 350) | (310, 310) | (404, 500) | (344, 500) | (375, 500) | (285, 380) |
| broom | 4 | (113, 160) | (150, 150) | (333, 500) | (500, 333) | (497, 750) | (336, 500) | (188, 250) | (375, 500) | (334, 500) | (419, 640) |
| brown bear | 5 | (500, 333) | (286, 490) | (360, 480) | (298, 298) | (413, 550) | (366, 500) | (400, 400) | (348, 500) | (346, 500) | (640, 480) |
| canoe | 6 | (500, 332) | (450, 600) | (500, 375) | (375, 500) | (406, 613) | (600, 400) | (1067, 1600) | (333, 500) | (1536, 2048) | (375, 500) |
| hippopotamus | 7 | (375, 500) | (1200, 1600) | (333, 500) | (450, 291) | (525, 525) | (375, 500) | (500, 457) | (424, 475) | (500, 449) | (339, 500) |
| llama | 8 | (500, 333) | (618, 468) | (500, 447) | (253, 380) | (500, 333) | (333, 500) | (375, 500) | (375, 500) | (290, 345) | (375, 500) |
| maraca | 9 | (375, 500) | (375, 500) | (470, 627) | (151, 220) | (250, 510) | (375, 500) | (99, 104) | (375, 500) | (375, 500) | (500, 375) |
| mountain bike | 10 | (375, 500) | (500, 375) | (375, 500) | (333, 500) | (500, 375) | (300, 402) | (375, 500) | (446, 500) | (375, 500) | (500, 333) |

Appendix B

Table A2. For a  224 × 224  image, grid of its 16 patches of size  56 × 56 , represented as  P i , j  (left grid), and as  P 1 , , P 16  (right grid) as used by the permutations  σ k .
Left grid:

P_{1,1} P_{1,2} P_{1,3} P_{1,4}
P_{2,1} P_{2,2} P_{2,3} P_{2,4}
P_{3,1} P_{3,2} P_{3,3} P_{3,4}
P_{4,1} P_{4,2} P_{4,3} P_{4,4}

Right grid:

P_1  P_2  P_3  P_4
P_5  P_6  P_7  P_8
P_9  P_10 P_11 P_12
P_13 P_14 P_15 P_16
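Each permutation of Table A3 below is written as a product of cycles on the 16 patch indices; a small helper makes this notation explicit by expanding it into a patch-to-patch mapping. The parsing code, and the convention that each element of a cycle is sent to its successor, are our assumptions for illustration.

```python
import re

def cycles_to_mapping(cycle_str, n=16):
    """Expand a permutation given as a product of cycles, e.g.
    "(1,13,4,6)(2,14,10)(3,11,12,5,9)(7,16,15,8)" (round 1 of Table A3),
    into an explicit mapping on the patch indices 1..n. Indices missing
    from the cycles are fixed points."""
    mapping = {i: i for i in range(1, n + 1)}
    for cycle in re.findall(r"\(([^)]*)\)", cycle_str):
        elems = [int(x) for x in cycle.split(",")]
        for a, b in zip(elems, elems[1:] + elems[:1]):  # a is sent to its successor b
            mapping[a] = b
    return mapping

sigma_1 = cycles_to_mapping("(1,13,4,6)(2,14,10)(3,11,12,5,9)(7,16,15,8)")
# The cycle (1,13,4,6) sends patch 1 to 13, 13 to 4, 4 to 6, and 6 back to 1.
```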
Table A3. The list of the  t = 100  random permutations  σ r  for  1 r 100 . Each  σ r  is represented as a product of cycles operating on the 16 patches of a  224 × 224  image.
t = 100
Round  r Permutation  σ r Round  r Permutation  σ r
1(1,13,4,6)(2,14,10)(3,11,12,5,9)(7,16,15,8)51(1,9,10,8,13,6,2,15,5,14,4,7,11)(3,16)
2(1,8,9,2,6,11,15,12)(3,4,5,10,7)(14,16)52(1,11,15,4,10,2,3,5,12,9,13,8,16,7)(6,14)
3(1,2,10,12)(3,11,16,15)(4,6,13,7,14,9,5)53(1,12,3,7,2,5,6,15,16,14,4,10)(8,13)(9,11)
4(1,12,7,6,9,5,13,16)(2,4,14,10)(3,11,15,8)54(1,2,13,12,7)(3,6,4,8)(9,10)(11,15)
5(3,5,14,16,7,4,12,6,13,11)(8,15,9,10)55(1,6,3)(2,12,14,4,15,7)(5,9)(8,13,10,11,16)
6(1,7,4,9,2,5)(3,14)(6,16,8,13,10,15,12,11)56(1,8,3,4,13,10,9,16,5,2,7,11,12)(6,15)
7(1,7,15,5,10,4,2,13,14,12,6,9)(3,11,16,8)57(1,5,12,9,15,4,7,11,2,10,6,16,8,3,14)
8(1,2,5)(3,11,16,10,12,9,7,6,15,4,13,8)58(1,12)(2,6,13,10,7,8)(3,15,5,16,11,9)(4,14)
9(1,7,15,8,13,5,9,11)(2,12)(3,16,14,4)(6,10)59(2,11,13,6)(3,12,10,7,16,4)(5,8)(9,15,14)
10(1,16,8,15,4,5,6)(3,14,13)(7,12)(9,10,11)60(1,13,15,8,4,14,5,9,12,7,10,11,16,3,6,2)
11(1,8,10,13,9,6,2)(3,12,5,15,14,4,7)(11,16)61(1,2,14,6,10,7)(4,5,12,9,8,16,11)
12(1,4,14,16,5,6,11,13,15,9)(2,12,10,3,8)62(1,11)(2,7,4,5,10,12,14,9)(3,6,8,13,15,16)
13(1,5,14,13,10)(2,6,7,4,8)(3,15,11,9,16,12)63(1,9,14,15,11,5,8,10,2,4,3,12,16,13,6,7)
14(1,16,9,4,3,2,5,7,6,11,12,10,8,15,14,13)64(2,11,12,10,5)(3,16,14,13,4,8,6,15)(7,9)
15(1,16,5,13,8,6)(2,15,14,10,11,12,9,3,7,4)65(1,5,12,3,2,6,11,13,16,14)(7,10,15)
16(1,14,12,2,13,7,10,8,3,15,11,6,16,4)66(1,15,7,11,12,2)(3,10,4,14,5,8,6,16,9,13)
17(1,2,5,13)(4,11,8,10,16,14,15)(6,7)(9,12)67(1,4)(2,6,15,11,12,16)(3,5,14)(7,8)(9,10)
18(1,12,13,16,3,8,10,2,11,14,7,4,15,6)68(1,13,6,14,2,10,5,15,11,9,4,12,8,3,7,16)
19(1,8,4,16,3,13,6,7,15)(2,12)(5,14,11)(9,10)69(1,9,15,6,8,10,11,2,12,16,4,13,14,7)(3,5)
20(1,14,15,5)(2,4,12,13)(3,8,16,11)(6,7)(9,10)70(1,2,6,8,3)(4,12)(5,7,13,10,15)(9,11,14,16)
21(1,2,6)(3,8,14,10,13,12)(5,9,16,15)71(2,10,16,6,13,3,14,12)(4,5,8,15,7,9,11)
22(1,3,11,14,2,10)(4,12,6,7,15,5,16,9)(8,13)72(1,8,13,7)(2,10,15,6,14,9,3,16,5,11)
23(1,4,11,9,14,7,2,5,3,8,6)(10,15)73(1,5,16,12,6,2,8,11,4,10,9,13,14)(7,15)
24(1,12,8,7)(2,4,5,14,6,9,3,13,16)(10,15,11)74(1,6,16,13,11,5,14,4,3,9,15,2,8,10,7)
25(1,14,6,4,10,16,5,13,12,2,8,15,9,3,7,11)75(1,12,9,6,15,4,5,14,2,3)(7,11,16)(8,13)
26(1,15,5)(2,13,4,9,16,8,11,12,3,6,10,14,7)76(1,11,15,16,9)(2,12,5,3,8,13,6)(7,10,14)
27(1,10,8,12,14,7,2)(3,13,11,5,6)(4,15)77(2,8,15,10,16,9,12,7,4)(3,5,11,14)(6,13)
28(1,8,11,7,16,5,6,12,4,14)(2,15,3,10,9)78(1,15,11,8,16,5,2,12,3,13,6,10,14)(4,9)
29(1,5,3,12,15,11)(2,14,10,6,8,9,7,13,16)79(1,16,13,5,3,10,6,4,15,2,11)(7,14,9)
30(1,10,8,15)(3,9,7,4,12)(5,11,6)80(2,10,13,11,15,6,5,8,3,16,4,7,9,14)
31(1,3,10,6,9,7,16,2,8)(4,5,14)(12,13)81(1,5,15,2,16,10,9,14,11,4,12,6,3)
32(1,2,11,16,10,15)(3,14,6,5,9)(4,7,13,12)82(1,13,5,10,2,15,11,4,16,7,12,9,14,3,8,6)
33(1,16,14,13,10,7,12,3,6,11,9,5)(2,4)83(1,14,8,9,15,3,5,2,7,10,4,12,6,11,16)
34(1,6,14)(2,10,3,15,9,12,11,4,16,13,8,7)84(1,15,8,9,4,3,16,6,7,14,5,12,2,10,13,11)
35(1,2)(3,15,16,13,12,4,5,6,7,9,10,11)(8,14)85(1,9,3,13)(4,11,15,12)(5,16,6,10,7,8,14)
36(3,12,8,6,7,10,16,5,15,13)(4,9)86(1,9,15,8,13,14,6,11,7)(2,10,12,3,16,5,4)
37(1,3,10,4,15,8,16,12,13,7,14,9,2)87(1,3,9,7,6,4,5)(10,12,14,16,11)(13,15)
38(1,4,16)(2,9,5,13,10,14,3,11,8,7)(12,15)88(1,9,8,12,14,5,10,6,15,4,3)(2,11,16,13)
39(1,4,5,2,11,10,12,9,14,15,3,16,13)(7,8)89(1,13,2,9,16)(3,14,11,8,7,15,6)(5,12,10)
40(1,9,8,15,5,10,11,12,4,14,2,3,13,16,6,7)90(1,8,16,2,6,3,10,14,7,13,4,9,12,5,11)
41(1,8,9,11,16,4)(2,13,14,15,7,12)(3,10,5)91(1,5,16,6,10,3,11,15,9,12,14,8,7,2,4)
42(1,9,4,15,14,5)(2,11,12,3,6,10,13)(7,8,16)92(1,10,16,11,4,8,5,12,13,3,14,9)(2,7,15)
43(2,8,14,9,7,16,12,10,13,6,15,3,11,4,5)93(1,4,2,13,6,9,14,3,10,8,16,11,15,7)
44(1,11,12,14,2,13,8,9,3,10,6)(5,15,16)94(1,16,15,3,9,2,6,7,11,4)(5,8,14,12)(10,13)
45(1,3,16,4)(2,5,6,15,7,11)(8,9,10)(13,14)95(3,10,13,15,12,9,14,16,7,5,4,6,8,11)
46(1,6,12,10,8,15,5)(2,4,16,3,13)(7,14,9,11)96(1,6,15,4,5,3,16,13,9,10,12,2,8,7)
47(1,7,14,3,4,16,8,13)(2,9)(5,12,6,11)(10,15)97(1,14,2,7,3,13,8,16,5,11,15,4,6,10,9,12)
48(1,8,14,6,11,13,3,10,12,16,2,15,5,7,4)98(1,13,3,16)(2,11,6,14,5)(4,9,10,7,12,8,15)
49(1,8,12,10,11,6,9,15)(3,13,4,7)(5,16,14)99(1,6,13,5,12,15,2)(3,14,8)(7,11)(9,16,10)
50(1,5,16,2,11,4,13,15,12,3,8,7,14,6)100(1,12,11,8,2,3)(4,14,16,7,10,6)(5,13,15)
Table A4. The duration, in s, of each of the main steps of Algorithm 1 for each CNN.
All durations are per permutation, in s:

| C | Steps 8–12 (total) | Shuffling (Step 8) | Predicting (Steps 9–10) | Shuff% | Pred% |
|---|---|---|---|---|---|
| C_1 | 0.0955 | 0.0014 | 0.0941 | 1.483 | 98.511 |
| C_2 | 0.1176 | 0.0014 | 0.1162 | 1.228 | 98.767 |
| C_3 | 0.0578 | 0.0016 | 0.0563 | 2.727 | 97.264 |
| C_4 | 0.0933 | 0.0016 | 0.0917 | 1.664 | 98.330 |
| C_5 | 0.1262 | 0.0016 | 0.1246 | 1.233 | 98.763 |
| C_6 | 0.0660 | 0.0016 | 0.0644 | 2.449 | 97.542 |
| C_7 | 0.0844 | 0.0017 | 0.0828 | 1.975 | 98.018 |
| C_8 | 0.1017 | 0.0017 | 0.1000 | 1.678 | 98.316 |
| C_9 | 0.0213 | 0.0015 | 0.0198 | 6.930 | 93.047 |
| C_10 | 0.0196 | 0.0015 | 0.0182 | 7.563 | 92.411 |
| AVG | 0.0784 | 0.0015 | 0.0770 | 1.978 | 98.015 |

Appendix C

Figure A2. Shuffling test results of 100 clean (ancestor) images on  C = C k  for  1 k 10  over 100 permutations.
Figure A3. ShuffleDetect results for adversarial images generated by the EA-targeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A4. ShuffleDetect results for adversarial images generated by the BIM-targeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A5. ShuffleDetect results for adversarial images generated by the PGD Inf-targeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A6. ShuffleDetect results for adversarial images generated by the PGD L2-targeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A7. ShuffleDetect results for adversarial images generated by the EA-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A8. ShuffleDetect results for adversarial images generated by the FGSM-untargeted attack on  C = C k  for  2 k 10  over 100 permutations.
Figure A9. ShuffleDetect results for adversarial images generated by the BIM-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A10. ShuffleDetect results for adversarial images generated by the PGD Inf-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A11. ShuffleDetect results for adversarial images generated by the PGD L2-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A12. ShuffleDetect results for adversarial images generated by the CW Inf-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.
Figure A13. ShuffleDetect results for adversarial images generated by the DeepFool-untargeted attack on  C = C k  for  1 k 10  over 100 permutations.

Figure 1. A  224 × 224  image  I  is divided into 4 patches of size  112 × 112  (top picture). The patches are shuffled according to the permutation  σ = ( 1 , 4 ) ( 2 , 3 ) S 4 , leading to  s h σ ( I , 112 )  (bottom picture). Both  I  and  s h σ ( I , 112 )  are sent to the CNN to extract the output vector.
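The shuffling operation of Figure 1 can be reproduced numerically. In the sketch below, a tiny 4 × 4 array with four constant 2 × 2 patches plays the role of the 224 × 224 image; the helper name `sh` and the chosen direction of the mapping (patch i is moved to the cell of patch σ(i)) are our assumptions.

```python
import numpy as np

def sh(image, sigma, s):
    """sh_sigma(I, s): split `image` into s x s patches P_1..P_n (row-major)
    and place patch P_i in the cell originally occupied by P_sigma(i)."""
    h, w = image.shape[:2]
    rows, cols = h // s, w // s
    out = np.empty_like(image)
    for i in range(rows * cols):
        j = sigma.get(i + 1, i + 1) - 1                 # destination cell of P_{i+1}
        r_src, c_src = divmod(i, cols)
        r_dst, c_dst = divmod(j, cols)
        out[r_dst * s:(r_dst + 1) * s, c_dst * s:(c_dst + 1) * s] = \
            image[r_src * s:(r_src + 1) * s, c_src * s:(c_src + 1) * s]
    return out

# sigma = (1,4)(2,3) in S_4, as in Figure 1, applied to a 4-patch toy image
sigma = {1: 4, 4: 1, 2: 3, 3: 2}
toy = np.kron(np.array([[1, 2], [3, 4]]), np.ones((2, 2), dtype=int))
shuffled = sh(toy, sigma, 2)  # the two pairs of diagonally opposite patches swap
```

Since σ = (1,4)(2,3) is an involution, applying `sh` twice with the same σ recovers the original image.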
Figure 2. ShuffleDetect performed on 100 clean (ancestor) images with 100 permutations.
Figure 3. Average outcome over the 10 CNNs of ShuffleDetect performed with 100 permutations on the adversarial images created for the targeted scenario by EA, BIM, PGD Inf, and PGD L2.
Figure 4. Average outcome over all relevant CNNs of ShuffleDetect performed with 100 permutations on the adversarial images created for the untargeted scenario by EA, FGSM, BIM, PGD Inf, PGD L2, CW Inf, and DeepFool.
Table 1. For  1 p 10 , the second column lists the ancestor category  c a p  and its ordinal  1 a p 1000  among the categories of ImageNet. Mutatis mutandis in the third column with the target category  c t p  and ordinal  t p .
| p | (c_a^p, a_p) | (c_t^p, t_p) |
|---|---|---|
| 1 | (abacus, 398) | (bannister, 421) |
| 2 | (acorn, 988) | (rhinoceros beetle, 306) |
| 3 | (baseball, 429) | (ladle, 618) |
| 4 | (broom, 462) | (dingo, 273) |
| 5 | (brown bear, 294) | (pirate, 724) |
| 6 | (canoe, 472) | (saluki, 176) |
| 7 | (hippopotamus, 344) | (trifle, 927) |
| 8 | (llama, 355) | (agama, 42) |
| 9 | (maraca, 641) | (conch, 112) |
| 10 | (mountain bike, 671) | (strainer, 828) |
Table 2. For each attack  a t k  and each  C k , the number of successful runs performed on the 100 ancestors is presented. The results are given as a pair  ( α , β )  or as a single value  α , depending on whether  a t k  is performed for both the untargeted and the targeted scenarios (assessed, respectively, by the values of  α  and  β  in the pair) or only the untargeted scenario (assessed by the single value  α ). The successful attacks on each individual CNN are given in the last row with obvious notations.
| atk | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EA | (96, 91) | (97, 90) | (99, 88) | (98, 84) | (98, 79) | (99, 85) | (97, 89) | (98, 86) | (99, 97) | (99, 97) | (980, 886) |
| FGSM | (11, 0) | (83, 3) | (82, 2) | (81, 3) | (80, 2) | (86, 3) | (77, 4) | (80, 2) | (92, 13) | (89, 9) | (761, 41) |
| BIM | (93, 43) | (91, 38) | (96, 57) | (96, 52) | (93, 46) | (98, 56) | (95, 73) | (95, 50) | (95, 87) | (94, 78) | (946, 580) |
| PGD Inf | (93, 49) | (91, 38) | (96, 57) | (96, 52) | (93, 46) | (98, 56) | (95, 73) | (95, 50) | (95, 87) | (94, 78) | (946, 586) |
| PGD L1 | (26, 0) | (28, 1) | (19, 0) | (17, 1) | (12, 0) | (19, 1) | (15, 0) | (10, 0) | (33, 0) | (32, 0) | (211, 3) |
| PGD L2 | (93, 90) | (91, 88) | (97, 94) | (99, 92) | (96, 89) | (99, 94) | (98, 94) | (97, 86) | (96, 97) | (95, 99) | (961, 923) |
| CW Inf | (94, 0) | (95, 0) | (98, 0) | (99, 0) | (98, 0) | (100, 0) | (97, 0) | (99, 0) | (93, 0) | (94, 0) | (967, 0) |
| DeepFool | 94 | 97 | 92 | 97 | 94 | 100 | 94 | 97 | 96 | 94 | 955 |
| Total | (600, 273) | (673, 258) | (679, 298) | (683, 284) | (664, 262) | (699, 295) | (668, 333) | (671, 274) | (699, 381) | (691, 361) | (6727, 3019) |
Table 3. For each attack  a t k  and each  C k , the number of successful runs performed on the 100 ancestors is presented, keeping only the cases for which at least  35 %  of the runs terminated successfully. The results are given as a pair  ( α , β )  or as a single value  α , depending on whether  a t k  is performed for both the untargeted and the targeted scenarios (assessed, respectively, by the values of  α  and  β  in the pair) or only the untargeted scenario (assessed by the single value  α ). The statistically relevant successful attacks on each individual CNN are given in the last row with obvious notations.
| atk | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EA | (96, 91) | (97, 90) | (99, 88) | (98, 84) | (98, 79) | (99, 85) | (97, 89) | (98, 86) | (99, 97) | (99, 97) | (980, 886) |
| FGSM | NA | 83 | 82 | 81 | 80 | 86 | 77 | 80 | 92 | 89 | 750 |
| BIM | (93, 43) | (91, 38) | (96, 57) | (96, 52) | (93, 46) | (98, 56) | (95, 73) | (95, 50) | (95, 87) | (94, 78) | (946, 580) |
| PGD Inf | (93, 49) | (91, 38) | (96, 57) | (96, 52) | (93, 46) | (98, 56) | (95, 73) | (95, 50) | (95, 87) | (94, 78) | (946, 586) |
| PGD L2 | (93, 90) | (91, 88) | (97, 94) | (99, 92) | (96, 89) | (99, 94) | (98, 94) | (97, 86) | (96, 97) | (95, 99) | (961, 923) |
| CW Inf | 94 | 95 | 98 | 99 | 98 | 100 | 97 | 99 | 93 | 94 | 967 |
| DeepFool | 94 | 97 | 92 | 97 | 94 | 100 | 94 | 97 | 96 | 94 | 955 |
| Total | (563, 273) | (645, 254) | (660, 296) | (666, 280) | (652, 260) | (680, 291) | (653, 329) | (661, 272) | (666, 368) | (659, 352) | (6505, 2975) |
Table 4. Percentages of shuffled images  s h σ ( A q p , s )  (first percentage),  s h σ ( D k E A ( A q p ) , s )  (second percentage), and  s h σ ( D k B I M ( A q p ) , s )  (third percentage) for which the predicted class is c.
| s | Number of patches | c = c_a | c ∉ {c_a, c_t} | c = c_t |
|---|---|---|---|---|
| 16 | 196 | 0.4, 0.1, 0.1 | 99.6, 99.9, 99.9 | 0.0, 0.0, 0.0 |
| 32 | 49 | 18.0, 9.2, 5.3 | 82.0, 90.8, 94.4 | 0.0, 0.0, 0.3 |
| 56 | 16 | 67.6, 39.3, 15.8 | 32.4, 60.3, 70.1 | 0.0, 0.4, 14.1 |
| 112 | 4 | 88.4, 62.3, 22.3 | 11.6, 33.2, 35.9 | 0.0, 4.5, 41.8 |
Table 5. For each  C k , the number (=percentage) of clean ancestors  A q p  declared adversarial for s out of 100 permutations. The first row shows the average number of permutations for which this occurs. The last row, the sum of the two previous ones, provides an estimate of the FPR, which serves as a lower bound for ShuffleDetect per CNN via the assessment of  Ψ C ( 100 , 91 , Ω clean ) .
|  | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| s | 41.6 | 40.2 | 31.7 | 35.9 | 32.4 | 26.8 | 26.4 | 26.6 | 44.4 | 40.5 | 34.7 |
| (95, 100] | 18 | 11 | 8 | 13 | 10 | 8 | 8 | 6 | 16 | 11 | 10.9 |
| (90, 95] | 5 | 2 | 6 | 5 | 2 | 5 | 3 | 2 | 7 | 4 | 4.1 |
| Ψ_C(100, 91, Ω_clean) | 23 | 13 | 14 | 18 | 12 | 13 | 11 | 8 | 23 | 15 | 15 |
Table 6. For each  C k  and each targeted attack  a t k : the percentage s of the 100 permutations  σ  for which the shuffled-by- σ  image of an adversarial image, namely  ShuffleDetect σ C k ( D k a t k , t a r g e t e d ( A q p ) ) , is declared adversarial on average; the assessment of  Ψ C k ( 100 , 91 , Ω a d v , k t a r g e t ) ; the maximum possible detection rate  M % ; and  s m i n t a r g e t ( k , a d v ) .
| Targeted attacks | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EA-targeted | 100 | 100 | 100 | 99.8 | 99.8 | 99.9 | 99.9 | 100 | 100 | 99.9 | 99.9 |
| Ψ_{C_k}(100, 91, Ω_{EA,k}^target) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| M% | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| s_min^target(k, EA) | 100 | 100 | 100 | 93 | 95 | 96 | 98 | 100 | 100 | 96 | 97.8 |
| BIM-targeted | 100 | 99.7 | 99.8 | 99.4 | 99.2 | 99.6 | 99.6 | 99.9 | 99.9 | 99.2 | 99.6 |
| Ψ_{C_k}(100, 91, Ω_{BIM,k}^target) | 100 | 100 | 100 | 98 | 97.8 | 98.2 | 98.6 | 100 | 100 | 98.7 | 99.1 |
| M% | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| s_min^target(k, BIM) | 100 | 92 | 97 | 72 | 64 | 85 | 77 | 99 | 98 | 58 | 84.2 |
| PGD Inf-targeted | 99.9 | 99.7 | 99.8 | 99.4 | 99.2 | 99.6 | 99.6 | 99.9 | 99.9 | 99.2 | 99.6 |
| Ψ_{C_k}(100, 91, Ω_{PGD Inf,k}^target) | 100 | 100 | 100 | 98 | 97.8 | 98.2 | 98.6 | 100 | 100 | 98.7 | 99.1 |
| M% | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| s_min^target(k, PGD Inf) | 99 | 92 | 97 | 72 | 64 | 85 | 77 | 99 | 98 | 58 | 84.1 |
| PGD L2-targeted | 99.7 | 99.6 | 99.8 | 99.6 | 99.5 | 99.6 | 99.5 | 99.7 | 99.9 | 99.2 | 99.6 |
| Ψ_{C_k}(100, 91, Ω_{PGD L2,k}^target) | 100 | 100 | 100 | 98.9 | 98.8 | 97.8 | 98.9 | 98.8 | 100 | 97.9 | 99.1 |
| M% | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| s_min^target(k, PGD L2) | 95 | 92 | 96 | 72 | 62 | 85 | 78 | 87 | 98 | 54 | 81.9 |
Table 7. For each  C k  and each untargeted attack  a t k : the percentage s of the 100 permutations  σ  for which the shuffled-by- σ  image of an adversarial image, namely  ShuffleDetect σ C k ( D k a t k , u n t a r g e t e d ( A q p ) ) , is declared adversarial on average; the assessment of  Ψ C k ( 100 , 91 , Ω a d v , k u n t a r g e t ) ; the maximum possible detection rate  M % ; and  s m i n u n t a r g e t ( k , a d v ) .
| Untargeted attacks | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EA-untargeted | 86.0 | 86.9 | 91.1 | 94.4 | 90.6 | 92.2 | 93.3 | 95.0 | 91.7 | 90.7 | 91.2 |
| Ψ_{C_k}(100, 91, Ω_{EA,k}^untarget) | 69.7 | 64.9 | 78.7 | 83.6 | 78.5 | 77.7 | 85.5 | 86.7 | 85.8 | 78.7 | 79.0 |
| M% | 100 | 100 | 98.9 | 100 | 100 | 100 | 100 | 100 | 98.9 | 98.9 | 99.6 |
s m i n u n t a r g e t ( k , E A ) 21522111243247.5
| FGSM-untargeted | NA | 75.3 | 84.8 | 89.2 | 82.2 | 81.5 | 84.2 | 86.4 | 82.2 | 84.1 | 83.3 |
| Ψ_{C_k}(100, 91, Ω_{FGSM,k}^untarget) | NA | 45.7 | 63.4 | 74.0 | 53.7 | 56.9 | 68.8 | 58.7 | 63.0 | 61.7 | 60.7 |
| M% | NA | 97.5 | 98.7 | 98.7 | 98.8 | 98.7 | 98.7 | 98.9 | 97.7 | 99.7 | 98.6 |
s m i n u n t a r g e t ( k , F G S M ) NA21241116123177.5
| BIM-untargeted | 67.0 | 68.7 | 84.1 | 90.1 | 83.2 | 79.2 | 86.7 | 86.8 | 81.6 | 77.8 | 80.6 |
| Ψ_{C_k}(100, 91, Ω_{BIM,k}^untarget) | 40.8 | 40.6 | 64.5 | 75 | 66.6 | 56.1 | 74.7 | 62.1 | 68.4 | 44.6 | 59.3 |
| M% | 97.8 | 94.5 | 98.9 | 98.9 | 97.8 | 97.9 | 98.9 | 98.9 | 97.8 | 98.9 | 98.0 |
s m i n u n t a r g e t ( k , B I M ) 1139931021126.0
| PGD Inf-untargeted | 67.0 | 68.6 | 84.1 | 90.0 | 83.2 | 79.2 | 86.7 | 86.8 | 81.6 | 77.8 | 80.6 |
| Ψ_{C_k}(100, 91, Ω_{PGD Inf,k}^untarget) | 40.8 | 39.5 | 64.5 | 75 | 66.6 | 56.1 | 74.7 | 62.1 | 68.4 | 44.6 | 59.2 |
| M% | 97.8 | 94.5 | 98.9 | 98.9 | 97.8 | 97.9 | 98.9 | 98.9 | 97.8 | 98.9 | 98.0 |
s m i n u n t a r g e t ( k , P G D I n f ) 1139931021126.0
| PGD L2-untargeted | 66.9 | 59.6 | 78.6 | 87.8 | 80.3 | 74.3 | 82.9 | 81.9 | 75.6 | 69.1 | 75.9 |
| Ψ_{C_k}(100, 91, Ω_{PGD L2,k}^untarget) | 43.0 | 30.7 | 52.5 | 68.6 | 59.3 | 51.5 | 66.3 | 53.6 | 51.0 | 37.8 | 51.4 |
| M% | 96.7 | 92.3 | 97.9 | 98.9 | 97.9 | 97.9 | 98.9 | 98.9 | 96.8 | 97.8 | 97.4 |
s m i n u n t a r g e t ( k , P G D L 2 ) 111318238134.1
| CW Inf-untargeted | 79.4 | 82.0 | 90.4 | 91.5 | 88.4 | 86.5 | 90.8 | 91.2 | 83.9 | 87.8 | 87.3 |
| Ψ_{C_k}(100, 91, Ω_{CW Inf,k}^untarget) | 61.7 | 58.9 | 76.5 | 77.7 | 73.4 | 67.0 | 77.3 | 74.7 | 64.5 | 65.9 | 69.7 |
| M% | 97.8 | 98.9 | 98.9 | 98.9 | 98.9 | 99 | 98.9 | 98.9 | 97.8 | 98.9 | 98.7 |
s m i n u n t a r g e t ( k , C W I n f ) 7115182410242712515.2
DeepFool-untargeted 90.5 90.7 93.0 95.6 92.8 93.3 93.8 94.8 91.5 92.1 92.8
Ψ C k ( 100 , 91 , Ω DeepFool , k u n t a r g e t ) 76.5 78.3 81.5 88.6 82.9 83.0 82.9 83.5 81.2 82.9 82.1
M % 100 100 98.9 100 100 99.0 100 100 98.9 98.9 99.5
s m i n u n t a r g e t ( k , D e e p F o o l ) 123516122851211212.4
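The decision rule behind these tables can be sketched in code. The snippet below is a minimal illustration of the patch-shuffling detector described in the abstract, assuming a generic `classify` callback that returns a class index; the patch size (56), the default permutation count, and the default threshold are illustrative placeholders, not necessarily the paper's exact configuration:

```python
import numpy as np

def shuffle_patches(image, patch, perm):
    """Split `image` (H, W, C) into non-overlapping patch x patch tiles
    and reorder the tiles according to the permutation `perm`."""
    H, W, _ = image.shape
    ph, pw = H // patch, W // patch
    tiles = [image[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
             for i in range(ph) for j in range(pw)]
    shuffled = [tiles[k] for k in perm]
    rows = [np.concatenate(shuffled[r * pw:(r + 1) * pw], axis=1)
            for r in range(ph)]
    return np.concatenate(rows, axis=0)

def shuffle_detect(image, classify, n_perms=100, patch=56, r_th=0.51, rng=None):
    """Declare `image` adversarial iff the fraction of random patch
    permutations that change the predicted class reaches `r_th`."""
    rng = rng or np.random.default_rng(0)
    base = classify(image)
    n_tiles = (image.shape[0] // patch) * (image.shape[1] // patch)
    changed = sum(
        classify(shuffle_patches(image, patch, rng.permutation(n_tiles))) != base
        for _ in range(n_perms))
    return changed / n_perms >= r_th
```

Raising `r_th` (compare the 0.51 and 0.91 columns throughout the tables) makes the detector more conservative: fewer clean images are flagged, at the cost of missing adversarial images whose shuffled copies change class less often.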
Table 8. Table of FP per CNN for each selected value of  R t h ; FPR is deduced from FP by the formula  FPR = FP / 100 .
| R th | C 1 | C 2 | C 3 | C 4 | C 5 | C 6 | C 7 | C 8 | C 9 | C 10 | FP avg |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|--------|
| 51   | 38  | 38  | 30  | 34  | 33  | 26  | 25  | 26  | 40  | 37   | 32.7   |
| 54   | 37  | 36  | 29  | 33  | 30  | 25  | 23  | 25  | 40  | 36   | 31.4   |
| 87   | 24  | 13  | 16  | 19  | 14  | 15  | 13  | 12  | 29  | 19   | 17.4   |
| 91   | 23  | 13  | 14  | 18  | 12  | 13  | 11  | 8   | 23  | 15   | 15     |
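As the caption of Table 8 notes, FPR follows from FP by dividing by the 100 clean images evaluated per CNN. A one-line check against the averaged FP counts above:

```python
# Average false-positive counts from Table 8 for each threshold R_th.
fp_avg = {0.51: 32.7, 0.54: 31.4, 0.87: 17.4, 0.91: 15.0}

def fpr(fp, n_clean=100):
    """False positive rate: fraction of the n_clean clean images
    wrongly declared adversarial."""
    return fp / n_clean

# Raising R_th from 0.51 to 0.91 lowers the average FPR from 0.327 to 0.15.
rates = {th: fpr(fp) for th, fp in fp_avg.items()}
```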
Table 9. Targeted attacks—DR as a percentage, and TP, FP, FN, precision, recall, F1 scores per CNN, per attack, for the selected values of  R t h = 0.51  and  0.54 , and their corresponding averages.
R th Targeted
Attack
Metrics C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 Avg
0.51EADR100100100100100100100100100100100
TP9190888479858986979788.6
FP3737283229252325393731.2
FN00000000000
Precision0.710.700.750.720.730.770.790.770.710.720.73
Recall11111111111
F10.830.820.850.830.840.870.880.870.830.830.84
BIMDR100100100100100100100100100100100
TP4338575246567350877858
FP2323201921181715353122.2
FN00000000000
Precision0.650.620.740.730.680.750.810.760.710.710.71
Recall11111111111
F10.780.760.850.840.80.850.890.860.830.830.82
PGD InfDR100100100100100100100100100100100
TP4938575246567350877858.6
FP2323201921181715353122.2
FN00000000000
Precision0.680.620.740.730.680.750.810.760.710.710.71
Recall11111111111
F10.800.760.850.840.800.850.890.860.830.830.83
PGD L2DR100100100100100100100100100100100
TP9088949289949486979992.3
FP2323203129262424383627.4
FN00000000000
Precision0.790.790.820.740.750.780.790.780.710.730.76
Recall11111111111
F10.880.880.900.850.850.870.880.870.830.840.86
0.54EADR100100100100100100100100100100100
TP9190888479858986979788.6
FP3635273126242124393629.9
FN00000000000
Precision0.710.720.760.730.750.770.800.780.710.720.74
Recall11111111111
F10.830.830.860.840.850.870.880.870.830.830.84
BIMDR100100100100100100100100100100100
TP4338575246567350877858
FP2222191919181614353021.4
FN00000000000
Precision0.660.630.750.730.700.750.820.780.710.720.72
Recall11111111111
F10.790.770.850.840.820.850.900.870.830.830.83
PGD InfDR100100100100100100100100100100100
TP4938575246567350877858.6
FP2222191919181614353021.4
FN00000000000
Precision0.690.630.750.730.700.750.820.780.710.720.72
Recall11111111111
F10.810.770.850.840.820.850.90.870.830.830.83
PGD L2DR100100100100100100100100100100100
TP9088949289949486979992.3
FP2222193026252223383526.2
FN00000000000
Precision0.800.800.830.750.770.780.810.780.710.730.77
Recall11111111111
F10.880.880.900.850.870.870.890.870.830.840.86
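The precision, recall, and F1 entries in Tables 9 and 10 follow directly from the TP, FP, FN counts. A small helper, checked against one cell of Table 9 (C 1, EA,  R t h = 0.51 : TP = 91, FP = 37, FN = 0):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: C1 under the EA targeted attack at R_th = 0.51 (Table 9).
p, r, f1 = precision_recall_f1(tp=91, fp=37, fn=0)
# p rounds to 0.71, r = 1.0, f1 rounds to 0.83, matching the table.
```

Since FN = 0 for every targeted attack at these thresholds (DR = 100%), recall is always 1 and F1 is governed entirely by precision, i.e., by the false positives on clean images.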
Table 10. Targeted attacks—DR as a percentage, and TP, FP, FN, precision, recall, F1 scores per CNN, per attack, for the selected values of  R t h = 0.87  and  0.91 , and their corresponding averages.
R th Targeted
Attack
Metrics C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 Avg
0.87EADR100100100100100100100100100100100
TP9190888479858986979788.6
FP2313161812151212291916.9
FN00000000000
Precision0.790.870.840.820.860.850.880.870.760.830.83
Recall11111111111
F10.880.930.910.900.920.910.930.930.860.900.9
BIMDR10010010098.197.898.298.610010098.799.1
TP4338575145557250877757.5
FP17911151111119271813.9
FN00011110010.5
Precision0.710.800.830.770.800.830.860.840.760.810.8
Recall1110.980.9780.9820.986110.9870.99
F10.830.880.900.860.880.890.910.910.860.880.87
PGD InfDR1001001009897.898.298.610010098.799.1
TP4938575145557250877758.1
FP17911151111119271813.9
FN00011110010.5
Precision0.740.800.830.770.800.830.860.840.760.810.8
Recall1110.980.9780.9820.986110.9870.99
F10.850.880.900.860.880.890.910.910.860.880.88
PGD L2DR10010010098.998.898.998.910010098.999.4
TP9088949188939386979891.8
FP179111912151211281915.3
FN00011110010.5
Precision0.840.900.890.820.880.860.880.880.770.830.85
Recall1110.980.980.980.98110.980.99
F10.910.940.940.890.920.910.920.930.870.890.91
0.91EADR100100100100100100100100100100100
TP9190888479858986979788.6
FP231314171113108231514.7
FN00000000000
Precision0.790.870.860.830.870.860.890.910.800.860.85
Recall11111111111
F10.880.930.920.900.930.920.940.950.880.920.91
BIMDR100.00100.00100.0098.0097.8098.2098.60100.00100.0098.7099.10
TP4338575145557250877757.5
FP17991510997221412.1
FN00011110010.5
Precision0.710.800.860.770.810.850.880.870.790.840.81
Recall1110.980.9780.9820.986110.9870.99
F10.830.880.920.860.880.910.920.930.880.900.89
PGD InfDR1001001009897.898.298.610010098.799.1
TP4938575145557250877758.1
FP17991510997221412.1
FN00011110010.5
Precision0.740.800.860.770.810.850.880.870.790.840.82
Recall1110.980.970.980.98110.980.99
F10.850.880.920.860.880.910.920.930.880.900.89
PGD L2DR10010010098.998.897.898.998.810097.999.1
TP9088949188929385979791.5
FP1799181113108221513.2
FN00011211020.8
Precision0.840.900.910.830.880.870.900.910.810.860.87
Recall1110.9890.9880.9780.9890.98810.9790.99
F10.910.940.950.90.930.920.940.940.890.910.91
Table 11. Untargeted attacks—DR as a percentage, and TP, FP, FN, precision, recall, F1 scores per CNN, per attack, for the selected value of  R t h = 0.51 , and their corresponding averages.
R th Untargeted
Attack
Metrics C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 Avg
0.51EADR88.590.793.996.996.995.996.997.993.993.994.5
TP8588939595959496939392.7
FP3838293433262526393632.4
FN119633432665.3
Precision0.690.690.760.730.740.780.780.780.700.720.74
Recall0.880.900.930.960.960.950.960.970.930.930.94
F10.770.780.830.820.830.850.860.860.790.810.82
FGSMDR 80.789.092.586.287.285.792.585.886.587.3
TP 67737569756674797772.7
FP 33283129232223343028.1
FN 16961111116131210.5
Precision 0.670.720.700.700.760.750.760.690.710.72
Recall 0.800.890.920.860.870.850.920.850.860.87
F1 0.720.790.790.770.810.790.830.760.770.78
BIMDR66.672.586.493.786.083.688.492.684.284.083.8
TP6266839080828488807979.4
FP3432283229242224333128.9
FN31251361316117151515.2
Precision0.640.670.740.730.730.770.790.780.700.710.73
Recall0.660.720.860.930.860.830.880.920.840.840.83
F10.640.690.790.810.780.790.830.840.760.760.77
PGD InfDR66.672.586.493.786.083.688.492.684.284.083.8
TP6266839080828488807979.4
FP3432283229242224333128.9
FN31251361316117151515.2
Precision0.640.670.740.730.730.770.790.780.700.710.73
Recall0.660.720.860.930.860.830.880.920.840.840.83
F10.640.690.790.810.780.790.830.840.760.760.77
PGD L2DR68.861.580.491.983.377.786.788.678.173.679.0
TP6456789180778586757076.2
FP3431273229242224343228.9
FN293519816221311212519.9
Precision0.650.640.740.730.730.760.790.780.680.680.72
Recall0.680.610.800.910.830.770.860.880.780.730.79
F10.660.620.760.810.770.760.820.820.720.70.75
CW InfDR79.786.393.893.991.890.093.896.988.189.390.3
TP7582929390909196828487.5
FP3433283230252124313028.8
FN1913668106311109.2
Precision0.680.710.760.740.750.780.810.800.720.730.75
Recall0.790.860.930.930.910.900.930.960.880.890.90
F10.730.770.830.820.820.830.860.870.790.80.81
DeepFoolDR94.693.895.696.997.896.097.897.994.793.695.8
TP8991889492969295918891.6
FP3637273328252225373230.2
FN56432422563.9
Precision0.710.710.760.740.760.790.800.790.710.730.75
Recall0.940.930.950.960.970.960.970.970.940.930.95
F10.80.80.840.830.850.860.870.870.80.810.83
Table 12. Untargeted attacks—DR as a percentage, and TP, FP, FN, precision, recall, F1 scores per CNN, per attack, for the selected value of  R t h = 0.91 , and their corresponding averages.
R th Untargeted
Attack
Metrics C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 Avg
0.91EADR69.764.978.783.678.577.785.586.785.878.778.9
TP6763788277778385857877.5
FP231313181213118221414.7
FN2934211621221413142120.5
Precision0.740.820.850.820.860.850.880.910.790.840.83
Recall0.690.640.780.830.780.770.850.860.850.780.78
F10.710.710.810.820.810.80.860.880.810.80.8
FGSMDR 45.763.474.053.756.968.858.763.061.760.7
TP 38526043495347585550.6
FP 111316912108171111.9
FN 45302137372433343432.8
Precision 0.770.800.780.820.800.840.850.770.830.8
Recall 0.450.630.740.530.560.680.580.630.610.6
F1 0.560.70.750.640.650.750.680.690.70.68
BIMDR40.840.664.575.066.656.174.762.168.444.659.3
TP3837627262557159654256.3
FP20111316912108161112.6
FN5554342431432536305238.4
Precision0.650.770.820.810.870.820.870.880.800.790.8
Recall0.400.400.640.750.660.560.730.620.680.440.58
F10.490.520.710.770.750.660.790.720.730.560.67
PGD InfDR40.839.564.575.066.656.174.762.168.444.659.2
TP3836627262557159654256.2
FP20131316912108161112.8
FN5555342431432436305238.4
Precision0.650.730.820.810.870.820.870.880.800.790.8
Recall0.400.390.640.750.660.560.740.620.680.440.58
F10.490.50.710.770.750.660.790.720.730.560.66
PGD L2DR43.030.752.568.659.351.566.353.651.037.851.4
TP4028516857516552493649.7
FP20111216912108171212.7
FN5363463139483345475946.4
Precision0.660.710.800.800.860.800.860.860.740.750.78
Recall0.430.300.520.680.590.510.660.530.510.370.51
F10.520.420.630.730.690.620.740.650.60.490.6
CW InfDR61.758.976.577.773.467.077.374.764.565.969.7
TP5856757772677574606267.6
FP20111316101298141112.4
FN3639232226332225333229.1
Precision0.740.830.850.820.870.840.890.900.810.840.83
Recall0.610.580.760.770.730.670.770.740.640.650.69
F10.660.680.80.790.790.740.820.810.710.730.75
DeepFoolDR76.578.381.588.682.983.082.983.581.282.982.1
TP7276758678837881787878.5
FP22121217121298201213.6
FN2221171116171616181617
Precision0.760.860.860.830.860.870.890.910.790.860.84
Recall0.760.780.810.880.820.830.820.830.810.820.81
F10.760.810.830.850.830.840.850.860.790.830.82
Table 13. Average for all indicators (worst case for F1) per CNN over all 4 targeted attacks.
Targeted Attacks
| R th | Indicator | C 1 | C 2 | C 3 | C 4 | C 5 | C 6 | C 7 | C 8 | C 9 | C 10 | Avg |
|------|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|-----|
| 0.51 | DR | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 0.51 | TP | 68.3 | 63.5 | 74.0 | 70.0 | 65.0 | 72.8 | 82.3 | 68.0 | 92.0 | 88.0 | 74.4 |
| 0.51 | FP | 26.5 | 26.5 | 22.0 | 25.3 | 25.0 | 21.8 | 20.3 | 19.8 | 36.8 | 33.8 | 25.8 |
| 0.51 | FN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.51 | Precision | 0.71 | 0.68 | 0.76 | 0.73 | 0.71 | 0.76 | 0.80 | 0.77 | 0.71 | 0.72 | 0.73 |
| 0.51 | Recall | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0.51 | F1 | 0.82 | 0.81 | 0.86 | 0.84 | 0.82 | 0.86 | 0.89 | 0.87 | 0.83 | 0.83 | 0.84 |
| 0.51 | F1 Worst | 0.78 | 0.76 | 0.85 | 0.83 | 0.80 | 0.85 | 0.88 | 0.86 | 0.83 | 0.83 | 0.82 |
| 0.91 | DR | 100 | 100 | 100 | 98.73 | 98.6 | 98.55 | 99.03 | 99.7 | 100 | 98.83 | 99.34 |
| 0.91 | TP | 68.25 | 63.5 | 74 | 69.25 | 64.25 | 71.75 | 81.5 | 67.75 | 92 | 87 | 73.93 |
| 0.91 | FP | 18.5 | 10 | 10.25 | 16.25 | 10.5 | 11 | 9.5 | 7.5 | 22.25 | 14.5 | 13.03 |
| 0.91 | FN | 0 | 0 | 0 | 0.75 | 0.75 | 1 | 0.75 | 0.25 | 0 | 1 | 0.45 |
| 0.91 | Precision | 0.77 | 0.84 | 0.87 | 0.8 | 0.84 | 0.86 | 0.89 | 0.89 | 0.79 | 0.85 | 0.84 |
| 0.91 | Recall | 1 | 1 | 1 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 1 | 0.99 | 0.99 |
| 0.91 | F1 | 0.87 | 0.91 | 0.93 | 0.88 | 0.91 | 0.92 | 0.93 | 0.94 | 0.88 | 0.91 | 0.90 |
| 0.91 | F1 Worst | 0.83 | 0.88 | 0.92 | 0.86 | 0.88 | 0.91 | 0.92 | 0.93 | 0.88 | 0.9 | 0.89 |
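Tables 13 through 15 aggregate the per-attack results by averaging each indicator over the attacks and, in addition, reporting the worst-case F1. A short sketch of that aggregation, fed with C 1's four targeted-attack F1 scores at  R t h = 0.51  from Table 9 (0.83, 0.78, 0.80, 0.88):

```python
def aggregate(per_attack_f1):
    """Average and worst-case F1 across attacks, as in Tables 13-15."""
    avg = sum(per_attack_f1) / len(per_attack_f1)
    worst = min(per_attack_f1)
    return avg, worst

# Example: C1's targeted-attack F1 scores at R_th = 0.51 (Table 9).
avg, worst = aggregate([0.83, 0.78, 0.80, 0.88])
# avg rounds to 0.82 and worst = 0.78, matching Table 13's F1 and F1 Worst for C1.
```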
Table 14. Average for all indicators (worst case for F1) per CNN over all 7 untargeted attacks.
Untargeted Attacks
| R th | Indicator | C 1 | C 2 | C 3 | C 4 | C 5 | C 6 | C 7 | C 8 | C 9 | C 10 | Avg |
|------|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|-----|
| 0.51 | DR | 77.5 | 79.7 | 89.4 | 94.2 | 89.7 | 87.7 | 91.1 | 94.1 | 87.0 | 86.4 | 87.7 |
| 0.51 | TP | 72.8 | 73.7 | 84.3 | 89.7 | 83.7 | 85.3 | 85.1 | 89.0 | 82.9 | 81.4 | 82.8 |
| 0.51 | FP | 35.0 | 33.7 | 27.9 | 32.3 | 29.6 | 24.4 | 22.3 | 24.3 | 34.4 | 31.7 | 29.6 |
| 0.51 | FN | 21.0 | 18.4 | 10.0 | 5.4 | 9.4 | 11.9 | 8.1 | 5.4 | 12.3 | 12.7 | 11.5 |
| 0.51 | Precision | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 | 0.8 | 0.8 | 0.8 | 0.7 | 0.7 | 0.7 |
| 0.51 | Recall | 0.8 | 0.8 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 |
| 0.51 | F1 | 0.7 | 0.7 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 |
| 0.51 | F1 Worst | 0.64 | 0.62 | 0.76 | 0.81 | 0.77 | 0.76 | 0.82 | 0.82 | 0.72 | 0.70 | 0.7 |
| 0.91 | DR | 55.42 | 51.23 | 68.8 | 77.5 | 68.71 | 64.04 | 75.74 | 68.77 | 68.9 | 59.46 | 65.86 |
| 0.91 | TP | 52.17 | 47.71 | 65 | 73.86 | 64.43 | 62.43 | 70.86 | 65.29 | 65.71 | 56.14 | 62.36 |
| 0.91 | FP | 20.83 | 11.71 | 12.71 | 16.43 | 10 | 12.14 | 9.857 | 8 | 17.43 | 11.71 | 13.08 |
| 0.91 | FN | 41.67 | 44.43 | 29.29 | 21.29 | 28.71 | 34.71 | 22.57 | 29.14 | 29.43 | 38 | 31.92 |
| 0.91 | Precision | 0.70 | 0.78 | 0.83 | 0.81 | 0.86 | 0.83 | 0.87 | 0.88 | 0.79 | 0.81 | 0.82 |
| 0.91 | Recall | 0.55 | 0.51 | 0.68 | 0.77 | 0.68 | 0.64 | 0.75 | 0.68 | 0.69 | 0.59 | 0.65 |
| 0.91 | F1 | 0.61 | 0.60 | 0.74 | 0.78 | 0.75 | 0.71 | 0.80 | 0.76 | 0.72 | 0.67 | 0.71 |
| 0.91 | F1 Worst | 0.49 | 0.42 | 0.63 | 0.73 | 0.69 | 0.62 | 0.74 | 0.65 | 0.6 | 0.49 | 0.61 |
Table 15. Average for all indicators (worst case for F1) per CNN over all attacks.
All Attacks
| R th | Indicator | C 1 | C 2 | C 3 | C 4 | C 5 | C 6 | C 7 | C 8 | C 9 | C 10 | Avg |
|------|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|-----|
| 0.51 | DR | 88.73 | 89.86 | 94.68 | 97.11 | 94.86 | 93.86 | 95.55 | 97.07 | 93.50 | 93.21 | 93.84 |
| 0.51 | TP | 70.54 | 68.61 | 79.14 | 79.86 | 74.36 | 79.02 | 83.70 | 78.50 | 87.43 | 84.71 | 78.59 |
| 0.51 | FP | 30.75 | 30.11 | 24.93 | 28.77 | 27.29 | 23.09 | 21.27 | 22.02 | 35.59 | 32.73 | 27.65 |
| 0.51 | FN | 10.50 | 9.21 | 5.00 | 2.71 | 4.71 | 5.93 | 4.07 | 2.71 | 6.14 | 6.36 | 5.74 |
| 0.51 | Precision | 0.69 | 0.68 | 0.75 | 0.73 | 0.72 | 0.77 | 0.79 | 0.77 | 0.71 | 0.72 | 0.73 |
| 0.51 | Recall | 0.88 | 0.90 | 0.94 | 0.97 | 0.95 | 0.94 | 0.95 | 0.97 | 0.93 | 0.93 | 0.94 |
| 0.51 | F1 | 0.76 | 0.76 | 0.83 | 0.83 | 0.81 | 0.84 | 0.86 | 0.86 | 0.80 | 0.80 | 0.82 |
| 0.51 | F1 Worst | 0.71 | 0.69 | 0.81 | 0.82 | 0.79 | 0.81 | 0.85 | 0.84 | 0.78 | 0.77 | 0.78 |
| 0.91 | DR | 77.71 | 75.61 | 84.40 | 88.11 | 83.66 | 81.30 | 87.38 | 84.24 | 84.45 | 79.14 | 82.60 |
| 0.91 | TP | 60.21 | 55.61 | 69.50 | 71.55 | 64.34 | 67.09 | 76.18 | 66.52 | 78.86 | 71.57 | 68.14 |
| 0.91 | FP | 19.67 | 10.86 | 11.48 | 16.34 | 10.25 | 11.57 | 9.68 | 7.75 | 19.84 | 13.11 | 13.05 |
| 0.91 | FN | 20.83 | 22.21 | 14.64 | 11.02 | 14.73 | 17.86 | 11.66 | 14.70 | 14.71 | 19.50 | 16.19 |
| 0.91 | Precision | 0.74 | 0.81 | 0.85 | 0.81 | 0.85 | 0.84 | 0.88 | 0.89 | 0.79 | 0.83 | 0.83 |
| 0.91 | Recall | 0.77 | 0.75 | 0.84 | 0.88 | 0.83 | 0.81 | 0.87 | 0.84 | 0.84 | 0.79 | 0.82 |
| 0.91 | F1 | 0.74 | 0.75 | 0.83 | 0.83 | 0.83 | 0.81 | 0.87 | 0.85 | 0.80 | 0.79 | 0.81 |
| 0.91 | F1 Worst | 0.66 | 0.65 | 0.78 | 0.80 | 0.79 | 0.77 | 0.83 | 0.79 | 0.74 | 0.70 | 0.75 |
Table 16. For  R t h = 0.51  and  0.91 , the optimal number of permutations  t o p t i m a l , C , a t k  per CNN and attack, and the optimal number of permutations  t o p t i m a l , C  per CNN valid for all tested attacks (potentially relevant to assess unknown attacks).
Optimal Number  t optimal , C , atk  of Permutations per CNN and Attack
R th ScenarioAttacks C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10
0.51Untargeted              EA3333333333
FGSM 319133997135
BIM331151119317135
PGD Inf331151119317135
PGD L23737515662779
CW Inf3525175352373
DeepFool537337968333
TargetedEA12121212121212121212
BIM12121212121212121212
PGD Inf12121212121212121212
PGD L212121212121212121212
t o p t i m a l , C  per CNN12122517371968271312
0.91UntargetedEA12121212121212121212
FGSM 121212121212121212
BIM121212121212121210012
PGD Inf121212121212121210012
PGD L212121212121212121212
CW Inf12122312341212121212
DeepFool12341212121212121212
TargetedEA12121212121212121212
BIM12121212121212121212
PGD Inf12121212121212121212
PGD L212671212121212121212
t o p t i m a l , C  per CNN126723123412121210012
Table 17. Performance comparison of ShuffleDetect and FS regarding detection rates.
Scenario | Attacks | Detectors | C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8