In this section, the literature related to adversarial sample attacks is first reviewed; then, research related to probability density estimation in probabilistic generative models is introduced; finally, the advantages and application scenarios of the score-matching (SM) method are presented, with an explanation of why it can be used for adversarial sample attacks.
2.1. Adversarial Sample Attack Methods
Adversarial sample attack methods can be divided into targeted and non-targeted attacks, according to the presence or absence of a specific target class to be attacked; white-box and black-box attacks, according to whether the attacker knows the internals of the target model; and gradient-based, optimization-based, transfer-based, decision-based, and other attacks, according to the attack mechanism [1].
Gradient-based attacks: Goodfellow et al. [6] were the first to propose a gradient-based attack method, named the fast gradient sign method (FGSM), which added a perturbation along the sign of the gradient of the loss with respect to the input, increasing the loss and eventually causing the model to misclassify. However, its attack success rate was low because the single-step perturbation introduced a large approximation error. To address this problem, Kurakin et al. [7] proposed the iterative fast gradient sign method (I-FGSM), which subdivided the single-step perturbation into multiple smaller steps and restricted the image pixels to the valid region by clipping, thus improving the attack success rate. I-FGSM tended to overfit to local extrema, which harmed the transferability of the adversarial samples; thus, Dong et al. [8] proposed the momentum iterative fast gradient sign method (MI-FGSM), which introduced momentum to stabilize the gradient update direction while crossing local extrema. Projected gradient descent (PGD) [9] also developed from the above methods and greatly improved the attack effect by adding random initialization and increasing the number of iterations. The diverse-inputs iterative fast gradient sign method (DI$^2$-FGSM) [10] applied random transformations to the input image as preprocessing, which enhanced the transferability and stability of the method. Besides FGSM and its variants, Papernot et al. [11], inspired by the saliency map concept [12], proposed the Jacobian-based saliency map attack (JSMA), which used gradient information to find the pixel positions with the greatest impact on the classification result and added perturbations only to them.
Optimization-based attacks: The essence of adversarial sample attack algorithms is to find a relatively small perturbation that yields an effective adversarial sample, so the adversarial sample generation process can be cast as an optimization problem. The box-constrained limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method proposed by Szegedy et al. [2] was the prototype of optimization-based attack methods, solving the problem with a quasi-Newton numerical optimization method. The C&W method proposed by Carlini and Wagner [13] is the most classical optimization-based method; it defined several alternative objective functions and enlarged the feasible space of the optimal solution through a change of variables in the objective, thus significantly improving the attack success rate. Unlike the above attack methods, Su et al. [14] proposed the one-pixel method, which required only one pixel to be modified for a successful attack and used a differential evolution algorithm to determine the location and value of that single pixel.
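In this formulation, the attack reduces to a constrained optimization of roughly the following form (a standard C&W-style statement; notation ours):

$$\min_{\delta}\ \lVert\delta\rVert_{p} + c \cdot f(x+\delta) \quad \text{s.t.} \quad x+\delta \in [0,1]^{n},$$

where $f$ is designed to be negative exactly when $x+\delta$ is misclassified and the constant $c>0$ balances perturbation size against attack success; the change of variables mentioned above replaces the box constraint with an unconstrained parameterization such as $x+\delta = \tfrac{1}{2}(\tanh(w)+1)$.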
Transfer-based attacks: The attacker performs a white-box attack on a surrogate of the target model to generate transferable adversarial samples that then successfully attack the target model itself. The main way of obtaining a surrogate model is to query the target model in order to build a similar training dataset [15]. Li et al. [16] selected the most informative samples for querying through an active learning strategy, which further reduced the query cost while improving the training quality of the surrogate. Inspired by data augmentation strategies, Xie et al. [10] quickly and effectively expanded the dataset through transformations such as cropping and rotation, thus alleviating the overfitting of surrogate models. Wu et al. [17] introduced the concept of model attention and used an attention-weighted combination of feature maps as a regularization term to further address the overfitting of surrogate models. Li et al. [18] found that the members of an ensemble of surrogate models do not need large mutual variability; they used existing surrogate models to generate several different virtual models for ensembling, which significantly enhanced the transferability of adversarial samples and reduced the training cost of surrogate models. Hu et al. [19] proposed multi-stage ensemble adversarial attacks based on model scheduling and sample selection strategies.
Decision-based attacks: In transfer-based attack methods, querying the target model is an essential step, and the attack fails when queries to the target model are restricted. In contrast, decision-based black-box attack methods rely only on the model's final decision, searching via a random walk, and are more in line with realistic attack scenarios. In simple terms, the attacker first obtains an initial adversarial sample with a large perturbation and uses it as a starting point to search near the model decision boundary for smaller perturbations, yielding the final adversarial sample. Hence, how to determine the search direction toward smaller perturbations and how to improve the search efficiency are the two central questions. Dong et al. [20] used the covariance matrix adaptation evolution strategy (CMA-ES) [21] to model the local geometry of the search directions on the decision boundary, thus reducing the search dimension and improving the search efficiency. Brunner et al. [22] proposed a biased boundary search framework that finds better search directions by restricting the search to perturbations with a higher attack success rate. Shi et al. [23] proposed the customized adversarial boundary (CAB) approach, which explores the relationship between the initial perturbation and the refined perturbations and was able to obtain smaller adversarial perturbations. Rahmati et al. [24] observed that the decision boundaries of deep neural networks usually have a small mean curvature near the data samples and accordingly proposed GeoDA, a geometric framework for decision-based black-box attacks with high query efficiency.
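The following is a minimal sketch of the random-walk skeleton that these decision-based methods share (illustrative Python code under our own simplifying assumptions, not the algorithm of any single paper above); decision(x) is assumed to return True when the target model's hard label for x differs from the source label:

import torch

def boundary_search(decision, x, x_adv, steps=1000, step=0.01):
    # x_adv must start out adversarial, e.g., a heavily perturbed copy of x.
    # The walk shrinks the perturbation while staying on the adversarial
    # side of the decision boundary, using only hard-label decisions.
    for _ in range(steps):
        towards = 0.1 * (x - x_adv)                               # pull toward source
        noise = step * (x - x_adv).norm() * torch.randn_like(x)   # random direction
        candidate = (x_adv + towards + noise).clamp(0, 1)
        if decision(candidate):                                   # still misclassified: accept
            x_adv = candidate
    return x_adv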
Other attacks: Attack methods based on generative adversarial networks (GANs) [25] can generate adversarial samples without knowing any information about the target model. A generator produces adversarial samples, which are fed to a discriminator to ensure that the differences between the adversarial samples and the source images remain small enough; an external target model evaluates the difference between the predicted label and the true label of the adversarial sample. The attack succeeds if the final adversarial sample looks real and natural and is misclassified by the target model. This method is named AdvGAN [26]. Subsequently, Jandial et al. [27] changed the input of the generator from the source image to its latent feature vector, which reduced the GAN training time and significantly improved the attack success rate. Based on the idea of hyperplane classification, Moosavi-Dezfooli et al. [28] proposed the DeepFool method, which computes the shortest distance from the source sample to the classification boundary of the target model. Later, Moosavi-Dezfooli et al. [29] proposed the universal adversarial perturbation (UAP) method, which generates adversarial perturbations with strong generalization capability by computing the shortest distance from the source samples to multiple classification decision boundaries of the target model. Similar to the JSMA idea of finding saliency maps, Zhou et al. [30] used class activation mapping (CAM) to filter out the important features of images and generated adversarial samples by content-aware means, achieving low-cost and highly transferable adversarial attacks. Jon et al. [31] introduced a probabilistic framework to maliciously control the probability distribution of the classes, enabling more ambitious and complex attacks.
In general, gradient-based and optimization-based attacks are white-box attacks, which require knowledge of the target model's architecture and exhibit limited transferability. Transfer-based attacks only need to reconstruct a surrogate of the target model through repeated querying and therefore achieve high transferability. Decision-based black-box attacks rely only on the model's final decision, searching with a random walk, and achieve even higher transferability.
2.2. Probability Density Estimation Methods
Probability density estimation was originally applied in probabilistic generative models and can be used to model different kinds of raw data: the true distribution is estimated from a finite set of observed samples, and new independent and identically distributed samples are then drawn from the estimate. Its working principle is mainly based on maximum likelihood estimation. On the one hand, for the estimation of explicit probability distributions with parameters $\theta$, the model assigns the likelihood $\prod_{i=1}^{m} p_{\theta}(x^{(i)})$ to the $m$ training samples, and the maximum likelihood principle chooses the parameters $\theta^{*}$ that maximize this probability; on the other hand, for the estimation of implicit probability distributions, maximizing the likelihood can be approximated as solving for the parameters $\theta$ that minimize the Kullback–Leibler divergence [32] between the model distribution and the data distribution. Therefore, likelihood-based generative models can be divided into implicit models and explicit models [33].
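The equivalence invoked above is the standard identity (notation ours):

$$\theta^{*} \;=\; \arg\max_{\theta}\, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\theta}(x)\right] \;=\; \arg\min_{\theta}\, D_{\mathrm{KL}}\!\left(p_{\mathrm{data}} \,\|\, p_{\theta}\right),$$

since $D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_{\theta}) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)] - \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_{\theta}(x)]$ and the first term does not depend on $\theta$.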
Implicit models: Typical representatives are GANs and generative stochastic networks (GSNs) [34], whose core purpose is to make the distribution $p_{g}$ of the data generated by the model approximate the true distribution $p_{\mathrm{data}}$ of the original data. GANs do not explicitly model the probability density function $p_{g}(x)$ and therefore cannot be trained by the maximum likelihood method; however, the generator can sample directly from noise and output samples, and a discriminator forces the distance between $p_{g}$ and $p_{\mathrm{data}}$ to be minimized. GSNs differ from GANs in that they need to sample with Markov chains after reaching a stationary distribution, and the huge computational effort makes them difficult to extend to high-dimensional data.
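For reference, this discriminator-driven matching of $p_{g}$ to $p_{\mathrm{data}}$ is the familiar GAN minimax game [25]:

$$\min_{G}\,\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D(G(z))\right)\right],$$

whose optimal discriminator reduces the generator's objective to minimizing the Jensen–Shannon divergence between $p_{g}$ and $p_{\mathrm{data}}$.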
Explicit models: For models with an explicitly defined probability density distribution $p_{\theta}(x)$, the maximum likelihood procedure is relatively straightforward: substitute the probability density into the expression for the likelihood and update the model along the direction of the increasing gradient. The challenge is to define a model that can express the complexity of the data while remaining easy to compute. Tractable explicit models define a probability density that is easy to compute; the main representative is the fully visible belief network (FVBN) [35], which uses the chain rule of probability to transform the density into a product of conditional probabilities, with the disadvantage that the generation of each element value depends on the previous element values, which is inefficient. The masked autoencoder for distribution estimation (MADE) [36] and the pixel recurrent neural network (PixelRNN) [37] also belong to this class of models; the factorization they share is written out after this paragraph. Approximate explicit models avoid the above restriction to easy-to-solve probability density functions and instead use approximate methods to maximize the likelihood. The variational autoencoder (VAE) [38] transforms solving the maximum likelihood into maximizing the evidence lower bound (ELBO) via variational approximate inference. The Markov chain Monte Carlo (MCMC) approach [39] uses Markov chains to simplify the computation of Monte Carlo random sampling and obtain approximate results. The energy-based model (EBM) [40] is a substitute for maximum likelihood that constructs an energy function to estimate the degree of matching between a sample $x$ and a label $y$. It is not a specific model but a class of models: energy-based learning provides a unified framework for many probabilistic and non-probabilistic learning methods and can be viewed as an alternative to probabilistic estimation for prediction, classification, or decision making. By not requiring proper normalization, energy-based methods avoid the problems associated with estimating normalization constants in probabilistic models; in addition, the absence of a normalization condition allows more flexibility in model design. Most probabilistic models can be considered special types of EBMs, such as restricted Boltzmann machines (RBMs) [41] and deep belief networks (DBNs) [42]. Diffusion models differ from the above methods. Inspired by nonequilibrium thermodynamics, they define a Markov chain of diffusion steps that gradually adds random noise to the original data and then use deep neural networks to learn the reverse diffusion process, reconstructing the desired data samples from noise. The forward diffusion process is given by fixed mathematical formulas; the real learning takes place in the reverse process. Denoising diffusion probabilistic models (DDPMs) [43] are a typical representative of this class.
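For the tractable models above, the chain rule factorizes the joint density of an $n$-dimensional sample as

$$p_{\theta}(x) \;=\; \prod_{i=1}^{n} p_{\theta}\!\left(x_{i} \mid x_{1}, \ldots, x_{i-1}\right),$$

which keeps the likelihood exactly computable but forces $n$ sequential conditional evaluations at generation time, which is the inefficiency noted above.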
2.3. Score Matching Methods
Considering a subset of samples $\{x_{i}\}_{i=1}^{N}$ drawn from the true distribution $p_{\mathrm{data}}(x)$ of the source data, a probabilistic generative model based on maximum likelihood estimation must find a $p_{\theta}(x)$ that approximates $p_{\mathrm{data}}(x)$. Take the EBM as an example. The probability density function is modeled as $p_{\theta}(x) = e^{-E_{\theta}(x)}/Z_{\theta}$, where $E_{\theta}(x)$ denotes the energy of sample $x$, with a lower energy corresponding to a higher probability; this unnormalized probability can be trained with a deep neural network. $Z_{\theta}$ is a normalization constant depending on $\theta$ that guarantees $\int p_{\theta}(x)\,\mathrm{d}x = 1$. However, since $Z_{\theta}$ integrates over the entire data space, it is difficult to compute; to make maximum likelihood training feasible, a likelihood-based generative model must either restrict its model structure or approximate $Z_{\theta}$, which is computationally expensive. Fortunately, this problem is cleverly circumvented by the score function $s_{\theta}(x) = \nabla_{x} \log p_{\theta}(x)$ [44], the logarithmic gradient of the probability density, which is the gradient field pointing in the direction of the fastest growth of the probability density function. Since $\nabla_{x} \log p_{\theta}(x) = -\nabla_{x} E_{\theta}(x) - \nabla_{x} \log Z_{\theta} = -\nabla_{x} E_{\theta}(x)$, the score function eliminates $Z_{\theta}$ by differentiation, which makes the solution easier.
Hyvärinen et al. [44] first proposed the score-matching (SM) method for estimating non-normalized statistical models by minimizing the difference between $\nabla_{x} \log p_{\theta}(x)$ and $\nabla_{x} \log p_{\mathrm{data}}(x)$. Since solving the SM objective involves computing $\nabla_{x} s_{\theta}(x)$, i.e., the Hessian of $\log p_{\theta}(x)$, which requires multiple backpropagations and is computationally intensive, Song et al. [45] proposed the sliced score-matching (SSM) method on this basis, which projects the high-dimensional vector field $s_{\theta}(x)$ onto low-dimensional random slicing vectors $v$ drawn from a simple distribution (e.g., a multivariate standard Gaussian or a uniform distribution). The vector problem is thereby scalarized, requiring only one backpropagation, which greatly reduces the computational effort.
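Concretely, the two objectives discussed here can be written as (following [44,45]; notation ours):

$$J_{\mathrm{SM}}(\theta) = \mathbb{E}_{p_{\mathrm{data}}}\!\left[\operatorname{tr}\!\big(\nabla_{x} s_{\theta}(x)\big) + \tfrac{1}{2}\,\lVert s_{\theta}(x)\rVert_{2}^{2}\right], \qquad J_{\mathrm{SSM}}(\theta) = \mathbb{E}_{v}\,\mathbb{E}_{p_{\mathrm{data}}}\!\left[v^{\top}\nabla_{x} s_{\theta}(x)\,v + \tfrac{1}{2}\big(v^{\top} s_{\theta}(x)\big)^{2}\right].$$

The trace of the Hessian in $J_{\mathrm{SM}}$ is what costs one backpropagation per input dimension; in $J_{\mathrm{SSM}}$ it collapses to a single directional derivative along $v$.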
Since the SM method estimates the data distribution by solving for the score $\nabla_{x} \log p(x)$, and the adversarial sample generation process requires precisely this gradient information, the SM method can be applied to adversarial sample attacks. Compared with traditional adversarial sample attack methods that rely only on the decision-boundary guidance of classifiers, our method considers the probability distribution of the samples, overcomes the limitations imposed by classifiers with different structures, effectively improves transferability, and provides a reasonable explanation from both mathematical-theoretical and visualization perspectives.
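As an illustration of why only one extra backpropagation is needed, the following is a minimal PyTorch-style sketch of the SSM objective (our own illustrative code under the notation above, not the implementation of [45] or of our method):

import torch

def ssm_loss(score_net, x, n_slices=1):
    # score_net(x) returns s_theta(x), an estimate of grad_x log p(x).
    x = x.requires_grad_(True)
    s = score_net(x)
    loss = 0.0
    for _ in range(n_slices):
        v = torch.randn_like(x)                    # random slicing vector
        sv = (s * v).flatten(1).sum(dim=1)         # v^T s_theta(x), per sample
        # One vector-Jacobian product gives v^T (grad_x s_theta(x)) v.
        gsv = torch.autograd.grad(sv.sum(), x, create_graph=True)[0]
        vjv = (gsv * v).flatten(1).sum(dim=1)
        loss = loss + (vjv + 0.5 * sv ** 2).mean()
    return loss / n_slices

A score network trained with such a loss supplies exactly the gradient field $\nabla_{x} \log p(x)$ that the adversarial sample generation process described above consumes.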