1. Introduction
Deepfake [1] constructs generator models based on generative adversarial networks (GANs) to forge images. Receiving a real image as input, a deepfake model can output a fake image by, for example, changing the hair color. Deepfake has played an important role in the entertainment and culture industries, bringing many conveniences to life and work. However, malicious users may take advantage of this technology to produce fake videos and news, misleading face recognition systems and seriously disrupting the social order [2,3].
In order to cope with deepfake tampering, a large number of studies focus on constructing deepfake detection models [4,5,6,7,8,9,10], which can detect whether an image has been faked. However, detection can only assess the authenticity of an image; it cannot guarantee its integrity. Moreover, even if an image is confirmed to be a fake, negative impacts on the people concerned or on society have already occurred because the image has already been widely circulated. More direct interventions should therefore be taken to ensure that images cannot be tampered with through deepfake at the source.
Some studies propose using adversarial attacks [11] to interfere with the operation of deepfake models. The main idea of an adversarial attack is to add a perturbation, imperceptible to the naked eye, to the original example, generating an adversarial example that misleads a deep learning model into producing a quite different output. Adversarial attacks were originally used to defeat security systems such as face recognition, which posed a huge challenge to the security of deep learning models. However, if the target of an adversarial attack is instead a malicious model such as a deepfake, the meaning of the attack becomes exactly the opposite: disrupting the normal operation of malicious models to guarantee information security. As shown in Figure 1, the specific operation is to add an imperceptible perturbation to an image before the user posts it online, so that when an attacker obtains the image, the fake version generated through deepfake has obvious distortions or deformations and can easily be identified as a forgery.
In current studies, however, the generalization of adversarial attacks against deepfake models is very limited: an adversarial example generated for a specific deepfake model cannot produce an equal attack effect on other models [12]; furthermore, even within the same model, an adversarial example generated in a particular domain cannot achieve an effective attack in other domains [13] (by setting the corresponding conditional variables, deepfake models can generate forged images in multiple domains, such as the hair color or the gender of a face image). Without knowing which deepfake model will be employed or which conditional variables will be set to tamper with images, the adversarial attack methods studied so far have great limitations in practice.
In order to improve the generalization of adversarial attacks, that is, to generate adversarial examples effective in each domain of multiple models for given images, this paper proposes a framework of Cross-Domain and Model Adversarial Attack (CDMAA). Any gradient-based adversarial example generation algorithm, such as the IFGSM [14], can be used within the framework. In the backpropagation phase, CDMAA uniformly weights the loss functions corresponding to the different condition variables of a model to extend the generalization of the adversarial example across domains. The Multiple Gradient Descent Algorithm (MGDA) [15] is then used to compute a weighted sum of the gradients of each model to ensure the generalization of adversarial examples across models. Finally, we propose a penalty-based gradient regularization method to further improve the success rate of adversarial attacks. CDMAA can expand the attack range of the generated adversarial example and ensure that images cannot be tampered with and forged by multiple deepfake models.
2. Related Work
According to the category of model input, some deepfake models take random noise as input to synthesize entirely new images [16], such as ProGAN [17], StyleGAN [18], etc. Other deepfake models take real images as input to achieve image translation from domain to domain. For example, StarGAN [19], AttGAN [20] and STGAN [21] can translate facial images across domains by setting different conditional variables, such as hair color, age, etc. Unsupervised models, such as CycleGAN [22] and UGATIT [23], can only translate images to a single domain, which can be considered a special case of multi-domain translation models with a total domain number of 1. This paper focuses on image-translation deepfake models and performs adversarial attacks on them to interfere with their normal functions and protect real images from being tampered with.
The adversarial attack was initially applied to classification models [24]. Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM) [25]. The FGSM takes the distance between the model output of the adversarial example and that of the original example as the loss function. The gradient of the loss function with respect to the input indicates the direction in which the output difference between the adversarial example and the original example ascends fastest. Therefore, the FGSM adds this gradient to the original example to generate an effective adversarial example. Kurakin et al. proposed the iterative FGSM (IFGSM) [14], which performs gradient backpropagation iteratively with a reduced step size to update the adversarial example, improving its effectiveness. Many studies have since proposed various adversarial attack algorithms to optimize the efficiency of adversarial attacks, such as PGD [26], which initializes the adversarial example with random noise, the MIFGSM [27], which uses momentum to update the gradient, and APGD [28], which automatically decreases the step size.
Kos et al. [29] first extended adversarial attacks to generative models. Yeh et al. [30] first proposed attacking deepfake models: they used PGD to generate adversarial examples against CycleGAN, pix2pix [31], etc., which can distort the output of these models. Lv et al. [32] proposed giving higher weight to the face region of images when calculating the loss function so that the output distortion produced by the adversarial examples is concentrated on the face, achieving a better effect of interfering with deepfake models. Dong et al. [33] explored adversarial attacks on encoder–decoder-based deepfake models and proposed using a loss function defined on the latent variables of the encoder to generate the adversarial examples. These studies generate adversarial examples only for particular models and do not take into account that models can output fake images of different domains by setting different condition variables, so the generalization of these adversarial attacks is quite limited.
Ruiz et al. [13] considered the generalizability of adversarial attacks across different domains. They verified that an adversarial example generated in a particular domain cannot achieve an effective attack in other domains of the model and proposed two methods, iterative traversal and joint summation, to generate adversarial examples that are effective for each domain. However, they did not consider the generalization of the adversarial examples between different models. Since the differences between models are much larger than the differences between domains within a model, the simple methods of iterative traversal or joint summation cannot be equally effective for attacks across different models.
Fang et al. [34] considered the generalizability of adversarial attacks across models. They verified that adversarial examples against a particular model are ineffective in attacking other models and proposed weighting the loss functions of multiple models to generate adversarial examples against multiple deepfake models, where the weighting factors are found by a line search. However, the tuning experiments are extremely tedious because the weighting coefficients must be searched in a $(J-1)$-dimensional parameter space, where $J$ denotes the number of models. In addition, the coefficients need to be re-tuned whenever the attacked models change, which is quite inefficient.
Compared with the existing work, this paper focuses on extending the generalization of adversarial examples across various domains and models and proposes a framework of CDMAA. CDMAA can generate adversarial examples that can attack multiple deepfake models under all condition variables with higher efficiency.
3. CDMAA
In this paper, we use the IFGSM as the basic adversarial attack algorithm to introduce the CDMAA framework. In the forward propagation phase, we generate the cross-domain loss function of each model by uniformly weighting the loss functions corresponding to its conditional variables. In the backpropagation phase, where gradients are calculated, we use the MGDA to generate a cross-model perturbation vector from the gradients of the cross-domain loss functions. This perturbation vector is used to iteratively update the adversarial example, improving its generalizability across multiple models and domains.
3.1. IFGSM Adversarial Attack on Deepfake Models
Given an original image $\mathit{x}$, its output of the deepfake model is $G\left(\mathit{x},c\right)$, where $G$ denotes the deepfake model and $c$ denotes the conditional variable. The IFGSM generates the adversarial example $\tilde{\mathit{x}}$ by the following steps:

$${\mathit{x}}_{t+1}=clip\left({\mathit{x}}_{t}+a\cdot sign\left({\nabla }_{{\mathit{x}}_{t}}L\left(G\left({\mathit{x}}_{t},c\right),G\left(\mathit{x},c\right)\right)\right)\right),\hspace{1em}{\mathit{x}}_{0}=\mathit{x},\hspace{2em}(1)$$

$$\tilde{\mathit{x}}={\mathit{x}}_{T},\hspace{2em}(2)$$
where ${\mathit{x}}_{t}$ denotes the adversarial example after $t$ iterations ($t$ does not exceed the number of iteration steps $T$), $a$ denotes the step size, $sign$ is the sign function, $\epsilon $ denotes the perturbation range and the $clip$ function restricts the adversarial perturbation so that it does not exceed the perturbation range in the ${l}_{p}$-norm, i.e.,

$${\Vert {\mathit{x}}_{t}-\mathit{x}\Vert }_{p}\le \epsilon ,\hspace{2em}(3)$$

so that the difference between the adversarial example and the original image is sufficiently small and the original image is not significantly modified.
$L$ denotes the loss function, which uses the mean squared error (MSE) to measure the distance between the output of the adversarial example $G\left({\mathit{x}}_{t},c\right)$ and the output of the original image $G\left(\mathit{x},c\right)$ [30]:

$$L\left(G\left({\mathit{x}}_{t},c\right),G\left(\mathit{x},c\right)\right)=\frac{1}{D}{\Vert G\left({\mathit{x}}_{t},c\right)-G\left(\mathit{x},c\right)\Vert }_{2}^{2},\hspace{2em}(4)$$

where $D$ denotes the dimensionality of the model output, i.e., the $length\cdot width\cdot channels$ of the output image.
The adversarial example is updated towards the optimization goal:

$$\underset{\tilde{\mathit{x}}}{max}\hspace{0.25em}L\left(G\left(\tilde{\mathit{x}},c\right),G\left(\mathit{x},c\right)\right)\hspace{1em}s.t.\hspace{0.25em}{\Vert \tilde{\mathit{x}}-\mathit{x}\Vert }_{p}\le \epsilon .\hspace{2em}(5)$$
The generated adversarial example is considered to have successfully attacked the deepfake model G under the condition variable c when the loss function L keeps increasing and reaches a certain threshold $\tau $, i.e., the output image has a sufficiently noticeable distortion.
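As a concrete illustration, the IFGSM loop above can be sketched in a few lines of NumPy. The "generator" here is a hypothetical linear map $G(\mathit{x})=W\mathit{x}$ (a stand-in, not one of the deepfake models used in this paper), chosen so that the gradient of the MSE loss is available in closed form; the loop starts from a tiny random perturbation because the gradient is exactly zero at ${\mathit{x}}_{0}=\mathit{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    # MSE loss: mean squared difference over the output dimensions
    return float(np.mean((a - b) ** 2))

def ifgsm_attack(x, W, T=40, eps=0.05, a=0.01):
    """IFGSM against a toy linear 'generator' G(x) = W @ x.

    For this linear G with the MSE loss, the gradient of L(G(x_t), G(x))
    w.r.t. x_t is (2/D) * W.T @ (W @ x_t - W @ x), so no autograd is needed.
    """
    D = W.shape[0]
    y_clean = W @ x                              # G(x): output of the original image
    x_t = x + 0.001 * rng.normal(size=x.shape)   # tiny random start (loss is 0 at x)
    for _ in range(T):
        grad = (2.0 / D) * W.T @ (W @ x_t - y_clean)
        x_t = x_t + a * np.sign(grad)            # gradient-sign ascent step
        x_t = np.clip(x_t, x - eps, x + eps)     # l_inf clip: ||x_t - x|| <= eps
    return x_t

x = rng.uniform(0, 1, size=64)     # stand-in "image"
W = rng.normal(size=(64, 64))      # stand-in generator weights
x_adv = ifgsm_attack(x, W)
print(mse(W @ x_adv, W @ x))       # distortion of the "fake" output
```

With real deepfake models the gradient would come from backpropagation through the generator rather than the closed-form expression used here.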
3.2. Cross-Domain Adversarial Attack
To extend the generalizability of the adversarial examples across the domains of a model, the optimization objective (5) is modified as

$$\underset{\tilde{\mathit{x}}}{max}\sum _{i=1}^{K}L\left(G\left(\tilde{\mathit{x}},{c}_{i}\right),G\left(\mathit{x},{c}_{i}\right)\right)\hspace{1em}s.t.\hspace{0.25em}{\Vert \tilde{\mathit{x}}-\mathit{x}\Vert }_{p}\le \epsilon ,\hspace{2em}(6)$$

where ${c}_{i}$ denotes the $i$th conditional variable of model $G$ and $K$ denotes the total number of conditional variables.
The gradient of each loss function in the above optimization objective is calculated as

$$\mathit{grad}_{i}={\nabla }_{{\mathit{x}}_{t}}L\left(G\left({\mathit{x}}_{t},{c}_{i}\right),G\left(\mathit{x},{c}_{i}\right)\right),\hspace{2em}(7)$$

where $\mathit{grad}_{i}$ indicates the optimization direction that maximizes the distortion of the output of model $G$ with condition variable ${c}_{i}$ for the current adversarial example ${\mathit{x}}_{t}$.
Since the backbone network of the model is fixed, changing only the condition variables has little impact on the model output, so the loss functions ${L}_{i}$ and their corresponding gradients $\mathit{grad}_{i}$ for different condition variables are very similar, i.e., the $\mathit{grad}_{i}$ have approximately the same direction, as shown in Figure 2a. Therefore, we integrate a cross-domain gradient $\mathit{grad}$ by simply uniformly weighting the $\mathit{grad}_{i}$:

$$\mathit{grad}=\frac{1}{K}\sum _{i=1}^{K}\mathit{grad}_{i}.\hspace{2em}(8)$$
$\mathit{grad}$ integrates the gradients corresponding to each conditional variable so that it indicates a common direction that maximizes the loss function of every domain. Using $\mathit{grad}$ to update the adversarial example can achieve the optimization objective (6).
Consider the following equation:

$$\mathit{grad}=\frac{1}{K}\sum _{i=1}^{K}{\nabla }_{{\mathit{x}}_{t}}{L}_{i}={\nabla }_{{\mathit{x}}_{t}}\left(\frac{1}{K}\sum _{i=1}^{K}{L}_{i}\right).\hspace{2em}(9)$$

That is, we can uniformly weight the loss functions ${L}_{i}$ corresponding to the condition variables ${c}_{i}$ to obtain a cross-domain loss function $\frac{1}{K}\sum _{i=1}^{K}{L}_{i}$ and then calculate its gradient with respect to ${\mathit{x}}_{t}$, which is exactly the cross-domain gradient $\mathit{grad}$. This ensures that only one backpropagation step is performed per model, reducing time consumption.
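The equivalence above (mean of per-domain gradients equals the gradient of the uniformly weighted loss) is easy to check numerically. The sketch below uses hypothetical linear per-domain generators and compares the averaged analytic gradients with a finite-difference gradient of the cross-domain loss:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 3
x = rng.uniform(0, 1, D)                 # original "image"
x_t = x + 0.02 * rng.normal(size=D)      # current adversarial example
Ws = [rng.normal(size=(D, D)) for _ in range(K)]  # one toy generator per domain

def L_i(xt, i):
    # per-domain MSE loss L_i = L(G(x_t, c_i), G(x, c_i))
    return np.mean((Ws[i] @ xt - Ws[i] @ x) ** 2)

def grad_i(xt, i):
    # analytic gradient of L_i for the linear toy generator
    return (2.0 / D) * Ws[i].T @ (Ws[i] @ xt - Ws[i] @ x)

# cross-domain gradient by uniform weighting of the per-domain gradients
grad_uniform = np.mean([grad_i(x_t, i) for i in range(K)], axis=0)

def cross_domain_loss(xt):
    # uniformly weighted cross-domain loss (1/K) * sum_i L_i
    return np.mean([L_i(xt, i) for i in range(K)])

# gradient of the cross-domain loss via central finite differences
h = 1e-6
grad_fd = np.array([
    (cross_domain_loss(x_t + h * e) - cross_domain_loss(x_t - h * e)) / (2 * h)
    for e in np.eye(D)
])

print(np.max(np.abs(grad_uniform - grad_fd)))  # agreement up to FD rounding error
```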
3.3. Cross-Model Adversarial Attack
We further extend the generalizability of the adversarial examples across models, i.e., the optimization objective (6) is modified as

$$\underset{\tilde{\mathit{x}}}{max}\sum _{j=1}^{J}\sum _{i=1}^{{K}^{(j)}}L\left({G}^{(j)}\left(\tilde{\mathit{x}},{c}_{i}\right),{G}^{(j)}\left(\mathit{x},{c}_{i}\right)\right)\hspace{1em}s.t.\hspace{0.25em}{\Vert \tilde{\mathit{x}}-\mathit{x}\Vert }_{p}\le \epsilon ,\hspace{2em}(10)$$

where ${G}^{\left(j\right)}$ denotes the $j$th deepfake model and $J$ denotes the total number of deepfake models.
The group of cross-domain gradients has been obtained from Section 3.2:

$$\mathit{grad}=\left\{\mathit{grad}^{(1)},\mathit{grad}^{(2)},\dots ,\mathit{grad}^{(J)}\right\},\hspace{2em}(11)$$

where $\mathit{grad}^{(j)}$ denotes the cross-domain gradient of the $j$th model. Since these gradients come from different models, the large differences between models lead to a low similarity between the gradients, as shown in Figure 2b. Simply traversing these gradients iteratively or weighting them uniformly is prone to large fluctuations in the optimization process and tends to generate ineffective adversarial examples [35].
In order to derive a cross-model perturbation vector $\mathit{w}$ from the group of gradients $\mathit{grad}$ to update the adversarial example, the CDMAA framework draws on the idea of the Multiple Gradient Descent Algorithm (MGDA) to find $\mathit{w}$:

$$\mathit{w}=\underset{\mathit{u}\in \overline{U}}{argmin}{\Vert \mathit{u}\Vert }_{2}^{2}.\hspace{2em}(12)$$

The space $\overline{U}$ in which the vector $\mathit{u}$ takes values satisfies:

$$\overline{U}=\left\{\mathit{u}=\sum _{j=1}^{J}{a}^{(j)}\cdot \mathit{grad}^{(j)}\hspace{0.5em}\middle|\hspace{0.5em}\sum _{j=1}^{J}{a}^{(j)}=1,\hspace{0.25em}{a}^{(j)}\ge 0\right\}.\hspace{2em}(13)$$
Theorem 1. The solution $\mathit{w}$ in (12) is an optimization direction in which the loss function corresponding to each model increases for the current adversarial example, i.e., it satisfies:

$$\mathit{w}\cdot \mathit{grad}^{(j)}>0,\hspace{1em}j=1,\dots ,J.\hspace{2em}(14)$$
Proof. Equation (12) is equivalent to the following optimization problem:

$$\underset{{a}^{(1)},\dots ,{a}^{(J)}}{min}{\Vert \sum _{j=1}^{J}{a}^{(j)}\cdot \mathit{grad}^{(j)}\Vert }_{2}^{2}\hspace{1em}s.t.\hspace{0.25em}\sum _{j=1}^{J}{a}^{(j)}=1.\hspace{2em}(15)$$

To solve this extreme value problem of a multivariate function under a linear constraint, construct the Lagrange function:

$$F\left({a}^{(1)},\dots ,{a}^{(J)},\lambda \right)={\Vert \mathit{u}\Vert }_{2}^{2}-\lambda \left(\sum _{j=1}^{J}{a}^{(j)}-1\right).\hspace{2em}(16)$$

Since $\mathit{u}=\sum _{j=1}^{J}{a}^{(j)}\cdot \mathit{grad}^{(j)}$, we have $\frac{\partial \mathit{u}}{\partial {a}^{(j)}}=\mathit{grad}^{(j)}$ and

$$\frac{\partial F}{\partial {a}^{(j)}}=2\mathit{u}\cdot \mathit{grad}^{(j)}-\lambda .\hspace{2em}(17)$$

According to the Lagrange multiplier method, the equations

$$\frac{\partial F}{\partial {a}^{(j)}}=2\mathit{w}\cdot \mathit{grad}^{(j)}-\lambda =0,\hspace{1em}j=1,\dots ,J,\hspace{2em}(18)$$

are a necessary condition for $\mathit{w}$ to attain the minimum of ${\Vert \mathit{u}\Vert }_{2}^{2}$; hence,

$$\mathit{w}\cdot \mathit{grad}^{(j)}=\frac{\lambda }{2},\hspace{1em}j=1,\dots ,J.\hspace{2em}(19)$$
Considering that

$${\Vert \mathit{w}\Vert }_{2}^{2}=\mathit{w}\cdot \sum _{j=1}^{J}{a}^{(j)}\cdot \mathit{grad}^{(j)}=\sum _{j=1}^{J}{a}^{(j)}\left(\mathit{w}\cdot \mathit{grad}^{(j)}\right),$$

and that, in the actual adversarial attack scenario, the dimension $D$ is much larger than the number $J$ of gradients in $\mathit{grad}$, it is almost impossible for these gradients to be linearly dependent; hence, their linear combination $\mathit{w}\ne 0$ and ${\Vert \mathit{w}\Vert }_{2}^{2}>0$. Uniting $\sum _{j=1}^{J}{a}^{(j)}=1$ and $\mathit{w}\cdot \mathit{grad}^{(j)}=\frac{\lambda }{2}$, there is

$$\mathit{w}\cdot \mathit{grad}^{(j)}=\frac{\lambda }{2}={\Vert \mathit{w}\Vert }_{2}^{2}>0,\hspace{1em}j=1,\dots ,J.\hspace{2em}(20)$$

Simultaneously, (19), (20) and (14) are proven. □
Since the inner product of $\mathit{w}$ with the gradient of every model's loss function is positive, optimizing the adversarial example with $\mathit{w}$ ensures an overall increase of the loss function of each model, i.e., the optimization objective (10), which improves the generalization of the adversarial examples across models.
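A minimal sketch of the MGDA step, using the Frank–Wolfe method (mentioned in Section 3.5) to approximate the minimum-norm point in the convex hull of the gradients. The gradients here are random high-dimensional stand-ins, not gradients of real deepfake models; Theorem 1's positivity property can then be checked numerically:

```python
import numpy as np

def min_norm_point(grads, iters=200):
    """Frank-Wolfe approximation of the MGDA min-norm problem:
    find w = sum_j a_j * grads[j], a in the simplex, minimizing ||w||_2^2."""
    J = len(grads)
    a = np.ones(J) / J                   # start from uniform weights
    G = np.stack(grads)                  # shape (J, D)
    for _ in range(iters):
        w = a @ G
        # linear minimization oracle over the simplex: the vertex (single
        # gradient) with the smallest inner product with the current point
        j = int(np.argmin(G @ w))
        d = G[j] - w                     # direction toward that vertex
        denom = float(d @ d)
        if denom < 1e-12:
            break
        # exact line search for min ||w + gamma * d||^2 on [0, 1]
        gamma = float(np.clip(-(w @ d) / denom, 0.0, 1.0))
        e = np.zeros(J); e[j] = 1.0
        a = (1 - gamma) * a + gamma * e
    return a @ G

rng = np.random.default_rng(2)
J, D = 4, 4096                           # few "models", high-dimensional gradients
grads = [rng.normal(size=D) * s for s in (1.0, 2.0, 0.5, 1.5)]
w = min_norm_point(grads)

# Theorem 1: w has a positive inner product with every model's gradient
print([float(w @ g) for g in grads])
```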
3.4. Gradient Regularization
In Section 3.3, if the gradient group $\mathit{grad}$ is regularized as

$$\mathit{grad}_{nor}^{(j)}=\mathit{grad}^{(j)}/{S}^{(j)},\hspace{1em}j=1,\dots ,J,\hspace{2em}(21)$$

and the MGDA is then used on the regularized gradient group $\mathit{grad}_{nor}$ to find a perturbation vector $\mathit{w}$, the result of (14) still holds because

$$\mathit{w}\cdot \mathit{grad}^{(j)}={S}^{(j)}\left(\mathit{w}\cdot \mathit{grad}_{nor}^{(j)}\right)>0,\hspace{2em}(22)$$

where ${S}^{(j)}>0$ is the regularization factor.
Common regularization methods include L2 regularization:

$${S}^{(j)}={\Vert \mathit{grad}^{(j)}\Vert }_{2},\hspace{2em}(23)$$

which scales each gradient to a unit vector, and logarithmic gradient regularization:

$${S}^{(j)}={L}^{(j)},\hspace{2em}(24)$$

which scales each gradient down by its corresponding loss value (equivalent to taking the gradient of the logarithm of the loss, since ${\nabla }_{{\mathit{x}}_{t}}\mathrm{ln}{L}^{(j)}=\mathit{grad}^{(j)}/{L}^{(j)}$).
Because the norms of the cross-domain gradients $\mathit{grad}^{(j)}$ calculated from different models differ greatly, the resulting vector $\mathit{w}$ is dominated by the gradients with small norms. In addition, without constraints and guidance, the generated adversarial example forms an obvious "attack preference" due to the differing vulnerability of the deepfake models, achieving a high attack effect only on the vulnerable models, which eventually leads to a large difference in attack effect between models.
To steer the cross-model perturbation vector in the direction that improves the attack on models which are less vulnerable to adversarial attacks, and thus maximize the success rate of adversarial attacks against all models, we propose a penalty-based gradient regularization method:

$${S}^{(j)}=\frac{1}{max\left({L}^{(j)},\varsigma \right)},\hspace{2em}(25)$$

where ${L}^{(j)}$ denotes the cross-domain loss function of the $j$th model and $\varsigma $ is a very small positive number that prevents a zero-denominator error in ${S}^{(j)}$ when ${L}^{(j)}=0$. (The loss ${L}^{(j)}$ is inevitably 0 in the first iteration of the IFGSM, since the current adversarial example is identical to the original image.)
The significance of using this gradient regularization is as follows. According to (19), the $\mathit{w}$ derived from the regularized gradients $\mathit{grad}_{nor}$ satisfies

$$\mathit{w}\cdot \mathit{grad}_{nor}^{(j)}=\frac{\lambda }{2},\hspace{1em}j=1,\dots ,J.\hspace{2em}(26)$$

Consider the first-order Taylor expansion of the loss function ${L}^{(j)}\left({\mathit{x}}_{t}\right)=L\left({G}^{(j)}\left({\mathit{x}}_{t},c\right),{G}^{(j)}\left(\mathit{x},c\right)\right)$ at the $t$th iteration:

$$\mathsf{\Delta }{L}^{(j)}={L}^{(j)}\left({\mathit{x}}_{t}+a\cdot sign\left(\mathit{w}\right)\right)-{L}^{(j)}\left({\mathit{x}}_{t}\right)\approx {L}^{(j)}\left({\mathit{x}}_{t}+a\cdot \mathit{w}\right)-{L}^{(j)}\left({\mathit{x}}_{t}\right)\approx a\cdot \mathit{w}\cdot \mathit{grad}^{(j)},\hspace{2em}(27)$$

where the first approximate equality ignores the effect of taking the sign function of $\mathit{w}$ and the second ignores the remainder of the first-order Taylor formula.

Uniting (26) and (27), there is

$$\mathsf{\Delta }{L}^{(j)}\approx a\cdot {S}^{(j)}\left(\mathit{w}\cdot \mathit{grad}_{nor}^{(j)}\right)=\frac{a\lambda }{2}\cdot \frac{1}{max\left({L}^{(j)},\varsigma \right)}=\frac{a\lambda }{2}\cdot \frac{1}{{L}^{(j)}}.\hspace{2em}(28)$$

The last equal sign holds for a sufficiently small $\varsigma $. Since $\frac{a\lambda }{2}$ is a constant, each model's loss change $\mathsf{\Delta }{L}^{(j)}$ is inversely proportional to its current loss value ${L}^{(j)}$: the smaller the loss, the larger the optimization gain. In practical adversarial attacks, the adversarial example achieves successful attacks on vulnerable models after only a few iterations, as their loss functions quickly reach the threshold, and further increasing those losses is pointless. This regularization therefore makes the adversarial example optimize mainly in the direction of increasing the loss functions that have not yet reached the threshold, improving the attack effect on the corresponding models and pursuing a higher overall attack success rate.
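The intended effect of the penalty-based regularization can be illustrated with two hypothetical models whose current losses differ by a factor of 20; for $J=2$ the MGDA minimum-norm point can be computed in closed form, and the first-order loss increase is larger for the low-loss model:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 512
zeta = 1e-8                              # the small constant (varsigma)

# two hypothetical models: model 0 already has a large loss, model 1 a small one
losses = np.array([0.20, 0.01])
g = [rng.normal(size=D), rng.normal(size=D)]  # their cross-domain gradients

# penalty-based regularization S_j = 1 / max(L_j, zeta),
# i.e. grad_nor_j = grad_j / S_j = grad_j * max(L_j, zeta)
g_nor = [g[j] * max(losses[j], zeta) for j in range(2)]

# exact min-norm point in the convex hull of two vectors (MGDA with J = 2)
v0, v1 = g_nor
d = v1 - v0
gamma = float(np.clip(-(v0 @ d) / (d @ d), 0.0, 1.0))
w = (1 - gamma) * v0 + gamma * v1

# predicted first-order loss increases: Delta L_j ~ a * (w . grad_j)
inc = [float(w @ g[j]) for j in range(2)]
print(inc)
```

Because the MGDA equalizes $\mathit{w}\cdot \mathit{grad}_{nor}^{(j)}$ across models, the inner products with the unregularized gradients come out inversely proportional to the losses, so the model with the small loss receives the larger update.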
3.5. CDMAA Framework
In summary, this paper proposes a framework of adversarial attacks on multiple domains of multiple models simultaneously. Using the IFGSM adversarial attack algorithm as an example, the procedure of CDMAA is as follows (Algorithm 1):
Algorithm 1 CDMAA
Input: original image $\mathit{x}$, iterative steps $T$, perturbation magnitude $\epsilon $, step size $a$, deepfake model group ${G}^{(j)},j=1,\dots ,J$
Output: adversarial example $\tilde{\mathit{x}}$
Initialization: ${\mathit{x}}_{0}=\mathit{x}$
1. For $t=0$ to $T-1$ do
2.  For $j=1$ to $J$ do
3.   ${L}^{(j)}=\frac{1}{{K}^{(j)}}\sum _{i=1}^{{K}^{(j)}}L\left({G}^{(j)}\left({\mathit{x}}_{t},{c}_{i}\right),{G}^{(j)}\left(\mathit{x},{c}_{i}\right)\right)$
4.   $\mathit{grad}^{(j)}={\nabla }_{{\mathit{x}}_{t}}{L}^{(j)}$
5.   $\mathit{grad}_{nor}^{(j)}=\mathit{grad}^{(j)}\cdot max\left({L}^{(j)},\varsigma \right)$
6.  End for
7.  $\mathit{w}=\underset{\mathit{u}\in {\overline{U}}_{nor}}{argmin}{\Vert \mathit{u}\Vert }_{2}^{2}$, where ${\overline{U}}_{nor}$ is the convex hull of the regularized gradients $\mathit{grad}_{nor}^{(j)}$
8.  ${\mathit{x}}_{t+1}=clip\left({\mathit{x}}_{t}+a\cdot sign\left(\mathit{w}\right)\right)$
9. End for
10. $\tilde{\mathit{x}}={\mathit{x}}_{T}$
Step 3 uses the uniform weighting method to obtain the cross-domain loss function, which is sufficiently effective owing to the similarity of the gradients between domains (Section 3.2). Only one backpropagation step is then needed to calculate the gradient in Step 4, whereas applying the MGDA across domains would need ${K}^{(j)}$ backpropagation steps per model, so efficiency is preserved. Step 7 uses the MGDA to obtain the cross-model perturbation vector, where simple uniform weighting is less effective because of the low similarity of gradients between models (Section 3.3); the MGDA achieves better attacks at the expense of time. We use the Frank–Wolfe method [36] to approximately compute the minimum-norm vector in the convex hull of $\mathit{grad}_{nor}$, which converges well in cases where the number of dimensions is much larger than the number of vectors [35,37].
Figure 3 shows an overview of CDMAA. Note that CDMAA is not tied to the IFGSM, although this paper uses the IFGSM to introduce it. The main idea of CDMAA is to obtain a perturbation vector from the gradients of multiple domains and models and then update the adversarial example to ensure its ability to attack multiple models and domains. Therefore, CDMAA can be applied to any gradient-based adversarial attack algorithm, such as the MIFGSM and APGD.
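Putting the pieces together, the following is a toy end-to-end sketch of Algorithm 1 (uniform cross-domain weighting, penalty regularization, a Frank–Wolfe MGDA step and the sign update). The "models" are hypothetical linear maps standing in for deepfake generators, so the gradients are analytic rather than backpropagated:

```python
import numpy as np

rng = np.random.default_rng(4)
D, T, EPS, A, ZETA = 32, 60, 0.05, 0.01, 1e-8

# toy stand-ins: model j, domain i is a linear map (G(x, c_i) = W_i x)
K = [2, 3]                               # domains per "model"
W = [[rng.normal(size=(D, D)) for _ in range(k)] for k in K]

def cross_domain_loss_and_grad(x_t, x, j):
    # Steps 3-4: uniformly weighted loss over domains and its analytic gradient
    L, g = 0.0, np.zeros(D)
    for Wi in W[j]:
        r = Wi @ x_t - Wi @ x
        L += np.mean(r ** 2) / K[j]
        g += (2.0 / D) * Wi.T @ r / K[j]
    return L, g

def min_norm_point(G, iters=100):
    # Step 7: Frank-Wolfe approximation of the MGDA min-norm problem
    a = np.ones(len(G)) / len(G)
    for _ in range(iters):
        w = a @ G
        j = int(np.argmin(G @ w)); d = G[j] - w
        if float(d @ d) < 1e-12:
            break
        gamma = float(np.clip(-(w @ d) / (d @ d), 0.0, 1.0))
        a *= (1 - gamma); a[j] += gamma
    return a @ G

x = rng.uniform(0, 1, D)
x_t = np.clip(x + 0.001 * rng.normal(size=D), x - EPS, x + EPS)  # tiny init
for _ in range(T):
    Ls, Gs = zip(*(cross_domain_loss_and_grad(x_t, x, j) for j in range(len(K))))
    G_nor = np.stack([g * max(L, ZETA) for L, g in zip(Ls, Gs)])  # Step 5
    w = min_norm_point(G_nor)
    x_t = np.clip(x_t + A * np.sign(w), x - EPS, x + EPS)         # Step 8

final = [cross_domain_loss_and_grad(x_t, x, j)[0] for j in range(len(K))]
print(final)  # cross-domain losses of both "models" after the attack
```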
4. Experiment and Analysis
To verify the effectiveness of the proposed CDMAA framework, we conduct adversarial attack experiments against deepfake models and analyze the results. In
Section 4.1, we introduce the deepfake model, hyperparameters and evaluation criteria used in the experiments. In
Section 4.2, we use CDMAA to attack four models at the same time and show the result of adversarial attacks. In
Section 4.3, we conduct ablation experiments to show the impact of CDMAA components on the attack and compare with the methods used in the existing work.
4.1. Deepfake Models, Hyperparameters and Evaluation Metrics
We prepared four deepfake models for the adversarial attack experiments—StarGAN, AttGAN, STGAN and UGATIT—as chosen in similar existing work [12,13,34]. StarGAN and AttGAN adopt the officially provided pretrained models, trained on the celebA dataset in five domains (black hair, blonde hair, brown hair, gender and age) and in 13 domains (such as bald head and bangs), respectively. STGAN uses the model trained on the celebA dataset in five domains—bangs, glasses, beard, slightly opened mouth and pale skin—which are rare attributes in the original images. We selected these domains to avoid cases where the STGAN output is the same as the input because the original picture already contains the attribute of the corresponding domain; in such cases, the experimental results would be affected since the model cannot effectively forge the image even without adversarial examples [34]. UGATIT translates images from a single domain to another, so it can be regarded as a special case of multi-domain deepfake with a total number of conditional variables ${K}_{\mathrm{UGATIT}}=1$. To unify the dataset across experiments, we used a UGATIT model trained with the official code to translate from celebA to an anime dataset.
The adversarial attack algorithm is the IFGSM, with hyperparameters following the settings in existing work: $T=80$, $\epsilon =0.05$ and $a=0.01$, except where noted. The test data are $N=100$ pictures randomly sampled from the celebA dataset (the same pictures are used in every contrast experiment).
The value of the loss function

$${L}_{i}^{(j)(n)}=L\left({G}^{(j)}\left({\tilde{\mathit{x}}}^{(n)},{c}_{i}\right),{G}^{(j)}\left({\mathit{x}}^{(n)},{c}_{i}\right)\right)\hspace{2em}(29)$$

is used to quantify the output distortion of the $n$th adversarial example ${\tilde{\mathit{x}}}^{(n)}$ against model ${G}^{(j)}$ under the condition variable ${c}_{i}$. The following evaluation criteria [13] are considered to evaluate the effectiveness of the adversarial attack:

$$avg\_{L}^{(j)}=\frac{1}{N{K}^{(j)}}\sum _{n=1}^{N}\sum _{i=1}^{{K}^{(j)}}{L}_{i}^{(j)(n)},\hspace{2em}(30)$$

$$attack\_rat{e}^{(j)}=\frac{1}{N{K}^{(j)}}\sum _{n=1}^{N}\sum _{i=1}^{{K}^{(j)}}\mathbb{1}\left[{L}_{i}^{(j)(n)}\ge \tau \right],\hspace{2em}(31)$$

where $avg\_{L}^{(j)}$ represents the average value of the loss function of the $N$ adversarial examples over each domain of model ${G}^{(j)}$ and $attack\_rat{e}^{(j)}$ represents the proportion of the loss values of the $N$ adversarial examples in each domain reaching the threshold $\tau =0.05$ [13] for model ${G}^{(j)}$, i.e., the attack success rate.
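The two metrics can be computed directly from a matrix of per-example, per-domain loss values. The sketch below uses synthetic stand-in numbers, not experimental results:

```python
import numpy as np

rng = np.random.default_rng(5)
TAU = 0.05                                # success threshold tau
N = 100                                   # number of adversarial examples

# hypothetical loss values L[name][n, i]: N examples x K_j domains per model
K = {"StarGAN": 5, "AttGAN": 13}
L = {name: rng.exponential(0.1, size=(N, k)) for name, k in K.items()}

def avg_L(Lj):
    # average loss over all examples and domains of one model
    return float(np.mean(Lj))

def attack_rate(Lj, tau=TAU):
    # fraction of (example, domain) pairs whose loss reaches the threshold
    return float(np.mean(Lj >= tau))

for name, Lj in L.items():
    print(name, round(avg_L(Lj), 4), round(attack_rate(Lj), 4))
```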
4.2. CDMAA Adversarial Attack Experiment
We used the CDMAA framework to attack the four deepfake models StarGAN, AttGAN, STGAN and UGATIT simultaneously. The results are shown in Table 1:
The results show that the generated adversarial examples achieve a clear effect on every domain of the four deepfake models. StarGAN and UGATIT are relatively vulnerable to adversarial attacks: their average L values are much greater than the threshold and their attack success rates are close to 100%. The success rates of attacks on AttGAN and STGAN are relatively lower, i.e., these two models are less affected by the adversarial attack.
In addition, comparing the three groups of experiments (a), (b) and (c), we see that the attack effect can be improved by relaxing the limits of the algorithm parameters, such as increasing $\epsilon $ or $T$ (at the expense of a more obvious perturbation or a larger computational cost). Comparing the three groups of experiments (a), (d) and (e), we find that using a better adversarial attack algorithm (the MIFGSM is an improvement on the IFGSM, and APGD is an improvement on the MIFGSM) can improve the attack effect. Both the MIFGSM and APGD perform $J$ gradient backpropagations in each iteration, the same as the IFGSM; therefore, they have the same time complexity $O\left(T\right)$ and roughly similar computational cost. All this shows that the CDMAA framework is well compatible with different adversarial attack algorithms, and that general improvements to these algorithms also apply within CDMAA.
Figure 4 shows the attack effectiveness of some of the adversarial examples in the above experiment:
Figure 4 shows that the difference between the adversarial example and the original image is so small that the human eye can hardly distinguish them. However, the difference in the deepfake model's output, i.e., the distortion of the fake image, is large enough to be noticed. Therefore, posting the adversarial example instead of the original image can significantly deform the output of the deepfake models and effectively prevent them from forging pictures.
4.3. Ablation/Contrast Experiments
4.3.1. Cross-Domain Gradient Ablation/Contrast Experiment
To verify that the uniformly weighted cross-domain gradients used by CDMAA effectively expand the generalization of adversarial examples across domains, we carry out a contrast attack experiment in which we keep the other components of CDMAA unchanged and only change the way the per-domain gradients are handled: (1) Single gradient: $\mathit{grad}^{(j)}=\mathit{grad}_{1}^{(j)}$, i.e., use the gradient of only one domain as the cross-domain gradient, without considering the generalization of the generated adversarial examples to other domains; this is the setting of most current studies [30,32,33]. (2) Iterative gradient: $\mathit{grad}^{(j)}=\mathit{grad}_{t\mathrm{mod}{K}_{j}}^{(j)}$, i.e., iteratively use the gradient of each domain's loss function as the cross-domain gradient [13]. (3) The MGDA: $\mathit{grad}^{(j)}=\underset{i=1,\dots ,{K}_{j}}{MGDA}\left(\mathit{grad}_{i}^{(j)}\right)$, i.e., use the MGDA to generate the cross-domain gradients. The results are shown in Table 2:
Figure 5 shows a visual comparison of the results. Compared with existing research on adversarial attacks against deepfake, which uses only single-domain gradients or iterates over per-domain gradients, the CDMAA framework, using uniform weighting to generate cross-domain gradients, achieves a higher attack success rate against each model, especially those with more domains, such as AttGAN, and effectively increases the generalization of adversarial examples across domains. The average L of some models using the single gradient or iterative gradient method can exceed that of CDMAA, which shows that the effectiveness of the adversarial examples generated by these two methods varies greatly across domains and is not as well balanced and stable as CDMAA. In addition, simple uniform weighting performs almost as well as using the MGDA to generate the cross-domain gradients while greatly reducing time consumption (Section 3.5). Therefore, CDMAA uses the more efficient uniform weighting method to calculate the cross-domain gradients.
4.3.2. Cross-Model Perturbation Ablation/Contrast Experiment
To verify that using the MGDA to calculate the cross-model perturbation vector $\mathit{w}$ effectively expands the generalization of adversarial examples across models, we carry out a contrast attack experiment in which we keep the other components of CDMAA unchanged and only change the way the cross-domain gradients are processed: (1) Single gradient: $\mathit{w}=\mathit{grad}^{(1)}$, i.e., only use the cross-domain gradient of one model to update the adversarial example, which is equivalent to ignoring whether the adversarial example generalizes to other models. (2) Iterative gradient: $\mathit{w}=\mathit{grad}^{(t\mathrm{mod}J)}$, i.e., iteratively use the cross-domain gradient of each model to update the adversarial example [12]. (3) Uniform weighting: $\mathit{w}=\frac{1}{J}\sum _{j=1}^{J}\mathit{grad}^{(j)}$, i.e., use the mean of the cross-domain gradients of each model to update the adversarial example [34]. The results are shown in Table 3:
Figure 6 shows a visual comparison of the results. Although the single gradient, iterative gradient and uniform weighting methods used in current research can reach a high average L value on some models, such as StarGAN, their effectiveness on models that are robust against adversarial attacks (such as AttGAN and STGAN) is very poor. In fact, reaching such a high average L value is meaningless: once the threshold $\tau =0.05$ is exceeded, the output distortion is obvious enough and the adversarial attack is already successful. In contrast, the "MGDA" group achieves a considerable attack success rate for every model. Therefore, we use the MGDA to calculate the cross-model perturbation to improve the generalization of adversarial examples across models.
4.3.3. Gradient Regularization Ablation/Contrast Experiment
To verify the effectiveness of the gradient regularization used in CDMAA, we carry out a contrast attack experiment in which we keep the other CDMAA components unchanged and only change the gradient regularization method: (1) Without regularization: ${S}^{(j)}=1$, i.e., the regularization factor is always 1, which is equivalent to not using regularization. (2) L2 regularization: ${S}^{(j)}={\Vert \mathit{grad}^{(j)}\Vert }_{2}$. (3) Logarithmic gradient regularization [15]: ${S}^{(j)}={L}^{(j)}$. The results are shown in Table 4:
Figure 7 shows a visual comparison of the results. On the attack_rate metric, the penalty-based gradient regularization is superior to the other gradient regularizations. It achieves a more uniform distribution of attack effect across models by trading a reduced effect on models with large loss values for a stronger attack on models with small loss values. In practical adversarial attacks, given the large gap in the vulnerability of different models, this gradient regularization method is the most practical.
5. Conclusions and Future Work
In this paper, we propose a framework of adversarial attacks against deepfake models called CDMAA, which expands the generalization of the generated adversarial examples to each domain of multiple models. Specifically, adversarial examples generated with CDMAA distort the fake images output by multiple deepfake models under any condition variables, thereby interfering with the deepfake models and protecting pictures from tampering. An adversarial attack experiment on four mainstream deepfake models shows that the adversarial examples generated by CDMAA have high attack success rates and can effectively attack multiple deepfake models at the same time. Through ablation experiments, we verify, on the one hand, the effectiveness of each CDMAA component and, on the other hand, the superiority of CDMAA over similar methods in related research.
Since CDMAA requires a gradient-based adversarial attack algorithm, future work can focus on extending the framework to gradient-free adversarial attack algorithms, such as AdvGAN [38] or the Boundary Attack [39]. In addition, we will try to extend CDMAA to attack deepfakes of other data types, such as video and voice.