Enhance Domain-Invariant Transferability of Adversarial Examples via Distance Metric Attack

Abstract: A general foundation for fooling a neural network without knowing its details (i.e., black-box attack) is the transferability of adversarial examples across different models. Many works have been devoted to enhancing the task-specific transferability of adversarial examples, whereas the cross-task transferability has remained nearly out of the research scope. In this paper, to enhance both types of transferability, we are the first to regard the transferability issue as a heterogeneous domain generalisation problem, which can be addressed by a general pipeline based on a domain-invariant feature extractor pre-trained on ImageNet. Specifically, we propose a distance metric attack (DMA) method that increases the latent-layer distance between the adversarial example and the benign example along the adversarial direction guided by the cross-entropy loss. With the help of this simple loss, DMA can effectively enhance the domain-invariant transferability (for both the task-specific case and the cross-task case) of adversarial examples. Additionally, DMA can be used to measure the robustness of the latent layers in a deep model. We empirically find that models with similar structures have consistent robustness at depth-similar layers, which reveals that model robustness is closely related to model structure. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate that DMA improves the success rate of black-box attacks by more than 10% on task-specific attacks and by more than 5% on cross-task attacks, and that the adversarial examples crafted by DMA fool models on all three tasks. This indicates that DMA effectively improves the domain-invariant transferability of adversarial examples.


Introduction
Adversarial examples are crafted by adding maliciously subtle perturbations to benign images, which makes deep neural networks vulnerable [1,2]. It is possible to employ such examples to interfere with real-world applications, thus raising concerns about the safety of deep learning [3][4][5]. While most adversarial attacks focus on a single task, current vision-based systems usually consist of an ensemble of multiple pipelines, each addressing a certain task, such as object detection, tracking, or classification. Hence, for such a complex vision system, an adversarial example that exploits multi-task or multi-model vulnerability is desired but challenging to design.
Generally, adversarial attacks can be divided into white-box and black-box cases [6]. White-box attacks operate with knowledge of the structure and parameters of the given model, such as the fast gradient sign method [2], the basic iterative method [7], and the momentum-boosting iterative method [6]. On the contrary, black-box attacks know nothing about the model except its outputs, which describes a more common situation in real-world applications. The success of a black-box attack comes from either of two principles, i.e., the assumption of transferability or the feedback of queries. Hence, black-box attacks fall into two categories: transfer-based [8][9][10][11] and query-based [12,13]. Since the latter suffers from problems such as poor attack effects and low query efficiency [14], in this paper we focus on the transfer-based black-box attack, in which transferability is assumed to be an intriguing property of adversarial examples.
The assumption of transferability comes from the fact that different models are optimised on similar distributions of training data, which means the adversarial examples generated on a given model can also fool other unknown models. In detail, transferability can be divided into task-specific transferability and cross-task transferability, according to the task of the victim model. Specifically, when the victim model and the given model address the same task (e.g., classification), the assumed transferability is task-specific. On the other hand, when the victim model and the given model address different tasks (e.g., classification vs. detection), the cross-task transferability is considered. To design adversarial examples for multiple tasks, a natural question is: are cross-task transferability and task-specific transferability incompatible?
Regarding the task-specific transferability, it is known that the models are optimised from similar input distributions and similar label distributions, which can be viewed as belonging to the same domain; hence, the characteristics revealed by the models are similar. In contrast, the cross-task transferability can be regarded as a heterogeneous domain generalisation problem [15], where the label distributions are quite different although the input distributions are still similar. The heterogeneous domain generalisation problem is a typical problem in training neural networks. Learning domain-invariant features has been proven an effective way to solve this issue [15], as it encourages good generalisation from the source domain to an unknown target domain. In this regard, when the feature extractor is aware of the underlying distribution of the source domain, the adversarial examples are outliers of that distribution [16,17]. The question is then how to make the distribution of these outliers transferable across domains. As shown in Figure 1, if the feature extractor has a good generalisation ability, the target domain and the source domain are well aligned, which helps to transfer both the benign examples and the adversarial examples. Hence, to enhance the domain-invariant transferability (i.e., both task-specific and cross-task transferability) of adversarial examples, a natural choice is to craft them based on a well-generalised feature extractor, e.g., one pre-trained on ImageNet. As shown in Figure 2, the difference between the adversarial example and the benign image can be reflected in the feature-extraction stage and in the task-related stage of the model. While the task-specific transferability can rely on either stage (since both stages are shared by models of the same task), the cross-task transferability mostly relies on the feature-extraction stage.
However, most of the transfer-based attacks developed on image classification rely on the task-specific loss (e.g., the cross-entropy loss), which limits the cross-task transferability of the adversarial examples [18].
In this paper, we propose a novel cross-task attack method called distance metric attack (DMA) to enhance the domain-invariant transferability of adversarial examples. Different from the normal gradient-based attacks that perturb the benign input by maximising the cross-entropy loss, the goal of the distance metric attack is to maximise the distance of the latent features between the adversarial example and the benign example. To ensure the basic transferability between different models, we keep the task-specific loss as the attack direction; to reasonably mitigate its effect, we use a weight factor to control the trade-off between the direction and the distance. We show that the adversarial examples crafted by the distance metric attack can fool models on image classification, object detection, and semantic segmentation, indicating that the distance metric attack can effectively improve the domain-invariant transferability of adversarial examples.

[Figure 1: the feature spaces "Before Generalisation" and "After Generalisation".]

The rest of the paper is arranged as follows: In Section 2, we review the related work on adversarial attacks for image classification, for other vision tasks, and for the cross-task case. In Section 3, we introduce the proposed distance metric attack (DMA) method. In Section 4, we present the attack results of DMA compared with multiple baselines on a variety of tasks. Finally, the conclusions are drawn in Section 5.

Related Work
In this section, we briefly review the adversarial attack methods on image classification, object detection, and semantic segmentation. Then, a brief explanation of the cross-task attack is given.

Adversarial Attacks on Image Classification
DNNs have shown vulnerability to adversarial examples [1,2,20], which has attracted widespread attention. Many effective white-box attacks have been proposed, such as FGSM [2], BIM [7], C&W [21], DeepFool [22], and MIM [6], which rely on the details of the victim model. However, in real-world applications, the model details are often invisible. The transferability of adversarial examples motivates the black-box attacks. Inspired by data augmentation, many attacks enhance the transferability of adversarial examples through input transformations. For example, the diverse input method (DIM) [8] created diverse input patterns by applying random resizing and padding to the input at each iteration before feeding the image into the model for gradient calculation. The translation-invariant method (TIM) [9] optimised an adversarial example through an ensemble of multiple translated images and simplified the complex computation into a single convolutional operation according to the translation invariance of CNNs. The scale-invariant method (SIM) [10] enhanced the transferability of adversarial examples by optimising the example with multi-scale copies, which, however, incurred a huge computational cost. In addition, the ILA method [23] aimed at attacking the latent layers, which also provided a new direction for improving the transferability of adversarial examples. Nevertheless, those methods only focused on the image classification task. The transferability of adversarial examples crafted by the above methods was based on the assumption that the victim model and the given model were trained on the same dataset. In real scenarios, however, the data are always changing and complex.

Adversarial Attacks on Other Vision Tasks
Compared with the image classification task, adversarial examples for object detection and semantic segmentation are more challenging to design. Xie et al. [24] proposed DAG to generate adversarial examples for a wide range of segmentation and detection models. Many adversarial patch attacks have been proposed to attack object detection systems, such as DPatch [25], the person patch [26], and the adversarial T-shirt [27]. However, those adversarial attacks require a huge cost of training time. Xiao et al. [28] characterised adversarial examples based on the spatial context information in semantic segmentation. However, the generated adversarial examples barely transfer across models, even within the same task.

Adversarial Defences
Corresponding to adversarial attack, adversarial defence has been developed vigorously in recent years. The methods that integrate the adversarial examples into the training dataset are called adversarial training [1,2], which is a promising adversarial defence scheme. Then, Tramer et al. [29] proposed the ensemble adversarial training, which generated adversarial examples by assembling multiple models. To improve adversarially robust generalisation and exploit robust local features, Song et al. [30] proposed a random block shuffle transformation, which cut up the adversarial example into blocks and then randomly combined those blocks to reassemble the example for adversarial training. However, the computational cost of adversarial training is too high, and adversarial training can only be designed for a single task.
In addition to adversarial training, mitigating the effects of adversarial perturbations is also an effective defence scheme. A set of image transformation methods were proposed by Guo et al. [31], which transformed the image before being input into the classifier. Xie et al. [32] randomly resized and padded the input image to mitigate the adversarial perturbations. However, all these defence schemes are developed for a specific single task.

Adversarial Attacks on Cross-Task
All the above adversarial attacks are designed for a single task, which limits the practicability of adversarial examples. Detection systems based on computer vision (CV) techniques have been deeply applied in various security scenarios and generally involve more than one model. Therefore, it is difficult for the above single-task adversarial attacks to attack real-world CV systems successfully. Lu et al. [18] were the first to propose a cross-task attack (DR), which used an image classification model to generate adversarial examples that could fool object detection and semantic segmentation models. The cross-task attack is a more challenging attack, where the source model is very different from the target models in terms of training data and model structure. However, the DR attack has a low success rate on image classification. The main difference in performance between our proposed DMA and DR is that DMA achieves a high attack success rate on image classification, object detection, and semantic segmentation; namely, DMA can effectively enhance the domain-invariant transferability of adversarial examples.

Notation
Let x and y be the clean image and the corresponding label, respectively. ℓ_f(x, y) is the cross-entropy loss of the image classifier f(x). The adversarial example x_adv is indistinguishable from the clean image x but fools the classifier, i.e., f(x_adv) ≠ y. Following previous work, we use the L∞ norm to constrain the adversarial perturbation level as ||x_adv − x||_∞ ≤ ε. The goal of the adversarial attack is to find an adversarial example x_adv that maximises the loss ℓ_f(x_adv, y). Regarding the feature space, let the latent feature f_l(x) be the output of the l-th layer of the classifier when the input is x. The distance function D(f_l(x), f_l(x_adv)) measures the distance (e.g., the L2 distance) between the latent features of the two examples. Thus, the optimisation problem in the normal gradient-based attacks can be written as:

arg max_{x_adv} ℓ_f(x_adv, y),  s.t. ||x_adv − x||_∞ ≤ ε.   (1)
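In practice, the L∞ constraint in Equation (1) is enforced by clipping the adversarial example back into the ε-ball around x after every update step. A minimal NumPy sketch (the helper name `project_linf` and the toy pixel values are our own, for illustration):

```python
import numpy as np

def project_linf(x_adv, x, eps):
    """Project x_adv onto the L-infinity ball of radius eps around x,
    then clip to the valid pixel range [0, 255]."""
    x_adv = np.clip(x_adv, x - eps, x + eps)
    return np.clip(x_adv, 0.0, 255.0)

x = np.array([100.0, 10.0, 250.0])
x_adv = np.array([130.0, -5.0, 280.0])   # violates the constraint
x_proj = project_linf(x_adv, x, eps=16.0)
# every coordinate now satisfies |x_proj - x| <= 16 and 0 <= x_proj <= 255
```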

Motivation
The domain-invariant transferability of adversarial examples includes the task-specific transferability and the cross-task transferability. Recent advances in adversarial attacks focus on enhancing the task-specific transferability, where the adversarial examples crafted on the given model can also fool unknown models on the same task. The task-specific transferability of the adversarial examples arises because the given model and the unknown models are trained on the same domain. On the other hand, the cross-task transferability can be regarded as a heterogeneous domain generalisation problem, where the domains have different label spaces [15]. To address this problem, many methods [15,33,34] aim to generate a domain-invariant feature representation. In this case, the whole network is split into a feature extractor and a classifier, and to match various classifiers, the feature extractor is trained to be as general as possible. Fortunately, a feature extractor pre-trained on ImageNet is such a general model.
As can be seen from Figure 2, the difference between the adversarial example and the original image is reflected in the difference in features, which eventually evolves into a difference in the identification regions. Therefore, a feature extractor pre-trained on ImageNet can address the heterogeneous domain generalisation problem, and the domain-invariant transferability can be improved by expanding the distance of the latent features.

Distance Metric Attack
Based on the above analyses, by attacking the feature space of the model, the domain-invariant transferability of the adversarial examples can be enhanced. ILA [23] points out that while the adversarial perturbation in the input space is constrained by the norm, the perturbation in the latent features of the model is not. Hence, the perturbation in the latent features can be maximised to perform the attack. Motivated by this, we propose the distance metric attack (DMA), which aims to maximise the distance between the latent features of the benign image and the adversarial image.
The gradient-based attack algorithms imply that the direction of attack is as important as the magnitude of the perturbation. Accordingly, besides maximising the distance between the latent features of the benign image and the adversarial image, DMA keeps the cross-entropy loss used by the gradient-based attacks as the attack direction. We therefore define the search for an adversarial example as the optimisation problem:

arg max_{x_adv} ℓ_f(x_adv, y) + D(f_l(x), f_l(x_adv)),  s.t. ||x_adv − x||_∞ ≤ ε.   (2)

To solve the problem in Equation (2), we need to calculate both the gradient of the cross-entropy loss with respect to the input and the gradient of the distance metric loss with respect to the input. However, the cross-entropy loss limits the cross-task transferability of the adversarial examples. To mitigate its influence, we introduce a hyper-parameter β ≥ 1 that flexibly increases the weight of the distance loss. Hence, the objective in Equation (2) can be written in detail as:

L(x_adv, x, y) = ℓ_f(x_adv, y) + β · D(f_l(x), f_l(x_adv)).   (3)

For fair comparisons, we use MI-FGSM, an efficient iterative gradient-based attack, as the optimisation method to craft the adversarial example. Note that when β = 0, DMA degenerates to the vanilla gradient-based attack (MI-FGSM).
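As a concrete illustration of Equation (3), the sketch below evaluates the combined DMA loss on a toy two-layer linear "network", using the mean-squared error as the distance D. The toy model, its weights, and all numeric values are our own assumptions; in the paper, f_l is a latent layer of a pre-trained ImageNet classifier.

```python
import numpy as np

def softmax_ce(logits, y):
    """Softmax cross-entropy of a single example with label index y."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def dma_loss(x_adv, x, y, W_feat, W_cls, beta):
    """L(x_adv, x, y) = CE(f(x_adv), y) + beta * MSE(f_l(x), f_l(x_adv)).
    Toy model: latent feature f_l(x) = W_feat @ x, logits = W_cls @ f_l(x)."""
    feat_clean = W_feat @ x
    feat_adv = W_feat @ x_adv
    ce = softmax_ce(W_cls @ feat_adv, y)
    dist = np.mean((feat_clean - feat_adv) ** 2)
    return ce + beta * dist

rng = np.random.default_rng(0)
W_feat = rng.normal(size=(4, 6))
W_cls = rng.normal(size=(3, 4))
x = rng.normal(size=6)
x_adv = x + 0.1 * rng.normal(size=6)
loss = dma_loss(x_adv, x, y=1, W_feat=W_feat, W_cls=W_cls, beta=200.0)
# a larger beta puts more weight on the feature-distance term
```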
Specifically, the crafting of adversarial examples in MI-FGSM can be formulated as:

g_{t+1} = µ · g_t + ∇_x L(x_adv_t, x, y) / ||∇_x L(x_adv_t, x, y)||_1,   (4)

x_adv_{t+1} = x_adv_t + α · sign(g_{t+1}),   (5)

where g_t is the accumulated gradient at the t-th iteration of the attack process, and µ is a decay factor. The DMA algorithm for crafting adversarial examples iteratively is summarised in Algorithm 1, where DMA is combined with MI-FGSM.
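The MI-FGSM update above can be sketched in a few lines of NumPy; the toy gradient values are our own, and the per-step size α would be ε/T as in Algorithm 1:

```python
import numpy as np

def mifgsm_step(x_adv, g, grad, mu, alpha):
    """One MI-FGSM iteration: accumulate the L1-normalised gradient into the
    momentum buffer g, then take a signed step of size alpha."""
    g_next = mu * g + grad / np.sum(np.abs(grad))
    x_next = x_adv + alpha * np.sign(g_next)
    return x_next, g_next

x_adv = np.zeros(4)
g = np.zeros(4)
grad = np.array([0.5, -1.0, 0.25, -0.25])
x_adv, g = mifgsm_step(x_adv, g, grad, mu=1.0, alpha=1.6)
# x_adv is now [1.6, -1.6, 1.6, -1.6]: each pixel moved by alpha in the
# direction of the sign of the (normalised, momentum-accumulated) gradient
```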

Algorithm 1 Distance Metric Attack
Input: A deep model f and its loss function ℓ_f; the latent layer f_l of the model f; a benign example x and its ground-truth label y.
Input: The maximum perturbation ε, the number of iterations T, the decay factor µ, and the distance weight β.
Output: An adversarial example x_adv.
1: α = ε/T, g_0 = 0, x_adv_0 = x;
2: for t = 0 → T − 1 do
3:   Get the latent feature f_l(x) of the model by inputting x;
4:   Obtain the latent feature f_l(x_adv_t) of the model by inputting x_adv_t;
5:   Calculate the distance D(f_l(x), f_l(x_adv_t)) between the latent features;
6:   Get the softmax cross-entropy loss ℓ_f(x_adv_t, y);
7:   Calculate the gradient ∇_x L(x_adv_t, x, y);
8:   Update the accumulated gradient g_{t+1} by Equation (4);
9:   Update x_adv_{t+1} by x_adv_{t+1} = x_adv_t + α · sign(g_{t+1});
10: end for
11: return x_adv_T.

Note that DMA generates adversarial examples based on a highly generalised image classification model, expecting that the adversarial examples can fool not only image classification models but also object detection and semantic segmentation models. However, for images from object detection and semantic segmentation datasets, there are no ground-truth labels for the source model. Before crafting the adversarial example, DMA therefore assigns the image an alternative label from the source-model label space by y = f(x). Then, feeding the original image to Algorithm 1, we obtain the adversarial examples. Finally, we input the adversarial examples into the target models to obtain the attack results.
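Putting Algorithm 1 together, the sketch below runs the full DMA loop on a toy two-layer linear model, including the surrogate-label step y = f(x) used when no source-model label exists. All weights, shapes, and hyper-parameter values here are illustrative assumptions, and the gradients are written analytically for the toy model; in the paper, the source model is a pre-trained ImageNet classifier and the gradient comes from backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_FEAT, N_CLS = 8, 5, 3
W1 = rng.normal(size=(D_FEAT, D_IN))   # toy "feature extractor" f_l
W2 = rng.normal(size=(N_CLS, D_FEAT))  # toy "task head"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dma_attack(x, eps=0.5, T=10, mu=1.0, beta=200.0):
    # Surrogate label: the source model's own prediction y = f(x).
    feat_clean = W1 @ x
    y = int(np.argmax(W2 @ feat_clean))
    alpha, g, x_adv = eps / T, np.zeros_like(x), x.copy()
    for _ in range(T):
        feat_adv = W1 @ x_adv
        p = softmax(W2 @ feat_adv)
        onehot = np.eye(N_CLS)[y]
        # Gradient of L = CE + beta * MSE w.r.t. x_adv (analytic, toy model).
        grad_ce = W1.T @ (W2.T @ (p - onehot))
        grad_dist = beta * (2.0 / D_FEAT) * (W1.T @ (feat_adv - feat_clean))
        grad = grad_ce + grad_dist
        # Momentum accumulation and signed update, as in MI-FGSM.
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv, y

x = rng.normal(size=D_IN)
x_adv, y = dma_attack(x)
# The perturbation respects the L-infinity budget, while the latent
# features of x_adv are pushed away from those of x.
```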

Datasets
In the experiments, we evaluate the performance of the proposed method on cross-domain tasks, including image classification, object detection, and semantic segmentation. For the image classification task, we randomly choose 1000 images from the ILSVRC 2012 validation set, almost all of which are correctly classified by the image classification victim models. For object detection and semantic segmentation, we randomly select 1000 images from the COCO2017 and PASCAL VOC2012 datasets, respectively. All images are resized to 3 × 299 × 299.

Hyper-Parameters
We consider MI-FGSM [6], DIM [8], TIM, TI-DIM [9], ILA [23], and the DR attack [18] as the baselines. For the hyper-parameter settings, we set the maximum perturbation to ε = 16 in the pixel range of [0, 255]. All the baselines are iterative attacks, where we set the number of iterations to T = 10 and the step size to α = 1.6. For MI-FGSM, DIM, TIM, and TI-DIM, we set the decay factor to µ = 1.0. For DIM and TI-DIM, we set the transformation probability to 0.7. For TIM and TI-DIM, the kernel size is set to 7 × 7.

The Effect of the Distance Loss and the Factor β
To gain further insight into the performance of DMA, we conduct ablation studies to examine the effect of various factors. We attack the Inc-v3 model by DMA with four distance losses and different factor values β, ranging from 0 to 200. Note that when β = 0, DMA degenerates to MI-FGSM. As shown in Figure 4, the MSE loss, the L1 loss, and the cosine distance loss all improve the transferability of the adversarial examples compared with MI-FGSM.
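The named distance losses can be written down directly; here each takes two flattened latent-feature vectors. The "1 − cosine similarity" form of the cosine distance is our own assumption, since the text does not spell out its exact definition:

```python
import numpy as np

def mse_dist(a, b):
    """Mean-squared error between two feature vectors."""
    return np.mean((a - b) ** 2)

def l1_dist(a, b):
    """Mean absolute difference between two feature vectors."""
    return np.mean(np.abs(a - b))

def cosine_dist(a, b):
    """1 - cosine similarity: ~0 for parallel features, larger as they diverge."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 3.0])
# mse_dist(a, b) = 4/3, l1_dist(a, b) = 2/3, cosine_dist(a, a) ~ 0
```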
It can be found from Figure 4 that, for the MSE loss and the L1 loss, the transferability of the adversarial examples increases with the factor β, which indicates that the task-specific loss limits not only the cross-task transferability but also the cross-model transferability. When the distance loss is the MSE loss and the factor β is 200, DMA exhibits the best transferability on all models. Specifically, in the first panel of Figure 4, it can be seen that the success rate gradually increases with β, and by β = 200 the increase in the attack success rate flattens out. Therefore, we adopt the MSE loss and the factor β = 200 in the following experiments.

The Performance on Attacking Different Layers
To evaluate the robustness of the latent layers in various networks, we compare the transferability of the adversarial examples crafted by DMA on different layers of four normally trained models. As Figure 5 reports, for models with similar structures, the robustness at the same layer is consistent. This indicates that the robustness of the model is related to the model structure. Interestingly, Inception-ResNet-v2 performs similarly to ResNet-v2 under the black-box attack, with attacks on the first few layers working well. Under the white-box attack, layers 6 and 7 work better, which is similar to the Inception series.
In the following experiments, for Inc-v3 and Inc-v4, we select the sixth layer as the latent layer. For Res-101, we adopt the third layer as the latent layer. Since the success rate at the earlier layers of IncRes-v2 is insufficient in the white-box setting, we use the sixth layer as its latent layer.

Adversarial Attack on Image Classification
In this section, we present the attack results on the image classification task. To verify the effectiveness of DMA, we use MI-FGSM, TI-DIM, and ILA (where the proxy is crafted by MI-FGSM and TI-DIM, respectively) as the competitors. For the cross-task attack, we first evaluate the task-specific transferability of DMA and DR. We report the success rates of MI-FGSM, ILA (proxy crafted by MI-FGSM), and DMA in Table 1, the success rates of TI-DIM, ILA (proxy crafted by TI-DIM), and DMA in Table 2, and the success rates of DR and DMA in Table 3. As shown in Table 1, the proposed DMA outperforms the baseline attacks in most cases and improves the task-specific transferability of the adversarial examples by 2-25%. For IncRes-v2, the transferability of the adversarial examples generated by DMA is also enhanced compared with MI-FGSM.
TI-DIM is one of the best gradient-based adversarial attack methods and can also be combined with DMA to generate adversarial examples. From Table 2, it can be found that ILA cannot consistently improve the transferability of adversarial examples when the proxy examples are generated by TI-DIM. In contrast, TI-DI-DMA outperforms TI-DIM by 2% to 15% in most cases. In particular, in the black-box setting, the adversarial examples crafted by TI-DI-DMA achieve a success rate of more than 60% on the normally trained models, with some cases even reaching 80%. A cross-task attack should not only succeed on other tasks but also on the current task; therefore, we also present the DR attack results on image classification. As shown in Table 3, the adversarial examples crafted by DR yield a low success rate in both white-box and black-box attacks, whereas the attack success rate of DMA is higher than that of DR by a large margin on all models. Note that the empirical result of DR is markedly inferior to other adversarial attacks that focus on image classification: DR only focuses on enhancing the cross-task transferability, which makes the practicality of its adversarial examples questionable.

Cross-Task Attack on Object Detection
We next evaluate the cross-task transferability of the adversarial examples generated by DR, MI-FGSM, TI-DIM, and the proposed DMA on the object detection task. For the cross-task attack, all the adversarial examples come from the COCO dataset and are crafted by the models trained on the ImageNet dataset, including Inc-v3, Inc-v4, IncRes-v2, and Res-101. The label y required by MI-FGSM, TI-DIM, and DMA is obtained by running the image classification model on the original COCO image, which yields one of the ImageNet labels.
The results of the cross-task attack on object detection are presented in Table 4, which shows that, compared with MI-FGSM, the adversarial examples crafted by DMA with different source models improve the attack success rate by 0.4-9% on all the object detection models. Except for IncRes-v2, DMA outperforms DR by 2-5%. Although TI-DIM can effectively enhance the task-specific transferability of adversarial examples, it cannot greatly improve their cross-task transferability. These results support DMA as the first attack method that focuses on the domain-invariant transferability of adversarial examples.

Cross-Task Attack on Semantic Segmentation
In this section, we further investigate the cross-task transferability of the adversarial examples generated by DR, MI-FGSM, TI-DIM, and the proposed DMA on the semantic segmentation task. All the adversarial examples are selected from the PASCAL VOC2012 dataset and are crafted by the models trained on the ImageNet dataset, including Inc-v3, Inc-v4, IncRes-v2, and Res-101. As in the object detection task, the label y required by MI-FGSM, TI-DIM, and DMA is obtained by running the image classification model on the original image. The evaluation metric for semantic segmentation is mIoU, where a lower value indicates a better attack effect.
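For reference, mIoU is the per-class intersection-over-union averaged over classes; a minimal NumPy sketch (our own helper, not the paper's evaluation code):

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean intersection-over-union from flat per-pixel label arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 0, 1, 2, 2, 2])
# class 0: 2/2, class 1: 1/2, class 2: 2/3 -> mIoU = (1 + 1/2 + 2/3) / 3
```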
From Table 5, we observe that DMA reduces the mIoU by 4-19% compared to MI-FGSM on the five semantic segmentation networks. With the exception of IncRes-v2, DMA reduces the mIoU by 5-13% compared to DR and by 2-20% compared to TI-DIM. In addition, among the four source models, the adversarial examples crafted by Res-101 reduce the mIoU of the semantic segmentation models the most. Meanwhile, four of the semantic segmentation models are based on ResNet, which indicates that the more similar the source model is to the target structure, the higher the attack success rate.

Discussions
The image classification model predicts a classification score for the whole image, while the object detection and semantic segmentation models focus on the localisation and classification of the objects in the image. Hence, it is undoubtedly difficult to attack the object detection and semantic segmentation models using adversarial examples generated by an image classification model. However, as we describe in Section 3.2, the domain-invariant features facilitate the attack by our model. Figure 6 shows sample results on the object detection and semantic segmentation models. We are surprised to find that, compared to other methods, DMA can interfere with the results of the model by adding semantics and objects. In a real CV system, the addition of semantics and objects is enough to create a barrier to recognition. However, the victim models still retain a strong ability to detect the original semantics and objects, which is a limitation of DMA. Future work can revolve around how to suppress the models' detection of the benign semantics and objects.

Conclusions
In this paper, we extend the transferability of adversarial examples to the domain-invariant transferability (both the task-specific transferability and the cross-task transferability) of adversarial examples. Relying on the well-generalised features of a model pre-trained on ImageNet, we propose the distance metric attack (DMA) method, which maximises the distance of the latent features between the adversarial example and the benign example.
The adversarial examples crafted by DMA are highly transferable to various models on different tasks. Extensive experiments on image classification, object detection, and semantic segmentation indicate that the model robustness is highly related to the model structure.
In addition, it is demonstrated that DMA improves the success rate of black-box attacks by more than 10% on task-specific attacks and by more than 5% on cross-task attacks compared with the state-of-the-art competitors.