Article

AWEncoder: Adversarial Watermarking Pre-Trained Encoders in Contrastive Learning

Tianxing Zhang, Hanzhou Wu, Xiaofeng Lu, Gengle Han and Guangling Sun
1 School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
2 Wenzhou Institute, Shanghai University, Wenzhou 325088, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 3531; https://doi.org/10.3390/app13063531
Submission received: 10 February 2023 / Revised: 2 March 2023 / Accepted: 7 March 2023 / Published: 9 March 2023
(This article belongs to the Special Issue Advanced Technologies in Data and Information Security II)

Abstract

As a self-supervised learning paradigm, contrastive learning has been widely used to pre-train powerful encoders as effective feature extractors for various downstream tasks. This process requires numerous unlabeled training data and computational resources, which makes the pre-trained encoder a valuable piece of intellectual property of its owner. However, the lack of a priori knowledge of downstream tasks makes it non-trivial to protect the intellectual property of the pre-trained encoder with conventional watermarking methods. To deal with this problem, in this paper, we introduce AWEncoder, an adversarial method for watermarking the pre-trained encoder in contrastive learning. First, as an adversarial perturbation, the watermark is generated by enforcing the embeddings of the marked training samples to deviate from their original locations and cluster around a randomly selected key image in the embedding space. Then, the watermark is embedded into the pre-trained encoder by further optimizing a joint loss function. As a result, the watermarked encoder not only performs very well on downstream tasks, but also enables us to verify its ownership by analyzing the discrepancy of outputs produced with the encoder as the backbone, under both white-box and black-box conditions. Extensive experiments demonstrate that the proposed work achieves good effectiveness and robustness across different contrastive learning algorithms and downstream tasks, which verifies its superiority and applicability.

1. Introduction

The rapid development of deep learning (DL) has benefited greatly from a large number of diverse labeled datasets, which, however, has simultaneously become a factor hindering the further development of DL, because the cost of collecting sufficient high-quality datasets with correct labels is prohibitively expensive. Fortunately, the emergence of self-supervised learning (SSL) [1], which trains a powerful encoder with unlabeled samples, can effectively overcome this obstacle. Contrastive learning [2,3], as one of the mainstream self-supervised learning techniques, has achieved great success in various tasks. For example, SimCLR [4] and MoCo [5] have already enabled SSL encoders to outperform traditional supervised learning-based encoders in several downstream tasks. However, training the encoders still requires a great deal of unlabeled data and computing resources. Many malicious users are likely to steal the pre-trained encoders to obtain illegal income, and may even cause serious security risks [6,7]. Therefore, we need reliable solutions to protect the intellectual property of the pre-trained encoders.
As a major technique to protect the intellectual property of DL models, model watermarking [8,9,10,11] has recently been widely studied. It can be roughly divided into two categories: white-box watermarking and black-box watermarking. The former [12,13] embeds a watermark into the internal parameters [14], feature maps [15] or structures [16] of a DL model, which requires white-box access to the watermarked model. The latter [17] usually uses backdooring [18,19] or similar techniques, such as adversarial attacks [20], to mark a model; the ownership can then be verified by analyzing the classification results of the marked model on a set of carefully crafted samples. Compared with white-box watermarking, black-box watermarking is more desirable in practice, since in most cases the model owner has no access to the internal details of the target model. However, for protecting the aforementioned encoders, the lack of a priori knowledge of downstream tasks makes it quite difficult to craft such special samples for black-box watermarking. This indicates that we cannot simply extend the existing black-box watermarking schemes to the encoders. As a result, we urgently need to develop novel watermarking schemes specifically for the pre-trained encoders.
In this paper, we introduce AWEncoder, a copyright protection method for pre-trained encoders in contrastive learning via an adversarial watermark. In the proposed method, by optimizing an adversarial perturbation [21] with respect to the pre-trained encoder, we obtain a perturbation that clusters the perturbed training samples around a randomly selected key image in the embedding space and that serves as the secret watermark. The watermark is then embedded into the pre-trained encoder via contrastive learning by further optimizing a joint loss involving watermark embedding. In this way, even if the downstream task of the watermarked encoder is unknown to us, we can still verify its ownership under the black-box condition. Experimental results demonstrate that the ownership can be reliably verified under both white-box and black-box conditions, which is quite helpful for applications.
In summary, the main contributions of this paper include:
  • We propose a novel adversarial watermarking strategy for the pre-trained encoders in the embedding space, which is more effective for watermark verification compared with previous black-box watermarking methods.
  • Unlike conventional watermarking methods assuming that the downstream task of the watermarked model is the same as the original one, the proposed method does not require a priori knowledge of the downstream task. As a result, the proposed method enables us to verify the ownership under white-box and stricter black-box settings, which is more applicable to practice compared with previous ones.
  • Extensive experimental results demonstrate that the proposed method has the satisfactory ability to verify the ownership of the target model and resist common watermark removal attacks, such as model fine-tuning [22] and model pruning [23], which has good application prospects.
The rest of this paper is organized as follows. We first provide preliminaries in Section 2, followed by the proposed method in Section 3. Experimental results and analysis are provided in Section 4. Finally, we conclude this paper in Section 5.

2. Preliminaries

2.1. Contrastive Learning

Contrastive learning is one of the most popular SSL techniques. SSL learns from unlabeled data and can be regarded as an intermediate form between supervised and unsupervised learning. In this paper, we watermark a pre-trained encoder via contrastive learning. To this end, we consider two representative contrastive learning algorithms, SimCLR [4] and MoCo v2 [3], which we briefly describe in the following. Note that the proposed method is not limited to these two algorithms.

2.1.1. SimCLR [4]

SimCLR aims to learn representations by maximizing the agreement between differently augmented views of the same sample via a contrastive loss in the latent space. Briefly, SimCLR consists of three modules: data augmentation, a feature encoder $f$ and a projection head $g$. Data augmentation transforms a sample $x \in \mathcal{X}$ into two augmented views $x_i$ and $x_j$, which are treated as a positive pair. The feature encoder $f$ extracts representation vectors from the augmented samples, e.g., $h_i = f(x_i)$ for the augmented sample $x_i$. The projection head $g$ maps representations to the space where the contrastive loss is applied, e.g., $z_i = g(h_i) = g(f(x_i))$. With $N > 0$ samples in a mini-batch, we obtain $2N$ augmented samples. Two augmented samples derived from the same sample constitute a positive pair; otherwise, they are treated as a negative pair. Let $\mathrm{sim}(u, v) = u^{\mathsf{T}} v / (\|u\|\,\|v\|)$ denote the dot product between $\ell_2$-normalized $u$ and $v$ (that is, the cosine similarity). The loss function for a positive pair $(z_i, z_j)$, with the remaining pairs $(z_i, z_k)$ serving as negatives, is then defined as [4]:
$$\mathcal{L}_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{I}_{[k \neq i]} \cdot \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}, \qquad (1)$$
where $\tau$ denotes a temperature parameter and $\mathbb{I}_{[k \neq i]} \in \{0, 1\}$ is an indicator function equal to 1 if, and only if, $k \neq i$.
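To make Equation (1) concrete, the following is a minimal PyTorch sketch of this NT-Xent loss. It assumes the two views of each sample occupy consecutive rows of the batch; the function name and default temperature are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss of Equation (1) for a batch of 2N projections, where rows
    2k and 2k+1 of z are the two augmented views of the k-th sample."""
    z = F.normalize(z, dim=1)                      # cosine similarity via dot products
    sim = torch.mm(z, z.t()) / tau                 # (2N, 2N) scaled similarity matrix
    mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))          # drop the k = i terms (indicator)
    pos_idx = torch.arange(z.size(0), device=z.device) ^ 1   # positives: 0<->1, 2<->3, ...
    return F.cross_entropy(sim, pos_idx)           # -log softmax at the positive index
```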

2.1.2. MoCo v2 [3]

As an improved version of MoCo, MoCo v2 consists of a query encoder $f_q(x; \theta_q)$ and a momentum key encoder $f_k(x; \theta_k)$. Its better performance is largely attributed to a large dictionary $K = \{k_1, k_2, \ldots, k_{|K|}\}$, which stores the feature vectors of previous batches as negative samples and is continuously updated. A batch of $N$ samples is encoded by $f_q$ as query features and, under different augmentations, simultaneously encoded by $f_k$ as keys. Suppose that there is a single key $k_+$ in the dictionary that an encoded query $q$ matches; the contrastive loss function is then defined as [3]:
$$\mathcal{L} = -\log \frac{\exp\left(q^{\mathsf{T}} k_+/\tau\right)}{\sum_{i=1}^{|K|} \exp\left(q^{\mathsf{T}} k_i/\tau\right)}. \qquad (2)$$
By minimizing the above contrastive loss, the parameters of $f_q$, denoted by $\theta_q$, are updated by back-propagation, while the parameters of $f_k$, denoted by $\theta_k$, are updated by:
$$\theta_k \leftarrow \lambda \theta_k + (1 - \lambda)\, \theta_q, \qquad (3)$$
where $\lambda \in [0, 1)$ is a momentum coefficient.
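The following is a compact PyTorch sketch of Equations (2) and (3), assuming the usual MoCo-style queue of negative keys; the tensor shapes, function names and default hyper-parameters are our own illustrative choices rather than the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(f_q: torch.nn.Module, f_k: torch.nn.Module, lam: float = 0.999):
    """Equation (3): theta_k <- lam * theta_k + (1 - lam) * theta_q."""
    for p_k, p_q in zip(f_k.parameters(), f_q.parameters()):
        p_k.data.mul_(lam).add_(p_q.data, alpha=1.0 - lam)

def moco_loss(q: torch.Tensor, k_pos: torch.Tensor, queue: torch.Tensor, tau: float = 0.2):
    """Equation (2): q and k_pos are (N, C) query/key features; queue is a (C, K)
    dictionary of negative keys taken from previous batches."""
    q, k_pos = F.normalize(q, dim=1), F.normalize(k_pos, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)    # (N, 1) positive logits
    l_neg = q @ queue                               # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```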

2.2. Problem and Threats

Mainstream black-box watermarking methods mainly focus on classification models such as [24,25]. A zero-bit watermark is often embedded into the model by enforcing the model to learn the mapping between carefully crafted samples and pre-determined labels. However, with the rise of contrastive learning, pre-trained encoders are treated as feature extractors for various downstream tasks, and the pre-training process relies on the SSL strategy rather than label-based supervised learning [26,27]. This indicates that traditional black-box watermarking techniques are not suitable for the pre-trained encoders in contrastive learning.
On the other hand, from the viewpoint of the adversary, he steals the encoder in an unauthorized way and adds new layers to build a specific downstream task. The adversary may not train the encoder from scratch, but he is able to modify the encoder, e.g., by fine-tuning or pruning. In this case, even the owner of the encoder does not know what datasets and tasks will be used downstream. As a result, when watermarking an encoder, it is necessary for the embedded watermark to be transferable and robust. Moreover, it is quite desirable that the ownership of the encoder can be verified under both the white-box condition and the black-box condition. This has motivated the authors to propose a novel adversarial watermarking method.

3. Proposed Method

Figure 1 shows the general framework of the proposed method. In the following, we provide the technical details.

3.1. Watermark Generation

The first step is to generate an adversarial perturbation $w_{\mathrm{adv}}$ as the watermark using the pre-trained encoder $E_\theta$. Since our task is encoder-oriented, the traditional perturbation optimization strategy based on classification labels is inapplicable. To deal with this problem, we use the embedding of a randomly selected image $x_{\mathrm{tar}}$ (called the key image) extracted by $E_\theta$, that is, $E_\theta(x_{\mathrm{tar}})$, to generate $w_{\mathrm{adv}}$ with a clean dataset $D$. The goal is to make the embedding of a perturbed image, denoted by $E_\theta(x_i + w_{\mathrm{adv}})$ (where $x_i \in D$, $i \in \{1, 2, \ldots, |D|\}$), cluster around $E_\theta(x_{\mathrm{tar}})$. In other words, the distance between $E_\theta(x_{\mathrm{tar}})$ and $E_\theta(x_i + w_{\mathrm{adv}})$ is expected to be as small as possible. To achieve this goal, we minimize the following loss during perturbation optimization:
$$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{x_i \in D}\left[1 - \mathrm{sim}\left(E_\theta(x_i + w_{\mathrm{adv}}),\, E_\theta(x_{\mathrm{tar}})\right)\right]. \qquad (4)$$
By back-propagation, $w_{\mathrm{adv}}$ can be generated without changing the parameters of $E_\theta$. Obviously, a different $x_{\mathrm{tar}}$ results in a different $w_{\mathrm{adv}}$. By using $x_{\mathrm{tar}}$ as a key, it is difficult for the adversary to forge the watermark. Some watermark generation methods that linearly superimpose fixed patterns [28] may also perturb the embedding of an image, but their effect on the behavior of the encoder is weaker than that of our method.
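A minimal sketch of this optimization loop is given below. It assumes pixel values normalized to [0, 1] (so the paper's $\epsilon = 15$ on the 0–255 scale becomes 15/255), an Adam optimizer for the perturbation, and the $\ell_\infty$ projection described in Remark 1; the function and argument names are illustrative, not the released implementation.

```python
import torch
import torch.nn.functional as F

def generate_watermark(encoder, loader, x_tar, eps=15 / 255.0, epochs=10, lr=0.01, device='cuda'):
    """Optimize an image-sized perturbation w_adv (Equation (4)) so that the embeddings
    of perturbed images cluster around the embedding of the key image x_tar.
    Only w_adv is updated; the encoder is frozen."""
    encoder = encoder.to(device).eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    with torch.no_grad():
        e_tar = F.normalize(encoder(x_tar.unsqueeze(0).to(device)), dim=1)
    w_adv = torch.zeros_like(x_tar, device=device).requires_grad_(True)
    opt = torch.optim.Adam([w_adv], lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            e_adv = F.normalize(encoder((x + w_adv).clamp(0, 1)), dim=1)
            loss = (1.0 - e_adv @ e_tar.t()).mean()      # L_adv = E[1 - sim(., .)]
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                        # l_inf projection (Remark 1)
                w_adv.clamp_(-eps, eps)
    return w_adv.detach()
```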

3.2. Watermark Embedding

After generating $w_{\mathrm{adv}}$, the next step is to embed $w_{\mathrm{adv}}$ into the pre-trained $E_\theta$. This is realized by further training $E_\theta$ according to a combined loss $\mathcal{L}_{\mathrm{comb}}$, which consists of two components: the contrastive loss $\mathcal{L}_{\mathrm{con}}$ and the watermarking loss $\mathcal{L}_{\mathrm{wat}}$. In other words, we have $\mathcal{L}_{\mathrm{comb}} = \mathcal{L}_{\mathrm{con}} + \alpha \mathcal{L}_{\mathrm{wat}}$, where $\alpha$ is a parameter balancing the two losses, set to $\alpha = 40$ by default. On one hand, $\mathcal{L}_{\mathrm{con}}$ has already been defined in Section 2, referring to Equations (1) and (2); note that $\mathcal{L}_{\mathrm{con}}$ needs to be adjusted when other contrastive learning algorithms are applied. On the other hand, $\mathcal{L}_{\mathrm{wat}}$ is defined as the Kullback–Leibler (KL) divergence [29] between the non-adversarial embedding and the adversarial embedding, both processed with the softmax function $\sigma$, that is,
$$\mathcal{L}_{\mathrm{wat}} = \mathbb{E}_{x_i \in D}\left[\mathrm{KL}\left(\sigma(E_\theta(x_i)),\, \sigma(E_\theta(x_i + w_{\mathrm{adv}}))\right)\right], \qquad (5)$$
where $x_i$ is sampled from the augmented dataset $D$. Thus, $E_\theta$ can be watermarked by updating its parameters during training. We will use $E_{\theta^+}$ to represent the marked version of $E_\theta$.
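A sketch of this embedding stage is shown below; the contrastive loss is passed in as a placeholder callable (Equation (1) or (2), depending on the algorithm), and the optimizer choice and argument names are our own assumptions rather than the authors' exact training script.

```python
import torch
import torch.nn.functional as F

def kl_softmax(clean_logits, adv_logits):
    """KL(sigma(E(x)) || sigma(E(x + w_adv))), averaged over the batch (Equation (5))."""
    p = F.softmax(clean_logits, dim=1)
    log_q = F.log_softmax(adv_logits, dim=1)
    return F.kl_div(log_q, p, reduction='batchmean')

def embed_watermark(encoder, loader, w_adv, contrastive_loss_fn, alpha=40.0,
                    epochs=50, lr=0.003, device='cuda'):
    """Fine-tune the pre-trained encoder with L_comb = L_con + alpha * L_wat."""
    encoder = encoder.to(device).train()
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            l_con = contrastive_loss_fn(encoder, x)                  # Equation (1) or (2)
            l_wat = kl_softmax(encoder(x), encoder((x + w_adv).clamp(0, 1)))
            loss = l_con + alpha * l_wat                             # combined loss L_comb
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```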

3.3. Watermark Verification

When the defender verifies whether a suspicious encoder infringes the intellectual property of the watermarked encoder, both the white-box scenario and the black-box scenario can be considered. In the white-box scenario, the defender has access to the target encoder, which may be either $E_\theta$ or $E_{\theta^+}$; namely, the defender can directly obtain the output of the target encoder for watermark verification. We use the average Jensen–Shannon (JS) divergence between the embeddings of a set of clean images and those of the corresponding adversarial images for similarity analysis:
$$T_{\mathrm{sim}} = 1 - \frac{1}{|D|}\sum_{i=1}^{|D|} \mathrm{JS}\left(\sigma(E_\theta(x_i)),\, \sigma(E_\theta(x_i + w_{\mathrm{adv}}))\right), \qquad (6)$$
where $D$ is the clean dataset used for watermark verification. The watermark is successfully verified if $T_{\mathrm{sim}}$ is not less than a threshold $t_s$; namely, we hope to keep $T_{\mathrm{sim}}$ as high as possible. In our experiments, $|D|$ is set to 1000 by default.
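A possible implementation of this white-box check is sketched below; the base-2 logarithm in the JS divergence (which bounds it by 1) and the data-loader interface are assumptions on our part.

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence per sample, using a base-2 log so values lie in [0, 1]."""
    m = 0.5 * (p + q)
    kl_pm = (p * (torch.log2(p + eps) - torch.log2(m + eps))).sum(dim=1)
    kl_qm = (q * (torch.log2(q + eps) - torch.log2(m + eps))).sum(dim=1)
    return 0.5 * (kl_pm + kl_qm)

@torch.no_grad()
def verify_white_box(encoder, loader, w_adv, device='cuda'):
    """T_sim of Equation (6): one minus the average JS divergence between the
    softmax-processed embeddings of clean and perturbed verification images."""
    encoder = encoder.to(device).eval()
    scores = []
    for x, _ in loader:
        x = x.to(device)
        p = F.softmax(encoder(x), dim=1)
        q = F.softmax(encoder((x + w_adv).clamp(0, 1)), dim=1)
        scores.append(js_divergence(p, q))
    return 1.0 - torch.cat(scores).mean().item()
```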
In the black-box scenario, given a suspicious downstream model $M$, the defender wants to verify whether $M$ is developed from $E_{\theta^+}$. The defender builds a clean dataset $D^*$ related to the downstream task. Then, the classification behavior on the downstream task is analyzed by:
$$T_{\mathrm{cls}} = 1 - \frac{1}{|D^*|}\sum_{i=1}^{|D^*|} \mathbb{I}\left[M(x_i^*) \neq M(x_i^* + w_{\mathrm{adv}})\right], \qquad (7)$$
where the function $\mathbb{I}[\cdot]$ returns 1 when the inner condition is true, and 0 otherwise. The watermark is successfully verified if $T_{\mathrm{cls}}$ is not less than a threshold $t_c$; namely, we hope to keep $T_{\mathrm{cls}}$ as high as possible. In our experiments, $|D^*|$ is set to 1000 by default.
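The black-box check can be sketched in the same style; here the suspicious downstream classifier and the loader over $D^*$ are placeholders we introduce for illustration.

```python
import torch

@torch.no_grad()
def verify_black_box(model, loader, w_adv, device='cuda'):
    """T_cls of Equation (7): the fraction of downstream predictions left unchanged
    when the watermark perturbation is added to the input."""
    model = model.to(device).eval()
    unchanged, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        pred_clean = model(x).argmax(dim=1)
        pred_adv = model((x + w_adv).clamp(0, 1)).argmax(dim=1)
        unchanged += (pred_clean == pred_adv).sum().item()
        total += x.size(0)
    return unchanged / total
```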
Remark 1.
As with many existing adversarial attack methods, $w_{\mathrm{adv}}$ is generated by back-propagation. The strength of $w_{\mathrm{adv}}$ is constrained by a threshold $\epsilon$: to ensure that $\|w_{\mathrm{adv}}\|_\infty \leq \epsilon$, the perturbed image $x_i + w_{\mathrm{adv}}$ is projected onto the norm-ball around $x_i$ with radius $\epsilon$; we use the $\ell_\infty$-norm and $\epsilon = 15$ by default. With this perturbation strength, the perturbed images can be effectively clustered around the key image $x_{\mathrm{tar}}$ in the embedding space. The implementation can be found in the released source code.

4. Experimental Results and Analysis

In our experiments, two benchmark datasets, CIFAR-10 [30] and ImageNet [31], are used to pre-train the encoder. Additionally, three benchmark datasets, STL-10 [32], GTSRB [33] and ImageNet, are used for the downstream task of the encoder. For ImageNet, we randomly select 30 semantic categories for training and another 10 categories, disjoint from the training ones, for testing. For the watermark generation phase, the pre-trained encoders use ResNet-18 and ResNet-50 [34] as the base models for SimCLR and MoCo v2, respectively, and the strength of the adversarial watermark is set to ε = 15. For the watermark embedding phase, we set the batch size to 50 for SimCLR and 32 for MoCo v2. Meanwhile, the total number of epochs is set to 50 and the learning rate is set to 0.003. Additionally, the key image is randomly selected from ImageNet and belongs to the semantic category “Plane” (see Figure 1); this image does not appear in the training set. We implement the experiments with PyTorch, accelerated by a single RTX 3080 GPU. To validate and reproduce our experiments, the code is released at https://github.com/fc88zhang/AWEncoder (accessed on 18 July 2022).

4.1. Effectiveness

We first provide examples of the adversarial images (Phase III in Figure 1). As shown in Figure 2, due to the diversity of the images, different images suffer different degrees of degradation after adding the adversarial perturbation, but the visual quality is satisfactory in all cases, verifying the applicability of the adversarial perturbation. Admittedly, one may use other perturbation strategies, which is not our main focus.
Then, we evaluate the effectiveness of the watermarked encoder in both black-box and white-box scenarios. The encoders are pre-trained on two training datasets and with two contrastive learning algorithms. We compare WPE [26] and the proposed AWEncoder on three downstream tasks with black-box access. The results are shown in Table 1, where “CE” is short for clean encoder and “WE” is short for watermarked encoder. The fifth column shows the classification accuracy on the downstream task before and after watermarking. The last column gives T_cls before/after watermarking, together with the absolute difference. It can be inferred that although there is a slight decrease in classification accuracy on the downstream task after watermarking, judging by the gap between T_cls for CE and WE, AWEncoder is more discriminative.
In the white-box scenario, we verify the ownership by contrasting the embedding similarity of clean images and watermarked images. Table 2 shows that the similarity score of WE is much higher than that of CE, which indicates that the similarity score can be used for reliable verification. When the watermark is generated with an incorrect key image (“Dog” in Figure 1 is used in our experiment), T_sim is much lower than with the correct one. This indicates that AWEncoder has high security.
In addition, the selection of the key image is an important factor for the effectiveness of watermarking. In the experiment, we randomly select an image from the semantic category “Plane” as the key image. To rule out the influence of the randomness of the key image, we randomly select several images from “Plane”, and further select images from other semantic categories (such as “Cat” and “Dog”) as key images to test the effectiveness of the watermark. The experimental results are shown in Table 3. Selecting different key images has little influence on the effectiveness of watermarking, so in our experiments we randomly select Plane 1 in the category “Plane” as the key image. Examples of the different key images used in our experiments are shown in Figure 3.

4.2. Uniqueness

To evaluate uniqueness, we generate forged watermarks by different means, including replacing Plane with Dog as the key image (see Figure 1), changing the value of ε, and replacing the pre-trained encoder based on ResNet-18 with a surrogate pre-trained encoder based on ResNet-50 [34]. Table 4 shows the results, in which the similarity score and the classification score of the correct watermark are much higher than those of the incorrect ones. This indicates that the proposed method provides superior watermarking uniqueness.

4.3. Robustness

In practice, adversaries may perform removal attacks, such as fine-tuning and pruning, to erase the watermark. To quantify the robustness of AWEncoder, we consider two common removal methods: fine-tuning all layers (FTAL) and retraining all layers (RTAL). FTAL fine-tunes the entire encoder with the training dataset, while RTAL retrains the entire encoder with the downstream training dataset. In addition, we remove the parameters with the smallest L1-norms to prune the encoder. We verify the robustness of AWEncoder in both white-box and black-box settings. The results in Table 5 and Table 6 demonstrate that, in the white-box setting, pruning has only a small effect on the watermarked encoder. Although fine-tuning narrows the gap in T_sim between the clean and watermarked encoders to a certain extent, AWEncoder can still effectively verify the ownership with a suitable threshold. In brief, AWEncoder is capable of resisting common removal attacks.
For the black-box scenario, we compare AWEncoder with WPE by applying different attacks. The results are shown in Table 7 and Table 8. It can be easily inferred that both pruning and fine-tuning will compromise the watermarking performance; however, AWEncoder is more robust than WPE against the removal attacks, which verifies the superiority of AWEncoder.

5. Conclusions

In this paper, we proposed AWEncoder, an effective copyright protection technique for pre-trained encoders in contrastive learning, which can not only be applied in both white-box and black-box scenarios, but can also be transferred to several downstream tasks. Compared to existing encoder watermarking, AWEncoder significantly improves effectiveness and robustness. In the future, we will extend the proposed method to federated learning and ensemble learning, two other popular learning strategies widely applied in deep learning.

Author Contributions

Conceptualization, T.Z.; methodology, T.Z., G.S. and H.W.; software, T.Z. and G.S.; validation, G.S. and X.L.; supervision, H.W.; project administration, G.H. and H.W.; funding acquisition, H.W., X.L. and G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC) under Grant No. 61902235, the Scientific and Technological Innovation Plan of Shanghai STC under Grant No. 21511102605, and the Wuxi Municipal Health Commission Translational Medicine Research Project under Grant No. ZH202102.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jing, L.; Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4037–4058. [Google Scholar] [CrossRef] [PubMed]
  2. Grill, J.-B.; Strub, F.; Altch, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. Proc. Neural Inf. Process. Syst. 2020, 33, 21271–21284. [Google Scholar]
  3. Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297. [Google Scholar]
  4. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  5. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735. [Google Scholar]
  6. Orekondy, T.; Schiele, B.; Fritz, M. Knockoff Nets: Stealing functionality of black-box models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4949–4958. [Google Scholar]
  7. Chandrasekaran, V.; Chaudhuri, K.; Giacomelli, I.; Jha, S.; Yan, S. Exploring connections between active learning and model extraction. In Proceedings of the 29th USENIX Conference on Security Symposium, Boston, MA, USA, 12–14 August 2020; pp. 1309–1326. [Google Scholar]
  8. Fan, L.; Ng, K.W.; Chan, C.S. Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks. Proc. Neural Inf. Process. Syst. 2019, 32, 4714–4723. [Google Scholar]
  9. Zhang, J.; Chen, D.; Liao, J.; Fang, H.; Zhang, W.; Zhou, W.; Cui, H.; Yu, N. Model watermarking for image processing networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12805–12812. [Google Scholar]
  10. Chen, J.; Wang, J.; Peng, T.; Sun, Y.; Cheng, P.; Ji, S.; Ma, X.; Li, B.; Song, D. Copy, Right? A testing framework for copyright protection of deep learning models. arXiv 2021, arXiv:2112.05588. [Google Scholar]
  11. Wu, H.; Liu, G.; Yao, Y.; Zhang, X. Watermarking neural networks with watermarked images. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 2591–2601. [Google Scholar] [CrossRef]
  12. Uchida, Y.; Nagai, Y.; Sakazawa, S.; Satoh, S. Embedding watermarks into deep neural networks. In Proceedings of the ACM International Conference on Multimedia Retrieval, Bucharest, Romania, 6–9 June 2017; pp. 269–277. [Google Scholar]
  13. Namba, R.; Sakuma, J. Robust watermarking of neural network with exponential weighting. In Proceedings of the ACM Asia Conference on Computer and Communications Security, Auckland, New Zealand, 9–12 July 2019; pp. 228–240. [Google Scholar]
  14. Wang, J.; Wu, H.; Zhang, X.; Yao, Y. Watermarking in deep neural networks via error back-propagation. In Proceedings of the IS&T Electronic Imaging, Media Watermarking, Security and Forensics, Burlingame, CA, USA, 26–30 January 2020; pp. 1–8. [Google Scholar]
  15. Rouhani, B.D.; Chen, H.; Koushanfar, F. Deepsigns: An end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 485–497. [Google Scholar]
  16. Zhao, X.; Yao, Y.; Wu, H.; Zhang, X. Structural watermarking to deep neural networks via network channel pruning. In Proceedings of the IEEE Workshop on Information Forensics and Security, Montpellier, France, 7–10 December 2021; pp. 1–6. [Google Scholar]
  17. Adi, Y.; Baum, C.; Cisse, M.; Pinkas, B.; Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In Proceedings of the 27th USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018; pp. 1615–1631. [Google Scholar]
  18. Gu, T.; Liu, K.; Dolan-Gavitt, B.; Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access 2019, 7, 47230–47244. [Google Scholar] [CrossRef]
  19. Jia, J.; Liu, Y.; Gong, N.Z. Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning. arXiv 2021, arXiv:2108.00352. [Google Scholar]
  20. Merrer, E.L.; Perez, P.; Trédan, G. Adversarial frontier stitching for remote neural network watermarking. Neural Comput. Appl. 2020, 32, 9233–9244. [Google Scholar] [CrossRef] [Green Version]
  21. Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1765–1773. [Google Scholar]
  22. Liu, K.; Dolan-Gavitt, B.; Garg, S. Fine-pruning: Defending against backdooring attacks on deep neural networks. In Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Crete, Greece, 10–12 September 2018; pp. 273–294. [Google Scholar]
  23. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient convnets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
  24. Jia, H.; Choquette-Choo, C.A.; Chandrasekaran, V.; Papernot, N. Entangled watermarks as a defense against model extraction. In Proceedings of the 30th USENIX Security Symposium, Virtual, 11–13 August 2021; pp. 1937–1954. [Google Scholar]
  25. Zhang, J.; Gu, Z.; Jang, J.; Wu, H.; Stoecklin, M.P.; Huang, H.; Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the ACM Asia Conference on Computer and Communications Security, Melbourne, VI, Australia, 10–14 July 2018; pp. 159–172. [Google Scholar]
  26. Wu, Y.; Qiu, H.; Zhang, T.; Li, J.; Qiu, M. Watermarking pre-trained encoders in contrastive learning. arXiv 2022, arXiv:2201.08217. [Google Scholar]
  27. Cong, T.; He, X.; Zhang, Y. SSLGuard: A watermarking scheme for self-supervised learning pre-trained encoders. arXiv 2022, arXiv:2201.11692. [Google Scholar]
  28. Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017, arXiv:1712.05526. [Google Scholar]
  29. Zhang, H.; Yu, Y.; Jiao, J.; Xing, E.; Ghaoui, L.E.; Jordan, M. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7472–7482. [Google Scholar]
  30. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Personal communication; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  31. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
  32. Coates, A.; Ng, A.; Lee, H. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 June 2011; pp. 215–223. [Google Scholar]
  33. Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. The German traffic sign recognition benchmark: A multi-class classification competition. In Proceedings of the International Joint Conference Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 1453–1460. [Google Scholar]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
Figure 1. General framework for the proposed method. We use the same dataset for Phase I, Phase II and Phase III (a) so that the watermark can be successfully embedded into E θ while avoiding degrading the encoding performance of the encoder. In Phase II, each clean image is augmented into two images, from which only one randomly selected image was used for generating the corresponding adversarial image that will be used for watermark embedding.
Figure 2. Some examples for the adversarial image: (a,d,g) are clean images randomly selected from GTSRB, ImageNet, and STL-10, respectively; (b,e,h) are adversarial with SimCLR; (c,f,i) are adversarial with MoCo v2.
Figure 3. Some examples of the key images: (a,c,d) are different key images randomly selected in “Plane” and (b,e) are the key images selected in “Cat” and “Dog”.
Table 1. Effectiveness evaluation under the black-box condition. Diff = |CE − WE|.

| Pre-Trained Dataset | Encoder | Downstream Dataset | Method | Accuracy CE/WE | T_cls CE/WE/Diff |
|---|---|---|---|---|---|
| ImageNet | SimCLR (ResNet-18) | ImageNet | WPE | 74.5%/73.4% | 25.5%/86.4%/60.9% |
| | | | AWEncoder | 74.5%/71.1% | 30.5%/96.1%/65.6% |
| | | GTSRB | WPE | 77.8%/77.5% | 31.7%/90.4%/58.7% |
| | | | AWEncoder | 77.8%/75.5% | 16.8%/88.9%/72.1% |
| | | STL-10 | WPE | 64.7%/64.1% | 27.3%/91.6%/64.3% |
| | | | AWEncoder | 64.7%/61.2% | 13.7%/93.9%/80.2% |
| CIFAR-10 | MoCo v2 (ResNet-50) | ImageNet | WPE | 73.0%/72.9% | 19.3%/80.1%/60.8% |
| | | | AWEncoder | 73.0%/70.5% | 30.9%/94.9%/64.0% |
| | | GTSRB | WPE | 84.5%/83.7% | 14.9%/84.2%/69.3% |
| | | | AWEncoder | 84.5%/84.0% | 12.8%/90.2%/77.4% |
| | | STL-10 | WPE | 70.5%/68.9% | 11.5%/80.9%/69.4% |
| | | | AWEncoder | 70.5%/68.2% | 17.6%/92.7%/75.1% |
Table 2. Effectiveness evaluation under the white-box condition.

| Model | | T_sim (Correct Watermark) | T_sim (Incorrect Watermark) |
|---|---|---|---|
| SimCLR | CE | 0.14 | 0.11 |
| | WE | 0.89 | 0.27 |
| MoCo v2 | CE | 0.26 | 0.25 |
| | WE | 0.98 | 0.33 |
Table 3. Effectiveness evaluation of different images as the key image.

| Downstream Dataset | Setting | SimCLR WE T_cls | SimCLR WE T_sim | MoCo v2 WE T_cls | MoCo v2 WE T_sim |
|---|---|---|---|---|---|
| GTSRB | Plane 1 | 88.9% | 0.89 | 90.2% | 0.98 |
| | Plane 2 | 87.8% | 0.89 | 89.6% | 0.97 |
| | Plane 3 | 87.7% | 0.88 | 90.0% | 0.98 |
| | Dog | 87.4% | 0.86 | 89.5% | 0.97 |
| | Cat | 88.0% | 0.88 | 90.1% | 0.98 |
Table 4. Uniqueness evaluation under different settings.

| Downstream Dataset | Encoder | Setting | SimCLR (ImageNet) T_cls | SimCLR (ImageNet) T_sim | MoCo v2 (CIFAR-10) T_cls | MoCo v2 (CIFAR-10) T_sim |
|---|---|---|---|---|---|---|
| GTSRB | Pre-trained encoder | Plane, ε = 15 | 88.9% | 0.89 | 90.2% | 0.98 |
| | | Plane, ε = 20 | 19.4% | 0.17 | 24.6% | 0.25 |
| | | Dog, ε = 15 | 29.4% | 0.22 | 30.6% | 0.27 |
| | Surrogate encoder | Plane, ε = 15 | 33.3% | 0.27 | 37.1% | 0.31 |
Table 5. Robustness against pruning (under the white-box condition).

| Pruning Ratio | SimCLR T_sim CE/WE | MoCo v2 T_sim CE/WE |
|---|---|---|
| - | 0.14/0.89 | 0.26/0.98 |
| 0.2 | 0.16/0.88 | 0.28/0.96 |
| 0.4 | 0.20/0.84 | 0.29/0.92 |
| 0.6 | 0.25/0.76 | 0.33/0.85 |
| 0.8 | 0.28/0.70 | 0.35/0.80 |
Table 6. Robustness against fine-tuning (under the white-box condition).

| Fine-Tuning | SimCLR T_sim CE/WE | MoCo v2 T_sim CE/WE |
|---|---|---|
| - | 0.14/0.89 | 0.26/0.98 |
| FTAL | 0.17/0.74 | 0.30/0.84 |
| RTAL | 0.24/0.68 | 0.34/0.76 |
Table 7. Robustness in black-box verification against pruning. Diff = |CE − WE|.

SimCLR (ImageNet):

| Downstream Dataset | Pruning Ratio | Method | Accuracy CE/WE | T_cls CE/WE/Diff |
|---|---|---|---|---|
| GTSRB | 0.2 | WPE | 79.8%/80.3% | 30.8%/84.5%/53.7% |
| | | AWEncoder | 79.8%/78.0% | 19.9%/85.4%/65.5% |
| | 0.4 | WPE | 77.9%/77.2% | 29.6%/79.2%/49.6% |
| | | AWEncoder | 77.9%/76.3% | 19.5%/80.8%/61.3% |
| | 0.6 | WPE | 71.2%/70.5% | 27.9%/68.0%/40.1% |
| | | AWEncoder | 71.2%/68.5% | 20.5%/79.6%/59.1% |
| | 0.8 | WPE | 64.9%/64.8% | 27.5%/65.7%/38.2% |
| | | AWEncoder | 64.9%/63.8% | 22.1%/78.4%/56.3% |

MoCo v2 (CIFAR-10):

| Downstream Dataset | Pruning Ratio | Method | Accuracy CE/WE | T_cls CE/WE/Diff |
|---|---|---|---|---|
| GTSRB | 0.2 | WPE | 84.0%/83.6% | 15.6%/81.3%/65.7% |
| | | AWEncoder | 84.0%/82.9% | 14.8%/85.4%/70.6% |
| | 0.4 | WPE | 76.9%/78.8% | 19.4%/74.2%/54.8% |
| | | AWEncoder | 76.9%/77.5% | 16.3%/80.1%/63.8% |
| | 0.6 | WPE | 71.0%/72.1% | 17.5%/61.9%/44.4% |
| | | AWEncoder | 71.0%/70.0% | 23.4%/76.3%/52.9% |
| | 0.8 | WPE | 64.6%/65.7% | 18.4%/53.7%/35.3% |
| | | AWEncoder | 64.6%/62.6% | 22.1%/73.3%/51.2% |
Table 8. Robustness in black-box verification against fine-tuning. Diff = |CE − WE|.

SimCLR (ImageNet):

| Fine-Tuning | Downstream Dataset | Method | Accuracy CE/WE | T_cls CE/WE/Diff |
|---|---|---|---|---|
| FTAL | ImageNet | WPE | 74.8%/74.3% | 22.3%/74.0%/51.7% |
| | | AWEncoder | 74.8%/72.1% | 30.6%/92.4%/61.8% |
| | GTSRB | WPE | 65.7%/66.2% | 32.8%/61.7%/28.9% |
| | | AWEncoder | 65.7%/63.2% | 23.9%/82.4%/58.5% |
| | STL-10 | WPE | 63.4%/62.7% | 31.0%/86.7%/55.7% |
| | | AWEncoder | 63.4%/60.8% | 14.6%/89.3%/74.7% |
| RTAL | ImageNet | WPE | 94.5%/94.3% | 21.6%/60.5%/38.9% |
| | | AWEncoder | 94.5%/92.7% | 39.8%/81.4%/41.6% |
| | GTSRB | WPE | 98.5%/98.9% | 29.7%/55.0%/25.3% |
| | | AWEncoder | 98.5%/97.6% | 37.6%/78.4%/40.8% |
| | STL-10 | WPE | 83.1%/82.2% | 24.6%/50.3%/25.7% |
| | | AWEncoder | 83.1%/82.7% | 34.1%/80.0%/45.9% |

MoCo v2 (CIFAR-10):

| Fine-Tuning | Downstream Dataset | Method | Accuracy CE/WE | T_cls CE/WE/Diff |
|---|---|---|---|---|
| FTAL | ImageNet | WPE | 78.3%/77.2% | 18.2%/77.6%/59.4% |
| | | AWEncoder | 78.3%/76.9% | 30.1%/90.9%/60.8% |
| | GTSRB | WPE | 89.1%/87.5% | 15.9%/55.8%/39.9% |
| | | AWEncoder | 89.1%/87.7% | 19.3%/85.8%/66.5% |
| | STL-10 | WPE | 71.3%/70.5% | 13.2%/67.9%/54.7% |
| | | AWEncoder | 71.3%/70.1% | 18.8%/81.1%/62.3% |
| RTAL | ImageNet | WPE | 97.1%/96.6% | 22.3%/61.8%/39.5% |
| | | AWEncoder | 97.1%/96.9% | 35.2%/75.8%/40.6% |
| | GTSRB | WPE | 98.0%/97.3% | 16.0%/50.7%/34.7% |
| | | AWEncoder | 98.0%/97.1% | 33.8%/82.1%/48.3% |
| | STL-10 | WPE | 89.2%/87.5% | 18.7%/43.5%/24.8% |
| | | AWEncoder | 89.2%/85.2% | 34.1%/75.2%/41.1% |