Article

Active Defense for Deepfakes Using Watermark-Guided Original Face Recovery

College of Cyber Security, Jinan University, Guangzhou 510632, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 625; https://doi.org/10.3390/electronics15030625
Submission received: 14 December 2025 / Revised: 21 January 2026 / Accepted: 27 January 2026 / Published: 2 February 2026
(This article belongs to the Special Issue Image Processing for Intelligent Electronics in Multimedia Systems)

Abstract

At present, active defense strategies based on digital watermarking mainly rely on post-event watermark extraction, which verifies the occurrence of deepfake events by measuring the degree of watermark degradation, or on adversarial watermarks that interfere with image generation. To overcome these limitations, we propose a unified watermarking framework that can restore the original content of images tampered with by deepfakes. The scheme integrates three core components: an encoder for watermark pre-embedding, a decoder for robust watermark extraction, and a face restorer for watermark-guided image restoration. Extensive experiments show that the method achieves high watermark extraction accuracy and strong recovery performance, verifying the effectiveness of the approach.

1. Introduction

With the continuous advancement of deep learning, an increasing number of researchers have entered this field, driving breakthroughs across multiple domains, such as remote sensing image scene classification [1], deepfake detection [2,3,4], and vulnerability detection [5]. Deepfake technology can tamper with human faces, producing facial images that differ noticeably from the originals [6,7,8,9,10]. The societal impact of deepfakes is profound, and their potential risks are multifaceted. For instance, malicious actors may manipulate facial images of political figures to disrupt public order or engage in other harmful activities [11,12]. Consequently, developing effective defenses against deepfakes is imperative.
Defense against deepfakes mainly consists of proactive and passive approaches. Passive defense [13,14] focuses on capturing the subtle traces left by facial tampering to achieve security goals. Yu et al. [15] proposed a method that provides auxiliary detection for invisible face forgery through cross-domain comparison. However, because no universal model exists, passive approaches may not robustly cover images generated by known or unknown forgery tools. Lu et al. [16] designed a spatiotemporal model that treats spatial and temporal cues as a whole and synchronously captures forgery traces across frames and regions. Nie et al. [17] proposed a diffusion-based inconsistent pattern learning mechanism designed to uncover subtle contradictions at the temporal level.
However, these passive detection technologies cannot prevent deepfakes from occurring and can often only verify tampering after the fact. Proactive technologies can therefore play a more effective role in this regard. Proactive deepfake defense is generally achieved by adding perturbations [18,19] to the image or by embedding watermarks [20,21].
Watermarks can be further divided into three mainstream types: adversarial watermarks, fragile watermarks, and robust watermarks. All three are pre-embedded and then used to detect and defend against deepfakes. Adversarial watermarking interferes with the generation of deepfakes, resisting them at their source. Lv et al. [20] proposed an adversarial watermarking technique that embeds imperceptible perturbations in images, causing deepfake models to produce blurred outputs with degraded semantic fidelity, thereby enhancing both visual and automatic anti-counterfeiting capabilities. Fragile watermarking deliberately allows the watermark to be destroyed by a deepfake, enabling post-event comparison and detection. Neekhara et al. [22] designed a watermarking scheme that remains robust to ordinary interference but is vulnerable to deepfakes. Robust watermarks maintain a high extraction rate under a wide range of distortions, making them suitable for post-event tracing. Wu et al. [23] constructed a watermarking scheme that effectively resists interference across different deepfake scenarios, thereby maintaining the watermark’s robustness.
While the above-mentioned schemes offer some resistance against deepfakes, they generally lack the capability to restore manipulated images. Recent research, however, has begun to address this gap. For example, Yu et al. [24] proposed DFREC, a deepfake identity recovery method that simultaneously reconstructs source and target faces using identity segmentation and dual-path reconstruction, achieving high fidelity and traceability. Similarly, Ai et al. [25] introduced DeepReversion, an end-to-end UNet-based network that learns an inverse mapping from fake to original faces, enabling high-fidelity facial inversion and identity recovery. Shao et al. [26] further advanced the field with SeqFakeFormer, which performs sequential prediction of facial manipulations, allowing both the detection of tampered sequences and high-fidelity restoration of original faces.
It is important to clarify that “restoration” in this context refers primarily to the semantic recovery of facial attributes and overall visual content, rather than pixel-level reconstruction of high-frequency details. Moreover, these approaches typically operate without access to true attribute information, which limits their recovery precision. Critically, they also lack the ability to perform copyright verification. Therefore, a unified framework capable of simultaneous copyright verification, semantic content recovery, and forensic traceability remains urgently needed.
In this paper, we propose a unified watermarking framework that integrates the aforementioned functionalities. Our approach employs an encoder−decoder architecture to imperceptibly embed and reliably extract a robust watermark. A dedicated face restorer is further developed to recover images altered by deepfake manipulations. Crucially, the embedded watermark is designed to maintain high robustness even after deepfake processing. The main contributions of this work are as follows:
1.
A facial image recovery framework leveraging auxiliary watermark information has been developed. The framework restores deepfake images by extracting and utilizing guidance information from watermarks present in the images.
2.
A two-stage training procedure was implemented. First, an encoding and decoding architecture was fine-tuned to enhance the robustness of the watermark against both deepfake manipulations and subsequent image recovery operations. Subsequently, a face restorer network was fine-tuned, enabling it to recover deepfake-corrupted images under the guidance of the embedded watermarks.
3.
A dual watermarking framework is proposed. The first-layer watermark is used to provide the labels required for face recovery and can maintain robustness after various sources of noise interference. The second-layer watermark is used to hide additional information by extending the bit length. This integrated approach enables the watermarking system to support both face recovery and forensic traceability.

2. Related Work

2.1. Deepfakes Based on Attribute Editing

In recent years, the continuous advancement of deep learning techniques has driven the rapid evolution of deepfake technologies. Among them, deepfakes based on facial attribute editing [27,28] have gradually attracted widespread attention. This paradigm typically takes a host image and a target attribute vector as inputs, and outputs a forged image exhibiting noticeable alterations in hair color, facial characteristics, and other identity-related features, thereby achieving identity spoofing. StarGAN [6] is a representative method within this category; it attains high-fidelity generation via adversarial training between a generator and a discriminator. Similar frameworks include AttGAN [8], CafeGAN [7], PA-GAN [29], etc., all of which annotate specific facial attributes and leverage these annotations to guide the synthesis of manipulated images. Inspired by these approaches, our solution employs watermarks as supervisory labels to steer the recovery of deepfake images back to the original host image. Three representative loss functions used in attribute editing are given below:
$$\mathcal{L}_{cls} = \mathbb{E}_{I, V_{tar}}\left[-\log\left(Dis\left(V_{tar} \mid G(I, V_{tar})\right)\right)\right] + \mathbb{E}_{I, V_{ori}}\left[-\log\left(Dis\left(V_{ori} \mid I\right)\right)\right]$$
$$\mathcal{L}_{rec} = \mathbb{E}_{I, V_{tar}, V_{ori}}\left[\left\| I - G\left(G(I, V_{tar}), V_{ori}\right)\right\|_1\right]$$
$$\mathcal{L}_{adv} = \mathbb{E}_{I}\left[\log Dis(I)\right] + \mathbb{E}_{I, V_{tar}}\left[\log\left(1 - Dis\left(G(I, V_{tar})\right)\right)\right]$$
Among them, $I$ is the host image, $V_{tar}$ is the target domain label, $V_{ori}$ is the source domain label, $G$ is the deepfake generator, and $Dis$ is the deepfake discriminator.
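As an illustration, the three losses can be sketched on single-sample scalar outputs (a toy sketch with our own function names; real training operates on batched image tensors and discriminator networks):

```python
import math

def cls_loss(p_tar_given_fake, p_ori_given_real):
    # attribute-classification loss (Eq. (1)) for one sample, where the
    # arguments are discriminator-estimated probabilities in (0, 1]
    return -math.log(p_tar_given_fake) - math.log(p_ori_given_real)

def rec_loss(host, cycled):
    # L1 cycle-reconstruction loss (Eq. (2)) between the host image I and
    # G(G(I, V_tar), V_ori), both flattened to pixel lists here
    return sum(abs(h - c) for h, c in zip(host, cycled)) / len(host)

def adv_loss(p_real, p_fake):
    # adversarial loss (Eq. (3)) as seen from the discriminator side
    return math.log(p_real) + math.log(1.0 - p_fake)
```

A perfect cycle reconstruction drives `rec_loss` to zero, and confident correct classifications drive `cls_loss` to zero; the generator and discriminator push `adv_loss` in opposite directions.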

2.2. Proactive Defense Against Deepfakes Based on Digital Watermarking

With the abuse of deepfake technology, its defense countermeasures have become an urgent research topic that must be addressed. Compared with passive detection, active defense mechanisms have received extensive attention due to their advantages of proactive and real-time performance. Against this backdrop, the active defense paradigm based on digital watermarking, as a proactive solution, has become an important research direction in this field. Wu et al. [23] embedded watermarks in images in advance, achieving real-time detection of deepfakes through fragile watermarks, and then achieving post-event tracking of deepfakes by training robust watermarks. Zhang et al. [21] implemented an architectural solution that can first counter deepfakes and still maintain the robustness of the watermark when the countermeasure fails. This achieves dual protection based on the watermark, ensuring that the watermark can be successfully extracted after a deepfake occurs. Nadimpalli et al. [30] proposed an active deepfake detection technology based on GAN visible watermarks. By adding reconstruction regularization to the loss function, the generated fake images come with detectable watermarks. This further enables active detection of deepfakes based on watermarks. Wang et al. [31] designed a robust scheme. By using chaotic encryption and jointly training the encoder−decoder, they embedded identity semantics as watermarks into images, achieving high-precision active detection and source tracing of deepfake face-changing. Furthermore, they proposed a “LampMark” keypoint robust perception watermark. Through end-to-end embedding in real-person images, deepfakes can be actively detected by restoring consistency between the watermark and the test image [32].
Although the current defense schemes can achieve effective deepfake defense, their limitation lies in the fact that once the image is tampered with, it loses its value because it cannot be recovered. Therefore, this research aims to propose a reliable watermarking mechanism. This mechanism should not only utilize the watermark information to guide the restoration of image content, but also ensure the robustness of the watermark throughout the entire process, thereby ensuring the security of the entire procedure.

3. Security Objective

In the envisioned scenario, given that facial images contain highly sensitive information such as identity, physiological features, and behavioral patterns, attackers can generate realistic deepfake content based on them, thereby triggering risks such as identity forgery, reputation damage, and social engineering attacks. Therefore, it is urgently necessary to build a proactive defense mechanism for facial images to weaken or block the information sources of potential deepfake attacks at the data level.
Therefore, we propose a digital watermarking-centered propagation process protection paradigm: before transmission, a robust, imperceptible watermark carrying attribute information is embedded in the protected image, and the corresponding decoder is provided to the user. Users can utilize this decoder to extract the embedded watermark, parse its specific bit segments to obtain image attributes, and thereby restore the original content. The overall pipeline is illustrated in Figure 1. Mathematically, we have the following:
$$W_{enc}(I_h, w) \rightarrow I_w$$
$$G(I_w) \rightarrow I_f$$
$$W_{dec}(I_f) \rightarrow w'$$
$$A(w') \rightarrow A_{ext}$$
$$Gen(I_f, A_{ext}) \rightarrow I_{res}$$
Among them, $W_{enc}$ is the watermark encoder, $W_{dec}$ is the watermark decoder, $Gen$ is the face restorer, $I_h$ is the host image, $I_w$ is the encoded image after embedding the watermark, $w$ is the embedded watermark, $w'$ is the extracted watermark, $A$ is the function corresponding to Equation (12), $A_{ext}$ is the attribute label recovered from the watermark, $I_f$ is the image tampered with by a deepfake, $G$ is the generator used for deepfake generation, and $I_{res}$ is the recovered image.
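The pipeline above is a straightforward composition of the five components; a minimal sketch in which every component is a caller-supplied stand-in rather than the actual network:

```python
def protect_and_recover(I_h, w, W_enc, G, W_dec, A, Gen):
    """End-to-end propagation-protection pipeline; all five components
    are caller-supplied callables (stubs in this sketch)."""
    I_w = W_enc(I_h, w)      # embed the watermark before transmission
    I_f = G(I_w)             # attacker applies a deepfake manipulation
    w_ext = W_dec(I_f)       # robust watermark extraction
    A_ext = A(w_ext)         # parse attribute-label bits from the watermark
    I_res = Gen(I_f, A_ext)  # watermark-guided restoration
    return I_res
```

With identity-like stubs, the function simply threads the host image and watermark through all five stages, which is how the real networks are chained during inference.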

4. Proposed Method

This section elaborates the overall idea, the technical route, and the implementation details of the proposed solution.

4.1. Framework Overview

We employ a robust watermarking technique to realize the envisioned protection mechanism. Specifically, we adopt the end-to-end embedding–extraction pipeline proposed in [23] as our backbone framework, given its ability to maintain high watermark fidelity even under facial attribute manipulations. The overall architecture, depicted in Figure 2, comprises three key components: an encoder, a decoder, and a face restorer. The encoder embeds a watermark sequence containing facial attribute labels and copyright information into the original facial image. The decoder then extracts this embedded information from a potentially distorted or manipulated image. Finally, the face restorer reconstructs the original facial appearance using the extracted data.

4.2. Network Architecture

4.2.1. Encoder and Decoder

The encoder embeds the watermark through two coordinated parallel paths. The first path extracts the bottleneck feature representation of the host image using a UNet structure; the second path diffuses the watermark bits to different resolution levels through multi-level diffusion and upsampling modules. In the upsampling path of the UNet, the features extracted by the contraction layers of the first path and the expanded watermark features of the second path are fused into the corresponding expansion features. In addition, an SE-block is integrated with the ResBlock in the upsampling stage to efficiently complete watermark embedding.
The decoder is composed of a UNet structure, a 1 × 1 convolutional layer, and a downsampling operation in sequence. This decoder has been specially trained to enhance the robustness of the watermark extraction process, ensuring accurate restoration of watermark information even under complex conditions.

4.2.2. Face Restorer

The input to this face restorer is the concatenation tensor of a three-channel image and a conditional vector. First, the initial feature mapping is completed through a 7 × 7 convolution, and then high-dimensional bottleneck features are extracted by downsampling. Multiple residual blocks are stacked to enhance the feature representation. Subsequently, the feature dimensions are restored through transposed convolution upsampling. Finally, the image is output by a 7 × 7 convolution and an activation function. The conditional vector participates in the feature transformation throughout the process after spatial replication.
Note that when training the face restorer, we also incorporate a PatchGAN [33] discriminator for adversarial joint training to improve image quality.

4.3. Dual Watermarking Mechanism

In this section, we will elaborate on the proposed dual watermarking mechanism.

4.3.1. Dual Watermarking Construction

The dual watermark $w = (w^1, w^2, \ldots, w^n)$ is composed of sign bits carrying the facial-attribute label $A_{att} \in \{0,1\}$ and extended bits carrying $A_{copy} \in \{0,1\}$: the first layer provides the labels required for face restoration, and the second layer hides additional information by extending the bit length.
The first layer construction. For the first layer, we set the watermark according to the following formula:
$$w_{1st}^i = \begin{cases} +0.1, & \text{if } A_{att}^i = 1 \\ -0.1, & \text{if } A_{att}^i = 0 \end{cases}$$
Among them, $A_{att}^i$ is the $i$-th facial attribute label, and $w_{1st}^i$ is the $i$-th bit of the first-layer watermark.
The second layer construction. We dynamically allocate the amplitude based on A c o p y as follows:
$$\Delta A_{off}^i = \begin{cases} +0.025, & \text{if } A_{copy}^i = 1 \\ -0.025, & \text{if } A_{copy}^i = 0 \end{cases}$$
Among them, $\Delta A_{off}^i$ is the offset of the $i$-th bit, and $A_{copy}^i$ is the $i$-th bit of $A_{copy}$.
Then, substitute the watermark offset and the first-layer watermark into the following formula to obtain the final watermark:
$$w^i = \operatorname{sign}(w_{1st}^i)\left(|w_{1st}^i| + \Delta A_{off}^i\right)$$
Here, $w_{1st}^i$ is the first-layer watermark at the $i$-th bit, and $w^i$ is the embedded watermark at the $i$-th bit. It should be noted that, in order to improve the accuracy of subsequent extraction of the second-layer watermark, each second-layer bit is embedded five times during the embedding process; that is, six unique second-layer bits are carried by the embedded watermark positions.
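The construction above can be sketched as follows (our own helper names; how attribute signs are assigned beyond the five leading positions is not specified in this section, so the sketch simply assumes equal-length position vectors):

```python
def first_layer(attr_bits):
    # the sign of each watermark position encodes one attribute bit
    return [0.1 if a == 1 else -0.1 for a in attr_bits]

def offsets(copy_bits, repeats=5):
    # each second-layer bit yields `repeats` identical magnitude offsets
    return [0.025 if b == 1 else -0.025 for b in copy_bits for _ in range(repeats)]

def combine(w1, off):
    # the offset widens or narrows the magnitude of the signed base value
    sign = lambda x: 1.0 if x >= 0 else -1.0
    return [sign(a) * (abs(a) + d) for a, d in zip(w1, off)]
```

With six copyright bits repeated five times, `offsets` yields thirty per-position offsets, matching the 30-bit watermark length used in the experiments.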

4.3.2. Dual Watermarking Extraction

The first layer extraction. Specifically, since the watermark is inevitably subject to slight interference during embedding and extraction, after extracting the watermark the user binarizes the received signal to improve restoration accuracy: each soft bit estimate undergoes a hard decision based on its sign, thereby obtaining a noise-free attribute vector label:
$$A_{ext}^i = \begin{cases} 1, & \text{if } w'^i \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
Among them, $w'^i$ is the $i$-th bit of the extracted watermark, and $A_{ext}^i$ is the $i$-th bit of the restored facial attributes. Here, we adopt indices $i \in [0, 4]$ to denote the extracted facial-attribute label $A_{ext}$.
Subsequently, for the tampered image, the user sends it along with the attribute label A e x t into the face restorer. The face restorer then outputs the restored image based on this, making it visually approach the original image as closely as possible to achieve the security goal.
The second layer extraction. The extraction of the second-layer watermark first computes each bit of $A_{coex}$ according to the following formula:
$$A_{coex}^i = \begin{cases} 1, & \text{if } |w'^i| > 0.1 \\ 0, & \text{otherwise} \end{cases}$$
Among them, $A_{coex}^i$ is the $i$-th decision bit of the extracted second-layer watermark.
Then, since each second-layer bit is embedded five times as described in the previous section, a majority vote determines the final result, as shown below:
$$A_{result}(g) = \begin{cases} 1, & \text{if } \sum_{j=0}^{4} A_{vote}(5g + j) \geq 3 \\ 0, & \text{otherwise} \end{cases} \qquad g = 0, 1, \ldots, 5$$
Among them, $A_{vote}$ collects the per-position decisions $A_{coex}^i$ in groups of five, and $A_{result}(g)$ is the voting result for the $g$-th group, i.e., the value held by the majority of the five positions in that group.
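The two decision rules and the majority vote can be sketched together as follows (our function and variable names; the thresholds mirror the formulas above):

```python
def extract_layers(w_ext, n_attr=5, repeats=5):
    # first layer: hard sign decision on the leading positions gives A_ext
    a_ext = [1 if v >= 0 else 0 for v in w_ext[:n_attr]]
    # second layer: magnitude above the 0.1 base gives per-position decisions
    soft = [1 if abs(v) > 0.1 else 0 for v in w_ext]
    # majority vote over each run of `repeats` positions gives A_result
    a_copy = []
    for g in range(len(soft) // repeats):
        votes = sum(soft[repeats * g + j] for j in range(repeats))
        a_copy.append(1 if votes >= repeats // 2 + 1 else 0)
    return a_ext, a_copy
```

For a clean 30-position watermark whose first group of five carries an enlarged magnitude, the vote recovers six copyright bits alongside the five attribute bits.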

4.4. Loss Function and Training Procedure

In this section, we will describe the training measures and loss functions. Besides extracting the embedded watermark bits, the authenticity of facial images altered by deepfake manipulations must also be restored. We draw on the training architecture in [6]. The generator, jointly optimized with an adversarial reconstruction loss and an attribute-classification loss, is capable of reconstructing the original facial appearance when conditioned on the true attribute vector recovered via the watermark. We also leverage its reconstruction loss to aid in image restoration.
Our training is divided into two stages. First, we fine-tune the encoding and decoding parts to ensure that the watermark maintains high robustness in specific environments. Then, we fine-tune the face restorer network to recover the original image and improve the restoration effect in terms of image quality.

4.4.1. Watermark Decoding Stage

We perform fine-tuning based on the framework proposed in [23]. During training, we keep the encoder and discriminator frozen, and introduce a secondary noise simulation step—each encoded image is passed through the same noise perturbation twice consecutively. This strategy enhances the robustness of our fine-tuned model against various deepfake manipulations. The corresponding loss function is defined as follows:
$$I_{noise} = N(N(I_w))$$
$$\mathcal{L}_{de} = \left\| w - W_{dec}(I_{noise}) \right\|$$
Here, N represents various types of noise, including identity transformation, common signal processing (JpegTest with Q = 50 , Resize with p = 50 % , GaussianBlur with k = 3 , σ = 2 , MedianBlur with k = 3 , Brightness with f = 0.5 , Contrast with f = 0.5 , Saturation with f = 0.5 , Hue with f = 0.1 , and GaussianNoise with σ = 0.1 ), adversarial attack (SPSA [34]), and generative attacks (StarGAN, AttGAN, and CafeGAN). Among these, SPSA is a gradient-free adversarial attack strategy configured with the following parameters: steps = 20, learning rate = 0.2/255, and standard deviation = 0.005 . To ensure that the embedded watermark information resists a series of such perturbations, we apply the noise twice to the image, as shown in Equation (15), aiming to achieve enhanced robustness.
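The double perturbation of Equation (15) simply composes a sampled noise operation with itself; a toy sketch with a simplified Gaussian stand-in (our names, not the paper's actual noise layer):

```python
import random

def gaussian_noise(img, sigma=0.1):
    # simplified stand-in for one perturbation N: additive Gaussian noise
    # on a flat list of pixels in [0, 1], clipped back to the valid range
    return [min(1.0, max(0.0, p + random.gauss(0.0, sigma))) for p in img]

def double_noise(img, noise=gaussian_noise):
    # Eq. (15): the same perturbation is applied twice in sequence
    return noise(noise(img))
```

In training, `noise` would be drawn from the full pool listed above (signal processing, SPSA, and the generative attacks), so the decoder sees compounded distortions of each kind.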
Therefore, the total loss at this stage is as follows:
$$\mathcal{L}_{S1} = \mathcal{L}_{de}$$
The pseudocode for the training of watermark decoding is detailed in Algorithm 1.
Algorithm 1: Training of watermark decoding
Input: Trained $W_{enc}(\cdot)$, host images $I_h$, watermark $w$, and epoch number $i_n$.
Output: Trained $W_{dec}(\cdot)$.
1: for $i = 0$ to $i_n$ do
2:    Randomly select a host image $I_h$.
3:    Generate the watermarked image $I_w$ using $w$.
4:    Generate a noised image $I_{noise}$ using $N$.
5:    Compute $\mathcal{L}_{de}$ using $w$ and $I_{noise}$.
6:    Optimize $W_{dec}(\cdot)$ by minimizing $\mathcal{L}_{de}$.
7: end for

4.4.2. Image Recovery Stage

Building on the successful optimization of the watermark extraction task, this paper further focuses on improving image restoration performance. Specifically, we take the network of [6] as the backbone and migrate it to the image restoration scenario through fine-tuning. During training, we observed that the reconstruction loss introduced by [6] significantly benefits the consistency of details and structure. We therefore explicitly introduce this reconstruction term as an auxiliary supervision signal in the objective function to strengthen the optimization constraints of the image restoration branch, thereby systematically improving the visual quality of the recovered image.
Inspired by [24], we further introduce the following loss function to assist in restoring images to be closer to their original state during training. This loss term relies on a VGG19 model pre-trained on ImageNet [35] and operates by quantifying the feature-level differences between the original face image and the restored face image:
$$\mathcal{L}_{pce} = \sum_{n=1}^{5} \left\| b_n(I_h) - b_n(I_{rec}) \right\|_2$$
Here, $b_n$ denotes the feature map from the $n$-th stage of VGG19; the five layers used are relu1_2, relu2_2, relu3_4, relu4_4, and relu5_4, as in [35]. $I_{rec}$ represents the restored image.
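Given the per-stage feature maps, the perceptual loss reduces to a sum of L2 distances; a minimal sketch in which features are passed in as flat lists (extracting relu1_2 through relu5_4 activations from a pretrained VGG19 is left to the caller):

```python
import math

def perceptual_loss(feats_host, feats_rec):
    # sum over the five VGG19 stages of the L2 distance between host and
    # restored feature maps, each stage given here as a flat list
    total = 0.0
    for fh, fr in zip(feats_host, feats_rec):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(fh, fr)))
    return total
```

Identical host and restored features yield a loss of zero, and the loss grows with the Euclidean gap at every stage, which is what pushes the restorer toward the original appearance.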
Therefore, the complete loss function is defined as follows:
$$\mathcal{L}_{S2} = \lambda_1 \mathcal{L}_{cls} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{adv} + \lambda_4 \mathcal{L}_{pce}$$
where $\mathcal{L}_{cls}$ is the classification loss from Equation (1), $\mathcal{L}_{rec}$ is the reconstruction loss from Equation (2), $\mathcal{L}_{adv}$ is the adversarial loss from Equation (3), and $\mathcal{L}_{pce}$ is the newly introduced auxiliary loss. The coefficients $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are hyperparameters tuned to ensure high-quality image recovery.
The training pseudocode for this stage is shown in Algorithm 2.
Algorithm 2: Training of image restoration
Input: Host images $I_h$, watermark $w$, target domain vector $V_{tar}$, source domain vector $V_{ori}$, and epoch number $i_n$.
Output: Trained $Gen(\cdot)$.
1: for $i = 0$ to $i_n$ do
2:    Randomly select a host image $I_h$.
3:    Generate a watermarked image $I_w$ using the host image $I_h$ and watermark $w$.
4:    Generate a deepfake image $I_f$ using the target domain vector $V_{tar}$ and the watermarked image $I_w$.
5:    Generate a recovered image $I_{res}$ using $V_{ori}$ and the deepfake image $I_f$.
6:    Compute $\mathcal{L}_{S2}$ using $V_{ori}$, $V_{tar}$, and $I_h$.
7:    Optimize $Gen(\cdot)$ by minimizing $\mathcal{L}_{S2}$.
8: end for

5. Experimental Results

5.1. Experimental Setup

5.1.1. Datasets

We conducted our experiments using three publicly available face image datasets: CelebA-HQ [36] (30,000 images), FFHQ [37] (70,000 images), and HumanFace [38] (1000 images). All images, originally provided at 1024 × 1024 resolution, were uniformly resized to 128 × 128 pixels to meet computational requirements.

5.1.2. Baseline Methods

Our method was compared against several state-of-the-art watermarking schemes [23,39,40,41]. For fair comparison, we used the official code and pre-trained models released by the respective authors, while maintaining consistent experimental conditions, including image size, watermark length, and noise layer configurations. To evaluate robustness against deepfake attacks, we employed three attribute-manipulation models: StarGAN, CafeGAN, and AttGAN [6,7,8]. Our experiments specifically focused on deepfake-induced tampering involving hair color alteration and gender modification.

5.1.3. Implementation Details

The watermark length for all the compared schemes was fixed at 30 bits, consistent with the design in [23], to ensure fair comparison. For deepfake generation, we employed officially released pre-trained models for all methods except CafeGAN. Since no pre-trained CafeGAN model was available, we retrained it using the original paper’s publicly released code and configurations.
Our experiments utilized a total of 30,000 images. For watermark extraction training, 24,195 images were used for training and 3003 for validation. A separate set of 200 images was randomly selected for testing. The model was fine-tuned for three epochs with a batch size of 16, using the Adam optimizer ( l r = 0.0002 , β = 0.5 ).
For the final evaluation, the dataset was split into 28,000 training images and 2000 test images. To rigorously assess robustness, the test images were processed through three different deepfake algorithms to simulate challenging real-world conditions. A final test set of 2000 images was then constructed from these processed outputs. The image restoration model was fine-tuned for 1000 iterations with a batch size of 32, using the Adam optimizer ( α = 1 × 10 5 , β 1 = 0.5 , β 2 = 0.999 ). The learning rate decayed every 1000 iterations. The loss function comprised a weighted combination of multiple terms, with coefficients set as λ 1 = 1 , λ 2 = 10 , λ 3 = 1 , and λ 4 = 0.5 .
All implementations were developed in PyTorch and executed on an NVIDIA RTX 3090 GPU.

5.1.4. Evaluation Metrics

A comprehensive set of metrics was employed to evaluate the experimental results. Watermark robustness was assessed using the average bit error rate (BER). For ablation studies and image quality assessment, we utilized the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS).

5.2. Watermark Extraction Accuracy Evaluation

We first evaluate the accuracy of watermark extraction in the absence of distortions. Our framework employs a dual watermarking scheme. The first layer serves as a robust watermark, while the second layer functions as an extended robust watermark. Initially, we individually evaluated the robustness of each watermark layer across different datasets under no-interference conditions, as summarized in Table 1.
As shown in the results, both the first-layer and second-layer watermarks maintain strong robustness. This demonstrates that in the absence of attacks, our scheme can reliably verify facial attributes and image copyright.

5.3. Recovery Evaluation

This section presents the restoration results of images subjected to deepfake interference. The receiver extracts the embedded watermark bits and submits them together with the corrupted images to a face restorer model. The model then returns the recovered images.
We investigated two typical tampering scenarios, namely, color alteration and gender modification, to validate the effectiveness of the proposed method. Specifically, we altered the hair color in a facial image from black to blonde and reversed the gender of the subject. Subsequently, the tampered face image, along with its embedded watermark tags, was jointly fed into the constructed generator for recovery. Representative results are illustrated in Figure 3, Figure 4 and Figure 5. These results demonstrate that both the hair color and the gender of the subject were successfully restored, confirming the efficacy of the proposed approach in handling the aforementioned tampering scenarios.

6. Discussion

6.1. Robustness Analysis

This section begins with an evaluation of the robustness of each watermark layer across different datasets under various distortion conditions. The experimental results are summarized in Table 2. These results indicate that the BER scores for the first-layer watermark in our scheme remain close to zero across all tested datasets and distortions, demonstrating its high robustness. Concurrently, the BER for the second-layer watermark remains stable, but slightly higher than that of the first layer. This confirms that the second-layer watermark is suitable for serving as an extension for concealing additional information.
Subsequently, we evaluate the first-layer watermark under different deepfake reconstructions across multiple datasets. Its BER values under various distortions are listed in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. The experimental results demonstrate that the proposed scheme exhibits exceptional robustness: in most scenarios, its BER remains below 10 % , significantly outperforming the comparative schemes.
From the perspectives of dataset adaptability and scene adaptability, the proposed scheme maintains a low watermark extraction BER on three mainstream facial datasets across various experimental scenarios involving specific deepfake perturbation-recovery followed by common noise attacks. In StarGAN-related perturbation scenarios—whether after hair color alteration, gender modification, or subsequent common noise attacks—the proposed scheme sustains a low BER on each dataset (primarily between 0.4% and 7%). In corresponding scenarios for AttGAN and CafeGAN involving hair color or gender perturbation-recovery followed by common noise, the BER performance of our scheme remains stable, generally within the range of 1.7% to 14.8%. Even when compared against the best-performing benchmark, our scheme maintains a clear advantage in most settings (with an average BER reduction ranging from approximately 0.02% to 2.3%), fully demonstrating its strong dataset generalization capability and scene adaptability. Remarkably, even after undergoing specific deepfake perturbation and recovery processes, the scheme can effectively withstand subsequent common noise attacks, unaffected by differences in facial data distribution or specific perturbation types.
For scenarios in which specific deepfake perturbations are followed by strong interference such as Gaussian noise or other common noise attacks, the proposed scheme still maintains excellent watermark extraction performance. Across all experimental settings, even after images undergo attribute perturbation by StarGAN, AttGAN, or CafeGAN, followed by restoration and subsequent strong interference (e.g., Gaussian noise, SPSA), the BER of our scheme remains significantly lower than that of all benchmark schemes. In contrast, the BER of the comparative schemes typically increases substantially in such complex scenarios, sometimes exceeding 50%. This further verifies that the proposed scheme can reliably recover the watermark after deepfake perturbation and restoration, exhibiting outstanding robustness.
In summary, the proposed scheme exhibits strong robustness, maintaining stable performance even against subsequent common noise attacks following facial attribute manipulation. This confirms its capability to provide reliable traceability for facial image copyright protection.
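For concreteness, the bit error rate (BER) reported throughout these tables can be computed as in the following minimal sketch; the function name and the flat bit-list representation are our own illustrative assumptions, not part of the paper's implementation:

```python
def bit_error_rate(embedded_bits, extracted_bits):
    """Fraction of watermark bits flipped between embedding and extraction."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("watermark lengths must match")
    errors = sum(a != b for a, b in zip(embedded_bits, extracted_bits))
    return errors / len(embedded_bits)

# Two bit flips in an 8-bit watermark give a BER of 25%.
ber = bit_error_rate([1, 0, 1, 1, 0, 0, 1, 0],
                     [1, 0, 0, 1, 0, 1, 1, 0])
print(f"BER = {ber:.4%}")
```

Under this definition, the near-zero first-layer values in Table 2 mean that almost every watermark bit survives each distortion.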

6.2. Recovered Image Quality Analysis

This section compares our proposed approach with the image restoration method GFP-GAN [42]. We evaluate the restoration quality on each dataset after the images are processed by different deepfake manipulations, as shown in Table 9.
The results show that, although the recovery performance for a few specific distortion types is slightly better or worse than that of GFP-GAN, the proposed method delivers superior image recovery for the vast majority of deepfake distortion types and across all tested datasets. This validates that the proposed method offers stronger recovery capability and generalization in deepfake image restoration.
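The quality comparison in Table 9 relies on standard full-reference metrics. As a reference point, PSNR (one such metric) can be computed from the mean squared error as in this sketch; the flat-list image representation is a simplifying assumption for illustration:

```python
import math

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two same-size images,
    given here as flat lists of pixel intensities."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 5 intensity levels on an 8-bit image:
ref = [100, 150, 200, 50]
out = [105, 155, 205, 55]
print(f"{psnr(ref, out):.2f} dB")  # prints 34.15 dB
```

Higher PSNR indicates a recovered image closer to the original host; SSIM and LPIPS complement it by measuring structural and perceptual similarity, respectively.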

6.3. Complexity Analysis

As our networks are largely built upon convolutional layers, we provide a complexity analysis of the adopted model in this section. The computational complexity can be derived from the following formula [43]:

$O(\mathrm{Net}) = \sum_{i=1}^{d} l_{i-1} \times t_i^{2} \times g_i \times p_i^{2}$

Here, $d$ represents the depth, and $l_{i-1}$, $t_i^{2}$, $g_i$, and $p_i^{2}$ are the number of input channels of the $i$-th layer, the size of the filters, the number of filters, and the size of the output feature map, respectively.
Thus, the complexity of the encoder in our method is $2.84 \times 10^{9}$, that of the decoder is $1.88 \times 10^{9}$, that of the face restorer is $5.66 \times 10^{9}$, and that of the discriminator is $3.44 \times 10^{10}$.
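To illustrate how such figures are obtained, the sum can be evaluated layer by layer as in the sketch below; the three-layer configuration is a hypothetical example, not one of the paper's actual networks:

```python
def conv_complexity(layers):
    """Evaluate O(Net) = sum over layers of l_{i-1} * t_i^2 * g_i * p_i^2.

    Each entry is (in_channels l, filter_size t, num_filters g, output_size p).
    """
    return sum(l * t ** 2 * g * p ** 2 for (l, t, g, p) in layers)

# Hypothetical encoder-like stack on 128x128 inputs:
layers = [
    (3, 3, 64, 128),    # 3 -> 64 channels, 3x3 filters, 128x128 output
    (64, 3, 128, 64),   # downsampled to 64x64
    (128, 3, 128, 32),  # downsampled to 32x32
]
print(f"{conv_complexity(layers):.3e}")  # on the order of 10^8
```

Because the output feature-map size enters quadratically, early high-resolution layers tend to dominate the total, which is consistent with the discriminator (the widest network here) having the largest complexity.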

6.4. Ablation Study

All experimental images and model configurations in this section remain consistent with those described in Section 5.1. For testing, 200 images of size 128 × 128 were selected from the dataset.

6.4.1. Parameter Sensitivity

When embedding the watermark into the image using the encoder, we also introduce a strength factor to control the embedding intensity. This design balances the visibility and robustness of the watermark, ensuring it can be reliably extracted in various application scenarios without significantly degrading the visual quality of the host image. The embedding is formulated as follows:

$I_w = I_h + \gamma \cdot (I_w' - I_h)$

where $I_h$ is the host image, $I_w'$ is the raw encoder output, $I_w$ is the final watermarked image, and $\gamma$ is the strength factor.
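Numerically, the strength factor simply scales the watermark residual added to the host, as in this sketch (the pixel lists and the name blend_watermark are illustrative; the actual operation runs on image tensors inside the encoder):

```python
def blend_watermark(host, encoded, gamma):
    """I_w = I_h + gamma * (I_w' - I_h), applied per pixel.

    gamma = 1 keeps the raw encoder output; gamma < 1 attenuates the
    residual (better fidelity, weaker extraction); gamma > 1 amplifies it.
    """
    return [h + gamma * (e - h) for h, e in zip(host, encoded)]

host = [100.0, 120.0, 90.0]      # host-image pixels I_h
encoded = [104.0, 118.0, 96.0]   # raw encoder output I_w'
print(blend_watermark(host, encoded, 0.5))  # [102.0, 119.0, 93.0]
```

This directly mirrors the trade-off observed in Table 10 and Table 11: a smaller factor halves the embedded residual, while a larger one doubles it at the cost of visual quality.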
To systematically evaluate its impact, we performed a sensitivity analysis on the strength factor parameter. This parameter governs the watermark embedding intensity, thereby directly influencing both the robustness of the watermark and the visual fidelity of the host image. We conducted comparative experiments with three distinct values of the strength factor: 0.5, 1, and 2, as detailed in Table 10 and Table 11.
The experimental results indicate that when the strength factor is set to 0.5, the image quality improves—as evidenced by higher PSNR and SSIM values—at the cost of a reduced watermark extraction rate, which suggests compromised watermark robustness under this setting. In contrast, increasing the strength factor to 2 significantly improves the extraction rate but degrades image quality, indicating that an excessively high embedding intensity harms visual fidelity. Setting the strength factor to 1 achieves an optimal balance, ensuring both reliable watermark extraction and satisfactory visual preservation.

6.4.2. Effectiveness of Watermark on Image Recovery

This section examines the necessity of using watermarks to guide image recovery. In practice, relying on image labels for restoration faces significant limitations. In the worst case, users possess no prior knowledge of the label's structure, making correct recovery impossible. Even when users know the basic formatting constraints, such as the number of digits and the valid value range, randomly generated labels rarely match the true content. Providing an incorrect label causes the face recovery model to fail, as demonstrated in Figure 6 (all erroneous labels in this section are assigned a value of 1). Guiding the recovery process through extractable watermarks therefore presents a reliable and necessary alternative.

6.4.3. Training Procedure

We first clarify the necessity of retraining the decoder. The proposed framework builds upon SepMark [23] but introduces a decoder retraining stage. To validate the importance of this stage, we quantitatively compare the performance of our watermarking scheme with that of SepMark. As shown in the first two rows of the experimental results in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, our scheme achieves a lower bit error rate (BER) than SepMark in the StarGAN restoration scenario. Moreover, in both AttGAN and CafeGAN restoration scenarios, the proposed scheme consistently demonstrates stronger robustness against various distortions introduced by deepfake manipulations. These results confirm that the retraining stage is essential, as it significantly enhances watermark robustness, ensuring reliable watermark preservation after image restoration operations.
Then, we assess the necessity of the specialized training for the face restorer. As shown in Table 12 (the three values in parentheses denote PSNR, SSIM, and LPIPS, respectively), the images restored by our fine-tuned face restorer achieve significantly better visual quality than those generated by the benchmark method [6]. This performance gain confirms the effectiveness and necessity of the dedicated optimization employed in our training.

7. Conclusions

In this paper, we propose an active defense scheme based on digital watermarking to recover images compromised by deepfake attacks. The scheme employs a trained encoder–decoder model that effectively embeds watermark information into images and enables stable extraction even after various types of interference, including deepfake manipulations. Furthermore, an image recovery model is introduced, which utilizes the extracted watermark information to effectively recover tampered images.
Experimental results demonstrate the high robustness of the proposed watermarking mechanism, confirming the overall feasibility and effectiveness of the defense scheme. In future work, we will focus on enhancing the quality of the recovered images through approaches such as optimizing the loss function and refining the model architecture, thereby improving the scheme’s practicality and visual performance.

Author Contributions

Conceptualization, B.F.; methodology, Y.G. and B.F.; software, Y.G.; validation, Y.G.; formal analysis, Y.G., Z.L., Y.L. and B.F.; investigation, Y.G., Z.L., Y.L. and B.F.; resources, B.F.; data curation, Y.G., Z.L. and B.F.; writing—original draft preparation, Y.G. and B.F.; writing—review and editing, Y.G. and B.F.; visualization, Y.G. and B.F.; supervision, B.F.; project administration, B.F.; funding acquisition, B.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62472199, 62261160653, 62441237), the Natural Science Foundation of Guangdong Province, China (Grant No. 2025A1515011601), and the Opening Project of the MoE Key Laboratory of Information Technology (Sun Yat-sen University) (Grant No. 2024ZD001).

Data Availability Statement

The original data presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huo, Y.; Gang, S.; Guan, C. FCIHMRT: Feature cross-layer interaction hybrid method based on Res2Net and transformer for remote sensing scene classification. Electronics 2023, 12, 4362. [Google Scholar] [CrossRef]
  2. Hashmi, A.; Shahzad, S.A.; Lin, C.W.; Tsao, Y.; Wang, H.M. AVTENet: A human-cognition-inspired audio-visual transformer-based ensemble network for video deepfake detection. IEEE Trans. Cogn. Dev. Syst. 2025, in press. [Google Scholar]
  3. Katı, B.E.; Küçüksille, E.U.; Sarıman, G. Enhancing deepfake detection through quantum transfer learning and class-attention vision transformer architecture. Appl. Sci. 2025, 15, 525. [Google Scholar] [CrossRef]
  4. Deressa, D.W.; Mareen, H.; Lambert, P.; Atnafu, S.; Akhtar, Z.; Van Wallendael, G. GenConViT: Deepfake video detection using generative convolutional vision transformer. Appl. Sci. 2025, 15, 6622. [Google Scholar] [CrossRef]
  5. Zhang, X.; Zhang, F.; Zhao, B.; Zhou, B.; Xiao, B. VulD-Transformer: Source code vulnerability detection via transformer. In Proceedings of the 14th Asia-Pacific Symposium on Internetware; Association for Computing Machinery: New York, NY, USA, 2023; pp. 185–193. [Google Scholar]
  6. Choi, Y.; Choi, M.; Kim, M.; Ha, J.-W.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797. [Google Scholar]
  7. Kwak, J.; Han, D.-K.; Ko, H. CAFE-GAN: Arbitrary face attribute editing with complementary attention feature. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 524–540. [Google Scholar]
  8. He, Z.; Zuo, W.; Kan, M.; Shan, S.; Chen, X. AttGAN: Facial attribute editing by only changing what you want. IEEE Trans. Image Process. 2019, 28, 5464–5478. [Google Scholar] [CrossRef] [PubMed]
  9. Pumarola, A.; Agudo, A.; Martinez, A.M.; Sanfeliu, A.; Moreno-Noguer, F. GANimation: Anatomically-aware facial animation from a single image. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 818–833. [Google Scholar]
  10. Chen, R.; Chen, X.; Ni, B.; Ge, Y.; Zhang, J. SimSwap: An efficient framework for high fidelity face swapping. In Proceedings of the 28th ACM International Conference on Multimedia (ACM MM); Association for Computing Machinery: New York, NY, USA, 2020; pp. 2003–2011. [Google Scholar]
  11. Huang, Q.; Zhang, J.; Zhou, W.; Liu, J.; Lu, J. Initiative defense against facial manipulation. Proc. AAAI Conf. Artif. Intell. 2021, 35, 1619–1627. [Google Scholar] [CrossRef]
  12. Wang, X.; Huang, J.; Ma, S.; Liu, Y.; Wang, S. Deepfake disrupter: The detector of deepfake is my friend. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14920–14929. [Google Scholar]
  13. Shi, Z.; Liu, W.; Chen, H. Face reconstruction-based generalized deepfake detection model with residual outlook attention. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 1–19. [Google Scholar] [CrossRef]
  14. Cao, J.; Ma, C.; Yao, T.; Chen, X.; Ding, S. End-to-end reconstruction-classification learning for face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4113–4122. [Google Scholar]
  15. Yu, Y.; Ni, R.; Yang, S.; Zhao, Y.; Wang, L. Narrowing domain gaps with bridging samples for generalized face forgery detection. IEEE Trans. Multimed. 2023, 26, 3405–3417. [Google Scholar] [CrossRef]
  16. Lu, W.; Liu, L.; Zhang, B.; Liu, J.; Tao, D. Detection of deepfake videos using long-distance attention. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 9366–9379. [Google Scholar] [CrossRef] [PubMed]
  17. Nie, F.; Ni, J.; Zhang, J.; Wang, X.; Liu, H. DIP: Diffusion learning of inconsistency pattern for general deepfake detection. IEEE Trans. Multimed. 2024, 26, 1–14. [Google Scholar] [CrossRef]
  18. Yang, C.; Ding, L.; Chen, Y.; Zhang, H.; Xiang, T. Defending against GAN-based deepfake attacks via transformation-aware adversarial faces. In 2021 International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2021; pp. 1–8. [Google Scholar]
  19. Qu, Z.; Xi, Z.; Lu, W.; Liu, L.; Tao, D. DF-RAP: A robust adversarial perturbation for defending against deepfakes in real-world social network scenarios. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3943–3957. [Google Scholar] [CrossRef]
  20. Lv, L. Smart watermark to defend against deepfake image manipulation. In 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS); IEEE: Piscataway, NJ, USA, 2021; pp. 380–384. [Google Scholar]
  21. Zhang, Y.; Ye, D.; Xie, C.; Zhang, Y.; Wang, Y. Dual defense: Adversarial, traceable, and invisible robust watermarking against face swapping. IEEE Trans. Inf. Forensics Secur. 2024, 19, 4628–4641. [Google Scholar] [CrossRef]
  22. Neekhara, P.; Hussain, S.; Zhang, X.; Jia, J.; Huang, J. FaceSigns: Semi-fragile watermarks for media authentication. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–21. [Google Scholar] [CrossRef]
  23. Wu, X.; Liao, X.; Ou, B. Sepmark: Deep separable watermarking for unified source tracing and deepfake detection. In Proceedings of the 31st ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1190–1201. [Google Scholar]
  24. Yu, P.; Gao, H.; Fei, J.; Zhang, Y.; Liu, J.; Wang, S. DFREC: DeepFake Identity Recovery Based on Identity-aware Masked Autoencoder. arXiv 2024, arXiv:2412.07260. [Google Scholar]
  25. Ai, J.; Wang, Z.; Huang, B.; Liu, Y.; Zhang, X. DeepReversion: Reversely inferring the original face from the deepfake face. In 2023 International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
  26. Shao, R.; Wu, T.; Liu, Z. Detecting and recovering sequential deepfake manipulation. In European Conference on Computer Vision (ECCV); Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 712–728. [Google Scholar]
  27. Wu, P.W.; Lin, Y.J.; Chang, C.H.; Lee, H.Y. RelGAN: Multi-domain image-to-image translation via relative attributes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5914–5922. [Google Scholar]
  28. Li, X.; Zhang, S.; Hu, J.; Yang, J.; Ni, Z. Image-to-image translation via hierarchical style disentanglement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 8639–8648. [Google Scholar]
  29. He, Z.; Kan, M.; Zhang, J.; Shan, S.; Chen, X. PA-GAN: Progressive attention generative adversarial network for facial attribute editing. arXiv 2020, arXiv:2007.05892. [Google Scholar] [CrossRef]
  30. Nadimpalli, A.V.; Rattani, A. Proactive deepfake detection using GAN-based visible watermarking. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–27. [Google Scholar] [CrossRef]
  31. Wang, T.; Huang, M.; Cheng, H.; Ma, B.; Wang, Y. Robust identity perceptual watermark against deepfake face swapping. arXiv 2023, arXiv:2311.01357. [Google Scholar]
  32. Wang, T.; Huang, M.; Cheng, H.; Zhang, X.; Shen, Z. Lampmark: Proactive deepfake detection via training-free landmark perceptual watermarks. In Proceedings of the 32nd ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2024; pp. 10515–10524. [Google Scholar]
  33. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  34. Uesato, J.; O’donoghue, B.; Kohli, P.; Oord, A. Adversarial risk and the dangers of evaluating against weak attacks. In Proceedings of the 35th International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2018; pp. 5025–5034. [Google Scholar]
  35. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
  36. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
  37. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410. [Google Scholar]
  38. JoinDatawithme. Humanface of Various Age Groups. Available online: https://github.com/JoinDatawithme/Humanface_of_various_age_groups (accessed on 13 January 2026).
  39. Jia, Z.; Fang, H.; Zhang, W. MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. In Proceedings of the 29th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2021; pp. 41–49. [Google Scholar]
  40. Ma, R.; Guo, M.; Hou, Y.; Zhang, C.; Liu, Y. Towards blind watermarking: Combining invertible and non-invertible mechanisms. In Proceedings of the 30th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1532–1542. [Google Scholar]
  41. Fang, H.; Jia, Z.; Ma, Z.; Wang, X.; Zhang, W. PIMoG: An effective screen-shooting noise-layer simulation for deep-learning-based watermarking network. In Proceedings of the 30th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2022; pp. 2267–2275. [Google Scholar]
  42. Wang, X.; Li, Y.; Zhang, H.; Shan, Y. Towards real-world blind face restoration with generative facial prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 9168–9178. [Google Scholar]
  43. Shah, B.; Bhavsar, H. Time complexity in deep learning models. Procedia Comput. Sci. 2022, 215, 202–210. [Google Scholar] [CrossRef]
Figure 1. The scenario we envisioned. We can embed watermarks in the images to be disseminated in advance. Then, after the user receives the images, they can recover the images based on the extracted watermarks to resist deepfake attacks.
Figure 2. The framework of the proposed scheme. In the embedding phase, a watermark containing the original attributes is embedded into the host image using the Encoder. In the extraction and recovery phase, the watermark is extracted using the Decoder, and the original face is recovered using the Face Restorer.
Figure 3. The image recovery effect of CelebAHQ. The first row is the host image, the second row is the watermark image, the third row is the deepfake image, and the fourth row is the recovered image. (a–f) are StarGAN (blond), AttGAN (blond), CafeGAN (blond), StarGAN (male), AttGAN (male), and CafeGAN (male).
Figure 4. The image recovery effect of FFHQ. The first row is the host image, the second row is the watermark image, the third row is the deepfake image, and the fourth row is the recovered image. (a–f) are StarGAN (blond), AttGAN (blond), CafeGAN (blond), StarGAN (male), AttGAN (male), and CafeGAN (male).
Figure 5. The image recovery effect of HumanFace. The first row is the host image, the second row is the watermark image, the third row is the deepfake image, and the fourth row is the recovered image. (a–f) are StarGAN (blond), AttGAN (blond), CafeGAN (blond), StarGAN (male), AttGAN (male), and CafeGAN (male).
Figure 6. Cases where the restoration failed due to incorrect labels. The first row is the host image, the second row is the watermark image, the third row is the deepfake image, and the fourth row is the recovered image. (a–f) are CelebAHQ (blond), CelebAHQ (male), FFHQ (blond), FFHQ (male), HumanFace (blond), and HumanFace (male).
Table 1. Watermark extraction accuracy (BER) under distortion-free conditions.
Dataset | First Layer Watermark | Second Layer Watermark
HumanFace | 0.0000% | 9.3333%
FFHQ | 0.0000% | 10.2500%
CelebAHQ | 0.0000% | 3.4167%
Average | 0.0000% | 7.6667%
Table 2. Watermark extraction accuracy (BER) under various distortions.
Distortion Type | First Layer (HumanFace) | First Layer (FFHQ) | First Layer (CelebAHQ) | Second Layer (HumanFace) | Second Layer (FFHQ) | Second Layer (CelebAHQ)
Identity | 0.0000% | 0.0000% | 0.0000% | 9.3333% | 10.2500% | 3.4167%
Jpeg | 0.1333% | 0.2333% | 0.2167% | 11.5833% | 15.9167% | 14.9167%
Resize | 0.0000% | 0.0167% | 0.0000% | 8.5833% | 9.7500% | 6.5000%
GaussianBlur | 0.0000% | 0.0000% | 0.0000% | 31.6667% | 28.0833% | 32.1667%
MedianBlur | 0.0000% | 0.0000% | 0.0000% | 5.9167% | 4.0000% | 4.1667%
Brightness | 0.0000% | 0.0000% | 0.0000% | 11.3333% | 13.3333% | 7.4167%
Contrast | 0.0000% | 0.0000% | 0.0000% | 11.5000% | 14.9167% | 9.7500%
Saturation | 0.0000% | 0.0000% | 0.0000% | 10.5000% | 12.8333% | 5.7500%
Hue | 0.0000% | 0.0000% | 0.0000% | 13.1667% | 12.0000% | 8.2500%
GaussianNoise | 0.5000% | 0.6000% | 0.8167% | 11.1667% | 17.0833% | 17.5833%
SPSA | 0.4333% | 1.1833% | 1.4000% | 22.0000% | 24.2500% | 26.0000%
StarGAN (blond) | 0.2333% | 0.4333% | 0.1667% | 12.9167% | 12.9167% | 12.1667%
StarGAN (male) | 0.1333% | 0.2000% | 0.1500% | 12.5000% | 14.3333% | 14.6667%
AttGAN (blond) | 2.5667% | 2.2667% | 1.1000% | 22.5000% | 22.5000% | 20.4167%
AttGAN (male) | 1.1833% | 0.7333% | 0.5833% | 22.7500% | 20.7500% | 16.3333%
CafeGAN (blond) | 3.7500% | 3.3333% | 3.3667% | 26.0833% | 26.0833% | 23.6667%
CafeGAN (male) | 0.5333% | 0.6500% | 0.5333% | 17.2500% | 17.5000% | 17.8333%
Average | 0.5569% | 0.5676% | 0.4902% | 15.3382% | 16.2647% | 14.1765%
Table 3. Watermark extraction accuracy (BER) on the images processed by StarGAN (blond).
Scheme (Dataset) | Identity | JpegTest | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average
HumanFace
Ours (HumanFace) | 0.5667% | 2.6500% | 1.4667% | 1.0000% | 1.0333% | 0.7333% | 1.2167% | 0.6000% | 0.6833% | 4.8500% | 13.6833% | 2.5893%
SepMark (HumanFace) | 0.8333% | 3.1333% | 1.5833% | 1.0667% | 1.0833% | 1.0667% | 1.3000% | 0.9000% | 0.9000% | 5.6667% | 12.0333% | 2.6879%
CIN (HumanFace) | 34.8500% | 42.2267% | 32.8167% | 53.5000% | 38.5000% | 35.6667% | 36.5167% | 35.4000% | 35.6500% | 45.8000% | 40.5167% | 39.2221%
MBRS (HumanFace) | 28.8333% | 42.7500% | 30.0500% | 37.9833% | 36.8667% | 30.6667% | 30.3667% | 29.8000% | 31.0333% | 46.7333% | 42.3167% | 35.2182%
PIMOG (HumanFace) | 23.9667% | 42.3000% | 22.5667% | 25.1000% | 24.8000% | 28.7500% | 27.4167% | 25.2500% | 26.1333% | 41.9667% | 44.1000% | 30.2136%
FFHQ
Ours (FFHQ) | 0.5000% | 2.9333% | 1.1500% | 0.8333% | 0.7667% | 0.6000% | 0.9500% | 0.5333% | 0.4667% | 4.5833% | 17.3333% | 2.7864%
SepMark (FFHQ) | 0.5833% | 3.0667% | 1.0167% | 0.8333% | 0.7000% | 0.7000% | 0.9000% | 0.5667% | 0.6000% | 6.1500% | 15.7667% | 2.8076%
CIN (FFHQ) | 33.6000% | 37.6167% | 47.8667% | 46.7667% | 40.8167% | 34.5667% | 35.6667% | 33.8500% | 34.1667% | 46.9667% | 41.9500% | 39.4394%
MBRS (FFHQ) | 25.6333% | 40.9333% | 27.0167% | 36.0333% | 34.6500% | 27.5500% | 27.9500% | 26.3500% | 28.1667% | 47.0000% | 42.2833% | 33.0515%
PIMOG (FFHQ) | 21.1500% | 40.9000% | 20.6167% | 22.3167% | 22.0833% | 25.9333% | 25.2000% | 22.4333% | 23.0333% | 41.1833% | 43.5667% | 28.0379%
CelebAHQ
Ours (CelebAHQ) | 0.4833% | 3.1333% | 1.2000% | 0.9333% | 0.7500% | 0.7167% | 1.4333% | 0.4500% | 0.7333% | 6.1833% | 17.7000% | 3.0651%
SepMark (CelebAHQ) | 0.5000% | 3.3500% | 1.0000% | 0.8667% | 0.8000% | 0.6167% | 1.1000% | 0.4833% | 0.4833% | 6.1833% | 16.9667% | 2.9409%
CIN (CelebAHQ) | 33.2333% | 40.5833% | 30.3500% | 52.9833% | 38.1500% | 34.5167% | 34.4833% | 33.0000% | 34.5000% | 33.5333% | 35.0000% | 36.3939%
MBRS (CelebAHQ) | 50.1000% | 43.3333% | 29.4333% | 37.3333% | 34.8333% | 29.5000% | 30.2999% | 25.6500% | 27.9000% | 33.4167% | 34.3667% | 34.1970%
PIMOG (CelebAHQ) | 22.1667% | 39.9500% | 21.0667% | 23.2667% | 22.7500% | 26.9333% | 25.7833% | 23.5000% | 23.5333% | 35.7833% | 33.8167% | 27.1409%
Note: The bold values indicate that this solution is the most effective.
Table 4. Watermark extraction accuracy (BER) on the images processed by StarGAN (male).
Scheme (Dataset) | Identity | JpegTest | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average
HumanFace
Ours (HumanFace) | 0.4333% | 2.8833% | 1.3167% | 0.8667% | 0.9667% | 0.6667% | 0.9667% | 0.4333% | 0.5500% | 5.0500% | 12.7333% | 2.4424%
SepMark (HumanFace) | 0.4833% | 2.8000% | 1.1500% | 1.1167% | 1.2000% | 0.6500% | 1.0500% | 0.5333% | 0.6500% | 5.4667% | 12.0167% | 2.4652%
CIN (HumanFace) | 35.7500% | 41.3333% | 33.5000% | 53.4833% | 39.0000% | 36.0500% | 37.0500% | 35.6333% | 36.4000% | 45.8833% | 41.0000% | 39.5530%
MBRS (HumanFace) | 29.7500% | 43.1000% | 32.4167% | 38.6333% | 37.1500% | 32.0500% | 30.9833% | 30.4667% | 32.0000% | 47.0167% | 42.1000% | 35.9697%
PIMOG (HumanFace) | 22.9000% | 41.1833% | 21.7833% | 24.3167% | 23.5833% | 28.1833% | 26.2833% | 24.4167% | 24.4333% | 42.0167% | 44.8833% | 29.4530%
FFHQ
Ours (FFHQ) | 0.4167% | 2.9333% | 1.0667% | 0.8167% | 0.7333% | 0.4833% | 0.8167% | 0.3833% | 0.4333% | 5.0667% | 15.7000% | 2.6227%
SepMark (FFHQ) | 0.5000% | 2.9500% | 1.2500% | 0.8333% | 0.7167% | 0.6333% | 0.8000% | 0.4667% | 0.5333% | 5.9667% | 14.0333% | 2.6076%
CIN (FFHQ) | 33.6167% | 37.6167% | 47.8667% | 46.7667% | 40.8167% | 34.5500% | 35.6667% | 33.8500% | 34.1500% | 49.9667% | 41.9667% | 39.7121%
MBRS (FFHQ) | 26.5167% | 42.7833% | 28.2667% | 37.5500% | 35.1000% | 29.1000% | 29.0000% | 27.4333% | 28.3300% | 47.5000% | 42.6500% | 34.0209%
PIMOG (FFHQ) | 21.5833% | 42.4167% | 20.4000% | 23.2667% | 22.2833% | 26.3167% | 24.4000% | 22.5667% | 23.4167% | 41.3667% | 42.9500% | 28.2697%
CelebAHQ
Ours (CelebAHQ) | 0.5333% | 3.4000% | 1.2167% | 0.8333% | 0.7667% | 0.7167% | 1.0500% | 0.4667% | 0.5167% | 6.2500% | 17.2667% | 3.0015%
SepMark (CelebAHQ) | 0.4833% | 3.4000% | 1.0833% | 0.9333% | 0.9000% | 0.6667% | 0.8167% | 0.5000% | 0.5167% | 6.6833% | 15.8500% | 2.8939%
CIN (CelebAHQ) | 41.6833% | 48.2000% | 33.1000% | 55.1667% | 44.8167% | 43.0667% | 42.1500% | 41.4000% | 42.1500% | 44.6500% | 43.2667% | 43.6046%
MBRS (CelebAHQ) | 27.1833% | 42.7000% | 29.3333% | 38.3833% | 35.7833% | 29.0833% | 29.7833% | 27.3833% | 29.3667% | 46.7000% | 42.2000% | 34.3545%
PIMOG (CelebAHQ) | 21.2167% | 40.6167% | 20.4167% | 22.6833% | 22.2333% | 25.5833% | 25.4333% | 22.5167% | 23.2667% | 41.6667% | 43.6667% | 28.1182%
Note: The bold values indicate that this solution is the most effective.
Table 5. Watermark extraction accuracy (BER) on the images processed by AttGAN (blond).
Scheme (Dataset) | Identity | JpegTest | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average
HumanFace
Ours (HumanFace) | 4.7833% | 9.4667% | 7.0000% | 5.3667% | 6.4667% | 5.5500% | 5.4333% | 4.8333% | 4.7833% | 11.0833% | 19.4500% | 7.6561%
SepMark (HumanFace) | 6.1667% | 10.6333% | 8.2000% | 7.4333% | 7.6667% | 7.0667% | 6.7667% | 6.2333% | 6.3500% | 13.6500% | 19.8167% | 9.0894%
CIN (HumanFace) | 35.6833% | 39.3667% | 48.1333% | 49.0000% | 40.5833% | 37.1167% | 36.6000% | 35.7500% | 35.8500% | 47.4167% | 41.9333% | 40.6758%
MBRS (HumanFace) | 34.6833% | 36.7500% | 38.7000% | 34.3333% | 35.0833% | 36.2833% | 34.3667% | 34.7167% | 35.5000% | 50.2167% | 40.2500% | 37.3530%
PIMOG (HumanFace) | 30.6000% | 42.8333% | 30.2333% | 30.5500% | 31.0667% | 33.0333% | 31.7333% | 31.2500% | 31.4833% | 43.5167% | 41.9500% | 34.3864%
FFHQ
Ours (FFHQ) | 3.9833% | 8.5667% | 6.9000% | 4.6000% | 6.0667% | 4.4667% | 4.2000% | 4.0000% | 4.3333% | 10.6000% | 23.7667% | 7.4076%
SepMark (FFHQ) | 5.4333% | 9.0167% | 7.5333% | 6.4333% | 6.7667% | 5.9000% | 6.1500% | 5.3500% | 5.6333% | 11.8833% | 24.3333% | 8.5848%
CIN (FFHQ) | 33.6167% | 37.6167% | 47.8667% | 46.7667% | 40.8167% | 34.5667% | 35.6667% | 33.8500% | 34.1667% | 46.9667% | 41.9667% | 39.4425%
MBRS (FFHQ) | 31.6333% | 35.3833% | 34.5000% | 31.2833% | 32.6000% | 31.9500% | 31.5667% | 31.3167% | 33.2167% | 50.1667% | 40.3167% | 34.9030%
PIMOG (FFHQ) | 28.5333% | 40.9333% | 28.0500% | 28.3000% | 28.7333% | 30.5667% | 29.7000% | 29.1667% | 23.7833% | 41.9667% | 42.8500% | 32.0530%
CelebAHQ
Ours (CelebAHQ) | 2.8667% | 7.9000% | 5.1500% | 3.5500% | 4.3000% | 3.4667% | 3.7167% | 2.9667% | 2.8167% | 10.6833% | 23.4500% | 6.4424%
SepMark (CelebAHQ) | 4.0167% | 8.5333% | 5.6500% | 4.9833% | 5.4500% | 4.2667% | 4.9500% | 4.0000% | 4.1167% | 11.6500% | 24.7833% | 7.4909%
CIN (CelebAHQ) | 33.0167% | 38.0500% | 48.4667% | 47.7833% | 39.1833% | 33.1167% | 34.9500% | 32.7667% | 33.2000% | 36.0167% | 44.6833% | 38.2939%
MBRS (CelebAHQ) | 27.3500% | 34.1167% | 30.5333% | 26.5167% | 28.1167% | 29.3667% | 27.5500% | 27.3500% | 29.0500% | 37.5333% | 42.4167% | 30.9000%
PIMOG (CelebAHQ) | 24.2667% | 39.5333% | 23.2000% | 23.8500% | 23.9333% | 27.1500% | 26.2000% | 24.3667% | 25.1667% | 31.3500% | 39.6500% | 28.0606%
Note: The bold values indicate that this solution is the most effective.
Table 6. Watermark extraction accuracy (BER) on the images processed by AttGAN (male).
Scheme (Dataset) | Identity | JpegTest | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average
HumanFace
Ours (HumanFace) | 2.1000% | 5.1833% | 3.6500% | 2.8000% | 3.4833% | 2.5833% | 3.3333% | 2.1333% | 2.2500% | 7.8167% | 19.7167% | 5.0045%
SepMark (HumanFace) | 2.8167% | 5.8000% | 4.5667% | 3.5667% | 3.6500% | 2.9833% | 3.8000% | 2.8333% | 3.0833% | 9.4500% | 19.9333% | 5.6803%
CIN (HumanFace) | 33.6167% | 37.6167% | 47.8667% | 46.7667% | 40.7833% | 34.5500% | 35.6833% | 33.8167% | 34.1167% | 46.9667% | 41.9333% | 39.4288%
MBRS (HumanFace) | 30.3333% | 36.6500% | 34.9667% | 29.7833% | 30.9667% | 32.5667% | 32.0833% | 32.1833% | 32.7500% | 49.7833% | 39.3833% | 34.6773%
PIMOG (HumanFace) | 26.7833% | 41.4000% | 25.9500% | 27.0000% | 26.9833% | 29.3000% | 28.2000% | 27.6500% | 28.4000% | 41.9500% | 42.7000% | 31.4833%
FFHQ
Ours (FFHQ) | 1.9667% | 4.3333% | 3.4667% | 2.2167% | 2.7167% | 2.2833% | 2.4000% | 1.9000% | 2.0000% | 7.0500% | 22.0500% | 4.7621%
SepMark (FFHQ) | 2.2167% | 4.7000% | 3.5667% | 2.6333% | 2.9833% | 2.3333% | 3.1000% | 2.2833% | 2.3833% | 7.9833% | 22.1000% | 5.1167%
CIN (FFHQ) | 33.6167% | 37.6333% | 47.8667% | 46.7667% | 40.8167% | 34.5500% | 35.6667% | 33.8500% | 34.1667% | 46.9667% | 41.9667% | 39.4424%
MBRS (FFHQ) | 26.8000% | 33.2167% | 31.2833% | 25.9500% | 28.0000% | 27.3833% | 28.4333% | 26.8167% | 28.4833% | 50.1333% | 40.5833% | 31.5530%
PIMOG (FFHQ) | 22.9667% | 40.0167% | 22.9500% | 23.4000% | 23.0500% | 26.0333% | 24.7167% | 23.6500% | 26.0333% | 39.9500% | 42.1833% | 28.6318%
CelebAHQ
Ours (CelebAHQ) | 1.7500% | 5.4500% | 3.1333% | 2.1000% | 2.4333% | 1.8167% | 2.4333% | 1.7167% | 1.8833% | 8.3667% | 21.9500% | 4.8212%
SepMark (CelebAHQ) | 2.0833% | 5.8167% | 3.4333% | 3.1000% | 3.0833% | 2.5500% | 3.0333% | 2.1333% | 2.3833% | 9.7167% | 21.8667% | 5.3818%
CIN (CelebAHQ) | 31.1667% | 38.4333% | 47.8167% | 47.4833% | 39.0333% | 32.9500% | 32.9500% | 32.9500% | 31.7500% | 46.7500% | 43.0167% | 38.5727%
MBRS (CelebAHQ) | 24.0667% | 33.1000% | 29.5000% | 23.5333% | 24.5667% | 25.8500% | 25.8333% | 24.4833% | 25.4833% | 49.0000% | 38.4667% | 29.4439%
PIMOG (CelebAHQ) | 22.1667% | 40.0333% | 21.9500% | 22.8667% | 22.7000% | 24.6000% | 24.9167% | 22.7333% | 23.8167% | 40.4667% | 38.4667% | 27.7015%
Note: The bold values indicate that this solution is the most effective.
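The BER figures in these tables are simply the fraction of watermark bits that differ between the message embedded by the encoder and the message recovered by the decoder after distortion. As a minimal illustration (the function name and bitstrings below are our own, not taken from the paper's code):

```python
def bit_error_rate(embedded, extracted):
    """Fraction of watermark bits that differ between the embedded
    message and the message recovered by the decoder."""
    if len(embedded) != len(extracted):
        raise ValueError("watermark lengths must match")
    errors = sum(b1 != b2 for b1, b2 in zip(embedded, extracted))
    return errors / len(embedded)

# Example: a 30-bit watermark with 2 bits flipped by distortion -> BER = 2/30
message = [1, 0] * 15
received = message.copy()
received[3] ^= 1
received[17] ^= 1
print(f"{bit_error_rate(message, received):.4%}")
```

A lower BER therefore means more of the pre-embedded watermark survived the deepfake manipulation and common signal-processing distortions.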
Table 7. Watermark extraction accuracy (BER) on the images processed by CafeGAN (blond).

| Scheme (Dataset) | Identity | JPEG Test | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ours (HumanFace) | 7.1667% | 11.7500% | 9.3833% | 6.0667% | 9.1833% | 7.7667% | 7.5333% | 7.1167% | 7.0333% | 13.8833% | 23.1000% | 9.9985% |
| SepMark (HumanFace) | 9.3667% | 12.9333% | 10.7167% | 10.9833% | 11.1833% | 10.2333% | 10.1000% | 9.4000% | 9.7833% | 15.5833% | 25.0167% | 12.3000% |
| CIN (HumanFace) | 42.1167% | 44.2333% | 48.1667% | 48.3333% | 45.0000% | 42.9667% | 42.5833% | 42.3500% | 42.4167% | 48.4833% | 45.1333% | 44.7076% |
| MBRS (HumanFace) | 37.7000% | 41.5833% | 38.0833% | 38.8167% | 39.3000% | 38.4500% | 38.2667% | 37.6833% | 39.4000% | 38.8167% | 43.6333% | 39.2485% |
| PIMOG (HumanFace) | 29.6500% | 42.5333% | 27.7000% | 30.5167% | 29.4667% | 32.0000% | 30.3500% | 29.5667% | 31.1500% | 43.8000% | 43.9167% | 33.6955% |
| Ours (FFHQ) | 7.0833% | 11.1500% | 9.2167% | 6.1667% | 9.1500% | 7.7333% | 7.7667% | 6.9667% | 6.9000% | 13.4000% | 26.4500% | 10.1803% |
| SepMark (FFHQ) | 9.4167% | 12.1000% | 10.9667% | 10.4667% | 11.2833% | 9.7000% | 9.7000% | 9.5333% | 9.5500% | 16.1167% | 27.9000% | 12.4303% |
| CIN (FFHQ) | 33.6000% | 37.6167% | 47.8667% | 46.7667% | 40.8167% | 34.5500% | 35.6667% | 33.8500% | 34.1667% | 46.9667% | 41.9667% | 39.4394% |
| MBRS (FFHQ) | 37.5833% | 43.3833% | 38.4833% | 38.0167% | 39.3000% | 38.0333% | 37.6000% | 37.4833% | 39.1833% | 47.6333% | 42.8500% | 40.1966% |
| PIMOG (FFHQ) | 26.4833% | 42.1333% | 25.2833% | 27.0167% | 26.6667% | 29.5833% | 27.4333% | 27.1833% | 27.5000% | 41.9500% | 42.7833% | 30.1233% |
| Ours (CelebAHQ) | 6.6000% | 11.8000% | 9.5333% | 5.9167% | 9.2333% | 7.2000% | 7.6500% | 6.6667% | 6.5000% | 14.8000% | 27.6333% | 10.3212% |
| SepMark (CelebAHQ) | 9.0667% | 13.1167% | 10.9833% | 10.6000% | 10.7167% | 9.8670% | 9.9167% | 9.3000% | 9.3167% | 17.0333% | 28.2000% | 12.5561% |
| CIN (CelebAHQ) | 41.2000% | 44.6667% | 47.5833% | 47.7333% | 45.0500% | 41.9333% | 41.8000% | 41.2333% | 41.6000% | 42.7833% | 50.8167% | 44.2182% |
| MBRS (CelebAHQ) | 38.5333% | 43.5167% | 37.6000% | 40.0333% | 40.0333% | 39.6833% | 39.4667% | 38.6333% | 39.4167% | 40.7500% | 46.6667% | 40.3939% |
| PIMOG (CelebAHQ) | 26.4167% | 43.0167% | 25.3667% | 27.0167% | 26.8667% | 29.3833% | 27.8167% | 27.1167% | 28.1667% | 42.2000% | 45.8333% | 31.7455% |

Note: Bold values mark the best-performing scheme.
Table 8. Watermark extraction accuracy (BER) on the images processed by CafeGAN (male).

| Scheme (Dataset) | Identity | JPEG Test | Resize | GaussianBlur | MedianBlur | Brightness | Contrast | Saturation | Hue | GaussianNoise | SPSA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ours (HumanFace) | 1.7667% | 4.1667% | 3.3500% | 1.5833% | 3.0000% | 2.0000% | 2.8500% | 1.7667% | 1.8833% | 6.5000% | 19.8333% | 4.4273% |
| SepMark (HumanFace) | 2.5333% | 4.6333% | 3.5833% | 3.3667% | 3.3167% | 2.8500% | 6.6667% | 2.5167% | 2.5667% | 8.3833% | 20.5833% | 5.5455% |
| CIN (HumanFace) | 38.0667% | 40.9833% | 47.2500% | 47.5333% | 43.3167% | 38.5167% | 38.7500% | 37.9667% | 38.6333% | 47.3000% | 43.3500% | 41.9697% |
| MBRS (HumanFace) | 31.4667% | 39.5500% | 33.4833% | 32.2167% | 33.4500% | 33.3167% | 32.9000% | 32.3000% | 34.4167% | 48.0500% | 40.6667% | 35.6197% |
| PIMOG (HumanFace) | 17.9333% | 37.8333% | 16.0833% | 18.9500% | 18.5333% | 21.7500% | 20.1167% | 18.4333% | 20.2000% | 38.0000% | 40.7000% | 24.4121% |
| Ours (FFHQ) | 1.4500% | 4.1000% | 3.4333% | 1.6500% | 2.8833% | 1.7500% | 2.4833% | 1.6000% | 1.5833% | 6.9833% | 24.2500% | 4.7424% |
| SepMark (FFHQ) | 2.4167% | 4.5167% | 3.7333% | 3.1833% | 3.1500% | 2.3000% | 3.3300% | 2.2000% | 2.3500% | 8.5333% | 24.4000% | 5.4648% |
| CIN (FFHQ) | 33.6167% | 37.6167% | 47.8667% | 46.7667% | 40.8167% | 34.5500% | 35.6667% | 33.8500% | 34.1667% | 46.7667% | 41.9500% | 39.4212% |
| MBRS (FFHQ) | 31.1667% | 40.8000% | 32.7167% | 31.5167% | 32.7500% | 32.3667% | 32.3500% | 31.3333% | 33.9333% | 46.7167% | 42.2500% | 35.2636% |
| PIMOG (FFHQ) | 16.3667% | 36.8333% | 15.1000% | 17.1833% | 26.6167% | 18.9833% | 18.9000% | 17.6333% | 18.8000% | 37.2500% | 46.0167% | 24.5167% |
| Ours (CelebAHQ) | 1.8333% | 5.6333% | 3.6333% | 1.8000% | 3.0833% | 2.0333% | 3.3000% | 1.7500% | 2.0000% | 8.9000% | 24.8000% | 5.3424% |
| SepMark (CelebAHQ) | 2.5833% | 5.8167% | 3.8833% | 3.7000% | 3.2333% | 2.8167% | 3.5833% | 2.5333% | 2.6500% | 10.1667% | 25.6500% | 6.0561% |
| CIN (CelebAHQ) | 37.7333% | 42.0667% | 47.8667% | 47.2000% | 42.9333% | 38.5833% | 38.6000% | 37.5000% | 38.1000% | 47.2000% | 44.9167% | 42.0636% |
| MBRS (CelebAHQ) | 31.1667% | 40.5500% | 33.0333% | 31.5833% | 31.9167% | 32.3333% | 32.3500% | 31.4167% | 33.8333% | 48.6500% | 40.1000% | 35.1758% |
| PIMOG (CelebAHQ) | 16.9000% | 38.3667% | 16.3167% | 17.8000% | 17.5500% | 20.0833% | 20.1167% | 17.7833% | 19.0667% | 37.1000% | 40.3167% | 23.7636% |

Note: Bold values mark the best-performing scheme.
Table 9. Image quality comparison of recovered images obtained by GFP-GAN and ours.

| Distortion Type | HumanFace (GFP-GAN) | HumanFace (Ours) | FFHQ (GFP-GAN) | FFHQ (Ours) | CelebAHQ (GFP-GAN) | CelebAHQ (Ours) |
|---|---|---|---|---|---|---|
| StarGAN (blond) | (16.1569, 0.7429, 0.1083) | (27.0001, 0.9112, 0.0466) | (15.6818, 0.6878, 0.1388) | (27.4159, 0.9022, 0.0489) | (15.6297, 0.6841, 0.1461) | (28.4617, 0.9159, 0.0380) |
| StarGAN (male) | (22.8294, 0.8543, 0.0531) | (27.5393, 0.9141, 0.0444) | (23.0230, 0.8336, 0.0693) | (27.8128, 0.9056, 0.0464) | (23.1393, 0.8332, 0.0722) | (28.7715, 0.9161, 0.0359) |
| AttGAN (blond) | (13.6015, 0.6421, 0.1624) | (18.3305, 0.7180, 0.1157) | (14.7626, 0.6159, 0.1758) | (18.7024, 0.6931, 0.1238) | (15.1420, 0.6535, 0.1535) | (21.8117, 0.7805, 0.0773) |
| AttGAN (male) | (20.2485, 0.7911, 0.0995) | (19.4030, 0.7639, 0.1063) | (22.4829, 0.8191, 0.0865) | (21.2123, 0.7871, 0.0912) | (24.7060, 0.8533, 0.0658) | (23.6924, 0.6932, 0.0678) |
| CafeGAN (blond) | (15.7986, 0.6755, 0.1407) | (19.8994, 0.7397, 0.0987) | (13.4619, 0.5979, 0.1833) | (19.4606, 0.7028, 0.1121) | (15.2037, 0.6258, 0.1724) | (20.1800, 0.5367, 0.1116) |
| CafeGAN (male) | (25.9666, 0.8940, 0.0432) | (23.9651, 0.8566, 0.0554) | (26.1756, 0.8815, 0.0613) | (24.4656, 0.8469, 0.0579) | (26.7685, 0.8896, 0.0606) | (25.5523, 0.7342, 0.0538) |
| Average | (19.0988, 0.7667, 0.1012) | (22.6896, 0.8173, 0.0779) | (19.2646, 0.7393, 0.1192) | (23.1783, 0.8063, 0.0801) | (20.0982, 0.7566, 0.1118) | (24.7449, 0.7628, 0.0641) |

Note: The values in parentheses are the PSNR, SSIM, and LPIPS of the recovered image, respectively; bold values mark the best-performing scheme.
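Of the three metrics in these tuples, PSNR has a simple closed form (higher is better); SSIM and LPIPS are typically computed with library implementations such as scikit-image and the `lpips` package. A minimal PSNR sketch over flat pixel lists (the helper below is illustrative, not the paper's evaluation code):

```python
import math

def psnr(reference, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images given as
    flat sequences of pixel values in [0, peak]."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, recovered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

For example, a uniform pixel error of 16 gray levels gives an MSE of 256 and a PSNR of roughly 24 dB, which is in the range of the recovery results reported above.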
Table 10. The watermark extraction accuracy (BER) of each strength factor for each dataset under different noise conditions.

| Distortion | HumanFace γ = 1 | HumanFace γ = 0.5 | HumanFace γ = 2 | FFHQ γ = 1 | FFHQ γ = 0.5 | FFHQ γ = 2 | CelebAHQ γ = 1 | CelebAHQ γ = 0.5 | CelebAHQ γ = 2 |
|---|---|---|---|---|---|---|---|---|---|
| Identity | 0.0000% | 0.1333% | 0.0000% | 0.0000% | 0.2167% | 0.0000% | 0.0000% | 0.2833% | 0.0000% |
| JPEG Test | 0.1333% | 5.7667% | 0.0000% | 0.2333% | 6.4000% | 0.0000% | 0.2167% | 6.7333% | 0.0000% |
| Resize | 0.0000% | 1.5000% | 0.0000% | 0.0167% | 1.4333% | 0.0000% | 0.0000% | 1.2000% | 0.0000% |
| GaussianBlur | 0.0000% | 2.3167% | 0.0000% | 0.0000% | 2.2667% | 0.0000% | 0.0000% | 1.8333% | 0.0000% |
| MedianBlur | 0.0000% | 0.5167% | 0.0000% | 0.0000% | 0.5167% | 0.0000% | 0.0000% | 0.3333% | 0.0000% |
| Brightness | 0.0000% | 0.1667% | 0.0000% | 0.0000% | 0.3167% | 0.0000% | 0.0000% | 0.3833% | 0.0000% |
| Contrast | 0.0000% | 0.3333% | 0.0000% | 0.0000% | 0.4500% | 0.0000% | 0.0000% | 0.3833% | 0.0000% |
| Saturation | 0.0000% | 0.2333% | 0.0000% | 0.0000% | 0.2667% | 0.0000% | 0.0000% | 0.2500% | 0.0000% |
| Hue | 0.0000% | 0.2000% | 0.0000% | 0.0000% | 0.3333% | 0.0000% | 0.0000% | 0.2500% | 0.0000% |
| GaussianNoise | 0.5000% | 9.3167% | 0.0000% | 0.6000% | 9.9000% | 0.0000% | 0.8167% | 11.5000% | 0.0000% |
| SPSA | 0.4333% | 4.3333% | 0.0333% | 1.1833% | 6.4500% | 0.2500% | 1.4000% | 6.8667% | 0.2333% |
| StarGAN (blond) | 0.2333% | 6.8500% | 0.0167% | 0.4333% | 7.1167% | 0.0000% | 0.1667% | 6.1000% | 0.0333% |
| StarGAN (male) | 0.1333% | 6.3500% | 0.0000% | 0.2000% | 5.8000% | 0.0000% | 0.1500% | 5.2000% | 0.0000% |
| AttGAN (blond) | 2.5667% | 14.9500% | 0.1000% | 2.2667% | 14.7000% | 0.1000% | 1.1000% | 12.9833% | 0.0333% |
| AttGAN (male) | 1.1833% | 9.8667% | 0.0167% | 0.7333% | 8.7000% | 0.0000% | 0.5833% | 8.4667% | 0.0000% |
| CafeGAN (blond) | 3.7500% | 16.8000% | 0.1333% | 3.3333% | 16.3333% | 0.2000% | 3.3667% | 16.6500% | 0.1667% |
| CafeGAN (male) | 0.5333% | 8.3667% | 0.0000% | 0.6500% | 8.1000% | 0.0000% | 0.5333% | 7.9000% | 0.0000% |
| Average | 0.5569% | 5.1794% | 0.0176% | 0.5676% | 5.2529% | 0.0324% | 0.4902% | 5.1549% | 0.0274% |
Table 11. Evaluation of the visual quality of watermarked images with different γ values.

| Dataset | γ = 1 | γ = 0.5 | γ = 2 |
|---|---|---|---|
| HumanFace | (37.6321, 0.9559, 0.0025) | (43.5612, 0.9856, 0.0006) | (31.6845, 0.8750, 0.0135) |
| FFHQ | (38.0710, 0.9567, 0.0030) | (43.9921, 0.9862, 0.0008) | (32.1337, 0.8756, 0.0159) |
| CelebAHQ | (39.0441, 0.9619, 0.0025) | (44.9401, 0.9880, 0.0007) | (33.1328, 0.8882, 0.0134) |
| Average | (38.2491, 0.9582, 0.0027) | (44.1645, 0.9866, 0.0007) | (32.3170, 0.8796, 0.0143) |

Note: The values in parentheses are the PSNR, SSIM, and LPIPS of the watermarked image, respectively.
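Tables 10 and 11 together illustrate the usual robustness/imperceptibility trade-off of the strength factor γ: halving γ raises the visual quality of the watermarked image but worsens BER, while doubling γ does the opposite. Although the scheme's encoder is a trained network, the role of γ can be pictured as scaling the watermark residual before it is added to the cover image. A toy sketch over flat pixel lists (names and values are illustrative, not from the paper):

```python
def embed(cover, residual, gamma):
    """Blend a watermark residual into the cover image, scaled by the
    strength factor gamma; pixel values are clamped to [0, 255]."""
    return [min(255.0, max(0.0, c + gamma * r)) for c, r in zip(cover, residual)]

cover = [120.0, 80.0, 250.0]
residual = [3.0, -2.0, 6.0]
for gamma in (0.5, 1.0, 2.0):
    # A larger gamma yields a larger perturbation: easier to extract,
    # but more visible in the watermarked image.
    print(gamma, embed(cover, residual, gamma))
```

This matches the reported numbers: γ = 2 drives BER to nearly zero at the cost of roughly 6 dB of PSNR relative to γ = 1.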
Table 12. Image quality of recovered images obtained by StarGAN and ours.

| Distortion Type | HumanFace (StarGAN) | HumanFace (Ours) | FFHQ (StarGAN) | FFHQ (Ours) | CelebAHQ (StarGAN) | CelebAHQ (Ours) |
|---|---|---|---|---|---|---|
| StarGAN (blond) | (26.9548, 0.7846, 0.0658) | (27.0001, 0.9112, 0.0466) | (27.5291, 0.7932, 0.0688) | (27.4159, 0.9022, 0.0489) | (28.6113, 0.9239, 0.0442) | (28.4617, 0.9159, 0.0380) |
| StarGAN (male) | (27.1126, 0.7751, 0.0663) | (27.5393, 0.9141, 0.0444) | (27.4235, 0.7816, 0.0708) | (27.8128, 0.9056, 0.0464) | (28.4101, 0.8203, 0.0538) | (28.7715, 0.9161, 0.0359) |
| AttGAN (blond) | (17.9340, 0.4623, 0.1418) | (18.3305, 0.7180, 0.1157) | (18.4189, 0.4930, 0.1504) | (18.7024, 0.6931, 0.1238) | (21.0817, 0.7606, 0.0897) | (21.8117, 0.7805, 0.0773) |
| AttGAN (male) | (19.1163, 0.5203, 0.1139) | (19.4030, 0.7639, 0.1063) | (20.5325, 0.5883, 0.1091) | (21.2123, 0.7871, 0.0912) | (22.6571, 0.6727, 0.0803) | (23.6924, 0.6932, 0.0678) |
| CafeGAN (blond) | (19.2086, 0.4813, 0.1242) | (19.8994, 0.7397, 0.0987) | (18.9439, 0.4789, 0.1385) | (19.4606, 0.7028, 0.1121) | (19.6383, 0.7115, 0.1096) | (20.1800, 0.5367, 0.1116) |
| CafeGAN (male) | (22.8836, 0.6549, 0.0736) | (23.9651, 0.8566, 0.0554) | (22.9488, 0.6608, 0.0818) | (24.4656, 0.8469, 0.0579) | (23.9011, 0.7080, 0.0681) | (25.5523, 0.7342, 0.0538) |
| Average | (22.2617, 0.6131, 0.0976) | (22.6896, 0.8173, 0.0779) | (22.6328, 0.6326, 0.1032) | (23.1783, 0.8063, 0.0801) | (24.0499, 0.7662, 0.0743) | (24.7449, 0.7628, 0.0641) |

Note: The values in parentheses are the PSNR, SSIM, and LPIPS of the recovered image, respectively; bold values mark the best-performing scheme.

Share and Cite

Guo, Y.; Liu, Z.; Li, Y.; Feng, B. Active Defense for Deepfakes Using Watermark-Guided Original Face Recovery. Electronics 2026, 15, 625. https://doi.org/10.3390/electronics15030625
