Defending Against Ambiguity Attacks: Secret-Key-Driven DNN Watermarking for Ownership Verification

Hao, Shouxi; Huang, Rong

doi:10.3390/electronics15102150


by Shouxi Hao and Rong Huang *

College of Information and Intelligent Science, Donghua University, Shanghai 201620, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(10), 2150; https://doi.org/10.3390/electronics15102150

Submission received: 5 April 2026 / Revised: 29 April 2026 / Accepted: 13 May 2026 / Published: 16 May 2026

(This article belongs to the Special Issue Security and Privacy for AI, 2nd Edition)


Abstract

Deep neural networks (DNNs) have become important intellectual assets, and ownership verification for misappropriated DNNs is increasingly important in Machine Learning as a Service (MLaaS) settings. Among existing DNN watermarking methods, backdoor watermarking is a typical approach for deployed ownership verification. However, existing methods still face two limitations. When verification relies on a finite trigger set, forged ownership evidence becomes difficult to rule out once the trigger samples are leaked or closely imitated. In addition, when watermark embedding modifies the service backbone, the predictor used for routine service is directly altered rather than kept unchanged. To address these limitations, we propose a backdoor DNN watermarking framework that combines secret-key-driven trigger group construction with a plug-and-play LoRA component. The proposed method regenerates the trigger groups used for verification from benign image pairs under a valid key whenever ownership needs to be checked, so ownership verification no longer depends on a finite stored trigger set. Meanwhile, watermark embedding is carried by an external LoRA component rather than by modifying the service backbone. In addition, we further optimize the LoRA configuration through a genetic search. Experiments on five benchmark datasets show that under the intended deployment protocol, the proposed method keeps the service predictor unchanged, enables effective ownership verification, and makes it difficult for attackers without the valid key to reproduce the verification behavior of the legitimate watermark under a large number of repeated attack trials.

Keywords: DNN watermarking; backdoor watermarking; ambiguity attack; high-fidelity deep neural network (DNN) watermarking

1. Introduction

Deep learning [1] has advanced rapidly and is now widely applied in fields such as computer vision [2,3], speech recognition [4,5], and natural language processing [6], yielding impressive results in many real-world settings. Many technology companies have released “Machine Learning as a Service” (MLaaS) products, with deep neural networks (DNNs) as key components. These products continuously drive the advancement of artificial intelligence (AI) technology and offer broader prospects for AI applications across various domains. Meanwhile, recent studies on video cryptanalysis and image encryption also reflect the continuing importance of multimedia security and privacy protection in modern intelligent systems [7,8].

However, training a high-performance DNN typically requires substantial investment in data collection and annotation, computational infrastructure, and model design expertise. Therefore, high-performance DNNs have become valuable intellectual assets in MLaaS systems, making copyright protection crucial for encouraging innovation and technological advancement. Once deployed through public APIs, however, the model can be queried remotely while its internal parameters remain inaccessible, which makes unauthorized reuse difficult to detect and copyright disputes difficult to resolve. Under this setting, ownership verification for deployed DNNs has become an important problem.

Existing DNN watermarking methods can be broadly divided into intrinsic watermarking and backdoor watermarking, depending on how ownership evidence is embedded and retrieved. Intrinsic watermarking embeds ownership information into the internal parameters, representations, or structures of the protected model and is therefore mainly suitable when the verifier can inspect the model internals, namely under white-box access. In contrast, backdoor watermarking encodes ownership into a predefined trigger behavior and verifies ownership through query responses only, namely under black-box access. Because commercial MLaaS platforms usually provide only API-level access rather than internal model visibility, backdoor watermarking has become a typical approach for deployed ownership verification in practical settings.

Despite this practical advantage, existing backdoor watermarking methods still face two major limitations. The first is the integrity of routine service behavior. Most existing schemes embed the watermark by retraining or fine-tuning the backbone of the protected model. Once watermark embedding is written into the backbone, the predictor used for routine service is no longer kept unchanged. Therefore, these methods cannot naturally guarantee that watermark embedding leaves routine service behavior intact.

The second limitation is that forged ownership evidence becomes difficult to rule out when ownership verification depends on a finite trigger set. Existing backdoor watermarking methods usually verify ownership by testing a small number of prepared trigger samples. If these samples are leaked or closely imitated, the suspicious model can produce similar verification responses, which can then be used to support a false ownership claim. In this paper, we refer to this threat as an ambiguity attack. In particular, a black box ambiguity attack refers to the setting in which the adversary has only black box access to the suspicious model and searches for forged trigger samples or forged trigger evidence to support a counterfeit ownership claim.

Recent studies have shown that ambiguity attacks become more difficult when verification no longer depends on isolated trigger samples but instead depends on correlated trigger evidence whose samples and labels are constrained by deterministic, hash-based, or encrypted relations [9,10]. In the present work, we instantiate this idea through group-wise consistency, meaning that the trigger samples belonging to the same trigger group must collectively satisfy a predefined sample-to-label correspondence, rather than being verified independently one by one. Consequently, a forged claim must satisfy both sample-level correctness and group-level consistency, which substantially reduces the feasible search space for forgery. However, group-wise consistency alone does not remove the risk caused by exposure of concrete verification samples. Once a fixed trigger set or trigger chain is disclosed, the disclosed samples may become reusable ownership evidence and may also reveal realized input–output pairs of the hidden correlation rule [11]. Therefore, the problem addressed in this paper is not only to make trigger samples correlated but also to make ownership evidence regenerable; a valid key should define a one-way mapping that can generate new trigger groups from new benign carriers, while a disclosed verification group should not reveal the key or enable reliable generation of valid groups for unseen carriers.

To overcome both limitations, this paper proposes a secret-key-driven backdoor DNN watermarking framework together with a plug-and-play LoRA (Low-Rank Adaptation) component. In the proposed framework, any pair of benign images can first be transformed into a mixed carrier group through channel shuffling and 1:1 pixel-wise mixup. Then, the mean statistics of the carrier images and a valid secret key are jointly mapped, via a one-way hash function, to a sparse set of white-pixel positions and the corresponding target label assignment for that trigger group. In this design, the key is not a pointer to a stored trigger set. Instead, it specifies a one-way mapping from carrier statistics to sparse trigger patterns. For each carrier image, the carrier-wise channel statistics and the valid key jointly determine a unique white pixel pattern, and the target label is further derived from that pattern. Therefore, the trigger samples used in one verification session are only carrier-specific realizations of the key-defined mapping. Even if these samples are disclosed, they do not directly reveal the valid key or provide a reliable way to generate legitimate trigger groups for new carriers. By combining this regenerable trigger family construction with group-wise verification, the proposed framework increases the difficulty of successful forgery without relying on a small fixed trigger set as permanent secret evidence. To preserve the integrity of routine service behavior, the watermarking function is implemented by a plug-and-play LoRA component rather than by modifying the deployed backbone used for routine service inference. The parameter updates required for watermark embedding are thus absorbed by the external component, whose structure is further optimized under a lightweightness constraint.

The main contributions of this paper are as follows.

(1) We propose a secret-key-driven backdoor DNN watermarking framework in which ownership verification depends on trigger groups regenerated from a valid key and carrier image statistics rather than preserving a finite trigger set as static evidence.

(2) We combine carrier-dependent trigger generation with group-wise verification, thereby increasing the difficulty of black box ambiguity attacks while reducing the security risk caused by trigger sample exposure.

(3) We implement the watermarking function with a plug-and-play LoRA component and further optimize this component under a lightweightness constraint, thereby enabling ownership verification while preserving the integrity of routine service behavior under the intended deployment protocol.

2. Related Work

Existing research on DNN watermarking [12] can be organized along two dimensions: how ownership evidence is embedded and how trigger behavior is constructed for black box verification. From the perspective of evidence embedding, existing methods are commonly divided into intrinsic watermarking and backdoor watermarking. Intrinsic watermarking embeds ownership information into the internal parameters, representations, or structures of the protected model [13,14,15,16,17,18]. Such methods are effective when the verifier has white box access because ownership can be verified by inspecting the internal contents of the model. However, this requirement limits their applicability in MLaaS scenarios where only API-level access is typically available. In contrast, backdoor watermarking encodes ownership into a predefined trigger behavior that can be tested through black box queries and is therefore more suitable for practical ownership verification in deployed settings.

Within backdoor watermarking, prior methods can be further divided into sample-triggered and pattern-triggered schemes. Sample-triggered methods verify ownership using specific trigger samples, such as abstract images, noisy images, or adversarial examples [16,19,20]. Their main advantage is implementation simplicity, but their verification evidence is usually tied to a small number of concrete samples. Pattern-triggered methods instead embed ownership through reusable trigger patterns or structured input transformations, such as content marks, mixed-image constructions, or pixel-level modifications [21,22,23]. Compared with sample-triggered schemes, pattern-triggered schemes usually provide better reusability across inputs because the protected model learns a trigger pattern rather than memorizing a few isolated samples. However, most existing backdoor watermarking schemes, regardless of trigger type, still verify ownership through a finite set of trigger samples prepared during watermark embedding.

A separate line of research has focused on preserving the integrity of routine service behavior after watermark embedding. Zhong [24] and Sun [25] attempted to reduce interference with the original task by introducing additional categories or neurons to absorb watermark-related behavior. Wang et al. [26] proposed a plug-and-play watermarking scheme by injecting an independent proprietary model so that the original target model need not be directly fine-tuned; however, the watermarking function is still carried by a separate neural component rather than by a lightweight low-rank parameter offset. Hua et al. [27] further argued that watermark fidelity should be evaluated not only by clean-set prediction accuracy but also by the stability of internal feature extraction and decision boundaries, which they referred to as deep fidelity. These studies substantially improved the understanding of fidelity in DNN watermarking. Nevertheless, existing methods still commonly rely on modifying the backbone parameters of the protected model or introducing additional structures that must participate in inference. As a result, they do not fundamentally resolve the service integrity problem under practical deployment.

Another line of work has studied ambiguity attacks and corresponding anti-forgery mechanisms. Fan et al. [17] showed that ownership verification can be challenged by forged watermark evidence, thereby establishing ambiguity as a practical security threat. Zhu et al. [9] further pointed out that when trigger samples are verified independently, an attacker with black box access may fabricate alternative trigger evidence through repeated queries. To address this issue, they linked trigger samples and target labels by constructing a one-way trigger chain and a corresponding label chain through a hash-based mechanism. Hua et al. [10] extended this idea by constructing deterministically dependent trigger sets, where multiple trigger samples and labels are correlated to increase the difficulty of ambiguity attacks. The common insight of these studies is that ownership verification becomes harder to forge when it depends not only on the responses to individual trigger samples but also on predefined group-wise consistency across a trigger group.

The above ambiguity-resistant methods provide important foundations for this paper. Their common insight is that ownership evidence becomes harder to forge when trigger samples and target labels are not verified independently but are constrained by deterministic, hash-based, or encrypted relations. Nevertheless, the exposure of concrete verification evidence remains a practical concern in trigger-based watermarking. Recent work on evidence exposure attacks explicitly points out that the trigger set can effectively serve as the watermark key; if it is leaked during verification, an adversary may reuse the leaked evidence to claim ownership [11]. Recent cryptographic chain methods further strengthen trigger generation and watermark decision rules by deriving trigger inputs from secret-key-based chains and using more rigorous statistical verification [28].

The present work follows the same goal of reducing ambiguity, but it differs in how the verification evidence is generated. Instead of treating the trigger evidence as a fixed set, a fixed chain, or a watermark dataset, the proposed method lets the valid key define a one-way mapping from carrier image statistics to sparse trigger patterns. The trigger samples submitted in one verification session are therefore only carrier-specific realizations of this mapping. This design is combined with group-wise verification and a plug-and-play LoRA component so that the method addresses both evidence ambiguity and service backbone preservation under the intended deployment protocol.

Several recent watermarking methods explicitly target model extraction or knowledge distillation attacks. For example, active API [29] watermarking methods inject watermark behavior through service responses, while entanglement-based [30] or margin-based [31] methods attempt to couple watermark behavior with the normal task decision behavior so that the watermark can survive in extracted surrogate models. These methods address watermark transfer under functionality stealing. In contrast, ambiguity-resistant ownership verification addresses a different failure mode: even when a model exhibits watermark behavior, an adversary may attempt to fabricate alternative trigger evidence and make the ownership claim ambiguous. The proposed method belongs to the latter line. It does not replace extraction-resilient watermarking; instead, it can serve as a key-driven anti-forgery verification layer that is complementary to extraction-aware embedding objectives.

3. Problem Formulation

3.1. Threat Model

The ownership verification scenario considered in this paper involves three parties: the owner, the adversary, and a trusted third-party verifier. The owner embeds the watermark into the protected model under white box access and holds a valid verification key $K$. The adversary aims to claim ownership of a suspicious model without possessing the legitimate key and is assumed to have only black box access to that model during the attack stage. The trusted verifier resolves the dispute by evaluating the suspicious model according to a predefined verification rule. Based on the key $K$, the owner can generate trigger groups $G(K) = \{(x_{i}^{K}, y_{i}^{K})\}_{i=1}^{M}$, where $x_{i}^{K}$ and $y_{i}^{K}$ denote the $i$-th trigger sample and its designated target label, respectively. The concrete construction of $G(K)$ is deferred to Section 4. The adversary considered here is therefore an ambiguity attacker: the adversary does not possess $K$ but attempts to fabricate forged trigger groups or forged ownership evidence that can satisfy the same verification rule. This threat model is different from model extraction or knowledge distillation where the attacker trains a surrogate model from routine service responses. Under the deployment protocol defined in the next subsection, routine service and ownership verification are separated. A surrogate model distilled only from routine service queries is therefore not expected to inherit the watermark behavior used for ownership verification. Accordingly, this paper studies ambiguity-resistant ownership verification under the stated protocol rather than watermark transfer to extracted surrogate models.

3.2. Verification Criterion and Objectives

Ownership is verified at the trigger group level rather than on isolated trigger samples. For a suspicious model $\tilde{f}$, a trigger group $G(K)$ is accepted only when all samples in the group are mapped to their designated target labels. Given $B$ trigger groups generated from the same valid key $K$, the group-wise verification rate is defined as

$$GVR(\tilde{f}; K) = \frac{1}{B} \sum_{b=1}^{B} V(\tilde{f}; G_{b}(K)), \tag{1}$$

where $V(\tilde{f}; G_{b}(K)) = 1$ only if every trigger sample in $G_{b}(K)$ is classified into its designated target label; otherwise, it equals $0$. Ownership of the suspicious model is confirmed if $GVR(\tilde{f}; K) \geq \tau$, where $\tau$ is the verification threshold. Under this criterion, a black box ambiguity attack is successful if an adversary, with black box access only, can fabricate forged trigger groups that also satisfy the same group-wise verification rule.
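The criterion in Equation (1) can be illustrated with a minimal Python sketch. The function names and the representation of a trigger group as a list of (sample, label) pairs are assumptions made for illustration only; `predict` stands for any black-box query interface that returns a class label.

```python
from typing import Callable, Sequence, Tuple

# Assumed representation: a trigger group is a sequence of (sample, target_label) pairs.
TriggerGroup = Sequence[Tuple[object, int]]


def group_accepted(predict: Callable[[object], int], group: TriggerGroup) -> int:
    """V(f; G): 1 only if every sample in the group hits its designated label."""
    return int(all(predict(x) == y for x, y in group))


def group_verification_rate(predict: Callable[[object], int],
                            groups: Sequence[TriggerGroup]) -> float:
    """GVR(f; K): fraction of the B trigger groups that are fully accepted."""
    if not groups:
        raise ValueError("at least one trigger group is required")
    return sum(group_accepted(predict, g) for g in groups) / len(groups)


def ownership_confirmed(predict: Callable[[object], int],
                        groups: Sequence[TriggerGroup], tau: float) -> bool:
    """Ownership is confirmed when GVR >= tau."""
    return group_verification_rate(predict, groups) >= tau
```

Because a single misclassified sample rejects its whole group, the group-level rate is never higher than the sample-level accuracy on the same trigger samples. For example, with a toy predictor that returns the parity of an integer input, the groups `[(2, 0), (4, 0), (7, 1)]` and `[(2, 0), (3, 0), (5, 1)]` give a rate of 0.5, since the second group contains one miss.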

The second objective is to preserve the integrity of routine service behavior. Let $f_{\theta}$ denote the original protected backbone and let $f_{\theta + \Delta\theta}$ denote the watermark-enabled predictor. In conventional deployment, the service model is directly replaced by the watermark-enabled predictor, i.e., $f_{serv} = f_{\theta + \Delta\theta}$. In contrast, under the intended deployment protocol considered in this paper, routine service is still performed by the original backbone, whereas ownership verification is performed by the watermark-enabled predictor, i.e.,

$$f_{serv} = f_{\theta}, \quad f_{veri} = f_{\theta + \Delta\theta}. \tag{2}$$

Therefore, the proposed setting aims to improve resistance to black box ambiguity attacks while preserving routine service integrity by design. This separation also determines the scope of the security claim: because routine service exposes $f_{serv} = f_{\theta}$ rather than $f_{veri} = f_{\theta + \Delta\theta}$, a surrogate model distilled only from routine service queries is not expected to inherit the watermark behavior implemented by $\Delta\theta$. Accordingly, this paper evaluates ambiguity attacks against the verification protocol, not watermark transfer under model extraction or knowledge distillation.

4. Proposed Method

4.1. Overview

The proposed method consists of two parts: trigger group construction under a valid key and watermark embedding with a plug-and-play LoRA component. The overall workflow is shown in Figure 1.

The first part constructs a trigger group from a pair of benign images sampled from the training set. One image is kept fixed, whereas the RGB channels of the other image are rearranged into six different orders. The resulting six mixed images serve as the carriers of the trigger group. For each carrier image, the valid key and the carrier statistics jointly determine a set of white pixel positions, and the target label is further derived from the resulting trigger pattern. By inserting the corresponding sparse white pixel pattern into each carrier image, the final trigger group is obtained. This design makes ownership verification depend on a complete trigger group generated under the valid key rather than a small number of isolated trigger samples. The detailed construction process is illustrated in Figure 2.

The second part embeds the watermarking function into an additional parameter offset $\Delta\theta$, which is implemented by a plug-and-play LoRA component. Accordingly, routine service is still performed by the original predictor $f_{\theta}$, whereas ownership verification is performed by the watermark-enabled predictor $f_{\theta + \Delta\theta}$. In this way, the model used for routine service is separated from the model state used for watermark verification. The concrete LoRA structure and its intrinsic rank optimization are introduced in Section 4.3 and Section 4.4.

4.2. Trigger Group Construction

During watermark embedding, the source images used to construct trigger groups are sampled from the benign training set. Given a source image pair $(a, b)$, image $a$ is kept unchanged, whereas the RGB channels of image $b$ are rearranged into all six possible orders. Let $\pi_{i}(b)$ denote the $i$-th RGB-channel permutation of $b$, where $i = 1, \dots, 6$. The six mixed images are then constructed by

$$c_{i} = \frac{1}{2} a + \frac{1}{2} \pi_{i}(b), \quad i = 1, \dots, 6. \tag{3}$$

We refer to $\{c_{i}\}_{i=1}^{6}$ as the carrier group generated from $(a, b)$.
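As a minimal sketch of Equation (3), the carrier group can be built with NumPy as follows. The function name `carrier_group`, the channel-last layout (H, W, 3), and the enumeration order of the six permutations are assumptions made for illustration; the text does not fix an ordering.

```python
from itertools import permutations

import numpy as np


def carrier_group(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the six carriers c_i = 0.5 * a + 0.5 * pi_i(b), shape (6, H, W, 3)."""
    if a.shape != b.shape or a.ndim != 3 or a.shape[-1] != 3:
        raise ValueError("expected two RGB images of identical shape (H, W, 3)")
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    # permutations(range(3)) enumerates all 3! = 6 channel orders; the first
    # one is the identity (0, 1, 2). This ordering is an assumption.
    carriers = [0.5 * a + 0.5 * b[..., list(order)]
                for order in permutations(range(3))]
    return np.stack(carriers)
```

Since a channel permutation only reorders the three values at each pixel, all six carriers share the same per-pixel channel sum, while their individual channel-wise means generally differ.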

For each carrier image $c_{i}$, let $\mu(c_{i}) \in \mathbb{R}^{3}$ denote its three-dimensional channel-wise mean. The valid key $K$ and $\mu(c_{i})$ are jointly fed into a one-way hash function to produce 16 white pixel coordinates:

$$P_{i}^{K} = H(K, \mu(c_{i})) = \{(u_{i,n}^{K}, v_{i,n}^{K})\}_{n=1}^{16}. \tag{4}$$

These coordinates are written back into $c_{i}$ as pure white pixels, yielding the $i$-th trigger sample

$$x_{i}^{K} = \Gamma(c_{i}, P_{i}^{K}). \tag{5}$$

The target label is then determined by the trigger pattern itself. Specifically, the coordinates of the 16 white pixels are summed, and the sum is taken modulo the number of classes $C$:

$$y_{i}^{K} = 1 + \left( \sum_{n=1}^{16} \left( u_{i,n}^{K} + v_{i,n}^{K} \right) \bmod C \right). \tag{6}$$

Therefore, the trigger group generated from $(a, b)$ under the valid key $K$ is

$$G_{a,b}(K) = \{(x_{i}^{K}, y_{i}^{K})\}_{i=1}^{6}. \tag{7}$$
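Equations (4) to (6) can be sketched in Python as below. The text does not specify the hash function or how its output is turned into coordinates, so everything concrete here is an assumption: SHA-256 in counter mode, channel means quantized to three decimals before hashing, 0-based coordinates, 16 distinct positions, and a white value of 255. The helper names are hypothetical.

```python
import hashlib
from typing import List, Tuple

import numpy as np

NUM_PIXELS = 16  # number of white pixels per trigger sample (from the text)


def white_pixel_coords(key: bytes, carrier: np.ndarray) -> List[Tuple[int, int]]:
    """Sketch of P = H(K, mu(c)): derive 16 coordinates from key and channel means."""
    h, w, _ = carrier.shape
    if h * w < NUM_PIXELS:
        raise ValueError("carrier is too small for 16 distinct positions")
    mu = carrier.reshape(-1, 3).mean(axis=0)
    # Assumption: quantize the means so that the hash input is reproducible.
    seed = key + b"|" + ",".join(f"{float(m):.3f}" for m in mu).encode()
    coords: List[Tuple[int, int]] = []
    counter = 0
    while len(coords) < NUM_PIXELS:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        u = int.from_bytes(digest[:4], "big") % h
        v = int.from_bytes(digest[4:8], "big") % w
        if (u, v) not in coords:  # assumption: the 16 positions are distinct
            coords.append((u, v))
        counter += 1
    return coords


def insert_trigger(carrier: np.ndarray, coords, white: float = 255.0) -> np.ndarray:
    """Gamma(c, P): write pure white pixels at the given coordinates (on a copy)."""
    x = carrier.copy()
    for u, v in coords:
        x[u, v, :] = white
    return x


def target_label(coords, num_classes: int) -> int:
    """y = 1 + (sum of all coordinates mod C), a label in {1, ..., C}."""
    return 1 + sum(u + v for u, v in coords) % num_classes
```

In this sketch the channel means are computed on the carrier before the white pixels are written, so the verifier can regenerate the same pattern from the same carrier and key.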

The role of the key should be noted carefully. The key does not determine the source images or the carrier images themselves. Instead, it customizes the mapping from carrier image statistics to white pixel positions. When the key changes, the mapping changes accordingly, which in turn changes the trigger pattern and the resulting target label. Thus, the mapping chain in the proposed construction is carrier image to trigger pattern to target label, while the key acts as the secret variable that specifies the first map.

This construction serves two purposes. First, different benign image pairs can produce different trigger groups under the same valid key, so ownership verification does not rely on preserving a fixed trigger set. Second, because the mapping from $(K, \mu(c_{i}))$ to $P_{i}^{K}$ is one-way, exposing a small number of trigger samples does not directly reveal the valid key and does not directly enable an attacker to generate new valid trigger groups.

During training, we further introduce an auxiliary wrong key setting. For the same source image pair $(a, b)$, an incorrect key $K' \neq K$ is used to generate auxiliary trigger samples. These auxiliary samples are trained to move away from the target labels defined by the valid key, which suppresses the model’s false responses to nearby forged patterns and increases the difficulty of ambiguity attacks.

The trigger group construction process is illustrated in Figure 2. Starting from a source image pair, the framework first produces six carrier images, then uses the valid key and the carrier-wise channel means to determine 16 white pixel coordinates for each carrier image, and finally writes these white pixels back to obtain the trigger group.

4.3. Watermark Embedding with a Plug-and-Play LoRA Component

To realize the watermark-enabled predictor $f_{\theta + \Delta\theta}$ without modifying the backbone used for routine service, we parameterize the watermark component through LoRA [32]. Suppose the backbone contains a set of selected layers $S$. For each layer $\ell \in S$ with weight matrix $W_{\ell} \in \mathbb{R}^{k_{\ell} \times d_{\ell}}$, the LoRA update is defined as

$$\Delta W_{\ell} = B_{\ell} A_{\ell}, \quad A_{\ell} \in \mathbb{R}^{r_{\ell} \times d_{\ell}}, \quad B_{\ell} \in \mathbb{R}^{k_{\ell} \times r_{\ell}}, \quad r_{\ell} \ll \min(d_{\ell}, k_{\ell}). \tag{8}$$

The watermark plug-and-play LoRA component is therefore

$$\Delta\theta = \{A_{\ell}, B_{\ell}\}_{\ell \in S}. \tag{9}$$

During watermark embedding, the backbone parameters $\theta$ are frozen, and only $\Delta\theta$ is optimized. After training, the learned watermark component is denoted by $\Delta\theta^{*}$. Accordingly, routine service uses $f_{\theta}$, whereas ownership verification uses $f_{\theta + \Delta\theta^{*}}$.
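The separation between the frozen weight and the detachable update in Equation (8) can be sketched for a single linear layer with NumPy. The class name `LoRALinear`, the `use_watermark` flag, and the initialization (small Gaussian $A$, zero $B$, as is common for LoRA) are assumptions made for illustration rather than the implementation used in the experiments.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W (k x d) plus a detachable low-rank update B @ A."""

    def __init__(self, weight: np.ndarray, rank: int, seed: int = 0):
        k, d = weight.shape
        if not 0 < rank <= min(k, d):
            raise ValueError("rank must satisfy 0 < rank <= min(k, d)")
        rng = np.random.default_rng(seed)
        self.weight = weight                        # frozen backbone weight W
        self.A = rng.normal(0.0, 0.01, (rank, d))   # A in R^{r x d}
        self.B = np.zeros((k, rank))                # B in R^{k x r}; zero init gives Delta W = 0

    def forward(self, x: np.ndarray, use_watermark: bool) -> np.ndarray:
        """Service path uses W only; verification path uses W + B @ A."""
        out = x @ self.weight.T
        if use_watermark:
            out = out + (x @ self.A.T) @ self.B.T
        return out

    def num_lora_params(self) -> int:
        """Trainable parameters of this module: r * (d + k)."""
        return self.A.size + self.B.size
```

Calling `forward(x, use_watermark=False)` corresponds to the service predictor and never touches `A` or `B`, so its output is unaffected by whatever values the watermark component takes after training.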

The embedding objective is defined based on the trigger groups constructed in Section 4.2. For a benign image pair $(a, b)$, the valid key $K$ generates the trigger group $G_{a,b}(K) = \{(x_{i}^{K}, y_{i}^{K})\}_{i=1}^{6}$, and an incorrect key $K' \neq K$ generates the auxiliary wrong key group $G_{a,b}(K') = \{x_{i}^{K'}\}_{i=1}^{6}$. The watermark component is learned by

$$\Delta\theta^{*} = \underset{\Delta\theta}{\arg\min}\; \mathbb{E}_{(a,b)} \left[ \frac{1}{6} \sum_{i=1}^{6} \left( \mathrm{CE}\left(f_{\theta + \Delta\theta}(x_{i}^{K}), y_{i}^{K}\right) - \lambda\, \mathrm{CE}\left(f_{\theta + \Delta\theta}(x_{i}^{K'}), y_{i}^{K}\right) \right) \right], \tag{10}$$

where $\lambda > 0$ controls the strength of wrong key suppression. Under this objective, $f_{\theta + \Delta\theta^{*}}$ is encouraged to map valid trigger samples to their designated target labels while being discouraged from assigning the same labels to wrong key trigger samples.
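The per-group term inside the expectation of Equation (10) can be written as a short NumPy sketch. The function names are hypothetical, and labels are assumed to be 0-based indices here, whereas Equation (6) produces labels in $\{1, \dots, C\}$, so one would subtract 1 before indexing.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample cross-entropy for logits (n, C) and 0-based labels (n,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def embedding_loss(valid_logits: np.ndarray, wrong_logits: np.ndarray,
                   labels: np.ndarray, lam: float = 1.0) -> float:
    """Group mean of CE(valid-key sample, y) - lam * CE(wrong-key sample, y)."""
    per_sample = (cross_entropy(valid_logits, labels)
                  - lam * cross_entropy(wrong_logits, labels))
    return float(np.mean(per_sample))
```

The first term decreases as valid-key samples are classified into their target labels, and the second term decreases the total further as wrong-key samples move away from those same labels. When both sets of logits are uniform and `lam = 1`, the two terms cancel and the value is zero.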

4.4. Intrinsic Rank Search Under a Parameter Budget

The assignment of intrinsic ranks across the LoRA modules can be viewed as a lightweight neural architecture search problem over the component structure [33,34,35]. To avoid the cost of training each candidate configuration, we adopt a training-free proxy and use the Zen score [29], which has shown strong effectiveness in zero-shot NAS [36,37,38,39], to measure the expressivity required for watermark embedding.

The backbone contains $L$ layers, and a candidate LoRA component is specified by a rank vector

$$r = (r_{1}, r_{2}, \dots, r_{L}), \tag{11}$$

where $r_{\ell} = 0$ means that no LoRA module is inserted at layer $\ell$ and $r_{\ell} > 0$ gives the intrinsic rank used at that layer. Thus, $r$ simultaneously determines the insertion locations and the rank assignment of the LoRA component.

To avoid fully training every candidate component during the search, we evaluate each $r$ with a training-free proxy. Following Lin et al. [29], we use the Zen score to measure the expressivity of the candidate component. Let $P(r)$ denote the size of the LoRA component instantiated by $r$, and let $P_{bud}$ denote the parameter budget. The fitness score is defined as

$$Fit(r) = \alpha\, Zen(r) - (1 - \alpha) \log\left(\frac{P(r)}{P_{bud}}\right), \tag{12}$$

where $\alpha \in [0, 1]$ balances expressivity and compactness.

We optimize $r$ using a genetic search algorithm [40]. Each individual in the population is a rank vector. In each generation, all individuals are first evaluated by the fitness score, and the top half are retained. Offspring are then generated from the retained individuals through crossover and mutation. In the crossover step, each gene is inherited from one of the two parents. In the mutation step, a small subset of genes is randomly changed within the valid rank range $[0, r_{max}]$. After $T$ generations, the rank vector with the highest fitness score is selected, and the corresponding LoRA component is used to instantiate $\Delta\theta$. The procedure is summarized in Algorithm 1.

Algorithm 1. Genetic search for intrinsic-rank assignment
Require: layer number $L$, population size $N$, generation number $T$, rank range $[0, r_{max}]$, budget $P_{bud}$
Ensure: optimal rank vector $r^{*}$
1:	Initialize population $P = \{r^{(1)}, r^{(2)}, \dots, r^{(N)}\}$
2:	for $t = 1, 2, \dots, T$ do
3:	    Evaluate $Fit(r)$ for all $r$ in $P$
4:	    Keep the top $N / 2$ individuals as survivors $S$
5:	    Generate $N / 2$ offspring $O$ from $S$ by crossover and mutation
6:	    Set $P = S \cup O$
7:	end for
8:	Return $r^{*}$, the individual with the highest fitness score in $P$
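Algorithm 1 and the fitness score in Equation (12) can be sketched in plain Python as follows. Several points are assumptions rather than details given in the text: the expressivity term is passed in as a generic callable `proxy` (the Zen score itself is not implemented here), the component size is taken as $\sum_{\ell} r_{\ell}(d_{\ell} + k_{\ell})$, mutation is applied independently per gene with probability `p_mut`, and the all-zero rank vector is treated as invalid because $\log(P(r)/P_{bud})$ is undefined when $P(r) = 0$.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def lora_size(ranks: Sequence[int], dims: Sequence[Tuple[int, int]]) -> int:
    """P(r): total LoRA parameters, r * (d + k) per layer of shape (k, d)."""
    return sum(r * (d + k) for r, (k, d) in zip(ranks, dims))


def fitness(ranks, dims, budget: float, alpha: float,
            proxy: Callable[[Sequence[int]], float]) -> float:
    """Fit(r) = alpha * proxy(r) - (1 - alpha) * log(P(r) / P_bud)."""
    size = lora_size(ranks, dims)
    if size == 0:
        return float("-inf")  # assumption: an empty component is not a valid candidate
    return alpha * proxy(ranks) - (1 - alpha) * math.log(size / budget)


def genetic_search(dims, budget, proxy, pop_size: int = 16, generations: int = 20,
                   r_max: int = 8, alpha: float = 0.8, p_mut: float = 0.2,
                   seed: int = 0) -> List[int]:
    """Keep the top half each generation and refill by crossover and mutation."""
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    score = lambda r: fitness(r, dims, budget, alpha, proxy)
    pop = [[rng.randint(0, r_max) for _ in dims] for _ in range(pop_size)]
    for _ in range(generations):
        survivors = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        offspring = []
        while len(offspring) < pop_size - len(survivors):
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: each gene is inherited from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: resample a gene within [0, r_max] with probability p_mut.
            child = [rng.randint(0, r_max) if rng.random() < p_mut else g
                     for g in child]
            offspring.append(child)
        pop = survivors + offspring
    return max(pop, key=score)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next. A stand-in proxy such as `lambda r: sum(math.log1p(x) for x in r)` is enough to exercise the search loop, but it is only a placeholder and does not approximate the Zen score.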

5. Results

5.1. Experimental Setup

Experiments are conducted on CIFAR-10, CIFAR-100, FOOD-101, Caltech-101, and Caltech-256. Four watermark embedding strategies are compared: Scratch, FTAL, FTLL, and the proposed LoRA-based scheme. Scratch retrains the model from scratch with watermark embedding, FTAL fine-tunes all model parameters, and FTLL fine-tunes only the last layer.

For clean model training and Scratch-based embedding, the training schedule is set to 50 epochs on CIFAR-10, CIFAR-100, and FOOD-101 and 100 epochs on Caltech-101 and Caltech-256. FTAL and FTLL are both trained for 20 epochs. Unless otherwise specified, the batch size is 32 and the initial learning rate is $1 \times 10^{-3}$.

Trigger groups are generated online during watermark embedding. In each mini-batch, three benign image pairs are sampled. In the general formulation of Section 3.1, the trigger group size is denoted by $M$. In the present implementation, each trigger group contains six trigger samples generated by the six predefined RGB channel permutations, and thus $M = 6$ throughout the experiments. For each valid key trigger sample, one auxiliary wrong key trigger sample is constructed from the same carrier image. Each batch therefore contains $3 \times 6 = 18$ valid key trigger samples and 18 wrong key trigger samples. The mixup ratio between the two source images is fixed at 1:1, and the number of inserted white pixels is fixed at 16 for all datasets. Because the valid key and wrong key samples are constructed in a one-to-one manner, the balancing coefficient in the embedding objective is fixed at $\lambda = 1$.

For the proposed method, the backbone parameters remain frozen, and only $\Delta\theta$ is optimized. In intrinsic rank search, every layer is treated as a candidate insertion location, and the candidate rank ranges from 0 to 8. The genetic algorithm uses a population size of 256 and runs for 512 generations. The trade-off coefficient in the fitness score is set to $\alpha = 0.8$, and the mutation probability is set to 0.2. The rank vector with the highest fitness score is used to instantiate the final plug-and-play LoRA component.
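To make the search settings concrete, the following sketch runs a small genetic search over integer rank vectors with the stated defaults (population 256, 512 generations, ranks 0 to 8, mutation probability 0.2). The selection and crossover operators, the function names, and the `toy_fitness` function are assumptions chosen for illustration only; the actual fitness score and operators are those defined in Section 4.4.

```python
import random


def genetic_rank_search(num_layers, fitness, pop_size=256, generations=512,
                        max_rank=8, mutation_prob=0.2, seed=0):
    """Search rank vectors in {0..max_rank}^num_layers (needs num_layers >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, max_rank) for _ in range(num_layers)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]            # truncation selection (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, num_layers)      # one-point crossover (assumed)
            child = p1[:cut] + p2[cut:]
            if rng.random() < mutation_prob:        # mutate one layer's rank
                child[rng.randrange(num_layers)] = rng.randint(0, max_rank)
            children.append(child)
        population = parents + children             # parents survive (elitism)
    return max(population, key=fitness)


def toy_fitness(ranks, alpha=0.8):
    """HYPOTHETICAL stand-in: reward rank on even-indexed layers, penalize size."""
    even = [r for i, r in enumerate(ranks) if i % 2 == 0]
    proxy = sum(even) / (8 * len(even))
    cost = sum(ranks) / (8 * len(ranks))
    return alpha * proxy - (1 - alpha) * cost
```

A call such as `genetic_rank_search(6, toy_fitness, pop_size=32, generations=40)` returns one rank vector; a rank of 0 at a position means that layer receives no LoRA branch. Because parents are carried over unchanged, the best fitness in the population never decreases between generations.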

Unless otherwise stated, the number of trigger groups used for verification is set to $B = 8$ in all experiments. Each trigger group is generated from one benign image pair and contains six trigger samples produced by the predefined channel permutations. The reported ownership verification results are computed over these $B$ trigger groups according to the criterion defined in Section 3.2. Runtime measurements are conducted on a workstation with an Intel Core i5-13600KF CPU, an NVIDIA GeForce RTX 4060 GPU, and 16 GB RAM using the same batch size as the main experiments. We report wall clock time for watermark embedding, genetic search, and ownership verification.

5.2. Service Integrity, Verification Effectiveness, and Parameter Efficiency

The results in Table 1 and Table 2 are used to evaluate three aspects of the compared methods: service integrity, verification effectiveness, and parameter efficiency. The baselines in this subsection are used to compare watermark-embedding strategies and service-side fidelity. They are not intended to represent anti-ambiguity or anti-forgery watermarking mechanisms, which are compared separately in Section 5.4. These three aspects correspond to three practical questions. The first is whether watermark embedding changes model behavior on normal service inputs. The second is whether the embedded watermark still provides reliable evidence for ownership verification. The third is how many trainable parameters are required during watermark embedding.

Table 1 reports the service-side clean set performance and the trainable parameter overhead of different watermark embedding strategies. “Original Acc” denotes the clean test accuracy of the original model before watermark embedding. “Clean Acc.” denotes the clean test accuracy reported after embedding, and “$\Delta$Acc” denotes the corresponding change relative to the original model. “Trainable Params. (%)” denotes the percentage of parameters updated during watermark embedding. For the proposed method, “Clean Acc.” is measured based on the service predictor because routine service is still performed by the original backbone under the intended deployment protocol.

Table 2 reports the ownership verification results of different watermark embedding strategies. “Trigger Acc” denotes the proportion of trigger samples that are mapped to their designated target labels. “GVR” denotes the group-wise verification rate under the criterion in Section 3.2. Because ownership in this paper is confirmed from trigger groups rather than isolated trigger samples, GVR is the primary verification metric, while Trigger Acc. is reported as a supplementary indicator of sample-level trigger recognition.
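The relationship between the two metrics can be illustrated with a short sketch. It assumes that a trigger group counts as verified only when every sample in the group is mapped to its designated label, which is how the block-wise GVR is described in Section 5.4; the exact criterion is the one given in Section 3.2. The function names are illustrative.

```python
def trigger_accuracy(pred_groups, target_groups):
    """Fraction of individual trigger samples mapped to their designated labels."""
    pairs = [(p, t) for pg, tg in zip(pred_groups, target_groups)
             for p, t in zip(pg, tg)]
    return sum(p == t for p, t in pairs) / len(pairs)


def group_verification_rate(pred_groups, target_groups):
    """Fraction of groups in which ALL samples hit their labels (assumed criterion)."""
    verified = sum(all(p == t for p, t in zip(pg, tg))
                   for pg, tg in zip(pred_groups, target_groups))
    return verified / len(target_groups)


# Example: B = 8 groups of six samples, with a single wrong prediction in one group.
targets = [[g] * 6 for g in range(8)]
preds = [list(group) for group in targets]
preds[3][0] = 99                                   # one sample-level miss
print(trigger_accuracy(preds, targets))            # 47 of 48 samples match
print(group_verification_rate(preds, targets))     # 7 of 8 groups verify
```

A single sample-level miss lowers Trigger Acc. only slightly but removes an entire group from the GVR, which is why GVR is the stricter of the two metrics.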

The results in Table 1 show that the compared methods differ substantially in how they affect normal service behavior. Scratch reduces clean test accuracy on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101, with the largest drop appearing on Caltech-101 ($-14.52\%$), although a slight increase is observed on Caltech-256. FTAL causes only minor accuracy changes on CIFAR-10 and CIFAR-100 but still introduces noticeable degradation on FOOD-101, Caltech-101, and Caltech-256. FTLL reduces the number of trainable parameters on some models, but its effect on service behavior is less stable, as shown, for example, by the $-11.56\%$ drop on Caltech-256. In contrast, the proposed method reports unchanged clean set accuracy in all five settings under the intended deployment protocol, while the trainable parameter ratio remains between 0.16% and 1.69%. These results are consistent with the design goal that routine service should remain on the original backbone while the watermarking function is carried out by a lightweight external component.

Table 2 shows a different trade-off. Scratch and FTAL reach perfect verification performance in most settings, but this performance is obtained by updating the full model or retraining it from scratch. FTLL is much less stable: its trigger-level accuracy falls to 76.67% on Caltech-101 and 55.00% on Caltech-256, and its group-level verification results further drop to 72.00% and 46.00%, respectively. The proposed method reaches perfect verification performance on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101 and remains close to perfect on Caltech-256, with 99.33% trigger-level accuracy and 96.00% group-level verification. Taken together, Table 1 and Table 2 show that the proposed method maintains strong ownership verification performance while leaving routine service on the original backbone, and this balance is not achieved as consistently by the compared baselines.

5.3. Robustness Against Ambiguity Attacks

We evaluate whether an attacker can fabricate forged trigger groups that satisfy the ownership verification criterion without possessing the valid key. Because ownership is confirmed at the trigger group level, the analysis considers both trigger-level matching and the resulting group-level verification outcome. Table 3 and Table 4 report the quantitative results. Figure 3 and Figure 4 complement these tables by showing the distributions of trigger-level matching rates under repeated random key and near-key attacks, respectively.

We first consider a random key attack. In this setting, the attacker knows the trigger generation procedure described in Section 4.2, including the carrier construction and the target-label computation rule, but does not know the valid key $K$ or the key-induced white pixel positions for each carrier. The attacker therefore repeatedly samples candidate white pixel sets and uses them to construct forged trigger groups. This attack is non-adaptive, because candidate groups are generated without using previous API responses to update the search direction. To examine the effect of joint verification, we vary the number of trigger groups used in one attack attempt and report the results for $B = 1, 2, 4,$ and $8$. For each value of $B$, Table 3 reports the maximum trigger-level accuracy and the maximum GVR observed over repeated attack trials, while Figure 3 shows the corresponding distributions.

Table 3 shows that random guessing may still produce partial trigger matches, especially when only a small number of trigger groups is involved in one attempt. However, as $B$ increases, the maximum trigger-level accuracy decreases consistently across all datasets. Figure 3 shows the same trend from the distributional view: most attack trials remain concentrated in the low-accuracy region, and this concentration becomes more pronounced as more trigger groups are jointly considered. More importantly, these partial matches rarely translate into successful forged verification. The maximum GVR remains 0.00% in nearly all settings. The only exception appears in CIFAR-10 at $B = 8$, where the maximum GVR reaches 12.50%. Even in this case, success remains isolated and does not extend to stable forged verification across datasets. These results show that random guessing is generally insufficient to satisfy the ownership verification criterion adopted in this paper.
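A rough back-of-the-envelope calculation helps read these numbers. The sketch below assumes, purely for intuition, that each forged sample independently hits its designated label with probability `p` and that a group verifies only when all six samples match. Samples within a group share a carrier pair, so real matches are likely correlated and this is only an order-of-magnitude guide, not a result from the paper.

```python
def group_pass_probability(p, group_size=6):
    """Chance that all samples in one group match, under the independence assumption."""
    return p ** group_size


def gvr_levels(num_groups):
    """GVR values (in percent) that are attainable with num_groups trigger groups."""
    return [100.0 * k / num_groups for k in range(num_groups + 1)]


# With a per-sample match rate of 10%, a full six-sample group passes about once
# in a million attempts under the independence assumption.
print(group_pass_probability(0.10))
# With B = 8 the smallest non-zero GVR is one group out of eight.
print(gvr_levels(8)[1])
```

Under this reading, the 12.50% maximum GVR at $B = 8$ corresponds to exactly one forged group out of eight passing in the best trial.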

We then consider stronger near-key attacks. In these attacks, the attacker is allowed to recover most of the white pixel positions correctly, so the forged pattern is no longer a purely random guess. Under $B = 8$, we evaluate three cases: 14/16 Correct, 15/16 Correct, and Adjacent Error. Table 4 reports the corresponding maxima, and Figure 4 shows the associated distributions of trigger-level matching rates. The first two settings still lead to low trigger-level matching rates on most datasets, and their maximum GVR remains low or even 0.00% in most cases. These results indicate that partial recovery of the trigger pattern is still insufficient for reliable forgery. Matching most white pixel positions is not equivalent to reproducing the exact trigger pattern required for stable group-level verification.

Under the Adjacent Error setting, the attacker is assumed to recover all but one of the white pixel positions correctly, and the only incorrect position is shifted to a neighboring grid of the true position. Compared with random guessing and coarse near-key approximation, this setting is substantially more favorable to the attacker because the forged pattern differs from the valid key by only a single local perturbation. Accordingly, both trigger-level matching and group-level verification increase across all datasets. The Caltech-101/AlexNet result suggests a dataset–model interaction rather than a general failure of the verification rule. Caltech-101 contains relatively limited intra-class variation and exhibits strong pose, scale, and background regularities, while AlexNet uses a relatively shallow feature hierarchy with early large spatial operators and pooling. This combination may produce a broader local response basin around sparse pixel-level triggers, so a one-pixel neighboring shift can still partially activate the learned watermark response. However, these values remain clearly below the verification results of the legitimate watermark in Table 2, where the proposed method attains a GVR of 100.00% on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101 and 96.00% on Caltech-256. Therefore, even under this attacker-favorable local perturbation setting, the forged trigger groups still do not reproduce the ownership verification behavior of the legitimate watermark. Figure 4 is consistent with this observation: the distributions under Adjacent Error shift toward higher trigger-level matching rates, but they still do not support the same verification outcome as the valid key.
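The three near-key settings can be expressed as simple perturbations of the valid position set. The sketch below is illustrative only: the function names are hypothetical, positions are `(row, column)` tuples, and "neighboring grid" is assumed to mean one of the eight surrounding cells, which the text does not specify.

```python
import random


def k_correct(valid_positions, num_correct, height, width, rng):
    """Keep num_correct true positions and replace the rest with wrong ones."""
    valid = list(valid_positions)
    occupied = set(valid)
    kept = rng.sample(valid, num_correct)
    free = [(y, x) for y in range(height) for x in range(width)
            if (y, x) not in occupied]
    return kept + rng.sample(free, len(valid) - num_correct)


def adjacent_error(valid_positions, height, width, rng):
    """Shift exactly one true position to a free 8-neighbor cell (assumed neighborhood)."""
    valid = list(valid_positions)
    occupied = set(valid)
    order = list(range(len(valid)))
    rng.shuffle(order)
    for idx in order:
        y, x = valid[idx]
        neighbors = [(y + dy, x + dx)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0)
                     and 0 <= y + dy < height and 0 <= x + dx < width
                     and (y + dy, x + dx) not in occupied]
        if neighbors:
            forged = valid.copy()
            forged[idx] = rng.choice(neighbors)
            return forged
    raise ValueError("no free neighboring cell for any position")


rng = random.Random(0)
cells = [(y, x) for y in range(32) for x in range(32)]
valid = rng.sample(cells, 16)                       # stand-in for the valid pattern
forged_14 = k_correct(valid, 14, 32, 32, rng)       # "14/16 Correct"
forged_15 = k_correct(valid, 15, 32, 32, rng)       # "15/16 Correct"
forged_adj = adjacent_error(valid, 32, 32, rng)     # "Adjacent Error"
```

The Adjacent Error candidate differs from the 15/16 Correct candidate only in where the single wrong pixel lands: next to its true position rather than anywhere in the image.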

Overall, the results in Table 3 and Table 4 and Figure 3 and Figure 4 support a bounded conclusion. Random key attacks rarely satisfy the verification criterion, and coarse near-key approximations remain insufficient for stable forgery. The strongest attack considered here is Adjacent Error, where the forged pattern differs from the valid key by only one locally shifted white pixel position. Even in this case, the observed GVR remains well below the verification result of the legitimate watermark. These results indicate that the proposed method effectively increases the difficulty of ambiguity attacks while also showing that attack success is more sensitive to local deviations around the valid key than random or nonlocal deviations.

The above attacks do not exhaust all possible adaptive black box optimization strategies. Rather, the near-key settings provide a sensitivity analysis after hypothetical partial recovery of the valid white pixel pattern. This analysis is relevant to optimization-based reverse engineering because any such attack must eventually produce candidate positions close enough to the valid ones. The 14/16 Correct and 15/16 Correct settings show that coarse partial recovery is generally insufficient for stable forged verification, while the Adjacent Error setting shows that local deviations around the valid pattern remain more sensitive. Therefore, fully query-adaptive optimization over white pixel positions is treated as a stronger setting and a boundary of the current evaluation.

5.4. Comparison with Anti-Ambiguity Watermarking Methods

The previous subsection evaluates the proposed method under random key and near-key ambiguity attacks. To further address whether the proposed method differs from existing anti-ambiguity watermarking mechanisms, we compare it with two representative trigger-based methods: the hash chain protocol of Zhu et al. [7] and the unambiguous backdoor watermarking method of Hua et al. [8]. These two methods are selected because they are the closest to our threat model: both consider black box ownership verification and both aim to prevent forged or ambiguous trigger evidence. In contrast, the Scratch, FTAL, and FTLL baselines in Section 5.2 compare watermark-embedding strategies rather than anti-ambiguity mechanisms.

For a common evaluation budget, all compared methods use 300 trigger samples for verification. For the proposed method, this corresponds to $B = 50$ trigger groups, each containing six trigger samples. For Zhu et al. [7], we implement the one-way hash chain construction with a chain length of 300. Because the original Zhu et al. protocol does not natively define trigger groups, Trigger Acc. is its primary native metric. To make the group-level results comparable with ours, we additionally report an auxiliary GVR by partitioning the hash chain into 50 non-overlapping consecutive blocks, each containing six trigger samples; a block is counted as verified only when all six samples in that block are classified into their designated labels. For Hua et al. [8], we follow its trigger matrix construction and set $m = 50$ and $n = 6$, resulting in 300 trigger samples arranged as 50 natural trigger groups, each containing six correlated samples. Therefore, Hua et al. can be evaluated with the same Trigger Acc. and GVR definitions without artificial regrouping. The comparison is conducted on CIFAR-10/ResNet-18 and Caltech-101/AlexNet, representing a standard benchmark setting and the dataset–model pair that shows the strongest local near-key sensitivity in Table 4.
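The regrouping used for the auxiliary GVR of the hash chain baseline can be sketched as follows. Only the chain length (300) and the block layout (50 consecutive, non-overlapping blocks of six) come from the text; the use of SHA-256 and the idea that each chain element seeds one trigger sample are assumptions for illustration, and the way Zhu et al. derive images and labels from chain elements is not reproduced here.

```python
import hashlib


def hash_chain(secret, length=300):
    """One-way chain: element i+1 is the hash of element i (SHA-256 assumed)."""
    chain, state = [], secret
    for _ in range(length):
        state = hashlib.sha256(state).digest()
        chain.append(state)
    return chain


def partition_blocks(items, block_size=6):
    """Split a sequence into consecutive, non-overlapping blocks of block_size."""
    if len(items) % block_size != 0:
        raise ValueError("sequence length must be a multiple of block_size")
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


chain = hash_chain(b"owner-secret")     # hypothetical secret
blocks = partition_blocks(chain)        # 50 blocks of six chain elements
```

Each block then plays the role of one trigger group when the all-six-correct criterion is applied, which puts the baseline on the same 300-sample budget as $B = 50$ groups of the proposed method.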

Table 5 reports valid verification performance and random forgery results. In the random forgery attack, the attacker knows the public construction rule of each method but does not possess the legitimate secret information. For Zhu et al. and Hua et al., the attacker randomly generates candidate trigger evidence according to the corresponding public construction rule. For the proposed method, the attacker performs the random key attack described in Section 5.3. We report the maximum trigger-level accuracy and maximum GVR observed over repeated attack trials.

Table 5 shows that all three anti-ambiguity methods achieve high valid verification performance, but they differ under forged evidence. On CIFAR-10/ResNet-18, Zhu et al. and Hua et al. reach valid GVRs of 98.00% and 100.00%, respectively, while the proposed method reaches 100.00%. On Caltech-101/AlexNet, the corresponding valid GVRs are 86.00%, 94.00%, and 100.00%. Under random forgery attacks, Zhu et al. show relatively high trigger-level matching, with maximum Trigger Acc. values of 51.33% on CIFAR-10 and 35.00% on Caltech-101, although their block-wise Max GVR remains much lower at 6.00% and 2.00%. Hua et al. reduce the forged trigger-level matching to 19.33% and 7.67%, with Max GVR values of 2.00% and 0.00%. The proposed method obtains Max Trigger Acc. values of 9.67% and 8.33% on the two datasets, and its Max GVR remains 0.00% in both cases. These results indicate that under the same verification budget, the proposed method provides competitive or stronger resistance to random forged evidence compared to representative trigger-based anti-ambiguity baselines.

We further evaluate evidence exposure after one verification session. In trigger-based ownership verification, a practical concern is that the trigger evidence submitted during verification may be disclosed. If a later verifier accepts the same disclosed trigger set or trigger chain, an adversary may replay the leaked evidence to support a counterfeit claim. Table 6 summarizes this scenario. For Zhu et al. and Hua et al., the disclosed trigger chain or correlated trigger set remains the concrete evidence used for ownership verification, so replaying the leaked evidence leads to the same verification outcome as the original valid verification. For the proposed method, however, the verifier can require trigger groups generated from fresh benign carrier pairs. The attacker is then given the previously disclosed trigger groups and attempts to transfer the observed white pixel positions and labels to fresh carriers.

Table 6 further shows the difference between fixed verification evidence and the proposed regenerable evidence under evidence exposure. For Zhu et al. and Hua et al., replaying the leaked trigger chain or leaked trigger set reproduces the original verification outcome, leading to post-leak forged GVRs of 98.00% and 100.00% on CIFAR-10/ResNet-18, and 86.00% and 94.00% on Caltech-101/AlexNet. These results do not imply that the baselines fail under their original assumptions; rather, they show that their disclosed trigger evidence remains directly reusable if a later verifier accepts the same evidence. In contrast, under the refreshed verification setting of the proposed method, the verifier can request trigger groups generated from fresh benign carriers. The legitimate key still achieves a fresh GVR of 100.00% on both datasets, whereas transferring the leaked white pixel positions and labels to fresh carriers yields a post-leak forged GVR of 0.00%. This result supports the main design motivation of the proposed method: the trigger samples disclosed in one verification session are only carrier-specific realizations of the key-defined mapping, and they do not provide a reliable way to construct valid trigger groups for new carriers. The same interpretation applies when an attacker optimizes or recovers the white pixel positions for a particular verification batch: such recovery exposes concrete trigger instances, but it does not recover the valid key or the carrier-to-pattern mapping required for refreshed verification.

5.5. Key Sensitivity Analysis

We further examine how the verification response changes as the candidate key deviates from the valid key. Starting from the valid key, we progressively replace $d$ out of the 16 white pixel positions with incorrect ones and measure the resulting trigger-level accuracy. Here, $d$ denotes the number of mismatched white pixel positions between the candidate key and the valid key. Figure 5 also shows the random guess baseline $1/C$, where $C$ is the number of classes, as a reference level for chance matching to the designated watermark labels.

Figure 5 shows a sharp transition from the valid key to incorrect keys. When $d = 0$, the trigger-level accuracy matches the legitimate key verification performance reported in Table 2, reaching 100% on four datasets and remaining close to 100% on Caltech-256. Once one white pixel position becomes incorrect, the trigger-level accuracy drops rapidly and remains close to the corresponding random guess baseline as $d$ increases. This behavior indicates that perturbing the valid key quickly destroys the intended trigger response so that the outputs with respect to the designated watermark labels become nearly indistinguishable from chance.

Under the group-wise verification criterion, the same trend becomes stricter. In our experiments, once $d \geq 1$, the corresponding trigger groups no longer satisfy the ownership verification requirement, and the GVR drops to 0. This result is consistent with the rapid decline in trigger-level accuracy and shows that successful ownership verification requires an exact key match rather than an approximate reconstruction of the trigger pattern.

Overall, Figure 5 shows that the proposed watermark depends strongly on the precise spatial configuration of the white pixel positions. Recovering an approximate key is generally insufficient to reproduce the verification behavior of the valid key.

5.6. Hyperparameter and Parameter Budget Sensitivity

We further examine several important parameter choices of the proposed method on CIFAR-10/ResNet-18. Because the main experiments already evaluate five dataset–model pairs, this subsection focuses on a compact sensitivity analysis to reveal the effect of representative parameters rather than repeating all settings on all datasets.

Table 7 reports the sensitivity of the trigger construction and wrong key suppression. The number of inserted white pixels $s$ controls the sparsity and strength of the trigger pattern. The coefficient $\lambda$ controls the balance between valid key fitting and wrong key suppression in Equation (10). In each block of the table, only the specified parameter is changed, while all other settings follow the default configuration in Section 5.1.

Table 7 shows two trends. First, when the number of white pixels is too small, the trigger signal is insufficient for stable verification. The valid GVR increases from 46.00% at $s = 4$ to 96.00% at $s = 12$, and $s = 16$ is the first tested setting that reaches 100.00% valid GVR. Further increasing $s$ does not improve valid verification, but it makes the trigger pattern denser and increases the Adjacent Error Max GVR. Therefore, $s = 16$ is selected as the default because it provides stable verification without using an unnecessarily dense trigger pattern. Second, $\lambda$ controls the trade-off between valid key fitting and wrong key suppression. When $\lambda = 0$, valid verification remains high, but the wrong key GVR reaches 58.00%. Increasing $\lambda$ suppresses wrong key responses, and $\lambda = 1$ achieves 100.00% valid GVR while reducing both wrong key GVR and random forgery Max GVR to 0.00%. Larger values such as $\lambda = 2$ and $\lambda = 4$ further emphasize suppression but begin to degrade valid verification. This supports $\lambda = 1$ as a balanced default under the one-to-one construction of valid key and wrong key samples.

Table 7 presents the white-pixel number sensitivity. Here, $s$ denotes the number of inserted white pixels in each trigger sample. “Random-forgery Max GVR” is computed under the random key attack, and “Adjacent-Error Max GVR” is computed under the local near-key setting where only one white pixel position is shifted to a neighboring grid.

Table 8 evaluates the LoRA parameter budget. Here, $P_{bud}$ is treated as a lightweightness constraint rather than a performance-tuned threshold. We therefore sweep explicit trainable parameter budget caps and report both the selected LoRA size and the resulting verification performance.

Table 8 shows that the LoRA component has an under-capacity region and a saturation region. Very compact budgets, such as 0.25% and 0.50%, do not provide enough adaptation capacity and lead to reduced GVR. As the budget increases, the verification performance becomes stable. The default budget is not selected as a boundary point where performance barely becomes stable; rather, it is a conservative lightweight setting within the saturation region. Increasing the budget beyond the default setting does not provide additional verification gain. These results also indicate that further compression may be possible in some settings but overly aggressive compression damages watermark verification.

The 1:1 carrier mixing ratio, the number of trigger groups $B$, and the candidate rank range $[0, 8]$ are not treated as additional training-sensitive hyperparameters in this subsection. The 1:1 carrier mixing ratio is used as a deterministic and symmetric carrier construction rule rather than as a tuned data augmentation strength; it makes the two source images contribute equally while channel permutations produce carriers with different channel-wise statistics. The number of trigger groups $B$ is a verification budget parameter rather than an embedding hyperparameter, and its effect has already been examined in Table 3. Finally, the candidate rank range $[0, 8]$ defines a bounded search space for the genetic algorithm: rank 0 allows a layer to be skipped, while the upper bound 8 gives important layers sufficient adaptation capacity without unnecessarily expanding the search space. The actual LoRA capacity is therefore analyzed through the parameter budget sensitivity in Table 8.
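The link between a rank vector and the trainable parameter ratio can be made explicit with a small calculation. For a linear layer with weight shape `(d_out, d_in)`, a rank-`r` LoRA branch adds `r * (d_in + d_out)` parameters, and rank 0 adds none. The layer shapes below are made up for illustration, biases are ignored, and convolutional layers would need to be treated as reshaped matrices, which is an assumption not detailed in this section.

```python
def lora_param_count(layer_shapes, ranks):
    """Total LoRA parameters; layer_shapes holds (d_out, d_in), rank 0 skips a layer."""
    return sum(r * (d_out + d_in)
               for (d_out, d_in), r in zip(layer_shapes, ranks))


def trainable_ratio(layer_shapes, ranks):
    """LoRA parameters as a fraction of the frozen backbone weights (biases ignored)."""
    backbone = sum(d_out * d_in for d_out, d_in in layer_shapes)
    return lora_param_count(layer_shapes, ranks) / backbone


def within_budget(layer_shapes, ranks, budget):
    """Check a rank vector against a budget cap given as a fraction."""
    return trainable_ratio(layer_shapes, ranks) <= budget


shapes = [(512, 512), (512, 256), (10, 512)]    # hypothetical three-layer backbone
ranks = [8, 0, 4]                               # the middle layer is skipped
print(lora_param_count(shapes, ranks))          # 8*1024 + 0 + 4*522 parameters
print(round(100 * trainable_ratio(shapes, ranks), 2), "% of backbone weights")
```

A check like `within_budget` is one way a budget cap such as $P_{bud}$ could be enforced on candidate rank vectors during the search.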

5.7. Computational Overhead and Structural Optimization Effectiveness

We finally analyze the computational overhead of the proposed framework and the effectiveness of the genetic search. The proposed method introduces an offline intrinsic-rank search stage and a LoRA embedding stage. The search stage evaluates candidate rank vectors with a training-free proxy, and the selected LoRA component is then trained with the backbone frozen. Under the intended deployment protocol, routine service still uses the original backbone, so no additional routine-service inference overhead is introduced.

Table 9 shows that the proposed method introduces an additional offline GA search cost, but the LoRA embedding stage itself remains lightweight because only the external component is trained while the backbone is frozen. Compared with Scratch and FTAL, the proposed method avoids full-model retraining or full-parameter fine-tuning. During routine service, the inference overhead is 0.00% because the service predictor remains the original backbone. The additional inference cost is incurred only during ownership verification.
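The separation between the service path and the verification path can be illustrated with a single linear layer. In this minimal sketch, the class and method names are hypothetical, matrices are plain nested lists, and the usual LoRA scaling factor is omitted for brevity. The service path uses only the frozen weight, while the verification path adds the low-rank update.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class LoRALinear:
    """One layer with a frozen weight W and an external low-rank update B @ A."""

    def __init__(self, weight, lora_a, lora_b):
        self.weight = weight    # frozen backbone weight, shape d_out x d_in
        self.lora_a = lora_a    # shape r x d_in
        self.lora_b = lora_b    # shape d_out x r

    def service(self, x):
        """Routine service path: backbone only, no extra computation."""
        return matvec(self.weight, x)

    def verify(self, x):
        """Verification path: backbone output plus the LoRA contribution."""
        base = matvec(self.weight, x)
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + d for b, d in zip(base, delta)]


layer = LoRALinear(weight=[[1, 0], [0, 1]], lora_a=[[1, 1]], lora_b=[[2], [0]])
print(layer.service([3, 4]))    # identical to the original backbone output
print(layer.verify([3, 4]))     # backbone output shifted by the low-rank update
```

Because `service` never touches the LoRA matrices, routine inference costs exactly what the original backbone costs; the two extra matrix-vector products are paid only when `verify` is called.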

We next examine whether the intrinsic-rank search in Section 4.4 yields more suitable LoRA configurations under the same parameter budget. For each benchmark setting, we compare the top-5 rank vectors selected by the genetic algorithm with five rank vectors randomly sampled from the same search space. The corresponding plug-and-play LoRA components are then instantiated under the same budget, and their watermark-embedding training losses are tracked during optimization, as shown in Figure 6.

Figure 6 compares the training loss trajectories of the top 5 GA-selected configurations and five randomly sampled configurations under the same parameter budget. The curves are relatively dense because LoRA watermark embedding converges quickly in all tested settings. Therefore, Table 9 further summarizes the final LogLoss values. The GA-selected configurations obtain lower final LogLoss than the random configurations in all five settings. The reduction is modest on CIFAR-10, CIFAR-100, and Caltech-101 but more visible on FOOD-101 and Caltech-256. This indicates that the genetic search does not necessarily create a large visual gap in the loss curves, but it consistently selects rank allocations that avoid poorer random configurations under the same parameter budget.

This result supports the role of structural optimization in the proposed method. Under the same parameter budget, different LoRA configurations do not provide the same embedding capability. In the tested settings, the configurations selected by the genetic algorithm show faster loss descent throughout training than randomly sampled configurations. This indicates that, even with the same amount of trainable parameters, optimizing the rank allocation and insertion positions enables the LoRA component to use the available parameter budget more effectively during watermark embedding.

6. Discussion

This paper rethinks backdoor DNN watermarking for deployed ownership verification from two design perspectives: how watermark evidence is organized and how watermark functionality is deployed. On the verification side, the proposed framework shifts the evidence basis from a finite set of discrete trigger samples to a reproducible trigger family governed by a valid key. Under this formulation, ownership is no longer tied to preserving several fixed trigger samples as static secret evidence, but to verifying group-wise consistency over trigger groups regenerated from the legitimate construction rule. From this perspective, improved resistance to ambiguity attacks comes not from hiding a few isolated samples, but from increasing the structural dependency among trigger samples and making forged verification depend on reproducing the complete pattern-label correspondence within a trigger group.

On the deployment side, the proposed method shows that preserving routine service integrity does not require watermark fitting to be written into the service backbone itself. Instead, watermark embedding can be carried by external parameterized components, while the service predictor and the verification predictor remain separated under the intended deployment protocol. In this paper, that role is instantiated by a plug-and-play LoRA component, together with intrinsic-rank optimization by genetic search. This choice should be understood as one concrete realization of the external-parameter design principle, rather than the only possible implementation.

The experimental results support this overall picture. They show that the proposed framework maintains the original service predictor under the intended deployment protocol, enables effective ownership verification across five benchmark datasets, and prevents attackers without the valid key from reproducing the verification behavior of the legitimate watermark under the ambiguity attacks considered in this paper. At the same time, the results also clarify two boundaries of the current method. First, robustness against stronger local near-key perturbations and fully adaptive black-box optimization attacks still needs further evaluation. The current near-key tests analyze the consequences of hypothetical partial recovery of the valid white-pixel pattern, but they do not exhaust all possible query-adaptive search strategies. Second, because routine service is performed by $f_{\theta}$ and ownership verification is performed by $f_{\theta + \Delta\theta}$, the proposed protocol is not designed to make the watermark transfer to a surrogate model distilled only from routine service responses. This limitation is a direct consequence of separating routine service behavior from ownership verification behavior. Future work may therefore proceed in three directions: exploring external parameterizations beyond LoRA, protecting the watermark component itself against direct reuse or disclosure, and combining the proposed key-driven trigger-family verification with extraction-resilient watermarking objectives so that watermark transfer and ownership unambiguity can be addressed jointly.

Author Contributions

Conceptualization, S.H. and R.H.; methodology, S.H. and R.H.; validation, S.H.; formal analysis, R.H.; writing—original draft preparation, R.H.; writing—review and editing, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new datasets were created in this study. The public datasets analyzed during the current study are available from their original sources. The code used in this study is publicly available at: https://github.com/PALAGEE/channel-shuffled-trigger-DNN-watermark (accessed on 6 April 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25; NeurIPS: San Diego, CA, USA, 2012; pp. 1097–1105.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Graves, A.; Mohamed, A.-R.; Hinton, G. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2013; pp. 6645–6649.
Zhang, Y.; Pezeshki, M.; Brakel, P.; Zhang, S.; Laurent, C.; Bengio, Y.; Courville, A. Towards end-to-end speech recognition with deep convolutional neural networks. In Proceedings of the Interspeech 2016, San Francisco, CA, USA, 8–12 September 2016; pp. 410–414.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
Lin, Y.; Wei, Y.; Chen, D.; Li, Y.; Erkan, U.; Toktas, A.; Gao, S.; Zhang, Y. Cryptanalysis and Improvement of a Video Cryptosystem via Chaos and S-Box. ACM Trans. Multimed. Comput. Commun. Appl. 2026. early access.
Lin, Y.; Liao, Y.; Zeng, W.; Wei, Y.; Chen, D.; Yuan, X.; Li, Y.; Erkan, U.; Toktas, A.; Zhang, C.; et al. 3D Non-Degenerate Hyperchaos: Design, Analysis, and Application in Image Encryption. IEEE Trans. Consum. Electron. 2026. early access.
Zhu, R.; Zhang, X.; Shi, M.; Tang, Z. Secure neural network watermarking protocol against forging attack. EURASIP J. Image Video Process. 2020, 2020, 37.
Hua, G.; Teoh, A.B.J.; Xiang, Y.; Jiang, H. Unambiguous and high-fidelity backdoor watermarking for deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11204–11217.
Luo, H.; Li, L.; Zhang, X. Secure neural network watermarking protocol against evidence exposure attack. IEEE Trans. Multimed. 2025, 27, 5563–5574.
Li, Y.; Wang, H.; Barni, M. A survey of deep neural network watermarking techniques. arXiv 2021, arXiv:2103.09274.
Chen, H.; Darvish Rohani, B.; Koushanfar, F. DeepMarks: A digital fingerprinting framework for deep neural networks. arXiv 2018, arXiv:1804.03648.
Kuribayashi, M.; Tanaka, T.; Suzuki, S.; Yasui, T.; Funabiki, N. White-box watermarking scheme for fully-connected layers in fine-tuning model. In Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Virtual, 22–25 June 2021; pp. 165–170.
Uchida, Y.; Nagai, Y.; Sakazawa, S.; Satoh, S. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval; Association for Computing Machinery: New York, NY, USA, 2017; pp. 269–277.
Rouhani, B.D.; Chen, H.; Koushanfar, F. DeepSigns: An end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 485–497.
Fan, L.; Ng, K.W.; Chan, C.S. Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks. Adv. Neural Inf. Process. Syst. 2019, 32, 4716–4725.
Lou, X.; Guo, S.; Li, J.; Zhang, T. Ownership verification of DNN architectures via hardware cache side channels. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8078–8093.
Adi, Y.; Baum, C.; Cisse, M.; Pinkas, B.; Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 1615–1631.
Le Merrer, E.; Perez, P.; Trédan, G. Adversarial frontier stitching for remote neural network watermarking. Neural Comput. Appl. 2020, 32, 9233–9244.
Zhang, J.; Gu, Z.; Jang, J.; Wu, H.; Stoecklin, M.P.; Huang, H.; Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 ACM Asia Conference on Computer and Communications Security, Incheon, Republic of Korea, 4–8 June 2018; pp. 159–172.
Kallas, K.; Furon, T. Mixer: DNN watermarking using image mixup. In 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2023; pp. 1–5.
Li, H.; Wenger, E.; Shan, S.; Zhao, B.Y.; Zheng, H. Piracy resistant watermarks for deep neural networks. arXiv 2019, arXiv:1910.01226.
Zhong, Q.; Zhang, L.Y.; Zhang, J.; Gao, L.; Xiang, Y. Protecting IP of deep neural networks with watermarking: A new label helps. In Advances in Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2020; Volume 12085, pp. 462–474.
Sun, S.; Xue, M.; Wang, J.; Liu, W. Protecting the intellectual properties of deep neural networks with an additional class and steganographic images. arXiv 2021, arXiv:2104.09203.
Wang, R.; Ren, J.; Li, B.; She, T.; Zhang, W.; Fang, L.; Chen, J.; Wang, L. Free fine-tuning: A plug-and-play watermarking scheme for deep neural networks. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 8463–8474.
Hua, G.; Teoh, A.B.J. Deep fidelity in DNN watermarking: A study of backdoor watermarking for classification models. Pattern Recognit. 2023, 144, 109844.
Choi, B.; Wang, S.; Choi, I.; Sun, K. ChainMarks: Securing DNN Watermark with Cryptographic Chain. In Proceedings of the ACM Asia Conference on Computer and Communications Security (ASIA CCS), Meliá Hanoi, Vietnam, 25–29 August 2025; pp. 442–455.
Szyller, S.; Atli, B.G.; Marchal, S.; Asokan, N. DAWN: Dynamic adversarial watermarking of neural networks. In Proceedings of the 29th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2021; pp. 4417–4425.
Jia, H.; Choquette-Choo, C.A.; Chandrasekaran, V.; Papernot, N. Entangled watermarks as a defense against model extraction. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Vancouver, BC, Canada, 11–13 August 2021; pp. 1937–1954.
Kim, B.; Lee, S.; Lee, S.; Son, S.; Hwang, S.J. Margin-based neural network watermarking. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 16696–16711.
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022.
Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Hua, W.; Li, L.-J.; Li, F.-F.; Yuille, A.; Huang, J.; Murphy, K. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 19–35.
Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable architecture search. arXiv 2019, arXiv:1806.09055.
Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1–21.
Lin, M.; Wang, P.; Sun, Z.; Qian, Q.; Li, H.; Jin, R. Zen-NAS: A zero-shot NAS for high-performance image recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 337–346.
Xiang, L.; Dudziak, Ł.; Abdelfattah, M.S.; Chau, T.; Lane, N.D.; Wen, H. Zero-cost operation scoring in differentiable architecture search. Proc. AAAI Conf. Artif. Intell. 2023, 37, 10453–10463.
Ingolfsson, T.M.; Vero, M.; Wang, X.; Lamberti, L.; Benini, L.; Spallanzani, M. Reducing neural architecture search spaces with training-free statistics and computational graph clustering. In Proceedings of the 19th ACM International Conference on Computing Frontiers; Association for Computing Machinery: New York, NY, USA, 2022; pp. 213–214.
Wu, M.-T.; Lin, H.-I.; Tsai, C.-W. A training-free genetic neural architecture search. In Proceedings of the 2021 ACM International Conference on Intelligent Computing and Its Emerging Applications; Association for Computing Machinery: New York, NY, USA, 2021; pp. 65–70.
Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 2902–2911.

Figure 1. The overall workflow of our proposed method.

Figure 2. The trigger group construction process.

Figure 3. Distributions of trigger-level matching rates under repeated random key ambiguity attacks across five dataset–model pairs. Rows correspond to dataset–model pairs, and the red dashed line in each histogram indicates the mean trigger-level matching rate. Panels correspond to the number of trigger groups used in one attack attempt: (a) B = 1; (b) B = 2; (c) B = 4; (d) B = 8.

Figure 4. Distributions of trigger-level matching rates under repeated near-key ambiguity attacks across five dataset–model pairs. Rows correspond to dataset–model pairs, and the red dashed line in each histogram indicates the mean trigger-level matching rate. Panels correspond to three near-key settings, all evaluated with B = 8: (a) 14/16 Correct; (b) 15/16 Correct; (c) Adjacent Error.

Figure 5. Trigger-level accuracy versus the number of incorrect white pixel positions d. Solid curves show the trigger-level accuracy obtained with candidate keys containing d mismatched white pixel positions. Dashed curves show the corresponding random guess baselines 1/C for different datasets.

Figure 6. Training–loss curves of LoRA configurations under the same parameter budget on five benchmark settings. Panels correspond to (a) CIFAR-10/ResNet-18, (b) CIFAR-100/WRN-28-10, (c) FOOD-101/EfficientNet-B0, (d) Caltech-101/AlexNet, and (e) Caltech-256/MobileNetV2. In each setting, the top 5 GA-selected configurations are compared with 5 randomly sampled configurations from the same search space. Dashed curves denote individual trajectories, and solid curves denote the corresponding mean trajectories. The numerical convergence summary is provided in Table 9.

Table 1. Service-side clean-set performance and parameter overhead of different watermark embedding strategies. “Original Acc” denotes clean test accuracy before watermark embedding. “Clean Acc” and “ΔAcc” denote the reported clean test accuracy after embedding and its change relative to the original model, respectively. “Params” denotes the percentage of parameters updated during watermark embedding. For the proposed method, “Clean Acc” is measured based on the service predictor.

Dataset	Backbone	Original Acc	Method	Clean Acc	ΔAcc	Params.
CIFAR-10	ResNet-18	87.01%	Scratch	85.84%	−1.17%	100%
			FTAL	87.12%	0.11%	100%
			FTLL	81.25%	−5.76%	0.09%
			Ours	87.01%	0.00%	1.69%
CIFAR-100	WRN-28-10	75.46%	Scratch	74.70%	−0.76%	100%
			FTAL	75.41%	−0.05%	100%
			FTLL	74.36%	−1.10%	0.18%
			Ours	75.46%	0.00%	0.16%
FOOD-101	EfficientNet-B0	72.31%	Scratch	69.41%	−2.90%	100%
			FTAL	68.47%	−3.84%	100%
			FTLL	69.50%	−2.81%	2.24%
			Ours	72.31%	0.00%	1.31%
Caltech-101	AlexNet	66.63%	Scratch	52.11%	−14.52%	100%
			FTAL	64.55%	−2.08%	100%
			FTLL	65.33%	−1.30%	0.88%
			Ours	66.63%	0.00%	0.25%
Caltech-256	MobileNetV2	55.53%	Scratch	56.29%	0.76%	100%
			FTAL	54.29%	−1.24%	100%
			FTLL	43.97%	−11.56%	12.84%
			Ours	55.53%	0.00%	1.29%

Table 2. Ownership verification results of different watermark embedding strategies. “Trigger Acc” denotes the proportion of trigger samples mapped to their designated target labels. “GVR” denotes the group-wise verification rate under the criterion in Section 3.2. Unless otherwise stated, all results are reported with B = 8 trigger groups.

Dataset	Backbone	Original Acc	Method	Trigger Acc	GVR
CIFAR-10	ResNet-18	87.01%	Scratch	100.00%	100.00%
			FTAL	100.00%	100.00%
			FTLL	96.67%	98.00%
			Ours	100.00%	100.00%
CIFAR-100	WRN-28-10	75.46%	Scratch	100.00%	100.00%
			FTAL	100.00%	100.00%
			FTLL	98.00%	90.00%
			Ours	100.00%	100.00%
FOOD-101	EfficientNet-B0	72.31%	Scratch	100.00%	100.00%
			FTAL	100.00%	100.00%
			FTLL	99.33%	96.00%
			Ours	100.00%	100.00%
Caltech-101	AlexNet	66.63%	Scratch	100.00%	100.00%
			FTAL	100.00%	100.00%
			FTLL	76.67%	72.00%
			Ours	100.00%	100.00%
Caltech-256	MobileNetV2	55.53%	Scratch	96.67%	80.00%
			FTAL	100.00%	100.00%
			FTLL	55.00%	46.00%
			Ours	99.33%	96.00%
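
The two metrics in Table 2 can be illustrated with a minimal sketch. It assumes that model predictions are already arranged as a list of trigger groups, and that a group counts as verified when at least `min_match` of its samples hit their target labels. The actual group criterion is the one defined in Section 3.2, which is not reproduced here, so the all-samples-match default and the names `verification_metrics` and `min_match` are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def verification_metrics(
    preds: Sequence[Sequence[int]],
    targets: Sequence[Sequence[int]],
    min_match: Optional[int] = None,
) -> Tuple[float, float]:
    """Return (trigger_acc, gvr) as fractions in [0, 1].

    preds[g][i] is the predicted label of sample i in trigger group g;
    targets has the same shape and holds the designated target labels.
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equally long")

    hits = total = verified = 0
    for group_preds, group_targets in zip(preds, targets):
        if len(group_preds) != len(group_targets):
            raise ValueError("each group needs one target per prediction")
        matches = sum(p == t for p, t in zip(group_preds, group_targets))
        # Assumption: by default a group passes only if every sample matches.
        needed = len(group_targets) if min_match is None else min_match
        hits += matches
        total += len(group_targets)
        verified += matches >= needed

    return hits / total, verified / len(preds)


# Toy input: two groups of three samples, one miss in the second group.
acc, gvr = verification_metrics([[1, 2, 3], [4, 5, 0]], [[1, 2, 3], [4, 5, 6]])
print(f"Trigger Acc = {acc:.2%}, GVR = {gvr:.2%}")
```

Keeping the two rates separate shows why GVR can fall well below Trigger Acc: under a strict group criterion, a single miss invalidates a whole group while barely changing the sample-level rate.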

Table 3. Results of random key ambiguity attacks under different numbers of trigger groups. For each B, “Max Trigger Acc” and “Max GVR” denote the maximum trigger-level accuracy and the maximum group-wise verification rate observed over repeated attack trials, respectively. A smaller Max GVR indicates stronger resistance to forged verification.

Dataset/Backbone	Metric	B = 1	B = 2	B = 4	B = 8
CIFAR-10/ResNet-18	Max Trigger Acc	66.67%	50.00%	37.50%	29.17%
	Max GVR	0.00%	0.00%	0.00%	12.50%
CIFAR-100/WRN-28-10	Max Trigger Acc	33.33%	25.00%	12.50%	8.33%
	Max GVR	0.00%	0.00%	0.00%	0.00%
FOOD-101/EfficientNet-B0	Max Trigger Acc	33.33%	25.00%	12.50%	10.42%
	Max GVR	0.00%	0.00%	0.00%	0.00%
Caltech-101/AlexNet	Max Trigger Acc	50.00%	16.67%	16.67%	8.33%
	Max GVR	0.00%	0.00%	0.00%	0.00%
Caltech-256/MobileNetV2	Max Trigger Acc	33.33%	16.67%	12.50%	6.25%
	Max GVR	0.00%	0.00%	0.00%	0.00%

Table 4. Results of near-key ambiguity attacks with B = 8, where “14/16 Correct” and “15/16 Correct” denote attacks in which 14 or 15 of the 16 white pixel positions are recovered correctly. “Adjacent Error” denotes the setting in which only one white pixel position is incorrect and that incorrect position is moved to a neighboring grid of the true position. For each attack setting, “Max Trigger Acc” and “Max GVR” denote the maximum trigger-level accuracy and the maximum group-wise verification rate observed over repeated attack trials, respectively.

Dataset/Backbone	Metric	14/16 Correct	15/16 Correct	Adjacent Error
CIFAR-10/ResNet-18	Max Trigger Acc	27.08%	25.00%	70.83%
	Max GVR	25.00%	25.00%	62.50%
CIFAR-100/WRN-28-10	Max Trigger Acc	10.42%	10.42%	68.75%
	Max GVR	0.00%	0.00%	50.00%
FOOD-101/EfficientNet-B0	Max Trigger Acc	10.42%	10.42%	70.83%
	Max GVR	0.00%	0.00%	50.00%
Caltech-101/AlexNet	Max Trigger Acc	16.67%	6.25%	75.00%
	Max GVR	12.50%	0.00%	75.00%
Caltech-256/MobileNetV2	Max Trigger Acc	8.33%	4.16%	72.92%
	Max GVR	0.00%	0.00%	50.00%
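
The three near-key settings in Table 4 can be illustrated by showing how candidate keys might be derived from a true key. The sketch below is hypothetical. It assumes the key is a list of 16 distinct (row, column) white pixel positions on a 32 × 32 grid, that wrong positions are drawn uniformly from unused cells, and that “neighboring grid” means the 4-neighbourhood. None of these details are confirmed by the table caption, and the function names are illustrative.

```python
import random
from typing import List, Optional, Tuple

Pos = Tuple[int, int]


def perturb_key(key: List[Pos], n_wrong: int, grid: Tuple[int, int] = (32, 32),
                rng: Optional[random.Random] = None) -> List[Pos]:
    """Replace n_wrong positions of the key with random unused grid cells."""
    if rng is None:
        rng = random.Random()
    used = set(key)
    free = [(r, c) for r in range(grid[0]) for c in range(grid[1])
            if (r, c) not in used]
    candidate = list(key)
    wrong_idx = rng.sample(range(len(key)), n_wrong)
    for i, pos in zip(wrong_idx, rng.sample(free, n_wrong)):
        candidate[i] = pos
    return candidate


def adjacent_error_key(key: List[Pos], grid: Tuple[int, int] = (32, 32),
                       rng: Optional[random.Random] = None) -> List[Pos]:
    """Move exactly one position to a free 4-neighbour of its true cell."""
    if rng is None:
        rng = random.Random()
    used = set(key)
    moves = []
    for i, (r, c) in enumerate(key):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < grid[0] and 0 <= nc < grid[1] and (nr, nc) not in used:
                moves.append((i, (nr, nc)))
    if not moves:
        raise ValueError("no position has a free neighbouring cell")
    i, pos = rng.choice(moves)
    candidate = list(key)
    candidate[i] = pos
    return candidate


rng = random.Random(0)
true_key = [(r, c) for r in range(4) for c in range(4)]  # toy 16-position key
key_14 = perturb_key(true_key, 2, rng=rng)   # "14/16 Correct"
key_15 = perturb_key(true_key, 1, rng=rng)   # "15/16 Correct"
key_adj = adjacent_error_key(true_key, rng=rng)  # "Adjacent Error"
```

Both the “15/16 Correct” and “Adjacent Error” candidates differ from the true key in exactly one position. What separates them is how far the wrong pixel lies from its true location.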

Table 5. Comparison with representative anti-ambiguity watermarking methods under valid verification and random forgery attacks. All methods use 300 trigger samples for verification. For the proposed method, this corresponds to B = 50 trigger groups with six samples per group. For Zhu et al. [7], Trigger Acc. is the native verification metric, while GVR is reported as an auxiliary grouped metric by partitioning the 300-sample hash chain into 50 consecutive blocks of six samples. For Hua et al. [8], the trigger matrix is configured as m = 50 and n = 6, so each matrix row is treated as one natural trigger group.

Dataset/Backbone	Method	Evidence Construction	Valid Trigger Acc.	Valid GVR	Random Forgery Max Trigger Acc.	Random Forgery Max GVR
CIFAR-10/ResNet-18	Zhu et al.	Hash chain trigger evidence	99.67%	98.00%	51.33%	6.00%
CIFAR-10/ResNet-18	Hua et al.	Correlated unambiguous trigger evidence	100.00%	100.00%	19.33%	2.00%
CIFAR-10/ResNet-18	Ours	Regenerable carrier-dependent trigger groups	100.00%	100.00%	9.67%	0.00%
Caltech-101/AlexNet	Zhu et al.	Hash chain trigger evidence	96.67%	86.00%	35.00%	2.00%
Caltech-101/AlexNet	Hua et al.	Correlated unambiguous trigger evidence	99.00%	94.00%	7.67%	0.00%
Caltech-101/AlexNet	Ours	Regenerable carrier-dependent trigger groups	100.00%	100.00%	8.33%	0.00%
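
The auxiliary grouped metric described in the Table 5 caption, in which a flat 300-sample sequence is cut into 50 consecutive blocks of six, can be sketched as follows. The input is a list of per-sample correctness flags. The rule that a block passes only when all six samples are correct is an assumption standing in for the criterion of Section 3.2, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def partition_blocks(flags: Sequence[bool], block_size: int = 6) -> List[List[bool]]:
    """Split a flat per-sample result sequence into consecutive blocks."""
    if block_size <= 0 or not flags or len(flags) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    return [list(flags[i:i + block_size])
            for i in range(0, len(flags), block_size)]


def blockwise_rates(flags: Sequence[bool], block_size: int = 6) -> Tuple[float, float]:
    """Return (sample-level accuracy, block-level verification rate)."""
    blocks = partition_blocks(flags, block_size)
    trigger_acc = sum(flags) / len(flags)
    # Assumption: a block is verified only if every sample in it is correct.
    gvr = sum(all(block) for block in blocks) / len(blocks)
    return trigger_acc, gvr


# 300 chain samples with a single miss at index 17 (inside the third block).
flags = [True] * 300
flags[17] = False
acc, gvr = blockwise_rates(flags)
print(f"{acc:.2%} {gvr:.2%}")  # prints "99.67% 98.00%"
```

Under this strict rule, a single missed sample out of 300 lowers the sample-level rate by a third of a percentage point but costs a full block, that is, two percentage points of the grouped rate.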

Table 6. Evidence exposure and refreshed verification analysis. “N/A” means that the corresponding baseline does not define a carrier-dependent refreshed verification process without preparing or training a new trigger set. For Zhu et al. [7], post-leak forged GVR is computed using the same auxiliary block partitioning as in Table 5 because the original hash chain protocol does not define trigger groups. For Hua et al. [8], post-leak forged GVR is computed over the natural rows of the trigger matrix. For the proposed method, post-leak forged GVR is computed by transferring the observed white pixel positions and labels from leaked verification groups to fresh carrier groups.

Dataset/Backbone	Method	Post-Leak Attack Strategy	Legitimate Fresh GVR	Post-Leak Forged GVR
CIFAR-10/ResNet-18	Zhu et al.	Replay leaked chain	N/A	98.00%
CIFAR-10/ResNet-18	Hua et al.	Replay leaked trigger set	N/A	100.00%
CIFAR-10/ResNet-18	Ours	Transfer leaked positions/labels to fresh carriers	100.00%	0.00%
Caltech-101/AlexNet	Zhu et al.	Replay leaked chain	N/A	86.00%
Caltech-101/AlexNet	Hua et al.	Replay leaked trigger set	N/A	94.00%
Caltech-101/AlexNet	Ours	Transfer leaked positions/labels to fresh carriers	100.00%	0.00%

Table 7. Sensitivity of white pixel number and wrong key suppression on CIFAR-10/ResNet-18. For the white pixel analysis, λ = 1 is fixed. For the λ analysis, the number of inserted white pixels is fixed at s = 16. “Adjacent Error Max GVR” is reported only for the white pixel analysis because it measures the local tolerance of the trigger pattern.

Parameter	Value	Valid Trigger Acc.	Valid GVR	Wrong Key GVR	Random Forgery Max GVR	Adjacent Error Max GVR
White pixels (s)	4	82.67%	46.00%	—	0.00%	18.00%
	8	94.33%	78.00%	—	0.00%	42.00%
	12	98.67%	96.00%	—	0.00%	54.00%
	16	100.00%	100.00%	—	0.00%	62.50%
	20	100.00%	100.00%	—	2.00%	70.00%
	24	100.00%	100.00%	—	2.00%	76.00%
	32	100.00%	100.00%	—	4.00%	84.00%
λ	0	100.00%	100.00%	58.00%	8.00%	—
	0.25	100.00%	100.00%	14.00%	4.00%	—
	0.5	100.00%	100.00%	2.00%	2.00%	—
	1.0	100.00%	100.00%	0.00%	0.00%	—
	2.0	98.67%	94.00%	0.00%	0.00%	—
	4.0	92.67%	70.00%	0.00%	0.00%	—

Table 8. LoRA parameter budget sensitivity on CIFAR-10/ResNet-18. “Budget Cap” denotes the maximum trainable parameter ratio allowed during intrinsic rank search. “Selected Params” denotes the actual trainable parameter ratio of the LoRA component selected through a genetic search under the corresponding cap. The row labeled “Default” corresponds to the budget setting used in the main experiments.

Budget Setting	Budget Cap	Selected Params	Valid Trigger Acc.	Valid GVR
Very compact	0.25%	0.24%	89.33%	58.00%
Compact	0.50%	0.49%	96.33%	84.00%
Moderate	1.00%	0.97%	98.67%	94.00%
Highly compact	1.50%	1.42%	99.33%	98.00%
Saturated	1.75%	1.63%	100.00%	100.00%
Default	2.00%	1.69%	100.00%	100.00%
Over-budgeted	2.50%	1.91%	100.00%	100.00%
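
The relation between “Budget Cap” and “Selected Params” in Table 8 can be illustrated with a small parameter-count sketch. It relies on the standard LoRA fact that a rank-r adapter on a d_out × d_in linear weight adds r · (d_in + d_out) trainable parameters. Everything else is an assumption for illustration: the layer shapes are made up, convolutional layers and biases are ignored, and the ratio is taken relative to the backbone weight count, which may differ from how the paper computes it.

```python
from typing import Sequence, Tuple


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` LoRA pair (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def lora_ratio(layers: Sequence[Tuple[int, int]], ranks: Sequence[int]) -> float:
    """Ratio of LoRA parameters to backbone weights; rank 0 means no adapter."""
    if len(layers) != len(ranks):
        raise ValueError("one rank per layer is required")
    backbone = sum(d_in * d_out for d_in, d_out in layers)
    added = sum(lora_param_count(d_in, d_out, r)
                for (d_in, d_out), r in zip(layers, ranks))
    return added / backbone


def within_budget(layers: Sequence[Tuple[int, int]], ranks: Sequence[int],
                  cap: float) -> bool:
    """Check a candidate rank configuration against the budget cap."""
    return lora_ratio(layers, ranks) <= cap


# Hypothetical three-layer backbone and one candidate rank configuration.
layers = [(512, 512), (512, 256), (256, 10)]
ranks = [2, 1, 0]
ratio = lora_ratio(layers, ranks)
print(f"selected params = {ratio:.2%}, fits 2% cap: {within_budget(layers, ranks, 0.02)}")
```

Because ranks are integers, the achievable ratios are discrete. This is one plausible reason why a selected configuration sits below its cap rather than exactly on it.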

Table 9. Runtime overhead and GA convergence summary. Embedding time is reported in minutes. GA search time is a one-time offline cost. Verification time is measured for 300 trigger samples. Final LogLoss is averaged over the last training epoch of the five configurations used in Figure 6.

Dataset/Backbone	Embedding Time: Scratch/FTAL/FTLL/Ours (min)	GA Search Time (min)	Routine Inference Overhead	Verification Time (s)	Final LogLoss: GA/Random/Reduction
CIFAR-10/ResNet-18	34.5/13.8/8.2/3.6	7.6	0.00%	0.19	0.56/0.58/3.45%
CIFAR-100/WRN-28-10	128.0/51.0/33.4/9.8	28.7	0.00%	0.48	0.50/0.52/3.85%
FOOD-101/EfficientNet-B0	212.0/86.5/54.2/12.6	34	0.00%	0.72	0.32/0.46/30.43%
Caltech-101/AlexNet	18.4/4.7/3.1/1.2	3.23	0.00%	0.11	0.61/0.62/1.61%
Caltech-256/MobileNetV2	66.0/14.8/9.0/4.2	11.1	0.00%	0.26	0.51/0.66/22.73%
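
The “Reduction” entries in the last column of Table 9 can be checked with a short calculation. The caption does not state the formula. The sketch assumes a relative reduction of the GA LogLoss with respect to the random-configuration LogLoss, (Random − GA) / Random, which reproduces every tabulated percentage.

```python
# (GA LogLoss, Random LogLoss) pairs copied from Table 9.
rows = {
    "CIFAR-10/ResNet-18": (0.56, 0.58),
    "CIFAR-100/WRN-28-10": (0.50, 0.52),
    "FOOD-101/EfficientNet-B0": (0.32, 0.46),
    "Caltech-101/AlexNet": (0.61, 0.62),
    "Caltech-256/MobileNetV2": (0.51, 0.66),
}


def relative_reduction(ga: float, rnd: float) -> float:
    """Percentage by which the GA loss is lower than the random-search loss."""
    return (rnd - ga) / rnd * 100.0


for name, (ga, rnd) in rows.items():
    print(f"{name}: {relative_reduction(ga, rnd):.2f}%")
```

The large gaps for FOOD-101 and Caltech-256 therefore reflect absolute LogLoss differences of 0.14 and 0.15, whereas the other three settings differ by only 0.01 to 0.02.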

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hao, S.; Huang, R. Defending Against Ambiguity Attacks: Secret-Key-Driven DNN Watermarking for Ownership Verification. Electronics 2026, 15, 2150. https://doi.org/10.3390/electronics15102150

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
