1. Introduction
With the advancement of deep learning [1], this technology has been widely applied in fields such as computer vision [2,3], speech recognition [4,5], and natural language processing [6], yielding impressive results in many real-world settings. Many technology companies have released “Machine Learning as a Service” (MLaaS) products, with deep neural networks (DNNs) as key components. These products continuously drive the advancement of artificial intelligence (AI) technology and offer broader prospects for AI applications across various domains. Meanwhile, recent studies on video cryptanalysis and image encryption also reflect the continuing importance of multimedia security and privacy protection in modern intelligent systems [7,8].
However, training a high-performance DNN typically requires substantial investment in data collection and annotation, computational infrastructure, and model design expertise. Therefore, high-performance DNNs have become valuable intellectual assets in MLaaS systems, making copyright protection crucial for encouraging innovation and technological advancement. Once deployed through public APIs, however, the model can be queried remotely while its internal parameters remain inaccessible, which makes unauthorized reuse difficult to detect and copyright disputes difficult to resolve. Under this setting, ownership verification for deployed DNNs has become an important problem.
Existing DNN watermarking methods can be broadly divided into intrinsic watermarking and backdoor watermarking, depending on how ownership evidence is embedded and retrieved. Intrinsic watermarking embeds ownership information into the internal parameters, representations, or structures of the protected model and is therefore mainly suitable when the verifier can inspect the model internals, namely under white-box access. In contrast, backdoor watermarking encodes ownership into a predefined trigger behavior and verifies ownership through query responses only, namely under black-box access. Because commercial MLaaS platforms usually provide only API-level access rather than internal model visibility, backdoor watermarking has become a typical approach for deployed ownership verification in practical settings.
Despite this practical advantage, existing backdoor watermarking methods still face two major limitations. The first is the integrity of routine service behavior. Most existing schemes embed the watermark by retraining or fine-tuning the backbone of the protected model. Once watermark embedding is written into the backbone, the predictor used for routine service is no longer kept unchanged. Therefore, these methods cannot naturally guarantee that watermark embedding leaves routine service behavior intact.
The second limitation is that forged ownership evidence becomes difficult to rule out when ownership verification depends on a finite trigger set. Existing backdoor watermarking methods usually verify ownership by testing a small number of prepared trigger samples. If these samples are leaked or closely imitated, the suspicious model can produce similar verification responses, which can then be used to support a false ownership claim. In this paper, we refer to this threat as an ambiguity attack. In particular, a black box ambiguity attack refers to the setting in which the adversary has only black box access to the suspicious model and searches for forged trigger samples or forged trigger evidence to support a counterfeit ownership claim.
Recent studies have shown that ambiguity attacks become more difficult when verification no longer depends on isolated trigger samples but instead depends on correlated trigger evidence whose samples and labels are constrained by deterministic, hash-based, or encrypted relations [9,10]. In the present work, we instantiate this idea through group-wise consistency, meaning that the trigger samples belonging to the same trigger group must collectively satisfy a predefined sample-to-label correspondence, rather than being verified independently one by one. Consequently, a forged claim must satisfy both sample-level correctness and group-level consistency, which substantially reduces the feasible search space for forgery. However, group-wise consistency alone does not remove the risk caused by exposure of concrete verification samples. Once a fixed trigger set or trigger chain is disclosed, the disclosed samples may become reusable ownership evidence and may also reveal realized input–output pairs of the hidden correlation rule [11]. Therefore, the problem addressed in this paper is not only to make trigger samples correlated but also to make ownership evidence regenerable; a valid key should define a one-way mapping that can generate new trigger groups from new benign carriers, while a disclosed verification group should not reveal the key or enable reliable generation of valid groups for unseen carriers.
To overcome both limitations, this paper proposes a secret-key-driven backdoor DNN watermarking framework together with a plug-and-play LoRA (Low-Rank Adaptation) component. In the proposed framework, any pair of benign images can first be transformed into a mixed carrier group through channel shuffling and 1:1 pixel-wise mixup. Then, the mean statistics of the carrier images and a valid secret key are jointly mapped, via a one-way hash function, to a sparse set of white-pixel positions and the corresponding target label assignment for that trigger group. In this design, the key is not a pointer to a stored trigger set. Instead, it specifies a one-way mapping from carrier statistics to sparse trigger patterns. For each carrier image, the carrier-wise channel statistics and the valid key jointly determine a unique white-pixel pattern, and the target label is further derived from that pattern. Therefore, the trigger samples used in one verification session are only carrier-specific realizations of the key-defined mapping. Even if these samples are disclosed, they do not directly reveal the valid key or provide a reliable way to generate legitimate trigger groups for new carriers. By combining this regenerable trigger family construction with group-wise verification, the proposed framework increases the difficulty of successful forgery without relying on a small fixed trigger set as permanent secret evidence. To preserve the integrity of routine service behavior, the watermarking function is implemented by a plug-and-play LoRA component rather than by modifying the deployed backbone used for routine service inference. The parameter updates required for watermark embedding are thus absorbed by the external component, whose structure is further optimized under a lightweightness constraint.
The main contributions of this paper are as follows.
(1) We propose a secret-key-driven backdoor DNN watermarking framework in which ownership verification depends on trigger groups regenerated from a valid key and carrier image statistics rather than preserving a finite trigger set as static evidence.
(2) We combine carrier-dependent trigger generation with group-wise verification, thereby increasing the difficulty of black box ambiguity attacks while reducing the security risk caused by trigger sample exposure.
(3) We implement the watermarking function with a plug-and-play LoRA component and further optimize this component under a lightweightness constraint, thereby enabling ownership verification while preserving the integrity of routine service behavior under the intended deployment protocol.
2. Related Work
Existing research on DNN watermarking [12] can be organized along two dimensions: how ownership evidence is embedded and how trigger behavior is constructed for black box verification. From the perspective of evidence embedding, existing methods are commonly divided into intrinsic watermarking and backdoor watermarking. Intrinsic watermarking embeds ownership information into the internal parameters, representations, or structures of the protected model [13,14,15,16,17,18]. Such methods are effective when the verifier has white box access because ownership can be verified by inspecting the internal contents of the model. However, this requirement limits their applicability in MLaaS scenarios where only API-level access is typically available. In contrast, backdoor watermarking encodes ownership into a predefined trigger behavior that can be tested through black box queries and is therefore more suitable for practical ownership verification in deployed settings.
Within backdoor watermarking, prior methods can be further divided into sample-triggered and pattern-triggered schemes. Sample-triggered methods verify ownership using specific trigger samples, such as abstract images, noisy images, or adversarial examples [16,19,20]. Their main advantage is implementation simplicity, but their verification evidence is usually tied to a small number of concrete samples. Pattern-triggered methods instead embed ownership through reusable trigger patterns or structured input transformations, such as content marks, mixed-image constructions, or pixel-level modifications [21,22,23]. Compared with sample-triggered schemes, pattern-triggered schemes usually provide better reusability across inputs because the protected model learns a trigger pattern rather than memorizing a few isolated samples. However, most existing backdoor watermarking schemes, regardless of trigger type, still verify ownership through a finite set of trigger samples prepared during watermark embedding.
A separate line of research has focused on preserving the integrity of routine service behavior after watermark embedding. Zhong [24] and Sun [25] attempted to reduce interference with the original task by introducing additional categories or neurons to absorb watermark-related behavior. Wang et al. [26] proposed a plug-and-play watermarking scheme by injecting an independent proprietary model so that the original target model need not be directly fine-tuned; however, the watermarking function is still carried by a separate neural component rather than by a lightweight low-rank parameter offset. Hua et al. [27] further argued that watermark fidelity should be evaluated not only by clean-set prediction accuracy but also by the stability of internal feature extraction and decision boundaries, which they referred to as deep fidelity. These studies substantially improved the understanding of fidelity in DNN watermarking. Nevertheless, existing methods still commonly rely on modifying the backbone parameters of the protected model or introducing additional structures that must participate in inference. As a result, they do not fundamentally resolve the service integrity problem under practical deployment.
Another line of work has studied ambiguity attacks and corresponding anti-forgery mechanisms. Fan et al. [17] showed that ownership verification can be challenged by forged watermark evidence, thereby establishing ambiguity as a practical security threat. Zhu et al. [9] further pointed out that when trigger samples are verified independently, an attacker with black box access may fabricate alternative trigger evidence through repeated queries. To address this issue, they linked trigger samples and target labels by constructing a one-way trigger chain and a corresponding label chain through a hash-based mechanism. Hua et al. [10] extended this idea by constructing deterministically dependent trigger sets, where multiple trigger samples and labels are correlated to increase the difficulty of ambiguity attacks. The common insight of these studies is that ownership verification becomes harder to forge when it depends not only on the responses to individual trigger samples but also on predefined group-wise consistency across a trigger group.
The above ambiguity-resistant methods provide important foundations for this paper. Their common insight is that ownership evidence becomes harder to forge when trigger samples and target labels are not verified independently but are constrained by deterministic, hash-based, or encrypted relations. Nevertheless, the exposure of concrete verification evidence remains a practical concern in trigger-based watermarking. Recent work on evidence exposure attacks explicitly points out that the trigger set can effectively serve as the watermark key; if it is leaked during verification, an adversary may reuse the leaked evidence to claim ownership [11]. Recent cryptographic chain methods further strengthen trigger generation and watermark decision rules by deriving trigger inputs from secret-key-based chains and using more rigorous statistical verification [28].
The present work follows the same goal of reducing ambiguity, but it differs in how the verification evidence is generated. Instead of treating the trigger evidence as a fixed set, a fixed chain, or a watermark dataset, the proposed method lets the valid key define a one-way mapping from carrier image statistics to sparse trigger patterns. The trigger samples submitted in one verification session are therefore only carrier-specific realizations of this mapping. This design is combined with group-wise verification and a plug-and-play LoRA component so that the method addresses both evidence ambiguity and service backbone preservation under the intended deployment protocol.
Several recent watermarking methods explicitly target model extraction or knowledge distillation attacks. For example, active API [29] watermarking methods inject watermark behavior through service responses, while entanglement-based [30] or margin-based [31] methods attempt to couple watermark behavior with the normal task decision behavior so that the watermark can survive in extracted surrogate models. These methods address watermark transfer under functionality stealing. In contrast, ambiguity-resistant ownership verification addresses a different failure mode: even when a model exhibits watermark behavior, an adversary may attempt to fabricate alternative trigger evidence and make the ownership claim ambiguous. The proposed method belongs to the latter line. It does not replace extraction-resilient watermarking; instead, it can serve as a key-driven anti-forgery verification layer that is complementary to extraction-aware embedding objectives.
5. Results
5.1. Experimental Setup
Experiments are conducted on CIFAR-10, CIFAR-100, FOOD-101, Caltech-101, and Caltech-256. Four watermark embedding strategies are compared: Scratch, FTAL, FTLL, and the proposed LoRA-based scheme. Scratch retrains the model from scratch with watermark embedding, FTAL fine-tunes all model parameters, and FTLL fine-tunes only the last layer.
For clean model training and Scratch-based embedding, the training schedule is set to 50 epochs on CIFAR-10, CIFAR-100, and FOOD-101 and 100 epochs on Caltech-101 and Caltech-256. FTAL and FTLL are both trained for 20 epochs. Unless otherwise specified, the batch size is 32 and the initial learning rate is .
Trigger groups are generated online during watermark embedding. In each mini-batch, three benign image pairs are sampled. In the general formulation of
Section 3.2, the trigger group size is denoted by
. In the present implementation, each trigger group contains six trigger samples generated by the six predefined RGB channel permutations, and thus
throughout the experiments. For each valid key trigger sample, one auxiliary wrong key trigger sample is constructed from the same carrier image. Each batch therefore contains 18 valid key trigger samples and 18 wrong key trigger samples. The mixup ratio between the two carrier images is fixed at 1:1, and the number of inserted white pixels is fixed at 16 for all datasets. Because the valid key and wrong key samples are constructed in a one-to-one manner, the balancing coefficient in the embedding objective is fixed at
.
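The trigger-group construction described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the use of SHA-256, the four-decimal quantization of channel means, the counter-based expansion into distinct positions, the label rule in `target_label`, and the choice to mix first and permute channels afterwards are all assumptions made here for concreteness. Only the 1:1 mixup ratio, the six RGB channel permutations, and the 16 white pixels come from the setup described in the text.

```python
import hashlib
from itertools import permutations

NUM_WHITE = 16                                  # white pixels per trigger sample
CHANNEL_PERMS = list(permutations(range(3)))    # the six RGB channel orders

def mixup(img_a, img_b):
    """1:1 pixel-wise mixup of two H x W x 3 images stored as nested lists."""
    return [[[0.5 * (a + b) for a, b in zip(px_a, px_b)]
             for px_a, px_b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def shuffle_channels(img, perm):
    """Reorder the three channels of every pixel according to perm."""
    return [[[px[c] for c in perm] for px in row] for row in img]

def channel_means(img):
    n = len(img) * len(img[0])
    return [sum(px[c] for row in img for px in row) / n for c in range(3)]

def white_positions(carrier, key, num_white=NUM_WHITE):
    """Assumed mapping: hash (key, quantized channel means) to distinct pixels."""
    h, w = len(carrier), len(carrier[0])
    if num_white > h * w:
        raise ValueError("carrier too small for the requested number of white pixels")
    stats = ",".join(f"{m:.4f}" for m in channel_means(carrier)).encode()
    positions, counter = [], 0
    while len(positions) < num_white:
        digest = hashlib.sha256(key + stats + counter.to_bytes(4, "big")).digest()
        pos = divmod(int.from_bytes(digest[:8], "big") % (h * w), w)
        if pos not in positions:                # keep positions distinct
            positions.append(pos)
        counter += 1
    return positions

def target_label(positions, num_classes):
    """Assumed rule: derive the target label from the white-pixel pattern."""
    digest = hashlib.sha256(repr(sorted(positions)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_classes

def make_trigger_group(img_a, img_b, key, num_classes, white=1.0):
    """Return six (trigger_image, target_label) pairs for one benign image pair."""
    mixed = mixup(img_a, img_b)
    group = []
    for perm in CHANNEL_PERMS:
        carrier = shuffle_channels(mixed, perm)
        positions = white_positions(carrier, key)
        trigger = [[px[:] for px in row] for row in carrier]
        for r, c in positions:
            trigger[r][c] = [white, white, white]
        group.append((trigger, target_label(positions, num_classes)))
    return group
```

Calling `make_trigger_group(img_a, img_b, b"secret", 10)` twice with the same inputs returns the same group, while a different key or a different carrier pair yields different white-pixel positions. A wrong-key auxiliary sample, as used in the embedding batches, could be produced in this sketch by calling the same function with a different key on the same image pair.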
For the proposed method, the backbone parameters remain frozen, and only the LoRA component is optimized. In intrinsic rank search, every layer is treated as a candidate insertion location, and the candidate rank ranges from 0 to 8. The genetic algorithm uses a population size of 256 and runs for 512 generations. The trade-off coefficient in the fitness score is set to , and the mutation probability is set to 0.2. The rank vector with the highest fitness score is used to instantiate the final plug-and-play LoRA component.
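To make the frozen-backbone arrangement concrete, the sketch below shows a single linear layer with an optional low-rank offset. It is an illustrative toy in pure Python, not the paper's code: the class name `LoRALinear`, the `use_adapter` switch, and the per-layer `trainable_fraction` helper are hypothetical, and the initialization (random `A`, zero `B`) follows the usual LoRA convention rather than anything stated in this section. A rank of 0 corresponds to a layer where the search decided to insert no adapter.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen weight W (out x in) plus an optional low-rank offset B @ A."""

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight      # frozen backbone weight, never updated
        self.rank = rank          # rank 0 means no adapter at this layer
        # Usual LoRA initialization: A random, B zero, so the offset starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x, use_adapter):
        """Service path: use_adapter=False. Verification path: use_adapter=True."""
        y = matvec(self.weight, x)
        if use_adapter and self.rank > 0:
            delta = matvec(self.B, matvec(self.A, x))
            y = [a + b for a, b in zip(y, delta)]
        return y

    def trainable_fraction(self):
        """Adapter parameters relative to this layer's frozen parameters."""
        out_dim, in_dim = len(self.weight), len(self.weight[0])
        return self.rank * (in_dim + out_dim) / (out_dim * in_dim)
```

Because `forward(x, use_adapter=False)` never touches `A` or `B`, training the adapter cannot change the service-path output in this sketch, which mirrors the design goal stated above. For realistic layer sizes the ratio returned by `trainable_fraction` is small; for the tiny matrices used in a test it can exceed 1.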
Unless otherwise stated, the number of trigger groups used for verification is set to
in all experiments. Each trigger group is generated from one benign image pair and contains six trigger samples produced by the predefined channel permutations. The reported ownership verification results are computed over these
trigger groups according to the criterion defined in
Section 3.2. Runtime measurements are conducted on a workstation with an Intel Core i5-13600KF CPU, an NVIDIA GeForce RTX 4060 GPU, and 16 GB RAM using the same batch size as the main experiments. We report wall clock time for watermark embedding, genetic search, and ownership verification.
5.2. Service Integrity, Verification Effectiveness, and Parameter Efficiency
The results in Table 1 and Table 2 are used to evaluate three aspects of the compared methods: service integrity, verification effectiveness, and parameter efficiency. The baselines in this subsection are used to compare watermark-embedding strategies and service-side fidelity. They are not intended to represent anti-ambiguity or anti-forgery watermarking mechanisms, which are compared separately in Section 5.4. These three aspects correspond to three practical questions. The first is whether watermark embedding changes model behavior on normal service inputs. The second is whether the embedded watermark still provides reliable evidence for ownership verification. The third is how many trainable parameters are required during watermark embedding.
Table 1 reports the service-side clean set performance and the trainable parameter overhead of different watermark embedding strategies. “Original Acc.” denotes the clean test accuracy of the original model before watermark embedding. “Clean Acc.” denotes the clean test accuracy reported after embedding, and “ΔAcc.” denotes the corresponding change relative to the original model. “Trainable Params. (%)” denotes the percentage of parameters updated during watermark embedding. For the proposed method, “Clean Acc.” is measured based on the service predictor because routine service is still performed by the original backbone under the intended deployment protocol.
Table 2 reports the ownership verification results of different watermark embedding strategies. “Trigger Acc.” denotes the proportion of trigger samples that are mapped to their designated target labels. “GVR” denotes the group-wise verification rate under the criterion in Section 3.2. Because ownership in this paper is confirmed from trigger groups rather than isolated trigger samples, GVR is the primary verification metric, while Trigger Acc. is reported as a supplementary indicator of sample-level trigger recognition.
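The relationship between the two metrics can be illustrated with a short sketch. The group criterion used here, that a group is verified only when every sample in it is mapped to its designated label, is an assumption: it matches the block-wise rule stated for the auxiliary GVR in Section 5.4, but the exact criterion of Section 3.2 is not reproduced in this section. The function names and the input layout are hypothetical.

```python
def trigger_accuracy(groups):
    """Fraction of trigger samples mapped to their designated target label.

    groups: list of trigger groups, each a list of (predicted, target) pairs.
    """
    pairs = [pair for group in groups for pair in group]
    return sum(pred == tgt for pred, tgt in pairs) / len(pairs)

def group_verification_rate(groups):
    """Fraction of groups in which every sample hits its target (assumed rule)."""
    passed = sum(all(pred == tgt for pred, tgt in group) for group in groups)
    return passed / len(groups)
```

As an illustrative configuration (not taken from the experiments), 50 groups of six samples with a single misclassified sample in each of two groups give a trigger accuracy of 298/300 and a GVR of 48/50. The example shows why GVR is the stricter metric: one wrong sample costs a whole group.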
The results in
Table 1 show that the compared methods differ substantially in how they affect normal service behavior. Scratch reduces clean test accuracy on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101, with the largest drop appearing on Caltech-101 (
), although a slight increase is observed on Caltech-256. FTAL causes only minor accuracy changes on CIFAR-10 and CIFAR-100 but still introduces noticeable degradation on FOOD-101, Caltech-101, and Caltech-256. FTLL reduces the number of trainable parameters on some models, but its effect on service behavior is less stable, as shown, for example, by the
drop on Caltech-256. In contrast, the proposed method reports unchanged clean set accuracy in all five settings under the intended deployment protocol, while the trainable parameter ratio remains between
and
. These results are consistent with the design goal that routine service should remain on the original backbone while the watermarking function is carried out by a lightweight external component.
Table 2 shows a different trade-off. Scratch and FTAL reach perfect verification performance in most settings, but this performance is obtained by updating the full model or retraining it from scratch. FTLL is much less stable: its trigger-level accuracy falls to
on Caltech-101 and
on Caltech-256, and its group-level verification results further drop to
and
, respectively. The proposed method reaches perfect verification performance on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101 and remains close to perfect on Caltech-256, with 99.33% trigger-level accuracy and 96.00% group-level verification. Taken together,
Table 1 and
Table 2 show that the proposed method maintains strong ownership verification performance while leaving routine service on the original backbone, and this balance is not achieved as consistently by the compared baselines.
5.3. Robustness Against Ambiguity Attacks
We evaluate whether an attacker can fabricate forged trigger groups that satisfy the ownership verification criterion without possessing the valid key. Because ownership is confirmed at the trigger group level, the analysis considers both trigger-level matching and the resulting group-level verification outcome. Table 3 and Table 4 report the quantitative results. Figure 3 and Figure 4 complement these tables by showing the distributions of trigger-level matching rates under repeated random key and near-key attacks, respectively.
We first consider a random key attack. In this setting, the attacker knows the trigger generation procedure described in
Section 4.2, including the carrier construction and the target-label computation rule, but does not know the valid key
or the key-induced white pixel positions for each carrier. The attacker therefore repeatedly samples candidate white pixel sets and uses them to construct forged trigger groups. This attack is non-adaptive, because candidate groups are generated without using previous API responses to update the search direction. To examine the effect of joint verification, we vary the number of trigger groups used in one attack attempt and report the results for
and
. For each value of
,
Table 3 reports the maximum trigger-level accuracy and the maximum GVR observed over repeated attack trials, while
Figure 3 shows the corresponding distributions.
Table 3 shows that random guessing may still produce partial trigger matches, especially when only a small number of trigger groups is involved in one attempt. However, as
increases, the maximum trigger-level accuracy decreases consistently across all datasets.
Figure 3 shows the same trend from the distributional view: most attack trials remain concentrated in the low-accuracy region, and this concentration becomes more pronounced as more trigger groups are jointly considered. More importantly, these partial matches rarely translate into successful forged verification. The maximum GVR remains 0.00% in nearly all settings. The only exception appears in CIFAR-10 at
, where the maximum GVR reaches 12.50%. Even in this case, success remains isolated and does not extend to stable forged verification across datasets. These results show that random guessing is generally insufficient to satisfy the ownership verification criterion adopted in this paper.
We then consider stronger near-key attacks. In these attacks, the attacker is allowed to recover most of the white pixel positions correctly, so the forged pattern is no longer a purely random guess. Under
, we evaluate three cases: 14/16 Correct, 15/16 Correct, and Adjacent Error.
Table 4 reports the corresponding maxima, and
Figure 4 shows the associated distributions of trigger-level matching rates. The first two settings still lead to low trigger-level matching rates on most datasets, and their maximum GVR remains low or even 0.00% in most cases. These results indicate that partial recovery of the trigger pattern is still insufficient for reliable forgery. Matching most white pixel positions is not equivalent to reproducing the exact trigger pattern required for stable group-level verification.
Under the Adjacent Error setting, the attacker is assumed to recover all but one of the white pixel positions correctly, and the only incorrect position is shifted to a neighboring grid of the true position. Compared with random guessing and coarse near-key approximation, this setting is substantially more favorable to the attacker because the forged pattern differs from the valid key by only a single local perturbation. Accordingly, both trigger-level matching and group-level verification increase across all datasets. The Caltech-101/AlexNet result suggests a dataset–model interaction rather than a general failure of the verification rule. Caltech-101 contains relatively limited intra-class variation and exhibits strong pose, scale, and background regularities, while AlexNet uses a relatively shallow feature hierarchy with early large spatial operators and pooling. This combination may produce a broader local response basin around sparse pixel-level triggers, so a one-pixel neighboring shift can still partially activate the learned watermark response. However, these values remain clearly below the verification results of the legitimate watermark in Table 2, where the proposed method attains a GVR of 100.00% on CIFAR-10, CIFAR-100, FOOD-101, and Caltech-101 and 96.00% on Caltech-256. Therefore, even under this attacker-favorable local perturbation setting, the forged trigger groups still do not reproduce the ownership verification behavior of the legitimate watermark. Figure 4 is consistent with this observation: the distributions under Adjacent Error shift toward higher trigger-level matching rates, but they still do not support the same verification outcome as the valid key.
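The three near-key settings can be expressed as perturbations of the valid white-pixel positions. The sketch below is one possible way to generate such forged position sets for a sensitivity study; the function names, the uniform sampling of wrong positions, and the use of an 8-connected neighborhood for the Adjacent Error shift are assumptions, since the text only states that the wrong position moves to a neighboring grid cell.

```python
import random

def k_correct(valid, k, height, width, rng):
    """Keep k valid positions and replace the rest with wrong positions.

    With 16 valid positions, k=14 and k=15 give the two coarse near-key cases.
    """
    valid = list(valid)
    taken = set(valid)
    kept = rng.sample(valid, k)
    pool = [(r, c) for r in range(height) for c in range(width)
            if (r, c) not in taken]
    return kept + rng.sample(pool, len(valid) - k)

def adjacent_error(valid, height, width, rng):
    """Shift exactly one valid position to a free neighbouring cell."""
    valid = list(valid)
    taken = set(valid)
    # Visit the positions in random order until one has a free neighbour.
    for i in rng.sample(range(len(valid)), len(valid)):
        r, c = valid[i]
        neighbours = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)
                      and 0 <= r + dr < height and 0 <= c + dc < width
                      and (r + dr, c + dc) not in taken]
        if neighbours:
            forged = valid[:]
            forged[i] = rng.choice(neighbours)
            return forged
    raise ValueError("no valid position has a free neighbouring cell")
```

Passing a seeded `random.Random` instance keeps repeated attack trials reproducible. The forged positions would then be written into the carrier images in place of the key-derived ones before querying the suspicious model.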
Overall, the results in Table 3 and Table 4 and Figure 3 and Figure 4 support a bounded conclusion. Random key attacks rarely satisfy the verification criterion, and coarse near-key approximations remain insufficient for stable forgery. The strongest attack considered here is Adjacent Error, where the forged pattern differs from the valid key by only one locally shifted white pixel position. Even in this case, the observed GVR remains well below the verification result of the legitimate watermark. These results indicate that the proposed method effectively increases the difficulty of ambiguity attacks while also showing that attack success is more sensitive to local deviations around the valid key than to random or nonlocal deviations.
The above attacks do not exhaust all possible adaptive black box optimization strategies. Rather, the near-key settings provide a sensitivity analysis after hypothetical partial recovery of the valid white pixel pattern. This analysis is relevant to optimization-based reverse engineering because any such attack must eventually produce candidate positions close enough to the valid ones. The 14/16 Correct and 15/16 Correct settings show that coarse partial recovery is generally insufficient for stable forged verification, while the Adjacent Error setting shows that local deviations around the valid pattern remain more sensitive. Therefore, fully query-adaptive optimization over white pixel positions is treated as a stronger setting and a boundary of the current evaluation.
5.4. Comparison with Anti-Ambiguity Watermarking Methods
The previous subsection evaluates the proposed method under random key and near-key ambiguity attacks. To further examine how the proposed method differs from existing anti-ambiguity watermarking mechanisms, we compare it with two representative trigger-based methods: the hash chain protocol of Zhu et al. [9] and the unambiguous backdoor watermarking method of Hua et al. [10]. These two methods are selected because they are the closest to our threat model: both consider black box ownership verification and both aim to prevent forged or ambiguous trigger evidence. In contrast, the Scratch, FTAL, and FTLL baselines in Section 5.2 compare watermark-embedding strategies rather than anti-ambiguity mechanisms.
For a common evaluation budget, all compared methods use 300 trigger samples for verification. For the proposed method, this corresponds to 50 trigger groups, each containing six trigger samples. For Zhu et al. [9], we implement the one-way hash chain construction with a chain length of 300. Because the original Zhu et al. protocol does not natively define trigger groups, Trigger Acc. is its primary native metric. To make the group-level results comparable with ours, we additionally report an auxiliary GVR by partitioning the hash chain into 50 non-overlapping consecutive blocks, each containing six trigger samples; a block is counted as verified only when all six samples in that block are classified into their designated labels. For Hua et al. [10], we follow its trigger matrix construction and set
and
, resulting in 300 trigger samples arranged as 50 natural trigger groups, each containing six correlated samples. Therefore, Hua et al. can be evaluated with the same Trigger Acc. and GVR definitions without artificial regrouping. The comparison is conducted on CIFAR-10/ResNet-18 and Caltech-101/AlexNet, representing a standard benchmark setting and the dataset–model pair that shows the strongest local near-key sensitivity in
Table 4.
Table 5 reports valid verification performance and random forgery results. In the random forgery attack, the attacker knows the public construction rule of each method but does not possess the legitimate secret information. For Zhu et al. and Hua et al., the attacker randomly generates candidate trigger evidence according to the corresponding public construction rule. For the proposed method, the attacker performs the random key attack described in Section 5.3. We report the maximum trigger-level accuracy and maximum GVR observed over repeated attack trials.
Table 5 shows that all three anti-ambiguity methods achieve high valid verification performance, but they differ under forged evidence. On CIFAR-10/ResNet-18, Zhu et al. and Hua et al. reach valid GVRs of 98.00% and 100.00%, respectively, while the proposed method reaches 100.00%. On Caltech-101/AlexNet, the corresponding valid GVRs are 86.00%, 94.00%, and 100.00%. Under random forgery attacks, Zhu et al. show relatively high trigger-level matching, with maximum Trigger Acc. values of 51.33% on CIFAR-10 and 35.00% on Caltech-101, although their block-wise Max GVR remains much lower at 6.00% and 2.00%. Hua et al. reduce the forged trigger-level matching to 19.33% and 7.67%, with Max GVR values of 2.00% and 0.00%. The proposed method obtains Max Trigger Acc. values of 9.67% and 8.33% on the two datasets, and its Max GVR remains 0.00% in both cases. These results indicate that under the same verification budget, the proposed method provides competitive or stronger resistance to random forged evidence compared to representative trigger-based anti-ambiguity baselines.
We further evaluate evidence exposure after one verification session. In trigger-based ownership verification, a practical concern is that the trigger evidence submitted during verification may be disclosed. If a later verifier accepts the same disclosed trigger set or trigger chain, an adversary may replay the leaked evidence to support a counterfeit claim.
Table 6 summarizes this scenario. For Zhu et al. and Hua et al., the disclosed trigger chain or correlated trigger set remains the concrete evidence used for ownership verification, so replaying the leaked evidence leads to the same verification outcome as the original valid verification. For the proposed method, however, the verifier can require trigger groups generated from fresh benign carrier pairs. The attacker is then given the previously disclosed trigger groups and attempts to transfer the observed white pixel positions and labels to fresh carriers.
Table 6 further shows the difference between fixed verification evidence and the proposed regenerable evidence under evidence exposure. For Zhu et al. and Hua et al., replaying the leaked trigger chain or leaked trigger set reproduces the original verification outcome, leading to post-leak forged GVRs of 98.00% and 100.00% on CIFAR-10/ResNet-18, and 86.00% and 94.00% on Caltech-101/AlexNet. These results do not imply that the baselines fail under their original assumptions; rather, they show that their disclosed trigger evidence remains directly reusable if a later verifier accepts the same evidence. In contrast, under the refreshed verification setting of the proposed method, the verifier can request trigger groups generated from fresh benign carriers. The legitimate key still achieves a fresh GVR of 100.00% on both datasets, whereas transferring the leaked white pixel positions and labels to fresh carriers yields a post-leak forged GVR of 0.00%. This result supports the main design motivation of the proposed method: the trigger samples disclosed in one verification session are only carrier-specific realizations of the key-defined mapping, and they do not provide a reliable way to construct valid trigger groups for new carriers. The same interpretation applies when an attacker optimizes or recovers the white pixel positions for a particular verification batch: such recovery exposes concrete trigger instances, but it does not recover the valid key or the carrier-to-pattern mapping required for refreshed verification.
5.5. Key Sensitivity Analysis
We further examine how the verification response changes as the candidate key deviates from the valid key. Starting from the valid key, we progressively replace an increasing number of the 16 white pixel positions with incorrect ones and measure the resulting trigger-level accuracy. The deviation is quantified by the number of mismatched white pixel positions between the candidate key and the valid key.
Figure 5 also shows the random guess baseline, which is determined by the number of classes, as a reference level for chance matching to the designated watermark labels.
Figure 5 shows a sharp transition from the valid key to incorrect keys. When no position is mismatched, the trigger-level accuracy matches the legitimate key verification performance reported in Table 2, reaching 100% on four datasets and remaining close to 100% on Caltech-256. Once one white pixel position becomes incorrect, the trigger-level accuracy drops rapidly and remains close to the corresponding random guess baseline as the number of mismatched positions increases. This behavior indicates that perturbing the valid key quickly destroys the intended trigger response, so that the outputs with respect to the designated watermark labels become nearly indistinguishable from chance.
Under the group-wise verification criterion, the same trend becomes stricter. In our experiments, once the candidate key contains any mismatched position, the corresponding trigger groups no longer satisfy the ownership verification requirement, and the GVR drops to 0. This result is consistent with the rapid decline in trigger-level accuracy and shows that successful ownership verification requires an exact key match rather than an approximate reconstruction of the trigger pattern.
Overall, Figure 5 shows that the proposed watermark depends strongly on the precise spatial configuration of the white pixel positions. Recovering an approximate key is generally insufficient to reproduce the verification behavior of the valid key.
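The key perturbation procedure used in this analysis can be sketched as follows. The sketch models a key as a list of 16 distinct (row, column) positions on a hypothetical 32 × 32 grid and assumes a chance level of one over the number of classes; the key representation, grid size, class count, and function names are illustrative assumptions, not the paper's implementation.

```python
import random

def perturb_key(valid_key, num_mismatched, grid_size, rng):
    """Return a candidate key in which exactly `num_mismatched` positions
    are replaced by positions that do not occur in the valid key."""
    valid = list(valid_key)
    valid_set = set(valid)
    unused = [(r, c) for r in range(grid_size) for c in range(grid_size)
              if (r, c) not in valid_set]
    replace_idx = rng.sample(range(len(valid)), num_mismatched)
    new_positions = rng.sample(unused, num_mismatched)
    candidate = valid[:]
    for idx, pos in zip(replace_idx, new_positions):
        candidate[idx] = pos
    return candidate

def mismatch_count(candidate_key, valid_key):
    """Number of valid white pixel positions missing from the candidate key."""
    return len(set(valid_key) - set(candidate_key))

rng = random.Random(0)
grid_size = 32      # assumed image side length
num_classes = 10    # assumed class count, used only for the chance level
all_positions = [(r, c) for r in range(grid_size) for c in range(grid_size)]
valid_key = rng.sample(all_positions, 16)

chance_level = 1.0 / num_classes  # assumed random guess baseline
for k in (0, 1, 4, 16):
    candidate = perturb_key(valid_key, k, grid_size, rng)
    print(k, mismatch_count(candidate, valid_key))
```

In an actual sweep, each candidate key would be used to regenerate trigger groups, and the trigger-level accuracy and GVR would be measured for every mismatch count.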
5.6. Hyperparameter and Parameter Budget Sensitivity
We further examine several important parameter choices of the proposed method on CIFAR-10/ResNet-18. Because the main experiments already evaluate five dataset–model pairs, this subsection focuses on a compact sensitivity analysis to reveal the effect of representative parameters rather than repeating all settings on all datasets.
Table 7 reports the sensitivity of the trigger construction and wrong key suppression. The number of inserted white pixels controls the sparsity and strength of the trigger pattern. The weighting coefficient in Equation (10) controls the balance between valid key fitting and wrong key suppression. In each block of the table, only the specified parameter is changed, while all other settings follow the default configuration in Section 5.1.
Table 7 shows two trends. First, when the number of white pixels is too small, the trigger signal is insufficient for stable verification. The valid GVR increases from 46.00% to 96.00% as white pixels are added, and 100.00% valid GVR is first reached at the setting that is adopted as the default. Further increasing the number of white pixels does not improve valid verification, but it makes the trigger pattern denser and increases the Adjacent Error Max GVR. This setting is therefore selected as the default because it provides stable verification without using an unnecessarily dense trigger pattern. Second, the weighting coefficient controls the trade-off between valid key fitting and wrong key suppression. When the coefficient is too small, valid verification remains high, but the wrong key GVR reaches 58.00%. Increasing the coefficient suppresses wrong key responses, and the default value achieves 100.00% valid GVR while reducing both wrong key GVR and random forgery Max GVR to 0.00%. Larger values further emphasize suppression but begin to degrade valid verification. This supports the default value as a balanced choice under the one-to-one construction of valid key and wrong key samples.
Table 7 presents the white-pixel number sensitivity. The swept quantity is the number of inserted white pixels in each trigger sample. “Random-forgery Max GVR” is computed under the random key attack, and “Adjacent-Error Max GVR” is computed under the local near-key setting where only one white pixel position is shifted to a neighboring grid.
Table 8 evaluates the LoRA parameter budget. Here, the budget cap is treated as a lightweightness constraint rather than a performance-tuned threshold. We therefore sweep explicit trainable parameter budget caps and report both the selected LoRA size and the resulting verification performance.
Table 8 shows that the LoRA component has an under-capacity region and a saturation region. Very compact budgets, such as 0.25% and 0.50%, do not provide enough adaptation capacity and lead to reduced GVR. As the budget increases, the verification performance becomes stable. The default budget is not selected as a boundary point where performance barely becomes stable; rather, it is a conservative lightweight setting within the saturation region. Increasing the budget beyond the default setting does not provide additional verification gain. These results also indicate that further compression may be possible in some settings but overly aggressive compression damages watermark verification.
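To make the notion of a trainable parameter budget concrete, the following sketch counts the parameters that a LoRA component adds to a set of linear layers and checks them against a cap expressed as a fraction of the backbone size. It relies on the standard LoRA count for a linear layer, where rank r on a weight of shape (d_out, d_in) adds r · (d_in + d_out) parameters; the layer shapes, rank vector, and function names are hypothetical.

```python
def lora_param_count(layer_shapes, ranks):
    """Trainable parameters added by LoRA over linear layers.

    A rank-r update on a (d_out, d_in) weight uses two factors of shapes
    (d_out, r) and (r, d_in), i.e. r * (d_in + d_out) parameters.
    Rank 0 means the layer is skipped and adds nothing.
    """
    return sum(r * (d_in + d_out)
               for (d_out, d_in), r in zip(layer_shapes, ranks))

def within_budget(layer_shapes, ranks, backbone_params, budget_fraction):
    """Check whether the LoRA size stays under the budget cap."""
    return lora_param_count(layer_shapes, ranks) <= budget_fraction * backbone_params

# Hypothetical three-layer backbone (weights only, biases ignored).
layer_shapes = [(512, 512), (512, 512), (10, 512)]
backbone_params = sum(d_out * d_in for d_out, d_in in layer_shapes)

ranks = [2, 0, 1]  # per-layer ranks; the second layer is skipped
lora_params = lora_param_count(layer_shapes, ranks)
print(lora_params, lora_params / backbone_params)
print(within_budget(layer_shapes, ranks, backbone_params, 0.0050))
print(within_budget(layer_shapes, ranks, backbone_params, 0.0025))
```

In this toy setting the same rank vector fits a 0.50% cap but violates a 0.25% cap, which shows how a tighter cap forces lower ranks or more skipped layers.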
The 1:1 carrier mixing ratio, the number of trigger groups, and the candidate rank range (0 to 8) are not treated as additional training-sensitive hyperparameters in this subsection. The 1:1 carrier mixing ratio is used as a deterministic and symmetric carrier construction rule rather than as a tuned data augmentation strength; it makes the two source images contribute equally while channel permutations produce carriers with different channel-wise statistics. The number of trigger groups is a verification budget parameter rather than an embedding hyperparameter, and its effect has already been examined in Table 3. Finally, the candidate rank range defines a bounded search space for the genetic algorithm: rank 0 allows a layer to be skipped, while the upper bound 8 gives important layers sufficient adaptation capacity without unnecessarily expanding the search space. The actual LoRA capacity is therefore analyzed through the parameter budget sensitivity in Table 8.
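The symmetric carrier construction described above can be sketched in a few lines. The sketch blends two images with equal weights and then emits one carrier per RGB channel permutation; using all six permutations, the image size, and the function name are assumptions made for illustration.

```python
import itertools
import numpy as np

def make_carriers(img_a, img_b):
    """Blend two H x W x 3 images 1:1, then return one carrier per
    permutation of the three color channels (six carriers in total)."""
    mixed = 0.5 * img_a.astype(np.float32) + 0.5 * img_b.astype(np.float32)
    return [mixed[..., list(perm)] for perm in itertools.permutations(range(3))]

rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)  # hypothetical benign image
img_b = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)  # hypothetical benign image

carriers = make_carriers(img_a, img_b)
for carrier in carriers:
    # Same pixel content overall, but different per-channel means.
    print(carrier.mean(axis=(0, 1)).round(2))
```

Because the mixing weights are fixed at 0.5 each, the rule has no strength parameter to tune, which is why it is treated as a construction rule rather than a hyperparameter.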
5.7. Computational Overhead and Structural Optimization Effectiveness
We finally analyze the computational overhead of the proposed framework and the effectiveness of the genetic search. The proposed method introduces an offline intrinsic-rank search stage and a LoRA embedding stage. The search stage evaluates candidate rank vectors with a training-free proxy, and the selected LoRA component is then trained with the backbone frozen. Under the intended deployment protocol, routine service still uses the original backbone, so no additional routine-service inference overhead is introduced.
Table 9 shows that the proposed method introduces an additional offline GA search cost, but the LoRA embedding stage itself remains lightweight because only the external component is trained while the backbone is frozen. Compared with Scratch and FTAL, the proposed method avoids full-model retraining or full-parameter fine-tuning. During routine service, the inference overhead is 0.00% because the service predictor remains the original backbone. The additional inference cost is incurred only during ownership verification.
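The separation between routine service and ownership verification can be illustrated on a single linear layer. In this minimal sketch, the service path uses only the frozen backbone weight, while the verification path adds a low-rank LoRA update; the class name, shapes, and random values are hypothetical and only illustrate why the service path incurs no extra cost.

```python
import numpy as np

class SeparatedLinear:
    """Single linear layer with a frozen backbone weight and an external LoRA update."""

    def __init__(self, weight, lora_a, lora_b):
        self.weight = weight  # frozen backbone weight, shape (d_out, d_in)
        self.lora_a = lora_a  # LoRA down-projection, shape (r, d_in)
        self.lora_b = lora_b  # LoRA up-projection, shape (d_out, r)

    def service(self, x):
        """Routine service: original backbone only, no LoRA computation."""
        return x @ self.weight.T

    def verify(self, x):
        """Ownership verification: backbone output plus the low-rank update."""
        return x @ self.weight.T + (x @ self.lora_a.T) @ self.lora_b.T

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 8))
lora_a = rng.normal(size=(2, 8))
lora_b = rng.normal(size=(4, 2))
layer = SeparatedLinear(weight, lora_a, lora_b)

x = rng.normal(size=(3, 8))
service_out = layer.service(x)
verify_out = layer.verify(x)
print(np.abs(verify_out - service_out).max())
```

The service output is identical to that of the unmodified backbone, so any additional computation is confined to the verification path.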
We next examine whether the intrinsic-rank search in Section 4.4 yields more suitable LoRA configurations under the same parameter budget. For each benchmark setting, we compare the top-5 rank vectors selected by the genetic algorithm with five rank vectors randomly sampled from the same search space. The corresponding plug-and-play LoRA components are then instantiated under the same budget, and their watermark-embedding training losses are tracked during optimization, as shown in Figure 6.
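A budget-constrained genetic search over per-layer ranks can be sketched as follows. This is a toy illustration under stated assumptions: the proxy score is a hypothetical stand-in for the training-free proxy, and the layer shapes, importance weights, budget cap, and all function names are invented for the example rather than taken from Section 4.4.

```python
import random

RANK_MAX = 8  # upper bound of the candidate rank range; rank 0 skips a layer

def lora_cost(ranks, shapes):
    """LoRA parameter count for linear layers of shape (d_out, d_in)."""
    return sum(r * (d_out + d_in) for r, (d_out, d_in) in zip(ranks, shapes))

def repair(ranks, shapes, cap, rng):
    """Lower randomly chosen nonzero ranks until the cost fits under the cap."""
    ranks = list(ranks)
    while lora_cost(ranks, shapes) > cap:
        idx = rng.choice([i for i, r in enumerate(ranks) if r > 0])
        ranks[idx] -= 1
    return ranks

def proxy_score(ranks, importance):
    """Hypothetical training-free proxy: importance-weighted rank with
    diminishing returns. A real proxy would be model-dependent."""
    return sum(w * (r ** 0.5) for w, r in zip(importance, ranks))

def genetic_search(shapes, importance, cap, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [repair([rng.randint(0, RANK_MAX) for _ in shapes], shapes, cap, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: proxy_score(r, importance), reverse=True)
        parents = pop[: pop_size // 2]              # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = rng.randrange(len(child))
            child[j] = rng.randint(0, RANK_MAX)               # single-gene mutation
            children.append(repair(child, shapes, cap, rng))
        pop = parents + children
    return max(pop, key=lambda r: proxy_score(r, importance))

shapes = [(64, 64)] * 4 + [(10, 64)]      # hypothetical layers
importance = [0.1, 0.5, 1.0, 0.3, 0.8]    # hypothetical layer importance
cap = 1500                                # hypothetical parameter budget

best = genetic_search(shapes, importance, cap)
print(best, lora_cost(best, shapes), round(proxy_score(best, importance), 3))
```

The repair step keeps every candidate within the same budget, so GA-selected and randomly sampled rank vectors can be compared at equal parameter cost.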
Figure 6 compares the training loss trajectories of the top 5 GA-selected configurations and five randomly sampled configurations under the same parameter budget. The curves are relatively dense because LoRA watermark embedding converges quickly in all tested settings. Therefore, Table 9 further summarizes the final LogLoss values. The GA-selected configurations obtain lower final LogLoss than the random configurations in all five settings. The reduction is modest on CIFAR-10, CIFAR-100, and Caltech-101 but more visible on FOOD-101 and Caltech-256. This indicates that the genetic search does not necessarily create a large visual gap in the loss curves, but it consistently selects rank allocations that avoid poorer random configurations under the same parameter budget.
This result supports the role of structural optimization in the proposed method. Under the same parameter budget, different LoRA configurations do not provide the same embedding capability. In the tested settings, the configurations selected by the genetic algorithm show faster loss descent throughout training than randomly sampled configurations. This indicates that, even with the same number of trainable parameters, optimizing the rank allocation and insertion positions enables the LoRA component to use the available parameter budget more effectively during watermark embedding.
6. Discussion
This paper rethinks backdoor DNN watermarking for deployed ownership verification from two design perspectives: how watermark evidence is organized and how watermark functionality is deployed. On the verification side, the proposed framework shifts the evidence basis from a finite set of discrete trigger samples to a reproducible trigger family governed by a valid key. Under this formulation, ownership is no longer tied to preserving several fixed trigger samples as static secret evidence, but to verifying group-wise consistency over trigger groups regenerated from the legitimate construction rule. From this perspective, improved resistance to ambiguity attacks comes not from hiding a few isolated samples, but from increasing the structural dependency among trigger samples and making forged verification depend on reproducing the complete pattern-label correspondence within a trigger group.
On the deployment side, the proposed method shows that preserving routine service integrity does not require watermark fitting to be written into the service backbone itself. Instead, watermark embedding can be carried by external parameterized components, while the service predictor and the verification predictor remain separated under the intended deployment protocol. In this paper, that role is instantiated by a plug-and-play LoRA component, together with intrinsic-rank optimization by genetic search. This choice should be understood as one concrete realization of the external-parameter design principle, rather than the only possible implementation.
The experimental results support this overall picture. They show that the proposed framework maintains the original service predictor under the intended deployment protocol, enables effective ownership verification across five benchmark datasets, and prevents attackers without the valid key from reproducing the verification behavior of the legitimate watermark under the ambiguity attacks considered in this paper. At the same time, the results also clarify two boundaries of the current method. First, robustness against stronger local near-key perturbations and fully adaptive black-box optimization attacks still needs further evaluation. The current near-key tests analyze the consequences of hypothetical partial recovery of the valid white-pixel pattern, but they do not exhaust all possible query-adaptive search strategies. Second, because routine service is performed by the service predictor (the original backbone) and ownership verification is performed by the verification predictor (the backbone with the LoRA component attached), the proposed protocol is not designed to make the watermark transfer to a surrogate model distilled only from routine service responses. This limitation is a direct consequence of separating routine service behavior from ownership verification behavior. Future work may therefore proceed in three directions: exploring external parameterizations beyond LoRA, protecting the watermark component itself against direct reuse or disclosure, and combining the proposed key-driven trigger-family verification with extraction-resilient watermarking objectives so that watermark transfer and ownership unambiguity can be addressed jointly.