Article

Sensitive Object Trigger-Based Fragile Watermarking for Integrity Verification of Remote Sensing Object Detection Models

1 School of Information Engineering, Yangzhou University, Yangzhou 225127, China
2 Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou 225127, China
3 Hunan Engineering Research Center of Geographic Information Security and Application, Changsha 410007, China
4 Zhejiang Academy of Surveying and Mapping, Hangzhou 311100, China
5 Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing 210023, China
6 State Key Laboratory Cultivation Base of Geographical Environment Evolution of Jiangsu Province, Nanjing 210023, China
7 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(14), 2379; https://doi.org/10.3390/rs17142379
Submission received: 12 June 2025 / Revised: 8 July 2025 / Accepted: 9 July 2025 / Published: 10 July 2025

Abstract

Remote sensing object detection (RSOD) models are widely deployed on edge devices for critical applications. Their security and integrity have become urgent concerns. This work proposes a fragile model watermarking method that enables black-box integrity verification for RSOD models. Specifically, for a given RSOD model, we construct class-specific sensitive object triggers and corresponding fragile watermark samples for each target category. During the trigger generation process, a trained surrogate model is first employed to construct the initial sensitive object trigger, where real objects are utilized to guide the trigger to acquire weak semantic features of the target class. This trigger is then jointly optimized using both the original model and a tampered version. The original model ensures that the trigger remains recognizable, while the tampered model encourages sensitivity to parameter changes. During integrity verification, the model is queried with all the fragile watermark samples. The model is considered intact only if all predictions match the expected results. Extensive experiments demonstrate that the proposed method is effective across multiple RSOD models. It exhibits high sensitivity to various model modifications, including backdoor injection, fine-tuning, pruning, random parameter perturbation, and model compression.

1. Introduction

With the rapid advancement of deep learning technologies in the field of remote sensing [1,2], remote sensing object detection (RSOD) models have been widely applied in various critical domains [3], including disaster monitoring [4], traffic supervision [5,6], urban public safety inspection [7,8], and situational awareness [9]. However, adversarial attacks targeting RSOD models remain a significant security concern [10,11]. Attackers may manipulate model parameters, inject backdoors, or subtly modify model structures, causing the model to behave unpredictably and potentially leading to severe consequences [10]. In critical scenarios where RSOD models are deployed, model security is not only essential to ensure reliability but also crucial for preventing potential risks in social, economic, and public safety contexts. Therefore, conducting regular integrity verification of deployed RSOD models to prevent malicious tampering is of great practical significance.
In remote sensing applications, RSOD models are deployed on edge devices equipped with autonomous sensing and computing capabilities [12], such as fixed ground cameras [13], unmanned aerial vehicles [14], satellite platforms [15], and in-vehicle camera systems [16]. These edge deployment environments are often communication-limited and have weaker security protections [17], making the deployed models vulnerable to malicious tampering during operation [10]. Specifically, models may be attacked at various stages, as illustrated in Figure 1. These include the risk of malicious replacement during model transmission, deployment, and remote updates, as well as tampering or backdoor injection after the device is accessed by unauthorized parties. Moreover, model deployment engineers, maintenance engineers, and edge device administrators may have varying degrees of access to model parameters and execution permissions, further increasing the difficulty of ensuring model integrity. Therefore, to ensure the security of deployed RSOD models, there is an urgent need for an efficient and lightweight integrity verification mechanism.
As a classical technique for data integrity verification [18], fragile watermarking has achieved remarkable success in various multimedia domains, including remote sensing images [19], audio [20], and video [21]. This technique typically embeds highly sensitive watermark information into the host data such that any unauthorized modification leads to irreversible changes in the watermark, enabling the detection of data tampering. However, RSOD models are organized using parameterized neural network structures, which are characterized by high nonlinearity and high-dimensional representations. Their training logic and inference mechanisms differ significantly from those of traditional multimedia data. As a result, traditional fragile watermarking methods cannot be directly applied to the integrity verification of RSOD models.
To ensure the integrity of deep learning models, researchers have proposed fragile model watermarking techniques [22], which can be broadly classified based on the verification mechanism into two categories: white-box verification [23,24,25] and black-box verification [26,27,28,29,30,31,32]. In white-box schemes, the verification process requires access to the model’s internal parameters, typically treating model weights as numerical matrices. However, such reliance on internal access not only raises the risk of model exposure but also makes these methods impractical for deployment on resource-constrained edge devices.
In fragile watermarking with black-box verification, model integrity is verified by querying the model’s inference interface with a set of pre-generated sensitive samples [26]. These samples, inspired by adversarial examples, are designed to lie near the model’s decision boundary and produce predefined outputs when the model is intact. Based on whether internal model information is required during sample generation, existing methods can be grouped into two categories: those that operate in a fully black-box setting using generative models [27,28,29,30] and those that leverage internal model information for sample optimization [31,32]. Black-box verification methods enable convenient, efficient, and low-overhead integrity checking, making them suitable for deployed RSOD models.
However, the abovementioned fragile model watermarking methods are all designed for classification models. Unlike classification models that produce a single-label prediction, RSOD models detect objects of multiple categories and their corresponding bounding boxes from a single input image, resulting in more complex model structures and more diverse parameter distributions. Moreover, in classification tasks, each sample corresponds to a single label, allowing the input to be globally optimized based on the loss. In contrast, RSOD tasks typically involve multiple targets of different categories and locations within a single sample, making global input optimization ineffective. Instead, optimization needs to be performed on specific regions associated with the sensitive triggers. Due to these factors, existing methods cannot be directly applied to RSOD models, highlighting the need for specific generation and optimization strategies for sensitive object triggers.
To address this issue, we propose a sensitive object trigger-based method for black-box integrity verification of RSOD models. To the best of our knowledge, this is the first fragile watermarking scheme specifically designed for RSOD models. Given a specific RSOD model, our method constructs a corresponding fragile watermark verification dataset. Specifically, for the object recognition task performed by the model, the algorithm generates a sensitive object trigger and the corresponding fragile watermark samples for each detectable object class. The complete set of watermark samples for all classes is preserved as the fragile watermark verification dataset, which is used for subsequent integrity verification. The proposed method makes no modifications to the model and has no impact on its performance. The main contributions are summarized as follows:
  • We propose the first fragile watermarking method for RSOD models based on sensitive object triggers, enabling convenient and efficient black-box verification of model integrity.
  • We design a target class feature-driven trigger initialization strategy, where a trained surrogate model guides the generation of the trigger using features of target class objects. This enables the initialized sensitive object trigger to possess weak semantic features of the target class.
  • We introduce a joint optimization method based on the original model and a tampered model. The original model guides the trigger to maintain recognizability, while the tampered model encourages the trigger to remain sensitive to parameter changes. This dual supervision drives the trigger to gradually approach the model’s decision boundary.
  • Extensive experiments demonstrate that the proposed method enables convenient and reliable integrity verification across multiple representative RSOD models, exhibiting strong generalizability and practical applicability.
The rest of this article is organized as follows. Section 2 provides a brief review of related work in fragile model watermarking. Section 3 introduces the problem statement and the threat model. The proposed fragile watermarking algorithm is detailed in Section 4. The experimental results and analysis are presented in Section 5. Finally, Section 6 concludes this article.

2. Related Works

Fragile watermarking, as a classical technique for data integrity verification, has been extensively studied for traditional multimedia. In fragile watermarking algorithms for raster data such as remote sensing and traditional images, representative methods based on different watermark modulation mechanisms primarily include LSB embedding and statistical feature-based modulation [18]. The former typically utilizes the LSBs of spatial-domain pixel values or transform-domain coefficients as watermark carriers. The latter embeds watermarks by modulating the statistical properties of the data, with typical approaches including histogram shifting [33] and patchwork [34]. These studies have provided effective technical support for the integrity protection of traditional multimedia.
However, with the widespread adoption of deep learning models in recent years, new challenges have emerged in integrity verification. Traditional fragile watermarking methods designed for multimedia data are insufficient to meet the integrity protection requirements of deep learning models. Therefore, researchers have begun to investigate fragile watermarking techniques specifically designed for deep learning models [22]. Existing fragile model watermarking methods can be broadly classified into two categories: fragile watermarking with white-box verification and with black-box verification.

2.1. Fragile Model Watermarking with White-Box Verification

Fragile model watermarking with white-box verification treats model weights as matrix-like data structures, exhibiting certain similarities to the fragile watermarking methods used in traditional multimedia data. Guan et al. [23] were the first to propose a fragile watermarking algorithm for integrity verification of convolutional neural networks (CNNs). In their method, redundant weight parameters are organized into sequences to serve as watermark carriers, and watermark information is embedded using the histogram shifting algorithm. Botta et al. [35] grouped model parameters into fixed-size blocks and applied the Karhunen–Loève Transform (KLT), embedding watermark information into the LSBs of the frequency-domain coefficients. Abuadbba et al. [25] reconstructed model parameters into two-dimensional matrices and applied the Discrete Wavelet Transform (DWT) to convert the weight matrices into the frequency domain. The hash value was used as the watermark and embedded into the LSBs of frequency coefficients in a randomized order. Li et al. [36] reordered model weights at fixed intervals and computed a masked weighted sum for each group, storing the result as a reference signature. During inference, the signature is recomputed and compared with the stored one to detect potential tampering. Zhao et al. [24] also adopted an LSB-based fragile watermarking scheme in the spatial domain, with a focus on tampering recovery. Their method embedded authentication and recovery information that was generated from the most significant bits (MSBs) into the LSBs. Huang et al. [37] selected less important convolutional kernels and organized them into sequences to serve as watermark carriers. The watermark was embedded based on histogram shifting. Gao et al. [38] rearranged the neural network weight matrices and employed adaptive bit training to embed the critical information of each parameter into the LSBs of adjacent parameters, enabling both tamper detection and model recovery.
The abovementioned methods enable integrity verification for deep learning models. However, they require access to model parameters during the verification process, which increases the risk of model exposure. In addition, the watermark synchronization and extraction processes require a certain amount of computational resources.

2.2. Fragile Model Watermarking with Black-Box Verification

Fragile model watermarking with black-box verification relies on model inference for integrity checking. These methods utilize a preconstructed set of sensitive samples located near the model’s decision boundary and verify model integrity by observing the model’s predictions on these samples. Such algorithms can be further categorized into two types based on the sensitive sample generation stage: those that do not require access to model weights and those that do.
Methods that do not require access to model parameters during the generation of sensitive samples are designed for fully black-box scenarios. These approaches aim to directly generate samples located near the model’s decision boundary. Wang et al. [29] employed a Variational Autoencoder (VAE)-based generative model to perform controllable perturbations in the latent space, generating sensitive samples close to the decision boundary. Yin et al. [28] utilized a Generative Adversarial Network (GAN) combined with a task-specific loss function to optimize the input noise in latent space such that the generated trigger samples are highly sensitive to small modifications of the model. Zhao et al. [30] adopted a VAE-GAN framework to map original samples into the latent space, apply semantic perturbations, and decode them into sensitive samples near the model’s decision boundary.
Methods that require access to model parameters during the sensitive sample generation phase leverage the model to assist in optimizing the samples. Most of these studies preserve the original model without modification. He et al. [26] optimized a set of sensitive samples by maximizing the gradient sensitivity of model outputs with respect to weight perturbations, aiming to generate inputs that are highly responsive to small changes in model parameters. Xu et al. [39] transformed the complex nonlinear activation functions in deep neural networks into low-order polynomials, and they used the gradient norm of the model output with respect to its parameters as the objective function to guide the generation of sensitive samples. Aramoon et al. [31] designed an objective function to generate samples that lie close to the model’s classification boundary while simultaneously activating as many neurons as possible, thereby enhancing both the sensitivity and coverage of the watermark. Kuttichira et al. [40] used a VAE to map inputs into a low-dimensional latent space, and they applied Bayesian Optimization (BO) within this space to identify inputs most sensitive to weight perturbations in order to maximize the prediction difference between the original model and a potentially tampered one. Lao et al. [32] selected highly uncertain adversarial samples that lie far from the natural data distribution but close to the decision boundary. A lightweight fine-tuning process was then applied to ensure that the model’s predictions changed only for these selected key samples. Gao et al. [27] added an auxiliary binary classification layer on top of the original multiclass model, transforming the task into a boundary-structured binary classification problem. They then generated pairs of sensitive samples lying on opposite sides of the new decision boundary by combining activation maximization with adversarial perturbation optimization. Yin et al. [41] trained a generative model to produce “fragile samples” as watermark triggers, which are highly sensitive to slight model fine-tuning. This was achieved using the target model’s output probability distribution and prediction mask, without requiring access to or modification of model parameters. Yuan et al. [42,43] proposed semi-fragile neural network watermarking methods based on adversarial examples. These approaches introduce imperceptible perturbations to input samples and optimize an objective function to produce model fingerprints that are visually similar to the original inputs while imitating the output of target samples. The resulting watermarks are robust to pruning, quantization, and fine-tuning, but they are fragile to backdoor injection and parameter tampering.
The above fragile model watermarking methods with black-box verification can validate model integrity by issuing a small number of API queries and comparing the model’s output on the sensitive samples with the expected labels. This type of algorithm enables more lightweight and convenient watermark verification. However, the existing methods are all designed for classification models and cannot be directly applied to RSOD models. More specifically, these methods typically generate sensitive samples by optimizing the entire input. In object detection scenarios, however, each input contains multiple objects and background regions, making global input optimization meaningless.

2.3. Remote Sensing Object Detection

RSOD has become a fundamental task in the analysis of aerial and satellite imagery. Over the past decade, numerous deep learning-based models have been developed for RSOD, which can be broadly categorized into two-stage and one-stage detectors. Two-stage detectors, such as Faster-RCNN [44], first generate region proposals and then perform classification and regression, offering high accuracy but at the cost of slower inference speed. In contrast, one-stage detectors, including the YOLO series [45,46] and SSD [47], directly predict bounding boxes and class scores in a single pass, making them more efficient and suitable for real-time applications.
In this study, we adopted both paradigms to evaluate our proposed method. Specifically, we employed Faster-RCNN as a representative two-stage detector, and YOLOv5, YOLOv8, and SSD as representative one-stage detectors to demonstrate the effectiveness of our method across different RSOD detectors.

3. Problem Statement and Threat Model

3.1. Problem Statement

RSOD models typically undergo multiple stages during real-world application [10,41], including development, transmission, deployment, and remote updates, with involvement from various stakeholders such as model developers, deployment and maintenance staff, and edge device administrators. At each of these stages, the model is exposed to potential risks, making its integrity highly vulnerable. Once the integrity of an RSOD model is compromised, it may lead to distorted recognition results, thereby affecting intelligent decision making in critical scenarios. Therefore, there is an urgent need for a mechanism that can periodically inspect the model or quickly and efficiently verify its integrity prior to use.
The verification problem of RSOD models can be defined as follows: A user or administrator obtains a fragile watermark verification dataset $D_v$ corresponding to the specific model version $F_w$ from the model developer and uses $D_v$ to query the API of the deployed model $F_w$. If there exists any $\tilde{x}_i \in D_v$ for which $F_w(\tilde{x}_i) \neq \tilde{y}_i$, where $\tilde{y}_i$ denotes the expected output, it suggests that the model’s integrity has been compromised.

3.2. Threat Model

The objective is to construct fragile watermark verification datasets for integrity verification of RSOD models. The dataset construction process is carried out by the model developer, where the model architecture and parameters are fully known during this phase. The fragile watermarking must not degrade the model’s inference performance and should be uniquely bound to a specific version of the model.
Formally, let the RSOD model be denoted as $M_w$, where $w$ represents the model parameters. The watermark dataset is defined as $D_{\mathrm{wm}} = \{(x_i, y_i)\}_{i=1}^{N}$, where each input $x_i$ is associated with a predefined expected output $y_i$. The integrity condition requires that
$$M_w(x_i) = y_i, \quad \forall (x_i, y_i) \in D_{\mathrm{wm}}.$$
Any deviation from this behavior under a potentially modified model $M_{w'}$ indicates a violation, defined as
$$\exists (x_j, y_j) \in D_{\mathrm{wm}}, \ \mathrm{s.t.} \ M_{w'}(x_j) \neq y_j.$$
We assume that only the model developer and the end user who queries the model are trusted entities. All other parties involved in the model lifecycle are considered potentially untrustworthy. This includes entities that handle model transmission, deployment, remote updates, and on-device management. As attackers may include authorized personnel (e.g., system administrators or engineers), we assume that adversaries may have full access to the protected model, including its architecture and parameters, but they are unaware of the watermark signature or the specific watermark sample set.
The model faces multiple threats throughout its lifecycle. During transmission, it is vulnerable to interception or unauthorized access, allowing adversaries to intercept and replace the model with a malicious version. During deployment, the model may be swapped out for a compromised one or injected with backdoors. In the remote update phase, attackers can forge update packages to deliver tampered models. The common objective of these attacks is to manipulate the model’s decision-making behavior.
Additionally, we also consider that administrators may compress the model to save storage, unintentionally affecting its internal behavior. For example, quantization or pruning may modify the parameters from $w$ to $w'$, introducing slight changes $\theta$, which can be defined as
$$w' = w + \theta, \quad \text{where } \|\theta\| \ll \|w\|.$$
Therefore, the considered threat model includes various attack types such as backdoor attacks, fine-tuning attacks, pruning attacks, random parameter perturbation, and model compression. These attacks may cause the modified model $M_{w'}$ to deviate from its original behavior on the watermark set $D_{\mathrm{wm}}$, thereby enabling effective integrity verification.

4. Proposed Method

In this section, we present a method for generating sensitive object triggers and the corresponding fragile watermark dataset tailored to a given RSOD model, aiming to enable black-box verification of model integrity. For each object class that the RSOD model is capable of detecting, a corresponding sensitive object trigger and fragile watermark samples are created. The overview of the trigger generation process is illustrated in Figure 2. Initially, a surrogate model is used to generate the initial sensitive object triggers that contain weak semantic features of the target class. These triggers are then jointly optimized using both the original model and a slightly tampered version. The process yields the final sensitive object triggers and fragile watermark samples. During the model integrity verification phase, the deployed model’s API is queried with these watermark samples, and the predictions are analyzed to assess whether the model has been tampered with.

4.1. Fragile Watermark Verification Dataset Generation

For the given original model $F_o$ deployed for object detection, which is designed to recognize $n$ object classes $\{c_1, c_2, \ldots, c_n\}$, we constructed a sensitive object trigger and the corresponding fragile watermark samples for each class. The complete set of fragile watermark samples across all classes formed the final watermark verification dataset.

4.1.1. Generation of Initial Sensitive Object Trigger

Inspired by clean-label backdoor watermarking [48], real samples from the target class can be leveraged to guide the optimization of the trigger, enabling it to acquire the semantic features of the target class. In this stage, a surrogate model is used to optimize a randomly initialized perturbation, resulting in an initial trigger with weak semantic features of the target class. Weak semantic features refer to triggers that are not yet within the decision boundary but remain relatively close to it, facilitating subsequent optimization steps that move them closer to the boundary. The detailed procedure is as follows.
Surrogate model training: We first train an RSOD model with standard feature extraction capability on a public dataset $D_P$ and designate it as the surrogate model $F_s$. The specific architecture of $F_s$ is not critical, as long as it can provide basic recognition capability for the $n$ object classes involved in the task. After training, the model parameters are frozen for use in subsequent stages.
Target class samples selection: For the $k$-th object class $c_k$, we randomly select 50 clean samples from the public dataset $D_P$ that contain objects of class $c_k$. These selected samples are denoted as $D_{\mathrm{target}} = \{s_1, s_2, \ldots, s_{50}\}$. Each sample $s_i$ may contain one or more target objects of $c_k$. For each $s_i$, we record only the bounding boxes of objects belonging to class $c_k$ from its annotation file. These bounding boxes are denoted as $A_i = \{b_{i1}, b_{i2}, \ldots, b_{im}\}$, where $b_{ij}$ represents the bounding box of the $j$-th object in sample $s_i$.
Training input with perturbation: We initialize a random perturbation $\delta_1 \sim \mathrm{Uniform}(-1, 1)^{64 \times 64 \times 3}$ and embed it dynamically into $D_{\mathrm{target}}$ as the training input. The training process involves continuously optimizing and updating $\delta$, which is denoted as $\delta_l$ at iteration $l$. The perturbation is embedded into all bounding box regions of class $c_k$ objects within sample $s_i$ for training. The training input $s_i'$ in the $l$-th iteration is defined as
$$s_i' = s_i + \sum_{b_{ij} \in A_i} \mathrm{embed}_{b_{ij}}^{\mathrm{broadcasting}}(\delta_l)$$
where $\mathrm{embed}_{b_{ij}}^{\mathrm{broadcasting}}$ is defined as an operation that embeds the perturbation $\delta_l$ into the bounding box region $b_{ij}$ via broadcasting and tensor addition, in order to align the spatial dimensions of $\delta_l$ with $b_{ij}$. Figure 3 illustrates the process of embedding one $\delta_l$.
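To make the embedding step concrete, the following is a minimal PyTorch-style sketch of the broadcasting-based embedding operation; the tensor layout (C, H, W), the helper names, and the use of bilinear interpolation to match the trigger to the box size are illustrative assumptions rather than details specified in the text.

```python
import torch
import torch.nn.functional as F

def embed_broadcasting(sample: torch.Tensor, delta: torch.Tensor, box) -> torch.Tensor:
    """Add the trigger delta to one bounding-box region of `sample` (C x H x W).

    `box` is (x1, y1, x2, y2) in pixel coordinates. Resizing delta with bilinear
    interpolation to the box size is an assumption used here to align shapes.
    """
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    # Resize the 64x64x3 trigger to the spatial size of the target box.
    delta_resized = F.interpolate(delta.unsqueeze(0), size=(h, w),
                                  mode="bilinear", align_corners=False).squeeze(0)
    out = sample.clone()
    out[:, y1:y2, x1:x2] = out[:, y1:y2, x1:x2] + delta_resized  # tensor addition
    return out

def embed_all_boxes(sample, delta, boxes):
    # Embed the same trigger into every annotated box of the target class.
    for box in boxes:
        sample = embed_broadcasting(sample, delta, box)
    return sample
```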
Iterative optimization: The perturbed sample $s_i'$ is fed into $F_s$ to calculate the class prediction loss $L_{\mathrm{target}}$ with respect to the target class $c_k$. Specifically, at each iteration, the current perturbation $\delta_l$ is updated via backpropagation by calculating the gradient $\nabla_{\delta_l} L_{\mathrm{target}}^{l}$, which is followed by a single step of gradient descent to obtain $\delta_{l+1}$ for the next iteration. The optimization process is formulated as
$$\delta_{l+1} \leftarrow \delta_l - \eta_1 \cdot \nabla_{\delta_l} L_{\mathrm{target}}^{l}(F_s(s_i'), c_k)$$
where $\eta_1$ is the learning rate. The optimization stops when the model’s prediction confidence on class $c_k$ for the perturbed sample exceeds 50%.
Finally, we obtain an initial sensitive object trigger $\delta_k$ with weak semantic features corresponding to class $c_k$. The weak semantic features indicate that $\delta_k$ bears a certain similarity to $c_k$ and lies close to its decision boundary, yet it still remains outside of it. This procedure is formally summarized in Algorithm 1.
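A minimal sketch of this initial trigger generation loop is given below. The surrogate interface `surrogate(image, boxes, cls)`, assumed to return a classification loss and a confidence score for the target class, is a hypothetical wrapper around the detector's own loss and post-processing outputs.

```python
import torch

def generate_initial_trigger(surrogate, samples, boxes_per_sample, target_class,
                             lr=0.01, conf_stop=0.5, max_iters=1000):
    """Optimize a random 64x64x3 perturbation until the surrogate model assigns
    more than conf_stop confidence to the target class on the perturbed regions."""
    delta = (2 * torch.rand(3, 64, 64) - 1).requires_grad_(True)  # Uniform(-1, 1)
    optimizer = torch.optim.SGD([delta], lr=lr)

    for _ in range(max_iters):
        for sample, boxes in zip(samples, boxes_per_sample):
            perturbed = embed_all_boxes(sample, delta, boxes)   # from the sketch above
            loss, conf = surrogate(perturbed, boxes, target_class)
            optimizer.zero_grad()
            loss.backward()        # gradient w.r.t. delta only; surrogate is frozen
            optimizer.step()
            if conf > conf_stop:   # weak semantic features acquired
                return delta.detach()
    return delta.detach()
```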

4.1.2. Trigger Optimization

To ensure that the sensitive object trigger lies near the model’s decision boundary, we introduced a joint optimization framework involving the original model $F_o$ and a tampered model $F_t$. During the optimization process, $F_o$ is responsible for maintaining the recognizability of the trigger, while $F_t$ is used to enhance its fragility. The detailed steps are as follows.
Dynamic training input: Two samples $s_1^k$ and $s_2^k$ are randomly selected from the model training dataset, and the dynamic trigger $\delta_k$ is embedded into a random background region of each sample. The embedded region is labeled as containing an object of class $c_k$ and assigned a bounding box $\hat{b}_t = (x_1, y_1, x_2, y_2)$, where $(x_1, y_1)$ and $(x_2, y_2)$ denote the top-left and bottom-right coordinates of the bounding box, respectively. The embedding process is defined as
$$\hat{s}_1^k = s_1^k + \mathrm{embed}_{\hat{b}_t}(\delta_k)$$
where $\mathrm{embed}_{\hat{b}_t}$ denotes the operation that inserts the trigger into region $\hat{b}_t$ using tensor addition.
Algorithm 1: Generation of initial sensitive object trigger
Tampered model construction: In RSOD models, the detection head is responsible for classification and bounding box regression and thus directly affects the model’s detection results. To simulate slight parameter tampering attacks, we randomly select 10% of the weights from the detection head of the original model $F_o$, which are denoted as a subset $\{w_1, w_2, \ldots, w_m\} \subset W_{\mathrm{head}}^{F_o}$. Small-scale random Gaussian noise is added to the selected weights to construct a tampered model $F_t$. This approach ensures that the overall performance of $F_t$ remains largely unaffected while effectively simulating subtle and irregular parameter perturbations. The perturbed weights are defined as
$$w_i' = w_i + \theta_i, \quad \text{where } \theta_i \sim \mathcal{N}(0, 0.0033^2)$$
where $\theta_i$ is the additive noise sampled from a normal distribution. During the backpropagation process, the parameters of both $F_o$ and $F_t$ are frozen to ensure that the original model remains unchanged throughout optimization.
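A minimal sketch of the tampered-model construction follows, assuming a PyTorch detector whose detection-head parameters can be identified by a name filter; the filter string and the per-entry masking of roughly 10% of the weights are illustrative assumptions.

```python
import copy
import torch

def build_tampered_model(original_model, head_keyword="head", ratio=0.1, sigma=0.0033):
    """Copy the model and add N(0, sigma^2) noise to a random ~10% of the
    detection-head weights. The `head_keyword` name filter is an assumption;
    actual layer naming depends on the detector implementation."""
    tampered = copy.deepcopy(original_model)
    with torch.no_grad():
        for name, param in tampered.named_parameters():
            if head_keyword not in name:
                continue
            # Randomly select ~10% of the entries of this head parameter tensor.
            mask = (torch.rand_like(param) < ratio).float()
            noise = torch.randn_like(param) * sigma
            param.add_(mask * noise)
    # Freeze the copy so only the trigger is updated during joint optimization.
    for p in tampered.parameters():
        p.requires_grad_(False)
    return tampered
```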
Joint optimization: The perturbed sample $\hat{s}_1^k$ is simultaneously fed into $F_o$ and $F_t$. The output of $F_o$ is used to calculate $L_{F_o}$, which ensures the recognizability of $\delta_k$. The output of $F_t$ is used to calculate $L_{F_t}$, which enhances the fragility of $\delta_k$.
Specifically, $\hat{s}_1^k$ is fed into $F_o$, and the classification loss $L_{\mathrm{cls}}$ and bounding box loss $L_{\mathrm{bbox}}$ are calculated with respect to the target class $c_k$ and bounding box $\hat{b}_t$, respectively. The prediction loss $L_{F_o}$ is defined as
$$L_{F_o} = \lambda_1 \cdot L_{\mathrm{cls}}(F_o(\hat{s}_1^k), c_k) + \lambda_2 \cdot L_{\mathrm{bbox}}(F_o(\hat{s}_1^k), \hat{b}_t)$$
where $\lambda_1$ and $\lambda_2$ are hyperparameters.
At the same time, $\hat{s}_1^k$ is fed into $F_t$, and a set of confidence scores for class $c_k$ is obtained as
$$\{\mathrm{score}_i^k\}_{i=1}^{v}$$
where $v$ denotes the number of candidate bounding boxes detected for class $c_k$. The maximum confidence score in this set is defined as
$$\mathrm{score}_{\max} = \max\{\mathrm{score}_i^k\}_{i=1}^{v}$$
The fragility loss $L_{F_t}$ is defined as
$$L_{F_t} = \max(0, \mathrm{score}_{\max} - \tau)$$
where $\tau$ is a threshold. If $\mathrm{score}_{\max} > \tau$, it indicates that $\delta_k$ is likely to be detected by $F_t$, and $L_{F_t}$ serves as a penalty term. Otherwise, if $\mathrm{score}_{\max} \le \tau$, it means that the tampered model fails to detect $\delta_k$, suggesting that the optimization direction is reasonable. In this case, the value of $L_{F_t}$ becomes 0.
Therefore, the joint optimization process for $\delta_k$ is defined as
$$\delta_{l+1}^k \leftarrow \delta_l^k - \eta_2 \nabla_{\delta_l^k}\left(L_{F_o} + \mu \cdot L_{F_t}\right)$$
where $\delta_l^k$ denotes the trigger in the $l$-th iteration, $\eta_2$ is the learning rate, and $\mu$ is a hyperparameter.
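A single joint-optimization step can be sketched as follows. The wrapper interfaces `F_o(image, box, cls)` (returning the classification and bounding box losses) and `F_t.scores(image, cls)` (returning per-candidate confidences) are hypothetical conveniences around the detectors' outputs, and `embed_broadcasting` refers to the earlier sketch.

```python
import torch

def joint_step(delta, sample, box, cls, F_o, F_t, optimizer,
               lam1=1.0, lam2=0.5, mu=0.5, tau=0.5):
    """One joint optimization step for the trigger delta."""
    perturbed = embed_broadcasting(sample, delta, box)

    cls_loss, bbox_loss = F_o(perturbed, box, cls)
    loss_Fo = lam1 * cls_loss + lam2 * bbox_loss          # keep trigger recognizable by F_o

    scores = F_t.scores(perturbed, cls)
    score_max = scores.max() if scores.numel() > 0 else torch.tensor(0.0)
    loss_Ft = torch.clamp(score_max - tau, min=0.0)       # penalize detection by F_t

    loss = loss_Fo + mu * loss_Ft
    optimizer.zero_grad()
    loss.backward()                                        # gradients flow to delta only
    optimizer.step()
    return loss.item()
```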
Iterative optimization: The updated trigger $\delta_{l+1}^k$ is re-embedded into $s_1^k$ for the next round of joint optimization. The iteration continues until the loss converges, $F_o$ successfully detects the trigger, and $F_t$ fails to detect it. Once these conditions are satisfied, the optimization process terminates. As a result, we obtain the final sensitive object trigger $\hat{\delta}_k$ and its corresponding fragile watermark sample $\hat{s}_1^k$. Examples of the generated $\hat{\delta}_k$ and $\hat{s}_1^k$ for airplane, ship, and baseball diamond classes are shown in Figure 4a,b.
The complete procedure is summarized in Algorithm 2. Similarly, $\hat{s}_2^k$ is obtained following the same method. By repeating the above procedure for all object classes $\{c_1, c_2, \ldots, c_n\}$, we finally construct the fragile watermark verification dataset $D_w = \{(\hat{s}_1^1, \hat{s}_2^1), (\hat{s}_1^2, \hat{s}_2^2), \ldots, (\hat{s}_1^n, \hat{s}_2^n)\}$.
Algorithm 2: Trigger optimization

4.2. Integrity Verification

In the integrity verification phase, we use the verification accuracy rate of fragile watermark samples, denoted as $Acc_{\mathrm{fragile}}$, as a quantitative metric to determine whether the deployed model has been tampered with. Specifically, we query the deployed model through its API with the fragile watermark dataset $D_w$ and observe its responses to all samples. Examples of verification responses are presented in Figure 4c. The metric $Acc_{\mathrm{fragile}}$ is defined as
$$Acc_{\mathrm{fragile}} = \frac{1}{2n} \sum_{k=1}^{n} \sum_{j=1}^{2} \mathbb{I}\left(F_d(\hat{s}_j^k) = c_k\right), \quad \hat{s}_j^k \in D_w$$
where $F_d$ denotes the deployed RSOD model under inspection, and $\mathbb{I}(\cdot)$ is the indicator function, which returns 1 if the condition is true and 0 otherwise. Only when $Acc_{\mathrm{fragile}} = 1$ do we consider the model intact; otherwise, it is regarded as tampered.
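The verification protocol itself reduces to a handful of API queries; a minimal sketch is given below, assuming a hypothetical `query_api(sample)` that returns the set of class labels detected in a sample.

```python
def verify_integrity(query_api, watermark_set):
    """Compute Acc_fragile by querying the deployed model with every fragile
    watermark sample. The model is declared intact only if every expected
    trigger class is detected, i.e. Acc_fragile == 1."""
    hits, total = 0, 0
    for sample, expected_class in watermark_set:   # two samples per class in D_w
        total += 1
        if expected_class in query_api(sample):
            hits += 1
    acc_fragile = hits / total
    return acc_fragile, acc_fragile == 1.0
```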

5. Experimental Evaluation

5.1. Experimental Setup

Model and dataset settings: In this study, different model architectures and datasets were selected for the surrogate model construction and the algorithm performance evaluation. The DIOR [49] remote sensing dataset was used as the public dataset D P to train the surrogate model F s , which adopted the classic two-stage object detection framework Faster-RCNN [44]. With regard to the algorithm performance evaluation, to verify the generality and effectiveness of the proposed method across different object detection frameworks, six representative models commonly used in RSOD tasks were selected: YOLOv5l [45], YOLOv5n [45], YOLOv5s [45], YOLOv8 [46], SSD [47], and Faster-RCNN. Among them, YOLOv5l, YOLOv5n, and YOLOv5s are the Large, Nano, and Small variants of the YOLOv5 series, respectively, offering different model capacities and inference speeds suitable for edge scenarios with limited computational resources. YOLOv8 is an improved version in the YOLO series and is widely deployed in practical applications. SSD (Single Shot MultiBox Detector) and Faster-RCNN are classic one-stage and two-stage object detection frameworks, respectively.
Regarding dataset selection, three commonly used RSOD datasets were employed in the experiments: NWPU VHR-10 [50], RSOD24 [51], and LEVIR [52]. Specifically, the NWPU VHR-10 dataset contains 10 categories of remote sensing objects, with a total of 800 images and 3775 object instances; the RSOD24 dataset includes 4 object categories, with 976 images and 6950 instances; and the LEVIR dataset contains 3 categories, with 21,952 images and 11,028 object instances in total. This means that for any model trained on NWPU VHR-10, RSOD24, and LEVIR, the corresponding D w contains 20, 8, and 6 fragile watermark samples, respectively.
Training configuration: Our experiments were conducted on a workstation running the Windows operating system equipped with an Intel i7-14700KF processor and an NVIDIA GeForce RTX 4090 GPU. The learning rates $\eta_1$ and $\eta_2$ used in the optimization process were both set to 0.01.
Parameter settings: The hyperparameters $\lambda_1$ and $\lambda_2$ were set to 1 and 0.5, respectively; $\mu$ was set to 0.5, and the threshold $\tau$ was set to 0.5.
Evaluation metrics: As the proposed method makes no modifications to the original model, there is no need to evaluate its influence on the model’s performance. We used the metric $Acc_{\mathrm{fragile}}$, defined in Equation (13), to evaluate the fragility of the proposed method. Only $Acc_{\mathrm{fragile}} = 1$ indicates that the model has not been tampered with. A lower $Acc_{\mathrm{fragile}}$ indicates a higher sensitivity of the proposed method to model tampering.

5.2. Uniqueness Analysis

The proposed algorithm generates a fragile watermark verification dataset $D_w$ for each specific RSOD model, which is uniquely bound to the corresponding RSOD model and cannot be falsely detected by other models performing the same task. In this section, the uniqueness of the proposed algorithm is evaluated. Six models were trained on three different datasets corresponding to different tasks, and a dedicated $D_w$ was constructed for each model. These $D_w$ sets were then cross-validated on different models trained for the same task, and the corresponding $Acc_{\mathrm{fragile}}$ results are shown in Figure 5. The horizontal axis represents the models trained on different datasets, while the vertical axis indicates the $D_w$ generated for each model. Figure 6 presents verification examples on both corresponding and non-corresponding models. The experimental results demonstrate that $Acc_{\mathrm{fragile}}$ reached 100% on the corresponding models, indicating that the proposed method can accurately verify the integrity of the model in its original state. In contrast, $Acc_{\mathrm{fragile}}$ remained at 0% on non-corresponding models, as fragile watermark samples that had not been optimized with the decision boundary guidance of a specific model could not be recognized. These results verify that the proposed algorithm possesses strong uniqueness.

5.3. Effectiveness Analysis

Considering that RSOD models may suffer from various attacks such as backdoor injection, pruning, fine-tuning, parameter perturbation, and quantization compression during deployment, this section evaluates the effectiveness of the proposed method under these scenarios.

5.3.1. Backdoor Injection

In the post-training backdoor injection scenario, the attacker injects backdoors into a pre-trained clean model by fine-tuning it on a small set of poisoned samples containing specific triggers. In this experiment, we adopted three representative backdoor attack methods for object detection, namely BadDet [53], BAWE [54], and PTAVR [55], to simulate model tampering via backdoor injection. For the BadDet method, a poisoned dataset was constructed by embedding a specific trigger pattern into 1% of the clean training samples, which was then used for fine-tuning to obtain the backdoor model. For the BAWE method, a wavelet-based transformation was applied to embed invisible textual triggers into the images, also with a poisoning rate of 1%. In contrast, the PTAVR method adopted a different poisoning strategy, where predefined trigger patterns were embedded into pure white background images to create independent poisoned samples, without modifying any existing training samples. In this setting, 1% of additional poisoned samples were mixed into the original dataset for fine-tuning. Across the experiments involving six models and three datasets, all models were fine-tuned for 10 epochs to achieve backdoor injection. The $Acc_{\mathrm{fragile}}$ values for different experimental cases under each backdoor attack are summarized in Table 1.
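As a rough illustration of how a poisoned fine-tuning set can be simulated, the sketch below builds a BadDet-style poisoned subset by pasting a fixed patch into 1% of the training images; the patch placement, data structures, and relabeling strategy are illustrative assumptions and not the exact settings of the cited attacks.

```python
import random
import torch

def poison_dataset(dataset, trigger_patch, rate=0.01, target_class=0):
    """Embed a fixed trigger patch into `rate` of the samples and add a poisoned
    annotation for the patched region (a simplified BadDet-style poisoning)."""
    poisoned = []
    n_poison = max(1, int(rate * len(dataset)))
    poison_ids = set(random.sample(range(len(dataset)), n_poison))
    for idx, (image, targets) in enumerate(dataset):
        if idx in poison_ids:
            image = image.clone()
            c, ph, pw = trigger_patch.shape
            image[:, :ph, :pw] = trigger_patch             # paste patch at top-left corner
            targets = targets + [{"bbox": (0, 0, pw, ph),  # annotate patched region
                                  "class": target_class}]
        poisoned.append((image, targets))
    return poisoned
```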
All three backdoor attacks had a significant impact on fragile watermark verification, with the $Acc_{\mathrm{fragile}}$ value rapidly dropping to low levels or even 0% in most experimental cases. Among them, BadDet and PTAVR exhibited the most severe effects, driving $Acc_{\mathrm{fragile}}$ to zero across multiple model and dataset types. In contrast, under BAWE attacks, some models retained a small verification success rate on certain datasets. However, the $Acc_{\mathrm{fragile}}$ values still showed a clear overall decreasing trend. The results also reveal differences among the models under the same attack. Specifically, Faster-RCNN tended to maintain slightly higher $Acc_{\mathrm{fragile}}$ values in some scenarios, while for lightweight models such as YOLOv5n and YOLOv5s, their $Acc_{\mathrm{fragile}}$ values dropped to zero across all datasets and attack types. These results show that the proposed fragile watermarking method is highly sensitive to model tampering introduced by different backdoor attacks.

5.3.2. Fine-Tuning

We conducted experiments on six different models trained on three datasets. For each model–dataset combination, we simulated fine-tuning by extracting 10% of benign samples from the original training dataset and using them to fine-tune the fully connected layers of the model. The fine-tuning was performed for different numbers of epochs to simulate varying levels of parameter perturbation. The $Acc_{\mathrm{fragile}}$ values under different fine-tuning epochs are shown in Figure 7. The experimental results show that even with a small number of fine-tuning epochs (e.g., epoch ≤ 3), the verification accuracy rate of fragile watermark samples $Acc_{\mathrm{fragile}}$ had already exhibited a noticeable decline, indicating that the watermark can effectively capture slight model modifications. As the number of fine-tuning epochs increased, the verification accuracy rate continued to drop rapidly across all models, demonstrating strong sensitivity to model tampering. When fine-tuning reached 60 epochs, almost all fragile watermark samples failed to be correctly verified. These results show that the proposed fragile watermarking method can sensitively reflect model tampering under fine-tuning attacks.

5.3.3. Pruning

Pruning, as a malicious tampering attack, reduces the model structure and parameters, which may lead to performance degradation or abnormal behaviors. In this experiment, we adopted a structured and progressive pruning strategy targeting the convolutional layers in the backbone and neck of the detection models. Specifically, we followed a channel pruning approach, where channels with the lowest importance scores, measured by the $\ell_1$-norm of their filter weights, were progressively removed. The pruning ratio started from 0% and was incrementally increased to 60% in steps, simulating different levels of tampering severity. The $Acc_{\mathrm{fragile}}$ values under different pruning rates are presented in Figure 8. Even when the pruning rate was as low as 0.1% or 0.2%, $Acc_{\mathrm{fragile}}$ already showed a noticeable decline, indicating that the watermark can promptly detect slight parameter reductions in the model. As the pruning rate increased, $Acc_{\mathrm{fragile}}$ dropped rapidly across all models, and when the pruning rate exceeded 10%, $Acc_{\mathrm{fragile}}$ approached zero. These results demonstrate that the proposed fragile watermarking method can sensitively reflect model tampering under pruning attacks and effectively detect even minor structural modifications to the model.
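A minimal sketch of the $\ell_1$-norm channel-importance criterion used in this experiment is shown below, assuming a PyTorch model; structurally removing channels (and adjusting downstream layers) is simplified here to zeroing the selected filters, which is sufficient to simulate the tampering effect.

```python
import torch
import torch.nn as nn

def l1_channel_prune(model, ratio=0.1):
    """Zero out the `ratio` fraction of conv output channels with the smallest
    L1-norm filter weights (a masking-based stand-in for structured removal)."""
    with torch.no_grad():
        for module in model.modules():
            if not isinstance(module, nn.Conv2d):
                continue
            weight = module.weight                           # (out_ch, in_ch, kH, kW)
            importance = weight.abs().sum(dim=(1, 2, 3))     # L1-norm per output channel
            n_prune = int(ratio * importance.numel())
            if n_prune == 0:
                continue
            prune_idx = torch.argsort(importance)[:n_prune]  # least important channels
            weight[prune_idx] = 0.0
            if module.bias is not None:
                module.bias[prune_idx] = 0.0
    return model
```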

5.3.4. Parameter Perturbation

In the parameter perturbation attack scenario, the attacker directly modifies the model parameters by injecting small-scale noise into the weights. In this experiment, parameter perturbation attacks were also conducted on six models across three datasets. For each trained model, Gaussian noise was added to the parameters of both the BatchNorm layers and the convolutional layers to analyze the impact of different perturbation targets. The Gaussian noise intensities were set to $\sigma = 0.001$, $0.005$, and $0.01$ to simulate varying levels of parameter perturbation. The experimental results are summarized in Table 2.
As the Gaussian noise intensity increased, the $Acc_{\mathrm{fragile}}$ values gradually decreased to varying degrees, and the perturbation targets also showed a noticeable impact on the watermark verification results. Overall, under the same noise intensity, perturbations applied to convolutional weights had a greater impact on fragile watermark verification compared to those applied to BatchNorm layers. Nevertheless, model tampering under both conditions could be effectively detected. In particular, when the noise intensity reached $\sigma = 0.01$, $Acc_{\mathrm{fragile}}$ dropped to nearly zero for all models under both perturbation settings. These results demonstrate that the proposed fragile watermarking method is highly sensitive to even minor parameter modifications and can effectively reflect model integrity risks under subtle perturbations.

5.3.5. Quantization Compression

Quantization compression is a commonly used model compression technique that reduces model storage requirements and computational complexity by lowering the bit width of model parameters. In this section, the quantization compression attack was implemented as follows. For each layer of the RSOD network, the maximum and minimum parameter values were determined to compute the normalization range. The original 32-bit floating-point weights were then normalized and mapped to the integer range corresponding to either 16-bit or 8-bit representation. Finally, the normalized values were rounded to the nearest integers to complete the quantization process. The $Acc_{\mathrm{fragile}}$ results under quantization compression are summarized in Table 3 for the six models evaluated on different datasets, covering the original (uncompressed) models as well as those quantized to 16-bit and 8-bit precision.
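A minimal sketch of this min-max quantization procedure is given below, assuming a PyTorch model; the round-trip back to floating point (quantize and immediately dequantize) is an assumption used so that the perturbed model remains directly runnable for verification.

```python
import torch

def quantize_model(model, bits=8):
    """Per-tensor min-max quantization: map each parameter tensor to the k-bit
    integer range, round, and map back to floats (simulated quantization)."""
    levels = 2 ** bits - 1
    with torch.no_grad():
        for param in model.parameters():
            p_min, p_max = param.min(), param.max()
            scale = (p_max - p_min) / levels
            if scale == 0:
                continue                                # constant tensor, nothing to quantize
            q = torch.round((param - p_min) / scale)    # integers in [0, levels]
            param.copy_(q * scale + p_min)              # dequantize in place
    return model
```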
As the quantization bit width decreased, the watermark verification accuracy rate dropped rapidly. Under 16-bit quantization, $Acc_{\mathrm{fragile}}$ fell below 20% in 11 out of 18 experimental cases. Only in four cases (Faster-RCNN with the RSOD24 dataset, and YOLOv5l, YOLOv8, and Faster-RCNN with the LEVIR dataset) did $Acc_{\mathrm{fragile}}$ remain above 30%, but the results were still below 40%. When the bit width was further reduced to 8-bit, $Acc_{\mathrm{fragile}}$ dropped to 0% in 10 out of 18 cases and remained below 17% in the remaining 8 cases. These results demonstrate that the proposed fragile watermarking method is highly sensitive to model parameter perturbations introduced by quantization compression and is capable of promptly detecting integrity risks even under minor quantization operations.

6. Conclusions

In this paper, we proposed a fragile watermarking method for RSOD models, enabling efficient black-box integrity verification. The method constructs class-specific sensitive object triggers and adopts a dual-model collaborative optimization strategy to generate fragile watermark samples located near the decision boundary. The proposed method can detect subtle model tampering without requiring access to internal parameters. Extensive experimental results show that the proposed method is highly sensitive to various types of model modifications, including backdoor injection, fine-tuning, pruning, random parameter perturbation, and model compression. Due to its lightweight inference-based design, the approach is well suited for deployment in resource-constrained edge environments. However, the proposed method still has limitations in terms of applicability. It is designed for a fully fragile watermarking scenario and imposes strict constraints on changes to the model’s structure and parameters. As a result, it may not be compatible with certain legitimate model operations in real-world applications, such as slight fine-tuning or quantization compression. In future work, we will consider various model tampering simulations and explore adaptive thresholding strategies for watermark verification to improve the method’s flexibility in practical deployments.

Author Contributions

Conceptualization, X.X. and W.C.; methodology, X.X.; validation, Z.W. and W.T.; writing—original draft preparation, X.X.; writing—review and editing, W.C. and N.R.; supervision, C.Z.; funding acquisition, W.C. and N.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (No. 42201444, No. 42471440), the Natural Science Foundation of Jiangsu Province (No. BK20240898), and the Open Topic of Hunan Engineering Research Center of Geographic Information Security and Application (No. HNGISA2024003).

Data Availability Statement

The data associated with this research are available online. The NWPU VHR-10 dataset is available at https://www.kaggle.com/datasets/kevin33824/nwpu-vhr-10 (accessed on 18 January 2025). The RSOD dataset is available at https://www.kaggle.com/datasets/kevin33824/rsod24 (accessed on 18 January 2025). The LEVIR dataset is available at https://aistudio.baidu.com/datasetdetail/53714 (accessed on 18 January 2025). The DIOR dataset is available at https://www.kaggle.com/datasets/shuaitt/diordata (accessed on 18 January 2025). Citations are also provided in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Han, W.; Zhang, X.; Wang, Y.; Wang, L.; Huang, X.; Li, J.; Wang, S.; Chen, W.; Li, X.; Feng, R.; et al. A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS J. Photogramm. Remote Sens. 2023, 202, 87–113. [Google Scholar] [CrossRef]
  2. Zhao, T.; Wang, S.; Ouyang, C.; Chen, M.; Liu, C.; Zhang, J.; Yu, L.; Wang, F.; Xie, Y.; Li, J.; et al. Artificial intelligence for geoscience: Progress, challenges and perspectives. Innovation 2024, 5, 100691. [Google Scholar] [CrossRef]
  3. Gui, S.; Song, S.; Qin, R.; Tang, Y. Remote sensing object detection in the deep learning era—A review. Remote Sens. 2024, 16, 327. [Google Scholar] [CrossRef]
  4. Yang, Y.; Miao, Z.; Zhang, H.; Wang, B.; Wu, L. Lightweight attention-guided YOLO with level set layer for landslide detection from optical satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3543–3559. [Google Scholar] [CrossRef]
  5. Jia, N.; Sun, Y.; Liu, X. TFGNet: Traffic salient object detection using a feature deep interaction and guidance fusion. IEEE Trans. Intell. Transp. Syst. 2023, 25, 3020–3030. [Google Scholar] [CrossRef]
  6. Wang, J.; Shen, T.; Tian, Y.; Wang, Y.; Gou, C.; Wang, X.; Yao, F.; Sun, C. A parallel teacher for synthetic-to-real domain adaptation of traffic object detection. IEEE Trans. Intell. Veh. 2022, 7, 441–455. [Google Scholar] [CrossRef]
  7. Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-time object detection network in UAV-vision based on CNN and transformer. IEEE Trans. Instrum. Meas. 2023, 72, 1–13. [Google Scholar] [CrossRef]
  8. Ingle, P.Y.; Kim, Y.G. Real-time abnormal object detection for video surveillance in smart cities. Sensors 2022, 22, 3862. [Google Scholar] [CrossRef]
  9. Munir, A.; Aved, A.; Blasch, E. Situational awareness: Techniques, challenges, and prospects. AI 2022, 3, 55–77. [Google Scholar] [CrossRef]
  10. Xu, Y.; Bai, T.; Yu, W.; Chang, S.; Atkinson, P.M.; Ghamisi, P. AI security for geoscience and remote sensing: Challenges and future trends. IEEE Geosci. Remote Sens. Mag. 2023, 11, 60–85. [Google Scholar] [CrossRef]
  11. Brewer, E.; Lin, J.; Runfola, D. Susceptibility & defense of satellite image-trained convolutional networks to backdoor attacks. Inf. Sci. 2022, 603, 244–261. [Google Scholar]
  12. Mittal, P. A comprehensive survey of deep learning-based lightweight object detection models for edge devices. Artif. Intell. Rev. 2024, 57, 242. [Google Scholar] [CrossRef]
  13. Wu, Y.; Guo, H.; Chakraborty, C.; Khosravi, M.R.; Berretti, S.; Wan, S. Edge computing driven low-light image dynamic enhancement for object detection. IEEE Trans. Netw. Sci. Eng. 2022, 10, 3086–3098. [Google Scholar] [CrossRef]
  14. Wu, X.; Li, W.; Hong, D.; Tao, R.; Du, Q. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. IEEE Geosci. Remote Sens. Mag. 2021, 10, 91–124. [Google Scholar] [CrossRef]
  15. Zhang, J.; Jia, X.; Hu, J.; Tan, K. Moving vehicle detection for remote sensing video surveillance with nonstationary satellite platform. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5185–5198. [Google Scholar] [CrossRef]
  16. Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-time intelligent object detection system based on edge-cloud cooperation in autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360. [Google Scholar] [CrossRef]
  17. Huang, Z.; Yang, S.; Zhou, M.; Gong, Z.; Abusorrah, A.; Lin, C.; Huang, Z. Making accurate object detection at the edge: Review and new approach. Artif. Intell. Rev. 2022, 55, 2245–2274. [Google Scholar] [CrossRef]
  18. Sreenivas, K.; Kamkshi Prasad, V. Fragile watermarking schemes for image authentication: A survey. Int. J. Mach. Learn. Cybern. 2018, 9, 1193–1218. [Google Scholar] [CrossRef]
  19. Jiang, L.; Zheng, H.; Zhao, C. A fragile watermarking in ciphertext domain based on multi-permutation superposition coding for remote sensing image. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5664–5667. [Google Scholar]
Figure 1. Potential model tampering stages in RSOD model deployment.
Figure 2. Overview of the sensitive object trigger generation process. (a) Generation of initial triggers with weak semantic features based on a surrogate model. (b) Joint optimization of sensitive object triggers.
Figure 3. Illustration of embedding one δ_l within b_ij.
Figure 4. Illustration of generated sensitive object triggers, fragile watermark samples, and corresponding verification results. The generated (a) sensitive object triggers, (b) fragile watermark samples, and (c) verification results for the airplane, ship, and baseball diamond categories, where red bounding boxes highlight the trigger-embedded regions.
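Figure 4c illustrates the per-category verification outcome on an intact model. As a minimal sketch of how such a black-box check could be scripted, the snippet below queries a detector with every fragile watermark sample and treats any missing expected detection as evidence of tampering; the `model` callable, the `conf_thresh` value, and the function names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal black-box integrity check (illustrative sketch, not the authors' code).
# Assumption: `model(sample)` returns a list of (class_id, confidence, box) tuples.
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[int, float, Box]

def verify_integrity(model: Callable[[object], List[Detection]],
                     watermark_samples: Sequence[object],
                     expected_classes: Sequence[int],
                     conf_thresh: float = 0.25) -> bool:
    """Return True only if every fragile watermark sample is detected as its
    expected class; a single mismatch is interpreted as model modification."""
    for sample, expected_cls in zip(watermark_samples, expected_classes):
        detections = [d for d in model(sample) if d[1] >= conf_thresh]
        if not any(cls == expected_cls for cls, _, _ in detections):
            return False  # at least one fragile sample failed the check
    return True
```

Read this way, the Acc_fragile percentages reported in the following figures and tables can be interpreted as the share of watermark samples that still produce their expected detections after a modification, so any value below 100% flags the model as tampered.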
Figure 5. Cross-validation results of fragile watermark verification datasets across different models. Cross-validation on (a) NWPU VHR-10, (b) RSOD24, and (c) LEVIR datasets.
Figure 6. Prediction results of fragile watermark samples on corresponding and non-corresponding models. (a) Predictions on corresponding models. (b) Predictions on non-corresponding models.
Figure 7. Fragile watermark verification under fine-tuning. The variation in Acc_fragile under different fine-tuning epochs on models (a) YOLOv5l, (b) YOLOv5n, (c) YOLOv5s, (d) YOLOv8, (e) SSD, and (f) Faster-RCNN.
Figure 8. Fragile watermark verification under pruning. The variation in Acc_fragile under different pruning rates on models (a) YOLOv5l, (b) YOLOv5n, (c) YOLOv5s, (d) YOLOv8, (e) SSD, and (f) Faster-RCNN.
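Figure 8 varies the pruning rate applied to each detector before re-running the watermark check. The sketch below shows one common way such a rate might be applied, zeroing the lowest-magnitude convolutional weights with PyTorch's L1 unstructured pruning; the helper name and the restriction to Conv2d layers are assumptions, not necessarily the paper's exact pruning protocol.

```python
# Illustrative pruning step (assumed setup, not the authors' script).
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_layers(model: nn.Module, rate: float) -> nn.Module:
    """Zero out the `rate` fraction of lowest-magnitude weights in each Conv2d layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=rate)
            prune.remove(module, "weight")  # bake the pruning mask into the weights
    return model
```

Acc_fragile would then be re-measured on the pruned model with a verification routine such as the one sketched after Figure 4.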
Table 1. Fragile watermark verification under different backdoor attacks.

Dataset          Backdoor Attack Type    Acc_fragile (%)
                                         YOLOv5l   YOLOv5n   YOLOv5s   YOLOv8    SSD       Faster-RCNN
NWPU VHR-10      BadDet                  0         0         0         0         0         0
NWPU VHR-10      BAWE                    5         0         0         5         0         10
NWPU VHR-10      PTAVR                   0         0         0         0         0         0
RSOD24           BadDet                  12.5      0         0         0         0         12.5
RSOD24           BAWE                    12.5      0         0         12.5      0         25
RSOD24           PTAVR                   0         0         0         0         0         12.5
LEVIR            BadDet                  33.33     0         0         16.67     0         33.33
LEVIR            BAWE                    33.33     0         0         33.33     16.67     50
LEVIR            PTAVR                   16.67     0         0         0         0         16.67
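Table 1 evaluates the watermark after three published backdoor attacks (BadDet, BAWE, and PTAVR) are injected into the protected detectors. Purely as background, the snippet below sketches a generic patch-trigger poisoning step of the kind data-poisoning backdoors rely on; it is a hypothetical illustration and does not reproduce any of the three attacks in the table.

```python
# Generic patch-trigger poisoning (hypothetical illustration only).
# Assumption: images are CHW float tensors in [0, 1]; `labels` holds one class id per box.
import torch

def poison_sample(image: torch.Tensor, boxes: torch.Tensor, labels: torch.Tensor,
                  target_class: int, patch_size: int = 16):
    """Stamp a white square trigger in the top-left corner and relabel every
    object as `target_class` (object-misclassification style poisoning)."""
    poisoned = image.clone()
    poisoned[:, :patch_size, :patch_size] = 1.0  # visible trigger patch
    poisoned_labels = torch.full_like(labels, target_class)
    return poisoned, boxes, poisoned_labels
```

Fine-tuning a detector on a small share of samples poisoned in this way implants the backdoor while leaving clean-input accuracy largely unchanged.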
Table 2. Fragile watermark verification under different parameter perturbations.

Dataset          Perturbation Target   Gaussian Noise Intensity   Acc_fragile (%)
                                                                  YOLOv5l   YOLOv5n   YOLOv5s   YOLOv8    SSD       Faster-RCNN
NWPU VHR-10      BN Layer              Original                   100       100       100       100       100       100
NWPU VHR-10      BN Layer              σ = 0.001                  30        20        25        35        30        35
NWPU VHR-10      BN Layer              σ = 0.005                  25        15        15        30        25        25
NWPU VHR-10      BN Layer              σ = 0.01                   0         0         0         0         0         0
NWPU VHR-10      Conv Layer            Original                   100       100       100       100       100       100
NWPU VHR-10      Conv Layer            σ = 0.001                  20        5         5         15        10        20
NWPU VHR-10      Conv Layer            σ = 0.005                  5         0         5         5         10        10
NWPU VHR-10      Conv Layer            σ = 0.01                   0         0         0         0         0         0
RSOD24           BN Layer              Original                   100       100       100       100       100       100
RSOD24           BN Layer              σ = 0.001                  25        12.5      12.5      37.5      0         37.5
RSOD24           BN Layer              σ = 0.005                  12.5      0         12.5      12.5      0         12.5
RSOD24           BN Layer              σ = 0.01                   0         0         0         0         0         0
RSOD24           Conv Layer            Original                   100       100       100       100       100       100
RSOD24           Conv Layer            σ = 0.001                  25        12.5      0         25        0         25
RSOD24           Conv Layer            σ = 0.005                  12.5      0         0         0         0         12.5
RSOD24           Conv Layer            σ = 0.01                   0         0         0         0         0         0
LEVIR            BN Layer              Original                   100       100       100       100       100       100
LEVIR            BN Layer              σ = 0.001                  50        16.67     16.67     33.33     16.67     66.67
LEVIR            BN Layer              σ = 0.005                  33.33     0         0         16.67     16.67     33.33
LEVIR            BN Layer              σ = 0.01                   16.67     0         0         0         0         0
LEVIR            Conv Layer            Original                   100       100       100       100       100       100
LEVIR            Conv Layer            σ = 0.001                  50        16.67     16.67     33.33     33.33     33.33
LEVIR            Conv Layer            σ = 0.005                  16.67     0         0         16.67     0         16.67
LEVIR            Conv Layer            σ = 0.01                   0         0         0         0         0         0
Table 3. Fragile watermark verification under quantization compression.

Dataset          Quantization Compression   Acc_fragile (%)
                                            YOLOv5l   YOLOv5n   YOLOv5s   YOLOv8    SSD       Faster-RCNN
NWPU VHR-10      Original                   100       100       100       100       100       100
NWPU VHR-10      16-bit                     15        0         5         10        5         20
NWPU VHR-10      8-bit                      10        0         0         5         0         10
RSOD24           Original                   100       100       100       100       100       100
RSOD24           16-bit                     25        0         12.5      25        12.5      37.5
RSOD24           8-bit                      12.5      0         0         0         0         12.5
LEVIR            Original                   100       100       100       100       100       100
LEVIR            16-bit                     33.33     0         16.67     33.33     16.67     33.33
LEVIR            8-bit                      16.67     0         0         0         16.67     16.67