Article

SIR-DCGAN: An Attention-Guided Robust Watermarking Method for Remote Sensing Image Protection Using Deep Convolutional Generative Adversarial Networks

School of Information Science and Technology, Shihezi University, Shihezi 832003, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1853; https://doi.org/10.3390/electronics14091853
Submission received: 15 March 2025 / Revised: 24 April 2025 / Accepted: 25 April 2025 / Published: 1 May 2025

Abstract

Ensuring the security of remote sensing images is essential to prevent unauthorized access, tampering, and misuse. Deep learning-based digital watermarking offers a promising solution by embedding imperceptible information to protect data integrity. This paper proposes SIR-DCGAN, an attention-guided robust watermarking method for remote sensing image protection. It incorporates an IR-FFM feature fusion module to enhance feature reuse across different layers and an SE-AM attention mechanism to emphasize critical watermark features. Additionally, a noise simulation sub-network is introduced to improve resistance against common and combined attacks. The proposed method achieves high imperceptibility and robustness while maintaining low computational cost. Extensive experiments on both remote sensing and natural image datasets validate its effectiveness, with performance consistently surpassing existing approaches. These results demonstrate the practicality and reliability of SIR-DCGAN for secure image distribution and copyright protection.

1. Introduction

In the field of image security [1], guaranteeing the reliability and authenticity of transmitted data is fundamental to safeguarding information. Although numerous protective techniques have been suggested, such as image encryption, these approaches often prove inadequate when applied to remote sensing images. While encryption can successfully block unauthorized access, it frequently disrupts subsequent data analysis and practical use. On the other hand, digital watermarking [2] provides a more sophisticated solution by embedding invisible marks that maintain visual quality while allowing for copyright verification.
This issue is particularly evident in the realm of remote sensing. Remote sensing images, which are used in satellite monitoring, military intelligence, and environmental observation, are typically high-resolution and multispectral, meaning that even slight distortions or unauthorized modifications can result in substantial misinterpretations. Unlike conventional multimedia images, remote sensing data requires a security solution that minimizes perceptual degradation while maintaining resilience to various distortions, channel noise, and deliberate attacks. In this regard, digital watermarking presents a compelling solution, offering a balance between robustness, embedding capacity, and imperceptibility.
In recent years, digital watermarking has garnered significant academic attention as a means of copyright protection for remote sensing imagery. A variety of algorithms specifically designed for this purpose have emerged, many of which are rooted in traditional frequency-based approaches, including those employing cosine domain transformations, multi-scale wavelet analysis, and matrix factorization techniques such as singular value decomposition (SVD) [3,4,5]. These methods utilize spectral transformations and decomposition principles to achieve improvements in the imperceptibility, resilience, and security of the watermark. However, conventional watermarking strategies still encounter multiple challenges in real-world scenarios.
  • Susceptibility to distortion: many classical techniques show weak resistance to common perturbations, including compression artifacts (e.g., JPEG), random noise interference, and spatial transformations, resulting in the partial or complete degradation of embedded watermark signals.
  • Low adaptability of handcrafted features: these solutions typically depend on manually engineered embedding patterns and robustness heuristics, which are often insufficient for the high complexity and variability inherent in remote sensing imagery and the diverse threat models involved.
  • High computational demands: some watermarking systems, although robust, impose significant computational overhead, limiting their feasibility for real-time use in large-scale remote sensing applications.
Recent breakthroughs in deep learning have opened up novel avenues for improving the robustness, imperceptibility, and adaptive capabilities of digital watermarking systems. A growing body of research in image processing demonstrates the potential of deep learning models to surpass the limitations of conventional watermarking approaches. Deep learning-based watermarking techniques for remote sensing images can generally be categorized into two main branches. Among them, convolutional neural networks (CNNs) [6] and generative adversarial networks (GANs) [7] have become prominent tools in this domain. One of the pioneering works by Haribabu et al. [8] introduced a CNN-based model that embedded watermark information through an encoder–decoder structure inspired by autoencoders. Subsequently, Mun et al. [9] introduced an iterative learning framework built upon CNNs, further boosting watermark robustness. These neural network-based solutions can be viewed as evolved variants of frequency-domain approaches and are now integral to many modern watermarking strategies. In a representative example, Ahmadi et al. [10] presented ReDMark, an end-to-end framework employing residual CNNs and incorporating simulated attacks as differentiable transformations. This technique enhances both security and robustness by dispersing watermark signals across a broader spatial domain. Meanwhile, Luo et al. [11] introduced a deep-learning watermarking method that does not rely on predefined distortion models during training. Instead, it gains resilience through adversarial learning combined with channel coding, demonstrating superior generalization against unforeseen perturbations. Despite these advancements, difficulties persist in accurately extracting semantically meaningful image features and effectively integrating them into the watermark embedding process, which continues to constrain overall performance under noise and attack scenarios.
In the realm of generative adversarial networks (GANs), a wide array of image watermarking approaches have been proposed, leveraging the adversarial interplay between generator and discriminator components to achieve a favorable balance among embedding strength, stealth, and robustness [12]. One of the earliest milestones in this direction is the HiDDeN framework introduced by Zhu et al. [13], which adopts an encode-perturb-decode architecture. In this setup, the encoder embeds watermark signals into images with minimal visual disruption, while the decoder is trained to accurately retrieve the watermark even under distortions. Hao et al. [14] explored GANs specifically in remote sensing contexts by designing a model where the generator is responsible for both embedding and recovering watermark information under varying noise levels, while the discriminator not only detects watermark presence but also introduces simulated degradations. To encourage watermark placement in frequency bands less noticeable to the human eye, a high-pass filter was applied prior to the discriminator, thereby improving both stealth and durability.
Building on these efforts, Liu et al. [15,16,17,18] devised a modular dual-stage learning pipeline consisting of a clean-condition adversarial training phase followed by a distortion-aware refinement process. This architecture integrates a deeply layered redundant encoding module [19] that enhances information redundancy during embedding and maintains resilience under challenging noise conditions. Jia et al. [20] further tackled real-world degradation scenarios by introducing a mini-batch training approach that incorporates both real and synthetically compressed images using JPEG-based artifacts, thereby improving generalization. In a similar vein, Huang et al. [21] advanced the field with an attention-driven adversarial watermarking framework (ARWGAN), where attention layers actively guide watermark placement toward regions of lower perceptual significance. Lastly, Fernandez et al. [22] proposed a fully self-supervised embedding paradigm, which uses unsupervised feature extraction under various data augmentations to generate robust, fused representations of image features and watermark signals, enhancing overall stability across perturbations.
The aforementioned research demonstrates that convolutional and generative adversarial network-based watermarking models can effectively integrate the stages of embedding and retrieval into a unified learning framework. This end-to-end optimization bypasses the constraints imposed by manually engineered features in conventional approaches. Nonetheless, the majority of such methods have been developed for general-purpose natural images, leaving the domain of remote sensing relatively underexplored. Remote sensing imagery presents a set of distinct challenges, including ultra-high spatial resolution, multispectral content, diverse scene structures, and the demand for robustness under dynamically changing conditions, which differentiate it substantially from standard image datasets. These unique demands highlight the necessity for watermarking strategies specifically tailored to the characteristics of remote sensing data. Accordingly, this work is dedicated to addressing that gap through targeted algorithmic innovation.
Existing GAN-based watermarking methods, including HiDDeN, ReDMark, and ARWGAN, typically target general image domains and lack specific adaptation to remote sensing scenarios. In contrast, our method leverages a DCGAN-based architecture enhanced with a multi-scale feature fusion module and attention mechanisms tailored to the challenges of high-resolution remote sensing images. In particular, deep convolutional generative adversarial networks (DCGANs) offer unique advantages over conventional CNNs and GANs in remote sensing image watermarking. A DCGAN combines the feature extraction strength of CNNs with the adversarial learning mechanism of GANs, making it more effective in image generation and classification tasks. Unlike the fully connected networks used in the original GAN, a DCGAN introduces convolutional and transposed convolutional layers in both the generator and discriminator, enabling superior multi-scale feature extraction suited to the high-resolution and multispectral nature of remote sensing images.
A DCGAN’s adversarial training mechanism enhances both the imperceptibility and robustness of embedded watermarks. It also improves training stability and image quality, ensuring resistance to noise, cropping, compression, and other common attacks while preserving watermark integrity. These attributes make the DCGAN a promising solution for secure remote sensing image protection.
Recent studies have demonstrated the DCGAN’s significant potential in remote sensing applications. Sürücü et al. [23] developed a multispectral DCGAN (MS-DCGAN) model to generate synthetic multispectral images and proposed a TransStacking model to distinguish between fake and real images with high accuracy. Wu et al. [24] introduced a DCGAN-based image colorization method for remote sensing imagery, incorporating multi-scale convolution and residual networks to enhance both visual quality and quantitative metrics. Jia et al. [25] combined a DCGAN with graph convolutional networks (GCNs) to develop the DGCGAN model, improving classification accuracy and robustness. Zhu et al. [26] enhanced sample diversity in aquaculture information extraction using an improved DCGAN algorithm. Shanmugam et al. [27] proposed DcGAN-HAD, a dual-discriminator conditional DCGAN for anomaly detection in hyperspectral images, which significantly improved detection accuracy and efficiency through optimized training.
These advances illustrate the DCGAN’s promising potential in remote sensing image watermarking, particularly in feature extraction, image generation, and resilience under complex conditions. This study adopts a DCGAN-based approach to explore robust, imperceptible, and efficient watermarking solutions for remote sensing imagery.
Recent advancements in image protection, such as the semantically enhanced selective encryption method proposed by Liu et al. [28], leverage semantic segmentation and parallel computing to achieve efficient visual obfuscation. While effective for privacy-preserving applications, such encryption-based approaches often modify pixel values and risk destroying spatial–spectral integrity, making them less suitable for scenarios requiring scientific analysis or image fidelity, such as remote sensing.
In contrast, Zhang et al. [29] introduced a frequency-domain attention-guided adaptive watermarking model that improves robustness under signal distortions by embedding in the DCT domain. However, the fixed-frequency embedding approach may struggle with high-frequency variability in real-world remote sensing images. Furthermore, frequency-domain techniques often lack spatial adaptability, which limits their ability to preserve semantic consistency across multi-scale geospatial regions.
To address these limitations, our method employs a spatial-domain generative strategy using a DCGAN enhanced with a multi-scale feature fusion module and attention-guided embeddings. This design enables our framework to dynamically adapt watermark placement based on spatial semantics while preserving spectral integrity, traits that are particularly critical for remote sensing applications.
In conclusion, while frameworks like HiDDeN have made notable strides in improving the imperceptibility of embedded watermarks, they often fall short in extracting semantically rich and robust features, especially when applied to remote sensing data. These models typically fail to make full use of the distinctive spectral and spatial patterns inherent in remote sensing imagery, which limits their ability to withstand noise interference and ultimately undermines watermark reliability. To overcome these shortcomings, we propose a novel watermarking framework for remote sensing images by incorporating both an attention-driven mechanism and a multi-scale feature fusion strategy within a deep convolutional GAN architecture. Rather than relying on traditional feature pipelines, our method seeks to enhance the depth and robustness of learned representations, thereby improving resistance to various noise perturbations. By jointly optimizing feature extraction and adaptive embedding through specialized modules, this work offers a more resilient and context-aware solution for protecting the integrity of remote sensing imagery.
To tackle the limited feature representation and suboptimal robustness observed in existing deep learning-based watermarking techniques, this study puts forward a novel robust watermarking framework tailored for remote sensing imagery. The method integrates an attention-driven strategy with multi-scale feature fusion, all constructed within a deep convolutional generative adversarial network paradigm. Compared to conventional approaches, this work makes the following key contributions:
  • We design a unified framework for watermarking remote sensing images by coupling a DCGAN-based backbone with an attention enhancement module. This combination not only improves the imperceptibility of the embedded watermark but also strengthens its resilience against degradation. The incorporation of a domain-aware feature extraction unit allows for more effective utilization of spatial–spectral patterns during watermark encoding.
  • A dedicated multi-level feature fusion module is proposed to boost the stability and anti-perturbation capacity of the embedding process. By capturing hierarchical remote sensing image features across scales, this module ensures that the watermark is distributed in a way that balances invisibility and robustness, particularly under compression and noise-based attacks.
  • An adaptive attention mechanism is incorporated to refine the spatial distribution of the watermark. This component automatically identifies less perceptually sensitive yet semantically stable regions, allowing the watermark to be embedded in areas where it remains concealed while preserving robustness under complex image transformations.
  • The decoding architecture is optimized to better withstand interference from noisy or lossy environments. By refining the extraction pipeline, the framework achieves reliable watermark recovery even under severe compression or transmission noise, thereby enhancing real-world applicability.
Comprehensive experimental validation shows that the proposed method consistently surpasses prior models across multiple evaluation metrics, including imperceptibility, robustness, and resistance to distortion. Notably, it maintains stable performance under adversarial conditions such as JPEG compression and noise contamination, demonstrating its viability for robust and secure remote sensing image copyright protection.

2. Methods

In this study, we present SIR-DCGAN, a remote sensing image watermarking strategy built upon a deep convolutional generative adversarial framework. The proposed model is designed to strike a dual objective: enhancing both the imperceptibility and resilience of embedded watermarks. The DCGAN backbone is selected for its reliable adversarial training behavior and proven capacity in producing high-fidelity images. Within this architecture, the generator module embeds the watermark into the input image, while the discriminator learns to identify discrepancies between authentic and watermarked versions. Through this adversarial interplay, the generator is incentivized to produce outputs that are visually indistinct from the original inputs while remaining resilient under distortion.
To achieve a refined balance between accurate watermark insertion, robustness during retrieval, and minimal visual disruption to the host image, we construct a cascaded architecture comprising an embedding module, a simulated attack unit, and a decoding network. These components are trained jointly in an end-to-end fashion, allowing the system to co-optimize visual quality and robustness throughout the learning process. To overcome the limitations of conventional approaches in capturing multi-scale semantics, we incorporate the Inception-ResNet [30] feature fusion module (IR-FFM), which blends multi-scale convolution filters with cross-layer dense connections. This design promotes more effective interaction between shallow and deep features, accelerating convergence and strengthening the resistance of the watermark to noise perturbations.
In addition, to minimize visual degradation typically caused by embeddings, we develop a squeeze-and-excitation attention module (SE-AM). This component generates global attention masks by extracting context-aware features, thus guiding the embedding process toward texture-dense yet perceptually tolerant regions. This selective reinforcement of embedding zones significantly improves the visual fidelity of the output image while preserving the watermark.
To further enhance robustness under compound attack scenarios, a dedicated noise simulation network is introduced. This module emulates a variety of real-world distortions, including additive noise, compression artifacts, and cropping, and injects them into the training pipeline. The model receives feedback from these simulated distortions during training, enabling it to iteratively refine its embedding strategy and adapt to evolving threat patterns. Through this dynamic optimization mechanism, the watermarking model becomes increasingly resilient to diverse forms of interference.
Quantitative evaluations confirm that the proposed framework delivers superior performance under a wide range of attack conditions, including JPEG compression, random noise, and hybrid perturbations. At the same time, it maintains high visual quality and efficient embeddings, offering a practical and robust solution for securing remote sensing image content.
SIR-DCGAN achieves high-quality watermark embedding and accurate extraction through the joint optimization of an encoder, a noise sub-network, a decoder, and a discriminator. During implementation, the original input I_co and the watermark M are fed into the encoder, which outputs a watermarked image I_en. After I_en is attacked by the noise layer, it is passed to the decoder, which extracts the embedded watermark M'. Simultaneously, the discriminator classifies both the host image and the watermarked image to ensure the quality of the watermark embedding. Throughout training, by optimizing the generation loss, decoding loss, and discrimination loss, the model simultaneously achieves imperceptible watermark embedding and highly robust extraction.
The total loss guiding the training process is constructed as follows:
L_all = λ · L_adversarial + MSE(y, x_h) + MSE(x_s, x_s')
Here, the coefficient λ balances the influence of the adversarial loss within the total objective. The term MSE(y, x_h) measures the mean squared error between the original host image x_h and the generated watermarked image y, thereby constraining the distortion introduced during watermark embedding and preserving visual fidelity. Similarly, MSE(x_s, x_s') evaluates the discrepancy between the watermarked image before attack, x_s, and its degraded version x_s' after noise perturbation, thus quantifying the model's resistance to distortion and its ability to retain watermark information. The formulation of the adversarial loss component L_adversarial is given as follows:
L_adversarial = −[ y · log(D(x_s)) + (1 − y) · log(1 − D(G(x_s))) ]
where D(·) and G(·) denote the outputs of the discriminator and generator networks, respectively; y is the ground-truth label; D(x_s) represents the probability assigned by the discriminator to the real watermarked image x_s; and D(G(x_s)) denotes the probability assigned to the generated watermarked image G(x_s).
The design of the loss components simultaneously constrains visual fidelity and promotes embedding resilience, aiming to minimize perceptual differences between the original and watermarked images while enhancing resistance to distortions. This joint optimization strategy ensures that the generated watermarked outputs maintain both high quality and robustness.
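To make the joint objective concrete, the following PyTorch-style sketch assembles the three terms above from the generator's perspective. The function and variable names (discriminator, x_host, x_marked, x_attacked) and the default value of λ are illustrative assumptions rather than the authors' released implementation, and the adversarial term is written here as a standard binary cross-entropy.

```python
import torch
import torch.nn.functional as F

def total_loss(discriminator, x_host, x_marked, x_attacked, lam=1e-3):
    """Joint objective: adversarial term + embedding distortion + attack distortion."""
    # Adversarial term: push the discriminator toward labelling the
    # watermarked image as a host image (label 1).
    pred = discriminator(x_marked)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    # Distortion introduced by embedding: MSE(y, x_h).
    embed_mse = F.mse_loss(x_marked, x_host)
    # Robustness term: MSE between the watermarked image and its attacked version.
    robust_mse = F.mse_loss(x_attacked, x_marked)
    return lam * adv + embed_mse + robust_mse
```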
In summary, by incorporating the SE-AM and IR-FFM modules, as illustrated in Figure 1, SIR-DCGAN not only effectively enhances the invisibility and embedding robustness of the watermark but also significantly improves the model's adaptability to complex attack scenarios. Experimental results demonstrate that, compared with state-of-the-art (SOTA) watermarking models, SIR-DCGAN exhibits marked advantages in both watermark quality and resistance to attacks.

2.1. Encoder

The encoder structure is shown in Figure 2. The input consists of a host image I_in of size C × H × W and a binary watermark M ∈ {0, 1}^L of length L. The output is the watermarked image I_en with the same dimensions as I_in. The core modules are the squeeze-and-excitation network attention module and the Inception-ResNet feature fusion module.
The squeeze-and-excitation network attention module generates an attention mask M_A to optimize the watermark embedding region, balancing watermark imperceptibility and robustness. The module extracts deep features to generate M_A, guiding watermark embedding into texture-rich but visually inconspicuous areas, thereby reducing distortion and enhancing robustness. It consists of Layer_V, which extracts deep features, and Layer_A, which generates the probability distribution over feature channels via SoftMax, ultimately producing M_A.
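A minimal squeeze-and-excitation style block in the spirit of SE-AM is sketched below; the reduction ratio and layer names are assumptions, and only the channel re-weighting path described above (squeeze, excitation, SoftMax mask M_A) is shown.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # global context per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Softmax(dim=1),                           # channel probability distribution, as in Layer_A
        )

    def forward(self, feat):                             # feat: (B, C, H, W)
        b, c, _, _ = feat.shape
        w = self.excite(self.squeeze(feat).view(b, c))   # attention mask M_A over channels
        return feat * w.view(b, c, 1, 1)                 # re-weight features for embedding
```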
IR-FFM extracts both shallow and deep features through dense connections, integrating them with the watermark to improve robustness. Layer_D extracts and reuses image features, while Layer_F further refines the feature distribution, ultimately generating the watermarked features F_W. Combined with M_A, the adjusted features are represented as F_AW = F_W × M_A. The encoder ensures high-quality watermark embedding by minimizing the following loss function:
L_E = MSE(I_in, I_en) = ||I_in − I_en||_2^2 / (C × H × W)
Here, I_in and I_en denote the original host image and the corresponding watermarked version. The term ||I_in − I_en||_2^2 measures the Euclidean distance in pixel space, capturing the visual perturbation caused by embedding the watermark. To maintain consistency across images of different dimensions, this loss is normalized by the total number of pixels, C × H × W. Minimizing the embedding loss L_E guides the encoder to retain the visual fidelity of the host image, ensuring that the watermark remains imperceptible while sustaining robustness under potential attacks. This design contributes to producing high-quality watermarked images with minimal visual artifacts.
Table 1 shows the architecture of the generator in the proposed SIR-DCGAN framework. The network begins with a latent noise vector, which is passed through a series of transposed convolutional layers to generate a 64 × 64 output image. Two custom-designed modules—IR-FFM and SE-AM—are integrated after the first and second deconvolutional layers, respectively, to enhance feature fusion and attention allocation. The IR-FFM module improves multi-scale semantic retention, while the SE-AM module adaptively highlights regions suitable for watermark embedding. This architecture balances robustness and imperceptibility while maintaining a lightweight design.
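The following sketch illustrates an Inception-ResNet style fusion block of the kind IR-FFM builds on: parallel convolutions with different kernel sizes capture multi-scale features, and a residual connection reuses shallow features. The channel split and kernel sizes are illustrative assumptions, not the exact Table 1 configuration.

```python
import torch
import torch.nn as nn

class InceptionResidualFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        branch_ch = channels // 4
        # Parallel branches with increasing receptive fields.
        self.b1 = nn.Conv2d(channels, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(channels, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(channels, branch_ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(channels, branch_ch, kernel_size=7, padding=3)
        self.merge = nn.Conv2d(branch_ch * 4, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.b7(x)], dim=1)
        return self.act(x + self.merge(multi))   # residual path reuses shallow features
```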

2.2. Noise Network

The noise sub-network applies multiple image distortions to the watermarked image I_en, including Crop, Cropout, Dropout, JPEG compression, Gaussian blur, and Rotate, to enhance the model's robustness against image degradation. Specifically, the Identity operation denotes no attack, where I_en remains unchanged. The Crop operation randomly removes a square region of the image with proportion P. Cropout and Dropout replace the cropped or randomly selected pixel regions with the corresponding regions from the host image I_co. Gaussian blur applies a convolution kernel with width r = 3 and variance σ = 1 to blur I_en. The Rotate operation rotates I_en counterclockwise by α = 15°. JPEG compression is performed using a quality factor Q ∈ (0, 100).
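As a rough illustration of the noise sub-network, the sketch below applies one randomly chosen distortion per training step. Parameter values follow the text (kernel width 3, σ = 1, 15° rotation), while the helper structure, the crop implementation, and the omission of the differentiable JPEG branch are simplifying assumptions.

```python
import random
import torch
import torchvision.transforms.functional as TF

def apply_random_attack(i_en, i_co, crop_ratio=0.3, dropout_p=0.3):
    attack = random.choice(["identity", "crop", "dropout", "gaussian_blur", "rotate"])
    if attack == "identity":
        return i_en
    if attack == "crop":                      # remove a random square region
        h, w = i_en.shape[-2:]
        size = int(min(h, w) * crop_ratio)
        y, x = random.randrange(h - size), random.randrange(w - size)
        out = i_en.clone()
        out[..., y:y + size, x:x + size] = 0
        return out
    if attack == "dropout":                   # replace random pixels with host pixels
        mask = (torch.rand_like(i_en) < dropout_p).float()
        return mask * i_co + (1 - mask) * i_en
    if attack == "gaussian_blur":             # kernel width 3, sigma 1
        return TF.gaussian_blur(i_en, kernel_size=3, sigma=1.0)
    return TF.rotate(i_en, angle=15.0)        # counter-clockwise rotation by 15 degrees
```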
Due to the non-differentiability of quantization and rounding operations in JPEG compression during training, a differentiable approximation is required for gradient propagation. Common approaches, such as JPEG-Mask and JPEG-Drop, randomly zero out high-frequency discrete cosine transform (DCT) coefficients. Furthermore, to approximate the rounding operation and ensure differentiability, the following transformation is applied:
x_approx = ⌊x⌋ + (x − ⌊x⌋)^3
where ⌊x⌋ denotes the floor of x and the cubic term (x − ⌊x⌋)^3 smooths the rounding operation, ensuring differentiability for gradient propagation. This approximation enables gradient-based optimization in the presence of JPEG compression.
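A minimal differentiable rounding helper based on this approximation might look as follows; it can be dropped into a simulated JPEG pipeline so that gradients propagate through the quantization step.

```python
import torch

def soft_round(x: torch.Tensor) -> torch.Tensor:
    # floor(x) is treated as a constant by autograd, so the gradient flows
    # through the cubic term (x - floor(x)) ** 3.
    return torch.floor(x) + (x - torch.floor(x)) ** 3
```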

2.3. Decoder

The decoder structure is illustrated in Figure 3. It operates as the inverse process of the encoder, excluding the attention mechanism (AM), and primarily consists of multiple Inception-ResNet modules that extract the watermark from the distorted watermarked image I_no. Since the watermark is redundantly embedded across the entire image, with a single bit of information corresponding to modifications in multiple pixels, the Inception-ResNet module is introduced to enhance extraction accuracy. This module integrates convolutional layers with different kernel sizes to capture multi-scale feature information, improving the recovery of the extracted watermark M'. The decoder optimizes its parameters θ_D by minimizing the decoding loss L_D, defined as
L_D = MSE(M, M') = ||M − M'||_2^2 / L
Here, M refers to the watermark prior to embedding, while M' corresponds to the one retrieved after decoding, and L denotes the bit length of the full watermark sequence. This component of the loss function calculates the average squared deviation between the original and extracted watermark data, effectively assessing the decoder's recovery precision. Minimizing L_D guides the decoder to achieve reliable extraction performance, maintaining high reconstruction integrity even when the watermarked image undergoes distortion or degradation.
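A short sketch of the decoding loss and bit recovery is given below; thresholding the decoder output at 0.5 to obtain the binary watermark M' is an assumption not stated explicitly in the text.

```python
import torch
import torch.nn.functional as F

def decoding_loss(wm_true: torch.Tensor, wm_pred: torch.Tensor) -> torch.Tensor:
    # Mean squared deviation between the embedded and decoded watermark,
    # averaged over the L watermark bits.
    return F.mse_loss(wm_pred, wm_true)

def extract_bits(wm_pred: torch.Tensor) -> torch.Tensor:
    # Hypothetical hard decision: threshold at 0.5 to recover the binary watermark M'.
    return (wm_pred > 0.5).float()
```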

2.4. Discriminator

Figure 4 shows the architecture of the discriminator, which receives both the host image I_co and the watermarked image I_en as inputs. These images pass through three convolutional layers, each with 64 filters and a 3 × 3 kernel, followed by global average pooling and a final fully connected layer. The output is a value between 0 and 1, where values near 1 indicate a host image and values close to 0 correspond to a watermarked version. To stabilize the training process, spectral normalization (SN) is applied. Throughout training, the encoder functions as the generator, aiming to produce watermarked images that are indistinguishable from the original host images, making it challenging for the discriminator to differentiate between the two.
The generator loss L G and the classification loss L A are defined as follows:
L_G = log(1 − A(I_en))
L_A = log(1 − A(I_co)) + log(A(I_en))
where I_co denotes the host image, I_en represents the watermarked image, and A(·) refers to the discriminator's output, which estimates the likelihood that an image is a host image. The generator loss L_G encourages the encoder to create watermarked images that closely resemble the host images by reducing the discriminator's ability to classify them correctly. Simultaneously, the classification loss L_A ensures that the discriminator learns to distinguish between host and watermarked images effectively. Minimizing both losses improves the visual quality of the watermarked images while also enhancing the discriminator's classification accuracy.
Table 2 outlines the architecture of the discriminator in the proposed SIR-DCGAN model. It adopts a standard DCGAN-based design, which progressively downsamples the input image through a series of convolutional layers. Each layer is followed by LeakyReLU activation and batch normalization (except the first), ensuring stable training and effective feature extraction. The final sigmoid output produces a probability indicating whether the input image is real or generated. This architecture supports the adversarial training process to improve watermark imperceptibility and authenticity.
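For illustration, a compact discriminator in the spirit of this description is sketched below, using three 3 × 3 convolutions with 64 filters, spectral normalization, global average pooling, and a sigmoid head. Where the text and Table 2 differ (e.g., spectral versus batch normalization), the choices here are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Discriminator(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.features = nn.Sequential(
            spectral_norm(nn.Conv2d(in_ch, 64, 3, padding=1)), nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(64, 64, 3, padding=1)),    nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(64, 64, 3, padding=1)),    nn.LeakyReLU(0.2, inplace=True),
        )
        self.head = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, img):                          # img: (B, C, H, W)
        feat = self.features(img).mean(dim=(2, 3))   # global average pooling
        return self.head(feat)                       # probability that the input is a host image
```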

3. Experiments

The experimental procedure in this work is organized into several stages. In the initial phase, the Cats and Dogs dataset [31] and the SIRI-WHU remote sensing image dataset [32] are used for training. Of the total images, 80% are designated for training, and the remaining 20% are set aside for testing. To assess the model's ability to generalize, additional datasets such as COCO [33], VOC2012 [34], and CelebA [35] are incorporated. The proposed method is designed to handle grayscale images of arbitrary dimensions for both the watermark and the cover image. These images are resized to a uniform 128 × 128 resolution and normalized to a pixel range of 0 to 1.
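A minimal data preparation sketch consistent with this setup is shown below; the dataset path, the use of ImageFolder, and the loader structure are placeholders/assumptions.

```python
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Grayscale conversion, resizing to 128 x 128, and scaling to [0, 1].
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((128, 128)),
    transforms.ToTensor(),  # maps pixel values to the [0, 1] range
])

# "path/to/SIRI-WHU" is a placeholder; any image-folder layout works.
dataset = datasets.ImageFolder("path/to/SIRI-WHU", transform=transform)
n_train = int(0.8 * len(dataset))                      # 80/20 train-test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
```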
In the model training phase, SIR-DCGAN, implemented in PyTorch 2.0, is trained with the Adam optimizer and a learning rate of 0.0003. The model's performance is evaluated using PSNR and SSIM for watermark invisibility, while normalized correlation (NC) is employed to assess robustness against image processing attacks. The results show that SIR-DCGAN achieves strong performance, with PSNR and SSIM values of 40.93 dB and 0.9957, respectively, indicating minimal perceptible difference between the watermarked and original images. Additionally, the model's average processing time is 0.37 s, faster than comparable methods.
A comparison with other existing methods demonstrates that SIR-DCGAN consistently achieves superior PSNR and SSIM scores, while its NC values stay above 0.9 across a range of image processing attacks, confirming its robustness. These experimental findings collectively validate the effectiveness of SIR-DCGAN for remote sensing image copyright protection and content authentication, highlighting its exceptional performance with low computational overhead.
Figure 5 provides a schematic overview of the watermark embedding process, highlighting the imperceptibility of SIR-DCGAN. It includes the original image, the embedded watermark, a representation of the attention mechanism, and the final watermarked image. As illustrated in Figure 5a,d, the original and watermarked images appear nearly identical, demonstrating the effective concealment of the watermark. Figure 5c visualizes the attention mechanism applied to various regions, with different colors signifying varying levels of attention. For instance, red areas, which are rich in texture, are designated for higher watermark embedding intensity, while other areas receive lower intensity. The embedding process is outlined in steps (a) through (d). At the receiver's end, the concealed watermark serves as proof of authenticity and document integrity, allowing verification without ambiguity.
This experiment uses noise attacks to validate the robustness of the proposed method. The first row of Figure 6 shows the watermarked image I_en obtained after embedding the watermark into the host image under joint attacks from multiple noise layers. The second row displays the corresponding noisy image I_no obtained after the watermarked image I_en undergoes the noise layer attacks. The third row presents the difference image |I_en − I_no|, with the grayscale space [36] appropriately magnified.
This study employs a two-stage training approach, where the decoder parameters are fixed. This method allows real JPEG compression attacks to be directly integrated into the training process, enhancing the model’s robustness against actual JPEG compression. By incorporating these compression distortions during training, the model learns to effectively handle real-world JPEG compression, improving its ability to maintain watermark integrity under such conditions.
The hyperparameters [37] used for network training are summarized in Table 3.

4. Results and Discussion

This section presents a detailed performance assessment of the proposed watermarking approach across various evaluation criteria.
To begin with, we assess the watermark imperceptibility by calculating PSNR and SSIM scores and benchmark the results against the approaches proposed by Ahmadi et al., Jia et al., and Huang et al.
Subsequently, we examine the model’s resilience to noise by computing the average watermark embedding accuracy (BA) and contrast it with the outcomes achieved by Mahapatra et al., Jia et al., and Madhu et al., illustrating its robustness under diverse interference scenarios.
Lastly, the model’s computational efficiency is evaluated through its execution time, with a comparative analysis against other existing techniques [38], underscoring both its practicality and speed.

4.1. Invisibility

To assess the perceptual transparency of the proposed watermarking technique, we utilize two standard evaluation indicators: the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). These indicators are commonly applied to determine how closely the watermarked image approximates the original, thereby measuring the degree of visual distortion introduced during watermark insertion. A higher value in either metric typically signifies that the embedded watermark causes minimal perceptible change, preserving the overall visual fidelity of the host image. PSNR quantifies the difference in pixel values between the original and watermarked images, while SSIM assesses the structural similarity by considering luminance, contrast, and texture. Together, these metrics provide a comprehensive evaluation of watermark invisibility, ensuring that the watermarking process maintains high visual quality.
The choice of evaluation metrics in this work is grounded in both their theoretical robustness and frequent citation in the literature on general and remote sensing image watermarking. PSNR serves to capture signal fidelity at the pixel level, making it well suited for assessing minor distortions. SSIM, by contrast, focuses on preserving spatial structures, which is critical for images rich in textures and man-made features, as often seen in remote sensing applications. Additionally, normalized correlation (NC) is employed to quantify the consistency between the recovered watermark and its original version, especially when the image has been subjected to various forms of degradation. Together, PSNR, SSIM, and NC offer a multi-perspective evaluation of both watermark invisibility and durability. This comprehensive set of metrics ensures that the performance of the proposed watermarking method is rigorously assessed across multiple dimensions. While PSNR and SSIM predominantly measure visual quality, NC highlights the robustness of watermark retrieval, ensuring that both aspects are adequately considered in the evaluation process. Consequently, these metrics provide a balanced approach to gauge the efficacy of the watermarking technique in real-world applications.
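For reference, hedged implementations of the three metrics follow; PSNR and NC use their standard definitions, and SSIM is delegated to scikit-image, which may differ slightly from the authors' exact evaluation scripts.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(orig: np.ndarray, marked: np.ndarray, peak: float = 1.0) -> float:
    # Peak signal-to-noise ratio in dB for images normalized to [0, peak].
    mse = np.mean((orig - marked) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim(orig: np.ndarray, marked: np.ndarray) -> float:
    # Structural similarity via scikit-image, assuming images in [0, 1].
    return structural_similarity(orig, marked, data_range=1.0)

def nc(wm_true: np.ndarray, wm_extracted: np.ndarray) -> float:
    # Normalized correlation between the embedded and extracted watermarks.
    num = np.sum(wm_true * wm_extracted)
    den = np.sqrt(np.sum(wm_true ** 2)) * np.sqrt(np.sum(wm_extracted ** 2))
    return float(num / den)
```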
To demonstrate the effectiveness of the proposed model, we conducted an experimental verification of watermark invisibility using both PSNR and SSIM metrics. Table 4 presents the comparison results between our method and existing state-of-the-art (SOTA) watermarking models. We compared the performance of methods proposed by Ahmadi et al. [10], Jia et al. [20], Huang et al. [21], Madhu et al. [39], Wei et al. [40], and Mahapatra et al. [41], using PSNR (in dB) and SSIM as evaluation criteria to verify the improvement in watermark invisibility achieved by the network structure proposed in this study.
Our method achieves its best PSNR and SSIM values on the SIRI-WHU remote sensing image dataset, exceeding 39.99 dB and 0.9958, respectively, higher than on the other datasets. This indicates that our method is well suited to remote sensing images, which often contain complex textures and spatial features that are critical in this domain. Although the PSNR value on the Cats and Dogs dataset and the SSIM value on the CelebA dataset are relatively lower than on the other datasets, they still represent improvements over existing methods. These differences can be attributed to the unique characteristics of each dataset, such as variations in image content, resolution, and complexity. Therefore, these lower values do not imply poor performance; rather, they indicate that the method performs slightly less well on these datasets than on the remote sensing data, which is more closely aligned with the intended application of the watermarking technique.
It is worth emphasizing that the proposed approach consistently achieves higher PSNR and SSIM scores across all benchmark datasets utilized in this research, demonstrating a clear performance advantage over previously published techniques. This consistent outperformance highlights the robustness and generalizability of the proposed method across diverse image types. This enhancement in both metrics underscores the method’s effectiveness in embedding watermarks with minimal visual degradation, affirming its suitability for high-fidelity digital watermarking scenarios. These results also suggest that the method is adaptable to various image conditions, reinforcing its potential for practical deployment in real-world applications.
PSNR [42] and SSIM [43] serve as indicators of visual similarity between the watermarked output and the original cover image, offering insights into the imperceptibility of the watermarking process. The comparative results provided in the accompanying table reveal that the network architecture proposed in this work achieves superior results when compared to leading existing frameworks. As illustrative examples, the highest PSNR outcomes achieved by the competing methods developed by Ahmadi et al., Jia et al., Huang et al., Madhu et al., Wei et al., and Mahapatra et al. are 35.93 dB, 36.64 dB, 35.87 dB, 37.54 dB, 37.91 dB, and 31.33 dB, respectively, all of which fall short of the performance attained by our method.
Table 4 presents a comprehensive evaluation of the proposed method’s performance across various datasets, showcasing its remarkable effectiveness and versatility. In particular, the method achieves an impressive PSNR value of 40.93 and an SSIM score of 0.9957 when tested on the SIRI-WHU remote sensing image dataset. These results not only emphasize the model’s exceptional ability to preserve visual quality but also underline its capacity to maintain structural coherence in remote sensing images. The high PSNR value signifies that the watermarked images are nearly indistinguishable from the original images, while the SSIM score further affirms that the embedding process causes minimal structural degradation, preserving the essential features of the image.
Moreover, despite being designed with a focus on remote sensing imagery, the proposed model also performs strongly on several general-purpose datasets, such as COCO, VOC2012, and CelebA, outperforming existing methods in these domains. This finding reinforces the model's capability on remote sensing images while underscoring its generalization across a wide range of image types and tasks. Excelling on these general-purpose datasets, which are traditionally used in conventional computer vision tasks, illustrates the model's robustness and versatility: it is not limited to a specific application area but adapts to different types of visual data while maintaining accuracy, visual quality, and structural integrity. Such adaptability is crucial for practical deployment, where datasets come from varied sources with differing characteristics, yet the model still meets the demands of both remote sensing and general-purpose tasks.
To deepen the understanding of the relationship between the two key quality metrics, PSNR and SSIM, Figure 7 provides a visual representation of their joint distribution. This figure offers a detailed view of the correlation between these metrics, helping to elucidate the interplay between visual imperceptibility and structural integrity in watermarked images. The joint distribution depicted in Figure 7 reveals denser red regions, which represent areas where particular combinations of PSNR and SSIM values occur more frequently. These areas are indicative of the regions where the watermarked images perform exceptionally well, preserving both the quality and the structure of the original host images.
The PSNR values generally fall within the range of 35 to 44, with higher PSNR values suggesting that the watermarked images closely resemble the original host images, showing little to no visible distortion. The SSIM values, on the other hand, typically range from 0.94 to 1.0, with values approaching 1 signifying that the watermark embedding process has had minimal impact on the structural features of the image. This range of SSIM values further illustrates the model’s ability to retain the structural integrity of the watermarked images, ensuring that the watermark does not distort critical features like edges, textures, or shapes.
The noticeable clustering of PSNR and SSIM values within these ranges reinforces the conclusion that the proposed method strikes an optimal balance between embedding a strong watermark and preserving the quality and structural fidelity of the watermarked images. This consistent performance across both metrics demonstrates the model’s robustness and effectiveness in achieving imperceptible watermarks while maintaining high levels of image quality and structural coherence. As a result, this method stands out as a highly effective solution for watermarking in remote sensing images, offering a promising approach for safeguarding the integrity and confidentiality of sensitive remote sensing data while ensuring minimal visual and structural degradation.

4.2. Robustness

This section examines the robustness [44] of the proposed watermarking method against various image distortions, specifically assessing its resilience to different types of attacks. Watermark accuracy is quantified using the normalized correlation (NC) metric. When compared to leading methods in the field, our approach demonstrates a clear advantage in terms of robustness. Experiments across multiple datasets show its strong performance despite a range of noise and manipulation types.
To further assess the model’s effectiveness, we calculate watermark embedding accuracy (BA) for test images. The results are compared with the approaches by Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39], considering different noise levels, as shown in Figure 8. In panels (a)–(e), the BA of our model consistently outperforms others, particularly in scenarios involving Gaussian blur, Resize, and JPEG compression. This emphasizes the superior robustness of our method to various distortions. While other models, such as those by Mahapatra et al., Jia et al., and Madhu et al., produce higher image quality, our model achieves better BA, underscoring the importance of the noise sub-network in defending against attacks. Notably, Mahapatra et al.’s model exhibits a BA below 75% in certain conditions, such as with Gaussian blur (δ ≥ 1.25) and JPEG compression (Q ≤ 90), highlighting its limited capacity to protect against image distortions.
To further evaluate robustness, the impact of multiple noise types combined is tested, as illustrated in Figure 9a–d. The findings demonstrate that our model consistently outperforms the other methods in terms of BA, especially under more complex distortion conditions. This highlights the strength of our model not only in handling individual noise types but also in managing more intricate scenarios with multiple distortions. In conclusion, the results confirm that our approach excels in both isolated and combined noise environments, reinforcing its effectiveness and resilience.
Figure 10 showcases the resilience of our method against a range of image processing attacks. We tested the system under 20 different distortions, including filtering, geometric changes, histogram equalization, and varying levels of additive noise. In each case, the watermark was successfully retrieved using the proposed extraction technique. The results demonstrate that the NC values remain above 0.85 in all tested scenarios, with most cases exceeding 0.90, reflecting the robustness of the method. As anticipated, the NC values gradually decrease with higher noise intensities, especially under strong Gaussian or salt-and-pepper noise.
Table 5 provides a detailed comparison of the robustness of the proposed approach against recent state-of-the-art methods by Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39] using normalized correlation (NC) scores. The results demonstrate that the proposed method consistently performs well across a variety of image processing attacks, such as Gaussian noise, salt-and-pepper noise, speckle noise, histogram equalization, sharpening, and rotation distortions. Particularly, when subjected to median filtering, Gaussian low-pass filtering, and motion blur, the method achieves NC values of 0.9574, 0.9494, and 0.9473, respectively. These results highlight the method’s excellent robustness in preserving watermark integrity under different attack scenarios, showcasing its superior performance over the existing methods.

4.3. Computational Cost Analysis

This section evaluates the computational efficiency of the proposed watermarking method by measuring its execution time. Computational cost plays a vital role in assessing the practicality of watermarking algorithms, as it affects both the watermark embedding and extraction processes. To make this assessment, we compare the average execution time of our method with other advanced techniques, focusing on the time taken for both embedding and extraction. The results highlight the efficiency of our approach, demonstrating its potential for real-time applications.
A further evaluation of the proposed method’s performance was carried out through a computational time comparison, as presented in Table 6. The method’s overall execution time is 0.38 s. Specifically, watermark embedding takes around 0.014 s, while the extraction process requires approximately 0.023 s. These times are significantly shorter than those of comparable methods.
To further elaborate on the computational advantages shown in Table 6, we examine the design and complexity of our approach. The SIR-DCGAN framework utilizes a DCGAN architecture, which is known for its efficient convolutional structure and quick training convergence. This lightweight design inherently minimizes computational demands, offering a clear advantage over heavier CNN- or transformer-based watermarking systems.
Additionally, two custom-designed modules, namely IR-FFM (the Inception-ResNet feature fusion module) and SE-AM (the squeeze-and-excitation attention mechanism), are carefully optimized for remote sensing scenarios. Unlike traditional attention and fusion blocks that rely on dense tensor operations or global attention across all pixels, IR-FFM adopts multi-scale kernel fusion combined with residual structures to extract hierarchical features more efficiently. Similarly, SE-AM decouples spatial and channel-wise attention and concentrates embedding strength in visually insensitive regions, minimizing redundant computations.
Moreover, our training configuration (learning rate = 0.0015, β1 = 0.9, and β2 = 0.999) enables rapid convergence with only 1000 epochs and a moderate batch size of 32, avoiding prolonged training cycles. Together with a low overall parameter count (approximately 1.42 million trainable parameters), these design choices significantly reduce FLOPs during both the embedding and extraction phases. As a result, our model achieves fast inference with an average execution time of 0.37 s, including only 0.014 s for embedding and 0.023 s for extraction, which is markedly faster than other state-of-the-art methods.

4.3.1. Evaluation Conditions

All execution time comparisons were conducted under the same hardware and software environment to ensure fairness and consistency. Specifically, all methods were run on an Intel Core i7-14700kf CPU with 32 GB of RAM and an NVIDIA RTX 4090 GPU, under Windows 11 and Python 3.10 with CUDA 11.7 and PyTorch 2.0. Execution time was measured using the same Python-based timing functions to avoid discrepancies caused by framework differences. When execution times were not available from the source code, we re-implemented baseline methods based on their original papers and ensured consistent inference configurations. For the comparison with other methods, we ensured that all algorithms were executed under similar conditions by using the same software libraries and hardware specifications to eliminate any potential bias caused by system configurations.
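A simple timing harness consistent with these conditions is sketched below; the warm-up count, run count, and model/input names are assumptions.

```python
import time
import torch

def time_inference(model: torch.nn.Module, x: torch.Tensor,
                   device: str = "cuda", warmup: int = 10, runs: int = 100) -> float:
    # Average per-batch inference time in seconds, measured after warm-up
    # and with explicit GPU synchronization for fair comparison.
    model = model.eval().to(device)
    x = x.to(device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```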
In terms of algorithm implementation, we used Python with PyTorch for our method, while the comparison methods (Huang et al., 2021 [21]; Samee et al., 2020 [45]; etc.) were implemented using OpenCV (https://opencv.org/) and Matlab (https://www.mathworks.com/products/matlab.html), which may have different levels of optimization for speed. These discrepancies in tools and libraries could potentially influence the execution times. However, we believe that our method's superior performance stems primarily from the architectural optimizations discussed earlier, such as the lightweight DCGAN backbone and the IR-FFM and SE-AM modules.
In addition to the above empirical runtime observations, we provide a theoretical analysis of architectural efficiency to further support the claimed computational advantages.
To this end, we compare the number of trainable parameters and the estimated number of floating-point operations (FLOPs) between our SIR-DCGAN framework and three representative deep learning-based watermarking methods: Huang et al. [21], Samee et al. [45], and Yu et al. [46]. These works represent commonly adopted encoder–decoder or GAN-based watermarking schemes. As shown in Table 7, our model uses fewer parameters and achieves significantly lower FLOPs, reflecting the efficiency of the lightweight DCGAN backbone and our task-specific attention (SE-AM) and fusion (IR-FFM) modules.
Furthermore, to provide a theoretical foundation for the observed performance gap, we include a time complexity analysis using the Big-O notation. For traditional hybrid watermarking algorithms such as DWT-SVD, the embedding and extraction procedures often rely on wavelet transforms and matrix decomposition, resulting in a complexity of approximately O(n2 log n). Deep learning-based methods typically fall into O(n3) time due to their multi-layer convolutional computations over n × n images.
In contrast, the SIR-DCGAN framework benefits from a more compact encoder–decoder design with limited convolutional depth and selective activation via the SE-AM module. This enables an effective inference complexity of O(n2) in practice. Such architectural streamlining reduces memory and compute overhead while maintaining high accuracy, making our model particularly well suited for real-time remote sensing applications where computational resources may be constrained.
These findings, together with the empirical measurements presented earlier, validate that the proposed model achieves both high efficiency and competitive performance from theoretical and practical perspectives.

4.3.2. Ablation Study on the IR-FFM and SE-AM Modules

To further evaluate the individual contributions of the IR-FFM and SE-AM modules in enhancing watermark robustness, we conducted an ablation study by progressively removing each component from the full SIR-DCGAN architecture. Specifically, we compare four model variants: (i) a version without IR-FFM, (ii) a version without SE-AM, (iii) a plain DCGAN backbone without either component, and (iv) the full SIR-DCGAN framework.
We report the watermarking performance of each variant on the AID dataset (256 × 256) in Table 8, using PSNR, SSIM, and NC as evaluation metrics. The results show that removing either component degrades both visual quality and extraction accuracy, while the full model achieves the best performance across all metrics. This demonstrates the effectiveness and necessity of both modules.
To further validate the role of each module under distortion, we evaluate the NC values of these variants under JPEG compression (quality = 30). As shown in Figure 11, the full model exhibits significantly higher robustness against lossy compression, while variants lacking either or both modules suffer more pronounced degradation. This indicates that both IR-FFM and SE-AM not only improve embedding quality but also contribute to the overall resilience of the watermarking process.

4.4. Validation of Commonality of Multiple Remote Sensing Datasets

To further demonstrate the robustness and generalization ability of the proposed SIR-DCGAN watermarking framework, we extend our experiments to additional remote sensing datasets and image resolutions, which is in line with reviewer suggestions. Specifically, we conduct cross-dataset validation by testing two new publicly available remote sensing datasets, namely AID (Aerial Image Dataset) and NWPU-RESISC45 (Northwestern Polytechnical University Remote Sensing Image Scene Classification). Moreover, we evaluate the model’s performance at a higher resolution of 256 × 256 pixels to confirm its scalability and adaptability to larger image sizes, which are common in real-world remote sensing applications.
In addition to the originally used UC Merced Land Use Dataset, we select the following datasets for this experiment: AID, which contains 10,000 aerial scene images across 30 classes with spatial resolutions ranging from 0.5 to 8 m, and NWPU-RESISC45, which includes 31,500 remote sensing images categorized into 45 scene classes. From each dataset, we randomly select 1000 images for training and 200 for testing. All images are resized to 256 × 256 pixels. For consistency, the same watermark embedding and extraction pipeline is used without any retraining or parameter adjustment.
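The data preparation for this experiment amounts to random sampling and resizing; a minimal sketch is given below. The directory layout, file extensions, and function name are illustrative assumptions, not the datasets' actual organization or our released script.

```python
# Minimal sketch (assumed file layout): sample 1000 training and 200 test images
# per dataset and resize them to 256 x 256 pixels.
import random
from pathlib import Path
from PIL import Image

def sample_and_resize(src_dir: str, dst_dir: str, n_train: int = 1000,
                      n_test: int = 200, size: int = 256, seed: int = 0) -> None:
    paths = sorted(Path(src_dir).rglob("*.jpg")) + sorted(Path(src_dir).rglob("*.tif"))
    random.Random(seed).shuffle(paths)
    splits = {"train": paths[:n_train], "test": paths[n_train:n_train + n_test]}
    for split, items in splits.items():
        out = Path(dst_dir) / split
        out.mkdir(parents=True, exist_ok=True)
        for p in items:
            img = Image.open(p).convert("RGB").resize((size, size))  # bicubic by default
            img.save(out / (p.stem + ".png"))  # save losslessly to avoid extra compression
```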
We employ the same evaluation metrics as previous experiments: PSNR (peak signal-to-noise ratio) to assess watermark invisibility, SSIM (structural similarity index) to measure structural fidelity, and NC (normalized correlation) to evaluate watermark extraction accuracy.
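For completeness, PSNR and SSIM can be computed with widely used open-source implementations; the snippet below uses scikit-image, which is one common choice rather than necessarily the exact implementation used in our experiments. NC is computed as in the earlier sketch.

```python
# Minimal sketch: PSNR and SSIM between the host and watermarked images (8-bit RGB).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(host: np.ndarray, marked: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(host, marked, data_range=255)
    ssim = structural_similarity(host, marked, channel_axis=-1, data_range=255)
    return {"PSNR": psnr, "SSIM": ssim}
```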
Table 9 reports the quantitative results across all datasets. It is evident that the proposed method maintains high performance across the datasets and resolutions. Even with larger image sizes and different distributions, SIR-DCGAN achieves only a marginal decrease in PSNR and SSIM compared to the baseline dataset while maintaining an NC above 0.996. This demonstrates the method’s strong ability to adapt to diverse remote sensing image characteristics without compromising performance.
To further assess the performance of our method in real-world applications, we conduct a comparative analysis using the AID and NWPU-RESISC45 datasets, both at a resolution of 256 × 256. The comparison involves traditional DWT-SVD [47], a recent deep learning-based U-Net model [48], and a cutting-edge ARWGAN approach [21]. The results, as presented in Table 10, show that SIR-DCGAN consistently outperforms the other techniques, yielding the highest PSNR, SSIM, and NC values across both datasets. These findings highlight the method’s superior watermark invisibility, robustness against a variety of image distortions, and resilience to degradation.
Overall, this extended evaluation confirms that our model is not overfitted to the UC Merced dataset. Its performance remains stable and reliable across different remote sensing datasets, image resolutions, and distortion types. These findings validate the general applicability and robustness of our approach in real-world remote sensing image protection scenarios.

5. Conclusions

This paper addresses the vulnerability of remote sensing images to tampering, attacks, and distortion during transmission and storage. We propose SIR-DCGAN, an attention-guided feature fusion watermarking method for remote sensing images based on a deep convolutional generative adversarial network (DCGAN). A series of experiments was conducted to validate the effectiveness of this method, with the following results:
First, in terms of invisibility, the proposed approach outperforms existing methods, as evidenced by evaluations based on PSNR and SSIM. On the SIRI-WHU remote sensing image dataset, the method attains a PSNR of 40.93 dB and an SSIM of 0.9957, reflecting its exceptional ability to maintain image quality after embedding the watermark. These findings underscore the suitability and practicality of SIR-DCGAN for remote sensing image applications.
Second, in terms of robustness, the method demonstrates excellent resistance to interference, as evidenced by bit accuracy (BA) and normalized correlation (NC) values. The method outperforms existing approaches, particularly under Gaussian blur and JPEG compression, where its BA is significantly higher. This validates the method’s robust protection of watermarks in complex environments.
Finally, in terms of computational cost, the method is efficient, with an average execution time of 0.37 s, making it practical for real-world applications. Compared with other similar approaches, both the watermark embedding and extraction times are notably reduced, further enhancing the method’s practicality.
Although the proposed method shows impressive performance in securing remote sensing images, it does have limitations, such as being a non-blind watermarking method that requires the host image for extraction. Future work will focus on optimizing the model to support a broader range of watermark types and exploring the potential of blind watermarking techniques for securing remote sensing images.
We also recognize the ethical implications of digital watermarking technologies when applied in sensitive domains such as remote sensing. While our proposed SIR-DCGAN framework is designed solely for lawful and academic purposes, including copyright protection, provenance verification, and secure information embedding, we emphasize that responsible deployment is essential. To mitigate the risks of misuse, we discourage application in contexts involving deceptive image manipulation or non-consensual watermark embedding. In future work, we also plan to explore reversible watermarking and access-controlled watermark operations to enhance transparency, traceability, and user accountability.
In addition to the strong empirical performance in terms of runtime, we attribute the significant improvement in computational efficiency to several key architectural optimizations. These include the use of a lightweight DCGAN-based backbone that reduces convolutional depth without compromising feature expressiveness, as well as the incorporation of the IR-FFM feature fusion module and the SE-AM attention mechanism. Together, these components streamline the embedding and extraction processes by minimizing redundant computations and targeting perceptually insensitive regions, resulting in both faster inference and reduced resource consumption.
In summary, SIR-DCGAN provides an efficient and robust solution for securing remote sensing images. It offers promising applications in remote sensing data copyright protection, content authentication, and secure transmission, laying a solid technological foundation for these fields.

Author Contributions

Conceptualization, S.P., X.Y., M.D. and P.L.; methodology, S.P. and X.Y.; software, S.P.; validation, S.P., X.Y. and P.L.; formal analysis, S.P.; investigation, S.P.; resources, S.P.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, S.P.; visualization, M.D.; supervision, P.L.; project administration, S.P. and X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of the Xinjiang Production and Construction Corps (Bingtuan), under the project titled “Research on Spatial Optimization Methods for Multi-functional Ecological Protection Forests in Southern Xinjiang” (Approval Number: S2022AB6909; Task Order Number: 2023CB008-22). The project was undertaken by Shihezi University from 10 May 2023 to 10 May 2026.

Data Availability Statement

The original data presented in the study are openly available in the following datasets: [Cats and Dogs dataset] accessed on 14 March 2025 [https://www.microsoft.com/en-us/download/details.aspx?id=54765]; [SIRI-WHU Remote Sensing Image dataset] accessed on 14 March 2025 [https://irip-buaa.github.io/posts/SIRI-WHU/]; [COCO dataset] accessed on 14 March 2025 [https://cocodataset.org/]; [VOC2012 dataset] accessed on 14 March 2025 [http://host.robots.ox.ac.uk/pascal/VOC/voc2012/]; [CelebA dataset] accessed on 14 March 2025 [https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html].

Acknowledgments

This research was supported by Shihezi University under grant number 2023CB008-22. We are grateful for this financial support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AM: Attention Mechanism
BA: Bit Accuracy (Watermark Embedding Accuracy)
NC: Normalized Correlation (Watermark Extraction Correlation)
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index Measure
CNN: Convolutional Neural Network
FFM: Feature Fusion Module
GAN: Generative Adversarial Network
DCGAN: Deep Convolutional Generative Adversarial Network
SE-AM: Squeeze-and-Excitation Attention Mechanism
IR-FFM: Information-Retaining Feature Fusion Module
SIR-DCGAN: Spatial Information Retention Deep Convolutional Generative Adversarial Network

References

1. Ye, C.; Tan, S.; Wang, J.; Shi, L.; Zuo, Q.; Feng, W. Social Image Security with Encryption and Watermarking in Hybrid Domains. Entropy 2025, 27, 276.
2. Ferik, B.; Laimeche, L.; Meraoumia, A.; Laouid, A.; AlShaikh, M.; Chait, K.; Hammoudeh, M. An Efficient Semi-blind Watermarking Technique Based on ACM and DWT for Mitigating Integrity Attacks. Arab. J. Sci. Eng. 2025, 1–21.
3. Kricha, A.; Kricha, Z.; Sakly, A. Robust image watermarking in DCT domain using optimal inter-block difference. Concurr. Comput. Pract. Exp. 2023, 35, e7857.
4. Hebbache, K.; Aiadi, O.; Khaldi, B.; Benziane, A. Blind Medical Image Watermarking Based on LBP–DWT for Telemedicine Applications. Circuits Syst. Signal Process. 2025, 1–26.
5. Alshoura, W.H.; Alawida, M. Secure and flexible image watermarking using IWT, SVD, and chaos models for robustness and imperceptibility. Sci. Rep. 2025, 15, 7231.
6. Cong, R.; Yang, N.; Li, C.; Fu, H.; Zhao, Y.; Huang, Q.; Kwong, S. Global-and-local collaborative learning for co-salient object detection. IEEE Trans. Cybern. 2022, 53, 1920–1931.
7. Shi, J.; Zhang, Z.; Tan, C.; Liu, X.; Lei, Y. Unsupervised multiple change detection in remote sensing images via generative representation learning network. IEEE Geosci. Remote Sens. Lett. 2021, 19, 5001505.
8. Haribabu, K.; Subrahmanyam, G.R.K.S.; Mishra, D. A robust digital image watermarking technique using auto encoder based convolutional neural networks. In Proceedings of the 2015 IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI), Chennai, India, 10–11 December 2015; pp. 1–6.
9. Mun, S.M.; Nam, S.H.; Jang, H.; Kim, D.; Lee, H. Finding robust domain from attacks: A learning framework for blind watermarking. Neurocomputing 2019, 337, 191–202.
10. Ahmadi, M.; Norouzi, A.; Karimi, N.; Samavi, S.; Emami, A. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Syst. Appl. 2020, 146, 113157.
11. Luo, X.; Zhan, R.; Chang, H.; Yang, F.; Milanfar, P. Distortion agnostic deep watermarking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13548–13557.
12. Huang, Z.; Zhang, J.; Zhang, Y.; Shan, H. DU-GAN: Generative adversarial networks with dual-domain U-Net-based discriminators for low-dose CT denoising. IEEE Trans. Instrum. Meas. 2021, 71, 4500512.
13. Zhu, J.; Kaplan, R.; Johnson, J.; Li, F.-F. HiDDeN: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 657–672.
14. Hao, K.; Feng, G.; Zhang, X. Robust image watermarking based on generative adversarial network. China Commun. 2020, 17, 131–140.
15. Liu, Y.; Guo, M.; Zhang, J.; Zhu, Y.; Xie, X. A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1509–1517.
16. Xu, S.; Li, Z.; Zhang, Z.; Liu, J. An End-to-End Robust Video Steganography Model Based on a Multi-Scale Neural Network. Electronics 2022, 11, 4102.
17. Ma, R.; Guo, M.; Hou, Y.; Yang, F.; Li, Y.; Jia, H.; Xie, X. Towards blind watermarking: Combining invertible and non-invertible mechanisms. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1532–1542.
18. Zhong, X.; Huang, P.C.; Mastorakis, S.; Shih, F.Y. An automated and robust image watermarking scheme based on deep neural networks. IEEE Trans. Multimed. 2020, 23, 1951–1961.
19. Aberdam, A.; Sulam, J.; Elad, M. Multi-layer sparse coding: The holistic way. SIAM J. Math. Data Sci. 2019, 1, 46–77.
20. Jia, Z.; Fang, H.; Zhang, W. MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 41–49.
21. Huang, J.; Luo, T.; Li, L.; Yang, G.; Chen, B. ARWGAN: Attention-guided robust image watermarking model based on GAN. IEEE Trans. Instrum. Meas. 2023, 72, 5018417.
22. Fernandez, P.; Sablayrolles, A.; Furon, T.; Jégou, H.; Douze, M. Watermarking images in self-supervised latent spaces. In Proceedings of the ICASSP 2022—IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 3054–3058.
23. Sürücü, S.; Diri, B. A hybrid approach for the detection of images generated with multi generator MS-DCGAN. Eng. Sci. Technol. Int. J. 2025, 63, 101969.
24. Wu, M.; Jin, X.; Jiang, Q.; Lee, S.-J.; Liang, W.; Lin, G.; Yao, S. Remote sensing image colorization using symmetrical multi-scale DCGAN in YUV color space. Vis. Comput. 2021, 37, 1707–1729.
25. Jia, N.; Tian, X.; Gao, W.; Jiao, L. Deep graph-convolutional generative adversarial network for semi-supervised learning on graphs. Remote Sens. 2023, 15, 3172.
26. Zhu, H.; Lu, Z.; Zhang, C.; Yang, Y.; Zhu, G.; Zhang, Y.; Liu, H. Remote sensing classification of offshore seaweed aquaculture farms on sample dataset amplification and semantic segmentation model. Remote Sens. 2023, 15, 4423.
27. Shanmugam, P.; Amali, S.A.M.J. Dual-discriminator conditional generative adversarial network optimized with hybrid manta ray foraging optimization and volcano eruption algorithm for hyperspectral anomaly detection. Expert Syst. Appl. 2024, 238, 122058.
28. Liu, B.; Song, W.; Zheng, M.; Fu, C.; Chen, J.; Wang, X. Semantically enhanced selective image encryption scheme with parallel computing. Expert Syst. Appl. 2025, 279, 127404.
29. Zhang, H.; Kone, M.M.K.; Ma, X.-Q.; Zhou, N.-R. Frequency-domain attention-guided adaptive robust watermarking model. J. Frankl. Inst. 2025, 362, 107511.
30. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proc. AAAI Conf. Artif. Intell. 2017, 31, 1.
31. Elson, J.; Douceur, J.R.; Howell, J.; Saul, J. Asirra: A CAPTCHA that exploits interest-aligned manual image categorization. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS), Alexandria, VA, USA, 28–31 October 2007; pp. 366–374.
32. Yuezhong, C.; Jiaqing, W.; Heng, L. Self-Attention Multilayer Feature Fusion Based on Long Connection. Adv. Multimed. 2022, 2022, 9973814.
33. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
34. Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
35. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Large-scale CelebFaces Attributes (CelebA) Dataset. Available online: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html (accessed on 15 August 2018).
36. Larson, L.R.; Barger, B.; Ogletree, S.; Torquati, J.; Rosenberg, S.; Gaither, C.J.; Bartz, J.M.; Gardner, A.; Moody, E.; Schutte, A. Gray space and green space proximity associated with higher anxiety in youth with autism. Health Place 2018, 53, 94–102.
37. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
38. Amrit, P.; Singh, A.K.; Singh, M.P.; Husain, A.; Khan, R.A. EmbedR-Net: Using CNN to embed mark with recovery through deep convolutional GAN for secure e-health systems. IEEE Trans. Consum. Electron. 2023, 69, 1017–1022.
39. Madhu, B.; Holi, G. CNN approach for medical image authentication. Indian J. Sci. Technol. 2021, 14, 351–360.
40. Wei, Q.; Wang, H.; Zhang, G. A robust image watermarking approach using cycle variational autoencoder. Secur. Commun. Netw. 2020, 2020, 8869096.
41. Mahapatra, D.; Amrit, P.; Singh, O.P.; Singh, A.K.; Agrawal, A.K. Autoencoder-Convolutional Neural Network-Based Embedding and Extraction Model for Image Watermarking. J. Electron. Imaging 2022, 32, 021604.
42. Huynh-Thu, Q.; Ghanbari, M. The Accuracy of PSNR in Predicting Video Quality for Different Video Scenes and Frame Rates. Telecommun. Syst. 2012, 49, 35–48.
43. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
44. Weisberg, M. Robustness Analysis. Philos. Sci. 2006, 73, 730–742.
45. Samee, M.K.; Gotze, J. Increased robustness and security of digital watermarking using DS-CDMA. In Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, Giza, Egypt, 15–18 December 2007; pp. 185–189.
46. Yu, J.J.; Wang, F.L.; Zhao, L.T. A low power and complexity watermarking algorithm in DS-CDMA communication. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; Volume 9, pp. 547–551.
47. Bhandari, A.K.; Soni, V.; Kumar, A.; Singh, G. Cuckoo search algorithm based satellite image contrast and brightness enhancement using DWT–SVD. ISA Trans. 2014, 53, 1286–1296.
48. Xiang, Y.; Nurmemet, I.; Lv, X.; Yu, X.; Gu, A.; Aihaiti, A.; Li, S. Multi-Source Attention U-Net: A Novel Deep Learning Framework for the Land Use and Soil Salinization Classification of Keriya Oasis in China with RADARSAT-2 and Landsat-8 Data. Land 2025, 14, 649.
Figure 1. Architecture of SIR-DCGAN.
Figure 2. Overview of the encoder architecture, with x and y indicating input and output sizes, respectively.
Figure 3. Structure of the decoder.
Figure 4. Structure of the discriminator.
Figure 5. Watermark embedding illustrations demonstrating the subjective invisibility of SIR-DCGAN. (a) Original image; (b) watermark; (c) attention CAM visualization; (d) watermarked image.
Figure 6. Illustrations of different noise attacks: the watermarked image I_en obtained after embedding the watermark into the host image under joint attacks from multiple noise layers; the noisy image I_co obtained after I_en passes through the noise-layer attacks; and the difference image |I_en − I_co|, with the grayscale range appropriately magnified.
Figure 7. Joint PSNR-SSIM distribution. The high-density red regions indicate a strong correlation between PSNR and SSIM, demonstrating that the watermarked images maintain high invisibility and quality. PSNR values range from 35 to 44, and SSIM values are mostly between 0.94 and 1.0, indicating strong structural similarity with the host image.
Figure 8. Watermarking accuracy (BA) under different noise types: (a) Gaussian blur, (b) Resize, (c) JPEG compression, (d) salt-and-pepper, and (e) speckle. The proposed model outperforms the methods of Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39], especially under Gaussian blur, Resize, and JPEG compression, showing superior robustness.
Figure 9. Robustness of the proposed model under combined noise with different noise intensities: (a) JPEG(50) + cropping, (b) JPEG(50) + deletion, (c) JPEG(50) + Gaussian blur, and (d) JPEG(50) + Resize. The proposed model outperforms the methods of Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39] in most combined-noise scenarios, exhibiting robust performance even under complex distortion conditions.
Figure 10. Normalized correlation (NC) scores of the proposed watermarking approach across 20 distinct image processing attacks. To improve clarity, the results are arranged in ascending order of NC. The method shows remarkable robustness, maintaining NC values above 0.85 in all instances, with the majority of cases exceeding 0.90, even as distortion types and intensities vary.
Figure 11. NC under a JPEG attack for different architectural variants.
Table 1. Generator architecture of the proposed SIR-DCGAN model.
Layer | Type | Kernel/Stride/Padding | Input Size | Output Size | Activation | Notes
Input z | Latent vector | - | (100,) | - | - | Random noise input
Dense + Reshape | Dense + Reshape | - | 100 | 4 × 4 × 512 | ReLU | Fully connected, reshaped to a feature map
Deconv1 | ConvTranspose2D | 4 × 4 / 2 / 1 | 4 × 4 × 512 | 8 × 8 × 256 | ReLU | -
IR-FFM Module | Custom block | Multi-scale | 8 × 8 × 256 | 8 × 8 × 256 | ReLU | Proposed feature fusion module
Deconv2 | ConvTranspose2D | 4 × 4 / 2 / 1 | 8 × 8 × 256 | 16 × 16 × 128 | ReLU | -
SE-AM Module | Attention block | - | 16 × 16 × 128 | 16 × 16 × 128 | Sigmoid | Proposed attention mechanism
Deconv3 | ConvTranspose2D | 4 × 4 / 2 / 1 | 16 × 16 × 128 | 32 × 32 × 64 | ReLU | -
Deconv4 | ConvTranspose2D | 4 × 4 / 2 / 1 | 32 × 32 × 64 | 64 × 64 × 3 | Tanh | Final output image (RGB, normalized)
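To make the layer listing concrete, the following is a minimal PyTorch sketch that assembles the generator exactly as laid out in Table 1. It is not the released code: the IR-FFM block is left as a placeholder (nn.Identity) and SEAM is a generic squeeze-and-excitation stand-in, since the internals of both modules are described earlier in the paper.

```python
# Minimal sketch of the Table 1 generator (placeholders for IR-FFM and SE-AM).
import torch
import torch.nn as nn

class SEAM(nn.Module):
    # Generic squeeze-and-excitation style gating (assumed form, not the paper's exact module).
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w

class Generator(nn.Module):
    def __init__(self, z_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 4 * 4 * 512)                 # Dense + Reshape
        self.deconv1 = nn.ConvTranspose2d(512, 256, 4, 2, 1)    # 4x4 -> 8x8
        self.ir_ffm = nn.Identity()                              # placeholder for IR-FFM
        self.deconv2 = nn.ConvTranspose2d(256, 128, 4, 2, 1)    # 8x8 -> 16x16
        self.se_am = SEAM(128)                                   # stand-in for SE-AM
        self.deconv3 = nn.ConvTranspose2d(128, 64, 4, 2, 1)     # 16x16 -> 32x32
        self.deconv4 = nn.ConvTranspose2d(64, 3, 4, 2, 1)       # 32x32 -> 64x64
        self.relu, self.tanh = nn.ReLU(inplace=True), nn.Tanh()
    def forward(self, z):
        x = self.relu(self.fc(z)).view(-1, 512, 4, 4)
        x = self.relu(self.deconv1(x))
        x = self.relu(self.ir_ffm(x))
        x = self.relu(self.deconv2(x))
        x = self.se_am(x)
        x = self.relu(self.deconv3(x))
        return self.tanh(self.deconv4(x))
```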
Table 2. Discriminator architecture of the proposed SIR-DCGAN model.
Layer | Type | Kernel/Stride/Padding | Input Size | Output Size | Activation | Notes
Input | Image | - | 64 × 64 × 3 | - | - | Watermarked or original image
Conv1 | Conv2D | 4 × 4 / 2 / 1 | 64 × 64 × 3 | 32 × 32 × 64 | LeakyReLU (0.2) | Basic feature extraction
Conv2 | Conv2D | 4 × 4 / 2 / 1 | 32 × 32 × 64 | 16 × 16 × 128 | LeakyReLU (0.2) | BatchNorm included
Conv3 | Conv2D | 4 × 4 / 2 / 1 | 16 × 16 × 128 | 8 × 8 × 256 | LeakyReLU (0.2) | BatchNorm included
Conv4 | Conv2D | 4 × 4 / 2 / 1 | 8 × 8 × 256 | 4 × 4 × 512 | LeakyReLU (0.2) | BatchNorm included
Flatten + Output | Dense (Linear) | - | 4 × 4 × 512 | 1 | Sigmoid | Binary real/fake classification output
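Similarly, a minimal PyTorch sketch of the discriminator in Table 2 (an assumed DCGAN-style implementation, not the released code) is shown below.

```python
# Minimal sketch of the Table 2 discriminator.
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),                              # 64x64 -> 32x32
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),       # -> 16x16
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),      # -> 8x8
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),      # -> 4x4
    nn.Flatten(), nn.Linear(4 * 4 * 512, 1), nn.Sigmoid(),                                   # real/fake score
)
```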
Table 3. Hyperparameters used in network training.
Parameter Name | Parameter Value | Description
Optimizer | ADAM | An adaptive learning rate optimization algorithm that combines the advantages of AdaGrad and RMSProp, adjusting the learning rate using first-moment and second-moment estimates of the gradient.
Learning Rate | 0.0015 | Controls how much the model weights are changed in response to the estimated error at each update.
Beta 1 | 0.9 | Decay rate of the first-moment (momentum) estimate in the ADAM optimizer.
Beta 2 | 0.999 | Decay rate of the second-moment estimate, contributing to the stability of the learning process.
Loss Function | MSE and MAE | MSE (mean squared error) measures the average squared difference between predicted and actual values; MAE (mean absolute error) measures the average absolute difference.
Training Epochs | 1000 | Number of complete passes over the training set during training.
Batch Size | 32 | Number of training examples processed in one forward/backward pass through the network.
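The configuration in Table 3 maps directly onto a standard PyTorch training setup; the snippet below is a hedged sketch in which `model` is a placeholder module rather than the actual SIR-DCGAN network.

```python
# Minimal sketch of the Table 3 training configuration.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)   # placeholder module for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=0.0015, betas=(0.9, 0.999))
mse_loss = nn.MSELoss()   # mean squared error term
mae_loss = nn.L1Loss()    # mean absolute error term
NUM_EPOCHS, BATCH_SIZE = 1000, 32
```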
Table 4. Comparison of PSNR and SSIM metrics between the proposed watermarking method and current leading techniques.
Method | Host Image | Watermark Image | PSNR (Existing) | SSIM (Existing) | PSNR (Ours) | SSIM (Ours)
Ahmadi et al. [10] | COCO | Random | 35.93 | 0.9660 | 39.43 | 0.9911
Jia et al. [20] | VOC2012 | Binary | 36.64 | 0.8938 | 39.71 | 0.9897
Huang et al. [21] | SIRI-WHU | Random | 35.87 | 0.9688 | 40.93 | 0.9957
Madhu et al. [39] | SIRI-WHU | Random | 37.54 | 0.9807 | 39.99 | 0.9958
Wei et al. [40] | CelebA | CIFAR-10 | 37.91 | 0.979 | 38.99 | 0.9819
Mahapatra et al. [41] | Cats and Dogs | Random | 31.33 | 0.9941 | 38.89 | 0.9881
Table 5. Comparison of NC robustness scores.
Attack | Mahapatra et al. [41] | Jia et al. [20] | Madhu et al. [39] | Ours
Salt and Pepper (0.001) | 0.9896 | 0.9353 | 0.9993
Speckle (0.001) | 0.9918 | 0.990
Motion Blur | 0.9480 | 0.9473
Histogram Equalization | 0.9250 | 0.9830 | 0.9903
Sharpening | 0.9607 | 0.9891
Rotation (45) | 0.3895 | 0.7657 | 0.9387
Median Filter (3 × 3) | 0.9877 | 0.979 | 0.9176 | 0.9574
Gaussian Noise (0.001) | 0.9866 | 0.9328 | 0.9987
Gaussian Low-Pass Filter (3 × 3) | 0.951 | 0.9494
Table 6. Comparison of execution time.
Scheme | Execution Time (s)
Huang et al. [21] | 4.3
Samee et al. [45] | 16.1
Yu et al. [46] | 4.5
Ours | 0.38
Table 7. Comparison of parameter count and FLOPs across different watermarking methods.
Method | Parameters (Millions) | FLOPs (GigaOps)
Huang et al. [21] | 5.71 | 21.38
Samee et al. [45] | 6.12 | 23.94
Yu et al. [46] | 7.05 | 26.78
SIR-DCGAN (Ours) | 4.21 | 12.47
Table 8. Ablation study of the IR-FFM and SE-AM modules on the AID dataset (256 × 256).
Model Variant | PSNR (dB) | SSIM | NC
w/o IR-FFM | 39.58 | 0.963 | 0.9814
w/o SE-AM | 39.81 | 0.966 | 0.9841
w/o both (baseline DCGAN) | 39.02 | 0.959 | 0.9753
Full SIR-DCGAN | 40.63 | 0.973 | 0.9967
Table 9. Cross-dataset and high-resolution generalization results of SIR-DCGAN.
Dataset | Image Size | PSNR (dB) | SSIM | NC
UC Merced (baseline) | 128 × 128 | 41.72 | 0.978 | 0.9982
AID | 256 × 256 | 40.63 | 0.973 | 0.9967
NWPU-RESISC45 | 256 × 256 | 40.81 | 0.975 | 0.9973
Table 10. Comparison with other methods on the AID and NWPU-RESISC45 datasets (256 × 256).
Method | Dataset | PSNR (dB) | SSIM | NC
DWT-SVD [47] | AID | 38.12 | 0.942 | 0.9736
U-Net [48] | AID | 39.64 | 0.954 | 0.9815
ARWGAN [21] | AID | 40.21 | 0.961 | 0.9887
SIR-DCGAN | AID | 40.63 | 0.973 | 0.9967
DWT-SVD [47] | NWPU-RESISC45 | 38.41 | 0.945 | 0.9758
U-Net [48] | NWPU-RESISC45 | 39.73 | 0.958 | 0.9832
ARWGAN [21] | NWPU-RESISC45 | 40.32 | 0.964 | 0.9902
SIR-DCGAN | NWPU-RESISC45 | 40.81 | 0.975 | 0.9973
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
